OpenAI HealthBench, AlphaEvolve for new algorithms, Continuous Thought Machines, Aya Vision, OpenMemory MCP, and more news
AI Connections #51 - a weekly newsletter covering interesting blog posts, articles, videos, and podcast episodes about AI
NEWS
"OpenAI Introduced HealthBench" - blog post by OpenAI: READ
This blog post is about: the launch of HealthBench, a physician-designed benchmark by OpenAI to evaluate AI model performance in real-world healthcare scenarios using 5,000 multilingual medical conversations - aiming to improve safety, reliability, and clinical relevance as AI becomes a critical tool in advancing global health.
"AlphaEvolve: a Gemini-powered coding agent for algorithm discovery" - blog post by Google: READ
This blog post is about: AlphaEvolve, a new AI agent from Google that combines Gemini language models with automated evaluators to evolve algorithms - boosting performance in data centers, chip design, AI model training, and even solving open math problems - by automatically discovering, verifying, and optimizing code with measurable impact across computing.
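The evolutionary loop behind a system like this - propose candidates, score them with an automated evaluator, keep the best, and mutate - can be sketched in a few lines. This is a toy illustration, not AlphaEvolve's actual pipeline: the objective, population sizes, and mutation scheme are invented for the example, and random mutation stands in for the LLM's proposed code edits.

```python
import random

random.seed(0)

def evaluate(candidate):
    """Automated evaluator: higher is better (toy objective: sum close to 10)."""
    return -abs(sum(candidate) - 10)

# Initial population of candidate "programs" (here just coefficient lists).
population = [[random.uniform(0, 5) for _ in range(4)] for _ in range(20)]

for generation in range(50):
    population.sort(key=evaluate, reverse=True)
    survivors = population[:5]  # elitism: keep the best-scoring candidates
    # Stand-in for "the LLM proposes edits": random mutations of survivors.
    population = survivors + [
        [g + random.gauss(0, 0.3) for g in random.choice(survivors)]
        for _ in range(15)
    ]

best = max(population, key=evaluate)
```

The key structural point is that the evaluator, not a human, closes the loop - any candidate that can be scored automatically can be evolved.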
"Sharing new breakthroughs and artifacts supporting molecular property prediction, language processing, and neuroscience" - blog post by Meta: READ
This blog post is about: Meta FAIR's latest open science releases - including a massive quantum chemistry dataset (OMol25), a universal atomic-scale ML model (UMA), a new reward-driven generative algorithm (Adjoint Sampling), and a neuroscience study decoding how children learn language - advancing Meta's mission toward Advanced Machine Intelligence (AMI) through open, collaborative research across AI, chemistry, and brain science.
"Introducing Continuous Thought Machines" - blog post by Sakana AI: READ
This blog post is about: Sakana AI's release of the Continuous Thought Machine (CTM) - a novel AI model inspired by biological brains that uses synchronized neuron timing for interpretable, step-by-step reasoning, showing enhanced problem-solving and efficiency while bringing AI closer to human-like thought processes.
"Democratizing AI: The Psyche Network Architecture" - blog post by Nous Research Team: READ
This research blog post is about: Psyche, a decentralized AI training platform that coordinates large-scale LLM training across idle hardware worldwide using an efficient optimizer called DisTrO - compressing gradient data via frequency-domain techniques - and launching with a 40B parameter model called Consilience, aiming to democratize AI development outside of centralized corporate control.
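The post doesn't publish DisTrO's internals, but the core idea of frequency-domain gradient compression can be sketched: transform the gradient into the frequency domain, keep only the largest coefficients, and transmit just those. A minimal numpy sketch - the FFT choice and top-k selection here are illustrative assumptions, not DisTrO's actual scheme:

```python
import numpy as np

def compress_gradient(grad, k):
    """Keep only the k largest-magnitude frequency components of a gradient."""
    spectrum = np.fft.rfft(grad)              # real FFT: move to frequency domain
    idx = np.argsort(np.abs(spectrum))[-k:]   # indices of the top-k coefficients
    return idx, spectrum[idx], grad.size

def decompress_gradient(idx, values, n):
    """Rebuild a dense gradient from the surviving frequency components."""
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[idx] = values
    return np.fft.irfft(spectrum, n=n)

grad = np.random.randn(1024)
idx, vals, n = compress_gradient(grad, k=64)  # send 64 coefficients, not 1024 floats
approx = decompress_gradient(idx, vals, n)
```

Each worker would ship only the indices and values - a large reduction in communication, which is what makes training over idle, geographically scattered hardware plausible.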
"Highly Opinionated Advice on How to Write ML Papers" - blog post by Neel Nanda: READ
This blog post is about: how to write a compelling research paper by crafting a clear, concise narrative - one built around 1-3 specific claims, rigorously supported by evidence, well-contextualized in existing literature, and clearly motivated to show why the insights matter - focusing not just on discovery, but on making others understand, believe, and build upon your work.
"LLM Inference Economics from First Principles" - blog post by Piotr Mazurek and Felix Gabriel: READ
This blog post is about: breaking down the economics of LLM inference, using Llama 3.3 70B as a case study to explain how compute costs, GPU throughput, and model architecture define the cost per generated token - highlighting how these cost structures shape profitability for AI labs and accessibility for end users as we approach more powerful, everyday AI systems.
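At its simplest, the cost-per-token arithmetic the post builds on reduces to GPU cost divided by throughput. A toy calculation - all numbers below are illustrative assumptions, not figures from the post:

```python
# Toy cost-per-token estimate (every number here is an assumed example value).
gpu_hourly_cost = 2.50     # USD per GPU-hour (assumed cloud price)
gpus_per_replica = 4       # GPUs needed to serve one copy of the model
tokens_per_second = 1200   # aggregate decode throughput of that replica

cost_per_second = gpu_hourly_cost * gpus_per_replica / 3600
cost_per_token = cost_per_second / tokens_per_second
cost_per_million = cost_per_token * 1e6
print(f"${cost_per_million:.2f} per million output tokens")
```

The interesting economics live inside `tokens_per_second`, which depends on batch size, memory bandwidth, and model architecture - exactly the levers the post analyzes.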
New Releases
SB-1 Infinite Soundboard by ElevenLabs - an AI-powered tool that generates sound effects, ambient noise, and drum patterns from text prompts using their new Text-to-SFX model - TRY
New dance video of Tesla's Optimus robot
mem0ai released OpenMemory MCP - a private memory for MCP-compatible clients: TRY
Gemini 2.5 Pro Preview Canvas using the Maps API - you can build super cool, visually stunning web apps with just a single prompt!
Tencent released HunyuanCustom - it turns any photo, voice, or video prompt into cinema-ready clips
RESEARCH PAPERS
"Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures" - research paper by DeepSeek: READ
This research paper is about: how DeepSeek-V3 leverages hardware-aware model co-design - including Multi-head Latent Attention, Mixture of Experts, FP8 training, and custom network topology - to overcome hardware bottlenecks in large-scale LLM training and inference, offering a blueprint for future AI infrastructure.
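Of the techniques listed, Mixture of Experts is the easiest to sketch: a gate scores the experts for each token, the top-k are selected, and their outputs are mixed by normalized gate weights. This is a toy dense-loop version, not DeepSeek's actual kernels - the shapes, tanh experts, and softmax-over-selected gating are assumptions for illustration:

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Route each token to its top-k experts and mix outputs by gate weights."""
    logits = x @ gate_W                           # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]    # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                              # softmax over selected experts
        for weight, e in zip(w, sel):
            out[t] += weight * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_exp, tokens = 8, 4, 3
gate_W = rng.standard_normal((d, n_exp))
# Each "expert" is a small independent network (here a single tanh layer).
experts = [lambda v, W=rng.standard_normal((d, d)): np.tanh(v @ W)
           for _ in range(n_exp)]
x = rng.standard_normal((tokens, d))
y = moe_forward(x, gate_W, experts)
```

The hardware angle the paper discusses follows from this structure: only k of the experts run per token, so parameter count scales up while per-token compute and memory traffic stay bounded.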
"Qwen3 Technical Report" - research paper by Qwen Team: READ
This research paper is about: Qwen3, a new family of large language models ranging up to 235B parameters, which introduces dynamic reasoning modes, a novel "thinking budget" for adaptive inference, and significantly expanded multilingual support - achieving state-of-the-art results across reasoning, coding, and agent tasks while remaining open-source under Apache 2.0.
"Parallel Scaling Law for Language Models" - research paper by Qwen Team: READ
This research paper is about: a new inference-efficient scaling method called ParScale, which boosts model performance by parallelizing computation with minimal memory and latency costs - offering an alternative to traditional parameter or token scaling.
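The shape of the idea - run P parallel streams through one shared model and learn to aggregate their outputs - can be sketched in numpy. In this toy version the per-stream input perturbations and the uniform aggregation weights are stand-ins for ParScale's learned components:

```python
import numpy as np

P, d = 4, 16                                   # parallel streams, hidden size
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)) / np.sqrt(d)   # shared "model" weights
prefixes = rng.standard_normal((P, d)) * 0.1   # per-stream input transforms (learned in the paper)
agg = np.full(P, 1.0 / P)                      # aggregation weights (learned in the paper; uniform here)

def base_model(x):
    """Stand-in for the shared network applied to every stream."""
    return np.tanh(x @ W)

x = rng.standard_normal(d)
streams = np.stack([base_model(x + prefixes[p]) for p in range(P)])  # P parallel passes
output = agg @ streams                         # aggregate the P outputs into one
```

Because the model weights are shared, memory grows only by the tiny per-stream parameters while compute parallelizes - which is the trade-off the scaling law quantifies.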
"Aya Vision: Advancing the Frontier of Multilingual Multimodality" - research paper by Cohere Labs: READ
This research paper is about: Aya-Vision, a multilingual multimodal language model that introduces a synthetic annotation framework and a novel cross-modal merging technique to overcome data scarcity, preserve text-only capabilities, and achieve state-of-the-art performance - even outperforming much larger models in vision-language tasks across multiple languages.
"xGen-small Technical Report" - research paper by Salesforce: READ
This research paper is about: xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context tasks (up to 128k tokens), using a vertically integrated pipeline combining domain-aware data curation, multi-stage pretraining, and advanced post-training methods to achieve strong results - especially in math, coding, and long-context benchmarks.