OpenAI HealthBench, AlphaEvolve for new algorithms, Continuous Thought Machines, Aya Vision, OpenMemory MCP, and more news
AI Connections #51 - a weekly newsletter covering interesting blog posts, articles, videos, and podcast episodes about AI
NEWS
"OpenAI Introduced HealthBench" - blog post by OpenAI: READ
This blog post is about: the launch of HealthBench, a physician-designed benchmark by OpenAI to evaluate AI model performance in real-world healthcare scenarios using 5,000 multilingual medical conversations - aiming to improve safety, reliability, and clinical relevance as AI becomes a critical tool in advancing global health.
"AlphaEvolve: a Gemini-powered coding agent for algorithm discovery" - blog post by Google: READ
This blog post is about: AlphaEvolve, a new AI agent from Google that combines Gemini language models with automated evaluators to evolve algorithms - boosting performance in data centers, chip design, AI model training, and even solving open math problems - by automatically discovering, verifying, and optimizing code with measurable impact across computing.
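The evolutionary loop behind a system like this - propose candidates, score them with an automated evaluator, keep the best, and mutate - can be sketched in a few lines. This is a toy illustration, not AlphaEvolve's actual pipeline: the objective, population sizes, and mutation scheme are invented for the example, and random mutation stands in for the LLM's proposed code edits.

```python
import random

random.seed(0)

def evaluate(candidate):
    """Automated evaluator: higher is better (toy objective: sum close to 10)."""
    return -abs(sum(candidate) - 10)

# Initial population of candidate "programs" (here just coefficient lists).
population = [[random.uniform(0, 5) for _ in range(4)] for _ in range(20)]

for generation in range(50):
    population.sort(key=evaluate, reverse=True)
    survivors = population[:5]  # elitism: keep the best-scoring candidates
    # Stand-in for "the LLM proposes edits": random mutations of survivors.
    population = survivors + [
        [g + random.gauss(0, 0.3) for g in random.choice(survivors)]
        for _ in range(15)
    ]

best = max(population, key=evaluate)
```

The key structural point is that the evaluator, not a human, closes the loop - any candidate that can be scored automatically can be evolved.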
"Sharing new breakthroughs and artifacts supporting molecular property prediction, language processing, and neuroscience" - blog post by Meta: READ
This blog post is about: Meta FAIR's latest open science releases - including a massive quantum chemistry dataset (OMol25), a universal atomic-scale ML model (UMA), a new reward-driven generative algorithm (Adjoint Sampling), and a neuroscience study decoding how children learn language - advancing Meta's mission toward Advanced Machine Intelligence (AMI) through open, collaborative research across AI, chemistry, and brain science.
"Introducing Continuous Thought Machines" - blog post by Sakana AI: READ
This blog post is about: Sakana AI's release of the Continuous Thought Machine (CTM) - a novel AI model inspired by biological brains that uses synchronized neuron timing for interpretable, step-by-step reasoning, showing enhanced problem-solving and efficiency while bringing AI closer to human-like thought processes.
"Democratizing AI: The Psyche Network Architecture" - blog post by Nous Research Team: READ
This research blog post is about: Psyche, a decentralized AI training platform that coordinates large-scale LLM training across idle hardware worldwide using an efficient optimizer called DisTrO - compressing gradient data via frequency-domain techniques - and launching with a 40B parameter model called Consilience, aiming to democratize AI development outside of centralized corporate control.
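The post doesn't publish DisTrO's internals, but the core idea of frequency-domain gradient compression can be sketched: transform the gradient into the frequency domain, keep only the largest coefficients, and transmit just those. A minimal numpy sketch - the FFT choice and top-k selection here are illustrative assumptions, not DisTrO's actual scheme:

```python
import numpy as np

def compress_gradient(grad, k):
    """Keep only the k largest-magnitude frequency components of a gradient."""
    spectrum = np.fft.rfft(grad)              # real FFT: move to frequency domain
    idx = np.argsort(np.abs(spectrum))[-k:]   # indices of the top-k coefficients
    return idx, spectrum[idx], grad.size

def decompress_gradient(idx, values, n):
    """Rebuild a dense gradient from the surviving frequency components."""
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[idx] = values
    return np.fft.irfft(spectrum, n=n)

grad = np.random.randn(1024)
idx, vals, n = compress_gradient(grad, k=64)  # send 64 coefficients, not 1024 floats
approx = decompress_gradient(idx, vals, n)
```

Each worker would ship only the indices and values - a large reduction in communication, which is what makes training over idle, geographically scattered hardware plausible.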
"Highly Opinionated Advice on How to Write ML Papers" - blog post by Neel Nanda: READ
This blog post is about: how to write a compelling research paper by crafting a clear, concise narrative - one built around 1-3 specific claims, rigorously supported by evidence, well-contextualized in existing literature, and clearly motivated to show why the insights matter - focusing not just on discovery, but on making others understand, believe, and build upon your work.
"LLM Inference Economics from First Principles" - blog post by Piotr Mazurek and Felix Gabriel: READ
This blog post is about: breaking down the economics of LLM inference, using Llama 3.3 70B as a case study to explain how compute costs, GPU throughput, and model architecture define the cost per generated token - highlighting how these cost structures shape profitability for AI labs and accessibility for end users as we approach more powerful, everyday AI systems.
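At its simplest, the cost-per-token arithmetic the post builds on reduces to GPU cost divided by throughput. A toy calculation - all numbers below are illustrative assumptions, not figures from the post:

```python
# Toy cost-per-token estimate (every number here is an assumed example value).
gpu_hourly_cost = 2.50     # USD per GPU-hour (assumed cloud price)
gpus_per_replica = 4       # GPUs needed to serve one copy of the model
tokens_per_second = 1200   # aggregate decode throughput of that replica

cost_per_second = gpu_hourly_cost * gpus_per_replica / 3600
cost_per_token = cost_per_second / tokens_per_second
cost_per_million = cost_per_token * 1e6
print(f"${cost_per_million:.2f} per million output tokens")
```

The interesting economics live inside `tokens_per_second`, which depends on batch size, memory bandwidth, and model architecture - exactly the levers the post analyzes.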
New Releases
SB-1 Infinite Soundboard by ElevenLabs - an AI-powered tool that generates sound effects, ambient noise, and drum patterns from text prompts using their new Text-to-SFX model - TRY
New dance video of Tesla's Optimus robot
mem0ai released OpenMemory MCP - a private memory for MCP-compatible clients: TRY
Gemini 2.5 Pro Preview Canvas using the Maps API - you can build super cool, visually stunning web apps with just a single prompt!
Tencent released HunyuanCustom - it turns any photo, voice, or video prompt into cinema-ready clips
RESEARCH PAPERS
"Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures" - research paper by DeepSeek: READ
This research paper is about: how DeepSeek-V3 leverages hardware-aware model co-design - including Multi-head Latent Attention, Mixture of Experts, FP8 training, and custom network topology - to overcome hardware bottlenecks in large-scale LLM training and inference, offering a blueprint for future AI infrastructure.
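Of the techniques listed, Mixture of Experts is the easiest to sketch: a gate scores the experts for each token, the top-k are selected, and their outputs are mixed by normalized gate weights. This is a toy dense-loop version, not DeepSeek's actual kernels - the shapes, tanh experts, and softmax-over-selected gating are assumptions for illustration:

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Route each token to its top-k experts and mix outputs by gate weights."""
    logits = x @ gate_W                           # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]    # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                              # softmax over selected experts
        for weight, e in zip(w, sel):
            out[t] += weight * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_exp, tokens = 8, 4, 3
gate_W = rng.standard_normal((d, n_exp))
# Each "expert" is a small independent network (here a single tanh layer).
experts = [lambda v, W=rng.standard_normal((d, d)): np.tanh(v @ W)
           for _ in range(n_exp)]
x = rng.standard_normal((tokens, d))
y = moe_forward(x, gate_W, experts)
```

The hardware angle the paper discusses follows from this structure: only k of the experts run per token, so parameter count scales up while per-token compute and memory traffic stay bounded.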
"Qwen3 Technical Report" - research paper by Qwen Team: READ
This research paper is about: Qwen3, a new family of large language models ranging up to 235B parameters, which introduces dynamic reasoning modes, a novel "thinking budget" for adaptive inference, and significantly expanded multilingual support - achieving state-of-the-art results across reasoning, coding, and agent tasks while remaining open-source under Apache 2.0.
"Parallel Scaling Law for Language Models" - research paper by Qwen Team: READ
This research paper is about: a new inference-efficient scaling method called ParScale, which boosts model performance by parallelizing computation with minimal memory and latency costs - offering an alternative to traditional parameter or token scaling.
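The shape of the idea - run P parallel streams through one shared model and learn to aggregate their outputs - can be sketched in numpy. In this toy version the per-stream input perturbations and the uniform aggregation weights are stand-ins for ParScale's learned components:

```python
import numpy as np

P, d = 4, 16                                   # parallel streams, hidden size
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)) / np.sqrt(d)   # shared "model" weights
prefixes = rng.standard_normal((P, d)) * 0.1   # per-stream input transforms (learned in the paper)
agg = np.full(P, 1.0 / P)                      # aggregation weights (learned in the paper; uniform here)

def base_model(x):
    """Stand-in for the shared network applied to every stream."""
    return np.tanh(x @ W)

x = rng.standard_normal(d)
streams = np.stack([base_model(x + prefixes[p]) for p in range(P)])  # P parallel passes
output = agg @ streams                         # aggregate the P outputs into one
```

Because the model weights are shared, memory grows only by the tiny per-stream parameters while compute parallelizes - which is the trade-off the scaling law quantifies.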
"Aya Vision: Advancing the Frontier of Multilingual Multimodality" - research paper by Cohere Labs: READ
This research paper is about: Aya-Vision, a multilingual multimodal language model that introduces a synthetic annotation framework and a novel cross-modal merging technique to overcome data scarcity, preserve text-only capabilities, and achieve state-of-the-art performance - even outperforming much larger models in vision-language tasks across multiple languages.
"xGen-small Technical Report" - research paper by Salesforce: READ
This research paper is about: xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context tasks (up to 128k tokens), using a vertically integrated pipeline combining domain-aware data curation, multi-stage pretraining, and advanced post-training methods to achieve strong results - especially in math, coding, and long-context benchmarks.