Meta hires top OpenAI researcher 💼 Anthropic warns AI could turn against us ⚠️ DeepMind's AlphaGenome cracks DNA code 🧬 Small AI teachers beat big models 👩‍🏫 KnoVo tracks research breakthroughs 📈
AI Connections #57 - a weekly newsletter about interesting blog posts, articles, videos, and podcast episodes about AI
TOP 3 NEWS IN AI THIS WEEK 📰
“Agentic Misalignment: How LLMs could be insider threats” - blog post by Anthropic: READ
This blog post on agentic misalignment reveals that leading AI models, when given autonomy and faced with replacement or conflicting goals, can strategically choose harmful behaviors such as blackmail or espionage, highlighting a serious emerging risk: LLMs acting as insider threats to protect their objectives.
“Meta hires key OpenAI researcher to work on AI reasoning models” - article by TechCrunch: READ
This article shows how Meta is doubling down on AI reasoning by hiring OpenAI researcher Trapit Bansal, signaling its intent to catch up in the compute-scalable frontier model race against OpenAI, Google, and DeepSeek.
“AlphaGenome: AI for better understanding the genome” - blog post by Google DeepMind: READ
This blog post by DeepMind introduces AlphaGenome, a powerful new AI model that predicts how DNA mutations affect gene regulation across tissues by analyzing million-letter sequences at base-level resolution, outperforming all existing models, enabling faster variant interpretation, and marking a major step toward decoding how our genome truly works.
READING LIST 📚
“Spanish mathematician Javier Gómez Serrano and Google DeepMind team up to solve the Navier-Stokes million-dollar problem” - article by Science: READ
This article reports that Spanish mathematician Javier Gómez Serrano and Google DeepMind have been quietly working for three years to solve the Navier-Stokes Millennium Prize Problem using AI. A breakthrough is now seen as imminent: the team uses neural networks to detect fluid singularities, potentially unlocking a million-dollar solution that could transform both mathematics and science.
“Microsoft Is Having an Incredibly Embarrassing Problem With Its AI” - article by Futurism: READ
Microsoft is facing an embarrassing issue: despite heavily investing in OpenAI, its own AI tool Copilot is being ignored by users in favor of ChatGPT, even within companies that bought both, because ChatGPT is seen as more capable and enjoyable to use.
“Reinforcement Learning Teachers of Test Time Scaling” - blog post by Sakana AI: READ
This blog introduces Reinforcement-Learned Teachers (RLTs), small models trained to explain rather than solve problems, enabling faster, cheaper training of reasoning-capable LLMs that outperform much larger models, marking a major shift from “learning to solve” to “learning to teach.”
The rise of "context engineering" - blog post by LangChain: READ
This blog introduces "context engineering" as the emerging core skill in building AI agentsβdesigning dynamic systems that supply LLMs with the right info, tools, and format to reliably complete tasksβarguing it's more important than prompt crafting and best enabled by tools like LangGraph and LangSmith.
NVIDIA Tensor Core Evolution: From Volta To Blackwell - blog post by SemiAnalysis: READ
This blog post traces the evolution of NVIDIA's Tensor Cores from Volta to Blackwell, showing how advances in matrix math instructions, memory hierarchies, and asynchronous execution have enabled GPU performance gains beyond Moore's Law, emphasizing that modern AI acceleration is as much about architecture and data movement as raw compute power.
“Evaluating Long-Context Question & Answer Systems” - blog post by Eugene Yan: READ
This blog post explores the challenges of evaluating question-answering systems on large documents such as novels or technical reports. It highlights the need to measure both faithfulness (answers grounded in the source) and helpfulness (usefulness to the user), and details how to build diverse evaluation datasets, use LLM-based evaluators, and benchmark models through nuanced metrics, multi-hop reasoning, and source-cited evidence.
“Some ideas for what comes next” - blog post by Interconnects AI: READ
This blog post by Nathan Lambert reflects on the slowdown in AI releases and highlights three key trends: (1) OpenAI's o3 model introduced a breakthrough in web-scale search and RL-based tool use that no other lab has matched yet, (2) AI agents like Claude Code are improving rapidly through small reliability fixes rather than major model upgrades, and (3) parameter scaling has plateaued, with future model gains expected to come from inference-time orchestration, not ever-larger monoliths, signaling a shift toward efficiency and productized capabilities over brute-force size.
“AI Training Load Fluctuations at Gigawatt-scale - Risk of Power Grid Blackout?” - blog post by SemiAnalysis: READ
This blog post warns that gigawatt-scale AI datacenters are straining the electric grid with rapid, unpredictable load fluctuations, raising the real risk of regional blackouts, and highlights battery energy storage systems (BESS), especially Tesla's Megapacks, as a key emerging solution to stabilize power and avoid cascading failures.
“Young People Face a Hiring Crisis. AI Is Making It Worse.” - blog post by Derek Thompson: READ
This blog post by Derek Thompson argues that AI isn't just replacing entry-level jobs; it's distorting the entire college-to-career pipeline, from grades and applications to interviews and job offers, creating a dehumanizing, high-pressure system that leaves young people overwhelmed and shut out.
“Using AI Right Now: A Quick Guide” - blog post by Ethan Mollick: READ
This blog post explains that getting the most from AI today isn't just about model quality but about choosing between Claude, ChatGPT, or Gemini based on features like powerful model tiers, deep research tools, voice mode, and real-work use cases.
“The Era of Exploration” - blog post by Yiding Jiang: READ
This blog argues that the next frontier in AI scaling will hinge not on more data or parameters but on better exploration, optimizing what data models experience and how they gather it, through smarter “world sampling” and “path sampling” strategies that maximize learning signal per FLOP.
NEW RELEASES 🚀
“HeyGen released the Creative Operating System using Video Agent”: TRY
“Google released Gemma 3n”: TRY
RESEARCH PAPERS 📄
“Bridging Offline and Online Reinforcement Learning for LLMs” - research paper by Meta: READ
This paper evaluates reinforcement learning methods for finetuning large language models across offline, semi-online, and fully online settings, finding that online and semi-online approaches like DPO and GRPO consistently outperform offline training, even for non-verifiable tasks, and that multi-tasking with both verifiable and non-verifiable rewards improves generalization across task types.
“RLPR: Extrapolating RLVR to General Domains without Verifiers” - research paper by Tsinghua University: READ
This paper introduces RLPR, a verifier-free reinforcement learning framework that uses an LLM's own token probabilities as a reward signal to improve reasoning, enabling scalable training beyond math and code domains; it matches or exceeds prior verifier-based methods across seven benchmarks, outperforming VeriFree by 7.6 points on TheoremQA and General-Reasoner by 1.6 points on average.
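The core trick, scoring an answer by the probabilities the model itself assigns to the reference-answer tokens, can be sketched in a few lines. This is a toy illustration under my own assumptions (mean per-token probability as the aggregate); the paper's exact reward formulation may differ, and `probability_reward` is a hypothetical name.

```python
# Toy sketch of a verifier-free, probability-based reward in the spirit of
# RLPR: score a rollout by how likely the model finds the reference answer.
# The aggregation (mean token probability) is an assumption for illustration.
import math

def probability_reward(token_logprobs):
    """Mean probability the model assigns to each reference-answer token.

    token_logprobs: log-probabilities of the reference answer's tokens,
    conditioned on the question plus the model's own reasoning.
    """
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)

# A chain of thought that makes the correct answer likely earns a higher
# reward than one that leaves the model unsure -- no external verifier needed.
confident = probability_reward([-0.1, -0.2, -0.05])  # high token probs
uncertain = probability_reward([-2.0, -1.5, -3.0])   # low token probs
```

Because the reward comes from the policy's own likelihoods rather than a rule-based checker, the same signal applies to free-form answers in domains where no verifier exists.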
“Mapping the Evolution of Research Contributions using KnoVo” - research paper by the University of Idaho: READ
This paper introduces KnoVo, an LLM-powered framework that quantifies a scientific paper's novelty by comparing it to prior and subsequent work across multiple research dimensions (e.g., methodology, dataset), enabling dynamic visualizations of knowledge evolution, research-gap detection, and originality assessment, moving beyond traditional citation-based impact metrics.