Qwen-3 🧠, ChatGPT Shopping feature 🛍️, The Leaderboard Illusion 🏆, Visa launches AI agents for shopping 🛒, Sleep-time Compute 💤

AI Connections is back! AI Connections #49 - a weekly newsletter about interesting blog posts, articles, videos, and podcast episodes about AI

May 01, 2025

NEWS 📚

“Qwen3: Think Deeper, Act Faster” - blog post by Qwen Team: READ

Qwen3 is the latest open-weight large language model series from the Qwen team, featuring both dense and Mixture-of-Experts models like Qwen3-235B-A22B, which rival top-tier models in coding, math, and reasoning. It introduces hybrid “thinking” and “non-thinking” modes for flexible inference, supports 119 languages, and significantly improves agentic capabilities and efficiency. Trained on 36 trillion tokens and available under Apache 2.0, Qwen3 models are easy to deploy using tools like Hugging Face, vLLM, and Qwen-Agent.

Google Q3 earnings call: CEO’s remarks about AI - blog post by Google: READ

Alphabet had a strong Q3 driven by rapid AI innovation, with its full-stack AI approach—spanning infrastructure, research, and global product reach—powering major product launches and operational efficiencies, including a 90% reduction in AI Overview costs and widespread adoption of Gemini models. Google Cloud revenue grew 35% YoY to $11.4B, fueled by demand for its AI infrastructure and platforms, while YouTube surpassed $50B in annual ad and subscription revenue and Waymo became the first autonomous vehicle company to exceed 1 million fully autonomous miles driven weekly.

ChatGPT can now help you shop

ChatGPT is rolling out new shopping features to help users find, compare, and buy products more easily, including improved product results, visuals with pricing and reviews, and direct purchase links. These features are not ads and are being gradually released to all user tiers, including Plus, Pro, Free, and logged-out users.

The urgency of AI interpretability – blog post by Dario Amodei (Anthropic): READ

Dario Amodei warns that AI interpretability is not keeping pace with rapidly advancing capabilities and calls for urgent investment to avoid dangerous blind spots. He highlights progress in mechanistic interpretability (mapping features and reasoning circuits in models like Claude) as a path to building an “MRI for AI” before highly autonomous systems emerge, which he suggests could happen as early as 2026–2027.

Table showing percentages of different patterns of AI use for software and non-software applications.

AI’s disruption of software jobs: new insights from Anthropic’s Economic Index – blog post by Anthropic: READ

Anthropic’s analysis shows developers increasingly use Claude—especially Claude Code—for automating coding tasks, with UI and web app work most affected. Startups lead adoption, and the trend may accelerate AI progress while reshaping software roles.

The Mechanics of Mafia – blog post by Peter Thiel: READ

Peter Thiel reflects on building the "PayPal Mafia" and argues that great company culture isn’t built with perks but with deep alignment on mission and team. He emphasizes hiring people who genuinely want to work together on a unique problem, assigning each person one clear responsibility, and fostering strong internal bonds that resemble cult-like dedication—minus the crazy. The best startups, he says, aren't collections of talent but tightly knit tribes, fanatically right about something the world has missed.

Visa launches Intelligent Commerce: AI agents that shop for you

Visa unveiled Intelligent Commerce, a new AI-powered system of agents that can autonomously discover, shop, and buy on behalf of consumers—handling everything from product discovery to post-purchase support. The goal is to create a more personalized and secure shopping experience by streamlining the entire consumer journey with intelligent automation.

The Always-On Economy – blog post by Sequoia Capital: READ

In the next 5–7 years, AI won’t just automate tasks; it will eliminate time constraints, ushering in an “always-on” economy where sectors like healthcare, security, education, and customer service operate 24/7. Hybrid human/AI systems will improve access and efficiency and intensify global competition, with startups already leading the shift in areas like diagnostics, documentation, and support. Sequoia partner Konstantine Buhler, the post's author, argues this transition will redefine work patterns and business models, giving a massive edge to organizations that embrace continuous, AI-powered operations.

RESEARCH PAPERS 📚

“Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory” - research paper by the Mem0 AI team: READ

Mem0 is a scalable memory architecture that enables LLMs to maintain long-term conversational coherence by dynamically extracting and retrieving key information, with a graph-based variant capturing complex relational structures. It outperforms six major baselines on the LOCOMO benchmark while reducing latency and token costs by over 90%, making it both more accurate and efficient for multi-session dialogue.

“Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning” - research paper by University of California: READ

Researchers introduced MINDcraft, a platform for testing how LLM agents collaborate in the open-world game Minecraft, and MineCollab, a benchmark to evaluate embodied, multi-agent reasoning. Experiments show that current LLMs struggle with collaborative tasks, especially due to inefficiencies in natural language communication—causing up to a 15% performance drop. The study highlights that today's LLM agents are not well-optimized for embodied collaboration and require approaches beyond in-context or imitation learning.

The Leaderboard Illusion – research paper by Cohere Labs: READ

Cohere Labs reveals flaws in Chatbot Arena, showing that private testing practices and selective score disclosures by major providers like Meta, Google, and OpenAI distort leaderboard fairness. Closed models are sampled more often and retain Arena presence longer, giving them disproportionate data access—leading to overfitting on Arena-specific dynamics rather than true model quality. The report calls for reforms to promote transparency and fairness in AI benchmarking.

Welcome to the Era of Experience – research paper by David Silver & Richard Sutton: READ

Silver and Sutton propose a shift in AI toward agents that learn primarily from experience, marking the dawn of a new era of superhuman capability. Rather than relying on static data, these next-gen systems will develop intelligence through interaction and continual learning, echoing the way humans learn over time.
