The Sequence Radar #807: Last Week in AI: From Mega-Rounds to Mathematical Breakthrough
More Chinese model releases, Anthropic's monster round, and some major breakthroughs from DeepMind.
Next Week in The Sequence:
Our series on world models continues with a dive into JEPA.
AI of the week dives into DeepMind’s amazing Aletheia math research agent.
We have an awesome interview for next week.
The opinion section discusses a new idea: interpretability in post-training.
Subscribe and don’t miss out:
📝 Editorial: Last Week in AI: From Mega-Rounds to Mathematical Breakthrough
The narrative of AI progress reached several new milestones this week, continuing a trend of increased technical specialization and reasoning depth. Rather than a singular pivot, the current moment represents a steady accumulation of capabilities, supported by substantial financial backing and a suite of new model releases. These developments—ranging from Anthropic’s massive funding to the arrival of autonomous math agents—collectively highlight a transition toward agentic systems capable of navigating complex, long-horizon tasks.
The industry’s financial backbone was reinforced by Anthropic’s $30 billion Series G funding round, which valued the firm at $380 billion post-money. This massive capital injection is aimed at fueling frontier research and infrastructure expansions. A primary driver of this growth is Claude Code, which has reached a $2.5 billion annualized revenue run rate. Anthropic is positioning itself as the “enterprise-first” frontier lab, with over 500 customers now spending more than $1 million annually on its platforms.
Simultaneously, OpenAI released GPT-5.3-Codex and its ultra-fast counterpart, GPT-5.3-Codex-Spark. By partnering with Cerebras Systems and running on the Wafer Scale Engine 3 (WSE-3), the Spark model achieves inference speeds exceeding 1,000 tokens per second. Notably, GPT-5.3-Codex is the first OpenAI model to play a role in its own creation, having been used to debug its own training processes. Internally, 100% of pull requests at OpenAI are now handled by the AI before human review.
A significant challenge to Western dominance arrived on February 11 with the launch of GLM-5 by Zhipu AI. This 744B-parameter Mixture-of-Experts (MoE) model utilizes 40B active parameters and was trained entirely on Huawei Ascend chips, signaling China's growing independence from U.S. hardware. Released under a permissive MIT license, GLM-5 achieved 77.8% on SWE-bench Verified, placing it in direct competition with proprietary models like Gemini 3 Pro and approaching the performance of Claude Opus 4.5. This release marks a "generational leap" in open-weight capabilities, particularly in complex systems engineering and long-horizon planning.
Parallel to these scaling efforts, the release of MiniMax M2.5 further underscores the arrival of high-efficiency models designed for agentic workloads. This 230B-parameter MoE model—utilizing only 10B active parameters—achieved a state-of-the-art 80.2% on SWE-bench Verified. By delivering high throughput at a fraction of the cost of its competitors, M2.5 is positioned as a high-efficiency workhorse for professional tasks in finance and law.
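To see why MoE models like GLM-5 and M2.5 touch only a fraction of their parameters per token, here is a minimal top-k expert-routing sketch. The dimensions, gating function, and expert shapes below are illustrative assumptions, not the actual architectures of either model:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy MoE layer: route a token through the top-k of n experts."""
    logits = gate_w @ x                       # one gating score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only k expert matrices are ever multiplied; the rest stay idle for this token.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (16,)
```

With k=2 of 8 experts active, roughly a quarter of the expert parameters do work per token—the same economics, at vastly larger scale, that let M2.5 run with ~10B of its 230B parameters active.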
However, the most profound technical milestone arrived from Google DeepMind, which published a framework for professional-grade mathematical research. DeepMind’s new system, Aletheia, powered by an advanced version of Gemini Deep Think, utilizes a “Generator-Verifier-Reviser” agentic harness. This architecture allows the model to recognize and correct its own hallucinations through internal natural language verification.
Aletheia achieved a 95.1% accuracy on the advanced IMO-Proof Bench and has begun to bridge the gap into professional research, autonomously generating a paper on “eigenweights” in arithmetic geometry (Feng26) and solving four open questions from the Erdős Conjectures database. Despite these leaps, DeepMind’s new “Autonomous Mathematics Research Level” taxonomy clarifies that while AI can now produce “Publication Grade” research (Level 2), it has yet to achieve a landmark breakthrough (Level 4) on par with once-in-a-generation human achievements. We are entering an era where AI is an “enhancer rather than a replacement,” leveraging inference-time scaling to unlock scientific problems that have long remained out of reach.
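The Generator-Verifier-Reviser harness can be pictured as a simple loop: propose a proof, critique it in natural language, and feed the critique back until the verifier is satisfied. The sketch below is our own hypothetical illustration of that control flow—`generate`, `verify`, and `revise` are stand-ins for LLM calls, not DeepMind's actual interfaces:

```python
def generate(problem):
    # Stand-in for the generator model producing a first draft.
    return f"proof attempt for: {problem}"

def verify(proof):
    # Stand-in for natural-language verification; here we simply flag
    # any draft still containing the word "attempt" as unfinished.
    if "attempt" in proof:
        return "step 3 is unjustified", False
    return "looks sound", True

def revise(proof, critique):
    # Stand-in for the reviser incorporating the verifier's critique.
    return proof.replace("attempt", "argument") + f" [revised per: {critique}]"

def solve(problem, max_rounds=3):
    proof = generate(problem)
    for _ in range(max_rounds):
        critique, ok = verify(proof)
        if ok:
            return proof
        proof = revise(proof, critique)  # critique flows back into the next draft
    return proof

print(solve("the sum of two even numbers is even"))
```

The key design point is that verification happens in natural language rather than a formal proof checker, which lets the same loop correct hallucinations anywhere in a free-form mathematical argument.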
🔎 AI Research
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
AI Lab: Allen Institute for AI (Ai2)
Summary: The authors introduce a scalable framework to mine 351K real-world procedures from the web to improve and evaluate how-to generation in large language models. It features “How2Score,” an LLM-as-a-judge protocol that identifies critical failures in generated steps, achieving 80.5% agreement with human annotators.
CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution
AI Lab: Google Cloud AI Research
Summary: This paper proposes a selective defense framework to protect AI agents from Indirect Prompt Injection (IPI) attacks by detecting “dominance shifts” in causal attribution. By triggering sanitization only when untrusted content dominates user intent, CausalArmor maintains near-zero attack success rates while preserving the utility and low latency of the agent.
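The "dominance shift" idea can be sketched numerically: attribute the agent's proposed action to the user prompt versus the retrieved content by ablating each, and sanitize only when the untrusted span drives the action more than the user does. This is our own toy illustration of the gating logic, not the paper's implementation:

```python
def attribution(score_full, score_ablated):
    """Attribution of a span = drop in the action's score when that span is removed."""
    return score_full - score_ablated

def should_sanitize(score_full, score_without_user, score_without_untrusted):
    user_attr = attribution(score_full, score_without_user)
    untrusted_attr = attribution(score_full, score_without_untrusted)
    # Dominance shift: the untrusted content explains the action better
    # than the user's own request does.
    return untrusted_attr > user_attr

# The action keeps most of its score without the user prompt (0.9 -> 0.8)
# but collapses without the injected content (0.9 -> 0.2): sanitize.
print(should_sanitize(0.9, 0.8, 0.2))  # True
```

Because sanitization fires only on this dominance condition, benign retrievals pass through untouched, which is how the approach preserves utility and latency.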
AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions
AI Lab: SafeRL-Lab (UC Berkeley)
Summary: AgenticPay is a benchmark and simulation framework designed to evaluate how autonomous LLM agents handle multi-round, language-mediated economic negotiations between buyers and sellers. Testing reveals significant performance gaps between proprietary and open-weight models, specifically highlighting challenges in long-horizon strategic reasoning and systematic disadvantages for buyer roles.
Towards Autonomous Mathematics Research
AI Lab: Google DeepMind
Summary: Researchers introduce “Aletheia,” a math research agent powered by Gemini Deep Think that iteratively generates, verifies, and revises mathematical proofs in natural language. The agent has achieved several milestones, including the autonomous generation of a research paper and the resolution of several long-standing open problems from the Erdős Conjectures database.
Voxtral Realtime: High-Fidelity Streaming Speech Recognition with Sub-Second Latency
AI Lab: Mistral AI
Summary: The authors present Voxtral Realtime, a natively streaming automatic speech recognition model that achieves offline transcription quality with a latency of only 480ms. By using a Delayed Streams Modeling framework and a new causal audio encoder, the model matches the performance of top-tier offline systems like Whisper while remaining open-source under the Apache 2.0 license.
GAIA2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
AI Lab: Meta SuperIntelligence Labs
Summary: This paper introduces GAIA2, a benchmark designed to evaluate AI agents in realistic, asynchronous environments where conditions evolve independently of the agent’s actions. The evaluation reveals that even state-of-the-art models like GPT-5 struggle with these temporal constraints and noisy events, with the highest overall success rate reaching only 42%.
🤖 AI Tech Releases
GPT-5.3-Codex-Spark
OpenAI released a preview version of GPT-5.3-Codex-Spark, a version of GPT-5.3-Codex optimized for Cerebras hardware.
GLM-5
Z.ai released GLM-5, the new version of its marquee agentic model.
MiniMax-M2.5
MiniMax released M2.5, a new model optimized for complex agentic environments.
📡AI Radar
Cohere Revenue & IPO: Cohere has reportedly surpassed its 2025 revenue targets and is reaching a $240 million run rate as it prepares for a potential 2026 IPO.
Anthropic Series G: Anthropic officially announced a record-breaking $30 billion Series G round led by GIC and Coatue, valuing the company at $380 billion post-money.
Modal Labs Valuation: AI inference startup Modal Labs is in early discussions to raise fresh capital at a $2.5 billion valuation, more than doubling its previous mark in five months.
Meridian AI Seed: Meridian emerged from stealth with a $17 million seed round to build an IDE-based “agentic spreadsheet” workspace for deterministic financial modeling.
Vega Series B: Cybersecurity firm Vega secured $120 million in Series B funding to scale its autonomous threat detection and response platform for enterprises.
GitHub CEO New Round: Former GitHub CEO Thomas Dohmke raised a historic $60 million seed round at a $300 million valuation for a new startup focused on autonomous software engineering agents.
Runway Series D/E: AI video pioneer Runway raised $315 million at a $5.3 billion valuation to accelerate the development of cinematic “world models” for creative professionals.
Crypto.com AI Domain: Crypto.com CEO Kris Marszalek acquired the ai.com domain for $70 million to launch a consumer platform for creating personalized autonomous AI agents.
Benchmark & Cerebras: Benchmark Capital raised $225 million through special-purpose vehicles to lead a $1 billion investment into AI chipmaker Cerebras Systems at a $23 billion valuation.
Shield AI Talks: Defense technology firm Shield AI is in negotiations to raise up to $1 billion in new financing that would value the autonomous pilot developer at $12 billion.
Tencent Stock Slump: Tencent’s market value dropped by $173 billion following a stock rout as investors grew concerned over the company’s AI progress relative to domestic rivals.
Nvidia Data Center: Nvidia is moving to lease a 200-megawatt Nevada data center financed by a $3.8 billion junk-bond issuance backed by Tract Capital.
nScale Financing: GPU cloud provider nScale secured a $1.4 billion deferred draw term loan from PIMCO and Blue Owl Capital to expand its European AI infrastructure footprint.
Simile Stealth Launch: Stanford-affiliated startup Simile raised $100 million to build foundation models capable of simulating and predicting human behavior for enterprise decision-making.
Mistral Infrastructure: Mistral AI announced a €1.2 billion investment to build a major AI data center in Sweden to strengthen European technological sovereignty.