The Sequence Radar #857: Last Week in AI: Inside the Machine, Outside the Text Box
Some groundbreaking research from Anthropic, OpenAI’s new voice models and major valuation shifts in Chinese AI labs.
Next Week in The Sequence:
We continue our series about alternatives to transformers.
In the AI of the week, we dive into Anthropic’s groundbreaking paper about natural language autoencoders.
Our opinion section explores an interesting idea: every company’s last exam.
Subscribe and don’t miss out:
📝 Editorial: Inside the Machine, Outside the Text Box
This week in AI had the strange texture of a market that is simultaneously becoming more scientific, more productized, and more speculative. The headlines looked disconnected at first: Anthropic published a fascinating interpretability paper, OpenAI released new voice models, SubQ made a controversial 12 million-token context claim, DeepSeek and Moonshot attracted enormous valuation attention, and Sierra raised at a level that would have sounded absurd for an AI customer service company only a few years ago. But underneath all of it is the same story: AI is moving from a model race into an infrastructure race.
Anthropic’s Natural Language Autoencoders paper was the most intellectually interesting development of the week. The idea is almost poetic: take the hidden activations inside a neural network and compress them into natural language, then try to reconstruct those activations from the explanation itself. In other words, language becomes a microscope for the model’s internal state. This is not a magical solution to interpretability. These explanations can be incomplete, noisy, or even misleading. But the conceptual shift matters. We are no longer just probing models with classifiers and activation maps; we are trying to build linguistic interfaces into the latent space. The model begins to explain itself in the medium humans understand best.
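The core mechanic can be illustrated with a toy autoencoder whose bottleneck is words rather than a dense vector: quantize each activation to the nearest level in a small vocabulary, emit the corresponding word, then reconstruct the activations from the words alone. This is a minimal conceptual sketch; the vocabulary, the `encode_to_text`/`decode_from_text` names, and the quantization scheme are all invented here and are not Anthropic’s actual method.

```python
import numpy as np

# Toy vocabulary mapping quantized activation levels to words.
# Everything here is illustrative, not Anthropic's actual technique.
VOCAB = ["silent", "faint", "moderate", "strong", "saturated"]
LEVELS = np.linspace(0.0, 1.0, len(VOCAB))

def encode_to_text(activations):
    """Compress an activation vector into words (the language bottleneck)."""
    idx = np.abs(activations[:, None] - LEVELS[None, :]).argmin(axis=1)
    return [VOCAB[i] for i in idx]

def decode_from_text(words):
    """Reconstruct approximate activations from the explanation alone."""
    return np.array([LEVELS[VOCAB.index(w)] for w in words])

acts = np.array([0.05, 0.52, 0.97, 0.31])
explanation = encode_to_text(acts)          # the "explanation" of the state
recon = decode_from_text(explanation)       # state recovered from words only
error = float(np.mean((acts - recon) ** 2)) # how lossy the language is
```

The reconstruction error is the interesting quantity: the better the words reconstruct the activations, the more faithfully language captures the internal state, which is exactly the property the paper is after.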
On the opposite side of the stack, OpenAI’s new voice model release pushed AI further toward becoming a native interface rather than a text box with better UX. Voice has always looked deceptively simple from the outside, but real-time speech agents require a brutal combination of perception, reasoning, latency management, interruption handling, emotional calibration, tool use, and memory. When this works, software changes shape. We stop “using an app” and start interacting with an operator. The difference is subtle but profound. Text-based AI feels like querying intelligence. Voice-based AI feels like being accompanied by it.
Then came SubQ’s controversial 12 million-token context announcement, the most provocative technical claim of the week. Long context has become one of the industry’s favorite flexes, but a native 12M-token window would represent something more than incremental progress. It would challenge the current architecture of retrieval-augmented generation, memory systems, chunking strategies, and agent orchestration. If models can directly absorb corpora at that scale, some of the scaffolding around AI applications starts to look temporary. Of course, claims like this demand skepticism. A massive context window is not the same as reliable reasoning over that context. But even the ambition is revealing: memory is becoming a frontier primitive.
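To see what “temporary scaffolding” means concretely, here is a minimal sketch of the chunk-and-retrieve pattern that a native 12M-token window would make optional. The function names and the word-overlap heuristic are illustrative, not any particular framework’s API.

```python
def chunk(text, size=40):
    """Split a document into fixed-size character chunks --
    the scaffolding a huge context window could make obsolete."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query, chunks, k=1):
    """Rank chunks by naive word overlap with the query and keep top-k,
    because the model cannot see the whole corpus at once."""
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))
    return scored[:k]
```

A model that absorbs the corpus directly skips both steps, along with the failure modes they introduce (bad chunk boundaries, retrieval misses). Whether reasoning over 12M tokens is actually reliable is, as noted above, a separate question.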
The valuation news told the geopolitical and commercial version of the same story. DeepSeek and Moonshot are now being discussed at valuations that make them look less like startups and more like national AI infrastructure. Frontier model labs are increasingly priced as strategic assets: part software company, part cloud platform, part semiconductor leverage, part geopolitical option. The market is not merely valuing revenue; it is valuing position in the future computational order.
Sierra’s new valuation adds the enterprise counterpoint. While model labs chase frontier intelligence, Sierra is showing that applied agents can become enormous businesses by embedding directly into customer operations. The first trillion-dollar AI workflows may not look like science fiction. They may look like call centers, insurance claims, banking support, retail service, and enterprise processes slowly being rewritten around agents.
So the week’s lesson is clear: AI is becoming more inspectable, more conversational, more memory-rich, and more institutionally valuable. The race is no longer just about building smarter models. It is about building the interfaces, memory systems, deployment layers, and companies that turn intelligence into infrastructure.
🔎 AI Research
Natural Language Autoencoders: Turning Claude’s thoughts into text
AI Lab: Anthropic
Summary: This research introduces Natural Language Autoencoders (NLAs), a technique that translates complex language model activations into readable text to reveal a model’s internal, unverbalized reasoning. By applying NLAs during safety testing and model auditing, researchers can successfully detect when models secretly know they are being evaluated and uncover hidden misaligned motivations.
SkillOS: Learning Skill Curation for Self-Evolving Agents
AI Lab: UIUC, Google, and other institutions
Summary: This paper introduces SkillOS, an experience-driven reinforcement learning framework that enables self-evolving LLM agents to learn complex, long-term skill curation policies. By pairing a frozen agent executor with a trainable skill curator that updates and refines an external skill repository, SkillOS allows agents to effectively learn from sparse, delayed feedback, leading to more targeted skill usage and improved performance across diverse reasoning and multi-turn agentic tasks.
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
AI Lab: The Hong Kong University of Science and Technology, Alibaba Group, University of California San Diego, The Chinese University of Hong Kong
Summary: This paper proposes D-OPSD, an on-policy learning paradigm for fine-tuning step-distilled diffusion models that leverages the inherited in-context capabilities of their LLM/VLM encoders. By assigning the model dual roles as both teacher and student with varying multimodal contexts, D-OPSD enables the learning of new concepts and styles without compromising the model’s original efficient few-step generation capabilities.
Agentic AI Systems Should Be Designed as Marginal Token Allocators
AI Lab: University of Illinois Urbana-Champaign
Summary: This position paper argues that agentic AI systems should be structured as economies that allocate marginal tokens based on a combination of quality, cost, latency, and risk rather than functioning merely as text generators priced by the unit. Adopting this marginal token allocation perspective helps explain and resolve recurring system failures—such as over-routing, over-delegation, and cache misuse—that arise when different layers of the AI stack are optimized in isolation.
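The economic framing can be sketched as a greedy allocator: at each step, the next block of tokens goes to the agent with the best marginal value, where value discounts quality by cost and risk and shrinks as an agent consumes more budget. The numbers, field names, and diminishing-returns rule below are toy assumptions, not the paper’s formalism.

```python
def allocate_tokens(budget, agents, step=100):
    """Greedy marginal token allocation across agents.
    `agents` maps name -> {"quality", "cost", "risk"} (toy scores)."""
    alloc = {name: 0 for name in agents}
    for _ in range(budget // step):
        def marginal(n):
            a = agents[n]
            # Diminishing returns: the next block is worth less
            # the more budget this agent has already consumed.
            return a["quality"] / ((1 + alloc[n] / step)
                                   * a["cost"] * (1 + a["risk"]))
        alloc[max(agents, key=marginal)] += step
    return alloc
```

The greedy rule is what prevents the over-routing and over-delegation failures the paper describes: a cheap agent wins early blocks, but its diminishing marginal value eventually lets a higher-quality, costlier agent take a share.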
Counting as a Minimal Probe of Language Model Reliability
AI Lab: Stanford University
Summary: The authors introduce Stable Counting Capacity, a purely mechanical assay that tests a language model’s procedural reliability by having it count repeated symbols until failure, effectively removing semantic and knowledge-based confounds. Through extensive evaluation, the study reveals that current language models rely on finite, count-like internal states rather than open-ended logic, causing their procedural rule-following to collapse into guessing when these limited resources are exhausted.
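The probe itself is mechanically simple, which is the point. A sketch of the idea, with a stand-in “model” that counts correctly only up to a finite capacity: the harness, the `stable_counting_capacity` name, and the exact protocol are guesses at the spirit of the paper, not the authors’ setup.

```python
def stable_counting_capacity(model, symbol="x", max_n=500):
    """Largest n for which `model` correctly counts n repeated symbols --
    a purely mechanical reliability probe with no semantic confounds."""
    for n in range(1, max_n + 1):
        if model(symbol * n) != n:
            return n - 1
    return max_n

def toy_model(prompt, capacity=12):
    """Stand-in model with a finite count-like internal state: correct
    up to `capacity`, then its answer collapses (mimicking the finding)."""
    n = len(prompt)
    return n if n <= capacity else capacity  # wrong beyond capacity
```

Because the input is just a string of repeated symbols, any failure has to come from the counting mechanism itself rather than from missing knowledge, which is what lets the assay isolate procedural reliability.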
Hallucinations Undermine Trust; Metacognition is a Way Forward
AI Lab: Google Research, Tel Aviv University
Summary: This paper reframes AI hallucinations as confident errors and argues that the inability of models to perfectly distinguish truths from errors creates an unavoidable tradeoff between utility and strict factuality. To overcome this stalemate, the authors propose developing metacognitive models capable of “faithful uncertainty,” which involves aligning a model’s linguistic uncertainty with its intrinsic uncertainty to preserve useful information while accurately communicating doubt to users.
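“Faithful uncertainty” amounts to a calibration contract between a model’s intrinsic probability and the hedging language it emits. A toy illustration of that mapping, with thresholds and phrasings invented here rather than taken from the paper:

```python
def verbalize_uncertainty(p):
    """Map an intrinsic probability to matching hedging language,
    so the stated confidence tracks the model's actual confidence.
    (Thresholds and phrases are illustrative assumptions.)"""
    if p >= 0.9:
        return "Almost certainly"
    if p >= 0.6:
        return "Probably"
    if p >= 0.4:
        return "Possibly"
    return "I am not sure, but perhaps"
```

The point of the contract is that a user hearing “probably” should be able to trust that the model’s internal confidence really sits in the corresponding range, preserving useful information instead of refusing to answer.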
🤖 AI Tech Releases
GPT-Realtime
OpenAI unveiled three new audio models for building voice apps.
Gemma MTP
Google released Gemma Multi-Token Prediction (MTP), a new speculative decoding architecture that can predict multiple tokens at the same time.
📡 10 AI News You Need to Know About
DeepSeek targeting $45B valuation in first-ever funding round — DeepSeek is in talks for its first external venture round at a valuation that has reportedly jumped from $20B to $45B in weeks, led by China’s state-backed China Integrated Circuit Industry Investment Fund (the “Big Fund”) with Tencent and Alibaba reportedly in talks to participate, as founder Liang Wenfeng (who owns ~90% of the company) opens up the cap table primarily to issue employee equity and stem researcher poaching.
SpaceX ‘Terafab’ chip factory — SpaceX is considering spending an initial $55 billion (and up to $119 billion total) to build a multi-phase, vertically integrated semiconductor and advanced computing fab in Grimes County, Texas, with Tesla and Intel involved, to supply chips for AI servers, satellites, space data centers, and autonomous Tesla vehicles/robots.
Ethos $22.75M Series A — London-based Ethos raised a $22.75M Series A led by a16z (with General Catalyst, XTX Markets, Evantic, and Common Magic) to scale its voice-agent-powered expert network, which onboards roughly 35,000 experts per week and serves hedge funds, PE firms, AI labs, and consultancies.
QuTwo $380M valuation — Helsinki-based QuTwo, founded by ex-Silo AI CEO Peter Sarlin, raised a €25M (~$29M) angel round at a €325M (~$380M) valuation from a group of unicorn founders and Midas-listed investors to scale QuTwo OS, an orchestration layer for classical, hybrid, and quantum-inspired enterprise AI workloads.
SAP acquires Prior Labs / blocks rival agents — SAP announced plans to acquire Freiburg-based tabular foundation model startup Prior Labs (an “almost all-cash” deal) and invest €1B (~$1.16B) over four years to turn it into a European frontier AI lab for structured enterprise data, while simultaneously updating its API policy to block all third-party AI agents (e.g. OpenClaw) except SAP-endorsed ones like its own Joule and Nvidia’s NemoClaw.
CopilotKit $27M Series A — Seattle-based CopilotKit raised $27M (Series A + previously unannounced seed) led by Glilot Capital, NFX, and SignalFire to scale its open-source AG-UI protocol and launch CopilotKit Enterprise Intelligence, a self-hostable layer for embedding generative-UI AI agents inside enterprise apps used by customers like Cisco, Docusign, and Deutsche Telekom.
Sierra $950M raise — Bret Taylor’s Sierra raised $950M led by Tiger Global and GV at a post-money valuation north of $15B to expand its enterprise customer-experience AI agent platform, which the company says now serves more than 40% of the Fortune 50 and recently hit $150M in ARR.
Moonshot AI / Kimi $20B valuation — Beijing-based Moonshot AI is closing roughly $2B in new funding led by Meituan’s Long-Z (Dragon Ball) venture arm, with China Mobile and CITIC PE participating, at a post-money valuation above $20B, after Kimi’s annualized recurring revenue passed $200M in April.
Snap–Perplexity $400M deal terminated — Snap disclosed in its Q1 2026 investor letter that its $400M cash-and-equity partnership with Perplexity (announced last November to integrate Perplexity’s AI search into Snapchat’s Chat interface) “amicably ended” in Q1 after the two sides couldn’t agree on a path to broader rollout, with Snap’s 2026 sales guidance now assuming zero contribution from the deal.
Subquadratic / SubQ launch — Miami-based startup Subquadratic emerged from stealth on May 5 with $29M in seed funding (at a reported $500M valuation) led by Justin Mateen, Javier Villamizar, and others, claiming its first model SubQ 1M-Preview is the first LLM built on a fully subquadratic attention architecture (SSA) — with a 12M-token context window and a claimed ~1,000x reduction in attention compute versus frontier models.

