The Sequence Radar #759: Grok 4.1, Gemini 3 Pro and the Agentic Stack, Plus a Personal Note
A great week in model releases.
Next Week in The Sequence:
We continue our series on synthetic data generation with an exploration of the current types of generative synthesis. In the AI of the week section, we will dive into the amazing Olmo 3 stack, and in our opinion section we will examine the state of open-source AI.
Subscribe and don’t miss out:
📝 Editorial: Last Week in AI: Grok 4.1, Gemini 3 Pro and the Agentic Stack
Today I’d like to start with a personal note before diving into this week’s AI developments. Two years ago, I co-founded a company called NeuralFabric to pretrain, post-train, distill and fine-tune small frontier models. We were convinced that smaller, efficient models are a critical ingredient for embodied AI, mobile and IoT workloads, and many enterprise AI workflows that will never run a 400B-parameter beast in production. We built a state-of-the-art platform for training, distillation, and advanced adaptation techniques for these models—and fairly quickly, to our surprise, NeuralFabric began attracting serious attention from large enterprise players. Last week, Cisco completed the acquisition of NeuralFabric and announced its intention to use our technology as part of its next-generation enterprise AI capabilities. For me, this journey has been an incredible learning experience about the real state of the AI market.
It’s also what makes this newsletter special: much of what you read here is grounded in building and deploying AI systems in the wild, not just reciting papers or headlines.
With that context in mind, let’s turn to the broader developments of the week.
This week in AI felt less like a batch of model drops and more like a glimpse of the emerging agentic stack: brains, tools, visuals, and a working environment all snapping into place.
On the “brain” side, xAI pushed Grok 4.1, a focused upgrade that leans hard into usability rather than just leaderboard flexing. The new release is positioned as a better “everyday” model: stronger reasoning, more consistent long-form writing, and a noticeable reduction in off-the-rails hallucinations. It ships with both a more deliberate “thinking” mode and a faster interactive mode, signaling that Grok isn’t just a quirky side project anymore but something you can actually plug into production workflows.
Google answered on the flagship front with Gemini 3 Pro, designed as a general-purpose reasoning and agentic coding model. It’s natively multimodal and built for long-context work: full repos, multi-document analysis, and sessions that blend text, images, and other media. Under the hood, Gemini 3 Pro is clearly optimized around tool use and multi-step workflows rather than just chat completion. You can think of it less as “a chatbot” and more as a planning-and-execution engine that happens to speak natural language very well.
On the visual side, Google extended the ecosystem with Nano Banana, its new image generation and editing stack. The base capabilities focus on speed and editability: character consistency, local edits, fast iterations. The more advanced Pro tier pushes into higher resolution, more reliable text rendering inside images, better control over style and lighting, and tighter coupling with language models so you can drive fairly complex visual transformations with simple prompts. Images stop being one-shot samples and become an interactive design loop.
All of this comes together in Antigravity, a new “agent-first” development environment. Instead of the usual coding assistant that sprinkles suggestions into your IDE, Antigravity treats the model as a first-class actor: an AI that can manage an editor, a terminal, and a browser; plan multi-step tasks; execute code; run tests; and leave behind an auditable trail of what it did and why. It’s designed to be model-agnostic but is deeply wired into the Gemini stack from day one.
Taken together, Grok 4.1, Gemini 3 Pro, Nano Banana, Antigravity—and, in a different corner of the ecosystem, NeuralFabric’s journey into Cisco—point in the same direction: away from “smart autocomplete” and toward agentic systems that can reason over huge contexts, act across tools, and express themselves in both code and pixels.
🔎 AI Research
Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
AI Lab: Meta AI & KAUST
Summary: This paper introduces Mixture of States (MoS), a multimodal diffusion framework with a lightweight token-wise router that dynamically selects hidden states across text and vision towers based on the denoising timestep and input content. With 3–5B parameter models, MoS-Image and MoS-Edit achieve state-of-the-art text-to-image and image-editing performance, matching or surpassing models up to four times larger while remaining highly efficient.
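To make the routing idea concrete, here is a minimal sketch of a token-wise gate that mixes per-token hidden states from a text tower and a vision tower, conditioned on a denoising-timestep embedding. The module, dimensions, and two-way softmax gate are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TokenWiseRouter(nn.Module):
    """Illustrative token-wise router: per token, produce mixing weights over
    the text-tower and vision-tower hidden states, conditioned on the token
    itself and a denoising-timestep embedding (not the MoS implementation)."""
    def __init__(self, dim: int, t_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * dim + t_dim, dim),
            nn.SiLU(),
            nn.Linear(dim, 2),  # one logit per tower
        )

    def forward(self, h_text, h_vision, t_emb):
        # h_text, h_vision: (batch, tokens, dim); t_emb: (batch, t_dim)
        t = t_emb[:, None, :].expand(-1, h_text.size(1), -1)
        w = self.gate(torch.cat([h_text, h_vision, t], dim=-1)).softmax(-1)
        return w[..., 0:1] * h_text + w[..., 1:2] * h_vision

# Hypothetical shapes, just to show the call pattern.
router = TokenWiseRouter(dim=1024, t_dim=256)
mixed = router(torch.randn(2, 77, 1024), torch.randn(2, 77, 1024), torch.randn(2, 256))
```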
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
AI Lab: Moonshot AI & Tsinghua University
Summary: Seer is a synchronous RL system for LLMs that exploits intra-group similarities in GRPO-style rollouts, introducing divided rollout with a global KV cache, context-aware scheduling, and adaptive grouped speculative decoding to reduce long-tail latency and improve hardware utilization. On large production RL workloads (Moonlight, Qwen2-VL-72B, Kimi-K2), it boosts rollout throughput by 74–97% and cuts long-tail latency by 75–93% compared to a strong synchronous baseline.
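The core observation is easy to illustrate: GRPO generates a whole group of completions per prompt, so the shared prompt only needs to be prefilled once. Below is a toy sketch of that grouping, with hypothetical prefill_fn and decode_fn stand-ins; Seer's actual divided rollout, context-aware scheduling, and speculative decoding are far more involved.

```python
from collections import defaultdict

def rollout_grouped(requests, prefill_fn, decode_fn):
    """Toy illustration: group rollout requests by prompt, prefill each unique
    prompt once, and fan the decodes out from the shared KV cache."""
    by_prompt = defaultdict(list)
    for req in requests:
        by_prompt[req["prompt"]].append(req)

    outputs = {}
    for prompt, group in by_prompt.items():
        kv_cache = prefill_fn(prompt)  # computed once per group, not per request
        for req in group:
            outputs[req["id"]] = decode_fn(kv_cache, req)
    return outputs
```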
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
AI Lab: ARC Lab, Tencent
Summary: ARC-Chapter is a multimodal video chaptering model trained on VidAtlas, a new large-scale dataset of over 400,000 hours of videos with hierarchical annotations, combining ASR transcripts, visual captions, and OCR to produce timestamped titles, structured chapter summaries, and dense video descriptions. The authors also propose the GRACE metric and show that ARC-Chapter sets new state-of-the-art results on VidChapters-7M and transfers strongly to dense video captioning benchmarks like YouCook2 and ActivityNet Captions.
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
AI Lab: Meta SuperIntelligence Labs & FAIR at Meta
Summary: This work proposes Soup Of Category Experts (SoCE), a model-souping method that selects “expert” checkpoints per weakly correlated benchmark category and combines them with optimized non-uniform weights instead of simple uniform averaging. SoCE yields consistent gains across tool-calling, multilingual math, and long-context benchmarks, including state-of-the-art results on the Berkeley Function Calling Leaderboard and improved robustness and correlation across evaluation categories.
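The "simple arithmetic" in the title is essentially a weighted average of model parameters. Here is a minimal sketch of the souping step under that reading; the per-category expert selection and weight optimization that SoCE performs beforehand are not shown, and the function is an illustration rather than the paper's code.

```python
import torch

def weighted_model_soup(state_dicts, weights):
    """Combine expert checkpoints by a weighted average of their parameters.
    `weights` are the non-uniform coefficients a SoCE-style method would
    optimize; they should sum to 1 and all checkpoints must share the same
    architecture and parameter names."""
    assert len(state_dicts) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-6
    souped = {}
    for name in state_dicts[0]:
        souped[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return souped

# Hypothetical usage: favor the tool-calling expert, keep some weight on the rest.
# souped = weighted_model_soup([sd_tools, sd_math, sd_longctx], [0.5, 0.3, 0.2])
```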
Real-time speech-to-speech translation
AI Lab: Google DeepMind & Google Core ML
Summary: This work presents an end-to-end streaming speech-to-speech translation model that can translate in real time in the speaker’s own voice with about a two-second delay, using a transformer-based audio-to-audio architecture with RVQ audio tokens and the AudioLM/SpectroStream stack. A scalable time-synchronized data pipeline plus low-bit quantization and inference optimizations enable robust performance across multiple language pairs, powering features in Google Meet and on-device Pixel Voice Translate.
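Residual vector quantization (RVQ) is the piece that turns audio frames into discrete tokens: each codebook quantizes whatever residual the previous stage left behind. A generic encoding sketch follows; the codebook sizes and frame dimensions are made up, and this is not Google's implementation.

```python
import torch

def rvq_encode(frames, codebooks):
    """Generic residual vector quantization: each stage snaps the current
    residual to its nearest codebook entry, then passes what is left on."""
    residual = frames                      # (num_frames, dim)
    codes = []
    for cb in codebooks:                   # each cb: (codebook_size, dim)
        dists = torch.cdist(residual, cb)  # (num_frames, codebook_size)
        idx = dists.argmin(dim=-1)
        codes.append(idx)                  # discrete tokens for this stage
        residual = residual - cb[idx]      # quantization error goes to next stage
    return torch.stack(codes, dim=-1)      # (num_frames, num_stages)

# Example with made-up sizes: 4 stages of 1024 codes over 64-dim frames.
codebooks = [torch.randn(1024, 64) for _ in range(4)]
tokens = rvq_encode(torch.randn(250, 64), codebooks)
```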
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
AI Lab: NVIDIA Research
Summary: This paper introduces Nemotron Elastic, an elastic training framework for hybrid Mamba–Attention LLMs that embeds multiple nested submodels (6B, 9B, 12B) inside a single parent reasoning model via an end-to-end learned router and structured masking. Using only 110B tokens, it simultaneously produces competitive 6B and 9B variants from a 12B teacher, achieving up to 360× training cost reduction versus training separate model families and enabling constant deployment memory through zero-shot slicing of all submodels from one checkpoint.
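The "nested" structure is what makes zero-shot slicing possible: each smaller variant keeps the leading channels of the parent, so it is literally a sub-tensor of the 12B checkpoint. The toy snippet below illustrates slicing a single weight matrix; the real framework decides what to keep via a learned router and structured masking.

```python
import torch

def slice_nested_linear(weight: torch.Tensor, keep_in: int, keep_out: int) -> torch.Tensor:
    """Toy zero-shot width slicing: keep the leading `keep_out` output channels
    and `keep_in` input channels of a Linear weight. Because every nested
    submodel keeps leading channels, smaller variants are sub-tensors of the
    parent checkpoint and add no extra deployment memory."""
    return weight[:keep_out, :keep_in].clone()

# Example: slice a hypothetical parent projection of shape (8192, 4096) to ~75% width.
parent_w = torch.randn(8192, 4096)
child_w = slice_nested_linear(parent_w, keep_in=3072, keep_out=6144)
print(child_w.shape)  # torch.Size([6144, 3072])
```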
🤖 AI Tech Releases
Gemini 3
Google launched Gemini 3, the latest version of its marquee model with capabilities optimized for reasoning and agentic workflows.
Grok 4.1
xAI released Grok 4.1 with impressive results on top benchmarks.
Google Antigravity
Google released Antigravity, a new IDE for agentic software development.
Nano Banana Pro
Google also released Nano Banana Pro, its next-generation image generation model.
SAM 3
Meta released Segment Anything Model 3 (SAM 3), its object segmentation and tracking model, along with the SAM 3 playground.
Olmo 3
The Allen Institute for AI (AI2) released Olmo 3, a completely open-source family of models, together with their datasets and training stack.
📡 AI Radar
NestAI, a Finnish “physical AI” startup, raised €100M and entered a strategic partnership with Nokia to build AI systems for defense and other real-world infrastructure applications.
Nvidia reported record quarterly revenue of $57.0B, driven by $51.2B in data center sales, and guided next quarter to $65B, reinforcing its view that AI demand is nowhere near peaking.
Function Health raised $298M at a $2.5B valuation and launched its Medical Intelligence Lab, which uses AI over longitudinal labs, imaging, wearables and records to surface early health risks and personalized interventions.
Adobe agreed to acquire AI-powered SEO and marketing analytics platform Semrush for $1.9B in cash to deepen its data and measurement capabilities across the Adobe Experience Cloud.
Intuit signed a multi-year $100M+ deal with OpenAI to embed frontier models into its GenOS platform and bring products like TurboTax, QuickBooks, Credit Karma and Mailchimp directly into ChatGPT as AI agents.
Quora’s Poe app introduced group chat, letting up to 200 people collaborate in a single room while jointly invoking more than 200 different text, image, audio and video models and custom bots.
Databricks is reportedly in talks to raise $3–5B at a valuation north of $130B, underscoring how investor appetite for AI data and model infrastructure remains strong despite bubble concerns.
Peec AI raised a $21M Series A to help brands understand and optimize how they appear in AI search answers across systems like ChatGPT, tracking visibility, sentiment and drivers of reach.
Jeff Bezos has taken on a co-CEO role at Project Prometheus, a $6.2B-funded AI startup building models for the “physical economy” in sectors like aerospace and manufacturing, marking his first major operating job since leaving Amazon.
Luminal raised $5.3M in seed funding to build a new GPU programming framework aimed at simplifying and optimizing how developers write high-performance AI code on modern accelerators.
Runlayer, a security startup built around the Model Context Protocol, emerged from stealth with an $11M seed to provide observability and policy controls for AI agents and MCP-based apps across multiple “unicorn” customers.
Sakana AI announced a ¥20B (~$135M) Series B round at a ~$2.65B valuation to scale its Japan-focused, sovereign AI models and expand from finance into defense, manufacturing and government use cases.
Japan’s state-backed chip foundry Rapidus outlined plans to IPO around fiscal 2031 as it ramps 2nm manufacturing, framing the listing as part of a broader national push for domestic advanced semiconductor capacity.
Meta is moving into wholesale power trading, aiming to directly contract and finance new U.S. power plants to meet the enormous electricity demand of its AI data centers.
OpenAI and Foxconn announced a partnership to co-design and manufacture U.S.-based data-center hardware—racks, power and networking gear—intended to shore up the AI supply chain and domestic infrastructure build-out.
Physical Intelligence, a startup building AI software that teaches robots diverse real-world skills, raised about $600M at a $5.6B valuation in one of the largest recent funding rounds in robotics software (Axios coverage).
Citadel Securities launched AI-driven bond trading baskets tied to long-dated debt from Big Tech names like Microsoft, Amazon, Alphabet and Meta, giving investors a new way to hedge or express macro views on the AI trade.
The founders of Tome are pivoting away from their viral AI presentation app (20M+ users) to build Lightfield, an AI-native CRM that automates data capture and aims to challenge incumbents like Salesforce and HubSpot.

