The Sequence Radar #739: Last Week in AI: From Vibes to Verbs: Agent Skills, Haiku 4.5, Veo 3.1, and nanochat
Lots of fun developments for practical AI applications.
Next Week in The Sequence:
A few fun things: we will continue our series about AI interpretability, release a long piece about fine-tuning vs. reinforcement learning that you cannot miss, and dive into Anthropic’s new Agent Skills.
Subscribe Now to Not Miss Anything:
📝 Editorial: Last Week in AI: From Vibes to Verbs: Agent Skills, Haiku 4.5, Veo 3.1, and nanochat
This week in AI was just a lot of fun: the frontier is racing, but the tooling is finally congealing into something you can depend on. Fewer magic tricks, more scaffolding. You can feel the distance compress between an idea, a script, and a shipped product.
Anthropic’s Agent Skills shift agents from “one giant brain” to a set of precisely scoped capabilities you can load on demand. Instead of a universal assistant improvising across everything, Claude can snap into a well-defined mode—say, Excel analyst, RFP writer, or procurement agent—each packaged with instructions, tools, and resources. That sounds mundane, but real enterprises run on checklists, templates, and compliance. By turning those artifacts into first-class skills, you get repeatability, auditability, and fewer accidental side quests. In practice this looks like clean interfaces: a skill declares what it can do, which APIs it can call, and how outputs are formatted. This also reduces context bloat: you don’t stuff the model with the whole company; you mount the one binder that matters and detach it when you’re done.
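To make the packaging concrete, here is a minimal sketch of what a scoped capability might carry. The Skill dataclass and its fields are a hypothetical illustration of the idea, not Anthropic’s actual skill format.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    # Hypothetical manifest: the shape of information a scoped capability needs,
    # not Anthropic's actual Agent Skills format.
    name: str
    instructions: str                                        # the "binder": playbook, templates, style rules
    allowed_tools: list[str] = field(default_factory=list)   # which APIs the skill may call
    output_format: str = "markdown"                          # how outputs must be structured

excel_analyst = Skill(
    name="excel-analyst",
    instructions="Follow the finance team's reconciliation checklist step by step.",
    allowed_tools=["read_spreadsheet", "write_spreadsheet"],
    output_format="xlsx",
)

# Mount only the binder that matters for this task, then detach it when done.
active_skills = [excel_analyst]
```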
Alongside that procedural upgrade, Claude Haiku 4.5 leans into the small-but-capable regime. The appeal is not just latency or price—it’s the idea that most work doesn’t need Olympian IQ; it needs a fast, reliable contributor who shows up instantly and follows the playbook. Haiku 4.5 claims near-Sonnet coding quality at a fraction of the cost with materially lower time-to-first-token. When you pair Haiku with Agent Skills, you start designing systems around time-to-useful: a lightweight model spins up, mounts two or three skills (style guide, spreadsheet ops, vendor database), executes with crisp boundaries, then gets out of the way. This is how you scale to thousands of concurrent, low-variance tasks without melting your budget.
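For a feel of the time-to-useful loop, a minimal sketch with the Anthropic Python SDK might look like the following. The model identifier and the system prompt are assumptions, so check the current model list before running it.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# "claude-haiku-4-5" is an assumed model ID; confirm against Anthropic's model list.
response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    system="You are the vendor-database skill. Answer only from the mounted binder.",
    messages=[{"role": "user", "content": "Which vendors are approved for laptops?"}],
)
print(response.content[0].text)
```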
On the creative side, Google DeepMind’s Veo 3.1 nudges video generation from “cool clips” toward directable sequences. The headline is control. You can specify characters, locations, objects, and transitions, and iterate toward the kind of continuity normally earned in an editor. Audio gets cleaner, motion is more stable, and the model is less surprised by your intentions. The important mental shift is to treat video synthesis as a programmable pipeline, not prompt roulette. The more granular the handles—shot duration, camera intent, scene constraints—the more you can unit-test narrative structure the same way you test code paths. For teams building ads, explainers, or product demos, this moves generative video from whimsical novelty into an iterative craft.
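To see what a programmable pipeline could look like in practice, here is a hypothetical shot spec. None of these names come from the Veo API; they just illustrate the kind of handles worth exposing.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    # Hypothetical shot spec (not the Veo API): every creative decision becomes
    # an explicit, testable parameter.
    duration_s: float
    camera: str          # e.g. "slow dolly-in"
    subject: str
    location: str
    transition: str = "cut"

storyboard = [
    Shot(4.0, "wide establishing", "delivery robot", "rainy street"),
    Shot(3.0, "slow dolly-in", "delivery robot", "apartment lobby", "match cut"),
]

# A continuity check you can run before spending a single render credit.
assert all(shot.subject == storyboard[0].subject for shot in storyboard)
```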
Finally, Andrej Karpathy’s “nanochat” is this week’s best educational artifact. It’s an end-to-end ChatGPT-style system distilled to the essentials: tokenizer, pretraining, SFT, RL, eval, inference, and a minimal web UI. The superpower here is line-of-sight: every stage is short enough to read and cheap enough to run, so the path from blank GPU to functional chat agent is hours, not weeks. That lowers the barrier for students and teams alike: clone, run, modify, measure. Want to experiment with a custom reward model? Swap a few lines. Curious about inference quirks? Tweak the sampler and observe. In a field that often hides complexity behind opaque stacks, nanochat is a public service—an opinionated baseline you can reason about and extend.
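As a flavor of “tweak the sampler and observe,” here is a generic temperature-plus-top-k sampler in PyTorch. It is illustrative only, not nanochat’s actual code.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 50) -> int:
    # Generic temperature + top-k sampling over a 1-D logits vector.
    logits = logits / max(temperature, 1e-6)
    values, indices = torch.topk(logits, k=min(top_k, logits.size(-1)))
    probs = torch.softmax(values, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return indices[choice].item()

# Example: sample from a dummy vocabulary of 10 tokens.
print(sample_next_token(torch.randn(10)))
```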
If there’s a theme, it’s specialization with handles: scoped agency that loads the right binder, compact models that cut latency, video systems that expose practical levers, and a reference stack you can actually read. Less spectacle, more engineering. That’s progress.
🔎 AI Research
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
AI Lab: Apple, Johns Hopkins University
Summary: DeepMMSearch-R1 is a multimodal LLM that performs on-demand, multi-turn web searches with dynamic query generation and cropped image-based search, trained via SFT followed by online RL. It also introduces the DeepMMSearchVQA dataset and a three-tool pipeline (text search, grounding/cropping, image search) to enable self-reflection and state-of-the-art results on knowledge-intensive VQA.
Robot Learning: A Tutorial
AI Lab: University of Oxford & Hugging Face
Summary: This tutorial surveys the shift from classical, model-based control to data-driven robot learning, and walks through RL, behavioral cloning, and emerging generalist, language-conditioned robot policies. It also presents the open-source lerobot stack and LeRobotDataset with practical, ready-to-run examples across the robotics pipeline.
Tensor Logic: The Language of AI
AI Lab: Pedro Domingos, University of Washington
Summary: The paper proposes “tensor logic,” a programming model that unifies neural and symbolic AI by expressing rules as tensor equations (einsum) equivalent to Datalog operations, enabling learning and inference in a single framework. It demonstrates how to implement neural nets, symbolic reasoning, kernel machines, and graphical models, and discusses scaling via Tucker decompositions and GPU-centric execution.
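A toy example of the core correspondence, assuming nothing beyond NumPy: a Datalog rule with a shared variable becomes an einsum over that index, followed by a threshold back to Boolean.

```python
import numpy as np

# Relation parent(x, y) as a Boolean adjacency matrix over 4 entities.
parent = np.zeros((4, 4))
parent[0, 1] = 1.0  # 0 is a parent of 1
parent[1, 2] = 1.0
parent[2, 3] = 1.0

# Datalog: grandparent(x, z) :- parent(x, y), parent(y, z).
# Tensor equation: sum out the shared variable y, then threshold.
grandparent = (np.einsum("xy,yz->xz", parent, parent) > 0).astype(float)
print(grandparent)  # nonzero at (0, 2) and (1, 3)
```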
Agent Learning via Early Experience
AI Lab: Meta Superintelligence Labs, FAIR at Meta, The Ohio State University
Summary: The authors introduce “early experience,” a reward-free training paradigm where agents use the consequences of their own exploratory actions as supervision, with two concrete strategies—implicit world modeling and self-reflection. Across eight environments, these methods improve effectiveness and OOD generalization and provide strong initializations for downstream RL.
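A schematic sketch of the implicit-world-modeling half of the idea, with a toy environment standing in for the paper’s benchmarks; the class and helper names here are made up for illustration, not taken from the paper’s code.

```python
import random

class ToyEnv:
    # Tiny stand-in environment, purely for illustration.
    def reset(self): return 0
    def sample_action(self): return random.choice([-1, +1])
    def transition(self, state, action): return state + action

def collect_early_experience(env, policy, num_steps=5, num_alternatives=2):
    # Reward-free supervision: the agent probes its own alternative actions and
    # records the observed consequences as (state, action, next_state) triples.
    data, state = [], env.reset()
    for _ in range(num_steps):
        actions = [policy(state)] + [env.sample_action() for _ in range(num_alternatives)]
        for a in actions:
            data.append((state, a, env.transition(state, a)))
        state = env.transition(state, actions[0])
    return data  # train the agent to predict next_state from (state, action)

triples = collect_early_experience(ToyEnv(), policy=lambda s: +1)
print(len(triples), triples[:3])
```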
Qwen3Guard Technical Report
AI Lab: Qwen
Summary: Qwen3Guard introduces multilingual safety guardrail models in two variants—Generative (instruction-following tri-class judgments: safe/controversial/unsafe) and Stream (token-level, real-time moderation for streaming)—released in 0.6B/4B/8B sizes with support for 119 languages. It reports state-of-the-art prompt/response safety classification across English, Chinese, and multilingual benchmarks and is released under Apache 2.0.
🤖 AI Tech Releases
Claude Haiku 4.5
Anthropic released Claude Haiku 4.5, its latest small model, which delivers performance comparable to Sonnet 4.
Veo 3.1
Google DeepMind released Veo 3.1, the new version of its marquee video generation model.
Qwen3-VL
Alibaba Qwen released Qwen3-VL 4B and 8B, two compact vision-language models optimized for reasoning and instruction following.
nanochat
Andrej Karpathy released nanochat, an open-source, end-to-end training and inference pipeline for building a ChatGPT-style model.
Agent Skills
Anthropic released Agent Skills, a way to specialize Claude for specific tasks with scripts, resources, and instructions.
📡AI Radar
Anthropic and Salesforce deepen ties: Claude becomes a preferred model in Agentforce, with Salesforce also rolling out Claude Code internally for faster engineering in regulated industries.
A BlackRock-led consortium will acquire Aligned Data Centers for about $40B to turbo-charge the AI infrastructure build-out.
Huawei-linked firms showcased chip tools and software in a show of domestic semiconductor momentum despite tightening U.S. curbs.
KAYAK adds “AI Mode,” letting travelers ask questions and book flights, hotels, and cars through a conversational interface on the web.
General Intuition raises $133.7M seed to train agents’ spatial reasoning from billions of gameplay clips via Medal.
Jack & Jill secures $20M to pair conversational agents for candidates and employers, expanding its AI recruiter across markets.
Liberate lands $50M at a $300M valuation to automate P&C insurance workflows—sales, service, and claims—with voice and agentic AI.
Nscale signs an expanded deal with Microsoft involving ~200k Nvidia GPUs across U.S. and EU sites to scale Azure AI capacity.
Google commits $15B (2026–2030) for a Vizag AI hub with a gigawatt-scale data center and a new subsea connectivity gateway.
Salesforce unveils Agentforce 360 to build, deploy, and manage enterprise AI agents across Customer 360 and Slack.
Prezent raises $30M to buy AI services firms—starting with the founder’s other company—to power an all-in-one presentations platform.
Strella raised $14 million for its AI customer understanding platform.