TheSequence

The Sequence AI of the Week #757: 3D World Models in Action: Inside DeepMind’s SIMA 2 Architecture

An agent that can generate interactive 3D environments.

Nov 19, 2025
Created Using GPT-5

World models are becoming a reality in front of our eyes! Today, we would like to dive into one of the most exciting ones.

DeepMind’s SIMA 2 is best understood as a systems project disguised as a gaming demo: it is a full-stack embodied agent that wraps a Gemini model in a visuomotor control loop, trains it across many 3D games, and then lets it improve itself through model-driven task generation and self-play. Rather than proposing a new neural building block in isolation, SIMA 2 offers a reference architecture for how a large multimodal model can perceive, reason, and act in complex simulated worlds using exactly the same interface as a human player.
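The control loop and self-improvement cycle described above can be sketched in a few dozen lines of Python. This is a toy illustration under stated assumptions, not DeepMind's implementation. `ToyGame`, `StubPolicy`, `propose_tasks`, `score` and `self_improve_round` are hypothetical names. The "game" is a one-dimensional corridor, the policy is a fixed rule standing in for the Gemini model, and the scorer is an assumed success check that the article does not describe. What the sketch shows is the shape of the interface (screen frame plus instruction in, keyboard and mouse actions out) and how self-generated tasks that pass a check can be recycled as new training data.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Human-style controls: keys held this step plus relative mouse motion."""
    keys: tuple[str, ...]
    mouse_dx: float = 0.0
    mouse_dy: float = 0.0


@dataclass
class Task:
    instruction: str  # natural-language goal shown to the agent
    target: int       # hidden success criterion, read only by the scorer


class ToyGame:
    """Hypothetical 1-D 'world'; the observation is a tiny frame encoding position."""

    def reset(self) -> list[list[int]]:
        self.pos = 0
        return self._render()

    def step(self, action: Action) -> list[list[int]]:
        if "w" in action.keys:  # 'w' moves the avatar forward, as in many 3D games
            self.pos += 1
        return self._render()

    def _render(self) -> list[list[int]]:
        return [[self.pos]]  # stand-in for an RGB screen capture


class StubPolicy:
    """Stand-in for the multimodal model (a Gemini model in SIMA 2); always walks forward."""

    def act(self, frame, instruction: str, history: list) -> Action:
        return Action(keys=("w",))


def run_episode(env, policy, task: Task, max_steps: int = 5):
    """Perceive -> reason -> act loop: pixels and instruction in, controls out."""
    frame = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy.act(frame, task.instruction, trajectory)
        trajectory.append((frame, task.instruction, action))
        frame = env.step(action)
    return trajectory, frame


def propose_tasks() -> list[Task]:
    """Hypothetical stand-in for the model-driven task generation the article mentions."""
    return [Task(f"walk forward {n} steps", n) for n in (3, 5, 8)]


def score(final_frame, task: Task) -> float:
    """Assumed success check: 1.0 if the goal position was reached, else 0.0."""
    return 1.0 if final_frame[0][0] >= task.target else 0.0


def self_improve_round(env, policy, threshold: float = 0.5):
    """Play self-generated tasks and keep only successful episodes as new data."""
    new_data = []
    for task in propose_tasks():
        trajectory, final_frame = run_episode(env, policy, task)
        if score(final_frame, task) >= threshold:
            new_data.extend(trajectory)
    return new_data


data = self_improve_round(ToyGame(), StubPolicy())
print(len(data))  # 10: the 3- and 5-step tasks succeed (5 steps each); the 8-step task fails
```

The point of the design is that the policy never reads game internals: it sees only the rendered frame and the instruction, and emits only keys and mouse motion. That is the "same interface as a human player" constraint in miniature. Only the scorer consults the hidden target, which keeps the success signal separate from what the agent is allowed to observe.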

© 2025 Jesus Rodriguez