The Sequence Opinion #758: From Language to Landscape: The Age of Spatially Intelligent AI
One of the most promising categories in frontier models.
I spent last week researching world models like Marble and SIMA2 and decided to put together an insanely long essay. But I think it's worth reading.
Artificial intelligence has made remarkable strides in language understanding and computer vision, yet today’s AI remains largely “blind” to the physical 3D world around it. Most models can analyze text or flat images, but they lack awareness of the spatial, tactile reality we live in. This gap has led researchers to identify spatial intelligence – the ability of machines to perceive, navigate, and interact with the three-dimensional world as effectively as humans or animals – as the next frontier in AI. Achieving spatial intelligence would enable AI not just to talk about the world, but to truly understand and operate within it.
Central to this vision is the development of world models: AI systems that build internal representations of environments and use them for perception, prediction, and planning. Just as large language models generate text and image models create pictures, world models can generate entire virtual spaces complete with objects, physics, and dynamic events. In essence, these models allow AI to imagine environments and simulate how those environments change when an agent takes actions – providing a kind of interactive sandbox or “Holodeck” for AI agents (and humans) to learn and explore.
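To make the idea of "imagining" concrete, here is a deliberately tiny sketch in Python. It is not how Marble, Genie, or SIMA work. Everything in it is an assumption made for illustration: a hypothetical 5x5 grid, a hand-written transition function standing in for what would normally be a learned neural network, and a brute-force planner. The point is only the loop itself: predict the next state for an action, roll a plan forward inside the model, and pick the plan whose imagined outcome looks best.

```python
from itertools import product

# Hypothetical toy environment: a 5x5 grid where a state is an (x, y) cell.
GRID_SIZE = 5
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def predict_next_state(state, action):
    """Toy 'world model': predict the state that follows an action.

    Hand-written here for clarity. In a real world model this mapping
    would be learned from experience rather than coded by hand.
    """
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), GRID_SIZE - 1)  # walls clamp movement
    y = min(max(state[1] + dy, 0), GRID_SIZE - 1)
    return (x, y)


def imagine_rollout(state, plan):
    """Simulate a sequence of actions inside the model only."""
    trajectory = [state]
    for action in plan:
        state = predict_next_state(state, action)
        trajectory.append(state)
    return trajectory


def plan_by_imagination(start, goal, horizon=4):
    """Imagine every action sequence of length `horizon` and return the
    one whose final imagined state is closest to the goal."""

    def distance_to_goal(state):
        return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

    best_plan = min(
        product(ACTIONS, repeat=horizon),
        key=lambda plan: distance_to_goal(imagine_rollout(start, plan)[-1]),
    )
    return list(best_plan)


if __name__ == "__main__":
    plan = plan_by_imagination(start=(0, 0), goal=(2, 2))
    print("plan:", plan)
    print("imagined trajectory:", imagine_rollout((0, 0), plan))
```

The exhaustive search only works because the toy world is tiny. The systems discussed in this essay replace the hand-written transition function with large generative models and the brute-force search with learned policies or sampling-based planners.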
In this essay, we survey the current landscape and future potential of world models as a path toward spatial intelligence. We begin with the technical foundations of world models and their key capabilities. We then examine major areas of progress – including embodied learning, multi-modal perception, long-term memory, and simulation-based reasoning – that are pushing AI to be more spatially aware. Next, we highlight some leading projects at the forefront of this field, such as World Labs’ Marble and DeepMind’s Genie and SIMA, along with other emerging efforts. We then explore real-world applications across gaming, robotics, design, education, and more, illustrating how these technologies are beginning to make an impact. Finally, we discuss the key challenges that remain – from scaling up models and grounding them in physical reality to achieving broad generalization – and consider future directions toward overcoming these hurdles. Throughout, we emphasize why world models are important for building more general AI, and how they tie together advances in perception, interaction, and imagination within virtual worlds.

