The Sequence Opinion #722: From Language to Action: Transformer Architectures as Robotic Foundation Models

Can we have generalist models for robotics?

Sep 18, 2025
Created Using GPT-5

Building a transformer-based model for robotics holds great promise for generalizing across multiple tasks and robot embodiments. In recent years, researchers have begun applying the same Transformer architecture that revolutionized NLP and vision to robotics, aiming to create foundation models for robots. Such models would learn from large-scale, diverse data and potentially perform many tasks (and even work with different types of robots) without retraining from scratch for each new skill. This essay surveys the opportunities and challenges in that direction, with a focus on how transformer models can generalize across tasks, what makes that hard (data, training, safety), and which lines of work have pushed the frontier.
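To make the idea concrete, below is a minimal, illustrative sketch of what such a transformer policy could look like: visual patch features and a tokenized language instruction are fused into one token sequence, and the model reads out discretized action tokens. This is not the architecture of any specific published system; all class names, dimensions, and the single-camera, seven-dimensional action setup are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of a transformer-based robot policy.
# Vision patches + language tokens + learned action queries are concatenated
# into one sequence; the encoder output at the query positions is decoded
# into per-dimension discretized action bins.

import torch
import torch.nn as nn

class TransformerRobotPolicy(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=6,
                 image_patch_dim=768, vocab_size=32000,
                 n_action_bins=256, action_dims=7):
        super().__init__()
        # Project pre-extracted image patch features into the shared token space.
        self.image_proj = nn.Linear(image_patch_dim, d_model)
        # Embed tokenized language instructions into the same space.
        self.text_embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # One learned query token per action dimension (e.g., 6-DoF pose + gripper).
        self.action_queries = nn.Parameter(torch.randn(action_dims, d_model))
        # Each continuous action dimension is discretized into bins and
        # predicted as a classification over those bins.
        self.action_head = nn.Linear(d_model, n_action_bins)

    def forward(self, image_patches, instruction_ids):
        # image_patches: (batch, n_patches, image_patch_dim)
        # instruction_ids: (batch, n_text_tokens) integer token ids
        img_tokens = self.image_proj(image_patches)
        txt_tokens = self.text_embed(instruction_ids)
        queries = self.action_queries.unsqueeze(0).expand(img_tokens.size(0), -1, -1)
        # Fuse vision, language, and action-query tokens into one sequence.
        tokens = torch.cat([img_tokens, txt_tokens, queries], dim=1)
        encoded = self.encoder(tokens)
        # Read out only the positions corresponding to the action queries.
        action_logits = self.action_head(encoded[:, -queries.size(1):, :])
        return action_logits  # (batch, action_dims, n_action_bins)


# Example usage with random placeholder inputs:
policy = TransformerRobotPolicy()
patches = torch.randn(2, 196, 768)               # e.g., ViT-style patch features
instruction = torch.randint(0, 32000, (2, 12))   # tokenized "pick up the red block"
logits = policy(patches, instruction)
print(logits.shape)  # torch.Size([2, 7, 256])
```

Treating actions as just another token stream is what lets a single model, in principle, absorb demonstrations from many tasks and embodiments; the open questions the essay turns to next are whether the data, training recipes, and safety guarantees can keep up with that promise.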

Introduction and Background
