The Sequence Opinion #718: From Scale to Skill: The Rise of Post‑Training
Explaining one of the most monumental transitions in modern AI.
Modern “frontier” AI models – spanning language, vision, and multimodal systems – are now built in two major phases. First comes pretraining, in which a large model (often called a foundation model) is trained on broad data to acquire general knowledge. Next comes post-training: a suite of refinements (fine-tuning, alignment, and related techniques) applied after the base model is built.

In this essay, we explore the transition from pretraining to post-training for cutting-edge AI models across modalities. We define the distinction between the two phases, examine why post-training techniques have become increasingly crucial (driven by needs for alignment, safety, controllability, and efficiency, among other concerns), and survey key methods such as instruction tuning, reinforcement learning from human or AI feedback, preference modeling, supervised fine-tuning, and tool-use augmentation. We illustrate these concepts with case studies – including DeepSeek-R1, GPT-4, Google’s Gemini, and Anthropic’s Claude – highlighting how post-training strategies are implemented and the effects they have. A dedicated section delves into reinforcement learning in post-training (especially RLHF and the newer RLAIF), discussing its benefits and current limitations. We close with reflections on how post-training is shaping the future of deploying and researching frontier AI models.
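To make the two-phase distinction concrete before diving in, here is a minimal, purely illustrative PyTorch sketch: a toy model is first pretrained with a next-token objective on broad (here, random placeholder) data, then post-trained with supervised fine-tuning in which the loss is masked so that only response tokens are supervised. The TinyLM model, the data, and every hyperparameter are stand-ins for exposition, not any lab’s actual pipeline.

```python
# Minimal sketch (not any lab's actual pipeline): the same toy model is first
# pretrained with next-token prediction on broad text, then post-trained with
# supervised fine-tuning (SFT) where the loss covers only the response tokens.
# The model, data, and sizes below are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 32

class TinyLM(nn.Module):
    """A stand-in for a transformer: embeddings -> GRU -> next-token logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):             # tokens: (batch, seq)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)            # logits: (batch, seq, vocab)

def next_token_loss(logits, tokens, mask=None):
    """Cross-entropy of position t predicting token t+1, optionally masked."""
    logits, targets = logits[:, :-1], tokens[:, 1:]
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1),
                           reduction="none")
    if mask is not None:                    # 1 = response token, 0 = ignored
        mask = mask[:, 1:].reshape(-1).float()
        return (loss * mask).sum() / mask.sum().clamp(min=1)
    return loss.mean()

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Phase 1: pretraining on "broad" unlabeled sequences (random stand-in data).
for _ in range(100):
    batch = torch.randint(0, VOCAB, (8, 16))
    loss = next_token_loss(model(batch), batch)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: post-training (SFT) on prompt -> response pairs; the masked loss
# supervises only the response, reshaping the base model's behavior rather
# than relearning everything from scratch.
prompt = torch.randint(0, VOCAB, (8, 6))        # placeholder "instruction"
response = torch.randint(0, VOCAB, (8, 10))     # placeholder "ideal answer"
batch = torch.cat([prompt, response], dim=1)
mask = torch.cat([torch.zeros_like(prompt), torch.ones_like(response)], dim=1)
for _ in range(50):
    loss = next_token_loss(model(batch), batch, mask)
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the sketch is the structure, not the scale: the same parameters flow from phase one into phase two, which is why post-training can reshape a model’s behavior far more cheaply than pretraining it again from scratch.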