TheSequence

The Sequence Opinion #718: From Scale to Skill: The Rise of Post‑Training

Explaining one of the most monumental transitions in modern AI.

Sep 11, 2025
(Image created using GPT-5)

Modern “frontier” AI models, spanning language, vision, and multimodal systems, are now built in two major phases. First comes pretraining, in which a large model (often called a foundation model) is trained on broad data to acquire general knowledge. Next comes post-training, a suite of refinements (fine-tuning, alignment, and related techniques) applied after the base model is built.

In this essay, we explore the transition from pretraining to post-training for cutting-edge AI models across modalities. We define the distinction between the two phases; examine why post-training techniques are increasingly crucial, driven by needs for alignment, safety, controllability, and efficiency; and survey key methods such as instruction tuning, reinforcement learning from feedback, preference modeling, supervised fine-tuning, and tool-use augmentation.

We illustrate these concepts with case studies, including DeepSeek-R1, GPT-4, Google’s Gemini, and Anthropic’s Claude, highlighting how post-training strategies are implemented and what effects they have. A dedicated section delves into reinforcement learning in post-training, especially RLHF and the newer RLAIF, discussing their benefits and current limitations. We close with reflections on how post-training is shaping the future of deploying and researching frontier AI models.
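
To ground one of these methods before the deep dive, here is a minimal sketch of preference modeling, the step in which a reward model learns to score outputs the way raters do. This is an illustrative example rather than code from any of the systems discussed here: it assumes PyTorch, and the function name `preference_loss` is ours. It implements the standard Bradley-Terry pairwise loss used to train most RLHF reward models, which pushes the model to assign a higher scalar reward to the response a rater preferred.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss for reward-model training.

    Each element pairs the scalar reward assigned to the
    human-preferred response with the reward assigned to the
    rejected response for the same prompt. Minimizing the loss
    widens the margin between preferred and rejected scores.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards for three prompt/response pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(preference_loss(chosen, rejected))  # a single scalar loss
```

In a full RLHF pipeline, a reward model trained with this loss then supplies the reward signal for a reinforcement learning step (such as PPO) that fine-tunes the base model’s policy.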

Pretraining vs. Post-Training: Two Distinct Phases

This post is for paid subscribers
