TheSequence

The Sequence Opinion #480: What is GPT-o1 Actually Doing?

Some ideas about how reasoning works in the OpenAI models.

Jan 30, 2025
∙ Paid
Created Using Midjourney

These days, the conversation in AI is dominated by DeepSeek-R1 and the reasoning capabilities of foundation models. The reasoning race was initially triggered by the release of GPT-o1 and accelerated by the announcement of the upcoming GPT-o3. Despite the hype, we know very little about how models like o1 actually work. Some research suggests that these models are no longer limited to generating text from static training data: they actively reason, synthesize programs, and refine their outputs through reinforcement learning. By exploring hypotheses about how these models work internally, we can better understand their mechanisms and the breakthroughs they represent. This essay delves into three critical aspects of these models: reasoning hypothesis search, program synthesis, and the innovative reinforcement learning techniques introduced by DeepSeek-R1.
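One way to picture the hypothesis-search idea, before the essay digs into it, is best-of-N sampling: generate several candidate reasoning chains for a problem, score each with a verifier, and keep the highest-scoring one. The sketch below is purely illustrative; `generate_hypotheses` and `verifier_score` are hypothetical stand-ins, since o1's actual internals are not public.

```python
def generate_hypotheses(problem, n):
    # Stand-in for sampling n candidate chains of thought from a model.
    # Each toy hypothesis records its "steps" and a candidate answer.
    return [{"steps": [f"try divisor {k}" for k in range(1, i + 2)],
             "answer": problem % (i + 1)} for i in range(n)]

def verifier_score(problem, hyp):
    # Stand-in for a learned verifier that rewards answers that check out
    # (here, the "correct" answer is arbitrarily defined as problem mod 7).
    return 1.0 if hyp["answer"] == problem % 7 else 0.0

def hypothesis_search(problem, n=8):
    # Best-of-N search: sample candidates, keep the highest-scoring one.
    candidates = generate_hypotheses(problem, n)
    return max(candidates, key=lambda h: verifier_score(problem, h))

best = hypothesis_search(20, n=8)
```

In this toy setup, only the candidate whose answer matches the verifier's check receives a nonzero score, so the search reliably surfaces it; real systems replace both stand-ins with a language model sampler and a learned reward or verifier model.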

Reasoning Hypothesis Search: Structured Problem Solving

This post is for paid subscribers

© 2025 Jesus Rodriguez