The Sequence Opinion #480: What is GPT-o1 Actually Doing?
Some ideas about how reasoning works in the OpenAI models.
Lately, the conversation in AI has centered on DeepSeek-R1 and the reasoning capabilities of foundation models. The reasoning race was initially triggered by the release of GPT-o1, followed by the announcement of the upcoming GPT-o3. Despite the hype, we know very little about how models like o1 actually work. Some research suggests that these models are no longer limited to generating text from static training data: they actively search over reasoning hypotheses, synthesize programs, and refine their outputs through reinforcement learning. By exploring hypotheses about their internal workings, we can better understand their mechanisms and the breakthroughs they represent. This essay delves into three critical aspects of these models: reasoning hypothesis search, program synthesis, and the innovative reinforcement learning techniques introduced by DeepSeek-R1.
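To make the "reasoning hypothesis search" idea concrete before diving in, here is a minimal, purely illustrative sketch in Python of a best-of-N search over candidate reasoning chains. Nothing here reflects OpenAI's or DeepSeek's actual implementations; `propose_chains` and `score_chain` are hypothetical stand-ins for an LLM sampler and a learned verifier or reward model.

```python
import random

def propose_chains(prompt: str, n: int) -> list[str]:
    """Hypothetical proposer: in a real system this would be an LLM
    sampling reasoning chains at temperature > 0; here we simulate it
    with a few canned strategies."""
    templates = [
        "decompose the problem, solve each part, then combine",
        "attempt a direct answer with no intermediate steps",
        "work backwards from the desired answer",
        "enumerate the possible cases and check each one",
    ]
    return [random.choice(templates) for _ in range(n)]

def score_chain(chain: str) -> float:
    """Hypothetical verifier: a real system might use a learned
    process-reward model or execute a synthesized program against
    tests; here we use a crude word-diversity proxy as a placeholder."""
    words = chain.split()
    return len(set(words)) / len(words)

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidate chains and keep the highest-scoring one."""
    candidates = propose_chains(prompt, n)
    return max(candidates, key=score_chain)

if __name__ == "__main__":
    print(best_of_n("What is 17 * 24?"))
```

The pattern worth noticing is the separation of proposal from verification: however the real models implement it, spending extra inference-time compute to generate and rank multiple reasoning hypotheses is the core idea the rest of this essay explores.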