The Sequence Opinion #691: The Thought Police: Should We Monitor AI’s Inner Dialogue?
Some reflections on one of the most important papers of recent times.
On Sunday we discussed a new paper about chain-of-thought (CoT) monitoring, authored by researchers from top AI labs including OpenAI, Anthropic, Google DeepMind, and others. I've read the paper multiple times and wanted to share some of my thoughts.
In Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety, the authors propose that the reasoning steps LLMs output, known as chains of thought (CoTs), offer a practical channel for spotting and stopping misaligned or malicious behavior in advanced AI systems before it occurs.
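To make the idea concrete, here is a deliberately minimal sketch of what a CoT monitor could look like. This is not the paper's implementation: the red-flag patterns, the `generate_with_cot` API, and the escalation policy are all illustrative assumptions.

```python
# Illustrative sketch of CoT monitoring, not the paper's method.
# The patterns below and the generate_with_cot() helper are hypothetical.
import re

RED_FLAGS = [
    r"\bexfiltrat\w*\b",           # e.g. reasoning about "exfiltrating the weights"
    r"\bdisable (the )?monitor",   # reasoning about evading oversight
    r"\bhide (this|my) (intent|reasoning)\b",
]

def monitor_cot(cot: str) -> list[str]:
    """Return every red-flag pattern that matches somewhere in the chain of thought."""
    return [p for p in RED_FLAGS if re.search(p, cot, flags=re.IGNORECASE)]

def guarded_generate(model, prompt: str) -> str:
    """Run the model, inspect its reasoning trace, and block flagged outputs."""
    cot, answer = model.generate_with_cot(prompt)  # hypothetical API
    hits = monitor_cot(cot)
    if hits:
        # Escalate instead of acting: log the trace, route to human review, or refuse.
        raise RuntimeError(f"CoT monitor flagged reasoning: {hits}")
    return answer
```

In practice a monitor would more likely be a second model scoring the trace rather than keyword matching; the regex stand-in just keeps the example self-contained. The key point is the intervention happens between reasoning and action.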
The paper, endorsed by leaders like Geoffrey Hinton and Ilya Sutskever, frames CoT monitoring as a complement to existing safety techniques. However, the authors warn that as models evolve, this transparency can erode, making CoTs less useful over time.