The Sequence #710: Learning About DeepSeek v3.1 in 10 Key Points
The new model combines a generalist MoE, a reasoner, and an agent stack
DeepSeek-V3.1 is not a new model family so much as a decisive unification of three capabilities that have usually lived apart: a high-throughput generalist LLM, an explicit “thinking” reasoner, and an execution-competent agent. It sits on the DeepSeek-V3 base—an economical but large Mixture-of-Experts (MoE) transformer with Multi-Head Latent Attention (MLA)—and extends it along three axes: (1) hybrid inference that supports both “thinking” and “non-thinking” modes in a single checkpoint via chat-template control; (2) long-context and tool/agent upgrades driven by additional continued pretraining and post-training; and (3) operational polish (function-calling strictness, API compatibility) that makes it easier to embed V3.1 inside real agentic stacks. In effect, V3.1 collapses the V3 (generalist) and R1 (slow-thinker) lines into one deployable artifact while keeping the performance envelope of the base architecture.
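Because mode selection lives in the chat template rather than in separate checkpoints, the toggle can be illustrated with a short sketch. The snippet below assumes a Hugging Face-style tokenizer whose published template accepts a boolean `thinking` flag; the kwarg name and the repo id are assumptions for illustration, not confirmed API details.

```python
# Minimal sketch: switching "thinking" vs. "non-thinking" mode for a single
# checkpoint at the chat-template level. The `thinking` kwarg and the model
# id below are assumed for illustration.
from transformers import AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-V3.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

messages = [
    {"role": "user", "content": "How many prime numbers are there below 100?"}
]

# Non-thinking mode: the template renders a plain chat prompt for fast answers.
fast_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking=False,  # assumed template flag
)

# Thinking mode: the same checkpoint, but the template opens a reasoning
# block so the model deliberates before producing its final answer.
slow_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking=True,  # assumed template flag
)

print(fast_prompt)
print(slow_prompt)
```

The practical upshot of this design is that an orchestrator can choose fast or deliberate inference per request, without loading or routing to a second model.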