The Sequence AI of the Week #733: DeepSeek 3.2 Makes Long Context Cheap

The model introduces a new sparse attention architecture along with a set of inference and hardware optimizations.

Oct 08, 2025

DeepSeek’s “3.2” release is not a wholesale reinvention of its V‑series so much as a deliberate, experimental branch designed to de‑risk a set of architectural ideas before they migrate into the next production generation. The public artifact—often referred to as DeepSeek 3.2—centers on a new DeepSeek Sparse Attention (DSA) mechanism that aggressively lowers compute and memory overhead for long‑context prefill and decode while aiming to preserve quality. Around that nucleus, the release also pushes on platform pragmatism: first‑class support for Chinese accelerators and vendor stacks, together with runtime integrations that make those hardware choices deployable in mainstream inference engines. The guiding thesis is simple: scale is constrained, so the path forward is smarter attention, cheaper tokens, and broader hardware optionality.
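To make the core idea concrete, here is a minimal sketch of the top-k sparse-attention pattern that DSA is built around: a lightweight indexer cheaply scores past tokens for each query, and only the highest-scoring tokens participate in full attention, which is what lowers long-context prefill and decode cost. The function names, dimensions, and the toy indexer below are illustrative assumptions for exposition, not DeepSeek's implementation.

```python
# Hypothetical sketch of top-k sparse attention: a cheap indexer scores
# (query, key) pairs, each query keeps only its top-k past tokens, and full
# attention runs over that small selected set instead of the whole sequence.
import torch
import torch.nn.functional as F


def topk_sparse_attention(q, k, v, idx_q, idx_k, top_k=64):
    """Causal attention where each query attends only to its top-k past tokens.

    q, k, v:      [seq, d_model]  full projections used for the actual attention
    idx_q, idx_k: [seq, d_index]  small projections used only for cheap scoring
    top_k:        number of past tokens each query is allowed to attend to
    """
    seq, d_model = q.shape

    # 1) Cheap indexer: score every (query, key) pair in a small dimension.
    scores = idx_q @ idx_k.T                                   # [seq, seq]

    # 2) Enforce causality: a query may only look at itself and earlier tokens.
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    # 3) Keep only the top-k scoring positions per query.
    k_eff = min(top_k, seq)
    topk_idx = scores.topk(k_eff, dim=-1).indices              # [seq, k_eff]

    # 4) Full attention restricted to the selected positions.
    k_sel = k[topk_idx]                                        # [seq, k_eff, d_model]
    v_sel = v[topk_idx]                                        # [seq, k_eff, d_model]
    attn_logits = torch.einsum("sd,skd->sk", q, k_sel) / d_model ** 0.5
    # Early queries have fewer than k_eff valid past tokens; re-mask any picks
    # that fall beyond the causal frontier so they cannot leak information.
    sel_valid = causal.gather(1, topk_idx)
    attn_logits = attn_logits.masked_fill(~sel_valid, float("-inf"))
    attn = F.softmax(attn_logits, dim=-1)
    return torch.einsum("sk,skd->sd", attn, v_sel)


if __name__ == "__main__":
    torch.manual_seed(0)
    seq, d_model, d_index = 512, 128, 32
    q, k, v = (torch.randn(seq, d_model) for _ in range(3))
    idx_q, idx_k = torch.randn(seq, d_index), torch.randn(seq, d_index)
    out = topk_sparse_attention(q, k, v, idx_q, idx_k, top_k=64)
    print(out.shape)  # torch.Size([512, 128])
```

The point of the sketch is the cost profile: the expensive attention step scales with the selection size k rather than the full sequence length, while the indexer that decides what to keep runs in a much smaller dimension.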

From V3/V3.1 to 3.2: What Actually Changed?
