TheSequence

The Sequence Knowledge #854: Return of the King: Unrolling the xLSTM Architecture

An unexpected alternative to transformers.

May 05, 2026

💡 AI Concept of the Day: Return of the King: Unrolling the xLSTM Architecture

If you were training sequence models circa 2015, your entire mental model of the world was shaped by the Long Short-Term Memory (LSTM) network. Invented in the 1990s by Sepp Hochreiter and Jürgen Schmidhuber, the LSTM was the undisputed workhorse of deep learning. It translated our text, recognized our speech, and powered the first generation of neural language models.

Then came 2017. “Attention Is All You Need” dropped, and the entire AI ecosystem pivoted. We traded the architectural elegance of the LSTM for the brute-force, highly parallelizable matrix multiplications of the Transformer. The Transformer won the hardware lottery because, unlike an LSTM, which must step through a sequence one token at a time, it lets training process every position of a sequence in parallel on a GPU.

© 2026 Jesus Rodriguez