TheSequence
🤖Edge#142: How Microsoft Built a 530 Billion Parameter Model

Nov 18, 2021

What’s New in AI, a deep dive into one of the freshest research papers or technology frameworks that is worth your attention. Our goal is to keep you up to date with new developments in AI to complement the concepts we debate in other editions of our newsletter.


💥 What’s New in AI: How Microsoft built Megatron-Turing NLG, one of the largest language models in history  

Another month, another big transformer model becomes available. This time it was Microsoft’s turn. In collaboration with NVIDIA, the Redmond giant announced a 530 billion parameter model called Megatron-Turing Natural Language Generation (MT-NLG). The model is a successor to Turing-NLG, which, a few months ago, was considered the largest language model in the world.
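To put that parameter count in perspective, a quick back-of-the-envelope calculation (assuming 2 bytes per parameter for fp16 weights, a common choice in mixed-precision training; the exact precision and GPU memory figures here are illustrative assumptions, not details from the announcement) shows why a model this size cannot fit on a single GPU:

```python
# Rough memory estimate for a 530B-parameter model (illustrative numbers).
params = 530e9                 # 530 billion parameters
bytes_per_param_fp16 = 2       # fp16 weights, 2 bytes each

weights_gb = params * bytes_per_param_fp16 / 1e9
print(f"fp16 weights alone: {weights_gb:.0f} GB")          # 1060 GB

# Even an 80 GB accelerator holds only a small fraction of the weights,
# which is why training requires model parallelism across many GPUs:
gpus_needed = weights_gb / 80
print(f"80 GB GPUs needed just to hold weights: {gpus_needed:.1f}")  # 13.3
```

Note that this counts only the weights; optimizer states and activations during training multiply the footprint several times over.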

Large pretrained language models are always impressive, but they have become something of a norm in the NLP space. It’s worth examining, then, which aspects of MT-NLG genuinely differentiate it from the alternatives.

© 2025 Jesus Rodriguez