Edge 288: Inside DeepSpeed-Chat: Microsoft’s New Framework to Create ChatGPT-Like Models Based on Human Feedback

The new framework builds on the scalability capabilities of DeepSpeed to fine-tune LLMs using RLHF.

May 04, 2023
Image created using Midjourney

Reinforcement learning from human feedback (RLHF) has become one of the cornerstones of the new generation of large language models (LLMs). RLHF-based models such as InstructGPT became the foundation of ChatGPT and have inspired alternatives such as Databricks' Dolly. Despite its unquestionable value, fine-tuning LLMs with the RLHF pipeline remains a very difficult task, largely because of the absence of mainstream frameworks. Recently, Microsoft Research open-sourced DeepSpeed-Chat, a framework that democratizes access to RLHF pipelines.

It is no surprise that Microsoft decided to build on the capabilities of the DeepSpeed framework. Released a few years ago, DeepSpeed has become one of the most widely adopted stacks for training LLMs at scale, which makes it a natural foundation for RLHF pipelines.
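For context, the RLHF pipeline that DeepSpeed-Chat targets typically has three stages: supervised fine-tuning of a base model, training a reward model on human preference pairs, and reinforcement learning (usually PPO) against that reward model. As a minimal sketch of the middle stage, the PyTorch snippet below trains a toy reward model with the standard pairwise ranking loss. The model, data, and hyperparameters are placeholders chosen for illustration; this is not DeepSpeed-Chat's actual implementation.

```python
# Minimal sketch of the reward-model stage of an RLHF pipeline.
# A tiny scoring head over mean-pooled embeddings stands in for a real LLM backbone;
# the pairwise loss -log(sigmoid(r_chosen - r_rejected)) is the standard formulation.
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, token_ids):                   # (batch, seq_len) -> (batch,)
        pooled = self.embed(token_ids).mean(dim=1)  # mean-pool token embeddings
        return self.score(pooled).squeeze(-1)       # scalar reward per sequence

model = ToyRewardModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Synthetic preference pairs: in practice, "chosen" and "rejected" are two responses
# to the same prompt, ranked by human annotators.
chosen = torch.randint(0, 1000, (8, 32))
rejected = torch.randint(0, 1000, (8, 32))

for _ in range(10):
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a real pipeline, the scoring head sits on top of a full LLM backbone, and DeepSpeed-Chat runs each stage on DeepSpeed's distributed training engine and ZeRO memory optimizations, which is what makes the end-to-end process practical at LLM scale.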
