Edge 288: Inside DeepSpeed-Chat: Microsoft’s New Framework to Create ChatGPT-Like Models Based on Human Feedback
The new framework builds on the scalability capabilities of DeepSpeed to fine-tune LLMs using RLHF.
Reinforcement learning from human feedback (RLHF) has become one of the cornerstones of the new generation of large language models (LLMs). RLHF-based models such as InstructGPT became the foundation of ChatGPT and have inspired alternatives such as Databricks’s Dolly. Despite its unquestionable value, fine-tuning LLMs using the RLHF pipeline remains a very difficult task due to the absence of mainstream frameworks. Recently, Microsoft Research open-sourced DeepSpeed-Chat, a framework for democratizing access to RLHF pipelines.
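To build intuition for what the RLHF pipeline optimizes, here is a deliberately tiny sketch: a hypothetical reward model (here just a lookup table, an assumption for illustration) scores candidate responses, and a REINFORCE-style policy-gradient update nudges a softmax policy toward the high-reward response. This is not DeepSpeed-Chat's actual implementation, which uses PPO over LLM weights; it only illustrates the core idea of turning preference scores into a policy update.

```python
import math
import random

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-in "reward model": in real RLHF this is a network trained on
# human preference comparisons; here it is a fixed lookup table.
REWARDS = {"helpful answer": 1.0, "rude answer": -1.0, "off-topic answer": -0.5}

def rlhf_step(logits, responses, lr=0.5):
    """One REINFORCE update: sample a response, scale the softmax
    policy gradient by the reward it received."""
    probs = softmax(logits)
    i = random.choices(range(len(responses)), weights=probs)[0]
    reward = REWARDS[responses[i]]
    # d log pi(i) / d logit_j = 1{i == j} - probs[j]
    return [
        logit + lr * reward * ((1.0 if j == i else 0.0) - probs[j])
        for j, logit in enumerate(logits)
    ]

random.seed(0)
responses = list(REWARDS)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = rlhf_step(logits, responses)

probs = softmax(logits)
# The policy should now prefer the response the reward model scores highest.
```

In a production pipeline the same loop runs over an actual LLM's parameters, with PPO constraints keeping the fine-tuned policy close to the supervised baseline.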
It is no surprise that Microsoft decided to build on the capabilities of the DeepSpeed framework. Released a few years ago, DeepSpeed has become one of the most widely adopted stacks for high-scale training of LLMs. Using that foundation for RLHF pipelines seems like a natural fit.