Edge 288: Inside DeepSpeed-Chat: Microsoft’s New Framework to Create ChatGPT-Like Models Based on Human Feedback
The new framework builds on the scalability capabilities of DeepSpeed to fine-tune LLMs using RLHF.
Reinforcement learning from human feedback (RLHF) has become one of the cornerstones of the new generation of large language models (LLMs). RLHF-based models such as InstructGPT became the foundation of ChatGPT and have inspired alternatives such as Databricks’s Dolly. Despite its unquestionable value, fine-tuning LLMs using the RLHF pipeline remains a very difficult task due to the absence of mainstream frameworks. Recently, Microsoft Research open-sourced DeepSpeed-Chat, a framework for democratizing access to RLHF pipelines.
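To build intuition for what the RLHF pipeline optimizes, here is a deliberately tiny sketch: a hypothetical reward model (here just a lookup table, an assumption for illustration) scores candidate responses, and a REINFORCE-style policy-gradient update nudges a softmax policy toward the high-reward response. This is not DeepSpeed-Chat's actual implementation, which uses PPO over LLM weights; it only illustrates the core idea of turning preference scores into a policy update.

```python
import math
import random

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-in "reward model": in real RLHF this is a network trained on
# human preference comparisons; here it is a fixed lookup table.
REWARDS = {"helpful answer": 1.0, "rude answer": -1.0, "off-topic answer": -0.5}

def rlhf_step(logits, responses, lr=0.5):
    """One REINFORCE update: sample a response, scale the softmax
    policy gradient by the reward it received."""
    probs = softmax(logits)
    i = random.choices(range(len(responses)), weights=probs)[0]
    reward = REWARDS[responses[i]]
    # d log pi(i) / d logit_j = 1{i == j} - probs[j]
    return [
        logit + lr * reward * ((1.0 if j == i else 0.0) - probs[j])
        for j, logit in enumerate(logits)
    ]

random.seed(0)
responses = list(REWARDS)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = rlhf_step(logits, responses)

probs = softmax(logits)
# The policy should now prefer the response the reward model scores highest.
```

In a production pipeline the same loop runs over an actual LLM's parameters, with PPO constraints keeping the fine-tuned policy close to the supervised baseline.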
It is no surprise that Microsoft decided to build on the capabilities of the DeepSpeed framework. Released a few years ago, DeepSpeed has become one of the most widely adopted stacks for high-scale training of LLMs. Using that foundation for RLHF pipelines seems like a natural fit.