The Sequence Chat: Hugging Face's Lewis Tunstall on ZEPHYR, RLHF and LLM Innovation
One of the creators of ZEPHYR discusses ideas and lessons learned building LLMs at scale.
Quick bio
Lewis Tunstall is a Machine Learning Engineer in the research team at Hugging Face and a co-author of the bestselling book “NLP with Transformers”. He has previously built machine learning-powered applications for start-ups and enterprises in the domains of natural language processing, topological data analysis, and time series. He holds a PhD in Theoretical Physics, was a 2010 Fulbright Scholar and has held research positions in Australia, the USA, and Switzerland. His current work focuses on building tools and recipes to align language models with human and AI preferences through techniques like reinforcement learning.
Please tell us a bit about yourself. Your background, current role and how did you get started in AI?
My path to working in AI is somewhat unconventional and began when I was wrapping up a postdoc in theoretical particle physics around 2016. At the time, a friend of mine was studying algorithms to estimate the background for proton collisions at the Large Hadron Collider, and one day he showed me a script of TensorFlow code that trained a neural network to classify these events. I was surprised to learn that a few lines of code could outperform features that had been carefully designed by physicists over many years. This sparked my curiosity, and I started poking around trying to understand what this deep learning stuff was all about.
Since I didn’t have much programming experience (theorists only need pen and paper!), I teamed up with a few physics friends to enter a Kaggle competition on predicting Russian housing prices. This was a great learning experience and taught me a lot about Python and XGBoost -- in those days, most Kaggle competitions were tabular! I had so much fun tinkering with code and data that I decided to pivot from academia to industry and haven’t looked back. Currently I am a machine learning engineer in the research team at Hugging Face, where I focus on aligning language models to follow human instructions via techniques like Reinforcement Learning from Human Feedback (RLHF).
🛠 AI Work
You are one of the co-creators of the ZEPHYR model. Can you tell us about the vision and inspiration for the project?
Zephyr was inspired by two trends which emerged in the AI community over the last few months. On the one hand, people figured out that you could fine-tune a pretty good chat model by distilling a dataset of conversations from more capable models like GPT-3.5 or GPT-4. This meant you could skip the costly human annotation step altogether and focus on generating data for specific tasks like coding or function calling.
In parallel, many researchers were exploring simpler alternatives to RLHF, which is the alignment technique behind ChatGPT and Claude. A team at Stanford proposed a novel technique called Direct Preference Optimization (DPO), which removed reinforcement learning entirely from the alignment process and required far less compute to run.
We thought it was interesting to combine these ideas and apply DPO to a dataset called UltraFeedback, which contains a diverse set of model responses that are ranked by GPT-4 according to criteria like helpfulness. The result was Zephyr 7B, which was a surprisingly capable model for its size.
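For readers curious about what DPO actually optimizes, here is a minimal sketch of its loss in PyTorch. This is purely illustrative and not the Zephyr training code: the tensor names and the beta value are assumptions, and in practice the training was run with library tooling rather than a hand-rolled loop. The key idea is that, given log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under both the policy and a frozen reference model, DPO maximizes the margin between the two implicit rewards without any reinforcement learning loop.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (illustrative sketch).

    Each argument is a tensor of summed log-probabilities, one value per
    preference pair: the chosen/rejected completion scored by the policy
    being trained and by a frozen reference (SFT) model.
    """
    # Implicit rewards are the scaled log-ratios between policy and reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response beats the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of four preference pairs.
policy_chosen, policy_rejected = torch.randn(4), torch.randn(4)
ref_chosen, ref_rejected = torch.randn(4), torch.randn(4)
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```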
ZEPHYR is based on Mistral-7B. Were there any specific characteristics about this model that made it a good candidate for alignment fine-tuning? What sets Mistral apart among open-source LLMs?
When Mistral 7B was released, we knew from various benchmarks that it was the best base model at the 7B parameter scale, which is great for fine-tuning because you can iterate fast and even run the models on your laptop! And in our initial experiments, we found that Mistral chat models were far more fluent and capable than previous iterations we’d trained with Llama2 and Falcon.
However, as I write this, the latest release from Mistral is Mixtral 8x7B, which appears to be the first open model to truly match the performance of GPT-3.5. It seems likely that a clever mix of fine-tuning and data distillation will produce a whole new set of capable chat models built on Mixtral, which is a very exciting development in the community.
Can you describe the training and evaluation process of ZEPHYR, emphasizing the logic behind the different decisions?
Most alignment techniques for language models involve two steps: first you teach a base model to follow instructions, and then you optimize the model against a set of ranked preferences using techniques like reinforcement learning or DPO.
In the case of Zephyr, we first fine-tuned Mistral 7B on a dataset called UltraChat, which simulates millions of conversations between two GPT-3.5 models. However, we found that the resulting model had an annoying personality (i.e. it would often refuse to answer simple commands), so we heavily filtered the dataset to focus on helpful responses. We then took this model and optimized it with DPO on the UltraFeedback dataset I referred to earlier.
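To make the filtering step concrete, below is a rough sketch of how one might prune refusal-heavy conversations with the Hugging Face datasets library. The phrase list, the column layout, and the exact dataset split are assumptions for illustration rather than the actual Zephyr preprocessing; the UltraChat 200k dataset referenced here is the filtered subset that was released alongside Zephyr.

```python
from datasets import load_dataset

# Markers of boilerplate refusals or disclaimers (illustrative, not the
# exact list used for Zephyr).
REFUSAL_MARKERS = [
    "i'm sorry, but",
    "as an ai language model",
    "i cannot assist with",
]

def is_helpful(example):
    # Each example is assumed to carry a "messages" list of role/content turns.
    assistant_text = " ".join(
        turn["content"]
        for turn in example["messages"]
        if turn["role"] == "assistant"
    ).lower()
    return not any(marker in assistant_text for marker in REFUSAL_MARKERS)

dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
filtered = dataset.filter(is_helpful)
print(f"kept {len(filtered)} of {len(dataset)} conversations")
```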
Now, evaluating chat models is a tricky business, and the gold standard, human evaluation, is very costly to perform. Instead, we adopted what is now becoming common practice: evaluating chat models with GPT-4. Although this method has various flaws, it does provide a decent proxy for human evaluation, and we used the popular MT-Bench and AlpacaEval benchmarks to guide our experiments.
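For a sense of what GPT-4-as-judge evaluation looks like in practice, here is a toy pairwise comparison in the spirit of MT-Bench and AlpacaEval. The prompt wording and scoring format are illustrative assumptions, not the actual benchmark prompts or rubrics, and the snippet assumes the openai>=1.0 client and an API key in the environment.

```python
from openai import OpenAI

client = OpenAI()

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT-4 which of two answers is more helpful (toy judge prompt)."""
    prompt = (
        "You are an impartial judge. Given a user question and two answers, "
        "reply with 'A' or 'B' for the more helpful answer, or 'tie'.\n\n"
        f"Question: {question}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

print(judge("Explain DPO in one sentence.",
            "DPO fine-tunes a model directly on ranked preference pairs.",
            "I cannot answer that."))
```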
One of the primary contributions of ZEPHYR was the incorporation of AI feedback via teacher models for the alignment tasks. Why did you choose this approach over more established human feedback mechanisms?
Earlier in the year, we had actually experimented with collecting human feedback from a data vendor, but found the process was both time consuming and costly to oversee. Based on this experience, we felt AI feedback was a more accessible route for our small team, and also a way to popularize a method that the community could adopt.
How does ZEPHYR ultimately differ from InstructGPT?
InstructGPT was trained quite differently from Zephyr in a few ways. For one, the InstructGPT datasets were single-turn, human-annotated instructions, while Zephyr was trained on a large corpus of synthetic multi-turn dialogues. Another difference is that InstructGPT was aligned along various axes like helpfulness, honesty, and harmlessness, which often leads to a tension between the model’s capabilities and its tendency to hedge answers. By contrast, we focused on training Zephyr for helpfulness, which tends to be what the community enjoys most about open chat models.
With ambition in mind, could you speculate about the future of fine-tuning and alignment over the next three to five years?
Haha, with the current rate of progress it’s hard enough to predict one week into the future! But if I have to look into a crystal ball, then my current best guess is that we’ll see synthetic data become an integral part of how we fine-tune and pretrain language models. It’s also pretty clear that multimodality is the next frontier, both to instill new capabilities in models, but also as a potent source of new data from images, audio, and video. Figuring out how to align these models to a set of preferences across multiple modalities will take some tinkering to work out but is certainly a fun challenge!
You are a co-author of the “Natural Language Processing with Transformers” book. Why another Transformers book, and what sets this one apart?
Although there are now quite a few technical books covering transformers, our book was written with AI developers in mind, which means we focus on explaining the concepts through code you can run on Google Colab. Our book is also perhaps the only one to cover pretraining a language model in depth, which was rather prescient since we wrote it a year before the open LLM revolution kicked off. Thom Wolf is also a co-author, so where better to learn transformers than from the person who created the Hugging Face Transformers library?
💥 Miscellaneous – a set of rapid-fire questions
What is your favorite area of research outside of generative AI?
As a former physicist, I find applications of deep learning to accelerate scientific discovery to be especially exciting! Chris Bishop has a wonderful lecture on this topic where he frames AI as the “fifth paradigm” of science, with a focus on using AI to accelerate numerical simulations for complex systems like the weather. If I wasn’t so busy playing with LLMs, I would likely be working in this field.
Who is your favorite mathematician and computer scientist, and why?
My favorite mathematician is John von Neumann, mostly because I didn't really understand quantum mechanics until I read his excellent textbook on the subject.