Edge 372: Learn About CALM, Google DeepMind's Method to Augment LLMs with Other LLMs

Just like RAG but with LLMs!

Feb 22, 2024
[Image: humanoid robots, each representing a different AI language model, collaborating around a holographic network of shared knowledge. Created Using DALL-E]

Knowledge augmentation is one of the most important topics in LLM-based applications. Over the last few months, we have seen a proliferation of augmentation techniques, such as retrieval-augmented generation (RAG), that attempt to expand an LLM's knowledge with access to external tools or data. But can we augment LLMs with other LLMs? This is an area worth exploring, and it is the subject of a new paper by Google DeepMind.

The idea of augmenting LLMs with LLMs ties directly into the area of model composition. The key question is whether a general-purpose anchor model can be combined with a specialized model to create new abilities. For example, could the code-understanding ability of one model be merged with the language-generation skill of another to enable code-to-text generation? Typically, the solution involves additional training or fine-tuning of the anchor model on the specialized model's data. However, this approach can be computationally expensive and is sometimes impractical due to data privacy concerns and organizational constraints.
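CALM's alternative is to keep both models frozen and learn only a small set of composition parameters: cross-attention layers that let the anchor model attend to the augmenting model's intermediate representations. The PyTorch sketch below is a minimal illustration of that idea, not the paper's implementation; the module name, dimensions, and single-layer setup are all hypothetical.

```python
import torch
import torch.nn as nn

class CompositionCrossAttention(nn.Module):
    """Hypothetical sketch of CALM-style composition: the anchor model's
    hidden states attend to the augmenting model's hidden states through
    a small set of newly trained parameters, while both base models
    remain frozen."""

    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 8):
        super().__init__()
        # Project the augmenting model's representations into the anchor's space.
        self.proj = nn.Linear(aug_dim, anchor_dim)
        # Cross-attention: queries come from the anchor model,
        # keys/values from the (projected) augmenting model.
        self.attn = nn.MultiheadAttention(anchor_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(anchor_dim)

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        # anchor_h: (batch, seq_a, anchor_dim); aug_h: (batch, seq_b, aug_dim)
        kv = self.proj(aug_h)
        attended, _ = self.attn(query=anchor_h, key=kv, value=kv)
        # Residual connection keeps the anchor's original behavior recoverable.
        return self.norm(anchor_h + attended)

# Usage with made-up dimensions: only these composition parameters would
# receive gradients during training; both LLMs stay frozen.
comp = CompositionCrossAttention(anchor_dim=4096, aug_dim=2048)
anchor_h = torch.randn(1, 16, 4096)  # hidden states from the anchor LLM
aug_h = torch.randn(1, 16, 2048)     # hidden states from the specialized LLM
fused = comp(anchor_h, aug_h)        # (1, 16, 4096)
```

Because only the composition parameters are trained, each base model keeps its original weights, and the anchor's behavior can be recovered simply by dropping the learned cross-attention.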

This post is for paid subscribers