Welcome to the World of Small(er) Language Models
Smaller, highly specialized and cost-effective LLMs are a trend to track in generative AI.
On Friday there was a small glitch in our editorial process, and some of you might have received this edition in advance. Apologies for that.
Next Week in The Sequence:
Edge 347: Our series about fine-tuning dives into Anthropic’s Constitutional AI, reviews the original paper about this idea and explores the HumanLoop platform for fine-tuning.
Edge 348: We deep dive into Fuyu-8B, the multimodal model open sourced by Adept.ai.
You can subscribe below:
📝 Editorial: Welcome to the World of Small(er) Language Models
Large language models (LLMs) have led the generative AI revolution in recent years. Questions related to the scaling limits of LLMs and whether scaling is the only path forward are sources of constant debate in the generative AI community. Recently, we have seen the emergence of another term that attempts to counter the thesis that "bigger is better" when it comes to LLMs: small (or smaller) language models (SLMs).
The SLM thesis centers around the viability of smaller, highly specialized, more affordable models for specific use cases. This movement has partly been catalyzed by the rise of open-source generative AI models. When theorizing about the future of open source vs. closed source models, there are two main universes to explore:
Open source LLMs matching or surpassing the performance of closed source ones. Example: a future Llama 3 surpassing GPT-5.
Open source LLMs becoming the foundation for fine-tuned models or agents in highly specialized scenarios.
SLMs are the first manifestation of the second theory. Most companies can sacrifice a bit of the quality of models like GPT-4 or Claude in exchange for more control over fine-tuning and optimization, as well as lower costs. Microsoft and Meta have emerged as champions of the SLM movement. In the last two weeks, the Redmond giant announced the release of Phi-2, an SLM highly specialized in mathematical reasoning and the second iteration of the ideas outlined in the "Textbooks Are All You Need" paper. Microsoft also announced Orca 2, an SLM hyper-optimized for reasoning tasks such as common sense reasoning, math problem solving, and reading comprehension, among others.
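Part of the appeal of this model class is practical: an SLM in the low billions of parameters can be loaded and queried on a single commodity GPU with standard tooling. Below is a minimal sketch using Hugging Face Transformers, assuming Phi-2 is published on the Hub under the identifier microsoft/phi-2 (an assumption made for illustration):

```python
# Minimal sketch: loading and prompting a small language model locally.
# The checkpoint identifier below is an assumption; swap in whichever
# SLM checkpoint you actually have access to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a ~2.7B-parameter model on one GPU
    device_map="auto",
)

prompt = "A train travels 120 miles in 2 hours. What is its average speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same few lines work for any of the smaller open checkpoints mentioned above, which is exactly the control and cost profile the SLM thesis is betting on.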
SLMs are likely to become a force to be reckoned with in generative AI. As LLMs keep pushing the limits of the scaling laws and grow ever larger, we should ask ourselves: how small is really small for an SLM?
🤖 Build Real-Time AI Applications Using Only Python
Did you know you can now use only Python to infuse real-time AI decisioning into all your applications? Tecton’s new proprietary compute engine, Rift, makes building real-time AI applications easier and faster than ever before!
Sign up for a Rift private preview now!
Or join us for an interactive workshop on Wednesday, December 13, to see Rift in action.
🔎 ML Research
Orca 2
Microsoft Research published a paper detailing Orca 2, the second version of a small language model that exhibits stronger reasoning capabilities than much larger alternatives. The model is created by fine-tuning Llama 2 with a sophisticated synthetic reasoning dataset —> Read more.
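The recipe described in the paper boils down to supervised fine-tuning of an open base model on carefully constructed reasoning traces. The sketch below shows a generic version of that setup with LoRA adapters; the dataset rows, checkpoint identifier, and hyperparameters are placeholders for illustration, not the actual Orca 2 training recipe.

```python
# Hypothetical sketch: instruction-tuning a base model on a tiny synthetic
# reasoning dataset with LoRA adapters (not the Orca 2 recipe itself).
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "meta-llama/Llama-2-7b-hf"  # assumption: a gated checkpoint you have access to
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# Two toy "reasoning trace" examples standing in for a large curated dataset.
examples = [
    {"text": "Question: 17 + 26 = ? Reason step by step. 17 + 26 = 43. Answer: 43"},
    {"text": "Question: Is a whale a fish? Reason step by step. Whales are mammals. Answer: no"},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="reasoning-sft", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```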
Transformers and Composability
Researchers from the Allen Institute for Artificial Intelligence published a paper exploring the limits of transformer models in compositional problems. The paper explores tasks such as multiplication, logic grid puzzles, and a classic dynamic programming problem that have traditionally proven challenging for transformers —> Read more.
LLM Editing
Microsoft Research published a paper exploring three fundamental types of LLM editing techniques. These methods target small modifications in LLMs that can optimize the behavior of models without changing their fundamental architecture —> Read more.
ChatAnything
Researchers from Bytedance and Nankai University published a paper detailing ChatAnything, a model to generate anthropomorphized personas for LLM-based characters. The model incorporates in-context learning capabilities for features such as personality, tone, and visual appearance —> Read more.
Lookahead Decoding
LMSys published the research behind lookahead decoding, a parallel decoding algorithm that can accelerate LLM inference. The method already has an implementation that works with Hugging Face’s Transformers library and leads to significant performance improvements in token generation —> Read more.
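At a high level, lookahead decoding generates candidate n-grams with parallel Jacobi-style iterations and then verifies them against the model's own greedy predictions, so several tokens can be accepted per decoding step. The toy sketch below only illustrates the accept-the-longest-matching-prefix verification logic with a stand-in next-token function; the real method batches this check into a single forward pass, and none of these names come from the LMSys implementation.

```python
# Toy illustration of the verification idea behind lookahead-style decoding.
# `greedy_next` stands in for an LLM's argmax next-token function; in the real
# algorithm the whole candidate is checked in one batched forward pass.
from typing import Callable, List

def verify_candidate(prefix: List[int],
                     candidate: List[int],
                     greedy_next: Callable[[List[int]], int]) -> List[int]:
    """Accept the longest prefix of `candidate` that matches greedy decoding."""
    accepted: List[int] = []
    context = list(prefix)
    for token in candidate:
        if greedy_next(context) != token:
            break
        accepted.append(token)
        context.append(token)
    return accepted

# Fake "model" that always predicts last_token + 1.
greedy_next = lambda ctx: ctx[-1] + 1
print(verify_candidate([1, 2, 3], [4, 5, 9], greedy_next))  # -> [4, 5]
```

Accepting multi-token candidates this way preserves the exact greedy output while cutting the number of sequential decoding steps, which is where the reported speedups come from.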
🤖 Cool AI Tech Releases
Claude 2.1
Anthropic released a new version of Claude with an astonishing 200K-token context window —> Read more.
Stable Video
Stability AI open sourced Stable Video, a generative video model based on Stable Diffusion —> Read more.
Phi-2
Microsoft’s Phi-2 model for mathematical reasoning is now available —> Read more.
🛠 Real World ML
Python at Meta
Meta shares insights about the architecture and best practices supporting high-scale Python workloads —> Read more.
📡AI Radar
The OpenAI drama dominated the headlines this week with the happy conclusion of Sam Altman’s return as CEO and the formation of a new board.
AI21 Labs extended its Series C round to $208 million with an additional $53 million.
NVIDIA delivered strong Q3 results.
Rockset added vector search capabilities to its database engine.
French startup Osium AI raised $2.6 million to apply AI to materials science.
AI-ecommerce startup Birdeye announced a $3 million seed round.
Self-driving vehicle guru Anthony Levandowski rebooted his famous Church of AI.