Welcome to the World of Small(er) Language Models
Smaller, highly specialized and cost-effective LLMs are a trend to track in generative AI.
On Friday there was a small glitch in our editorial process, and some of you might have received this edition in advance. Apologies for that.
Next Week in The Sequence:
Edge 347: Our series about fine-tuning dives into Anthropic’s Constitutional AI, reviews the original paper about this idea and explores the HumanLoop platform for fine-tuning.
Edge 348: We deep dive into Fuyu-8B, the multimodal model open sourced by Adept.ai.
You can subscribe below:
📝 Editorial: Welcome to the World of Small(er) Language Models
Large language models (LLMs) have led the generative AI revolution in recent years. Questions related to the scaling limits of LLMs and whether scaling is the only path forward are sources of constant debate in the generative AI community. Recently, we have seen the emergence of another term that attempts to counter the thesis that "bigger is better" when it comes to LLMs: small (or smaller) language models (SLMs).
The SLM thesis centers around the viability of smaller, highly specialized, more affordable models for specific use cases. This movement has partly been catalyzed by the rise of open-source generative AI models. When theorizing about the future of open source vs. closed source models, there are two main universes to explore:
Open source LLMs matching or surpassing the performance of closed source ones. Example: a future Llama 3 surpassing GPT-5.
Open source LLMs becoming the foundation for fine-tuned models or agents in highly specialized scenarios.
SLMs are the first manifestation of the second theory. Most companies can sacrifice a bit of the quality of models like GPT-4 or Claude in exchange for more control over fine-tuning and optimization, as well as lower costs. Microsoft and Meta have emerged as champions of the SLM movement. In the last two weeks, the Redmond giant announced the release of Phi-2, an SLM highly specialized in mathematical reasoning and the second iteration of the ideas outlined in the "Textbooks Are All You Need" paper. Microsoft also announced Orca 2, an SLM hyper-optimized for reasoning tasks such as common sense reasoning, math problem solving, and reading comprehension, among others.
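Part of the appeal of this model class is practical: an SLM in the low billions of parameters can be loaded and queried on a single commodity GPU with standard tooling. Below is a minimal sketch using Hugging Face Transformers, assuming Phi-2 is published on the Hub under the identifier microsoft/phi-2 (an assumption made for illustration):

```python
# Minimal sketch: loading and prompting a small language model locally.
# The checkpoint identifier below is an assumption; swap in whichever
# SLM checkpoint you actually have access to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a ~2.7B-parameter model on one GPU
    device_map="auto",
)

prompt = "A train travels 120 miles in 2 hours. What is its average speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same few lines work for any of the smaller open checkpoints mentioned above, which is exactly the control and cost profile the SLM thesis is betting on.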
SLMs are likely to become a force to be reckoned with in generative AI. As LLMs keep pushing the limits of the scaling laws and grow ever larger, we should ask ourselves: how small is really small for an SLM?
🤖 Build Real-Time AI Applications Using Only Python
Did you know you can now use only Python to infuse real-time AI decisioning into all your applications? Tecton’s new proprietary compute engine, Rift, makes building real-time AI applications easier and faster than ever before!
Sign up for a Rift private preview now!
Or join us for an interactive workshop on Wednesday, December 13, to see Rift in action.
🔎 ML Research
Orca 2
Microsoft Research published a paper detailing Orca 2, the second version of a small language model that exhibits stronger reasoning capabilities than much larger alternatives. The model is created by fine-tuning Llama 2 with a sophisticated synthetic reasoning dataset —> Read more.
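The recipe described in the paper boils down to supervised fine-tuning of an open base model on carefully constructed reasoning traces. The sketch below shows a generic version of that setup with LoRA adapters; the dataset rows, checkpoint identifier, and hyperparameters are placeholders for illustration, not the actual Orca 2 training recipe.

```python
# Hypothetical sketch: instruction-tuning a base model on a tiny synthetic
# reasoning dataset with LoRA adapters (not the Orca 2 recipe itself).
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "meta-llama/Llama-2-7b-hf"  # assumption: a gated checkpoint you have access to
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# Two toy "reasoning trace" examples standing in for a large curated dataset.
examples = [
    {"text": "Question: 17 + 26 = ? Reason step by step. 17 + 26 = 43. Answer: 43"},
    {"text": "Question: Is a whale a fish? Reason step by step. Whales are mammals. Answer: no"},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="reasoning-sft", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```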
Transformers and Composability
Researchers from the Allen Institute for Artificial Intelligence published a paper exploring the limits of transformer models in compositional problems. The paper explores tasks such as multiplication, logic grid puzzles, and a classic dynamic programming problem that have traditionally proven challenging for transformers —> Read more.
LLM Editing
Microsoft Research published a paper exploring three fundamental types of LLM editing techniques. These methods target small modifications in LLMs that can optimize the behavior of models without changing their fundamental architecture —> Read more.
ChatAnything
Researchers from Bytedance and Nankai University published a paper detailing ChatAnything, a model to generate anthropomorphized personas for LLM-based characters. The model incorporates in-context learning capabilities for features such as personality, tone, and visual appearance —> Read more.
Lookahead Decoding
LMSys published the research behind lookahead decoding, a parallel decoding algorithm that can accelerate LLM inference. The method already has an implementation that works with Hugging Face’s Transformers library and leads to significant performance improvements in token generation —> Read more.
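At a high level, lookahead decoding generates candidate n-grams with parallel Jacobi-style iterations and then verifies them against the model's own greedy predictions, so several tokens can be accepted per decoding step. The toy sketch below only illustrates the accept-the-longest-matching-prefix verification logic with a stand-in next-token function; the real method batches this check into a single forward pass, and none of these names come from the LMSys implementation.

```python
# Toy illustration of the verification idea behind lookahead-style decoding.
# `greedy_next` stands in for an LLM's argmax next-token function; in the real
# algorithm the whole candidate is checked in one batched forward pass.
from typing import Callable, List

def verify_candidate(prefix: List[int],
                     candidate: List[int],
                     greedy_next: Callable[[List[int]], int]) -> List[int]:
    """Accept the longest prefix of `candidate` that matches greedy decoding."""
    accepted: List[int] = []
    context = list(prefix)
    for token in candidate:
        if greedy_next(context) != token:
            break
        accepted.append(token)
        context.append(token)
    return accepted

# Fake "model" that always predicts last_token + 1.
greedy_next = lambda ctx: ctx[-1] + 1
print(verify_candidate([1, 2, 3], [4, 5, 9], greedy_next))  # -> [4, 5]
```

Accepting multi-token candidates this way preserves the exact greedy output while cutting the number of sequential decoding steps, which is where the reported speedups come from.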
🤖 Cool AI Tech Releases
Claude 2.1
Anthropic released a new version of Claude with an astonishing 200K-token context window —> Read more.
Stable Video
Stability AI open sourced Stable Video, a generative video model based on Stable Diffusion —> Read more.
Phi-2
Microsoft’s Phi-2 model for mathematical reasoning is now available —> Read more.
🛠 Real World ML
Python at Meta
Meta shares insights about the architecture and best practices supporting high-scale Python workloads —> Read more.
📡AI Radar
The OpenAI drama dominated the headlines this week with the happy conclusion of Sam Altman’s return as CEO and the formation of a new board.
AI21 Labs extended its Series C round to $208 million with an additional $53 million.
NVIDIA delivered strong Q3 results.
Rockset added vector search capabilities to its database engine.
French startup Osium AI raised $2.6 million to apply AI to materials science.
AI-ecommerce startup Birdeye announced a $3 million seed round.
Self-driving vehicle guru Anthony Levandowski rebooted his famous Church of AI.