The Llama 2 Effect
On Sundays, The Sequence Scope brings you a summary of the most important research papers, technology releases, and VC funding deals in the artificial intelligence space.
Next Week in The Sequence:
Edge 311: Our series about foundation models continues with ReAct, a technique that combines reasoning and acting in LLMs. We review Google’s original ReAct paper and the Haystack framework for LLM-based search.
Edge 312: We review Microsoft’s groundbreaking paper: “Textbooks Are All You Need”.
Go Subscribe!
📝 Editorial: The Llama 2 Effect
The debate between open-source and closed-source foundation models has become more interesting than ever, and the open-source space has found an unlikely champion: Meta. The “accidental leak” of the Llama model’s weights sparked a tremendous wave of innovation in open-source foundation models, triggering the creation of models such as Vicuna, Koala, Red Pajama, MPT, Alpaca, Gorilla, and many others. Last week, Meta announced the open-source release and commercial availability of Llama 2, together with a distribution partnership with none other than Microsoft.
Llama 2 was trained on a dataset over 40% larger than its predecessor’s, comprising 2 trillion pretraining tokens. The model was released in three main versions with 7B, 13B, and 70B parameters. Another solid improvement was the use of reinforcement learning from human feedback (RLHF) with proximal policy optimization (PPO) to improve the helpfulness of the responses. The model was evaluated across many LLM benchmarks and performed very strongly relative to the recent generation of open-source LLMs.
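For readers unfamiliar with the mechanics, PPO optimizes a clipped surrogate objective that keeps the updated policy close to the policy that generated the sampled responses. Below is a minimal, illustrative sketch of that policy loss in PyTorch; the tensor names are our own, and it omits the value, KL, and entropy terms a full RLHF pipeline would include:

```python
import torch

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Illustrative PPO clipped surrogate loss (policy term only)."""
    # Probability ratio between the updated policy and the sampling policy.
    ratio = torch.exp(new_logprobs - old_logprobs)
    unclipped = ratio * advantages
    # Clipping discourages updates that move the policy too far in one step.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic bound; negate to express it as a loss.
    return -torch.min(unclipped, clipped).mean()
```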
And then there is the partnership with Microsoft.
As part of their strategic alliance, Microsoft announced support for Llama 2 on Azure and Windows. The Azure support includes the ability to deploy and fine-tune all versions of Llama 2 from the Azure AI Model Catalog. The Windows support enables the local execution of Llama 2 models using DirectML. Beyond the initial set of capabilities, Microsoft’s endorsement of Llama 2 represents strong validation of the viability of open-source foundation models. Together with Databricks’ acquisition of MosaicML and the recent funding rounds of companies like Stability AI, this move signals to the market that open-source foundation models are a force to be reckoned with.
The Llama effect was about unlocking innovation in the open-source LLM space. The Llama 2 effect is about robustness and commercial readiness at the highest level.
💡Report: State of Applied Machine Learning 2023
We surveyed over 1,700 ML practitioners for this inaugural report on the state of applied machine learning. It provides a comprehensive overview of applied ML and shares the challenges and opportunities in the space, along with common trends across a diverse set of ML initiatives.
Download the full report for key findings, recommendations, and a deeper dive into the trends that will shape the future of applied ML!
🔎 ML Research
CM3leon
Meta AI Research published a paper introducing CM3leon, a text-to-image and image-to-text foundation model. CM3leon was trained with a recipe that includes a large-scale retrieval-augmented pretraining stage followed by a multitask supervised fine-tuning (SFT) stage, and achieves state-of-the-art results in both modalities —> Read more.
Diffusion Model Fine Tuning with RL
Researchers from the Berkeley AI Research (BAIR) lab published a paper detailing a reinforcement learning method for fine-tuning diffusion models. The method fine-tunes Stable Diffusion on objectives such as image compressibility, human-perceived aesthetic quality, and prompt-image alignment —> Read more.
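As a toy illustration of the kind of reward signal involved (not the authors’ implementation), an image-compressibility objective can be scored by how small an image is after JPEG encoding:

```python
import io
from PIL import Image

def compressibility_reward(image: Image.Image, quality: int = 95) -> float:
    """Toy reward: smaller JPEG size -> higher reward (more compressible)."""
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    # Negate size in kilobytes so more compressible images score higher.
    return -buffer.tell() / 1024.0
```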
SimPer
Google Research published a paper detailing SimPer, a self-supervised model for periodic data. SimPer uses contrastive learning to learn the temporal properties of periodic targets —> Read more.
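A rough sketch of the core augmentation idea, under our own assumptions rather than the paper’s code: views of a periodic signal can be created by resampling it at different speeds, which shifts its frequency and gives a contrastive objective something to discriminate.

```python
import numpy as np

def speed_augment(signal: np.ndarray, speed: float) -> np.ndarray:
    """Resample a 1-D periodic signal to simulate a frequency change."""
    n = len(signal)
    # Read the signal `speed` times faster, wrapping around one period.
    idx = np.arange(n) * speed
    return np.interp(idx, np.arange(n), signal, period=n)
```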
Consistent Reasoning in LLMs
Amazon Science published a paper outlining a new chain-of-thought reasoning method for LLMs. The core idea is a teacher-student setup that uses knowledge distillation over question-answer pairs to improve the student’s reasoning chain —> Read more.
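To make the general pattern concrete, here is a minimal sketch of how such distillation data is often assembled; `teacher_generate` is a hypothetical wrapper around a large teacher LLM, and the prompt format is our own assumption, not Amazon’s:

```python
from typing import Callable

def build_distillation_example(question: str, answer: str,
                               teacher_generate: Callable[[str], str]) -> dict:
    """Ask a teacher model for a rationale, then pair it with the QA data."""
    # teacher_generate is a hypothetical call to a large teacher model.
    prompt = (f"Question: {question}\nAnswer: {answer}\n"
              "Explain step by step how to reach this answer:")
    rationale = teacher_generate(prompt)
    # The student is fine-tuned to emit the rationale before the answer.
    return {"input": f"Question: {question}",
            "target": f"{rationale}\nTherefore, the answer is {answer}."}
```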
Flash Attention-2
Researchers from Stanford University and Princeton published a paper introducing FlashAttention-2, an IO-aware attention mechanism. FlashAttention-2 builds on its predecessor with several optimizations that reduce redundant FLOPs and better parallelize the attention computation —> Read more.
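FlashAttention-2 itself ships as the flash-attn CUDA package; as a rough illustration of calling a fused, IO-aware attention kernel, PyTorch 2.x exposes scaled_dot_product_attention, which can dispatch to FlashAttention-style kernels on supported GPUs. A minimal sketch, assuming a CUDA device is available:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch selects a fused kernel (FlashAttention-style when available),
# avoiding materializing the full seq_len x seq_len attention matrix.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```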
🤖 Cool AI Tech Releases
Llama 2
Meta AI released Llama 2, the next version of its marquee LLM, now licensed for commercial use —> Read more.
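For example, once access to the weights has been granted on the Hugging Face Hub, the chat variant can be loaded with the standard transformers API. A sketch; the model ID assumes Meta’s hosted checkpoints, and device_map="auto" requires the accelerate package:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated: requires accepted license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```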
ChatGPT Custom Instructions
OpenAI released ChatGPT Custom Instructions, which allow users to set preferences that ChatGPT should consider when producing outputs —> Read more.
MPT-7B-8K
MosaicML unveiled MPT-7B-8k, a new LLM with an 8k-token context window —> Read more.
🛠 Real World ML
Prompt Engineering at GitHub
The GitHub engineering team discusses prompt engineering best practices —> Read more.
Time Series Analysis at Pinterest
The Pinterest engineering team shares some details about their architecture and techniques for time series analysis —> Read more.
📡AI Radar
AI financial planning platform Runway raised $27.5 million in a new round.
MLCommons released a new platform to evaluate AI medical models.
AI chatbot platform Character.ai could be raising a new round of funding.
AI21 Labs released Contextual Answers, a generative AI platform for organizational knowledge.
German robotics startup Neura Robotics raised $55 million to power its cognitive robotics platform.
Language learning app Preply raised $120 million to expand its AI capabilities.
Qualtrics announced that it will invest $500 million in AI innovation over the next four years.
Splunk launched Splunk AI, a generative AI platform for machine observability.
Futureverse announced a $54 million funding round to scale its AI metaverse technology.
AI-based business process outsourcing platform Gushworks raised a $2.1 million pre-seed round.
China’s OpenAI challenger Zhipu AI received a capital infusion from food delivery giant Meituan.
Decentralized MLOps platform FedML raised $11.5 million in new funding.
Enterprise LLM platform Unstructured.io raised $25 million.
Cognaize raised $18 million to enable LLM capabilities for the financial sector.
Cerebras unveiled one of the biggest AI clusters in the world.
Cleanlab announced $5 million in seed funding to enable high quality datasets for LLMs.