Falcon-180B Takes Open Source LLMs Closer to GPT-4

Sep 10, 2023

Next Week in The Sequence:

Edge 325: We conclude our longest and most successful series, on new techniques in foundation models, with a comprehensive summary. I can’t wait to tell you about our next series.
Edge 326: Deep dives into SDXL 1.0, the new text-to-image super model released by Stability AI.

Go subscribe!

📝 Editorial: Falcon-180B Takes Open Source LLMs Closer to GPT-4

A few months ago, the Technology Innovation Institute (TII) in the United Arab Emirates (UAE) took the world of foundation models by storm with the release of the Falcon LLM. At the time, Falcon was among the largest open-source LLMs ever released, with versions of 1B, 7B, and 40B parameters. The model showed that very large open-source LLMs rivaling commercial alternatives such as GPT-4, PaLM 2, and Anthropic's Claude were a real possibility. Building on that initial success, TII last week open-sourced a new version with an astonishing 180B parameters.

Falcon 180B was trained on 3.5 trillion tokens using 4096 GPUs and 7M GPU hours. That makes it about 2.5 times the size of Llama 2, trained with roughly 4 times the compute. The released model is fine-tuned on instructional and conversational datasets and represents a different level of scale for open models. Falcon 180B easily topped the open LLM leaderboard, outperforming all other open models in tasks such as reasoning, coding proficiency, and knowledge tests. Furthermore, Falcon 180B outperforms GPT-3.5 on several benchmarks, underscoring how quickly open source has been closing the gap with closed models.
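For readers who want to sanity-check the scale figures, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above, plus two assumptions that are not in the text: that the Llama 2 comparison refers to its largest 70B-parameter model, and that all 4096 GPUs ran concurrently for the whole training run (so the wall-clock figure is a rough lower bound, not a reported training time). The function names are illustrative.

```python
# Figures quoted in the editorial.
FALCON_PARAMS_B = 180          # parameters, in billions
FALCON_GPU_HOURS = 7_000_000   # total GPU hours
FALCON_GPUS = 4096             # GPUs used for training

# Assumption (not stated in the text): the comparison is against
# the largest Llama 2 model, which has 70B parameters.
LLAMA2_PARAMS_B = 70


def size_ratio(params_a_b: float, params_b_b: float) -> float:
    """Return how many times larger model A is than model B."""
    return params_a_b / params_b_b


def wall_clock_days(gpu_hours: float, gpus: int) -> float:
    """Estimate wall-clock days, assuming every GPU runs the whole time."""
    return gpu_hours / gpus / 24


if __name__ == "__main__":
    ratio = size_ratio(FALCON_PARAMS_B, LLAMA2_PARAMS_B)
    days = wall_clock_days(FALCON_GPU_HOURS, FALCON_GPUS)
    print(f"Falcon 180B vs. Llama 2 70B size ratio: {ratio:.2f}x")
    print(f"Idealized wall-clock training time: {days:.0f} days")
```

Under these assumptions the size ratio lands just above 2.5, consistent with the claim above, and the training run works out to a little over two months of continuous use of the full cluster.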

Falcon 180B represents yet another important milestone for the open-source momentum in foundation models. A movement that started with Stable Diffusion and has been actively continued by Llama, Falcon, and dozens of other models has sparked a tremendous level of innovation. At this pace, it is not inconceivable that open-source models will outperform GPT-4 in the next few months. The momentum in open-source foundation models is real and shows no signs of slowing down.

🔎 ML Research

TSMixer

Google Research published a paper detailing TSMixer, a model for long-term time series forecasting. TSMixer is a multivariate model built from stacked linear (MLP) layers to address the requirements of long-term forecasts —> Read more.

AI Compilers

Microsoft Research published four papers introducing different AI compilers. The papers cover Rammer for hardware parallelism, Roller for compilation efficiency, Welder for memory access, and Grinder for control-flow execution on accelerators —> Read more.

Qwen-VL

Alibaba Cloud published a paper introducing Qwen-VL, a set of vision-language models that perform well on a range of tasks spanning both domains. Specifically, the paper discusses Qwen-VL and Qwen-VL-Chat and their performance in tasks such as zero-shot captioning, general and document visual question answering, and grounding —> Read more.

Frontiers of Multimodal Learning

Microsoft Research published a summary of recent papers detailing its responsible approach to multimodal learning. The research covers aspects such as scaling, risks, scoring methods, and other techniques relevant to multimodal learning —> Read more.

RLAIF

Google Research published a paper discussing an AI-based alternative to reinforcement learning from human feedback (RLHF). Called reinforcement learning from AI feedback (RLAIF), the method uses LLMs instead of humans to label model outputs —> Read more.

🤖 Cool AI Tech Releases

Falcon 180B

The new version of the Falcon LLM has been released, easily topping the open LLM leaderboard —> Read more.

IBM Granite

IBM announced Granite, a new series of foundation models for the WatsonX platform —> Read more.

🛠 Real World ML

ML at Pinterest

The Pinterest engineering team discusses MLEnv, their standardized engine for ML workloads —> Read more.

Walmart’s ML Platform

Walmart Global Tech provides details about Element ML, its internal ML platform —> Read more.

📡 AI Radar

Anthropic announced Claude Pro, its paid plan for Claude.ai.
AI lab Imbue raised $200M to build AI agents that can reason.
Intuit announced a new generative AI powered assistant.
AI chip startup D-Matrix raised $110 million.
500 Global closed a $143 million fund to invest primarily in AI startups in Southeast Asia.
Kindo raised $7 million for its AI business productivity platform.
Green announced a $4.9 million raise to accelerate its customer support platform powered by generative AI.
AI-powered anti-money-laundering platform ThetaRay raised $57 million.
Air Street Capital raised $121 million to invest in AI startups.
Jack Ma’s Ant Group unveiled an LLM fine-tuned for wealth management and insurance.
AI reading app Ello raised $15 million.

TheSequence
