You Need to Know About Groq

A $640 million funding round to accelerate its fast inference chips.

Aug 11, 2024

Next Week in The Sequence:

Edge 421: We start a new (and short) series about state space models, which are considered the main viable alternative to transformers. This issue includes a review of the famous “Transformers are SSMs” paper and the DeepChecks framework for testing, evaluating, and monitoring SSMs.
Edge 422: We dive into the fascinating NuminaMath model that just won first prize in the AI Math Olympiad.

📝 Editorial: Groq’s Massive Milestone

Making inference fast is one of the north stars for the next generation of AI infrastructure providers. The default assumption is that the AI inference market will be dominated by NVIDIA, and while that might turn out to be correct, there is innovation happening across all levels of the AI infrastructure stack. One of the most intriguing newcomers in this space is Groq, a startup that has emerged to become synonymous with fast AI inference.

Groq is the maker of the Language Processing Unit (LPU), a chip optimized for fast AI inference. The Groq LPU is a single-core processor designed for LLMs, interconnected with a fast switchless routing fabric using 288 QSFP28 optical cables. A rack is built from 9 GroqNode 1 servers (with 1 server acting as a redundant resource), featuring a fully connected internal RealScale network delivering accelerated compute performance of up to 48 PetaOPs (INT8) or 12 PFLOPs (FP16). Groq Cloud is an LPU-based cloud offering that already includes many of the top generative AI models and counts over 350,000 developers. Groq is fast. In Groq Cloud, models can run at over 500 tokens per second, roughly a 10x improvement over GPT-4.
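
To make the throughput figure concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a steady decode rate and ignores time to first token; the 1,000-token response length and the "10x slower" baseline rate are illustrative assumptions, not measured numbers.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate (ignores time to first token)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


FAST_RATE = 500.0               # tokens/s, the figure quoted in the editorial
BASELINE_RATE = FAST_RATE / 10  # hypothetical baseline that is 10x slower
RESPONSE_TOKENS = 1000          # assumed response length, for illustration only

fast = generation_time_seconds(RESPONSE_TOKENS, FAST_RATE)
slow = generation_time_seconds(RESPONSE_TOKENS, BASELINE_RATE)
print(f"{RESPONSE_TOKENS} tokens: {fast:.1f} s at {FAST_RATE:.0f} tok/s "
      f"vs {slow:.1f} s at {BASELINE_RATE:.0f} tok/s")
```

Under these assumptions, a long answer that would take tens of seconds to stream arrives in a couple of seconds, which is the difference users actually notice.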

The rise of Groq might seem sudden given the frantic pace of the AI market. However, the startup has been working on LPUs since 2016. Their perseverance has been rewarded. Last week, Groq announced a $640 million funding round led by BlackRock, along with Neuberger Berman, Type One Ventures, Cisco, KDDI, and Samsung Catalyst Fund.

AI hardware is a tough market, but Groq now has the resources to innovate and compete. AI inference is still NVIDIA’s world, but Groq is a fascinating new player in it.

🔎 ML Research

Robots for Table Tennis

Google DeepMind published a paper introducing the techniques behind the first robot agent to achieve human-competitive performance in table tennis. The paper details different techniques, such as hierarchical policy learning and sim-to-real transfer, that are combined in a very clever way —> Read more.

CodexGraph

Alibaba Research published a paper introducing CodexGraph, a system that integrates LLMs with a graph database built from code repositories. The graph model allows LLMs to navigate more sophisticated code structures and tackle more complex tasks —> Read more.

GENEVA

Microsoft Research published a paper introducing GENEVA, a tool that can generate rich narrative graphs based on a high-level description and a set of constraints. GENEVA explores different narrative paths through a visual graph interface —> Read more.

RAG Foundry

Researchers from Intel published a paper detailing RAG Foundry, a framework for streamlining RAG use cases. RAG Foundry brings capabilities such as data creation, inference, and evaluation into a single workflow —> Read more.

LLM Scaling Without Increasing Parameters

Google DeepMind published a paper discussing the importance of scaling test-time computation in order to scale LLMs. The paper explores two ways of increasing test-time computation: searching against dense, process-based verifier reward models, and updating the model's response distribution based on the prompt at test time —> Read more.
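
As a rough illustration of spending more compute at test time, here is a minimal best-of-N sketch in Python: sample several candidates and keep the one a verifier scores highest. This is a simplified stand-in, not the paper's method, which searches against process-based verifiers that score intermediate steps. The names `best_of_n`, `toy_generate`, and `toy_verifier` are hypothetical, and the toy verifier simply measures distance to a known answer.

```python
import random
from typing import Callable


def best_of_n(generate: Callable[[random.Random], str],
              score: Callable[[str], float],
              n: int,
              seed: int = 0) -> str:
    """Sample n candidates and return the one with the highest verifier score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)


def toy_generate(rng: random.Random) -> str:
    # Stand-in for an LLM sample: a noisy guess at the answer to 17 * 3.
    return str(rng.randint(40, 60))


def toy_verifier(candidate: str) -> float:
    # Stand-in for a learned reward model: higher is better, 0 is a perfect answer.
    return -abs(int(candidate) - 51)


one = best_of_n(toy_generate, toy_verifier, n=1)
many = best_of_n(toy_generate, toy_verifier, n=16)
print(f"n=1 -> {one}, n=16 -> {many}")
```

With the same seed, the n=16 run includes the n=1 sample among its candidates, so its verifier score can only match or improve. The model's parameters never change; only the inference budget does.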

Self-Taught Evaluators

Meta AI published a paper proposing an LLM-as-a-judge technique to improve evaluators without human annotations, using only synthetic data. The method trains an LLM to produce reasoning traces and final judgments and repeats that process to obtain improved predictions —> Read more.

🤖 AI Tech Releases

OpenAI API Structured Outputs

OpenAI unveiled a new feature that enables structured JSON outputs as part of its API —> Read more.
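
To show what a structured output buys the caller, here is a small Python sketch that uses only the standard library and does not call the OpenAI API. The schema, the sample string, and the `parse_structured` helper are all hypothetical; the point is that when output is guaranteed to match a schema, downstream code can parse it without defensive guesswork.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
EVENT_SCHEMA = {"name": str, "date": str, "attendees": list}


def parse_structured(raw: str, schema: dict) -> dict:
    """Parse raw JSON and check it has exactly the schema's fields and types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(schema):
        mismatched = sorted(set(data) ^ set(schema))
        raise ValueError(f"unexpected or missing fields: {mismatched}")
    for field, expected in schema.items():
        if not isinstance(data[field], expected):
            raise ValueError(f"field {field!r} should be {expected.__name__}")
    return data


# Stand-in for a model response that follows the schema.
raw_output = '{"name": "Demo day", "date": "2024-08-11", "attendees": ["Ana", "Bo"]}'
event = parse_structured(raw_output, EVENT_SCHEMA)
print(event["name"], len(event["attendees"]))
```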

🛠 Real World AI

Prompt Poet

Character.ai open sourced Prompt Poet, their framework for prompt design —> Read more.

Smart Notifications at Pinterest

Pinterest discusses the machine learning techniques used in their notification systems —> Read more.

📡 AI Radar

Groq raised $640 million for its fast AI inference platform.
Important departures at OpenAI.
OpenAI added a CMU professor to their board of directors.
TSMC announced astonishing sales growth of 45%, driven by AI chip demand.
HuggingFace acquired XetHub to boost its ML infrastructure.
Google DeepMind developed a robot table tennis player.
Autonomous agent platform Bardeen.ai raised $15.3 million in a new round.
Perceptive Space raised $2.8 million to improve space weather predictions.
Microsoft and Palantir announced a new strategic alliance to deliver AI capabilities to the intelligence community.
Box acquired AlphaMoon to improve its document intelligence capabilities.
Autonomous vehicle startup WeRide is getting ready for an IPO.
SoundHound acquired Amelia AI to boost its conversational capabilities.
Placer.ai raised at a $1.5 billion valuation for its AI-based, location-specific market research platform.
