One Week, 7 Major Foundation Model Releases
Apple, HuggingFace, OpenAI, Mistral, and Groq all released innovative models in the same week.
Next Week in The Sequence:
Edge 415: Our series about autonomous agents dives into procedural memory. We review Microsoft’s JARVIS-1 memory-augmented agent and dive into the Zep framework for memory management in LLMs.
Edge 416: We deep dive into Apple’s amazing 4M-21 multimodal model.
You can subscribe to The Sequence below:
📝 Editorial: What a Week for Foundation Models
Building high-quality, large-scale foundation models is hard. Just a year ago, it seemed that the foundation model space was going to be highly fragmented, with new models coming to market literally every week. Once the computational and capital requirements became obvious, the space consolidated into a dozen or so relevant models per modality, with a few more in the language space. At the moment, two trends seem to be emerging to catalyze the next generation of foundation models:
Domain Specialization: Models more specialized in horizontal domains such as coding, function calling, math, etc.
Small Models: 500M-10B parameter models that can run inference on commodity hardware, IoT, or mobile devices.
Last week was exceptional in terms of model releases in these areas. Just to list a few:
Mistral released two new models covering areas such as math and coding.
Mistral and NVIDIA also released a new small model optimized for enterprise environments.
OpenAI unveiled a smaller, cheaper version of its flagship GPT-4 model.
Apple open sourced a series of small models that outperform Mistral-7B.
Groq open-sourced a series of 7B models that seem to be best in class at function calling.
HuggingFace open-sourced a series of small, high-performance LLMs.
As you can see, the releases emphasize the domain specialization and small model trends. Even by the crazy standards of the generative AI market, last week was remarkable in terms of model releases.
📽 [Virtual Talk] Supercharge Production AI with Features as Code
On July 24, at 9 AM | 12 PM ET, join us to discuss how declarative frameworks are transforming production AI. Sergio Ferragut, Principal Developer Advocate at Tecton, will show how to enhance collaboration, automate feature materialization, and support diverse data types. Discover how to improve feature reusability, eliminate training-serving skew, and simplify complex feature development. He will also cover how these frameworks automate production-ready pipelines, speeding up AI projects and making AI-powered applications more intelligent.
Key topics include:
Seamless collaboration between data scientists and ML engineers
Reuse features and eliminate training-serving skew
Automation of streaming, batch, and real-time feature pipelines
🔎 ML Research
Winning the AI Math Olympiad
The teams from Numina and HuggingFace published a detailed blog post about NuminaMath 7B TIR, the model that won first prize in the AI Math Olympiad. NuminaMath 7B TIR combines an LLM reasoning agent with code generation, and the architecture is totally fascinating —> Read more.
Prover-Verifier Games in LLMs
OpenAI published a paper unveiling a prover-verifier game to improve the legibility of LLM outputs. The core idea is to train large models to produce outputs that can be verified by weaker models —> Read more.
LLMs for Spreadsheets
Microsoft Research published a paper detailing SPREADSHEETLLM, an encoding method for manipulating spreadsheets with LLMs. SPREADSHEETLLM includes a multi-step encoding framework with capabilities such as structural-anchor-based compression, inverse index translation, and data-format-aware aggregation —> Read more.
Gen AI for Databases
Researchers from MIT, CMU, and other AI labs published a paper detailing GenSQL, a generative AI system for databases. GenSQL extends SQL with several probabilistic primitives that automate tasks such as prediction, anomaly detection, imputing missing values, fixing errors, and synthetic data generation —> Read more.
Qwen2
Alibaba published a research paper diving into Qwen2, a series of language and multimodal models ranging from 500M to 72B parameters. The Qwen2 family includes different architectures, including dense and MoE models, and shows strong performance across different benchmarks —> Read more.
Long Video Understanding
Researchers from King Abdullah University of Science and Technology and Harvard University published a paper introducing Goldfish, a method for long-form video understanding. Goldfish takes an instruction as input, gathers the top-k most relevant video clips for that instruction, and uses those clips to generate a response —> Read more.
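The retrieve-then-answer loop described above can be sketched in a few lines. This is a minimal illustration, not Goldfish's actual implementation: the paper uses learned video and text representations, whereas here a toy word-overlap cosine score stands in for the relevance model, and the clip summaries and `top_k_clips` helper are hypothetical.

```python
import math
from collections import Counter

def relevance(instruction: str, clip_summary: str) -> float:
    """Toy relevance score: cosine similarity over bag-of-words counts.
    (A stand-in for the learned similarity a system like Goldfish would use.)"""
    a = Counter(instruction.lower().split())
    b = Counter(clip_summary.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_clips(instruction: str, clip_summaries: list[str], k: int = 2) -> list[str]:
    """Rank clip summaries by relevance to the instruction and keep the top k,
    which would then be fed to the answer-generation model."""
    ranked = sorted(clip_summaries, key=lambda c: relevance(instruction, c), reverse=True)
    return ranked[:k]

# Hypothetical per-clip summaries of a long video.
clips = [
    "a chef chops onions in a kitchen",
    "a dog runs across a beach",
    "the chef plates the finished dish",
]
print(top_k_clips("what does the chef cook", clips, k=2))
```

The key design point is that only the selected clips, not the whole video, reach the generation step, which keeps the context short for arbitrarily long videos.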
🤖 Cool AI Tech Releases
GPT-4o Mini
OpenAI released a smaller, more cost-efficient version of GPT-4o —> Read more.
Apple DCLM
Apple open sourced a new series of small models that seem to outperform some of the best open source alternatives in the market —> Read more.
Llama-3-Groq-Tool-Use
Groq open sourced Llama-3-Groq-Tool-Use, a series of models optimized for function calling —> Read more.
Mathstral
Mistral released Mathstral, a model specialized in math and scientific discovery.
Codestral Mamba
Mistral also released Codestral Mamba, an SSM-based model for code generation.
Mistral NeMo
NVIDIA and Mistral collaborated on the release of Mistral NeMo, a 12B parameter LLM optimized for enterprise scenarios —> Read more.
SmolLM
HuggingFace open sourced SmolLM, a series of small, high-performance LLMs —> Read more.
Cohere Toolkit
Cohere added new open source capabilities, such as HTML UI generation and authentication, to its Toolkit framework —> Read more.
🛠 Real World AI
Moving ML Fast at Meta
Meta engineering shares some of the best practices for iterating fast in ML engineering —> Read more.
Text to Image at Pinterest
Pinterest discusses some details about Canvas, its text-to-image model —> Read more.
📡AI Radar
Former OpenAI and Tesla AI lead Andrej Karpathy is launching a new AI education startup called Eureka Labs.
Anthropic and Menlo Ventures are partnering to launch a new AI fund.
Salesforce released Einstein Service Agent for customer service scenarios.
Google unveiled new generative AI projects at I/O Bengaluru.
Small language model platform Arcee AI raised a $24 million Series A.
AI services firm Tribe AI raised its first round of funding after years of bootstrapping.
Cohere and Fujitsu announced a strategic alliance.
Echo Chunk raised $1.4 million to build AI puzzle games.
Kindo raised $20.6 million for its AI security platform.
Microsoft announced the general availability of Purview, a data governance solution.
AI healthcare company Huma Therapeutics raised $80 million in new funding.
Briefly Bio raised $1.2 million for a platform that uses AI to reproduce science experiments.