The Model Solving Geometry Problems at the Level of a Math Olympiad Gold Medalist

DeepMind's AlphaGeometry represents another breakthrough in AI reasoning.

Jan 21, 2024

A futuristic scene depicting an advanced artificial intelligence model, represented as a sleek, metallic humanoid robot, sitting at a desk in an international math olympiad setting. The robot is intently focused on solving complex geometry problems displayed on a digital screen in front of it. Around its neck hangs a shining gold medal, symbolizing its achievement in the competition. The background features other competitors, human and robotic, in a large, modern auditorium filled with mathematical symbols and equations floating in the air, adding to the intellectual ambiance. — Created Using DALL-E

Next Week in The Sequence:

Edge 365: Our series about LLM reasoning continues with the famous ReAct technique, including a review of the original paper by Google Research. We also explore Helicone to monitor LLMs.
Edge 366: We review COSP and USP, Google Research's new methods to advance reasoning in LLMs.

📝 Editorial: The Model Solving Geometry Problems at the Level of a Math Olympiad Gold Medalist

A few months ago, the AIMO Prize was announced: a $10 million award for an AI model that can achieve a gold medal in the International Mathematical Olympiad (IMO). The IMO is an elite high school competition in which up to six students from each participating country tackle six problems over two days, with a four-and-a-half-hour time limit each day. Some of the most renowned mathematicians of the past few decades have been IMO medalists. Geometry, one of the most important and hardest parts of the IMO, combines visual and mathematical challenges. We might intuitively think that this would be the hardest type of problem for AI models to solve.

Well, not anymore.

Last week, Google DeepMind published a paper unveiling AlphaGeometry, a model capable of solving geometry problems at the level of an IMO gold medalist.

The most interesting aspect of AlphaGeometry is its architecture, which combines a Large Language Model (LLM) with a symbolic model. Neuro-symbolic architectures have long attempted to bridge the gap between the two most established schools of AI: neural networks and rule-based models. While LLMs excel at identifying patterns in data, they struggle with the systematic, multi-step reasoning required by complex geometry problems. Symbolic models, which solve problems by applying explicit rules, are rigorous but can only operate in very constrained settings. How did AlphaGeometry apply neuro-symbolic models to geometry? The model first uses its symbolic rules engine to attempt a solution. If that fails, the LLM suggests new constructs (such as auxiliary points or lines) that open new reasoning paths for the symbolic engine, and the cycle repeats. This is an oversimplification, but this is a short editorial after all. 😉
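
To make that loop concrete, here is a minimal Python sketch. It is not DeepMind's implementation: it assumes facts are plain strings, the symbolic engine is a tiny forward-chaining rule matcher, and the LLM is replaced by a hypothetical `toy_proposer` stub that suggests a single auxiliary construction. Every name in it (`deduce`, `solve`, `toy_proposer`, the fact strings) is illustrative.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# A rule is (premises, conclusion): if all premises are known, add the conclusion.
Rule = Tuple[FrozenSet[str], str]
Proposer = Callable[[Set[str], str], Optional[str]]


def deduce(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Symbolic step: forward-chain until no rule yields a new fact."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= closure and conclusion not in closure:
                closure.add(conclusion)
                changed = True
    return closure


def solve(facts: Set[str], rules: List[Rule], goal: str,
          propose: Proposer, max_constructions: int = 3) -> Optional[List[str]]:
    """Alternate deduction with proposed constructions.

    Returns the list of constructions that were needed, or None on failure.
    """
    known = deduce(facts, rules)
    added: List[str] = []
    while goal not in known:
        if len(added) >= max_constructions:
            return None  # construction budget exhausted
        suggestion = propose(known, goal)  # stand-in for the language model
        if suggestion is None or suggestion in known:
            return None  # proposer has nothing new to offer
        added.append(suggestion)
        known = deduce(known | {suggestion}, rules)
    return added


# Toy problem: in triangle ABC with AB = AC, show angle ABC = angle ACB.
FACTS = {"eq(AB, AC)"}
GOAL = "eq_angle(ABC, ACB)"
RULES: List[Rule] = [
    # With M the midpoint of BC, triangles ABM and ACM share three equal sides.
    (frozenset({"eq(AB, AC)", "midpoint(M, B, C)"}), "congruent(ABM, ACM)"),
    # Congruent triangles have equal corresponding angles.
    (frozenset({"congruent(ABM, ACM)"}), GOAL),
]


def toy_proposer(known: Set[str], goal: str) -> Optional[str]:
    """Hypothetical stub for the LLM: always suggests the midpoint of BC."""
    construction = "midpoint(M, B, C)"
    return None if construction in known else construction


constructions = solve(FACTS, RULES, GOAL, toy_proposer)
print(constructions)
```

On this toy isosceles-triangle problem, deduction alone gets stuck, the stub proposes the midpoint of BC, and the rule engine then reaches the goal. In a real system the proposer would be a trained language model and the rule set a full geometric deduction engine.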

In a benchmark test of 30 IMO geometry problems, AlphaGeometry solved 25 within the standard time limits. This achievement is nothing short of remarkable. Google DeepMind continues to impress in this field. Just a few weeks ago, it unveiled FunSearch, a method capable of discovering new algorithms in math and computer science. Now, with AlphaGeometry solving IMO-caliber geometry problems, one wonders what could be next.

🔎 ML Research

AlphaGeometry

Google DeepMind published a paper detailing AlphaGeometry, a model that can solve geometry problems at the math olympiad level. The model combines a neural language model with a rule-based deduction engine —> Read more.

TrustLLM

Researchers from top universities and tech companies published a comprehensive study of trustworthiness in LLMs. The paper includes a framework that quantifies trustworthiness in LLMs across five different dimensions —> Read more.

LLMs Self-Correcting Mistakes

Google Research published a paper that tests LLMs on mistake finding and correction. The paper also introduces a new benchmark for mistake identification —> Read more.

Training on Easy Data

Researchers from the Allen Institute for AI (AI2) published a paper arguing that LLMs can perform well on highly specialized tasks after training on “easy” data from that domain. By “easy”, AI2 means data that is accessible yet sufficient for the models to generalize —> Read more.

Selective Prediction in LLMs

Google Research published a paper introducing ASPIRE, a framework for improving the confidence of LLM answers. The method is based on a selective prediction technique that assigns each answer a confidence score indicating the probability that the answer is correct —> Read more.

SGLang

UC Berkeley published Structured Generation Language (SGLang) for LLMs, a technique for faster and more expressive LLM inference. SGLang combines frontend and backend optimizations that enable the creation of complex LLM programs —> Read more.

🤖 Cool AI Tech Releases

Stable Code 3B

Stability AI open sourced Stable Code 3B, a new coding model that matches the performance of models 2.5x larger —> Read more.

Pinecone Serverless

The leading vector database provider released a new version of its platform with a simpler interface and a 50x cost reduction —> Read more.

DataStax RAG API

DataStax unveiled a new Data API to streamline the development of RAG applications —> Read more.

🛠 Real World ML

GitHub and AI

GitHub published the results of detailed interviews about the productivity impact that its AI tools are having on developers —> Read more.

LinkedIn Gen AI Playbook

LinkedIn shared some of the ideas that its engineering leaders are evaluating to fully leverage the advancements in generative AI —> Read more.

📡 AI Radar

AI startup Sakana raised $30 million to focus on smaller and more efficient LLMs.
Rabbit and Perplexity AI announced a partnership to use the Perplexity API as the core Q&A engine for r1 devices.
Microsoft introduced Copilot Pro, which brings Office integration capabilities and is available to individuals.
Samsung will incorporate Google’s Gemini into the Galaxy S24 Series.
Udacity unveiled a new nanodegree for generative AI.
DigitalOcean unveiled a new offering for NVIDIA H100 GPUs.
OpenAI awarded grants to 10 teams to implement ideas that should help govern AI systems.
BMW gets into the humanoid robot space with a partnership with Figure.
AI security startup Vicarius raised $30 million in new funding.
AI graphic design platform Recraft announced a $12 million series A.
Amazon announced a $1 billion fund for industrial innovation with autonomous vehicles as a core element of the investment thesis.
Thomvest Ventures announced a $250 million new fund with AI as one of the core investment theses.
Vertice raised $25 million to use AI to help companies manage software spend.
Databricks partnered with Credo AI to streamline AI compliance.
Meta says that the use of AI tools improved ad campaign returns by 32%.
AI digital notetaking platform Goodnotes acquired Dropthebit to accelerate its AI features.
AI retail security startup Spot Technologies raised $2 million in new funding.
Briq, a company that applies AI to financial workflows in the construction industry, announced an $8 million extension of its previous financing round.
PatSnap released a CoPilot platform for IP research.
