The AI Scientist

A model that can produce novel AI papers plus some really cool papers and tech releases this week.

Aug 18, 2024

Next Week in The Sequence:

Edge 423: We explore the fundamentals of state space models including the fmaous S4 paper. The tech section provides an overview of NVIDIA’s NIM framework.
Edge 424: We dive into the DeepMind’s amazing AlphaProof and AlphaGeometry-2 that achieved silver medal in the latest international math olympiad.

You can subscribe to The Sequence below:

📝 Editorial: The AI Scientist

If you read this newsletter, you know that I firmly believe discovering new science might be the ultimate test for AGI. While we are still far from having AI that can formulate something like the Riemann Hypothesis or the Theory of General Relativity, we have made tremendous progress in proving and validating scientific ideas across disciplines such as mathematics, physics, biology, chemistry, and others.

The reason science presents such a challenging bar for AI is that it involves aspects like long-term planning, creativity, multidisciplinary knowledge, multi-step fact-checking, and many other components that are still in the very early stages of development in generative AI.

However, progress is being made.

This week, the Japanese AI startup Sakana AI, in collaboration with several other AI labs, published a paper detailing The AI Scientist, a framework for open-ended scientific discovery. The AI Scientist is capable of conducting open-ended research, executing experiments, generating code, visualizing results, and even presenting them in full reports. In the initial demonstrations, The AI Scientist made several contributions across different areas of AI research, including diffusion models, transformers, and grokking.

The core ideas behind The AI Scientist resemble models such as DeepMind’s Alpha Geometry, AlphaProof, or the NuminaMath model that recently won first prize in the AI Math Olympiad. These models use an LLM for idea formulation, combined with more symbolic models for experimentation. The biggest challenge with this approach is whether the idea-generation portion will quickly hit its limits. Some of the most groundbreaking scientific discoveries in history seem to involve a component of human ingenuity that doesn’t yet appear to be present in LLMs. However, this path holds great potential for exploring new ideas in scientific research.

For now, The AI Scientist represents an exciting advancement in open-ended scientific research.

🔎 ML Research

The AI Scientist

Researchers from Sakana AI, Oxford, University of British Columbia and several other institutions published a paper unveiling the AI Scientist, a pipeline for open ended scientific research using LLMs. The AI Scientist injects AI in different area of scientific research such as ideation, a literature search, experiment planning, experiment iterations, manuscript writing, and peer reviewing —> Read more.

Imagen 3

Google published the technical report of Imagen 3, their marquee text-to-image model. The paper details the training and evaluation details behind Imagen 3 as well as some of the challenges around safety —> Read more.

Mitigating Hallucinations

Google Research published a paper detailing HALVA, a contrastive tuning method that can mitigate hallucinations in language and image assistants. Like other contrastive learning methods, HALVA generates alternative representations of factual tokens with the objective of boosting the probability of the model identifying the correct token —> Read more.

Your Context is Not an Array

Qualcomm Research published a paper that explores the limitations of transformers. The paper suggest that some of the generalization challenges of transformers are related with the inability to perform random memory access within its context window —> Read more.

Mutual Reasoning in LLMs

Microsoft Research published a paper introducing rStar, a self-play multi reasoning approach that seems to improve reasoning capabilities in small language models. rStar uses a generation-discrimination process to decouple the different steps in the reasoning process —> Read more.

Pretraining vs. Fine Tuning

Researchers from Johns Hopkins University published a paper exploring the relationship between pretraining and fine-tuning in LLMs. The paper explores the diminishing returns of fine-tuning after certain scale —> Read more.

🤖 AI Tech Releases

Grok-2

xAI unveiled a new version of Grok that matches the performance of top open source models —> Read more.

SWE-Bench

OpenAI released a subset of the famous SWE-Bench benchmark with human verification —> Read more.

Claude Prompt Caching

Anthropic unveiled prompt caching capabilities for Claude 3.5 Sonnet and Claude 3 Haiku —> Read more.

Airflow 2.10

Apache Airflow 2.10 arrived with a strong focu on AI workflows —> Read more.

AI Risks Database

MIT open sourced a database of over 700 AI risks across different categories —> Read more.

🛠 Real World AI

Image Animation at Meta

Meta discusses the AI techniques used for image animation at scale —> Read more.

Model Reliability at Salesforce

Salesforce discusses the methods used to ensure AI model reliability and performance in their internal pipelines —> Read more.

📡AI Radar

Fei-Fei Li’s World Labs raised $100 million at a $1 billion valuation.
Decentralized AI startup Sahara AI raised $43 million in new funding.
Snowflake announced its Cortex Analyst solution to power self-service analytics with AI.
AI observaility platform Goodfire raised $7 million in new funding.
AI-focused VC Radical Ventures raised a new $800 million fund.
Raunway Gen-3 Turbo showcased very impressive capabilities.
AI-based stock evaluator TipRanks was acquired for $200 million.
Real Estate AI company EliseAI raised $75 million at $1 billion valuation.
Encord, an AI data development platform, raised a $30 million Series B.
RAG as a service platform Ragie raised $5.5 million.
CodeRabbit raised $16 million for using AI to automate code reviews.
AI-based scientific research platform Consensus raised an $11.5 million Series A.

TheSequence

Discussion about this post

Ready for more?