The Sequence Chat: Oren Etzioni – Allen AI, About Advancing Research in Foundation Models
An AI legend discusses cutting-edge research in foundation models.
Dr. Oren Etzioni is also a Venture Partner at the Madrona Venture Group, and a Technical Director at the AI2 Incubator. He was the Founding CEO of the Allen Institute for AI. His awards include AAAI Fellow and Seattle’s Geek of the Year. He founded several startups including Farecast (acquired by Microsoft). Etzioni has written over 200 technical papers, garnering several awards including the ACL Test of Time Award in 2022. He has also authored commentary for The New York Times, Harvard Business Review, and Nature.
Quick bio
This is your second interview at The Sequence. Please tell us a bit about yourself: your background, your current role, and how you got started in AI.
I was a professor for most of my career, focused on AI, NLP, and web search. I launched the Allen Institute for AI (AI2) in 2014 for the late Paul Allen, and it has grown to 250+ employees and over $100M in annual funding. I've always been fascinated by startups, having launched several AI-based startups over the years. AI2 has also created and spun out an incubator whose portfolio is approaching $1B in total valuation across startup financing rounds and acquisitions.
🛠 AI Work
The Allen Institute for AI (AI2) is widely recognized as one of the top research labs in foundation models. Could you please tell us about some of the recent research you have been working on in this area?
Recently the PRIOR team at AI2 released Unified-IO, the first neural model to perform a large and diverse set of AI tasks spanning classical computer vision, image synthesis, vision-and-language, and natural language processing (NLP). We are continuing work in this area and will have a new version of this model available in the near future. We are also investing in generative language models with our new initiative AI2 OLMo, a uniquely open language model intended to benefit the research community by providing access and education around all aspects of model creation.
AI2 recently published work on methods to simulate fast and slow thinking with LLMs in order to solve complex tasks. Could you please provide more details on the inspirations and ideas behind this work?
This work from AI2's Mosaic common-sense AI team was inspired by the dual-process theory of human cognition featured in Daniel Kahneman's well-known book Thinking, Fast and Slow. The theory proposes two distinct human thinking systems: one characterized by rapid, intuitive thought, and another that emphasizes analytical, deliberate reasoning. Our ultimate goal is to tackle intricate real-world problems with AI models far more cost-effectively than is currently possible. By building a framework that integrates both approaches, we thought we could optimize an AI agent's ability to plan complex interactive tasks while minimizing the cost of its reasoning.
Many tasks do not require deliberate, detailed analysis to perform successfully, so by applying a "fast"-thinking approach when appropriate, we thought we could improve the model's speed and performance overall. We called the framework we designed "SwiftSage." The "Swift" module is an encoder-decoder language model that quickly processes short-term memory content such as previous actions, current observations, and the environment state, simulating the fast, intuitive first system. The "Sage" module represents the deliberate reasoning of the second system by harnessing the power of large language models (LLMs) like GPT-4. A heuristic algorithm plays a crucial role by determining when the framework should activate or deactivate the Sage module. Our intuitions proved correct: thanks to its dual-system design for fast and slow thinking, SwiftSage dramatically reduces the number of tokens required per action during LLM inference, making it more cost-effective and efficient than the next best system.
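To make the dual-module design concrete, here is a minimal Python sketch of a SwiftSage-style control loop. The stub models, the prompt text, and the specific heuristic threshold are illustrative assumptions, not AI2's actual implementation:

```python
# A minimal, illustrative sketch of a SwiftSage-style control loop.
# The stub models and the heuristic threshold are assumptions for
# illustration, not AI2's actual implementation.
from typing import List

def swift_model(observation: str, recent_history: List[str]) -> str:
    """Stand-in for the small, fast encoder-decoder policy."""
    return "open door"  # placeholder action

def sage_llm(prompt: str) -> str:
    """Stand-in for an expensive GPT-4-class planning call."""
    return "open door\nmove to kitchen"

def needs_sage(last_reward: float, stuck_steps: int) -> bool:
    """Heuristic switch: escalate to the slow, deliberate module when
    the fast module appears stuck or has made a costly mistake."""
    return stuck_steps >= 3 or last_reward < 0.0

def act(observation: str, history: List[str],
        last_reward: float, stuck_steps: int) -> str:
    if needs_sage(last_reward, stuck_steps):
        # "Sage": deliberate, multi-step planning with a large LLM.
        plan = sage_llm(
            f"Observation: {observation}\nHistory: {history}\n"
            "Plan the next actions step by step."
        )
        return plan.splitlines()[0]  # execute the first planned action
    # "Swift": fast, intuitive action chosen from short-term memory alone.
    return swift_model(observation, history[-5:])

print(act("You are in a hallway.", ["look around"], 0.1, 0))   # Swift path
print(act("You are in a hallway.", ["look around"], -1.0, 4))  # Sage path
```

The key design point is that the expensive Sage call happens only when the cheap Swift policy appears to be failing, which is what drives the token savings described above.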
Improving the openness and transparency of foundation models is a regular area of interest for AI2. In line with this, you recently unveiled the Open Language Model (OLMo) project. Could you please elaborate on the goals and characteristics of this project?
AI2 OLMo is a language model currently under development at AI2 that is being deliberately designed to support research by providing access to every element of the system we create, from the data to the training and evaluation code to the model weights. There is a real dearth of accessible, understandable language models, and it is holding the AI research community back from understanding and advancing this critical new technology. The most powerful language models today are released by for-profit organizations with limited insight into the data and methods used to create them; we aim to change that with OLMo. Our goal is to democratize access to systems like these and to advance their development and safety for everyone.
Reasoning is considered one of the new frontiers for large language models (LLMs). Could you share some of the new techniques and projects that you are excited about in this area?
While the performance of modern LLMs is stunning, it is hard to tell how they arrive at their answers, or whether their internal reasoning even makes sense. (There have been numerous cases where an LLM's poor reasoning and hallucinations led people into trouble.) We have been developing techniques (e.g., Entailer, Reflex) for uncovering an LLM's "beliefs" and showing how its answers follow from them via systematic chains of reasoning that a user can then inspect; in other words, we give users a view of the "mental model" the LLM has of the current problem. This allows us to spot and reject answers derived from faulty chains of reasoning or faulty beliefs, helping engender trust in the model's answers. It also paves the way for users to correct any erroneous beliefs that are uncovered: because users can now see the relevant model beliefs, they can teach the system when it goes wrong and improve it over time.
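As a toy illustration of this idea, a belief chain can be surfaced as explicit data that a user can inspect and veto. The data structures and check below are hypothetical stand-ins, not the actual Entailer or Reflex interfaces:

```python
# Toy illustration of exposing a model's "belief chain" so a user can
# inspect it and reject answers built on faulty beliefs. These data
# structures are hypothetical, not Entailer's or Reflex's actual API.
from dataclasses import dataclass
from typing import List, Set

@dataclass
class ReasoningStep:
    premises: List[str]  # beliefs the model asserts
    conclusion: str      # what the model claims the premises entail

def trusted(chain: List[ReasoningStep], rejected: Set[str]) -> bool:
    """Accept an answer only if no step rests on a belief the user
    has flagged as false."""
    return all(p not in rejected for step in chain for p in step.premises)

chain = [
    ReasoningStep(["a penguin is a bird", "birds can fly"],
                  "a penguin can fly"),
]
print(trusted(chain, rejected=set()))              # True (no corrections yet)
print(trusted(chain, rejected={"birds can fly"}))  # False: faulty belief caught
```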
In addition, modern LLMs still struggle with complex tasks that require multiple specialized steps to be chained together, for example, mathematics (where different operations are needed) or other data-manipulation queries, e.g., "What do the third letters of the words in 'John Greg Calvin Melville Rhon' spell?" (answer: "hello"). However, while LLMs struggle to answer such questions directly, they are proficient at breaking complex tasks into smaller ones, a process called task decomposition. And in many cases, the LLM itself can solve those smaller tasks. We recently developed a technique called Decomposed Prompting to control this process: the LLM repeatedly decomposes a problem where it is stuck and solves the simpler pieces where it is not, putting a new class of previously unsolvable problems within reach.
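The third-letters question illustrates the decomposition nicely. In the sketch below the sub-tasks are implemented symbolically for clarity; in the actual Decomposed Prompting technique, each sub-task is delegated to a prompted LLM module, and a sub-task that is still too hard can itself be decomposed further:

```python
# Sketch of decomposed-prompting-style problem solving on the
# third-letters example. Sub-tasks are symbolic here for clarity;
# in the real technique they are handled by prompted LLM modules.

def split_words(sentence: str) -> list[str]:
    # Sub-task 1: split the input into words.
    return sentence.split()

def third_letter(word: str) -> str:
    # Sub-task 2: extract the third letter of a single word.
    return word[2]

def concatenate(letters: list[str]) -> str:
    # Sub-task 3: join the letters into the final answer.
    return "".join(letters)

# Decomposer: the hard question becomes a chain of easy sub-tasks.
words = split_words("John Greg Calvin Melville Rhon")
print(concatenate([third_letter(w) for w in words]))  # -> "hello"
```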
AI2 recently revealed details about Beaker, which appears to be an important component of AI2's internal ML infrastructure. Could you please provide some information about the infrastructure, tools, and processes used to scale ML research and development at AI2?
Beaker was first launched internally at AI2 in 2017 as an experimentation platform for AI2 researchers to run jobs on the cloud and organize their experiments. Beaker made it substantially easier to manage cloud instances and run large-scale jobs over thousands of nodes, as well as to make sense of the large volume of experiments at AI2. With the rise of deep learning, Beaker evolved to primarily support GPU jobs and manage workloads across our dedicated GPU cluster.
As a non-profit, we consider making our work accessible and transparent part of our mission, so we've also invested in infrastructure that makes it easy to develop and host demos of our research. We built an internal library on top of Material UI that makes it easy to create web applications with an AI2 look-and-feel, as well as infrastructure on top of GKE that makes it seamless to deploy and update cloud applications.
More recently, we’ve developed the open-source AI2 Tango to organize complex research workflows. Tango replaces messy directories and spreadsheets full of file versions by organizing experiments into discrete steps that can be cached and reused throughout the lifetime of a research project. It also seamlessly integrates with Beaker so researchers can develop workflows locally, but easily run them at scale for the final experiments.
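For a flavor of that step pattern, here is a minimal step written against the open-source ai2-tango library; the example step itself is made up, and exact API details may vary across Tango versions:

```python
# A minimal cacheable step using the open-source ai2-tango library.
# The step itself is a made-up example for illustration.
from tango import Step

@Step.register("concat_strings")
class ConcatStrings(Step):
    DETERMINISTIC = True  # identical inputs always give identical output,
    CACHEABLE = True      # so Tango can cache the result and reuse it

    def run(self, string1: str, string2: str) -> str:
        return string1 + " " + string2
```

Because the step declares itself deterministic and cacheable, Tango can skip re-running it when the same inputs appear later in a project, which is what replaces the messy directories of intermediate file versions.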
💥 Miscellaneous – a set of rapid-fire questions
What are the next milestones or potential research breakthroughs for the next generation of foundation models?
Building software agents that utilize tools and learn on the user’s behalf.
Is creating new scientific advancements the ultimate benchmark for foundation models and AGI?
It would certainly be a huge benefit to humanity as we fight climate change, the next pandemic, and more.
Are there any new techniques that you believe can surpass the transformer as the preferred architecture for modern AI models?
The transformer is a simple method that is highly scalable. I do believe we will see new architectures in the next three years, but I will be a bit coy about predicting which one comes next.
AI2 actively contributes to open-source AI projects. How do you perceive the balance between open source and closed-source/API-based distribution for foundation models? Who emerges as the victor in the end?
I think we will see balance, just as we've seen in operating systems with Windows and Linux. There will be both open and closed models.