🎙Oren Etzioni/CEO of Allen Institute for AI (AI2) on advancing AI research for the common good

It’s so inspiring to learn from practitioners. Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is a great source of insights and inspiration. Please share this interview if you find it enriching. No subscription is needed.


*Thanks to my colleagues, Michael Schmitz and Swabha Swayamdipta, for contributing to the answers below. 

👤 Quick bio / Oren Etzioni

Tell us a bit about yourself: your background, your current role, and how you got started in machine learning.

Oren Etzioni (OE): I’m the CEO of the Allen Institute for AI, previously a Professor at the University of Washington, and the founder of several AI startups. I completed my first machine learning project back in 1988 as part of my master’s thesis at Carnegie Mellon University.     

🛠 ML Work  

You are the CEO of one of the most important, and yet not very well known, organizations advancing AI research and development. Can you tell us a bit about the mission and current work of the Allen Institute for AI (AI2)? 

OE: Our mission is to contribute to humanity through high-impact research and engineering. We are unusual because we are a non-profit that runs the gamut from foundational research to applied research to user-facing products. In the AI2 Incubator, we help entrepreneurs create AI-first startups; our founders and technologists work together to develop product ideas that leverage AI to solve real problems and enhance people’s lives. We support the most promising ideas with funding, business development guidance, and input from our AI experts.

A few key examples of successful startups we’ve incubated include our graduated startup WellSaid, which offers sophisticated novel voice generation, and our graduated startup Lexion, which created an AI-backed smart repository for notoriously complex, voluminous legal contracts and agreements.

AllenNLP has evolved into one of the most advanced NLP frameworks in the deep learning space, specializing in cutting-edge NLP models. What are some of the key capabilities of AllenNLP and, at the current pace of innovation in the NLP space, how challenging is it to deliver state-of-the-art NLP models to developers?

OE: AllenNLP's biggest advantage for practitioners is the reference implementations of state-of-the-art models from recent publications, as well as the ability to modify these model architectures to be retrained and tuned for particular applications. While many NLP systems have off-the-shelf models, more often than not they are based on outdated approaches. Therefore it's difficult to experiment with model architecture to get the best performance for your application. In AllenNLP, models are defined by a high-level configuration language, which not only communicates the model architecture but also makes it easy to experiment with changes, such as using a different pre-trained transformer as a contextual embedding or swapping out a particular layer for something else. 

More recently, large language models have demonstrated huge advantages across many applications. Training a model with billions of parameters isn't easy, but with AllenNLP we're adding tools so anyone can train or fine-tune large models across several GPUs. Previously, it would have been hard for someone to fine-tune T5-11B, but once we release this new feature in the next few weeks, there will be a clear path to solving these types of problems at scale.
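To make the configuration-driven workflow described above more concrete, here is a minimal sketch in plain Python. It is not AllenNLP code: AllenNLP configurations are typically written in Jsonnet, and the dictionary below is only a stand-in whose keys are loosely modeled on that style and have not been checked against any particular release. The helper `with_transformer` is hypothetical. The point is simply that when the architecture lives in a declarative config, swapping the pre-trained transformer is a one-field change.

```python
import copy

# Illustrative config only: the structure and key names are assumptions
# modeled on AllenNLP-style configuration files, not a verified schema.
base_config = {
    "model": {
        "type": "basic_classifier",
        "text_field_embedder": {
            "token_embedders": {
                "tokens": {
                    "type": "pretrained_transformer",
                    "model_name": "bert-base-uncased",
                }
            }
        },
    },
}


def with_transformer(config, model_name):
    """Return a copy of `config` that uses a different pre-trained transformer.

    Hypothetical helper: it deep-copies the config so the original
    experiment definition is left untouched.
    """
    new_config = copy.deepcopy(config)
    embedder = new_config["model"]["text_field_embedder"]["token_embedders"]["tokens"]
    embedder["model_name"] = model_name
    return new_config


# Define a second experiment that differs only in the contextual embedding.
roberta_config = with_transformer(base_config, "roberta-base")
```

Keeping each variant as its own config makes it straightforward to compare experiments side by side, since the only difference between them is visible in the data rather than buried in code.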

One of the things I find intriguing about AI2 is that it combines advanced AI research with practical implementations. In your opinion, how big is the gap between AI research and engineering and what are good practices to bridge it?  

OE: We have seen an unusually rapid transition from research to practice (just think of products and startups being built around GPT-3 and similar models). Research needs to have a healthy respect for simplicity – many research models are overly complex for the sake of impressing reviewers or for garnering very minor accuracy gains – and for scalability to large datasets. 

Engineers need to view research results with healthy skepticism and ask: Are these results robust? Is the model trained on data that reflects the data it will see in practice? What are the ethical implications of the work?

You have been very involved in the area of AI fairness and bias. Can you share some practical advice for data scientists looking to improve the fairness of their machine learning solutions? Any new interesting research in this area? 

OE: Undesirable biases can be present all along the AI pipeline – in the data, in the algorithms/machine learning models, and also in the evaluation. Current approaches to improve fairness in AI focus on individual components of the AI pipeline. However, widely known failures of AI, such as algorithmic racial profiling in Chicago policing or even Google captioning images of Black people with racial slurs, can rarely be amended by focusing simply on the data or the algorithms.

When it comes to building fair datasets, practitioners must take care to identify the target users of the end application or those who are most affected by model decisions or errors. 

Practitioners can achieve this by creating data statements or datasheets for datasets, which explicitly outline data selection, annotation, and curation processes, as well as the motivation for dataset creation. Biases in existing datasets can be discovered via the creation of visual maps for datasets, and reduced via stratified sampling, reweighting techniques keeping in mind protected attributes, or via adversarial filtering of dataset biases. Several of these have been developed at the Allen Institute for AI. 

In addition to addressing biases in data, fairness can be enforced in AI algorithms and models. A fair algorithm treats the general population statistically similarly to a protected class. The adversarial learning approach, where the adversary tries to learn a protected attribute in the data and the overall algorithm aims to defeat this adversary, has become a popular algorithmic approach to improve fairness. Such approaches can be used to debias word embeddings or directly in predictive classifiers where the adversary is part of an ensemble of models. Moreover, documentation approaches for models (similar to datasheets) can help practitioners follow best practices and promote fairness. 

Last, but not least, the evaluation of AI models and algorithms needs to be multifaceted. It is no longer sufficient to simply report the accuracy of a method. We also need to evaluate if the algorithm performs similarly on different populations. We have developed such evaluations for gender biases in machine translation as well as racial biases in hate speech detection.  
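Two of the ideas in this answer, reweighting with protected attributes in mind and checking whether a model performs similarly on different populations, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AI2's method: the function names, the toy labels, and the group labels "a" and "b" are all hypothetical, and inverse-frequency weighting is just one simple reweighting scheme among many.

```python
from collections import Counter, defaultdict


def inverse_frequency_weights(groups):
    """Weight each example so that every group carries equal total weight.

    An example from group g gets weight N / (K * count(g)), where N is the
    number of examples and K is the number of distinct groups.
    """
    counts = Counter(groups)
    total = len(groups)
    n_groups = len(counts)
    return [total / (n_groups * counts[g]) for g in groups]


def accuracy_by_group(y_true, y_pred, groups):
    """Compute accuracy separately for each group label."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        seen[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / seen[group] for group in seen}


# Toy data (made up for illustration): group "b" is under-represented.
groups = ["a", "a", "a", "b"]
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 0]

weights = inverse_frequency_weights(groups)
per_group = accuracy_by_group(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# The largest difference in accuracy between any two groups.
gap = max(per_group.values()) - min(per_group.values())
```

In this toy data the overall accuracy is 0.5, yet every error on group "b" is hidden inside that single number, which is why reporting the per-group breakdown and the gap alongside the aggregate is useful.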

What are some of the biggest milestones that machine learning should achieve in the next 3-5 years and how do they translate into the AI2 roadmap?

OE: There are many milestones; I will highlight two that AI2 is actively working on: 

  1. An emphasis on reducing the carbon footprint of training and utilizing ML models. See: https://cacm.acm.org/magazines/2020/12/248800-green-ai/fulltext

  2. Building general-purpose systems capable of zero-shot learning, transfer learning, and more sophisticated behavior than the typical “AI savants” that are trained on a task-specific basis. A great example, in the computer vision domain, is here: https://prior.allenai.org/projects/gpv

Finally, I would add that our ML systems are largely devoid of common sense, and the MOSAIC project at AI2 is working to address that long-term challenge. 

💥 Miscellaneous – a set of rapid-fire questions  

Is the Turing Test still relevant? Any clever alternatives?

OE: As John Markoff quipped, the Turing Test is a test of human gullibility. We need a strong alternative but don’t have one yet.    

Favorite math paradox? 

OE: The Liar's Paradox: this sentence is a lie. 

Any book you would recommend to aspiring data scientists?  

OE: The information you want isn’t in any book.  

Does P equal NP?

OE: Of course not. However, the proof is too long to provide here.



A note from the editor: One thing about the AI2 Incubator really surprised us. To apply, you don’t even need to have an idea! Here is a quote from their website: “We are looking for smart people who want to build the companies of tomorrow using AI.” Sounds like a great opportunity.