Diplomacy: The AI Benchmark that Gets Us Closer to the Turing Test

Dec 11, 2022

A few days ago, we discussed the release of CICERO, a language model created by Meta AI that was able to master the complex game of Diplomacy. Last week, DeepMind published a paper in the Nature journal proposing a technique for cooperation between AI agents in Diplomacy. Little by little, Diplomacy is becoming one of the most interesting benchmarks for reasoning capabilities in large language models.

What makes Diplomacy so fascinating is that the game requires players to negotiate, form and betray alliances, and cooperate and compete in an immensely large, seven-player action space. Unlike other game environments, Diplomacy relies not just on moves on the board but also on language interactions between the players. Computational approaches to solving Diplomacy have been tried since the 1980s, but the required language understanding capabilities were simply not available. Just as chess, Go and video games proved to be fertile ground for the AI renaissance of the last decade, games like Diplomacy are going to set a benchmark for a new generation of models that can collaborate with humans in really complex language tasks.

A fascinating way to think about Diplomacy is as a benchmark that includes some of the key challenges of the theoretical Turing test. This is highly debatable, as the Turing test is more focused on imitating human behavior than anything else. However, complex negotiation and dialog engagement are definitely key parts of it. From that perspective, solving Diplomacy is certainly a step in the right direction.

For now, Meta AI and DeepMind are off to the races with Diplomacy models.

🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#251: Our series about ML interpretability explores the concept of global model-agnostic interpretability methods.

Edge#252: We discuss DreamFusion, Google’s new text-to-3D generative model.

🔎 ML Research

Another Diplomacy AI Agent

DeepMind published a paper detailing an AI agent that was able to cooperate, negotiate and master the Diplomacy board game. This comes days after Meta AI unveiled CICERO, another AI agent that achieved top human performance in Diplomacy.

Data Scarcity and Generative AI

Researchers from MIT published a fascinating paper highlighting the challenges that data scarcity poses for pretraining large language models.

The AlphaCode Paper

DeepMind published the official paper behind AlphaCode, its agent that can solve competitive programming tasks.

Evaluating Input Saliency

Google Brain published a paper proposing a method to evaluate input salience methods.

Dexterity Training for a Robot Hand

NVIDIA Research published a paper detailing DeXtreme, a technique used to teach dexterity to a robot hand.

🤖 Cool AI Tech Releases

ML for Sheets

Google released Simple ML for Sheets, a Google Sheets extension that allows the use of TensorFlow models.

Building Recommender Systems with TensorFlow

TensorFlow published a dedicated page with resources for building recommender systems.

OpenVINO-Torch-ORT Integration

Microsoft and Intel open-sourced an integration of OpenVINO and Torch-ORT to build faster inference models in PyTorch.

🛠 Real World ML

Summarizing Slack Content

Salesforce Research detailed the approach used to summarize the content of Slack channels using generative AI.

💸 Money in AI

Runway ML raised $50 million to expand its generative AI platform for video editing.
Twelve Labs raised $12 million to develop models that understand contextual information in videos.
Israeli AI startup NeuReality raised a $35 million Series A to continue working on a high-performance AI inference chip.
Enterprise AI startup Protopia AI raised $6 million to expand its solution to derive insights from enterprise data sources while maintaining high levels of privacy.
Continuing the generative AI funding frenzy, SellScale announced that it raised $3.4 million to enable NLP capabilities for sales and marketing teams.
Gaia AI raised $3 million to apply AI to help with forest protection and management.
Pixel AI raised $1 million to use AI to help retailers improve their search experiences.
Akros Technologies raised $2.3 million to apply cutting-edge deep learning techniques to asset management.

TheSequence