♣️ Edge#54: Facebook ReBeL That Can Master Poker
Deep dive into an open-source bot that can master poker and other imperfect information games
What’s New in AI is a deep dive into one of the freshest research papers or technology frameworks worth your attention. Our goal is to keep you up to date with new developments in AI in a way that complements the concepts we are debating in other editions of our newsletter.
💥 What’s New in AI: Facebook ReBeL is an Open-Source Bot that can Master Poker and Other Imperfect Information Games
Many consider poker the core inspiration for the formalization of game theory. John von Neumann was reportedly an avid poker fan and drew many analogies from the card game while laying the foundations of game theory. With the advent of artificial intelligence (AI), there have been many attempts to master different forms of poker, most of them with very limited results. Last year, researchers from Facebook and Carnegie Mellon University astonished the AI world by unveiling Pluribus, an AI agent that beat elite human professional players in the most popular and widely played poker format in the world: six-player no-limit Texas Hold’em. Since then, a question that has haunted AI researchers is whether the skills acquired by models like Pluribus can be applied to other imperfect-information games. Recently, Facebook again used poker as the inspiration for Recursive Belief-based Learning (ReBeL), a reinforcement learning model that is able to master several imperfect-information games.
The inspiration for ReBeL comes from DeepMind’s AlphaZero. After setting new records in Go with AlphaGo, DeepMind expanded its efforts to other perfect-information games such as chess and shogi. The result was AlphaZero, a reinforcement learning agent that was able to master all these games from scratch. Of course, recreating the magic of AlphaZero in imperfect-information games like poker entails a different level of complexity.
Games like poker, in which players keep their cards secret, represent a major obstacle for reinforcement learning + search algorithms. Most of these techniques assume that each player’s action has a fixed value regardless of the probability of that action being executed. For instance, in chess, a good move is good regardless of how often it is played. Now let’s think about a game like poker, in which the players bluff all the time. In many scenarios, the value of a bluff diminishes the more it is used, as the opponents can adjust their strategy to it. How could we possibly leverage reinforcement learning + search methods across many imperfect-information games? Here is a clearer example: consider a modified form of Rock-Paper-Scissors in which the winner receives two points (and the loser loses two points) whenever either player chooses Scissors. In that game, Player 2 acts after Player 1 but without observing Player 1’s action (imperfect information). As the figure below shows, the optimal policy for both players is to choose Rock and Paper with 40% probability each, and Scissors with 20%. However, if Player 1 were to conduct a classical look-ahead search, which is common in perfect-information games, it would not find a path to the optimal policy.
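The 40/40/20 figure can be checked with a few lines of arithmetic. The sketch below assumes a payoff matrix built from the rules just described (one point for an ordinary win, two points whenever Scissors is involved); the names PAYOFF and expected_values are illustrative and do not come from the ReBeL code.

```python
from fractions import Fraction

ACTIONS = ("Rock", "Paper", "Scissors")

# PAYOFF[a][b]: points for a player choosing action a against an opponent choosing b.
# Assumption: ordinary wins are worth 1 point, wins involving Scissors are worth 2.
PAYOFF = [
    [0, -1, 2],   # Rock:     ties Rock, loses to Paper, beats Scissors (2 points)
    [1, 0, -2],   # Paper:    beats Rock, ties Paper, loses to Scissors (2 points)
    [-2, 2, 0],   # Scissors: loses to Rock (2 points), beats Paper (2 points), ties
]

def expected_values(opponent_policy):
    """Expected points of each pure action against a mixed opponent policy."""
    return [
        sum(PAYOFF[a][b] * opponent_policy[b] for b in range(3))
        for a in range(3)
    ]

# Candidate equilibrium: 40% Rock, 40% Paper, 20% Scissors.
candidate = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
values = expected_values(candidate)

for name, value in zip(ACTIONS, values):
    print(name, value)
```

Every action earns the same expected value against the candidate policy, so neither player can gain by deviating, which is the defining property of an equilibrium.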
The challenge is that a classic perfect-information search algorithm would assume that Player 2 plays the Nash equilibrium policy regardless of what Player 1 does. If Player 2’s policy stayed fixed at 40 percent Rock, 40 percent Paper, and 20 percent Scissors, then Player 1 could choose Rock every time and still achieve an expected value of 0 (see figure above). However, if Player 1 always chose Rock, Player 2 would adjust to always choosing Paper, and the value of Player 1 choosing Rock would drop from 0 to -1. This dynamic is what makes most traditional search algorithms impractical for imperfect-information games.
With ReBeL, Facebook tries to address these challenges by introducing a very clever transformation of imperfect-information environments.
The idea behind ReBeL is as simple as it is clever. If AlphaZero showed success with reinforcement learning + search strategies in perfect-information games, then why not transform imperfect-information games into perfect-information equivalents? I know, I know, it sounds too good to be true, but let’s look at an example.
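The drop from 0 to -1 is easy to reproduce. This is a minimal sketch, assuming the payoffs of the modified Rock-Paper-Scissors game from Player 1's point of view; worst_case_value is a hypothetical helper that lets Player 2 pick the reply that hurts Player 1 the most.

```python
from fractions import Fraction

ACTIONS = ("Rock", "Paper", "Scissors")

# PAYOFF_P1[a][b]: Player 1's points when Player 1 plays a and Player 2 plays b.
# Assumption: wins involving Scissors are worth 2 points, other wins 1 point.
PAYOFF_P1 = [
    [0, -1, 2],
    [1, 0, -2],
    [-2, 2, 0],
]

def worst_case_value(p1_policy):
    """Player 1's expected points when Player 2 best-responds to p1_policy."""
    reply_values = [
        sum(p1_policy[a] * PAYOFF_P1[a][b] for a in range(3))
        for b in range(3)
    ]
    best_reply = min(range(3), key=lambda b: reply_values[b])
    return reply_values[best_reply], ACTIONS[best_reply]

always_rock = [Fraction(1), Fraction(0), Fraction(0)]
equilibrium = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]

rock_value, rock_reply = worst_case_value(always_rock)
eq_value, _ = worst_case_value(equilibrium)
print("Always Rock:", rock_value, "after Player 2 switches to", rock_reply)
print("40/40/20 mix:", eq_value)
```

The value of an action therefore depends on the whole policy it belongs to, which is exactly what a search that treats action values as fixed cannot capture.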
Let’s imagine a simplified version of poker, in which a single card is dealt to each player, who can then choose between three actions: fold, call, or raise. Now consider a variation of this game in which the cards are not dealt to the players directly but, instead, can be seen only by a third-party referee. Instead of taking an action directly, each player announces how likely they would be to take each action with each possible card. The referee then takes an action on the player’s behalf, sampled from the probabilities the player announced for the card that was actually dealt. In terms of strategy, this modified game is identical to the original game, with the difference that it contains no private information. The modified game can therefore be considered a continuous-state perfect-information game.
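The referee variant is simple enough to sketch in code. The example below is an illustration only: the three-card deck ("J", "Q", "K"), the announced probabilities, and the function name referee_act are all assumptions made for the sake of a small runnable example.

```python
import random

ACTIONS = ("fold", "call", "raise")

def referee_act(announced_policy, private_card, rng):
    """Sample an action on the player's behalf.

    The player never sees private_card; the referee looks it up and draws
    from the probabilities the player announced for that card.
    """
    weights = announced_policy[private_card]
    return rng.choices(ACTIONS, weights=weights, k=1)[0]

# Hypothetical announcement: one probability per action for every possible card.
announced_policy = {
    "J": (0.7, 0.2, 0.1),
    "Q": (0.2, 0.6, 0.2),
    "K": (0.0, 0.0, 1.0),
}

rng = random.Random(0)
dealt_card = rng.choice(sorted(announced_policy))  # seen only by the referee
action = referee_act(announced_policy, dealt_card, rng)
print("Referee plays:", action)
```

Everything the opponent needs in order to reason about the move (the full announced policy and the sampled action) is public, even though the card itself stays hidden.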
In the ReBeL research, Facebook refers to the first game as the discrete representation and the second game as the belief representation. The magic of ReBeL relies on representing the state of the game in the belief representation, as what is known as a public belief state (PBS). For our example game played with a 52-card deck, the PBS is described by a sequence of public observations and 104 probabilities (the probability that each player holds each of the 52 possible private cards); an “action” is described by 156 probabilities (one per discrete action per private card). A PBS effectively describes the state of a game not only as a sequence of actions but also as their probability of occurrence. By including the probabilities, PBSs can be used to model imperfect-information environments as perfect-information ones.
From a strategic standpoint, it is important to notice that the two games previously illustrated are identical. However, from a modeling perspective, there is a major difference in the sense that the second game contains no private information. More specifically, the second game can be considered a continuous-state perfect-information game, as we assume that all players’ strategies are common knowledge and, therefore, the probability that a player chooses a given action is known to all opponents.
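To make the 104 and 156 counts concrete, here is a rough sketch of how the belief part of a PBS could be stored and updated. It is a simplification under stated assumptions: beliefs are kept as independent per-player lists (ignoring that two players cannot hold the same card), and update_belief is a hypothetical helper that applies Bayes' rule after a public action.

```python
NUM_PLAYERS = 2
NUM_CARDS = 52
ACTIONS = ("fold", "call", "raise")

# Beliefs: for each player, the probability of holding each of the 52 cards.
beliefs = [[1.0 / NUM_CARDS] * NUM_CARDS for _ in range(NUM_PLAYERS)]

# An "action" in the belief representation: one distribution over the
# three discrete actions for every possible private card.
# Hypothetical policy: fold the 26 lowest cards, raise the 26 highest.
policy = [
    [1.0, 0.0, 0.0] if card < 26 else [0.0, 0.0, 1.0]
    for card in range(NUM_CARDS)
]

belief_size = sum(len(b) for b in beliefs)   # 2 * 52
policy_size = sum(len(p) for p in policy)    # 52 * 3

def update_belief(belief, policy, action_index):
    """Bayes' rule: P(card | action) is proportional to P(card) * P(action | card)."""
    unnormalized = [belief[c] * policy[c][action_index] for c in range(len(belief))]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observed action has zero probability under the policy")
    return [u / total for u in unnormalized]

# Everyone observes a raise, so the acting player's belief shifts to high cards.
posterior = update_belief(beliefs[0], policy, ACTIONS.index("raise"))
print(belief_size, policy_size, posterior[0], posterior[51])
```

Because the policy is common knowledge, every observer can perform this same update, which is why the resulting belief state counts as public.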
The idea of converting an imperfect information game into a perfect-information alternative is clever but not necessarily new. The main contribution of Facebook’s ReBeL was to combine those techniques with reinforcement learning and self-play methods in order to produce an autonomous agent that can play several imperfect-information games.
After transforming an imperfect-information game into a perfect-information alternative, the next step is to use a reinforcement learning + search algorithm to find the best set of actions. In a high-dimensional environment like our poker example (156 dimensions), most search algorithms would be incredibly inefficient. To address this challenge, ReBeL relies on a gradient-descent-like algorithm called counterfactual regret minimization (CFR), an existing technique from earlier poker research, which enables efficient searches in high-dimensional spaces.
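Full CFR operates on game trees, and in ReBeL on subgames rooted at a PBS, which is well beyond a short snippet. The sketch below shows only regret matching, the update rule at the heart of CFR, run in self-play on the modified Rock-Paper-Scissors game described earlier. The payoff matrix, function names, and iteration count are assumptions chosen for illustration, not ReBeL's implementation.

```python
# PAYOFF[a][b]: points for choosing action a (0=Rock, 1=Paper, 2=Scissors)
# against b. The game is symmetric, so both players can use the same table.
PAYOFF = [
    [0, -1, 2],
    [1, 0, -2],
    [-2, 2, 0],
]

def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)

def action_values(opponent_strategy):
    """Expected points of each pure action against a mixed opponent strategy."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        for a in range(3)
    ]

def self_play(iterations):
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                # Regret: how much better action a would have done than the mix.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy, not the last one, is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]

average_strategies = self_play(50_000)
# Exploitability: the most an opponent can win against the average strategy.
exploitability = max(action_values(average_strategies[0]))
print([round(p, 3) for p in average_strategies[0]], round(exploitability, 4))
```

The average strategy should drift toward the 40/40/20 mix as the number of iterations grows. Note that the strategy played on any single iteration can be far from equilibrium; only the running average carries the convergence guarantee.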
Facebook benchmarked ReBeL in different two-player, zero-sum, imperfect-information games, such as heads-up no-limit Texas Hold’em and Liar’s Dice, with remarkable results. For instance, in heads-up no-limit Texas Hold’em, ReBeL was able to outperform prior benchmarks and also beat a top human professional.
To encourage research into these ideas, Facebook open sourced ReBeL’s implementation for the Liar’s Dice game. The ideas behind ReBeL are relatively simple to implement and are likely to inspire a new wave of research in the space of imperfect-information games.
🧠 The Quiz
Every ten quizzes we reward two random people. Participate! The question is the following:
What is the role of Public Belief States in ReBeL?
Thank you. See you on Sunday 😉
TheSequence is a summary of groundbreaking ML research papers, engaging explanations of ML concepts, and exploration of new ML frameworks and platforms, followed by 65,000+ specialists from top AI labs and companies of the world.
5 minutes of your time, 3 times a week – you will steadily become knowledgeable about everything happening in the AI space.