Facebook announced that its AI bot Pluribus, developed in collaboration with Carnegie Mellon University, beat human experts in six-player no-limit Hold’em, the most widely played poker format in the world. This is the first time an AI bot has beaten top human players in a complex game with more than two players or two teams.
Pluribus succeeds because it can very efficiently handle the challenges of a game with both hidden information and more than two players. It uses self-play to teach itself how to win, with no examples or guidance on strategy. It also uses far fewer computing resources than the bots that have defeated humans in other games.
Poker embodies the challenge of hidden information because each player has information (their cards) that the others lack. A successful poker AI must reason about this hidden information and carefully balance its strategy to remain unpredictable while still picking good actions. For example, bluffing occasionally can be effective, but always bluffing would be too predictable and would likely result in losing a lot of money.
It is therefore necessary to carefully balance the probability with which one bluffs against the probability with which one bets with strong hands. In other words, the value of an action in an imperfect-information game depends on the probability with which it is chosen and on the probability with which other actions are chosen.
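To make the balancing argument concrete, here is a minimal sketch of a single, heavily simplified betting decision. Everything in it is an assumption for illustration and does not come from Pluribus: the bettor holds either a strong hand that always wins or a bluff that always loses, the opponent can only call or fold, and the function names `call_ev` and `balanced_bluff_fraction` are hypothetical.

```python
def call_ev(bluff_fraction, pot, bet):
    """Caller's expected value of calling; folding is taken as the baseline (EV 0).

    Assumed toy model: a fraction `bluff_fraction` of the bettor's bets are
    bluffs that lose at showdown, and the rest are strong hands that win.
    """
    win_amount = pot + bet   # caller collects the pot plus the bet when it was a bluff
    lose_amount = bet        # caller loses the amount called when it was a strong hand
    return bluff_fraction * win_amount - (1.0 - bluff_fraction) * lose_amount


def balanced_bluff_fraction(pot, bet):
    """Bluff fraction at which the caller gains nothing by calling or folding.

    Solves b * (pot + bet) = (1 - b) * bet for b.
    """
    return bet / (pot + 2.0 * bet)


if __name__ == "__main__":
    pot, bet = 100, 100  # hypothetical pot-sized bet
    for b in (0.1, balanced_bluff_fraction(pot, bet), 0.5):
        print(f"bluff fraction {b:.3f}: caller's EV of calling = {call_ev(b, pot, bet):+.2f}")
```

In this toy model, a bettor who bluffs too often gives the caller a profitable call every time, while one who bluffs too rarely lets the caller fold without cost, so the strong hands never get paid. Only at the balanced fraction is the caller indifferent, which is the sense in which the value of a bluff depends on how often it is chosen.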
In contrast, in perfect-information games, players need not worry about balancing the probabilities of actions; a good move in chess is good regardless of the probability with which it is chosen. And though it is possible to solve a chess endgame in isolation without understanding the game’s opening strategies such as the Sicilian Defense or Queen’s Gambit, it is impossible to disentangle the optimal strategy of a specific poker situation from the overall strategy of poker.
The core of Pluribus’s strategy was computed via self-play, in which the AI plays against copies of itself, without any human gameplay data used as input. The AI starts from scratch by playing randomly and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy. The version of self-play used in Pluribus is an improved variant of Monte Carlo counterfactual regret minimization (MCCFR), an iterative algorithm.
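The idea of improving a probability distribution over actions through self-play can be sketched on a much smaller game. The example below is not Pluribus's algorithm and not MCCFR; it is plain regret matching, the update rule that CFR-style algorithms build on, applied to a weighted rock-paper-scissors game. The game, its payoffs (a win with rock is worth 2, other wins 1), and all names are assumptions chosen for illustration.

```python
ACTIONS = ["rock", "paper", "scissors"]

# PAYOFF[a][b] is the payoff for playing action a against action b.
# Hypothetical stakes: winning with rock pays 2, every other win pays 1.
PAYOFF = [
    [0, -1, 2],   # rock: loses to paper, beats scissors for 2
    [1, 0, -1],   # paper: beats rock, loses to scissors
    [-2, 1, 0],   # scissors: loses 2 to rock, beats paper
]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret yet, so play uniformly at random.
    return [1.0 / len(regrets)] * len(regrets)


def self_play(iterations):
    """Train one strategy against a copy of itself and return its average strategy."""
    n = len(ACTIONS)
    regrets = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Expected payoff of each pure action against the copy's current strategy.
        action_values = [
            sum(PAYOFF[a][b] * strategy[b] for b in range(n)) for a in range(n)
        ]
        # Expected payoff of the current mixed strategy against its copy.
        current_value = sum(strategy[a] * action_values[a] for a in range(n))
        for a in range(n):
            # Regret: how much better action a would have done than the current mix.
            regrets[a] += action_values[a] - current_value
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


if __name__ == "__main__":
    for name, prob in zip(ACTIONS, self_play(50_000)):
        print(f"{name}: {prob:.3f}")
```

Starting from uniform random play, the average strategy in this toy game moves toward its equilibrium of one quarter rock, one half paper, and one quarter scissors. As in poker, no single action is best on its own; what matters is the frequency with which each is played.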