exceeds expert human players in many games, e.g., backgammon (1), checkers (2), chess (3),
Jeopardy! (4), Atari video games (5), and go (6). These successes all involve games with
information symmetry, where all players have identical information about the current state of
the game. This property of perfect information is also at the heart of the algorithms that enabled
these successes, e.g., local search during play (7, 8).
The founder of modern game theory and computing pioneer, von Neumann, envisioned
reasoning in games without perfect information. “Real life is not like that. Real life consists of
bluffing, of little tactics of deception, of asking yourself what is the other man going to think
I mean to do. And that is what games are about in my theory.” (9) One game that fascinated
von Neumann was poker, where players are dealt private cards and take turns making bets or
bluffing on holding the strongest hand, calling opponents’ bets, or folding and giving up on the
hand and the bets already added to the pot. Poker is a game of imperfect information, where
players’ private cards give them asymmetric information about the state of game.
Heads-up no-limit Texas hold’em (HUNL) is a two-player version of poker in which two
cards are initially dealt face-down to each player, and additional cards are dealt face-up in
three subsequent rounds. No limit is placed on the size of the bets although there is an overall
limit to the total amount wagered in each game (10). AI techniques have previously shown
success in the simpler game of heads-up limit Texas hold’em, where all bets are of a fixed size
resulting in just under 10
14
decision points (11). By comparison, computers have exceeded
expert human performance in go (6), a perfect information game with approximately 10
170
decision points (12). The imperfect information game HUNL is comparable in size to go, with
the number of decision points exceeding 10
160
(13).
Imperfect information games require more complex reasoning than similarly sized perfect
information games. The correct decision at a particular moment depends upon the probability
distribution over private information that the opponent holds, which is revealed through their
past actions. However, how our opponent’s actions reveal that information depends upon their
knowledge of our private information and how our actions reveal it. This kind of recursive
reasoning is why one cannot easily reason about game situations in isolation, which is at the
heart of heuristic search methods for perfect information games. Competitive AI approaches
in imperfect information games typically reason about the entire game and produce a complete
strategy prior to play (14–16). Counterfactual regret minimization (CFR) (14,17,18) is one such
technique that uses self-play to do recursive reasoning through adapting its strategy against itself
over successive iterations. If the game is too large to be solved directly, the common response
is to solve a smaller, abstracted game. To play the original game, one translates situations and
actions from the original game to the abstract game.
Although this approach makes it feasible for programs to reason in a game like HUNL, it
does so by squeezing HUNL’s 10
160
situations down to the order of 10
14
abstract situations.
Likely as a result of this loss of information, such programs are behind expert human play. In
2015, the computer program Claudico lost to a team of professional poker players by a margin
of 91 mbb/g (19), which is a “huge margin of victory” (20). Furthermore, it has been recently
shown that abstraction-based programs from the Annual Computer Poker Competition have
2