about the current state of the game that is not shared with their opponents. Therefore, a player
has to act under uncertainty, but they can also leverage that by taking advantage over their own
opponent’s uncertainty. In poker, each player is being dealt private cards (called a hand) and
wagers on having the strongest hand by possibly bluffing, calling opponent bets or folding to
give up their hand. Poker captures the challenge of inferring the decisions of the opponent
without letting them know our intentions. Compared to perfect information games, where all
available information needed to infer a strategy is always present on the board, poker is a game
that requires more complex reasoning. For example, the first player to act does not know which
cards the other player has been dealt and therefore has to behave the same regardless of his op-
ponent’s private information. As the game progresses, more private information of each player
is revealed through their actions and being signified in each others beliefs. The optimal deci-
sion in a particular state will, therefore, depend upon the probability distribution over private
information that each player holds over the other. This is akin to recursive reasoning and algo-
rithms which tackle the game of poker have to handle information asymmetry. This is the key
reason why algorithms that are competitive in games of perfect information will adapt poorly
to games of imperfect information.
For John Von Neumann, one of the founders of modern game theory, poker was a com-
pelling representation of how he viewed games in real life, more than just a well-defined form
of computation, but with tactics of bluffing and deception, and without precise solutions [1].
One particular challenge for him was that probability theory alone is not enough to play poker
competitively, which inspired his desire to formalize the idea of ”bluffing” [6]. Such an idea is
captured in the following way: in order to construct an effective strategy in poker the amount
currently in the pot, as well as the beliefs of both players need to be taken into consideration at
every decision point.
1.2 The Game
Heads-up no-limit Texas Hold’em (HUNL) is a two-player version of the game of poker, which
has become a primary benchmark for imperfect-information games [13]. In HUNL each player
is being dealt face-down two cards, called the hand, from a standard 52-card deck. The first
round is called the pre-flop, and each player starts with a stack of a fixed amount of chips.
The game proceeds with a few rounds of betting as additional cards are dealt face-up in three
subsequent rounds: the flop, turn and river. Both players are wagering on having the strongest
hand at the end of the game. The strength of a hand depends on matching the cards on the table
with the ones held privately. There is no limit placed on the size of the bets, which radically
increases the number of possible actions in each decision point. Instead of being restricted to
a few betting amounts, players are allowed to bet as much as they want and can even go all-in,
wagering every remaining chip in the bet. At the end of the rounds, unless one player folds their
hand and forfeits their wagering amount from the pot to their opponent, there is a showdown,
and the strongest hand wins the amount in the pot. Players are then being dealt another set of
hands, and the game is repeated. The goal of each player is to win as many chips as possible
from the other player over a certain amount of hands played.
The games being investigated in this project are simplified versions of HUNL that retain the
same level of mechanical complexity, with a key difference being a reduction in the number
of rounds and the size of the deck of cards being used. The standard performance measure
has been standardized to milli-big-blinds per game, or mbb/g, a unit agnostic to the amount of
2