Instructions: To add a question/comment to a specific line, equation, table or graph simply click on it.
Click on the annotations on the left side of the paper to read and reply to the questions and comments.
This paper was published in the late nineties. Right before [Deep B...
### MiniMax and Alpha-Beta pruning #### MiniMax MiniMax is a de...
### Chess as a search problem You can think of chess as a decision...
#### How big is the chess game tree? On average, the tree has a br...
In the 1970s Simon and Chase performed a number of very interesting...
- [Programming a Computer for Playing Chess by Claude E. Shannon (1...
A *quiescent position / Turing dead position* is a position where t...
The Elo rating system is a method for calculating the relative skil...
![Computer chess evolution](
A transposition table is a set of recently-visited positions in the...
An end game database is a simply a database containing an array of ...
In iterative deepening the program starts with a one ply search, th...
They are referring to the 1996 game between Kasparov and Deep Blue....
The Historical Development of Computer Chess and its
Impact on Artificial Intelligence
David Heath and Derek Allum
Faculty of Science and Computing,
University of Luton,
Park Square,
Luton LU1 3JU
United Kingdom
In this paper we review the historical
development of computer chess and discuss its
impact on the concept of intelligence. With the
advent of electronic computers after the Second
World War, interest in computer chess was
stimulated by the seminal papers of Shannon
(1950) and Turing (1953). The influential paper
of Shannon introduced the classification of chess
playing programs into either type A (brute force)
or type B (selective). Turing’s paper (1953)
highlighted the importance of only evaluating
’dead positions’ which have no outstanding
captures. The brute force search method is the
most popular approach to solving the chess
problem today. Search enhancements and
pruning techniques developed since that era have
ensured the continuing popularity of the type A
method. Alpha-beta pruning remains a standard
technique. Other important developments are
surveyed. A popular benchmark test for
determining intelligence is the Turing test. In the
case of a computer program playing chess the
moves are generated algorithmically using rules
that have been programmed into the software by
a human mind. A key question in the artificial
intelligence debate is to what extent computer
bytes aided by an arithmetic processing unit can
be claimed to ’think’.
With the advent of computers after the end of the
Second World War, interest in the development
of chess playing programs was stimulated by two
seminal papers in this area. The paper by
Shannon (1950) remains even to this day to be
central importance while the paper by Turing
(I 953) is equally influential.
The minimax algorithm was first applied in a
computer chess context in the landmark paper of
Shannon. He also introduced the classification of
chess playing programs into either type A or B.
Type A are those that search by ’brute force’
alone, while type B programs try and use some
considerable selectivity in deciding which
branches of the game tree require searching.
Alpha-beta pruning was first formulated by
McCarthy at the Dartmouth Summer Research
Conference on Artificial Intelligence in 1956.
However, at this stage no formal specification of
it was given, but it was implemented in game
playing programs of the late 1950s. Papers by
Knuth and Moore (1975) and Newborn (1977)
have analysed the efficiency of the method and it
has been proved that the algorithm returns
exactly the same move as that obtained by full
minimaxing or, alternatively, a move of the same
The success of type A ’brute force’ programs
using exhaustive search, minimaxing and alpha-
beta pruning, transposition tables and other
search enhancements has had the unfortunate
effect of minimising interest in the development
of type B programs. The work of Simon and
Chase (1973) established that most humans only
consider a handful of plausible moves. The
power and ability of the chess grandmaster
resides in his ability to select the correct subset
of moves to examine.
In contrast the brute force programs search the
entire spectrum of initial moves diverging from
some given position referred to as the root.
These initial moves fan out, generating a game
tree which grows exponentially with depth.
Apart from the work of Botvinnik in recent
years, there has been no significant progress in
developing a type B strategy program which
From: AAAI Technical Report WS-97-04. Compilation copyright © 1997, AAAI ( All rights reserved.
would reduce the initial span of the search tree.
The successful development of algorithms that
introduced selectivity into the search engine, so
that the program followed similar lines of
thought to those of a chess grandmaster, would
considerably reduce the amount of searching
Turing’s paper (1953) highlighted the
importance of only evaluating ’dead positions’
which have no outstanding captures. It is from
this paper the term ’Turing dead’ is taken. Most
chess programs search to a quiescent position
(Turing dead) and then evaluate the position as
function of the material balance and other
features. This evaluation function encapsulates
chess-specific knowledge typically relating to
pawn structures and king safety.
The historical evolution of computer chess
programming techniques and knowledge can be
conveniently discussed in three broad eras. Each
era is characterised by its own particular
developments, some of which can be directly
linked to increased processor power, the
availability of new hardware devices and others
to algorithmic advances.
The boundary between each era is not always
precise as it is sometimes not easy to draw a
clear dividing line across a time continuum. Any
such process is artificial. However, these broad
historical eras are (Donskoy and Schaeffer
(i) 1st era (1950 - c1975)
(ii) 2ndera(c1975-c1985)
(iii) 3rd era (c1985 onwards)
The first pioneering era as stated above runs
from 1950-c1975. Here there is a definite point
at which this history commences, marked by the
publication of Shannon’s paper and the advent of
electronic computers. These computers, although
originally regarded as revolutionary and
powerful, had but a fraction of the computing
power of the current generation of
microprocessors. Indeed hardware limitations
characterise the first era of computer chess
development, requiring highly selective
techniques in order to produce moves in an
acceptable time. The earliest programs were,
therefore, Shannon type B.
The first significant chess playing program
was by Bernstein (1957) and ran on an IBM 704
computer, capable of performing approximately
42,000 operations per second. This was not a
’brute force’ program as it only selected the best
seven moves for consideration using heuristics
based on chess lore. Compared to the
sophisticated brute force programs of today
which generate the full span of moves at the
root, this is a very limited range of moves.
The Bernstein program was built around the
strategy of working to a plan. As it was
incapable of doing a full width search due to
time considerations, it selected its quota of seven
moves for deeper analysis by seeking the answer
to eight questions. Once the moves were
selected, they were analysed to a search depth of
4 ply. The potential replies by the opponent were
also selected on the basis of seeking answers to
the same set of questions.
Bernstein’s program played at a very
elementary level and the first program to attain
any recognisable standard of play was that of
Greenblatt (1968). For a number of years this
remained the most proficient chess program and
played at an Elo strength of approximately 1500.
It had carefully chosen quiesence rules to aid
tactical strength and was also the first program to
use transposition tables to reduce the search
space. However, the Greenblatt program also
used an initial selection process to minimize the
size of the game-tree as the computing hardware
of that era was incapable of achieving the
computing speeds of today. Again this program,
because of its selectivity at the root node, falls
into the first era.
The first program to achieve full width search
and make ’brute force’ appear a viable
possibility was Chess 4.5. This program was
developed for entry into the ACM 1973
Computer Chess contest by Slate and Atkin,
using the experience they had gained in
programming earlier selective search chess
programs. The techniques used by Slate and
Atkin (1977) are still in use today although they
have been refined and improved over the years.
These standard techniques initiated what may be
classed as the second technology era giving rise
to programs typically searching to a fixed depth
as fast as possible and then resolving the horizon
problem by extending checks at the cutoff depth
and considering captures only in a quiescence
The second era of computer chess also saw
more emphasis placed on the development of
dedicated chess hardware. Typical of such
developments was Ken Thompson’s chess
machine Belle which won the ACM North
American Computer Chess Championship in
1978. The special purpose hardware increased
the speed of Belle enabling it to analyse 30
million positions in 3 minutes and search
exhaustively to 8 or 9 ply in middlegame
positions. It also had an extensive opening book.
Belle won the 3rd World Computer Chess
Championship in 1983, achieving a rating of
over 2200 in 1980 and was the first program to
receive a Master rating. It was for Belle that
Thompson devised his first end game database,
solving the KQKR problem and hence partially
removing the perceived weakness at that time of
computers in endgame positions.
In 1975 Hyatt commenced work on Blitz
which was then entered in the ACM 1976 North
American Computer Chess Championship.
Initially Blitz was selective and relied on a local
evaluation function to discard moves. However,
the availability of the world’s fastest
supercomputer, the Cray, enabled the program
appropriately renamed Cray Blitz, to use brute
force and in 1981 it was searching approximately
3000 nodes per second and consistently
achieving six ply searches. This rate of analysis
was improved by the use of assembly language
and the availability of the Cray XMP computer
with multiprocessing facilities, allowing 20,000 -
30,000 nodes to be searched per second in 1983.
The third stage, aptly named algorithmic by
Donskoy and Schaeffer (1990), extends onwards
from the mid 1980s to the current time. This has
seen some considerable activity in the refinement
of the basic tree searching algorithms used (see
later). It has also seen the emergence of personal
computers on a scale originally unenvisaged.
The widespread practice of incorporating vast
opening books and more cd-roms for end games.
This current phase has been most fruitful and
seen a considerable increase in the playing
strength of chess programs to the point where the
top commercial software programs are generally
recognised as being superior to humans at speed
chess and in sharp tactical positions. Under strict
tournament conditions the most highly rated
player of all time Kasparov has now lost a game,
but not the match, against Deep Blue.
Move Ordering Techniques
The previous section has outlined the historical
development of computer chess. The transition
to Shannon B type programs to Shannon A is not
solely attributable to increased computing power.
It also partly arose out of increased
understanding of the alpha-beta algorithm which
was subjected to deep analysis by Knuth and
Moore (1975). Other techniques for increasing
the efficiency of the search and pruning
mechanisms also became more prominent at the
beginning of the second era.
Producing a cutoff as soon as possible with the
alpha-beta algorithm considerably reduces the
size of the search tree. Consequently, move
ordering is an important aspect of achieving
maximum speed since, if we know the best move
in a certain situation producing it early rather
than late, will have beneficial results. The worst
case is where the moves are examined in such an
order as to produce no cutoffs, generating the
maximum number of nodes to be analysed. This
is the maximal tree which grows exponentially
with depth, d, so that the number of nodes
examined assuming a uniform branching factor,
b, is given by b
The best situation is where the
moves are all well-ordered and provide
immediate cutoffs, producing the minimal tree
which, although it also grows exponentially with
depth, is very much reduced in size by the
cutoffs. In between these two extremes we have
game trees generated by chess programs which,
initially are unordered, but become progressivley
more ordered as the depth of search increases.
Algorithmic developments and various
heuristics are moving the programs closer to the
minimal tree. This progressive improvement in
reaching closer and closer to the minimal tree is
borne out by Belle, which it is estimated came
within a factor of 2.2 of the minimal tree,
Phoenix within a factor of 1.4 and Zugzwang
within an impressive factor of 1.2 (Plaat 1996).
Iterative deepening is a useful device for
allowing moves to be re-ordered prior to the
depth being increased as well as providing a
method for limiting search depth in response to
time constraints. Originally