It is an honor and a pleasure to accept the Alan Turing Award. My own work has been on computer systems, and that will be my theme. The essence of systems is that they are integrating efforts, requiring broad knowledge of the problem area to be addressed, and the detailed knowledge required is rarely held by one person. Thus the work of systems is usually done by teams. Hence I am accepting this award on behalf of the many with whom I have worked as much as for myself. It is not practical to name all the individuals who contributed. Nevertheless, I would like to give special mention to Marjorie Daggett and Bob Daley for their parts in the birth of CTSS and to Bob Fano and the late Ted Glaser for their critical contributions to the development of the Multics System.
Let me turn now to the title of this talk: "On Building Systems That Will Fail." Of course the title I chose was a teaser. I considered and discarded some alternate titles: "On Building Messy Systems" seemed too frivolous and suggested there is no systematic approach; "On Mastering System Complexity" sounded like I had all the answers. The title that came closest, "On Building Systems That Are Likely to Have Failures," did not have the nuance of inevitability that I wanted to suggest.
What I am really trying to address is the class of systems that, for want of a better phrase, I will call "ambitious systems." It almost goes without saying that ambitious systems never quite work as expected. Things usually go wrong--sometimes in dramatic ways. And this leads me to my main thesis, namely, that the question to ask when designing such systems is not if something will go wrong, but when it will go wrong.
Some Examples
Now, ambitious systems that fail are really much more common than we may realize. In fact in some circumstances we strive for them, revelling in the excitement of the unexpected. For example, let me remind you of our national sport of football. The whole object of the game is for each team to play at the limit of its abilities. Besides the sheer physical skill required, one has the strategic intricacies, the ability to audibilize, and the quickness to react to the unexpected--all a deep part of the game. Of course, occasionally one team approaches perfection, all the plays work, and the game becomes dull.
Another example of a system that is too ambitious for perfection is military warfare. The same elements are there with opposing sides having to constantly improvise and deal with the unexpected. In fact we get from the military that wonderful acronym, SNAFU, which is politely translated as "situation normal, all fouled up." And if any of you are still doubtful, consider how rapidly the phrases "precision bombing" and "surgical strikes" are replaced by "the fog of war" and "casualties from friendly fire" as soon as hostilities begin.
On a somewhat more whimsical note, let me offer driving in Boston as an example of systems that will fail. Automobile traffic is an excellent case of distributed control with a common set of protocols called traffic regulations. The Boston area is notorious for the free interpretations drivers make of these pesky regulations, and perhaps the epitome of it occurs in the arena of the traffic rotary. A case can be made for rotaries. They are efficient. There is no need to wait for sluggish traffic signals. They are direct. And they offer great opportunities for creative improvisation, thereby adding zest to the sport of driving.
One of the most effective strategies is for a driver approaching a rotary to rigidly fix his or her head, staring forward, of course, secretly using peripheral vision to the limit. It is even more effective if the driver, on entering the rotary, speeds up, and some drivers embellish this last step by adopting a look of maniacal glee. The effect is, of course, one of intimidation, and a pecking order quickly develops.
The only reason there are not more accidents is that most drivers have a second component to the strategy, namely, they assume everyone else may be crazy--they are often correct--and every driver is really prepared to stop with inches to spare. Again we see an example of a system where ambitious tactics and prudent caution lead to an effective solution.
So far, the examples I have given may suggest that failures of ambitious systems come from the human element and that at least the technical parts of the system can be built correctly. In particular, turning to computer systems, one might conclude that it is only a matter of getting the code debugged. Some assume rigorous testing will do the job. Some put their hopes in proving program correctness. But unfortunately, there are many cases for which none of these techniques will always work [1]. Let me offer a modest example, illustrated in Figure 1.
Consider the case of an elaborate numerical calculation with a variable, f, representing some physical value, being calculated for a set of points over a range of a parameter, t. Now the property of physical variables is that they normally do not exhibit abrupt changes or discontinuities.
So what has happened here? If we look at the expression for f, we see it is the result of a constant, k, added to the product of two other functions, g and h. Looking further, we see that the function g has a behavior that is exponentially increasing with t. The function h, on the other hand, is exponentially decreasing with t. The resultant product of g and h is almost constant with increasing t until an abrupt jump occurs and the curve for f goes flat.
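For readers who want to see the effect concretely, the following small sketch in Python (with invented rates and constants, not the actual computation behind Figure 1) reproduces this kind of abrupt flattening in ordinary double-precision arithmetic.

    import math

    # Hypothetical stand-in for the Figure 1 calculation: f(t) = K + g(t) * h(t),
    # where g grows exponentially with t and h decays exponentially with t, so
    # their product should remain constant for all t.
    K = 1e-300  # invented constant, chosen near the bottom of the double range
                # so the loss of the product is visible in the result

    def f(t):
        g = math.exp(0.1 * t)            # exponentially increasing factor
        h = 1e-300 * math.exp(-0.1 * t)  # exponentially decreasing factor
        # Mathematically g * h == 1e-300 for every t, but once the magnitude of
        # h's negative exponent exceeds what the floating-point format can hold,
        # h silently underflows to 0.0, the product collapses, and f drops
        # abruptly to the flat value K.
        return K + g * h

    for t in (100, 300, 500, 550, 600, 700):
        print(t, f(t))

Up to roughly t = 540 the printed values hover around 2e-300; beyond that point h has underflowed to zero and f sits flat at K, even though nothing in the source code looks discontinuous.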
What has gone wrong? The answer is that there has been floating-point underflow at the critical point in the curve, i.e., the representation of the negative exponent has exceeded the field size in the floating-