THE PSYCHOLOGICAL REVIEW
Vol. 63, No. 1          March, 1956

THE MAGICAL NUMBER SEVEN, PLUS OR MINUS TWO: SOME LIMITS ON OUR CAPACITY FOR PROCESSING INFORMATION ¹

GEORGE A. MILLER
Harvard University
My problem is that I have been persecuted by an integer. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. This number assumes a variety of disguises, being sometimes a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecognizable. The persistence with which this number plagues me is far more than a random accident. There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution.

I shall begin my case history by telling you about some experiments that tested how accurately people can assign numbers to the magnitudes of various aspects of a stimulus. In the traditional language of psychology these would be called experiments in absolute judgment. Historical accident, however, has decreed that they should have another name. We now call them experiments on the capacity of people to transmit information. Since these experiments would not have been done without the appearance of information theory on the psychological scene, and since the results are analyzed in terms of the concepts of information theory, I shall have to preface my discussion with a few remarks about this theory.

¹ This paper was first read as an Invited Address before the Eastern Psychological Association in Philadelphia on April 15, 1955. Preparation of the paper was supported by the Harvard Psycho-Acoustic Laboratory under Contract N5ori-76 between Harvard University and the Office of Naval Research, U. S. Navy (Project NR142-201, Report PNR-174). Reproduction for any purpose of the U. S. Government is permitted.
INFORMATION MEASUREMENT

The "amount of information" is exactly the same concept that we have talked about for years under the name of "variance." The equations are different, but if we hold tight to the idea that anything that increases the variance also increases the amount of information, we cannot go far astray.

The advantages of this new way of talking about variance are simple enough. Variance is always stated in terms of the unit of measurement—inches, pounds, volts, etc.—whereas the amount of information is a dimensionless quantity. Since the information in a discrete statistical distribution does not depend upon the unit of measurement, we can extend the concept to situations where we have no metric and we would not ordinarily think of using the variance. And it also enables us to compare results obtained in quite different experimental situations where it would be meaningless to compare variances based on different metrics. So there are some good reasons for adopting the newer concept.

The similarity of variance and amount of information might be explained this way: When we have a large variance, we are very ignorant about what is going to happen. If we are very ignorant, then when we make the observation it gives us a lot of information. On the other hand, if the variance is very small, we know in advance how our observation must come out, so we get little information from making the observation.

If you will now imagine a communication system, you will realize that there is a great deal of variability about what goes into the system and also a great deal of variability about what comes out. The input and the output can therefore be described in terms of their variance (or their information). If it is a good communication system, however, there must be some systematic relation between what goes in and what comes out. That is to say, the output will depend upon the input, or will be correlated with the input. If we measure this correlation, then we can say how much of the output variance is attributable to the input and how much is due to random fluctuations or "noise" introduced by the system during transmission. So we see that the measure of transmitted information is simply a measure of the input-output correlation.

There are two simple rules to follow. Whenever I refer to "amount of information," you will understand "variance." And whenever I refer to "amount of transmitted information," you will understand "covariance" or "correlation."
The situation can be described graphically by two partially overlapping circles. Then the left circle can be taken to represent the variance of the input, the right circle the variance of the output, and the overlap the covariance of input and output. I shall speak of the left circle as the amount of input information, the right circle as the amount of output information, and the overlap as the amount of transmitted information.

In the experiments on absolute judgment, the observer is considered to be a communication channel. Then the left circle would represent the amount of information in the stimuli, the right circle the amount of information in his responses, and the overlap the stimulus-response correlation as measured by the amount of transmitted information. The experimental problem is to increase the amount of input information and to measure the amount of transmitted information. If the observer's absolute judgments are quite accurate, then nearly all of the input information will be transmitted and will be recoverable from his responses. If he makes errors, then the transmitted information may be considerably less than the input. We expect that, as we increase the amount of input information, the observer will begin to make more and more errors; we can test the limits of accuracy of his absolute judgments. If the human observer is a reasonable kind of communication system, then when we increase the amount of input information the transmitted information will increase at first and will eventually level off at some asymptotic value. This asymptotic value we take to be the channel capacity of the observer: it represents the greatest amount of information that he can give us about the stimulus on the basis of an absolute judgment. The channel capacity is the upper limit on the extent to which the observer can match his responses to the stimuli we give him.
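(Annotation: the "transmitted information" described here is what information theory calls the mutual information between stimulus and response, and in studies of this kind it is typically estimated from a stimulus-response confusion table. A minimal Python sketch of that computation, with a hypothetical confusion matrix purely for illustration:)

```python
import math

def transmitted_information(confusion):
    """Estimate transmitted information I(S;R) = H(S) + H(R) - H(S,R)
    from a stimulus-by-response count matrix (rows = stimuli)."""
    total = sum(sum(row) for row in confusion)
    joint = [[c / total for c in row] for row in confusion]
    p_s = [sum(row) for row in joint]            # stimulus marginals
    p_r = [sum(col) for col in zip(*joint)]      # response marginals

    def H(ps):
        return -sum(p * math.log2(p) for p in ps if p > 0)

    return H(p_s) + H(p_r) - H([p for row in joint for p in row])

# Hypothetical 4-stimulus experiment: mostly correct responses, a few confusions.
counts = [
    [45,  5,  0,  0],
    [ 5, 40,  5,  0],
    [ 0,  5, 40,  5],
    [ 0,  0,  5, 45],
]
print(round(transmitted_information(counts), 2), "bits per judgment")  # about 1.3
```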
Now just a brief word about the bit and we can begin to look at some data. One bit of information is the amount of information that we need to make a decision between two equally likely alternatives. If we must decide whether a man is less than six feet tall or more than six feet tall and if we know that the chances are 50-50, then we need one bit of information. Notice that this unit of information does not refer in any way to the unit of length that we use—feet, inches, centimeters, etc. However you measure the man's height, we still need just one bit of information.

Two bits of information enable us to decide among four equally likely alternatives. Three bits of information enable us to decide among eight equally likely alternatives. Four bits of information decide among 16 alternatives, five among 32, and so on. That is to say, if there are 32 equally likely alternatives, we must make five successive binary decisions, worth one bit each, before we know which alternative is correct. So the general rule is simple: every time the number of alternatives is increased by a factor of two, one bit of information is added.
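(Annotation: Miller's rule here is just the base-two logarithm: N equally likely alternatives require log2(N) bits, and B bits distinguish 2^B alternatives. A tiny Python illustration, not part of the paper:)

```python
import math

def bits_for(alternatives: int) -> float:
    """Bits needed to single out one of N equally likely alternatives."""
    return math.log2(alternatives)

def alternatives_for(bits: float) -> float:
    """Equally likely alternatives that a given number of bits can separate."""
    return 2 ** bits

print(bits_for(32))           # 5.0 -- the five successive binary decisions in the text
print(alternatives_for(2.5))  # ~5.7 -- why 2.5 bits will mean "about six" pitches below
```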
There are two ways we might increase the amount of input information. We could increase the rate at which we give information to the observer, so that the amount of information per unit time would increase. Or we could ignore the time variable completely and increase the amount of input information by increasing the number of alternative stimuli. In the absolute judgment experiment we are interested in the second alternative. We give the observer as much time as he wants to make his response; we simply increase the number of alternative stimuli among which he must discriminate and look to see where confusions begin to occur. Confusions will appear near the point that we are calling his "channel capacity."
ABSOLUTE JUDGMENTS OF UNIDIMENSIONAL STIMULI

Now let us consider what happens when we make absolute judgments of tones. Pollack (17) asked listeners to identify tones by assigning numerals to them. The tones were different with respect to frequency, and covered the range from 100 to 8000 cps in equal logarithmic steps. A tone was sounded and the listener responded by giving a numeral. After the listener had made his response he was told the correct identification of the tone.

When only two or three tones were used the listeners never confused them. With four different tones confusions were quite rare, but with five or more tones confusions were frequent. With fourteen different tones the listeners made many mistakes.
These data are plotted in Fig. 1. Along the bottom is the amount of input information in bits per stimulus. As the number of alternative tones was increased from 2 to 14, the input information increased from 1 to 3.8 bits. On the ordinate is plotted the amount of transmitted information. The amount of transmitted information behaves in much the way we would expect a communication channel to behave; the transmitted information increases linearly up to about 2 bits and then bends off toward an asymptote at about 2.5 bits. This value, 2.5 bits, therefore, is what we are calling the channel capacity of the listener for absolute judgments of pitch.

FIG. 1. Data from Pollack (17, 18) on the amount of information that is transmitted by listeners who make absolute judgments of auditory pitch. As the amount of input information is increased by increasing from 2 to 14 the number of different pitches to be judged, the amount of transmitted information approaches as its upper limit a channel capacity of about 2.5 bits per judgment.
So now we have the number 2.5 bits. What does it mean? First, note that 2.5 bits corresponds to about six equally likely alternatives. The result means that we cannot pick more than six different pitches that the listener will never confuse. Or, stated slightly differently, no matter how many alternative tones we ask him to judge, the best we can expect him to do is to assign them to about six different classes without error. Or, again, if we know that there were N alternative stimuli, then his judgment enables us to narrow down the particular stimulus to one out of N/6.

Most people are surprised that the number is as small as six. Of course, there is evidence that a musically sophisticated person with absolute pitch can identify accurately any one of 50 or 60 different pitches. Fortunately, I do not have time to discuss these remarkable exceptions. I say it is fortunate because I do not know how to explain their superior performance. So I shall stick to the more pedestrian fact that most of us can identify about one out of only five or six pitches before we begin to get confused.

It is interesting to consider that psychologists have been using seven-point rating scales for a long time, on the intuitive basis that trying to rate into finer categories does not really add much to the usefulness of the ratings. Pollack's results indicate that, at least for pitches, this intuition is fairly sound.
FIG. 2. Data from Garner (7) on the channel capacity for absolute judgments of auditory loudness.

Next you can ask how reproducible this result is. Does it depend on the spacing of the tones or the various conditions of judgment? Pollack varied these conditions in a number of ways. The range of frequencies can be changed by a factor of about 20 without changing the amount of information transmitted more than a small percentage. Different groupings of the pitches decreased the transmission, but the loss was small. For example, if you can discriminate five high-pitched tones in one series and five low-pitched tones in another series, it is reasonable to expect that you could combine all ten into a single series and still tell them all apart without error. When you try it, however, it does not work. The channel capacity for pitch seems to be about six and that is the best you can do.

While we are on tones, let us look next at Garner's (7) work on loudness. Garner's data for loudness are summarized in Fig. 2. Garner went to some trouble to get the best possible spacing of his tones over the intensity range from 15 to 110 db. He used 4, 5, 6, 7, 10, and 20 different stimulus intensities. The results shown in Fig. 2 take into account the differences among subjects and the sequential influence of the immediately preceding judgment. Again we find that there seems to be a limit.
FIG. 3. Data from Beebe-Center, Rogers, and O'Connell (1) on the channel capacity for absolute judgments of saltiness.

The channel capacity for absolute judgments of loudness is 2.3 bits, or about five perfectly discriminable alternatives. Since these two studies were done in different laboratories with slightly different techniques and methods of analysis, we are not in a good position to argue whether five loudnesses is significantly different from six pitches. Probably the difference is in the right direction, and absolute judgments of pitch are slightly more accurate than absolute judgments of loudness. The important point, however, is that the two answers are of the same order of magnitude.

The experiment has also been done for taste intensities. In Fig. 3 are the results obtained by Beebe-Center, Rogers, and O'Connell (1) for absolute judgments of the concentration of salt solutions. The concentrations ranged from 0.3 to 34.7 gm. NaCl per 100 cc. tap water in equal subjective steps. They used 3, 5, 9, and 17 different concentrations. The channel capacity is 1.9 bits, which is about four distinct concentrations. Thus taste intensities seem a little less distinctive than auditory stimuli, but again the order of magnitude is not far off.
On the other hand, the channel capacity for judgments of visual position seems to be significantly larger. Hake and Garner (8) asked observers to interpolate visually between two scale markers. Their results are shown in Fig. 4. They did the experiment in two ways. In one version they let the observer use any number between zero and 100 to describe the position, although they presented stimuli at only 5, 10, 20, or 50 different positions. The results with this unlimited response technique are shown by the filled circles on the graph. In the other version the observers were limited in their responses to reporting just those stimulus values that were possible. That is to say, in the second version the number of different responses that the observer could make was exactly the same as the number of different stimuli that the experimenter might present. The results with this limited response technique are shown by the open circles on the graph. The two functions are so similar that it seems fair to conclude that the number of responses available to the observer had nothing to do with the channel capacity of 3.25 bits.
The Hake-Garner experiment has been repeated by Coonan and Klemmer. Although they have not yet published their results, they have given me permission to say that they obtained channel capacities ranging from 3.2 bits for very short exposures of the pointer position to 3.9 bits for longer exposures. These values are slightly higher than Hake and Garner's, so we must conclude that there are between 10 and 15 distinct positions along a linear interval. This is the largest channel capacity that has been measured for any unidimensional variable.

FIG. 4. Data from Hake and Garner (8) on the channel capacity for absolute judgments of the position of a pointer in a linear interval.
At the present time these four experiments on absolute judgments of simple, unidimensional stimuli are all that have appeared in the psychological journals. However, a great deal of work on other stimulus variables has not yet appeared in the journals. For example, Eriksen and Hake (6) have found that the channel capacity for judging the sizes of squares is 2.2 bits, or about five categories, under a wide range of experimental conditions. In a separate experiment Eriksen (5) found 2.8 bits for size, 3.1 bits for hue, and 2.3 bits for brightness. Geldard has measured the channel capacity for the skin by placing vibrators on the chest region. A good observer can identify about four intensities, about five durations, and about seven locations.
One of the most active groups in this area has been the Air Force Operational Applications Laboratory. Pollack has been kind enough to furnish me with the results of their measurements for several aspects of visual displays. They made measurements for area and for the curvature, length, and direction of lines. In one set of experiments they used a very short exposure of the stimulus—1/40 second—and then they repeated the measurements with a 5-second exposure. For area they got 2.6 bits with the short exposure and 2.7 bits with the long exposure. For the length of a line they got about 2.6 bits with the short exposure and about 3.0 bits with the long exposure. Direction, or angle of inclination, gave 2.8 bits for the short exposure and 3.3 bits for the long exposure. Curvature was apparently harder to judge. When the length of the arc was constant, the result at the short exposure duration was 2.2 bits, but when the length of the chord was constant, the result was only 1.6 bits. This last value is the lowest that anyone has measured to date. I should add, however, that these values are apt to be slightly too low because the data from all subjects were pooled before the transmitted information was computed.
Now let us see where we are. First, the channel capacity does seem to be a valid notion for describing human observers. Second, the channel capacities measured for these unidimensional variables range from 1.6 bits for curvature to 3.9 bits for positions in an interval. Although there is no question that the differences among the variables are real and meaningful, the more impressive fact to me is their considerable similarity. If I take the best estimates I can get of the channel capacities for all the stimulus variables I have mentioned, the mean is 2.6 bits and the standard deviation is only 0.6 bit. In terms of distinguishable alternatives, this mean corresponds to about 6.5 categories, one standard deviation includes from 4 to 10 categories, and the total range is from 3 to 15 categories. Considering the wide variety of different variables that have been studied, I find this to be a remarkably narrow range.

There seems to be some limitation built into us either by learning or by the design of our nervous systems, a limit that keeps our channel capacities in this general range. On the basis of the present evidence it seems safe to say that we possess a finite and rather small capacity for making such unidimensional judgments and that this capacity does not vary a great deal from one simple sensory attribute to another.
ABSOLUTE JUDGMENTS OF MULTIDIMENSIONAL STIMULI

You may have noticed that I have been careful to say that this magical number seven applies to one-dimensional judgments. Everyday experience teaches us that we can identify accurately any one of several hundred faces, any one of several thousand words, any one of several thousand objects, etc. The story certainly would not be complete if we stopped at this point. We must have some understanding of why the one-dimensional variables we judge in the laboratory give results so far out of line with what we do constantly in our behavior outside the laboratory. A possible explanation lies in the number of independently variable attributes of the stimuli that are being judged. Objects, faces, words, and the like differ from one another in many ways, whereas the simple stimuli we have considered thus far differ from one another in only one respect.
Fortunately, there are a few data on what happens when we make absolute judgments of stimuli that differ from one another in several ways. Let us look first at the results Klemmer and Frick (13) have reported for the absolute judgment of the position of a dot in a square. In Fig. 5 we see their results. Now the channel capacity seems to have increased to 4.6 bits, which means that people can identify accurately any one of 24 positions in the square.

FIG. 5. Data from Klemmer and Frick (13) on the channel capacity for absolute judgments of the position of a dot in a square. [Panel label: points in a square, no grid, .03-sec. exposure.]
The position of a dot in a square is clearly a two-dimensional proposition. Both its horizontal and its vertical position must be identified. Thus it seems natural to compare the 4.6-bit capacity for a square with the 3.25-bit capacity for the position of a point in an interval. The point in the square requires two judgments of the interval type. If we have a capacity of 3.25 bits for estimating intervals and we do this twice, we should get 6.5 bits as our capacity for locating points in a square. Adding the second independent dimension gives us an increase from 3.25 to 4.6, but it falls short of the perfect addition that would give 6.5 bits.
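(Annotation: a quick check of this arithmetic, using only the numbers quoted above: perfect addition of two independent 3.25-bit judgments would give 6.5 bits, roughly 90 distinguishable positions, while the observed 4.6 bits corresponds to the 24 positions mentioned earlier.)

```python
interval_bits = 3.25            # capacity for the position of a point in an interval
expected = 2 * interval_bits    # perfect addition of two independent dimensions
observed = 4.6                  # Klemmer and Frick's measured value for a dot in a square

print(expected, round(2 ** expected))  # 6.5 bits -> about 90 categories
print(observed, round(2 ** observed))  # 4.6 bits -> about 24 categories
```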
Another example is provided by Beebe-Center, Rogers, and O'Connell. When they asked people to identify both the saltiness and the sweetness of solutions containing various concentrations of salt and sucrose, they found that the channel capacity was 2.3 bits. Since the capacity for salt alone was 1.9, we might expect about 3.8 bits if the two aspects of the compound stimuli were judged independently. As with spatial locations, the second dimension adds a little to the capacity but not as much as it conceivably might.

A third example is provided by Pollack (18), who asked listeners to judge both the loudness and the pitch of pure tones. Since pitch gives 2.5 bits and loudness gives 2.3 bits, we might hope to get as much as 4.8 bits for pitch and loudness together. Pollack obtained 3.1 bits, which again indicates that the second dimension augments the channel capacity but not so much as it might.
A fourth example can be drawn from the work of Halsey and Chapanis (9) on confusions among colors of equal luminance. Although they did not analyze their results in informational terms, they estimate that there are about 11 to 15 identifiable colors, or, in our terms, about 3.6 bits. Since these colors varied in both hue and saturation, it is probably correct to regard this as a two-dimensional judgment. If we compare this with Eriksen's 3.1 bits for hue (which is a questionable comparison to draw), we again have something less than perfect addition when a second dimension is added.
It is still a long way, however, from these two-dimensional examples to the multidimensional stimuli provided by faces, words, etc. To fill this gap we have only one experiment, an auditory study done by Pollack and Ficks (19). They managed to get six different acoustic variables that they could change: frequency, intensity, rate of interruption, on-time fraction, total duration, and spatial location. Each one of these six variables could assume any one of five different values, so altogether there were 5^6, or 15,625 different tones that they could present. The listeners made a separate rating for each one of these six dimensions. Under these conditions the transmitted information was 7.2 bits, which corresponds to about 150 different categories that could be absolutely identified without error. Now we are beginning to get up into the range that ordinary experience would lead us to expect.
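(Annotation: both figures in this paragraph check out with a line of arithmetic: six variables with five values each give 5^6 = 15,625 possible tones, and 7.2 bits corresponds to about 2^7.2 ≈ 147 perfectly identifiable categories, the "about 150" quoted above.)

```python
print(5 ** 6)           # 15625 distinct tones that could be presented
print(round(2 ** 7.2))  # ~147 categories identifiable without error at 7.2 bits
```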
Suppose that we plot these data, fragmentary as they are, and make a guess about how the channel capacity changes with the dimensionality of the stimuli. The result is given in Fig. 6. In a moment of considerable daring I sketched the dotted line to indicate roughly the trend that the data seemed to be taking. Clearly, the addition of independently variable attributes to the stimulus increases the channel capacity, but at a decreasing rate.

FIG. 6. The general form of the relation between channel capacity and the number of independently variable attributes of the stimuli.

It is interesting to note that the channel capacity is increased even when the several variables are not independent. Eriksen (5) reports that, when size, brightness, and hue all vary together in perfect correlation, the transmitted information is 4.1 bits as compared with an average of about 2.7 bits when these attributes are varied one at a time. By confounding three attributes, Eriksen increased the dimensionality of the input without increasing the amount of input information; the result was an increase in channel capacity of about the amount that the dotted function in Fig. 6 would lead us to expect.
The point seems to be that, as we add more variables to the display, we increase the total capacity, but we decrease the accuracy for any particular variable. In other words, we can make relatively crude judgments of several things simultaneously.

We might argue that in the course of evolution those organisms were most successful that were responsive to the widest range of stimulus energies in their environment. In order to survive in a constantly fluctuating world, it was better to have a little information about a lot of things than to have a lot of information about a small segment of the environment. If a compromise was necessary, the one we seem to have made is clearly the more adaptive.
Pollack and Ficks's results are very strongly suggestive of an argument that linguists and phoneticians have been making for some time (11). According to the linguistic analysis of the sounds of human speech, there are about eight or ten dimensions—the linguists call them distinctive features—that distinguish one phoneme from another. These distinctive features are usually binary, or at most ternary, in nature. For example, a binary distinction is made between vowels and consonants, a binary decision is made between oral and nasal consonants, a ternary decision is made among front, middle, and back phonemes, etc. This approach gives us quite a different picture of speech perception than we might otherwise obtain from our studies of the speech spectrum and of the ear's ability to discriminate relative differences among pure tones. I am personally much interested in this new approach (15), and I regret that there is not time to discuss it here.

It was probably with this linguistic theory in mind that Pollack and Ficks conducted a test on a set of tonal stimuli that varied in eight dimensions, but required only a binary decision on each dimension. With these tones they measured the transmitted information at 6.9 bits, or about 120 recognizable kinds of sounds. It is an intriguing question, as yet unexplored, whether one can go on adding dimensions indefinitely in this way. In human speech there is clearly a limit to the number of dimensions that we use. In this instance, however, it is not known whether the limit is imposed by the nature of the perceptual machinery that must recognize the sounds or by the nature of the speech machinery that must produce them. Somebody will have to do the experiment to find out. There is a limit, however, at about eight or nine distinctive features in every language that has been studied, and so when we talk we must resort to still another trick for increasing our channel capacity. Language uses sequences of phonemes, so we make several judgments successively when we listen to words and sentences. That is to say, we use both simultaneous and successive discriminations in order to expand the rather rigid limits imposed by the inaccuracy of our absolute judgments of simple magnitudes.
These multidimensional judgments are strongly reminiscent of the abstraction experiment of Külpe (14). As you may remember, Külpe showed that observers report more accurately on an attribute for which they are set than on attributes for which they are not set. For example, Chapman (4) used three different attributes and compared the results obtained when the observers were instructed before the tachistoscopic presentation with the results obtained when they were not told until after the presentation which one of the three attributes was to be reported. When the instruction was given in advance, the judgments were more accurate. When the instruction was given afterwards, the subjects presumably had to judge all three attributes in order to report on any one of them and the accuracy was correspondingly lower. This is in complete accord with the results we have just been considering, where the accuracy of judgment on each attribute decreased as more dimensions were added. The point is probably obvious, but I shall make it anyhow, that the abstraction experiments did not demonstrate that people can judge only one attribute at a time. They merely showed what seems quite reasonable, that people are less accurate if they must judge more than one attribute simultaneously.
SUBITIZING

I cannot leave this general area without mentioning, however briefly, the experiments conducted at Mount Holyoke College on the discrimination of number (12). In experiments by Kaufman, Lord, Reese, and Volkmann random patterns of dots were flashed on a screen for 1/5 of a second. Anywhere from 1 to more than 200 dots could appear in the pattern. The subject's task was to report how many dots there were.

The first point to note is that on patterns containing up to five or six dots the subjects simply did not make errors. The performance on these small numbers of dots was so different from the performance with more dots that it was given a special name. Below seven the subjects were said to subitize; above seven they were said to estimate. This is, as you will recognize, what we once optimistically called "the span of attention."
This discontinuity at seven is, of course, suggestive. Is this the same basic process that limits our unidimensional judgments to about seven categories? The generalization is tempting, but not sound in my opinion. The data on number estimates have not been analyzed in informational terms; but on the basis of the published data I would guess that the subjects transmitted something more than four bits of information about the number of dots. Using the same arguments as before, we would conclude that there are about 20 or 30 distinguishable categories of numerousness. This is considerably more information than we would expect to get from a unidimensional display. It is, as a matter of fact, very much like a two-dimensional display. Although the dimensionality of the random dot patterns is not entirely clear, these results are in the same range as Klemmer and Frick's for their two-dimensional display of dots in a square. Perhaps the two dimensions of numerousness are area and density. When the subject can subitize, area and density may not be the significant variables, but when the subject must estimate perhaps they are significant. In any event, the comparison is not so simple as it might seem at first thought.

This is one of the ways in which the magical number seven has persecuted me. Here we have two closely related kinds of experiments, both of which point to the significance of the number seven as a limit on our capacities. And yet when we examine the matter more closely, there seems to be a reasonable suspicion that it is nothing more than a coincidence.
THE SPAN OF IMMEDIATE MEMORY

Let me summarize the situation in this way. There is a clear and definite limit to the accuracy with which we can identify absolutely the magnitude of a unidimensional stimulus variable. I would propose to call this limit the span of absolute judgment, and I maintain that for unidimensional judgments this span is usually somewhere in the neighborhood of seven. We are not completely at the mercy of this limited span, however, because we have a variety of techniques for getting around it and increasing the accuracy of our judgments. The three most important of these devices are (a) to make relative rather than absolute judgments; or, if that is not possible, (b) to increase the number of dimensions along which the stimuli can differ; or (c) to arrange the task in such a way that we make a sequence of several absolute judgments in a row.
The study of relative judgments is one of the oldest topics in experimental psychology, and I will not pause to review it now. The second device, increasing the dimensionality, we have just considered. It seems that by adding more dimensions and requiring crude, binary, yes-no judgments on each attribute we can extend the span of absolute judgment from seven to at least 150. Judging from our everyday behavior, the limit is probably in the thousands, if indeed there is a limit. In my opinion, we cannot go on compounding dimensions indefinitely. I suspect that there is also a span of perceptual dimensionality and that this span is somewhere in the neighborhood of ten, but I must add at once that there is no objective evidence to support this suspicion. This is a question sadly needing experimental exploration.

Concerning the third device, the use of successive judgments, I have quite a bit to say because this device introduces memory as the handmaiden of discrimination. And, since mnemonic processes are at least as complex as are perceptual processes, we can anticipate that their interactions will not be easily disentangled.
Suppose that we start by simply extending slightly the experimental procedure that we have been using. Up to this point we have presented a single stimulus and asked the observer to name it immediately thereafter. We can extend this procedure by requiring the observer to withhold his response until we have given him several stimuli in succession. At the end of the sequence of stimuli he then makes his response. We still have the same sort of input-output situation that is required for the measurement of transmitted information. But now we have passed from an experiment on absolute judgment to what is traditionally called an experiment on immediate memory.

Before we look at any data on this topic I feel I must give you a word of warning to help you avoid some obvious associations that can be confusing. Everybody knows that there is a finite span of immediate memory and that for a lot of different kinds of test materials this span is about seven items in length. I have just shown you that there is a span of absolute judgment that can distinguish about seven categories and that there is a span of attention that will encompass about six objects at a glance. What is more natural than to think that all three of these spans are different aspects of a single underlying process? And that is a fundamental mistake, as I shall be at some pains to demonstrate. This mistake is one of the malicious persecutions that the magical number seven has subjected me to.
My mistake went something like this. We have seen that the invariant feature in the span of absolute judgment is the amount of information that the observer can transmit. There is a real operational similarity between the absolute judgment experiment and the immediate memory experiment. If immediate memory is like absolute judgment, then it should follow that the invariant feature in the span of immediate memory is also the amount of information that an observer can retain. If the amount of information in the span of immediate memory is a constant, then the span should be short when the individual items contain a lot of information and the span should be long when the items contain little information. For example, decimal digits are worth 3.3 bits apiece. We can recall about seven of them, for a total of 23 bits of information. Isolated English words are worth about 10 bits apiece. If the total amount of information is to remain constant at 23 bits, then we should be able to remember only two or three words chosen at random. In this way I generated a theory about how the span of immediate memory should vary as a function of the amount of information per item in the test materials.
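(Annotation: the constant-information hypothesis Miller describes makes a one-line prediction: if the span always holds roughly 23 bits, the number of items recalled should be 23 divided by the bits per item. A small Python sketch of that prediction, using the bit values from the text; as the next paragraphs show, the data refute it:)

```python
import math

TOTAL_BITS = 7 * math.log2(10)   # about 23 bits: seven decimal digits at ~3.3 bits apiece

def predicted_span(bits_per_item: float) -> float:
    """Items recalled if the span held a constant amount of information."""
    return TOTAL_BITS / bits_per_item

print(round(predicted_span(1.0)))            # binary digits: ~23 items predicted
print(round(predicted_span(math.log2(10))))  # decimal digits: 7 items, by construction
print(round(predicted_span(10.0), 1))        # isolated words (~10 bits): 2-3 items
```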
The measurements of memory span in the literature are suggestive on this question, but not definitive. And so it was necessary to do the experiment to see. Hayes (10) tried it out with five different kinds of test materials: binary digits, decimal digits, letters of the alphabet, letters plus decimal digits, and with 1,000 monosyllabic words. The lists were read aloud at the rate of one item per second and the subjects had as much time as they needed to give their responses. A procedure described by Woodworth (20) was used to score the responses.

The results are shown by the filled circles in Fig. 7. Here the dotted line indicates what the span should have been if the amount of information in the span were constant. The solid curves represent the data. Hayes repeated the experiment using test vocabularies of different sizes but all containing only English monosyllables (open circles in Fig. 7). This more homogeneous test material did not change the picture significantly. With binary items the span is about nine and, although it drops to about five with monosyllabic English words, the difference is far less than the hypothesis of constant information would require.
FIG. 7. Data from Hayes (10) on the span of immediate memory plotted as a function of the amount of information per item in the test materials.

FIG. 8. Data from Pollack (16) on the amount of information retained after one presentation plotted as a function of the amount of information per item in the test materials.
There is nothing wrong with Hayes's experiment, because Pollack (16) repeated it much more elaborately and got essentially the same result. Pollack took pains to measure the amount of information transmitted and did not rely on the traditional procedure for scoring the responses. His results are plotted in Fig. 8. Here it is clear that the amount of information transmitted is not a constant, but increases almost linearly as the amount of information per item in the input is increased.

And so the outcome is perfectly clear. In spite of the coincidence that the magical number seven appears in both places, the span of absolute judgment and the span of immediate memory are quite different kinds of limitations that are imposed on our ability to process information. Absolute judgment is limited by the amount of information. Immediate memory is limited by the number of items. In order to capture this distinction in somewhat picturesque terms, I have fallen into the custom of distinguishing between bits of information and chunks of information. Then I can say that the number of bits of information is constant for absolute judgment and the number of chunks of information is constant for immediate memory. The span of immediate memory seems to be almost independent of the number of bits per chunk, at least over the range that has been examined to date.
The contrast of the terms bit and chunk also serves to highlight the fact that we are not very definite about what constitutes a chunk of information. For example, the memory span of five words that Hayes obtained when each word was drawn at random from a set of 1000 English monosyllables might just as appropriately have been called a memory span of 15 phonemes, since each word had about three phonemes in it. Intuitively, it is clear that the subjects were recalling five words, not 15 phonemes, but the logical distinction is not immediately apparent. We are dealing here with a process of organizing or grouping the input into familiar units or chunks, and a great deal of learning has gone into the formation of these familiar units.
RECODING

In order to speak more precisely, therefore, we must recognize the importance of grouping or organizing the input sequence into units or chunks. Since the memory span is a fixed number of chunks, we can increase the number of bits of information that it contains simply by building larger and larger chunks, each chunk containing more information than before.

A man just beginning to learn radiotelegraphic code hears each dit and dah as a separate chunk. Soon he is able to organize these sounds into letters and then he can deal with the letters as chunks. Then the letters organize themselves as words, which are still larger chunks, and he begins to hear whole phrases. I do not mean that each step is a discrete process, or that plateaus must appear in his learning curve, for surely the levels of organization are achieved at different rates and overlap each other during the learning process. I am simply pointing to the obvious fact that the dits and dahs are organized by learning into patterns and that as these larger chunks emerge the amount of message that the operator can remember increases correspondingly. In the terms I am proposing to use, the operator learns to increase the bits per chunk.

In the jargon of communication theory, this process would be called recoding. The input is given in a code that contains many chunks with few bits per chunk. The operator recodes the input into another code that contains fewer chunks with more bits per chunk. There are many ways to do this recoding, but probably the simplest is to group the input events, apply a new name to the group, and then remember the new name rather than the original input events.
Since I am convinced that this process is a very general and important one for psychology, I want to tell you about a demonstration experiment that should make perfectly explicit what I am talking about. This experiment was conducted by Sidney Smith and was reported by him before the Eastern Psychological Association in 1954.
Begin with the observed fact that people can repeat back eight decimal digits, but only nine binary digits. Since there is a large discrepancy in the amount of information recalled in these two cases, we suspect at once that a recoding procedure could be used to increase the span of immediate memory for binary digits. In Table 1 a method for grouping and renaming is illustrated. Along the top is a sequence of 18 binary digits, far more than any subject was able to recall after a single presentation. In the next line these same binary digits are grouped by pairs. Four possible pairs can occur: 00 is renamed 0, 01 is renamed 1, 10 is renamed 2, and 11 is renamed 3. That is to say, we recode from a base-two arithmetic to a base-four arithmetic. In the recoded sequence there are now just nine digits to remember, and this is almost within the span of immediate memory. In the next line the same sequence of binary digits is regrouped into chunks of three. There are eight possible sequences of three, so we give each sequence a new name between 0 and 7. Now we have recoded from a sequence of 18 binary digits into a sequence of 6 octal digits, and this is well within the span of immediate memory. In the last two lines the binary digits are grouped by fours and by fives and are given decimal-digit names from 0 to 15 and from 0 to 31. It is reasonably obvious that this kind of recoding increases the bits per chunk, and packages the binary sequence into a form that can be retained within the span of immediate memory.

TABLE 1
WAYS OF RECODING SEQUENCES OF BINARY DIGITS

Binary Digits (Bits)   1 0 1 0 0 0 1 0 0 1 1 1 0 0 1 1 1 0

2:1   Chunks      10    10    00    10    01    11    00    11    10
      Recoding     2     2     0     2     1     3     0     3     2

3:1   Chunks      101    000    100    111    001    110
      Recoding      5      0      4      7      1      6

4:1   Chunks      1010    0010    0111    0011    10
      Recoding      10       2       7       3

5:1   Chunks      10100    01001    11001    110
      Recoding       20        9       25
So Smith assembled 20 subjects and measured their spans for binary and octal digits. The spans were 9 for binaries and 7 for octals. Then he gave each recoding scheme to five of the subjects. They studied the recoding until they said they understood it—for about 5 or 10 minutes. Then he tested their span for binary digits again while they tried to use the recoding schemes they had studied.

The recoding schemes increased their span for binary digits in every case. But the increase was not as large as we had expected on the basis of their span for octal digits. Since the discrepancy increased as the recoding ratio increased, we reasoned that the few minutes the subjects had spent learning the recoding schemes had not been sufficient. Apparently the translation from one code to the other must be almost automatic or the subject will lose part of the next group while he is trying to remember the translation of the last group.
Since the 4:1 and 5:1 ratios require considerable study, Smith decided to imitate Ebbinghaus and do the experiment on himself. With Germanic patience he drilled himself on each recoding successively, and obtained the results shown in Fig. 9. Here the data follow along rather nicely with the results you would predict on the basis of his span for octal digits. He could remember 12 octal digits. With the 2:1 recoding, these 12 chunks were worth 24 binary digits. With the 3:1 recoding they were worth 36 binary digits. With the 4:1 and 5:1 recodings, they were worth about 40 binary digits.

FIG. 9. The span of immediate memory for binary digits is plotted as a function of the recoding procedure used. The predicted function is obtained by multiplying the span for octals by 2, 3 and 3.3 for recoding into base 4, base 8, and base 10, respectively.

It is a little dramatic to watch a person get 40 binary digits in a row and then repeat them back without error. However, if you think of this merely as a mnemonic trick for extending the memory span, you will miss the more important point that is implicit in nearly all such mnemonic devices. The point is that recoding is an extremely powerful weapon for increasing the amount of information that we can deal with. In one form or another we use recoding constantly in our daily behavior.
In my opinion the most customary kind of recoding that we do all the time is to translate into a verbal code. When there is a story or an argument or an idea that we want to remember, we usually try to rephrase it "in our own words." When we witness some event we want to remember, we make a verbal description of the event and then remember our verbalization. Upon recall we recreate by secondary elaboration the details that seem consistent with the particular verbal recoding we happen to have made. The well-known experiment by Carmichael, Hogan, and Walter (3) on the influence that names have on the recall of visual figures is one demonstration of the process. The inaccuracy of the testimony of eyewitnesses is well known in legal psychology, but the distortions of testimony are not random—they follow naturally from the particular recoding that the witness used, and the particular recoding he used depends upon his whole life history. Our language is tremendously useful for repackaging material into a few chunks rich in information. I suspect that imagery is a form of recoding, too, but images seem much harder to get at operationally and to study experimentally than the more symbolic kinds of recoding.

It seems probable that even memorization can be studied in these terms. The process of memorizing may be simply the formation of chunks, or groups of items that go together, until there are few enough chunks so that we can recall all the items. The work by Bousfield and Cohen (2) on the occurrence of clustering in the recall of words is especially interesting in this respect.
SUMMARY

I have come to the end of the data that I wanted to present, so I would like now to make some summarizing remarks.

First, the span of absolute judgment and the span of immediate memory impose severe limitations on the amount of information that we are able to receive, process, and remember. By organizing the stimulus input simultaneously into several dimensions and successively into a sequence of chunks, we manage to break (or at least stretch) this informational bottleneck.

Second, the process of recoding is a very important one in human psychology and deserves much more explicit attention than it has received. In particular, the kind of linguistic recoding that people do seems to me to be the very lifeblood of the thought processes. Recoding procedures are a constant concern to clinicians, social psychologists, linguists, and anthropologists and yet, probably because recoding is less accessible to experimental manipulation than nonsense syllables or T mazes, the traditional experimental psychologist has contributed little or nothing to their analysis. Nevertheless, experimental techniques can be used, methods of recoding can be specified, behavioral indicants can be found. And I anticipate that we will find a very orderly set of relations describing what now seems an uncharted wilderness of individual differences.

Third, the concepts and measures provided by the theory of information provide a quantitative way of getting at some of these questions. The theory provides us with a yardstick for calibrating our stimulus materials and for measuring the performance of our subjects. In the interests of communication I have suppressed the technical details of information measurement and have tried to express the ideas in more familiar terms; I hope this paraphrase will not lead you to think they are not useful in research. Informational concepts have already proved valuable in the study of discrimination and of language; they promise a great deal in the study of learning and memory; and it has even been proposed that they can be useful in the study of concept formation. A lot of questions that seemed fruitless twenty or thirty years ago may now be worth another look. In fact, I feel that my story here must stop just as it begins to get really interesting.
And finally, what about the magical number seven? What about the seven wonders of the world, the seven seas, the seven deadly sins, the seven daughters of Atlas in the Pleiades, the seven ages of man, the seven levels of hell, the seven primary colors, the seven notes of the musical scale, and the seven days of the week? What about the seven-point rating scale, the seven categories for absolute judgment, the seven objects in the span of attention, and the seven digits in the span of immediate memory? For the present I propose to withhold judgment. Perhaps there is something deep and profound behind all these sevens, something just calling out for us to discover it. But I suspect that it is only a pernicious, Pythagorean coincidence.
REFERENCES

1. BEEBE-CENTER, J. G., ROGERS, M. S., & O'CONNELL, D. N. Transmission of information about sucrose and saline solutions through the sense of taste. J. Psychol., 1955, 39, 157-160.
2. BOUSFIELD, W. A., & COHEN, B. H. The occurrence of clustering in the recall of randomly arranged words of different frequencies-of-usage. J. gen. Psychol., 1955, 52, 83-95.
3. CARMICHAEL, L., HOGAN, H. P., & WALTER, A. A. An experimental study of the effect of language on the reproduction of visually perceived form. J. exp. Psychol., 1932, 15, 73-86.
4. CHAPMAN, D. W. Relative effects of determinate and indeterminate Aufgaben. Amer. J. Psychol., 1932, 44, 163-174.
5. ERIKSEN, C. W. Multidimensional stimulus differences and accuracy of discrimination. USAF, WADC Tech. Rep., 1954, No. 54-165.
6. ERIKSEN, C. W., & HAKE, H. W. Absolute judgments as a function of the stimulus range and the number of stimulus and response categories. J. exp. Psychol., 1955, 49, 323-332.
7. GARNER, W. R. An informational analysis of absolute judgments of loudness. J. exp. Psychol., 1953, 46, 373-380.
8. HAKE, H. W., & GARNER, W. R. The effect of presenting various numbers of discrete steps on scale reading accuracy. J. exp. Psychol., 1951, 42, 358-366.
9. HALSEY, R. M., & CHAPANIS, A. Chromaticity-confusion contours in a complex viewing situation. J. Opt. Soc. Amer., 1954, 44, 442-454.
10. HAYES, J. R. M. Memory span for several vocabularies as a function of vocabulary size. In Quarterly Progress Report, Cambridge, Mass.: Acoustics Laboratory, Massachusetts Institute of Technology, Jan.-June, 1952.

Discussion

Tl;dr: this paper introduces what is sometimes called "Miller's law", which argues that the number of objects an average human can hold in their short-term memory is 7 ± 2. It brings information theory to memory, with the main unit of information being the bit: the amount of data required to choose between two equally likely choices. Miller conducted different experiments across different tasks to reach the magical number! Before this paper, while psychologists distinguished between short- and long-term memory, the limits to short-term memory were not known or well studied.

George Miller (1920-2012) was an American psychologist, national medal of science winner, and among the founders of cognitive psychology, cognitive science, and psycholinguistics. Miller helped move psychology into the realm of mental processes and helped align that move with information theory, computation theory, and linguistics. For more on Miller: https://en.wikipedia.org/wiki/George_Armitage_Miller

Here is where Miller first creates the analogy to information theory. Previously, psychology was disconnected from the realm of mental processes, but he made things a bit more quantitative with this integer… Information theory was raging at the time of this paper, and Claude Shannon even wrote an article about how too many people were just applying information theory to their fields, even if it did not make sense. Here is that paper, "The Bandwagon": https://fermatslibrary.com/s/the-bandwagon

Here is a bit more background on Claude Shannon's seminal work on information theory, "A Mathematical Theory of Communication": https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

Channel capacity is the tight upper bound on the rate at which information can be reliably transmitted over a communication channel; Claude Shannon defined the notion of channel capacity and a mathematical model by which it could be computed in 1948. For more on channel capacity: https://en.wikipedia.org/wiki/Channel_capacity

In information theory, a bit is the entropy of a binary random variable that can be 0 or 1 with equal probability. For more on the bit, read here: https://en.wikipedia.org/wiki/Bit

Strange that the "Air Force Operational Applications Laboratory" was one of the active groups in this area…

Interesting thought: "We might argue that in the course of evolution those organisms were most successful that were responsive to the widest range of stimulus energies in their environment. In order to survive in a constantly fluctuating world, it was better to have a little information about a lot of things than to have a lot of information about a small segment of the environment."

The chunk is a fundamental unit that Miller introduces in this paper, one that will go on to be used in psychology to this day. This is also a very important concluding point: "First, the span of absolute judgment and the span of immediate memory impose severe limitations on the amount of information that we are able to receive, process, and remember. By organizing the stimulus input simultaneously into several dimensions and successively into a sequence of chunks, we manage to break (or at least stretch) this informational bottleneck."

And finally, what about the magical number seven…. Seven still remains in popular culture and academia as a "magical" number, much to the chagrin of Miller, who later joked that he felt stalked and harassed by the integer...

This is one of the most highly cited papers in psychology, with over 37K citations. More here: https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two