558
IEEE TRANSACTIONS
ON
PATTERN ANALYSIS AND MACHINE INTELLIGENCE.
VOL.
12.
NO.
6.
JUNE
1990
1.
d4
Qf6
2.
cl
c3
3,
dS
b3
I.
a‘3
1.
d4
mf6
2.
cl
e3
3.
da
bl
1.
mrl3
Fig. 14. A line of text containing enough impasses to induce failure of the
semantic analysis. The top line is the original bitmap, after dirt frag-
ments have been deleted, sometimes erroneously
(as
in
“f”).
The bot-
tom line shows the classifier’s top choices, with bad matches enclosed
in boxes.
seconds per game, but of course with a large variance
since several games required up to one CPU minute and
a few (not averaged in) never terminated normally.
IX.
DISCUSS~ON
The
Chess
Informant
is an unusual object of machine
vision in that its content has effectively- and efficiently-
computable semantics. The remarkably high competency
achieved by exploiting this fact should encourage further
attacks on images with similarly conventional, unambig-
uous
consistency constraints.
In other respects, however, the
Informant
is represen-
tative of a broad class of printed documents. Nonrectili-
near layout, tight column- and line-spacing, and broken,
touching, or dirty character-images all occur in other doc-
uments. For this reason, we believe that several of the
methods first applied here will be useful in achieving high
performance in the general case.
The layout analysis approach
is
unusual in following a
strictly top-down policy. Relying on a minimum of
a
priori
assumptions, it is characterized by a sequence of
refinement steps ordered
so
that the broadest statistical
support may be applied to the inference of layout param-
eters.
So,
for example, text line orientation is estimated
from evidence distributed over the whole page, word-
break spacing over each column, and baseline is deter-
mined by consensus within entire text lines. The success
of this strategy, without recourse to backtracking, in the
face of manifold layout distortions is largely due to the
accuracy of the skew-finding method.
We attempted to base as much
of
the error management
as possible on a single robust measure of merit. The clas-
sifier’s template match score, occasionally modified (by,
e.g., height above baseline) or combined (to score words
and ply), was used to control virtually all later analysis.
The attempt to apply semantic interpretation to long se-
quences of text ran a substantial risk of intolerable run-
times due to combinatorial explosion. Without a basis of
consistently good results from the early states of layout
analysis and character-shape recognition, the effort would
have collapsed.
A striking feature of the engineering design is a pro-
gression from fast heuristics in the early stages to slower
and more exhaustive algorithms at the end. For example,
it is not until text has been segmented into words that a
nearly-exhaustive search is unleashed to fix broken,
touching, and dirty characters. Full backtracking is re-
served for the last stage of semantic processing after sev-
eral kinds of global and local context has been applied to
prune interpretations to a manageable number.
ACKNOWLEDGMENT
An earlier version of the paper appeared as
[3].
The
character-splitting heuristic was a variant of one devel-
oped by
S.
Kahan. The UNIBUS interface for the Ricoh
scanner was designed by
J.
Condon and built by W. Moy.
Its UNIX driver was written by
L.
Cherry.
REFERENCES
[I]
H.
S.
Baird, “The skew angle
of
printed documents,” in Proc. SPSE
40th Con$ Symp. Hybrid Imaging Systems, Rochester, NY, May
121
W.
W.
Bledsoe and I. Browning, “Pattern recognition and reading
by machine,” in 1959 Proc. Eastern Joint Computer Conf., 1959.
131
H.
S.
Baird and K. Thompson, “Reading chess,” in Proc. IEEE
Comput. Soc. Workshop Computer Vision. Miami Beach, FL, Nov.
30-Dec. 2, 1987.
[4] J.
H.
Condon and K. Thompson, “Belle,” in Chess Skill in Man and
Machine, P. Frey, Ed.
[5] Chess Informant. vols. 29, 30, 33. Beograd, Yugoslavia: Sahovski
Informator. 1980- 1982.
[6] The Chicago Manual of
Style.
13th ed. Chicago,
1L:
University of
Chicago Press, 1982, p. 254.
[7]
J.
J.
Hull.
and
S.
N.
Srihai-i, “Experiments in text recognition with
1987, pp. 21-24.
New York: Springer-Verlag, 1982.-
binary n-gram and Viterbi analysis,”
IEEE
Trans. Pattern Anal. MA-
chine Intell.
,
vol. PAMI-4, pp. 520-530, Sept. 1982.
S.
Kahan,
T.
Pavlidis, and
H.
S.
Baird, “On the recognition
of
printed
characters
of
any font and size,” IEEE Trans. Pattern Anal. Machine
Intell.,
vol. PAMI-9, no. 2,
pp.
274-288, Mar. 1987.
H.
0.
Kida,
0.
Iwaki. and
K.
Kawada. “Document recognition sys-
tem for office automation.” in Proc. 8th
Inr.
Con$ Pattern Recog-
nition. Paris. France. Oct. 1986.
pp.
446-448.
E.
Meynieux,
S.
Seisen, and
K.
Tombre, “Bilevel information cod-
ing in office paper documents,” in Proc. 8th
Int.
Con$
Pattern Rec-
ognition, Paris, France, Oct. 1986, pp. 442-445.
W.
S.
Rosenbaum and J. J. Hilliard, “Multifont OCR postprocessing
system,”
IBM
J.
Res.
Develop.,
vol.
19,
pp.
398-421, July 1975.
H. F. Schantz, The History of OCR, Recognition Technologies Users
Assoc., 1982.
S.
N. Srihari,
J.
J. Hull, and R. Choudhari, “Integrating diverse
knowledge sources in text recognition.” ACM Trans.
Ofice
Inform.
S~sr.,
vol.
I,
pp.
68-87, Jan. 1983.
R. Shinghal, and G. T. Toussaint, “Experiments in text recognition
with the modified Viterbi algorithm,” IEEE Trans. Pattern Anal. Ma-
chine lnte/l., vol. PAMI-I, pp. 184-193, Apr. 1979.
S.
N. Srihari and
G.
W.
Zack, “Document image analysis,” in Proc.
8th
Int.
Con$
Pattern Recognition, Paris, France, Oct. 1986,
pp.
434-
436.
1.
R. Ullman, “A binary n-gram technique for automatic correction
of substitution, deletion, insertion, and reversal errors in words,”
Comput.
J.,
vol. 20,
pp.
141-147, 1977.