Are Titles of Chemical Papers Becoming More Informative?
The efficiency of key-word-in-context (KWIC) permuted-title indexes and their numerous variations is highly dependent upon authors’ choices of titles for their papers. Titles are important not only in commercial services, such as Chemical Titles, BASIC, Current Contents, and CA Condensates, but also in scanning primary journals, and in traditional library services, such as bibliographies. It is generally believed and often stated that titles of chemical papers are becoming more informative as authors become increasingly aware of the importance of titles as “carriers” of information. The present study was undertaken to test whether (1) titles of chemical papers are becoming more informative and (2) whether uninformative titles of chemical papers are being eliminated since the advent of the KWIC index in 1958.
Introduction
The key-word-in-context (KWIC) permuted-title index was introduced by H. P. Luhn in 1958 as a prompt, relatively inexpensive means of building a temporary bridge between the contents of the current literature and readers, awaiting the completion of the more slowly prepared conventional indexes. Luhn himself recognized that the quality of the KWIC index could not equal that of some of the more carefully prepared conventional indexes. However, because of its low cost and ease of preparation, the KWIC index and its variations have replaced the conventional index in many cases.

One of the strongest objections voiced against such indexes is that titles of papers are unsatisfactory as a basis for subject index entries because titles are not composed with indexing in mind. The adequacy of titles as a source of subject content clues has been given much attention in the last decade, and many studies have been published on the subject.
† Present address: UNESCO, Paris, France.
The first hypothesis was tested by comparing titles published in 1948, 1958, and 1968 by the following criteria: (1) a count of substantive words in the title; (2) a count of all word matches between title and abstract, with and without the use of a thesaurus; and (3) a count of word matches between title and 10 leading substantive words selected from the abstract, with and without the use of a thesaurus. The second hypothesis was tested by comparing a count of short titles (with 3 or less substantive words) published in 1948, 1958, and 1968.

Results confirm that uninformative titles of chemical papers are being eliminated and that informative titles are becoming more informative since the advent of the KWIC index.
JACQUES J. TOCATLIAN†
Merck Sharp & Dohme Research Laboratories
West Point, Pennsylvania
The results and conclusions of these studies on the adequacy of titles vary. There is, however, a recognition of the obvious: The efficiency of a KWIC index is dependent upon authors’ choices of titles. In fact, the importance of informative titles has been stressed in the literature for many years.

There developed a frequently stated conviction that the very existence of permuted-title indexes would stimulate authors to use better titles for their papers, as exemplified by J. D. Black’s statement in 1962: “Before long the engineer, scientist, or mathematician will realize that if his title is not descriptive enough his paper will not be used as much as it might be.”

The belief that titles would become more informative is sometimes translated into statements that they actually did. However, none of the statements made verbally at professional meetings or in writing were based on statistical evidence.

Recognition that evidence in support of, or in contradiction to, the belief that titles are becoming more informative could be important in future index building led to the undertaking of the present study.
Journal of the American Society for Information Science - September-October 1970, p. 345
Titles are important as “carriers” of information not only in permuted-title indexes, such as Chemical Titles and BASIC (the index to Biological Abstracts), but also in tape services, such as CA Condensates, and in widely read alerting services, such as Current Contents. Titles are important for readers who scan the primary journals, and in traditional library services, such as bibliographies and lists of references.

It seemed important to ask at this point, Are titles of chemical papers really becoming more informative?

The present study tests the following two hypotheses:

1. As rated by the criteria adopted and validated, titles of chemical papers are becoming more informative since the advent of the KWIC index.
2. As rated by the criterion adopted, uninformative titles of chemical papers are being eliminated since the advent of the KWIC index.

Both hypotheses play a part in how informative titles are: Elimination of uninformative titles results in a proportional increase in informative ones; independently, informative titles may become more informative.
Hypothesis No. 1
The first hypothesis (titles are becoming more informative) was tested by using 5 criteria, or measurements, to compare titles of chemical papers published in 1948, 1958, and 1968.
CRITERIA
Measurement A: all substantive words. A count of all substantive words in the title.

For the purpose of this work it was found practical to isolate substantive words by rejecting all nonsubstantive words. These were defined as noninformative words that convey little or no information by themselves, such as articles, prepositions, conjunctions, pronouns, and auxiliary verbs. An example will be given shortly.

Measurement B: all matches. A count of the number of substantive words in the title that match corresponding words in the abstract, using a thesaurus concept in making matches.

The thesaurus allowed a match between the singular form of a word and its plural form; between synonyms; between specific and generic terms; between a chemical compound and its formula; between verbs in different tenses; and, in general, between terms conveying the same concept. Multiple appearances of key words in titles were counted as one appearance.

Measurement C: level matches. A count of the number of substantive words in the title that match identical substantive words in the abstract, without using a thesaurus.

In this case nouns in singular and plural forms or verbs in different tenses were not counted as matches; neither were synonyms nor chemical compounds and their formulas.

The difference between measurements B and C is the use in B of a thesaurus in making matches.
Before subjecting a title to measurements D and E, each abstract was read and 10 substantive words that seemed to describe best the subject content of the abstract were selected (referred to as 10 “leading” terms). The choice was subjective, in much the same way an indexer selects indexing terms to describe a paper.

Measurement D: all leading matches. A count of the number of substantive words in the title that match any of the 10 “leading” terms, using a thesaurus.

Measurement E: level leading matches. A count of the number of substantive words in the title that match any of the 10 “leading” terms, without using a thesaurus.
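The five measurements can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the stop list `NONSUBSTANTIVE`, the `COMPOUND_TERMS` list, the toy `THESAURUS`, and the function names are all hypothetical, since the paper describes its word classes and thesaurus only in general terms. Measurement A is the length of the list of substantive terms; B and C compare the title with the abstract, with and without the thesaurus; D and E would call the same function with the 10 leading terms in place of the abstract terms.

```python
import re

# Hypothetical stop list: the paper names the word classes it rejects
# (articles, prepositions, conjunctions, pronouns, auxiliary verbs),
# not the exact words.
NONSUBSTANTIVE = {
    "a", "an", "the", "of", "for", "in", "on", "and", "or", "to",
    "by", "with", "is", "are", "has", "been", "its", "their",
}

# Hypothetical helpers: multi-word terms counted as one word, and a toy
# thesaurus that maps equivalent terms to one canonical form.
COMPOUND_TERMS = ["hydrogen cyanide"]
THESAURUS = {
    "test": "method",
    "determining": "determination",
    "atmosphere": "air",
    "atmospheres": "air",
}


def substantive_terms(text):
    """Distinct substantive terms of `text`, in order of first appearance."""
    text = text.lower()
    for term in COMPOUND_TERMS:
        text = text.replace(term, term.replace(" ", "_"))
    terms = []
    for word in re.findall(r"[a-z_]+", text):
        if word not in NONSUBSTANTIVE and word not in terms:
            terms.append(word)
    return terms


def count_matches(title, reference_terms, use_thesaurus):
    """Count title terms that also occur among `reference_terms`."""
    def normalize(word):
        return THESAURUS.get(word, word) if use_thesaurus else word

    reference = {normalize(term) for term in reference_terms}
    return sum(1 for term in substantive_terms(title) if normalize(term) in reference)


TITLE = "A Field Method for the Rapid Determination of Hydrogen Cyanide in Air"
ABSTRACT_SENTENCE = ("A field test for determining small amounts of hydrogen "
                     "cyanide in industrial atmospheres is based on formation "
                     "of a Prussian blue colour on test paper.")

abstract_terms = substantive_terms(ABSTRACT_SENTENCE)
measurement_a = len(substantive_terms(TITLE))                # all substantive words
measurement_b = count_matches(TITLE, abstract_terms, True)   # all matches
measurement_c = count_matches(TITLE, abstract_terms, False)  # level matches
print(measurement_a, measurement_b, measurement_c)
```

Only the first sentence of the sample abstract is used here, so the counts depend on that choice as well as on the hypothetical word lists. Keeping the thesaurus as a plain mapping makes the difference between the "all" and "level" measurements a single flag.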
The results of the following title subjected to measurements A through E are shown in Table 1.

A FIELD METHOD FOR THE RAPID DETERMINATION OF HYDROGEN CYANIDE IN AIR
B. E. DIXON, G. C. HANDS and A. F. F. BARTLETT
(Department of the Government Chemist, Clement’s Inn Passage, Strand, W.C.2)

A field test for determining small amounts of hydrogen cyanide in industrial atmospheres is based on formation of a Prussian blue colour on test paper impregnated with ferrous sulphate and sodium hydroxide. The test is specific for hydrogen cyanide. The behaviour of a number of possibly interfering gases has been investigated. The test is sensitive to slightly less than 1 p.p.m. of hydrogen cyanide in air and has an error of 10 to 20 per cent. The blue stains obtained are permanent. Test papers properly prepared and stored retain their activity for at least 10 months.

Analyst, 83: 199 (1958)

TABLE 1. Results of measurements A through E made on a sample title

| A: All substantive words | B: All matches | C: Level matches | D: All leading matches | E: Level leading matches |
|---|---|---|---|---|
| Field | Field | Field | - | - |
| Method | Test | - | Method | - |
| Rapid | - | - | - | - |
| Determination | Determining | - | Determination | Determination |
| Hydrogen cyanide | Hydrogen cyanide | Hydrogen cyanide | Hydrogen cyanide | Hydrogen cyanide |
| Air | Atmospheres | - | Air | - |
| A = 6 | B = 5 | C = 2 | D = 4 | E = 2 |

10 leading terms: Analysis, Test, Color, Determination, Hydrogen cyanide, Atmospheres, Industrial, Paper, Ferrous sulfate, Sodium hydroxide.

The following terms, shown in Table 1, were considered equivalent:

determination = determining
air = atmosphere
method = test
SELECTION OF SAMPLE
The criteria for journals to be included were that they be:

1. In the field of chemistry
2. In the English language
3. Published continuously since 1948
4. Committed to publishing an abstract or summary with every article

The universe selected for this study was confined to the chemical journal collection of an Industrial Research Library (Merck & Co., Inc.; Rahway, N.J.). From the periodical holding list of the 736 journals in this library, it was found that 96 were in English with issues back to at least 1948. Of these, 64 carried an abstract with each article. Ten out of the 64 dealt with chemistry and, therefore, constituted the journal sample for this study.

The following journals fulfill the selection criteria and are complete for the years 1948, 1958, and 1968, except as noted in parentheses:

1. Acta Chemica Scandinavica (1968: missing August through December)
2. Analyst (1968: missing December)
3. Analytical Chemistry
4. Discussions of the Faraday Society (1967: complete)
5. Journal of the American Chemical Society
6. Journal of the Chemical Society (1968: missing B11, C22, and C25)
7. Journal of Organic Chemistry
8. Journal of Physical Chemistry
9. Journal of Research of the National Bureau of Standards
10. Transactions of the Faraday Society (1968: missing January through June and December)

Using a table of random numbers, 10 articles for each of the 3 years for each of the 10 journals were selected. The total number of articles per journal was, therefore, 30, and the total number of articles in the sample was 300.
VALIDATION OF MEASUREMENTS A THROUGH E
Before any measurement was taken, the 300 titles constituting the sample were read and a subjective selection was made of 20 informative titles and 20 uninformative titles. The articles themselves, or their abstracts, were not read; the subjective choice was made based exclusively on titles. No attempt was made to select the 20 best, or most informative titles, and the 20 least informative; but, rather, the first 20 titles encountered that seemed to convey a fair amount of information about the article and, conversely, the first 20 titles that seemed too general, vague, ambiguous, or uninformative were chosen. These titles were used for the validation of measurements; some examples follow.
Informative Titles

1. Dinickel Phosphide as a Heterogeneous Catalyst for the Vapor Phase Reduction of Nitrobenzene With Hydrogen to Aniline and Water (Journal of the American Chemical Society)
2. The Heat Capacity, Heat of Fusion and Entropy of Benzene (Journal of the American Chemical Society)
3. The Kinetics of Nitrogen Formation From Nitrous Acid and Ammonium or Methylammonium Ions (Journal of Physical Chemistry)
4. Vulcanization of Synthetic Rubbers by the Peachy Process (Journal of Research of the National Bureau of Standards)
5. The Determination of Magnesium in Silicate and Carbonate Rocks by the Titan Yellow Spectrophotometric Method (Analyst)

Uninformative Titles

1. A Note on the Chemistry of Enterogastrone (Acta Chemica Scandinavica)
2. Variability and Inhomogeneity of Aluminum Dilaurate (Journal of Physical Chemistry)
3. Some Physical Properties of Uranium Hexafluoride (Transactions of the Faraday Society)
4. Studies on the Chemistry of Lichens (Acta Chemica Scandinavica)
5. The Preparation of Some Iodinated Phenylalkanoic Acids (Journal of Organic Chemistry)
Table 2 shows the results obtained when these 20 informative and 20 uninformative titles were subjected to measurements A through E.

TABLE 2. Measurements A through E for informative and uninformative titles

| Type of title | A: All substantive words | B: All matches | C: Level matches | D: All leading matches | E: Level leading matches |
|---|---|---|---|---|---|
| Mean value of 20 informative titles | 5.8 | 5.3 | 5.3 | 5.0 | 4.6 |
| Mean value of 20 uninformative titles | 4.4 | 2.2 | 1.5 | 2.1 | 1.6 |
| Difference between informative and uninformative titles | 1.4 | 3.1 | 3.6 | 2.9 | 3.0 |
The standard analysis of variance technique was used for all measurements. The differences in measurements A through E between the 20 informative and 20 uninformative titles are statistically significant (probability less than 0.001). These results support the validity of the criteria adopted for the analysis of titles.

It might be suggested that these criteria be used as a general method for evaluating titles. It seems that this general method could very well be applied to the analysis of titles in other disciplines, such as Biology and Engineering, and the results compared to those presented here.
RESULTS
Table 3 shows the mean values of measurements A through E for the 10 journals. In all cases no statistically significant difference was detected between 1948 and 1958, whereas a statistically significant difference was found between 1958 and 1968 for all measurements.

Table 4 shows the difference between decades in mean values of A through E. We can see that titles of chemical papers in 1968 contained, on the average, a statistically significant increase in the number of substantive words over those of 1958 or 1948 and a statistically significant increase in the number of substantive words that matched corresponding words in the abstract over those of 1958 or 1948. When substantive words in the title were compared with 10 leading terms selected subjectively from the abstract to describe its content, there was a statistically significant increase in the number of matches in 1968 chemical papers over those of 1958 or 1948.

Figure 1 shows a plot of the mean values of measurements A through E versus time. We can, therefore, conclude that our first hypothesis is valid: As rated by the criteria adopted, titles of chemical papers are becoming more informative since the advent of the KWIC index.
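The decade differences reported in Table 4 follow directly from the mean values in Table 3. The short sketch below recomputes them; the dictionary layout and the name `decade_differences` are illustrative choices rather than anything taken from the paper, and the sketch says nothing about statistical significance, which the author assessed separately by analysis of variance.

```python
# Mean values of measurements A through E, copied from Table 3.
MEANS = {
    "A": {1948: 5.57, 1958: 5.46, 1968: 6.77},
    "B": {1948: 4.13, 1958: 4.02, 1968: 4.83},
    "C": {1948: 3.64, 1958: 3.39, 1968: 4.17},
    "D": {1948: 3.62, 1958: 3.65, 1968: 4.35},
    "E": {1948: 3.08, 1958: 3.01, 1968: 3.67},
}

# Pairs of years compared in Table 4 (later year first).
PAIRS = [(1968, 1958), (1968, 1948), (1958, 1948)]


def decade_differences(means):
    """Return {measurement: {"later-earlier": difference}} rounded to 2 places."""
    return {
        name: {
            f"{later}-{earlier}": round(by_year[later] - by_year[earlier], 2)
            for later, earlier in PAIRS
        }
        for name, by_year in means.items()
    }


for name, diffs in decade_differences(MEANS).items():
    print(name, diffs)
```

Every 1968 difference is positive, while the 1958-1948 differences are close to zero or negative, which is the pattern the text describes.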
TABLE 3. Mean values of measurements A through E for the 10 journals [100 titles per year]

| Measurement | 1968 | 1958 | 1948 |
|---|---|---|---|
| A: All substantive words | 6.77 | 5.46 | 5.57 |
| B: All matches | 4.83 | 4.02 | 4.13 |
| C: Level matches | 4.17 | 3.39 | 3.64 |
| D: All leading matches | 4.35 | 3.65 | 3.62 |
| E: Level leading matches | 3.67 | 3.01 | 3.08 |

TABLE 4. Difference between decades in mean values of A through E [Mean values are given for 100 titles per year]

| Difference | A: All substantive words | B: All matches | C: Level matches | D: All leading matches | E: Level leading matches |
|---|---|---|---|---|---|
| 1968-1958 | 1.31 | 0.81 | 0.78 | 0.70 | 0.66 |
| 1968-1948 | 1.20 | 0.70 | 0.53 | 0.73 | 0.59 |
| 1958-1948 | -0.11 | -0.11 | -0.25 | 0.03 | -0.07 |

Hypothesis No. 2
To test the second hypothesis (uninformative titles of chemical papers are being eliminated since the advent of the KWIC index), we compared a count of short titles (with three or less substantive words) published in 1948, 1958, and 1968. The following is a list of examples of titles with three or less substantive words:
1. New Hafnium Phosphides (Acta Chemica Scandinavica, 1968)
2. 3,3-Dinitro-1-Alkanols (Journal of Organic Chemistry, 1958)
3. Slow Combustion of Methyl Ethyl Ketone (Transactions of the Faraday Society, 1968)
4. Heats of Adsorption (Journal of Physical Chemistry, 1948)
5. Alginic Acid Acetate (Journal of the Chemical Society, 1948)
6. Recent Studies on Sialic Acid (Acta Chemica Scandinavica, 1958)
7. The Viscosity of Dilute Emulsions (Transactions of the Faraday Society, 1958)
8. Some N-Arylsulfonyl-N'-Alkylureas (Journal of Organic Chemistry, 1958)
9. Phase Equilibrium Description (Journal of Physical Chemistry, 1948)
10. Determination of Aspartic Acid (Analytical Chemistry, 1948)
11. Diffusion Studies With Lysolecithin (Journal of the Chemical Society, 1958)
12. Synthesis of Ketodextrans (Acta Chemica Scandinavica, 1968)
13. Octaethylporphyrin (Journal of Organic Chemistry, 1968)
14. Microeffusiometry (Analytical Chemistry, 1948)
15. Investigations in Serum Copper (Acta Chemica Scandinavica, 1948)

FIG. 1. Mean values A through E versus time (x-axis: year, 1948 to 1968). A = All substantive words; B = All matches; C = Level matches; D = All leading matches; E = Level leading matches.
How many substantive words are needed to make an informative title?

It would be very difficult, if not impossible, to state the answer quantitatively. However, “six substantive words” has been stated arbitrarily in the literature as a minimum number to make an informative title. Chemical Titles, for example, had, in 1963, an average of six key words per title.

In the present study, a sample of 20 titles selected subjectively as being informative showed an average of 5.8 substantive words per title. On the other hand, a sample of 20 titles selected subjectively as being uninformative showed an average of 4.7 substantive words per title.
It seemed logical to assume that titles with three or less substantive words are most likely uninformative, the thinking being that if six substantive words represent an average, four to eight could be taken as the range.

It was reasoned that by counting the number of titles with three or less substantive words in a given journal for 1968, 1958, and 1948, and comparing the percentages of such titles, it could be established whether there has been a discernible tendency for the uninformative title to disappear. If uninformative titles are being eliminated, this is one means by which titles are becoming, overall, more informative.

By this criterion, the title of this article would not rate as informative. In fact, it is not a good title; it asks a question but does not indicate what was done.
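The short-title criterion reduces to a proportion: the share of titles in a journal and year that have three or fewer substantive words. The sketch below computes that share and attaches an approximate 90% interval. The paper reports 90% confidence limits but does not say how they were obtained, so the Wilson score interval used here is an assumption, as are the function names and the example counts.

```python
import math


def short_title_share(substantive_counts, max_words=3):
    """Fraction of titles with at most `max_words` substantive words."""
    short = sum(1 for count in substantive_counts if count <= max_words)
    return short / len(substantive_counts)


def wilson_interval(successes, total, z=1.645):
    """Wilson score interval for a proportion; z = 1.645 gives about 90%."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Hypothetical data: substantive-word counts for four titles.
print(short_title_share([2, 5, 3, 7]))

# Hypothetical counts: 159 short titles out of 1,000 articles.
low, high = wilson_interval(159, 1000)
print(f"{low:.3f} to {high:.3f}")
```

The interval narrows as the number of articles grows, which is one reason a journal-year with few articles would carry wider limits than one with many.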
SELECTION OF SAMPLE
Every article published in 1948, 1958, and 1968 for the 10 journals selected for the study was taken into account. The only exception was the articles in the issues missing from the collection at the time of sampling and the articles in French or German published in Acta Chemica Scandinavica.

| Year | Number of articles |
|---|---|
| 1948 | 3,061 |
| 1958 | 5,780 |
| 1968 | 7,499 |
| Total, 1948-68 | 16,340 |
RESULTS
By observing the data obtained (Table 5), it was obvious that journals fell in three general groups. The first group (Journal of Organic Chemistry) shows 31.7% titles with three or less substantive words in 1948, decreasing to 29.6% in 1958, and dropping steeply to 15.9% in 1968. It would seem that any improvement observed in the titles of this first group from 1948 to 1968 would derive from the elimination of short titles. Measurements A through E for this group did not detect an improvement in the titles (that is, no statistically significant difference in any A through E value over the years).

The second group consists of the three fundamental journals Acta Chemica Scandinavica, Journal of the American Chemical Society, and Journal of the Chemical Society. The mean percentage of short titles for this group decreased from 24.5% in 1948, to 13.3% in 1958, to 10.9% in 1968. This group of journals had a lower percentage of short titles for the years 1948, 1958, and 1968 than group I.
TABLE 5. Mean values (90% confidence limits) of the number of titles with three or less substantive words

| Journal | 1968 | 1958 | 1948 |
|---|---|---|---|
| Group I (Journal of Organic Chemistry) | 15.9% (14.24%, 17.64%) | 29.6% (27.02%, 32.25%) | 31.7% (25.54%, 38.22%) |
| Group II (Acta Chemica Scandinavica, Journal of the American Chemical Society, and Journal of the Chemical Society) | 10.9% (10.13%, 11.70%) | 13.3% (12.34%, 14.29%) | 24.5% (22.94%, 26.10%) |
| Group III (Analyst, Analytical Chemistry, Discussions of the Faraday Society, Journal of Physical Chemistry, Journal of Research of the National Bureau of Standards, and Transactions of the Faraday Society) | 4.1% (3.40%, 4.85%) | 9.8% (8.64%, 11.02%) | 14.4% (12.52%, 16.39%) |
The elimination of short titles between 1958 and 1968, however, was not as large as for group I. The difference for group II was observed between 1948 and 1958. Of the three journals in this group, Journal of the American Chemical Society rated among the best also in criteria A through E.

The third group of journals had the lowest mean value of short titles in 1948, 1958, and 1968, and showed the largest elimination of short titles: from a value of 14.4% in 1948 to a value of 4.1% in 1968. Two of these journals, however, (Discussions of the Faraday Society and Transactions of the Faraday Society) rated among the lowest for measurements A through E. This may indicate that elimination of short titles does not necessarily mean that the remaining informative titles have become more informative.

On examining Fig. 2, it might be said with confidence that short titles are definitely being eliminated. It is difficult to say, however, what effect the advent of the KWIC index has had on these changes. The curve for group I is the only one that shows an abrupt decrease in short titles after 1958. Group II shows the opposite effect, as though these journals eliminated their short titles before the KWIC index made its appearance. Group III shows a steady decrease from 1948 to 1968, without any effect that could be attributed to the appearance of permuted-title indexes around 1958.
FIG. 2. Mean values of short titles (number of titles with three or less substantive words) versus time (x-axis: year).
Therefore, we can only conclude: As rated by the criterion adopted, uninformative titles of chemical papers are being eliminated.

There was a statistically significant decrease in the number of short titles (with three or less substantive words) between 1958 and 1968 and between 1948 and 1958 for all journals tested.

The elimination of short titles in a given journal from 1948 to 1968 does not necessarily correspond to a statistically significant increase in values A through E. In other words, there are at least two distinct changes that may occur in titles: Informative titles may become more informative, and uninformative titles may be eliminated. These two changes do not necessarily coincide.
Limitations of This Study
The differences observed when comparing changes in titles of chemical papers suggest that journals fall into certain patterns. The homogeneity of the field covered, the degree of standardization of the particular terminology, the average length of terms particular to the field, and the editorial policy of the journal are but a few of the factors that make the interpretation of the observed patterns difficult. It is not possible, therefore, to draw a conclusion from the differences observed among journals. Some, such as Journal of Organic Chemistry, strongly suggest that the field of chemistry is too heterogeneous and should be subdivided in further studies of this kind to be more meaningful.

Underlying our measurements B and C (matches made between title and abstract) is the following assumption: For a given journal, the abstracts accompanying an article have remained essentially the same in quality from 1948 to 1968. The abstract has been considered a constant entity against which potentially changing titles were compared. It is probable that some changes have occurred also in the abstract from 1948 to 1968, but such changes are difficult to measure or estimate.

Decisions concerning whether a word is to be counted as a substantive word, whether two words are to be considered synonyms, or whether two hyphenated words constitute one or two substantive words are inherently subjective. The task of writing explicit and unambiguous rules to be followed in these decisions posed a problem. Stating these rules with clarity and precision for possible replication of the study by another worker was found difficult.
Conclusion
The statistical significance of the differences we measured is such that we can say with confidence: "Yes, titles of chemical papers are becoming more informative!"

Discussion

There are clear similarities between the way authors tweaked the titles of their papers to suit the KWIC index and the way websites are built today with Search Engine Optimization (SEO) in mind from the start. Papers with no keywords in the title would be harder to find, making them less visible to the rest of the scientific community.

One of the problems a user of a KWIC index faces is that of synonyms and variations in word usage and spelling. It must, however, be assumed that experts are sufficiently familiar with such variations in their own field and resourceful enough to overcome this problem, as they had to in the past with other indexing methods.

The difference between the results before and after 1958 is clear in terms of the increasing number of informative words in titles. The decrease in the number of papers with uninformative titles is harder to attribute to KWIC alone. Other factors could have influenced the shape of the curves, such as increased specialization and the growing complexity of the field (the number of papers with less informative titles may have been decreasing because scientists were going deeper into the field and studying increasingly complex phenomena).

Today the KWIC index is less relevant, as people use other methods to find papers, such as Google and dedicated paper search engines. It would be interesting to run a similar study and see whether titles have changed over the years to perform better in search engine results. Are papers changing because of SEO?

The benefits of KWIC, and the fact that it could easily be run on the computers of the time, helped spread its adoption across several institutions (a manual KWIC index of the Bible would have taken 50 scholars ten years to complete).
Naturally, IBM used it to index internal documents, and other institutions followed: the American Chemical Society used it to index the titles of chemical papers (Chemical Titles), and Stanford University used it to index dissertations in physics.

In the KWIC method, keywords are defined as the words that characterize a subject more than others. Because significance is difficult to predict, it is more practical to define a list of non-significant or common words for a given field and reject them (common words like "a", "the", "and", "of", as well as certain adjectives and words like "report", "analysis", "theory"). After the common words are removed, each remaining keyword is extracted together with a certain number of words that precede and follow it. The KWIC index is generated by placing the keywords at a fixed position within the extracted portions and arranging these portions in alphabetical order of the keywords.

For example, if the words to ignore are "the, of, and, as, a" and the list of titles is:

- Descent of Man
- The Ascent of Man

a KWIC index of these titles might be given by:

- the ASCENT of man
- DESCENT of man
- descent of MAN
- the ascent of MAN

The process may be applied to the title of an article, its abstract, or its entire text. Additionally, the code for producing this type of index was fairly straightforward for the computers of the time, with rules available for punching the cards used in IBM computers.

Born in 1896, Hans Peter Luhn worked for IBM as a computer scientist and later as a manager of the information retrieval research division. Luhn was the son of a German printer and spent countless hours in German libraries growing up, which, many speculate, might have fostered his interest in organizing information.
![Hans Peter Luhn](https://spectrum.ieee.org/image/MzAxMTAxMw.jpeg)

Luhn observed that the pace of scientific development was accelerating, which accentuated the perishable nature of new information. People wanted to find relevant information faster within a growing body of knowledge, and the conventional methods of indexing literature were breaking down. With this in mind, Luhn introduced a new indexing method called Keyword-in-Context (KWIC) in 1958 at the annual meeting of the American Chemical Society ([here's Luhn's original paper](http://altaplana.com/Luhn-KWICindexing.pdf)).
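The KWIC procedure described in these notes (reject common words, emit one entry per remaining keyword, sort by keyword) can be sketched in Python. This is a minimal illustration, not Luhn's program: the function name `kwic_index` and its parameters are made up for the example, and the sketch marks each keyword in capitals rather than aligning it at a fixed column as a printed KWIC index would.

```python
def kwic_index(titles, stop_words):
    """Return KWIC entries: one line per keyword, sorted by keyword."""
    entries = []
    for title in titles:
        words = title.lower().split()
        for position, word in enumerate(words):
            if word in stop_words:
                continue  # common words never become index keywords
            # Rebuild the title with only the current keyword capitalized.
            line = " ".join(
                w.upper() if i == position else w for i, w in enumerate(words)
            )
            # Sort key: keyword first, then the title it came from.
            entries.append((word, " ".join(words), line))
    entries.sort()
    return [line for _, _, line in entries]


STOP_WORDS = {"the", "of", "and", "as", "a"}
TITLES = ["Descent of Man", "The Ascent of Man"]

for entry in kwic_index(TITLES, STOP_WORDS):
    print(entry)
# the ASCENT of man
# DESCENT of man
# descent of MAN
# the ascent of MAN
```

Sorting on the keyword and then on the title keeps entries for the same keyword in a stable, predictable order, which reproduces the ordering of the "Man" entries in the example above.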