A NEW METHOD OF RECORDING AND SEARCHING INFORMATION
H. P. LUHN*
This method applies to the procedures required to record a legend concerning a document and to enable an inquirer to locate this document by means of the legend, if it is related to a specified subject.
The conventional methods of indexing and classifying attempt to evaluate the relative importance of a plurality of aspects contained in a document and make the most important one the key for locating the document within an orderly scale of a certain dimension. Subordinated aspects are covered by way of reference in appropriate other locations of the scale.
One of the disadvantages of the conventional system is that the standard of value on which the indexer bases his decision may change, and what is suddenly considered an aspect of major significance may not have been included in the classification or index at the time, even though it was contained in a document.
Another drawback is that it becomes difficult for an inquirer to reverse the process of classification or indexing and pose his query in a form matching to a reasonable degree the values of a potential reference.
The new method uses the principle of characterizing a topic by a set of identifying elements or criteria. These elements may be of any dimension and as many may be recorded as is desirable. Also, they are not weighted and no significance need be implied by the order in which they are given.
One of the main functions of the new method is that of producing a response to an inquiry in all cases, even if the reference appears to be remote, it being the understanding that it is the closest available.
The elements enumerated by recorders to identify a topic will necessarily vary, as no two recorders will view a topic in identical fashion. Similarly, no two inquirers, when referring to the same subject, will state their query in identical fashion. It is therefore important that a system recognize that these variations arise and that they cannot be controlled. It must then become the function of the system to overcome these variations to a reasonable degree.
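The recording and searching behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism of the paper: the record names, the sample terms, and the choice of "number of shared terms" as the measure of closeness are all hypothetical. It shows legends stored as unweighted, unordered sets of terms, and a search that returns the closest available record even when the match is remote.

```python
def overlap(query, legend):
    """Number of identifying terms shared by a query and a recorded legend."""
    return len(set(query) & set(legend))


def closest(query, records):
    """Return (document, shared-term count) for the best-matching record.

    A response is produced whenever at least one record exists, however
    remote the match; ties go to the record that was recorded first.
    """
    scored = ((doc, overlap(query, terms)) for doc, terms in records.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical legends: each document is identified by a set of terms.
records = {
    "doc-A": {"metal", "corrosion", "seawater", "coating"},
    "doc-B": {"metal", "fatigue", "aircraft"},
    "doc-C": {"paper", "ink", "printing"},
}

print(closest(["seawater", "metal", "paint"], records))
print(closest(["ink", "glass"], records))
```

Because the terms are compared as sets, the order in which a recorder or an inquirer happens to state them has no effect on the result.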
When identifying a topic by a set of criteria or identifying terms, the more terms are stated the more specifically the topic is delineated. Each term in turn may be a concept which in itself may vary as to specificity. If we consider a concept as being a field in a multi-dimensional array, we may then visualize a topic as being located in that space which is common to all the concept fields stated. It may further be visualized that related topics are located more or less adjacent to each other depending on the degree of similarity and that this is so because they agree in some of the identifying terms and therefore share some of the concept fields. Figure 1 is a diagrammatic illustration.
FIGURE 1. (Diagram labels: "The Topic, Identified"; "Other Topics")
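The picture of concept fields and their common space can be made concrete with ordinary set operations. The sketch below is an assumption-laden illustration: the topic names, the concept terms, and the helper functions are hypothetical, and a "field" is modeled simply as the set of topics that state a given concept. It shows that stating more terms narrows the common space, and that related topics are those sharing some concept fields.

```python
from collections import defaultdict

# Hypothetical topics, each identified by an unordered set of concept terms.
topics = {
    "T1": {"heat", "steel", "stress"},
    "T2": {"heat", "steel", "welding"},
    "T3": {"heat", "glass"},
    "T4": {"ink", "paper"},
}


def concept_fields(topics):
    """Map each concept to its field: the set of topics that state it."""
    fields = defaultdict(set)
    for name, terms in topics.items():
        for term in terms:
            fields[term].add(name)
    return fields


def common_space(terms, fields):
    """Topics lying in the region common to all the stated concept fields."""
    regions = [fields.get(term, set()) for term in terms]
    return set.intersection(*regions) if regions else set()


def adjacency(name, topics):
    """For each other topic, the number of concept fields it shares with `name`."""
    return {other: len(topics[name] & terms)
            for other, terms in topics.items() if other != name}


fields = concept_fields(topics)
print(common_space(["heat"], fields))           # one term: a broad region
print(common_space(["heat", "steel"], fields))  # two terms: a narrower region
print(adjacency("T1", topics))                  # shared fields with T1
```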
In order to understand the nature of the arrangement, let us assume a vocabulary of 100 concepts and let us identify a topic by five conceptual terms. By using all possible combinations of five terms, a total of about 75 million patterns of criteria result, each of these patterns having a fixed location within the system. If then a topic is identified by five terms of the vocabulary, it is thereby assigned to a definite one of these fixed locations.
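The figure of 75 million can be checked directly: the number of ways to choose five terms from a vocabulary of 100, without regard to order, is 75,287,520. The sketch below verifies this and also shows one possible way of giving every pattern a fixed location. The numbering scheme used (the combinatorial number system) is an assumption made for illustration; the text does not say how locations are assigned, and the function names are hypothetical.

```python
from math import comb

VOCABULARY_SIZE = 100  # concepts in the assumed vocabulary
TERMS_PER_TOPIC = 5    # conceptual terms used to identify a topic


def pattern_count(n=VOCABULARY_SIZE, k=TERMS_PER_TOPIC):
    """Number of unordered patterns of k distinct terms drawn from n concepts."""
    return comb(n, k)


def fixed_location(concepts):
    """Assign a pattern of distinct concept numbers (0-based) a unique index.

    Assumed scheme: sort the concept numbers c_1 < c_2 < ... < c_k and sum
    C(c_i, i) for i = 1..k. Indexes run from 0 to pattern_count() - 1.
    """
    ordered = sorted(set(concepts))
    return sum(comb(c, i) for i, c in enumerate(ordered, start=1))


print(pattern_count())                       # 75287520
print(fixed_location([0, 1, 2, 3, 4]))       # 0, the first location
print(fixed_location([99, 95, 97, 96, 98]))  # 75287519, the last location
```

Since the terms are sorted before the index is computed, a topic reaches the same location whatever the order in which its five terms are stated.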
While assuming that there is an ideal and true location where a topic belongs, it is un-
*International Business Machines Corporation, Engineering Laboratory, Poughkeepsie, New York.