A NEW METHOD OF RECORDING AND SEARCHING INFORMATION

H. P. LUHN*

This method applies to the procedures required to record a legend concerning a document and to enable an inquirer to locate this document by means of the legend, if it is related to a specified subject.

The conventional methods of indexing and classifying attempt to evaluate the relative importance of a plurality of aspects contained in a document and make the most important one the key for locating the document within an orderly scale of a certain dimension. Subordinated aspects are covered by way of reference in appropriate other locations of the scale.

One of the disadvantages of the conventional system is that the standard of value on which the indexer bases his decision may change, and what is suddenly considered an aspect of major significance may not have been included in the classification or index at the time, even though it was contained in a document.

Another drawback is that it becomes difficult for an inquirer to reverse the process of classification or indexing and pose his query in a form matching to a reasonable degree the values of a potential reference.

The new method uses the principle of characterizing a topic by a set of identifying elements or criteria. These elements may be of any dimension and as many may be recorded as is desirable. Also, they are not weighted and no significance need be implied by the order in which they are given.

One of the main functions of the new method is that of producing a response to an inquiry in all cases, even if the reference appears to be remote, it being the understanding that it is the closest available.

The elements enumerated by recorders to identify a topic will necessarily vary as no two recorders will view a topic in identical fashion. Similarly, no two inquirers, when referring to the same subject, will state their query in identical fashion. It is therefore important that a system recognizes that these variations arise and that they cannot be controlled. It must then become the function of the system to overcome these variations to a reasonable degree.
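The requirement that every inquiry produce a response, namely the closest available reference, can be pictured with a small sketch. The record names, the terms, and the use of a simple count of shared terms as the measure of closeness are all assumptions made for illustration; the text does not prescribe a particular measure.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical legends: each one is an unordered, unweighted set of terms.
RECORDS: Dict[str, FrozenSet[str]] = {
    "doc-A": frozenset({"indexing", "punched cards", "searching"}),
    "doc-B": frozenset({"classification", "libraries", "searching"}),
    "doc-C": frozenset({"chemistry", "patents"}),
}


def closest_record(query: FrozenSet[str]) -> Tuple[str, int]:
    """Return the record sharing the most terms with the query.

    A record is always returned, even when nothing is shared, on the
    understanding that it is the closest one available. Ties go to the
    record that was stored first.
    """
    best = max(RECORDS, key=lambda name: len(RECORDS[name] & query))
    return best, len(RECORDS[best] & query)
```

Because the legends are sets, the order in which a recorder or an inquirer states the terms has no effect on the result, and an inquiry that matches nothing still receives an answer with an overlap of zero.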

When identifying a topic by a set of criteria or identifying terms, the more terms are stated the more specifically the topic is delineated. Each term in turn may be a concept which in itself may vary as to specificity. If we consider a concept as being a field in a multi-dimensional array, we may then visualize a topic as being located in that space which is common to all the concept fields stated. It may further be visualized that related topics are located more or less adjacent to each other depending on the degree of similarity and that this is so because they agree in some of the identifying terms and therefore share some of the concept fields. Figure 1 is a diagrammatic illustration.

FIGURE 1. [Diagram. Labels: "The Topic, Identified"; "Other Topics".]
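The picture of topics lying in shared concept fields can be sketched by giving each concept one dimension of an array and marking the fields a topic occupies. The vocabulary, the topic names, and the choice of a 0/1 vector per topic are assumptions made for this illustration only.

```python
from itertools import combinations

# Hypothetical vocabulary; each concept is one dimension of the array.
VOCABULARY = ["metals", "corrosion", "coatings", "testing", "welding"]


def as_vector(terms):
    """Mark with 1 every concept field the topic lies in, 0 otherwise."""
    return tuple(1 if concept in terms else 0 for concept in VOCABULARY)


def shared_fields(u, v):
    """Count the concept fields that two topics have in common."""
    return sum(a & b for a, b in zip(u, v))


# Hypothetical topics, each identified by a set of terms.
TOPICS = {
    "T1": as_vector({"metals", "corrosion", "coatings"}),
    "T2": as_vector({"metals", "corrosion", "testing"}),
    "T3": as_vector({"welding", "testing"}),
}

# Number of shared concept fields for every pair of topics.
PAIRS = {
    (a, b): shared_fields(TOPICS[a], TOPICS[b])
    for a, b in combinations(sorted(TOPICS), 2)
}
```

Under these assumptions T1 and T2 share two fields and would lie close together, T2 and T3 share one, and T1 and T3 share none.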

In order to understand the nature of the arrangement, let us assume a vocabulary of 100 concepts and let us identify a topic by five conceptual terms. By using all possible combinations of five terms, a total of 75 million patterns of criteria result, each of these patterns having a fixed location within the system. If then a topic is identified by five terms of the vocabulary, it is thereby assigned to a definite one of these fixed locations.
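The figure of 75 million can be checked directly: the number of ways of choosing five terms out of 100, without regard to order, is 75,287,520. The sketch below also shows one possible way of giving each pattern a definite location. The numbering scheme used (the combinatorial number system, with concepts numbered from 0 to 99) is an assumption chosen for illustration; the text does not say how locations are to be assigned.

```python
from itertools import combinations
from math import comb

VOCABULARY_SIZE = 100
TERMS_PER_TOPIC = 5

# Unordered selections of five terms from a vocabulary of 100 concepts.
TOTAL_PATTERNS = comb(VOCABULARY_SIZE, TERMS_PER_TOPIC)  # 75,287,520


def location(terms):
    """Map five distinct concept numbers (0-based) to a fixed index.

    Assumed scheme: sort the numbers ascending as c1 < c2 < ... < c5 and
    sum C(c_i, i). Every pattern receives a different index in the range
    0 .. TOTAL_PATTERNS - 1, whatever order the terms are stated in.
    """
    ordered = sorted(set(terms))
    if len(ordered) != TERMS_PER_TOPIC:
        raise ValueError("a topic needs exactly five distinct terms")
    return sum(comb(c, i) for i, c in enumerate(ordered, start=1))
```

Since the terms are sorted before the index is computed, the same five terms always lead to the same location, which agrees with the statement that no significance attaches to the order in which they are given.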

While assuming that there is an ideal and true location where a topic belongs, it is un-

*International Business Machines Corporation, Engineering Laboratory, Poughkeepsie, New York.
