Application of the Variety-Generator Approach to Searches of Personal Names in Bibliographic Data Bases--Part 2. Optimization of Key-Sets, and Evaluation of Their Retrieval Efficiency

1974 ◽  
Vol 7 (3) ◽  
pp. 201
Author(s):  
Dirk W. Fokker ◽  
Michael F. Lynch

Keys consisting of variable-length character strings from the front and rear of surnames, derived by analysis of author names in a particular data base, are used to provide approximate representations of author names. When combined in appropriate ratios, and used together with keys for each of the first two initials of personal names, they provide a high degree of discrimination in search. Methods for optimization of key-sets are described, and the performance of key-sets varying in size between 150 and 300 is determined at file sizes of up to 50,000 name entries. The effects of varying the proportions of the queries present in the file are also examined. The results obtained with fixed-length keys are compared with those for variable-length keys, showing the latter to be greatly superior. Implications of the work for a variety of types of information systems are discussed.
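
As a rough illustration of the idea in this abstract, the sketch below maps a name to a front key, a rear key, and its first two initials. It is a minimal sketch, not the authors' procedure: the longest-match rule, the function names, and the tiny key sets are all assumptions made for illustration. The abstract says that real key-sets are derived by analysis of the names in a particular data base.

```python
# Hypothetical key sets, invented for this example only. The paper derives
# its key-sets from the statistics of author names in a particular data base.
FRONT_KEYS = {"S", "SM", "SCH", "L", "LY", "F"}
REAR_KEYS = {"N", "SON", "H", "CH", "R", "ER"}


def front_key(surname, keys):
    """Return the longest key that is a prefix of the surname, or None."""
    name = surname.upper()
    for length in range(len(name), 0, -1):
        if name[:length] in keys:
            return name[:length]
    return None


def rear_key(surname, keys):
    """Return the longest key that is a suffix of the surname, or None."""
    name = surname.upper()
    for length in range(len(name), 0, -1):
        if name[-length:] in keys:
            return name[-length:]
    return None


def name_code(surname, initials, front_keys=FRONT_KEYS, rear_keys=REAR_KEYS):
    """Approximate a personal name by (front key, rear key, first two initials).

    Longest-match assignment is an assumption of this sketch.
    """
    return (
        front_key(surname, front_keys),
        rear_key(surname, rear_keys),
        tuple(initials.upper()[:2]),
    )


if __name__ == "__main__":
    print(name_code("Lynch", "MF"))
    print(name_code("Schmidt", "H"))
```

Because the keys vary in length, a common prefix such as "S" can be split into finer keys ("SM", "SCH") while rarer prefixes keep a single short key. This is one way to read the abstract's contrast between variable-length and fixed-length keys.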

1974 ◽  
Vol 7 (2) ◽  
pp. 105 ◽  
Author(s):  
Dirk W. Fokker ◽  
Michael F. Lynch

Conventional approaches to processing records of linguistic origin for storage and retrieval tend to regard the data as immutable. The data generally exhibit great variety and disparate frequency distributions, which are largely ignored and which entail either the storage of extensive lists of items or the use of complex numerical algorithms such as hash coding. The results in each case are far from ideal. The variety-generator approach seeks to reflect the microstructure of data elements in their description for storage and search, and takes advantage of the consistency of statistical characteristics of data elements in homogeneous data bases. In this paper, the application of the variety-generator approach to the description of personal author names from the INSPEC data base by means of small sets of keys is detailed. It is shown that high degrees of partitioning of names can be obtained by key-sets generated from the initial characters of surnames, from the terminal characters of surnames, and from the initials. The implications of the findings for computer-based bibliographical information systems are discussed.


1985 ◽  
Vol 8 (1) ◽  
pp. 1-25
Author(s):  
Andrzej W. Jankowski ◽  
Cecylia Rauszer

The paper deals with the mathematical description of information systems with limited access to a data base. As in [1], the area to which a user has access is called his priority. The information systems introduced in the paper are mathematical models for an intermediate logic with logical constants that correspond to the priorities. The principles for operating this language are described, and a complete semantics is formulated.


1984 ◽  
Vol 8 (2) ◽  
pp. 57-61 ◽  
Author(s):  
Kazimierz Kowalski ◽  
Aleksander Zgrzywa

This paper presents the results of an investigation of the operation of bibliographic data bases. The investigations were concerned with the length of information queries and the number of answers as well as the relationship between them, depending on the data base. The results obtained can affect the assumptions and strategy of operating the data bases in SDI systems.


1973 ◽  
Vol 12 (1) ◽  
pp. 30-44 ◽  
Author(s):  
M. E. Senko ◽  
E. B. Altman ◽  
M. M. Astrahan ◽  
P. L. Fehder

1984 ◽  
Vol 8 (2) ◽  
pp. 63-66 ◽  
Author(s):  
C.P.R. Dubois

The controlled vocabulary versus the free text approach to information retrieval is reviewed from the mid 1960s to the early 1980s. The dominance of the free text approach following the Cranfield tests is increasingly coming into question as a result of tests on existing online data bases and case studies. This is supported by two case studies on the Coffeeline data base. The differences and values of the two approaches are explored considering thesauri as semantic maps. It is suggested that the most appropriate evaluatory technique for indexing languages is to study the actual use made of various techniques in a wide variety of search environments. Such research is becoming more urgent. Economic and other reasons for the scarcity of online thesauri are reviewed and suggestions are made for methods to secure revenue from thesaurus display facilities. Finally, the promising outlook for renewed development of controlled vocabularies with more effective online display techniques is mentioned, although such development must be based on firm research of user behaviour and needs.

