Correction of Misspellings and Typographical Errors in a Free-Text Medical English Information Storage and Retrieval System

1979 ◽  
Vol 18 (04) ◽  
pp. 228-234 ◽  
Author(s):  
D. M. Joseph ◽  
Ruth L. Wong

The errors studied are misspellings and typographical errors made by the physician house staff, surgical pathologists, and secretary/typists of a large teaching hospital. The 6,019 errors studied were encountered in the compilation of a LEXICON that now contains 24,135 medical and non-medical terms (including errors) from Tissue Examination Request Forms and Surgical Pathology Reports. An automated error-correction algorithm was sought for two purposes. The first was to reduce the tedious task of manually encoding errors. The second was to eliminate the need to store errors, which occupy 24.9% of the LEXICON storage space. The errors were classified into 23 types, and 84.2% of the errors were found to fall into the 11 first-order categories.

Existing error-correction algorithms were analyzed for possible application to our medical sample. Two were selected for experimentation: the Baskin-Selfridge algorithm and SOUNDEX. The results showed that Baskin-Selfridge worked quite well but was too slow to be applied on its own. SOUNDEX was reasonably fast but produced too many mismatches to be applied on its own in a non-interactive application. SOUNDEX was modified in various ways, both phonologically and with respect to code length, and some experimental data showed improvements.

The optimal design for the medical LEXICON sample appears to be a two-step process. The modified version of SOUNDEX will quickly select the most likely corrections for an error (the experimental average is 2.38 choices per error). The Baskin-Selfridge algorithm will then decide which, if any, of these choices is the actual correct form. Because only a very small number of choices is considered, the time required by the Baskin-Selfridge algorithm becomes trivial.

On the basis of the experimental results, this combination is estimated to reduce manual encoding of errors by 60–70% and to reduce the storage required for the LEXICON by approximately 15%.
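The two-step design can be illustrated with a small sketch. It is built on stated assumptions and is not the authors' system. Step one uses the standard American Soundex key. The abstract does not describe the paper's phonological and code-length modifications, so they are not reproduced; only the key length is exposed as a parameter. Step two uses an optimal-string-alignment edit distance (counting insertions, deletions, substitutions and adjacent transpositions) as a stand-in for the Baskin-Selfridge algorithm, whose details are not given here. The function names, the sample lexicon and the `max_distance` threshold are all hypothetical.

```python
from collections import defaultdict

# Assumption: standard American Soundex digit classes, not the paper's modified variant.
_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}

def soundex(word: str, length: int = 4) -> str:
    """Return the Soundex key of `word`, padded with zeros or truncated to `length`."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")  # vowels and Y carry no code
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after a vowel is kept
    return (first + "".join(digits) + "0" * length)[:length]

def osa_distance(a: str, b: str) -> int:
    """Stand-in for Baskin-Selfridge (NOT that algorithm): edit distance with transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]

def build_index(lexicon):
    """Bucket every lexicon term by its Soundex key (hypothetical helper)."""
    index = defaultdict(list)
    for term in lexicon:
        index[soundex(term)].append(term)
    return index

def correct(word, index, max_distance=2):
    """Step 1: Soundex proposes candidates. Step 2: edit distance picks one, or None."""
    candidates = index.get(soundex(word), [])
    scored = sorted((osa_distance(word.lower(), c.lower()), c) for c in candidates)
    if scored and scored[0][0] <= max_distance:
        return scored[0][1]
    return None

if __name__ == "__main__":
    lexicon = ["carcinoma", "sarcoma", "lymphoma", "melanoma", "biopsy"]
    index = build_index(lexicon)
    for error in ("carcinmoa", "lymfoma", "sarcomma", "biospy"):
        print(error, "->", correct(error, index))
```

Running the script maps the first three misspellings to `carcinoma`, `lymphoma` and `sarcoma`, but returns `None` for `biospy`. In that word, transposing p and s changes the Soundex key (B210 versus B120 for `biopsy`), so the correct term never reaches the second step. The example shows that the choice of key in step one matters. Because step two only scores the few candidates that share a key, its cost stays small.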

1989 ◽  
Vol 110 ◽  
pp. 72-76
Author(s):  
Robyn M. Shobbrook

Astronomers and librarians have been having difficulty keeping up with the volume of published literature. The astronomer tries to keep abreast of his particular field. The librarian faces the management, control, and retrieval of scientific information. The 1980s have seen a revolution in the methods of information storage and retrieval, in particular the advent of the online database. The speed of processing information for storage has been embraced by all. However, little thought has been given to how we shall achieve effective, high-precision recall of documents.

Many librarians firmly believe that vocabulary control is the best road to success in information retrieval from automated systems. Contrary to common belief, free-text or natural-language searching alone does not lead to high-precision recall. Consistency and integrity of the online catalogue can only be achieved by adding a controlled vocabulary. With today’s technology it is possible to have the best of both worlds. The controlled vocabulary is used to index the major concepts of a given document, over and above the natural language used within the document.


1980 ◽  
Vol 2 (1) ◽  
pp. 23-28
Author(s):  
John H. Ashford

Minicomputers are both technically suitable and cost-effective for information storage and retrieval systems. Software packages for structured database management and for free-text systems are well established. They can reduce both the costs and the technical difficulties of setting up an information management system. The particular requirements of designing minicomputer systems are described, and some practical consequences are indicated. A number of commercially available free-text systems are listed with their salient features. It is expected that free-text systems, rather than structured databases, will prove to be of most use to the information scientist or librarian.


1979 ◽  
Vol 2 (1) ◽  
pp. 17-41
Author(s):  
Michał Jaegermann

This paper develops a theory of information storage and retrieval systems of a particular kind. Such systems arise in situations where all the available information amounts to the fact that a given document has some feature from a properly chosen set. These systems are described as suitable maps from descriptor algebras into families of subsets of document sets. Since descriptor algebras turn out to be pseudo-Boolean algebras, the "inner logic" of these systems is intuitionistic. The paper gives a construction of such systems and examines their properties. A formalized theory of these systems will be presented in Part II.


2021 ◽  
Vol 70 ◽  
pp. 24-33
Author(s):  
Johnatan Aljadeff ◽  
Maxwell Gillett ◽  
Ulises Pereira Obilinovic ◽  
Nicolas Brunel

1981 ◽  
Vol 3 (5) ◽  
pp. 227-233 ◽  
Author(s):  
K.P. Broadbent

China has suffered from over a decade of turmoil, which has prevented the development of modern information services. Present policy stresses the role of information storage and retrieval in national development. Apart from technical and political constraints, China faces a serious handicap in its unique written language. The 5,000-plus characters needed to express scientific and technical concepts are too numerous to be handled cost-effectively by present computers. This report outlines the ways in which China is currently attempting to meet these problems and to provide modern information services by the end of the decade.

