A Feasibility Study of Automatic Indexing and Information Retrieval

1970 ◽  
Vol 13 (2) ◽  
pp. 58-59
Author(s):  
Roy W. Graves ◽  
Donald P. Helander
Author(s):  
Stéfan J. Darmoni ◽  
Suzanne Pereira ◽  
Saoussen Sakji ◽  
Tayeb Merabti ◽  
Élise Prieur ◽  
...  

Author(s):  
Thomas Mandl

In the 1960s, automatic indexing methods for texts were developed. They already implemented the “bag-of-words” approach, which still prevails. Although automatic indexing is widely used today, many information providers and even Internet services still rely on human information work. In the 1970s, research shifted its interest to partial-match retrieval models and demonstrated their superiority over Boolean retrieval models. Vector-space and, later, probabilistic retrieval models were developed. However, it took until the 1990s for partial-match models to succeed in the market. The Internet played a great role in this success. All Web search engines were based on partial-match models and provided ranked lists as results rather than unordered sets of documents. Consumers got used to this kind of search system, and all big search engines included partial-match functionality. However, there are many niches in which Boolean methods still dominate, for example, patent retrieval. The basis for information retrieval systems may be pictures, graphics, videos, music objects, structured documents, or combinations thereof. This article is mainly concerned with information retrieval for text documents.
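The shift from Boolean set retrieval to partial-match ranking described in the abstract can be illustrated with a minimal sketch. The toy corpus, the function names, and the plain TF-IDF weighting are illustrative assumptions, not taken from any system cited above; the point is only that a bag-of-words vector-space model returns a ranked list rather than an unordered set of matching documents.

```python
from collections import Counter
import math

def tfidf_vectors(docs):
    """Build simple TF-IDF bag-of-words vectors for a list of text documents."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)             # term frequency within one document
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """Partial-match retrieval: rank every document by similarity to the query,
    instead of returning the unordered set a Boolean model would produce."""
    vecs = tfidf_vectors(docs + [query])
    qvec, dvecs = vecs[-1], vecs[:-1]
    scored = sorted(enumerate(dvecs), key=lambda p: cosine(qvec, p[1]), reverse=True)
    return [i for i, _ in scored]
```

A query such as "vector space retrieval" then places the document about vector-space models first, while a Boolean AND over the same terms would simply return the set of documents containing all of them.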


2000 ◽  
Vol 22 (1) ◽  
pp. 25-26
Author(s):  
Jan Ross

The indexing process has changed remarkably with technological advances. Indexing is no longer just ‘back-of-book’ indexing, but includes automatic indexing, machine-aided indexing, web indexing and even 3-D indexing. Not all the effects have been positive, especially for the indexer, but the future of the Internet and efficient information retrieval lies with indexing.


Author(s):  
Sándor Darányi ◽  
Péter Wittek

Current methods of automatic indexing, automatic classification, and information retrieval treat index and query terms, that is, vocabulary units in any language, as locations in a geometry. Once spatial sense relations among such units are identified and syntax is added, constructing a geometric equivalent of language for advanced communication becomes an opportunity worth exploring.


2013 ◽  
Vol 64 (2-3) ◽  
Author(s):  
Andreas Oskar Kempf

The article is based on a master's thesis entitled „Automatische Indexierung in der sozialwissenschaftlichen Fachinformation. Eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS“ (Kempf 2012), written in the postgraduate programme in Library and Information Science at the Chair of Information Retrieval, Humboldt-Universität zu Berlin. On the basis of the shell model of subject indexing in specialist information services (cf. Krause 1996, 2006), the article presents evaluation results for an automatic indexing procedure intended for use in social science information services. Starting from the application scenario described by Krause, in which SOLIS (Social Science Literature Information System) holdings of lesser relevance are to be indexed automatically, two test series were carried out on this document basis with the indexing software MindServer from Recommind. In addition to examining the effects of general system settings in the first test series, the second test series compared the software's indexing performance on the peripheral and the core areas of the literature database. For the latter test series, area-specific versions of the indexing software were built, each trained on document corpora from the corresponding area of the database. The results of the evaluation, which was conducted against intellectually generated comparison data, indicate differences in indexing performance between peripheral and core areas that, on the one hand, argue against the use of automatic indexing procedures in the peripheral areas. On the other hand, there are indications that indexing results can be improved by building training sets specific to sub-fields of the discipline.
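The study evaluates automatic indexing against intellectually generated comparison data. One common way to quantify such a comparison (a generic sketch, not necessarily the exact measures used in the thesis) is to compute precision, recall, and F1 over the sets of descriptors assigned automatically and manually per document:

```python
def indexing_quality(automatic, intellectual):
    """Compare automatically assigned descriptors with intellectually
    assigned reference descriptors for one document.
    Returns (precision, recall, F1)."""
    auto, gold = set(automatic), set(intellectual)
    overlap = auto & gold
    precision = len(overlap) / len(auto) if auto else 0.0
    recall = len(overlap) / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1
```

Averaging these values separately over documents from the core and the peripheral areas would make the kind of performance difference reported in the abstract directly measurable.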


2015 ◽  
Vol 33 (2) ◽  
pp. 195-210 ◽  
Author(s):  
Mari Vállez ◽  
Rafael Pedraza-Jiménez ◽  
Lluís Codina ◽  
Saúl Blanco ◽  
Cristòfol Rovira

Purpose – The purpose of this paper is to describe and evaluate DigiDoc MetaEdit, a tool for the semi-automatic indexing of HTML documents. The tool identifies and suggests keywords from a thesaurus according to the information embedded in HTML documents. This enables keyword assignment to be parameterized by how frequently the terms appear in the document, by the relevance of their position, or by a combination of both. Design/methodology/approach – To evaluate the efficiency of the indexing tool, the descriptors/keywords suggested by the tool are compared with the keywords indexed manually by human experts. For this comparison, a corpus of HTML documents was randomly selected from a journal devoted to Library and Information Science. Findings – The results of the evaluation show, first, that there is close to a 50 per cent match or overlap between the two indexing systems; if related terms and narrower terms are taken into consideration, the matches can reach 73 per cent. Second, the first terms identified by the tool are the most relevant. Originality/value – The tool presented identifies the most important keywords in an HTML document based on the information embedded in it. Representing the contents of documents with keywords is now an essential practice in areas such as information retrieval and e-commerce.
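The approach described here, scoring thesaurus terms by their frequency and by the weight of the HTML element they appear in, can be sketched as follows. The element weights, the function name, and the regex-based extraction are hypothetical illustrations; DigiDoc MetaEdit's actual parameters and implementation are not given in the abstract.

```python
import re

def suggest_keywords(html, thesaurus, position_weights=None, top_n=5):
    """Suggest thesaurus descriptors for an HTML document by combining
    term frequency with a weight for the element the term appears in.
    (Hypothetical sketch of frequency-plus-position keyword scoring.)"""
    if position_weights is None:
        # Assumed weights: terms in <title> or <h1> count more than body text.
        position_weights = {"title": 3.0, "h1": 2.0, "body": 1.0}
    scores = {}
    for tag, weight in position_weights.items():
        # Collect the text content of each weighted element.
        for match in re.findall(rf"<{tag}[^>]*>(.*?)</{tag}>", html, re.S | re.I):
            text = re.sub(r"<[^>]+>", " ", match).lower()
            for term in thesaurus:
                freq = text.count(term.lower())
                if freq:
                    scores[term] = scores.get(term, 0.0) + freq * weight
    # Highest-scoring descriptors first, mirroring the finding that the
    # first suggested terms tend to be the most relevant.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Controlled-vocabulary terms that occur in prominent positions such as the title then outrank terms that appear only in the running text, which is the intuition behind weighting both frequency and position.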

