What Have We Got to Lose? The Effect of Controlled Vocabulary on Keyword Searching Results

2005 ◽  
Vol 66 (3) ◽  
pp. 212-230 ◽  
Author(s):  
Tina Gross ◽  
Arlene G. Taylor

Using controlled vocabulary in the creation and searching of library catalogs has evoked a great deal of debate because it is expensive to provide. Leading to this study were suggestions that because most users seem to search by keyword, subject headings could be removed from catalog records to save space and cost. This study asked, what proportion of records retrieved by a keyword search has a keyword only in a subject heading field and thus would not be retrieved if there were no subject headings? It was found that more than one-third of records retrieved by successful keyword searches would be lost if subject headings were not present, and many individual cases exist in which 80, 90, and even 100 percent of the retrieved records would not be retrieved in the absence of subject headings.
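
The measurement described in this abstract can be illustrated with a minimal sketch: run the same keyword search with and without the subject heading field, then compute the share of hits that depended on subject headings alone. The record layout (`title`, `notes`, `subjects`), the substring matching, and the sample records are all assumptions made for illustration; they are not the catalog data or search system used in the study.

```python
def keyword_hits(records, keyword, fields):
    """Return ids of records whose listed fields contain the keyword (case-insensitive)."""
    kw = keyword.lower()
    return {
        rec["id"]
        for rec in records
        if any(kw in rec.get(field, "").lower() for field in fields)
    }


def proportion_lost(records, keyword):
    """Share of keyword hits that match only in the (hypothetical) 'subjects' field."""
    with_subjects = keyword_hits(records, keyword, ("title", "notes", "subjects"))
    without_subjects = keyword_hits(records, keyword, ("title", "notes"))
    if not with_subjects:
        return None  # unsuccessful search: nothing retrieved, so nothing to lose
    return len(with_subjects - without_subjects) / len(with_subjects)


# Invented sample records, for illustration only.
records = [
    {"id": 1, "title": "Whales of the Pacific", "notes": "", "subjects": "Cetacea"},
    {"id": 2, "title": "Giants of the deep", "notes": "", "subjects": "Whales"},
    {"id": 3, "title": "Song of the sea", "notes": "", "subjects": "Whales -- Behavior"},
    {"id": 4, "title": "Garden birds", "notes": "", "subjects": "Birds"},
]

lost = proportion_lost(records, "whales")
print(f"Proportion of hits lost without subject headings: {lost:.2f}")
```

In this toy catalog, records 2 and 3 match "whales" only through their subject headings, so two of the three hits would disappear if the field were removed.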

Author(s):  
Rosie Hanneke, MLS ◽  
Kelly K. O’Brien, MLIS

Objective: The purpose of this study was to investigate the relative effectiveness of three web-scale discovery (WSD) tools in answering health sciences search queries. Methods: Simple keyword searches, based on topics from six health sciences disciplines, were run at multiple real-world implementations of EBSCO Discovery Service (EDS), Ex Libris’s Primo, and ProQuest’s Summon. Each WSD tool was evaluated in its ability to retrieve relevant results and in its coverage of MEDLINE content. Results: All WSD tools returned between 50% and 60% relevant results. Primo returned a higher number of duplicate results than the other 2 WSD products. Summon results were more relevant when search terms were automatically mapped to controlled vocabulary. EDS indexed the largest number of MEDLINE citations, followed closely by Summon. Additionally, keyword searches in all 3 WSD tools retrieved relevant material that was not found with precision (Medical Subject Headings) searches in MEDLINE. Conclusions: None of the 3 WSD products studied was overwhelmingly more effective in returning relevant results. Although the figure of 50%–60% relevance is difficult to place in context, it implies a strong likelihood that the average user would be able to find satisfactory sources on the first page of search results using a rudimentary keyword search. The discovery of additional relevant material beyond that retrieved from MEDLINE indicates WSD tools’ value as a supplement to traditional resources for health sciences researchers.


1996 ◽  
Vol 35 (04/05) ◽  
pp. 309-316 ◽  
Author(s):  
M. R. Lehto ◽  
G. S. Sorock

Abstract: Bayesian inferencing as a machine learning technique was evaluated for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was described in terms of the ability to accurately classify previously unclassifiable narratives not containing the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and keyword search corresponded closely to expert ratings (P(detection)≥0.9, and P(false positive)≤0.05). For narratives not containing keywords, when the threshold used by the Bayesian model was varied between p>0.5 and p>0.9, the overall probability of detecting a category assigned by the expert varied between 67% and 12%. False positives correspondingly varied between 32% and 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
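
The abstract does not give the details of the Bayesian model, so the following is only a generic sketch of the idea: a naive Bayes classifier is trained on narratives that a keyword search has already labeled, and a new narrative is assigned a category only when the posterior probability exceeds a threshold. The training narratives, the category names, the Laplace smoothing, and the function names are all assumptions for illustration, not the authors' method or data.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Count words per category from (narrative, category) pairs."""
    word_counts = defaultdict(Counter)
    cat_counts = Counter()
    for text, cat in labeled:
        cat_counts[cat] += 1
        word_counts[cat].update(text.lower().split())
    return word_counts, cat_counts


def posterior(text, word_counts, cat_counts):
    """Naive Bayes posterior over categories, with add-one (Laplace) smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(cat_counts.values())
    log_scores = {}
    for cat in cat_counts:
        total_words = sum(word_counts[cat].values())
        score = math.log(cat_counts[cat] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[cat][w] + 1) / (total_words + len(vocab)))
        log_scores[cat] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def classify(text, model, threshold):
    """Return the most probable category, or None if it does not clear the threshold."""
    probs = posterior(text, *model)
    cat, p = max(probs.items(), key=lambda kv: kv[1])
    return cat if p > threshold else None


# Invented narratives, labeled as if by a keyword search ("struck from behind", "backing").
training = [
    ("vehicle was struck from behind while stopped at light", "rear-end"),
    ("rear-ended by truck while stopped in traffic", "rear-end"),
    ("driver was backing out of driveway and hit pole", "backing"),
    ("backing up in parking lot and hit parked car", "backing"),
]
model = train(training)

# This narrative contains neither original keyword.
narrative = "stopped at light when truck hit us"
print(classify(narrative, model, threshold=0.5))
print(classify(narrative, model, threshold=0.95))
```

Raising the threshold trades detections for fewer false positives, which is the pattern the abstract reports: the stricter the cutoff, the more narratives are left unclassified.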


2017 ◽  
Vol 34 (8) ◽  
pp. 20-23 ◽  
Author(s):  
R. Cecilia Knight ◽  
Elizabeth Rodrigues ◽  
Rebecca Ciota

Purpose: Working with faculty and staff to create digital projects requires a complex group of skills and activities. Potential collaborators often jump to the end vision without fully grasping the need for proper description and metadata. Design/methodology/approach: On-the-job experiences. Contextual inquiry. Findings: Using Google Forms and Sheets is perceived as neutral and less frightening than working in a platform that will be the home of the project or using other proprietary productivity software. Social implications: Digital scholars gradually come to understand the stakes of early decisions in metadata creation (such as file naming conventions and controlled vocabulary) and how those decisions affect database structures, record display, keyword searching and long-term curation. Originality/value: The authors did not find any other publications addressing the use of Google Apps for creating metadata.


JAMIA Open ◽  
2020 ◽  
Vol 3 (2) ◽  
pp. 225-232 ◽  
Author(s):  
Anita M Preininger ◽  
Brett South ◽  
Jeff Heiland ◽  
Adam Buchold ◽  
Mya Baca ◽  
...  

Objective: This article describes the system architecture, training, initial use, and performance of Watson Assistant (WA), an artificial intelligence-based conversational agent, accessible within Micromedex®. Materials and methods: The number and frequency of intents (target of a user’s query) triggered in WA during its initial use were examined; intents triggered over 9 months were compared to the frequency of topics accessed via keyword search of Micromedex. Accuracy of WA intents assigned to 400 queries was compared to assignments by 2 independent subject matter experts (SMEs), with inter-rater reliability measured by Cohen’s kappa. Results: In over 126 000 conversations with WA, intents most frequently triggered involved dosing (N = 30 239, 23.9%) and administration (N = 14 520, 11.5%). SMEs with substantial inter-rater agreement (kappa = 0.71) agreed with intent mapping in 247 of 400 queries (62%), including 16 queries related to content that WA and SMEs agreed was unavailable in WA. SMEs found 57 (14%) of 400 queries incorrectly mapped by WA; 112 (28%) queries unanswerable by WA included queries that were either ambiguous, contained unrecognized typographical errors, or addressed topics unavailable to WA. Of the queries answerable by WA (288), SMEs determined 231 (80%) were correctly linked to an intent. Discussion: A conversational agent successfully linked most queries to intents in Micromedex. Ongoing system training seeks to widen the scope of WA and improve matching capabilities. Conclusion: WA enabled Micromedex users to obtain answers to many medication-related questions using natural language, with the conversational agent facilitating mapping to a broader distribution of topics than standard keyword searches.
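
Inter-rater reliability in this study is reported as Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. A minimal sketch of that calculation follows; the two label lists are invented for illustration and are not the study's 400 queries.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical intent labels from two subject matter experts for ten queries.
sme_1 = ["dosing"] * 5 + ["administration"] * 5
sme_2 = ["dosing"] * 4 + ["administration"] * 5 + ["dosing"]

kappa = cohens_kappa(sme_1, sme_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 items (observed agreement 0.8) while chance agreement is 0.5, giving a kappa of 0.6.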


Author(s):  
Karen Corral ◽  
David Schuff ◽  
Robert D. St. Louis ◽  
Ozgur Turetken

Inefficient and ineffective search is widely recognized as a problem for businesses. The shortcomings of keyword searches have been elaborated upon by many authors, and many enhancements to keyword searches have been proposed. To date, however, no one has provided a quantitative model or systematic process for evaluating the savings that accrue from enhanced search procedures. This paper presents a model for estimating the total cost to a company of relying on keyword searches versus a dimensional search approach. The model is based on the Zipf-Mandelbrot law in quantitative linguistics. Our analysis of the model shows that a surprisingly small number of searches are required to justify the cost associated with encoding the metadata necessary to support a dimensional search engine. The results imply that it is cost effective for almost any business organization to implement a dimensional search strategy.
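
The abstract names the Zipf-Mandelbrot law as the basis of the cost model but does not state the model itself. As background only, the law assigns the item of rank k among N items a probability proportional to 1/(k + q)^s. The sketch below computes that distribution; the parameter values are arbitrary assumptions, and nothing here reproduces the authors' cost calculation.

```python
def zipf_mandelbrot_pmf(n_ranks, q, s):
    """Probabilities for ranks 1..n_ranks, proportional to 1 / (rank + q) ** s."""
    weights = [1.0 / (rank + q) ** s for rank in range(1, n_ranks + 1)]
    total = sum(weights)  # normalizing constant
    return [w / total for w in weights]


# Arbitrary illustrative parameters (not taken from the paper).
pmf = zipf_mandelbrot_pmf(n_ranks=1000, q=2.7, s=1.0)
top_ten_share = sum(pmf[:10])
print(f"Share of probability mass in the 10 highest-ranked items: {top_ten_share:.3f}")
```

With q = 0 the distribution reduces to the ordinary Zipf law; a larger q flattens the head of the distribution.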


Religions ◽  
2019 ◽  
Vol 10 (10) ◽  
pp. 585 ◽  
Author(s):  
David M. DiValerio

As a genre defined by its content rather than by its form, the extreme diversity of the kinds of texts that can be considered “hagiographic” often proves an impediment to the progress of comparative hagiology. This essay offers some suggestions for the creation of a controlled vocabulary for the formal description of hagiographic texts, demonstrating how having a more highly developed shared language at our disposal will facilitate both the systematic analysis and the comparative discussion of hagiography.


2020 ◽  
Vol 47 (2) ◽  
pp. 183-194
Author(s):  
Heather Dunn ◽  
Paul Bourcier

We present an overview of Nomenclature’s history, characteristics, structure, use, management, development process, limitations, and future. Nomenclature for Museum Cataloging is a bilingual (English/French) structured and controlled list of object terms organized in a classification system to provide a basis for indexing and cataloging collections of human-made objects. It includes illustrations and bibliographic references as well as a user guide. It is used in the creation and management of object records in human history collections within museums and other organizations, and it focuses on objects relevant to North American history and culture. First published in 1978, Nomenclature is the most extensively used museum classification and controlled vocabulary for historical and ethnological collections in North America and represents thereby a de facto standard in the field. An online reference version of Nomenclature was made available in 2018, and it will be available under open license in 2020.


Author(s):  
Ji Ke ◽  
J. S. Wallace ◽  
L. H. Shu

Biology is a good source of analogies for engineering design. One approach of retrieving biological analogies is to perform keyword searches on natural-language sources such as books, journals, etc. A challenge of retrieving information from natural-language sources is the potential requirement to process a large number of search results. This paper describes a categorization method that organizes a large group of diverse biological information into meaningful categories. The benefits of the categorization functionality are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our categorization method reduced the effort to systematically identify biological phenomena by up to ∼80%.

