What Have We Got to Lose? The Effect of Controlled Vocabulary on Keyword Searching Results

Using controlled vocabulary in the creation and searching of library catalogs has evoked a great deal of debate because it is expensive to provide. Leading to this study were suggestions that because most users seem to search by keyword, subject headings could be removed from catalog records to save space and cost. This study asked, what proportion of records retrieved by a keyword search has a keyword only in a subject heading field and thus would not be retrieved if there were no subject headings? It was found that more than one-third of records retrieved by successful keyword searches would be lost if subject headings were not present, and many individual cases exist in which 80, 90, and even 100 percent of the retrieved records would not be retrieved in the absence of subject headings.
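
The proportion the study asks about can be illustrated with a small sketch. It assumes, purely for illustration, that each catalog record is a dictionary with hypothetical `title`, `notes`, and `subjects` fields, and that a keyword "hit" is a whole-word match; the sample records are invented and are not data from the study.

```python
import re


def tokens(text):
    """Lower-case word tokens of a field, punctuation ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lost_fraction(records, keyword):
    """Fraction of keyword hits that match ONLY in the subject heading field.

    Returns None when the keyword retrieves no records at all.
    """
    kw = keyword.lower()
    hits = 0
    lost = 0
    for rec in records:
        in_subjects = kw in tokens(rec.get("subjects", ""))
        in_other = any(
            kw in tokens(value)
            for field, value in rec.items()
            if field != "subjects"
        )
        if in_subjects or in_other:
            hits += 1
            if not in_other:
                # Without subject headings this record would not be retrieved.
                lost += 1
    return lost / hits if hits else None


# Invented sample records (hypothetical field names and content).
SAMPLE_RECORDS = [
    {"title": "Cooking with herbs", "notes": "", "subjects": "Cookery; Herbs"},
    {"title": "The kitchen garden", "notes": "", "subjects": "Herbs; Gardening"},
    {"title": "Herbs of the Mediterranean", "notes": "", "subjects": "Botany"},
    {"title": "Roses", "notes": "Illustrated", "subjects": "Gardening"},
]

if __name__ == "__main__":
    for word in ("herbs", "gardening"):
        print(word, lost_fraction(SAMPLE_RECORDS, word))
```

In this toy catalog, one of the three records retrieved for "herbs" matches only through its subject heading, while every record retrieved for "gardening" does, which mirrors the kind of per-search variation the abstract describes.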

Comparison of three web-scale discovery services for health sciences research*

Journal of the Medical Library Association JMLA ◽

10.5195/jmla.2016.52 ◽

2016 ◽

Vol 104 (2) ◽

Cited By ~ 1

Author(s):

Rosie Hanneke, MLS ◽

Kelly K. O’Brien, MLIS

Keyword(s):

Keyword Search ◽

Relative Effectiveness ◽

Health Sciences ◽

Controlled Vocabulary ◽

Medical Subject Headings ◽

Search Terms ◽

Discovery Service ◽

Keyword Searches ◽

Relevant Material ◽

Average User

Objective: The purpose of this study was to investigate the relative effectiveness of three web-scale discovery (WSD) tools in answering health sciences search queries. Methods: Simple keyword searches, based on topics from six health sciences disciplines, were run at multiple real-world implementations of EBSCO Discovery Service (EDS), Ex Libris’s Primo, and ProQuest’s Summon. Each WSD tool was evaluated in its ability to retrieve relevant results and in its coverage of MEDLINE content. Results: All WSD tools returned between 50%–60% relevant results. Primo returned a higher number of duplicate results than the other 2 WSD products. Summon results were more relevant when search terms were automatically mapped to controlled vocabulary. EDS indexed the largest number of MEDLINE citations, followed closely by Summon. Additionally, keyword searches in all 3 WSD tools retrieved relevant material that was not found with precision (Medical Subject Headings) searches in MEDLINE. Conclusions: None of the 3 WSD products studied was overwhelmingly more effective in returning relevant results. While difficult to place the figure of 50%–60% relevance in context, it implies a strong likelihood that the average user would be able to find satisfactory sources on the first page of search results using a rudimentary keyword search. The discovery of additional relevant material beyond that retrieved from MEDLINE indicates WSD tools’ value as a supplement to traditional resources for health sciences researchers.

Machine Learning of Motor Vehicle Accident Categories from Narrative Data

Methods of Information in Medicine ◽

10.1055/s-0038-1634680 ◽

1996 ◽

Vol 35 (04/05) ◽

pp. 309-316 ◽

Cited By ~ 4

Author(s):

M. R. Lehto ◽

G. S. Sorock

Keyword(s):

Machine Learning ◽

Bayesian Model ◽

Keyword Search ◽

Motor Vehicle ◽

Motor Vehicle Accident ◽

Computer Search ◽

Vehicle Accident ◽

Learning Technique ◽

Expert Ratings ◽

Keyword Searches

Abstract: Bayesian inferencing as a machine learning technique was evaluated for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was described in terms of the ability to accurately classify previously unclassifiable narratives not containing the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and keyword search corresponded closely to expert ratings (P(detection)≥0.9, and P(false positive)≤0.05). For narratives not containing keywords, when the threshold used by the Bayesian model was varied between p>0.5 and p>0.9, the overall probability of detecting a category assigned by the expert varied between 67% and 12%. False positives correspondingly varied between 32% and 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
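
The threshold trade-off described in this abstract can be sketched with a small multinomial naive Bayes classifier. This is not the authors' model: the classifier, the Laplace smoothing, the category names, and the four training narratives are all assumptions made for illustration. The training labels stand in for categories assigned by a keyword search, and a narrative is left unclassified when the best posterior does not exceed the threshold.

```python
import math
from collections import Counter, defaultdict


def train(labelled):
    """labelled: iterable of (list_of_words, category) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, category in labelled:
        class_counts[category] += 1
        word_counts[category].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def posterior(model, words):
    """Posterior probability of each category given the words (Laplace-smoothed)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for category, n_docs in class_counts.items():
        denom = sum(word_counts[category].values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[category][w] + 1) / denom)
        log_scores[category] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def classify(model, words, threshold):
    """Return the most probable category, or None if its posterior <= threshold."""
    post = posterior(model, words)
    category, p = max(post.items(), key=lambda kv: kv[1])
    return category if p > threshold else None


# Invented narratives; labels play the role of keyword-search results.
TRAINING = [
    ("car struck rear stopped at light".split(), "rear_end"),
    ("vehicle hit from behind while stopped".split(), "rear_end"),
    ("car skidded on ice left road".split(), "run_off_road"),
    ("vehicle left road hit tree".split(), "run_off_road"),
]
MODEL = train(TRAINING)

if __name__ == "__main__":
    for text in ("stopped at light hit from behind", "car hit"):
        for threshold in (0.5, 0.9):
            print(text, threshold, classify(MODEL, text.split(), threshold))
```

Raising the threshold can only turn an assignment into "unclassified", never the reverse, which is why detections and false positives both fall as the threshold moves from 0.5 toward 0.9.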

Collaborating for metadata creation on digital projects: using Google Forms and Sheets

Library Hi Tech News ◽

10.1108/lhtn-08-2017-0056 ◽

2017 ◽

Vol 34 (8) ◽

pp. 20-23 ◽

Cited By ~ 1

Author(s):

R. Cecilia Knight ◽

Elizabeth Rodrigues ◽

Rebecca Ciota

Keyword(s):

Design Methodology ◽

Controlled Vocabulary ◽

Social Implications ◽

Content Type ◽

Complex Group ◽

Job Experiences ◽

Faculty And Staff ◽

Keyword Searching ◽

Google Apps

Purpose: Working with faculty and staff to create digital projects requires a complex group of skills and activities. Potential collaborators often jump to the end vision without fully grasping the need for proper description and metadata. Design/methodology/approach: On-the-job experiences; contextual inquiry. Findings: Using Google Forms and Sheets is perceived as neutral and less frightening than working in a platform that will be the home of the project or using other proprietary productivity software. Social implications: Digital scholars gradually come to understand the stakes of early decisions in metadata creation (such as file naming conventions and controlled vocabulary) and how those decisions affect database structures, record display, keyword searching and long-term curation. Originality/value: The authors did not find any other publications addressing using Google Apps for creating metadata.

The Creation and Persistence of Misinformation in Shared Library Catalogs: Language and Subject Knowledge in a Technological Era

Library Collections Acquisitions and Technical Services ◽

10.1080/14649055.2003.10765902 ◽

2003 ◽

Vol 27 (1) ◽

pp. 130-131

Author(s):

Rosann Bazirjian

Keyword(s):

Subject Knowledge ◽

The Creation ◽

Library Catalogs

Artificial intelligence-based conversational agent to support medication prescribing

JAMIA Open ◽

10.1093/jamiaopen/ooaa009 ◽

2020 ◽

Vol 3 (2) ◽

pp. 225-232 ◽

Cited By ~ 1

Author(s):

Anita M Preininger ◽

Brett South ◽

Jeff Heiland ◽

Adam Buchold ◽

Mya Baca ◽

...

Keyword(s):

Artificial Intelligence ◽

Natural Language ◽

Subject Matter ◽

System Architecture ◽

Keyword Search ◽

Conversational Agent ◽

Subject Matter Experts ◽

Rater Agreement ◽

And Performance ◽

Keyword Searches

Objective: This article describes the system architecture, training, initial use, and performance of Watson Assistant (WA), an artificial intelligence-based conversational agent, accessible within Micromedex®. Materials and methods: The number and frequency of intents (target of a user’s query) triggered in WA during its initial use were examined; intents triggered over 9 months were compared to the frequency of topics accessed via keyword search of Micromedex. Accuracy of WA intents assigned to 400 queries was compared to assignments by 2 independent subject matter experts (SMEs), with inter-rater reliability measured by Cohen’s kappa. Results: In over 126 000 conversations with WA, intents most frequently triggered involved dosing (N = 30 239, 23.9%) and administration (N = 14 520, 11.5%). SMEs with substantial inter-rater agreement (kappa = 0.71) agreed with intent mapping in 247 of 400 queries (62%), including 16 queries related to content that WA and SMEs agreed was unavailable in WA. SMEs found 57 (14%) of 400 queries incorrectly mapped by WA; 112 (28%) queries unanswerable by WA included queries that were either ambiguous, contained unrecognized typographical errors, or addressed topics unavailable to WA. Of the queries answerable by WA (288), SMEs determined 231 (80%) were correctly linked to an intent. Discussion: A conversational agent successfully linked most queries to intents in Micromedex. Ongoing system training seeks to widen the scope of WA and improve matching capabilities. Conclusion: WA enabled Micromedex users to obtain answers to many medication-related questions using natural language, with the conversational agent facilitating mapping to a broader distribution of topics than standard keyword searches.
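
For readers unfamiliar with the agreement statistic named in this abstract, the following is a minimal sketch of Cohen's kappa for two raters using its standard definition, (observed agreement minus chance agreement) divided by (1 minus chance agreement). The two label lists are invented for illustration and are not the study's ratings.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical intent labels assigned by two raters to ten queries.
RATER_1 = ["dosing"] * 5 + ["administration"] * 5
RATER_2 = ["administration"] + ["dosing"] * 4 + ["dosing"] + ["administration"] * 4

if __name__ == "__main__":
    print(round(cohen_kappa(RATER_1, RATER_2), 2))
```

Here the raters agree on 8 of 10 items and chance agreement is 0.5, so kappa works out to 0.6; raw agreement alone (0.8) would overstate how far the raters exceed chance.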

A Model for Estimating the Savings from Dimensional vs. Keyword Search

Advances in Database Research - Advanced Principles for Improving Database Design, Systems Modeling, and Software Development ◽

10.4018/978-1-60566-172-8.ch009 ◽

2009 ◽

pp. 146-157

Author(s):

Karen Corral ◽

David Schuff ◽

Robert D. St. Louis ◽

Ozgur Turetken

Keyword(s):

Search Engine ◽

Search Strategy ◽

Keyword Search ◽

Cost Effective ◽

Quantitative Model ◽

Total Cost ◽

Search Approach ◽

The Cost ◽

Keyword Searches ◽

A Company

Inefficient and ineffective search is widely recognized as a problem for businesses. The shortcomings of keyword searches have been elaborated upon by many authors, and many enhancements to keyword searches have been proposed. To date, however, no one has provided a quantitative model or systematic process for evaluating the savings that accrue from enhanced search procedures. This paper presents a model for estimating the total cost to a company of relying on keyword searches versus a dimensional search approach. The model is based on the Zipf-Mandelbrot law in quantitative linguistics. Our analysis of the model shows that a surprisingly small number of searches are required to justify the cost associated with encoding the metadata necessary to support a dimensional search engine. The results imply that it is cost effective for almost any business organization to implement a dimensional search strategy.
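
The abstract names the Zipf-Mandelbrot law without stating it. In its usual form, the frequency of the item at rank k is proportional to 1 / (k + q)^s. The sketch below only normalizes that form over a finite number of ranks; it does not reproduce the paper's cost model, and the parameter values in the example are arbitrary assumptions.

```python
def zipf_mandelbrot(n_ranks, q, s):
    """Normalized Zipf-Mandelbrot probabilities for ranks 1..n_ranks.

    p(k) is proportional to 1 / (k + q) ** s, with q >= 0 and s > 0.
    """
    if n_ranks < 1 or q < 0 or s <= 0:
        raise ValueError("need n_ranks >= 1, q >= 0 and s > 0")
    weights = [1.0 / (k + q) ** s for k in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not values taken from the paper.
    probs = zipf_mandelbrot(n_ranks=1000, q=2.7, s=1.0)
    print(sum(probs[:10]))  # share of all occurrences covered by the top 10 ranks
```

With q = 0 the expression reduces to the plain Zipf law; larger q flattens the head of the distribution, which changes how much of the total mass a few high-ranked terms account for.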

Bade, David. The Creation and Persistence of Misinformation in Shared Library Catalogs: Language and Subject Knowledge in a Technological Era. Champaign-Urbana, Ill.: Graduate School of Library and Information Science, Univ. of Illinois (Occasional Papers, no. 211), 2002. 33p. $8 (ISBN 087845120X).

College & Research Libraries ◽

10.5860/crl.63.5.470 ◽

2002 ◽

Vol 63 (5) ◽

pp. 470-471

Author(s):

Robert Bland

Keyword(s):

Graduate School ◽

Information Science ◽

Library And Information Science ◽

Subject Knowledge ◽

The Creation ◽

Library Catalogs

A Preliminary Controlled Vocabulary for the Description of Hagiographic Texts

Religions ◽

10.3390/rel10100585 ◽

2019 ◽

Vol 10 (10) ◽

pp. 585 ◽

Cited By ~ 2

Author(s):

David M. DiValerio

Keyword(s):

Controlled Vocabulary ◽

Formal Description ◽

Systematic Analysis ◽

Shared Language ◽

The Creation

As a genre defined by its content rather than by its form, the extreme diversity of the kinds of texts that can be considered “hagiographic” often proves an impediment to the progress of comparative hagiology. This essay offers some suggestions for the creation of a controlled vocabulary for the formal description of hagiographic texts, demonstrating how having a more highly developed shared language at our disposal will facilitate both the systematic analysis and the comparative discussion of hagiography.

Nomenclature for Museum Cataloging

KNOWLEDGE ORGANIZATION ◽

10.5771/0943-7444-2020-2-183 ◽

2020 ◽

Vol 47 (2) ◽

pp. 183-194

Author(s):

Heather Dunn ◽

Paul Bourcier

Keyword(s):

North America ◽

North American ◽

American History ◽

Classification System ◽

Development Process ◽

Management Development ◽

Controlled Vocabulary ◽

Human History ◽

The Creation

We present an overview of Nomenclature’s history, characteristics, structure, use, management, development process, limitations, and future. Nomenclature for Museum Cataloging is a bilingual (English/French) structured and controlled list of object terms organized in a classification system to provide a basis for indexing and cataloging collections of human-made objects. It includes illustrations and bibliographic references as well as a user guide. It is used in the creation and management of object records in human history collections within museums and other organizations, and it focuses on objects relevant to North American history and culture. First published in 1978, Nomenclature is the most extensively used museum classification and controlled vocabulary for historical and ethnological collections in North America and represents thereby a de facto standard in the field. An online reference version of Nomenclature was made available in 2018, and it will be available under open license in 2020.

Supporting Biomimetic Design Through Categorization of Natural-Language Keyword-Search Results

Volume 8: 14th Design for Manufacturing and the Life Cycle Conference; 6th Symposium on International Design and Design Education; 21st International Conference on Design Theory and Methodology, Parts A and B ◽

10.1115/detc2009-86681 ◽

2009 ◽

Cited By ~ 4

Author(s):

Ji Ke ◽

J. S. Wallace ◽

L. H. Shu

Keyword(s):

Fuel Cell ◽

Natural Language ◽

Keyword Search ◽

Bipolar Plate ◽

Biological Information ◽

Good Source ◽

Search Results ◽

Biological Phenomena ◽

Keyword Searches

Biology is a good source of analogies for engineering design. One approach to retrieving biological analogies is to perform keyword searches on natural-language sources such as books and journals. A challenge of retrieving information from natural-language sources is the potential requirement to process a large number of search results. This paper describes a categorization method that organizes a large group of diverse biological information into meaningful categories. The benefits of the categorization functionality are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our categorization method reduced the effort to systematically identify biological phenomena by up to ∼80%.
