Evaluation of Meta-1 for a Concept-based Approach to the Automated Indexing and Retrieval of Bibliographic and Full-text Databases

SAPHIRE is a concept-based approach to information retrieval in the biomedical domain. Indexing and retrieval are based on a concept-matching algorithm that processes free text to identify concepts and map them to their canonical form. This process requires a large vocabulary containing a breadth of medical concepts and a diversity of synonym forms, which is provided by the Meta-1 vocabulary from the Unified Medical Language System Project of the National Library of Medicine. This paper describes the use of Meta-1 in SAPHIRE and an evaluation of both entities in the context of an information retrieval study.
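The concept-matching step described in this abstract can be illustrated with a minimal sketch. It is not SAPHIRE's actual algorithm, and the small `SYNONYMS` table is a hypothetical stand-in for a vocabulary such as Meta-1. The function scans free text and maps the longest matching synonym phrase at each position to a canonical form.

```python
import re

# Hypothetical toy vocabulary: synonym phrase -> canonical concept name.
# These entries are illustrative only and are not taken from Meta-1.
SYNONYMS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
}

# Longest synonym phrase, in words, bounds the match window.
MAX_PHRASE_LEN = max(len(phrase.split()) for phrase in SYNONYMS)


def match_concepts(text):
    """Return canonical concepts found in text, using greedy longest match."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink the window.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SYNONYMS:
                concepts.append(SYNONYMS[phrase])
                i += n
                break
        else:
            # No phrase starts at this token; move on.
            i += 1
    return concepts


print(match_concepts("Patient with high blood pressure had a heart attack."))
```

Because documents and queries are both reduced to canonical forms, two texts that use different synonyms for the same concept can still be matched.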

Core Data Elements in Acute Myeloid Leukemia: A Unified Medical Language System–Based Semantic Analysis and Experts’ Review (Preprint)

10.2196/preprints.13554 ◽

2019 ◽

Author(s):

Christian Holz ◽

Torsten Kessler ◽

Martin Dugas ◽

Julian Varghese

Keyword(s):

Acute Myeloid Leukemia ◽

Clinical Data ◽

Myeloid Leukemia ◽

Data Capture ◽

Large Set ◽

Unified Medical Language System ◽

Medical Language ◽

Data Elements ◽

Medical Concepts ◽

Acute Myeloid

BACKGROUND For cancer domains such as acute myeloid leukemia (AML), a large set of data elements is obtained from different institutions with heterogeneous data definitions within one patient course. The lack of clinical data harmonization impedes cross-institutional electronic data exchange and future meta-analyses. OBJECTIVE This study aimed to identify and harmonize a semantic core of common data elements (CDEs) in clinical routine and research documentation, based on a systematic metadata analysis of existing documentation models. METHODS Lists of relevant data items were collected and reviewed by hematologists from two university hospitals regarding routine documentation and several case report forms of clinical trials for AML. In addition, existing registries and international recommendations were included. Data items were coded to medical concepts via the Unified Medical Language System (UMLS) by a physician and reviewed by another physician. On the basis of the coded concepts, the data sources were analyzed to find concept overlaps and to identify the most frequent concepts. The most frequent concepts were then implemented as data elements in the standardized format of the Operational Data Model by the Clinical Data Interchange Standards Consortium. RESULTS A total of 3265 medical concepts were identified, of which 1414 were unique. Among the 1414 unique medical concepts, the 50 most frequent ones cover 26.98% of all concept occurrences within the collected AML documentation. The top 100 concepts represent 39.48% of all concept occurrences. An implementation of the CDEs is available on a European research infrastructure and can be downloaded in different formats for reuse in different electronic data capture systems. CONCLUSIONS Information management is a complex process for research-intense disease entities such as AML, which are associated with a large set of lab-based diagnostics and different treatment options. Our systematic UMLS-based analysis revealed the existence of a core data set, and an exemplary reusable implementation for harmonized data capture is available on an established metadata repository.
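The coverage figures in this abstract (the share of all concept occurrences accounted for by the k most frequent concepts) follow from a simple frequency count. The sketch below shows that calculation under the assumption that each coded data item is represented by one concept code. The function name and the sample codes are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def top_k_coverage(concept_codes, k):
    """Fraction of all occurrences covered by the k most frequent codes."""
    counts = Counter(concept_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


# Hypothetical coded items: 8 occurrences of 5 unique concepts.
codes = ["C1", "C1", "C1", "C2", "C2", "C3", "C4", "C5"]
print(len(codes), "occurrences,", len(set(codes)), "unique")
print(top_k_coverage(codes, 2))  # the two most frequent codes cover 5 of 8
```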

Similarity of medical concepts in question and answering of health communities

Health Informatics Journal ◽

10.1177/1460458219881333 ◽

2019 ◽

Vol 26 (2) ◽

pp. 1443-1454 ◽

Cited By ~ 1

Author(s):

Hamid Naderi ◽

Sina Madani ◽

Behzad Kiani ◽

Kobra Etminani

Keyword(s):

Computing Methods ◽

Inverse Document Frequency ◽

Language System ◽

Unified Medical Language System ◽

Retrieval Systems ◽

Medical Language ◽

Document Frequency ◽

Health Communities ◽

Medical Concepts ◽

Question And Answering

The ability to automatically categorize submitted questions by topic and suggest similar questions and answers to users reduces the number of redundant questions. Our objective was to compare intra-topic and inter-topic similarity between questions and answers by using concept-based similarity computing analysis. We gathered existing questions and answers from several popular online health communities. Then, Unified Medical Language System concepts related to selected questions and experts in different topics were extracted and weighted by term frequency-inverse document frequency values. Finally, the similarity between weighted vectors of Unified Medical Language System concepts was computed. Our results showed a considerable gap between intra-topic and inter-topic similarities: the average intra-topic similarity (0.095, 0.192, and 0.110, respectively) was higher than the average inter-topic similarity (0.012, 0.025, and 0.018, respectively) for questions of the top 3 popular online communities, namely NetWellness, WebMD, and Yahoo Answers. Similarity scores between the content of questions answered by experts in the same and different topics were calculated as 0.51 and 0.11, respectively. Concept-based similarity computing methods can be used in developing intelligent question and answering retrieval systems that contain auto recommendation functionality for similar questions and experts.
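The weighting and comparison steps described in this abstract can be sketched as follows. This assumes each question has already been reduced to a list of concept identifiers. It uses one common TF-IDF variant and cosine similarity, which may differ from the exact formulas used in the study. The concept identifiers are hypothetical placeholders.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn lists of concept IDs into sparse TF-IDF weight dictionaries."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each concept once per document
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            concept: (count / len(doc)) * math.log(n_docs / doc_freq[concept])
            for concept, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical concept lists: the first two questions share a topic.
questions = [
    ["concept_a", "concept_b"],
    ["concept_a", "concept_b", "concept_c"],
    ["concept_d", "concept_e"],
]
vecs = tfidf_vectors(questions)
print(cosine(vecs[0], vecs[1]))  # same topic: greater than zero
print(cosine(vecs[0], vecs[2]))  # different topic: no shared concepts
```

Averaging such pairwise scores within a topic and across topics would give intra-topic and inter-topic figures of the kind the abstract reports.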

Interoperability and Mapping Between Knowledge Organization Systems: Metathesaurus— Unified Medical Language System of the National Library of Medicine

KNOWLEDGE ORGANIZATION ◽

10.5771/0943-7444-2016-2-107 ◽

2016 ◽

Vol 43 (2) ◽

pp. 107-112 ◽

Cited By ~ 1

Author(s):

Julietti de Andrade ◽

Marilda Lopes Ginez de Lara

Keyword(s):

Knowledge Organization ◽

National Library ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Knowledge Organization Systems

Pathology Abbreviated: A Long Review of Short Terms

Archives of Pathology & Laboratory Medicine ◽

10.5858/2004-128-347-paalro ◽

2004 ◽

Vol 128 (3) ◽

pp. 347-352 ◽

Cited By ~ 2

Author(s):

Jules J. Berman

Keyword(s):

Medical Records ◽

Free Text ◽

Current Version ◽

The Public ◽

Language System ◽

Unified Medical Language System ◽

The Past ◽

Medical Language ◽

Pathology Reports ◽

Algorithmic Approaches

Abstract Context.—Abbreviations are used frequently in pathology reports and medical records. Efforts to identify and organize free-text concepts must correctly interpret medical abbreviations. During the past decade, the author has collected more than 12 000 medical abbreviations, concentrating on terms used or interpreted by pathologists. Objective.—The purpose of the study is to provide readers with a listing of abbreviations. The listing of abbreviations is reviewed for the purpose of determining the variety of ways that long forms are shortened. Design.—Abbreviations fell into different classes. These classes seemed amenable to distinct algorithmic approaches to their correct expansions. A discussion of these abbreviation classes was included to assist informaticians who are searching for ways to write software that expands abbreviations found in medical text. Classes were separated by the algorithmic approaches that could be used to map abbreviations to their correct expansions. A Perl implementation was developed to automatically match expansions with Unified Medical Language System concepts. Measurements.—The abbreviation list contained 12 097 terms; 5772 abbreviations had unique expansions. There were 6325 polysemous abbreviation/expansion pairs. The expansions of 8599 abbreviations mapped to Unified Medical Language System concepts. Three hundred twenty-four abbreviations could be confused with unabbreviated words. Two hundred thirteen abbreviations had different expansions depending on whether the American or the British spellings were used. Nine hundred seventy abbreviations ended in the letter “s.” Results.—There were 6 nonexclusive groups of abbreviations classed by expansion algorithm, as follows: (1) ephemeral; (2) hyponymous; (3) monosemous; (4) polysemous; (5) masqueraders of common words; and (6) fatal (abbreviations whose incorrect expansions could easily result in clinical errors). 
Conclusion.—Collecting and classifying abbreviations creates a logical approach to the development of class-specific algorithms designed to expand abbreviations. A large listing of medical abbreviations is placed into the public domain. The most current version is available at http://www.pathologyinformatics.org/downloads/abbtwo.htm.
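The distinction between abbreviations with a single expansion and polysemous ones can be illustrated with a short sketch. The abstract mentions a Perl implementation. The Python code below is not that implementation, and the sample pairs are illustrative rather than drawn from the author's list. It groups abbreviation/expansion pairs and labels each abbreviation by the number of distinct expansions.

```python
from collections import defaultdict


def group_expansions(pairs):
    """Map each abbreviation to the set of its distinct expansions."""
    expansions = defaultdict(set)
    for abbreviation, expansion in pairs:
        expansions[abbreviation].add(expansion.lower())
    return dict(expansions)


def classify(pairs):
    """Label each abbreviation as monosemous or polysemous."""
    return {
        abbreviation: "monosemous" if len(exps) == 1 else "polysemous"
        for abbreviation, exps in group_expansions(pairs).items()
    }


# Illustrative pairs only (not taken from the published listing).
sample = [
    ("AML", "acute myeloid leukemia"),
    ("MS", "multiple sclerosis"),
    ("MS", "mitral stenosis"),
]
print(classify(sample))
```

A monosemous abbreviation can be expanded by direct lookup, while a polysemous one needs context to choose among its expansions.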

Navigating to Knowledge

Methods of Information in Medicine ◽

10.1055/s-0038-1634582 ◽

1995 ◽

Vol 34 (01/02) ◽

pp. 214-231 ◽

Cited By ~ 5

Author(s):

M. S. Tuttle ◽

W. G. Cole ◽

D. D. Sherertz ◽

S. J. Nelson

Keyword(s):

National Cancer Institute ◽

Point Of Care ◽

Visual Representation ◽

Medical Knowledge ◽

National Library ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Computer Based ◽

The U.S

Abstract: One way to fulfill point-of-care knowledge needs is to present caregivers with a visual representation of the available “answers”. Using such a representation, caregivers can recognize what they want, rather than have to recall what they need, and then navigate to an appropriate answer. Given selected pieces of information from a computer-based patient record, an interface can anticipate certain knowledge needs by initializing caregiver navigation in a semantic neighborhood of answers likely to be relevant to the patient at hand. These notions draw heavily on two collaborative projects – the U.S. National Library of Medicine Unified Medical Language System® and the U.S. National Cancer Institute Knowledge Server. Both of these projects support navigation because they make the structure of medical knowledge explicit in a way that can be exploited by human interfaces.

The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa056 ◽

2020 ◽

Vol 27 (10) ◽

pp. 1600-1605 ◽

Cited By ~ 2

Author(s):

Chris J Lu ◽

Amanda Payne ◽

James G Mork

Keyword(s):

Concept Mapping ◽

Language Processing ◽

Vital Role ◽

Unstructured Data ◽

National Library ◽

Language System ◽

Unified Medical Language System ◽

The Core ◽

Medical Language ◽

Recent Developments

Abstract Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation to the success of NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provides an underlying resource for many NLP applications. This article reports recent developments of 3 key components in the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide generic, broad coverage and a robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.

Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa269 ◽

2020 ◽

Author(s):

Denis Newman-Griffis ◽

Guy Divita ◽

Bart Desmet ◽

Ayah Zirikly ◽

Carolyn P Rosé ◽

...

Keyword(s):

Biomedical Literature ◽

Clinical Text ◽

Unified Medical Language System ◽

Medical Language ◽

Medical Concept ◽

Normalization Methods ◽

The Rich ◽

Medical Concept Normalization ◽

Medical Concepts ◽

Clinical Concept

Abstract Objectives Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity—words or phrases that may refer to different concepts—has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research. Materials and Methods We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language. Results We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena. Discussion Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods. 
Conclusions Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.
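The dataset-level ambiguity figure reported in this abstract can be approximated by counting strings that are annotated with more than one concept. The sketch below assumes annotations are available as (string, concept identifier) pairs and that strings are compared case-insensitively. Both are assumptions, and the identifiers are hypothetical placeholders rather than real UMLS codes.

```python
def ambiguity_rate(annotations):
    """Fraction of distinct strings annotated with more than one concept."""
    concepts_by_string = {}
    for string, concept_id in annotations:
        # Assumption: case-insensitive comparison of mention strings.
        concepts_by_string.setdefault(string.lower(), set()).add(concept_id)
    if not concepts_by_string:
        return 0.0
    ambiguous = sum(1 for ids in concepts_by_string.values() if len(ids) > 1)
    return ambiguous / len(concepts_by_string)


# Hypothetical annotations: "cold" maps to two different concepts.
annotations = [
    ("cold", "X1"),
    ("cold", "X2"),
    ("Cold", "X1"),
    ("fever", "X3"),
    ("cough", "X4"),
]
print(ambiguity_rate(annotations))  # one of three distinct strings is ambiguous
```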

Terminology Tools: State of the Art and Practical Lessons

Methods of Information in Medicine ◽

10.1055/s-0038-1634425 ◽

2001 ◽

Vol 40 (04) ◽

pp. 298-306 ◽

Cited By ~ 16

Author(s):

J. J. Cimino

Keyword(s):

New York ◽

Columbia University ◽

Unified Medical Language System ◽

Knowledge Based ◽

Medical Language ◽

Tool Set ◽

Systematized Nomenclature Of Medicine ◽

Work Done ◽

Medical Concepts

Summary Objectives: As controlled medical terminologies evolve from simple code-name-hierarchy arrangements into rich, knowledge-based ontologies of medical concepts, increased demands are placed on both the developers and users of the terminologies. In response, researchers have begun developing tools to address their needs. The aims of this article are to review previous work done to develop these tools and then to describe work done at Columbia University and New York Presbyterian Hospital (NYPH). Methods: Researchers working with the Systematized Nomenclature of Medicine (SNOMED), the Unified Medical Language System (UMLS), and NYPH’s Medical Entities Dictionary (MED) have created a wide variety of terminology browsers, editors and servers to facilitate creation, maintenance and use of these terminologies. Results: Although much work has been done, no generally available tools have yet emerged. Consensus on requirements for tool functions, especially for terminology servers, is emerging. Tools at NYPH have been used successfully to support the integration of clinical applications and the merger of health care institutions. Conclusions: Significant advancement has occurred over the past fifteen years in the development of sophisticated controlled terminologies and the tools to support them. The tool set at NYPH provides a case study to demonstrate one feasible architecture.

Mapping the Gene Ontology Into the Unified Medical Language System

Comparative and Functional Genomics ◽

10.1002/cfg.407 ◽

2004 ◽

Vol 5 (4) ◽

pp. 354-361 ◽

Cited By ~ 22

Author(s):

Jane Lomax ◽

Alexa T. McCray

Keyword(s):

Gene Ontology ◽

Clinical Medicine ◽

Gene Products ◽

National Library ◽

Web Based ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Large Numbers ◽

Go Terms

We have recently mapped the Gene Ontology (GO), developed by the Gene Ontology Consortium, into the National Library of Medicine's Unified Medical Language System (UMLS). GO has been developed for the purpose of annotating gene products in genome databases, and the UMLS has been developed as a framework for integrating large numbers of disparate terminologies, primarily for the purpose of providing better access to biomedical information sources. The mapping of GO to UMLS highlighted issues in both terminology systems. After some initial explorations and discussions between the UMLS and GO teams, the GO was integrated with the UMLS. Overall, a total of 23% of the GO terms either matched directly (3%) or linked (20%) to existing UMLS concepts. All GO terms now have a corresponding, official UMLS concept, and the entire vocabulary is available through the web-based UMLS Knowledge Source Server. The mapping of the Gene Ontology, with its focus on structures, processes and functions at the molecular level, to the existing broad coverage UMLS should contribute to linking the language and practices of clinical medicine to the language and practices of genomics.

Integrated access to medical and pharmacological information: the unified medical language system at the National Library of Medicine

Chemical Information 2 ◽

10.1007/978-3-642-85872-7_16 ◽

1991 ◽

pp. 187-195

Author(s):

Peri Schuyler

Keyword(s):

National Library ◽

Language System ◽

Unified Medical Language System ◽

Medical Language
