Scalability of Piecewise Synonym Identification in Integration of SNOMED into the UMLS

Synonym identification during the integration of a source terminology into the Unified Medical Language System (UMLS) is a labor-intensive task that is required for every new release of the source. The piecewise synonym (PWS) methodology was previously used for the integration of a small source. The goal of this paper is to determine whether the PWS methodology with two control parameters scales to a much larger terminology (a subset of SNOMED CT), whether the control parameters are necessary to make the methodology viable, and whether the control parameters cause any loss of matching results. Additional methods are used to limit the size of the dictionary used in PWS generation. The authors’ methodology discovered 41% of the concepts not found by string matching. The necessity and effectiveness of the control parameters were confirmed. Furthermore, a comparison of experiments with and without control parameters showed that no matches were lost.

A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System

Journal of the American Medical Informatics Association, 2020, Vol 27 (10), pp. 1568-1575. DOI: 10.1093/jamia/ocaa123. Cited By ~ 1

Author(s): Fengbo Zheng, Jay Shi, Yuntao Yang, W Jim Zheng, Licong Cui

Keyword(s): Gene Ontology, Information Systems, Random Sample, English Language, SNOMED CT, Domain Experts, Language System, Unified Medical Language System, Medical Language, Original Concept

Abstract

Objective: The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages the UMLS knowledge to systematically identify missing hierarchical IS-A relations in the source terminologies.

Materials and Methods: Given a concept name in the UMLS, we first identify its base and secondary noun chunks. For each identified noun chunk, we generate replacement candidates that are more general than the noun chunk. Then, we replace the noun chunks with their replacement candidates to generate new potential concept names that may serve as supertypes of the original concept. If a newly generated name is an existing concept name in the same source terminology as the original concept, then a potentially missing IS-A relation between the original and the new concept is identified.

Results: Applying our transformation-based method to English-language concept names in the UMLS (2019AB release), a total of 39 359 potentially missing IS-A relations were detected in 13 source terminologies. Domain experts evaluated a random sample of 200 potentially missing IS-A relations identified in the SNOMED CT (U.S. edition) and 100 in Gene Ontology. A total of 173 of 200 and 63 of 100 potentially missing IS-A relations were confirmed by domain experts, indicating that our method achieved a precision of 86.5% and 63% for the SNOMED CT and Gene Ontology, respectively.

Conclusions: Our results showed that our transformation-based method is effective in identifying missing IS-A relations in the UMLS source terminologies.
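
The replace-and-look-up step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `candidate_supertypes`, the hand-written `generalizations` table, and the toy concept names are all assumptions, and noun chunks are matched by plain substring lookup rather than by the noun-chunk identification the abstract describes.

```python
def candidate_supertypes(name, generalizations, existing_names):
    """Return existing concept names reachable by generalizing one chunk of `name`.

    generalizations: hypothetical mapping from a noun chunk to more general terms.
    existing_names: concept names already present in the same source terminology.
    """
    found = set()
    for chunk, broader_terms in generalizations.items():
        if chunk not in name:
            continue  # this chunk does not occur in the concept name
        for broader in broader_terms:
            new_name = name.replace(chunk, broader)
            # Only names that already exist in the terminology are kept as candidates.
            if new_name != name and new_name in existing_names:
                found.add(new_name)
    return sorted(found)


# Toy data, invented for illustration only.
GENERALIZATIONS = {"femur": ["bone"], "open fracture": ["fracture"]}
EXISTING_NAMES = {"open fracture of femur", "fracture of femur", "fracture of bone"}

suggested = candidate_supertypes("open fracture of femur", GENERALIZATIONS, EXISTING_NAMES)
```

Each returned name is a candidate supertype of the original concept. The sketch does not check whether an IS-A relation between the two concepts already exists, which a real audit would presumably need to do before reporting the relation as missing.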

Unified Medical Language System

2020. DOI: 10.32388/urur42. Cited By ~ 2

Keyword(s): Language System, Unified Medical Language System, Medical Language

Using Semantic and Structural Properties of the Unified Medical Language System to Discover Potential Terminological Relationships

Journal of the American Medical Informatics Association, 2009, Vol 16 (3), pp. 346-353. DOI: 10.1197/jamia.m2931. Cited By ~ 10

Author(s): C. O. Patel, J. J. Cimino

Keyword(s): Structural Properties, Language System, Unified Medical Language System, Medical Language

Auditing the Unified Medical Language System with Semantic Methods

Journal of the American Medical Informatics Association, 1998, Vol 5 (1), pp. 41-51. DOI: 10.1136/jamia.1998.0050041. Cited By ~ 48

Author(s): J. J. Cimino

Keyword(s): Language System, Unified Medical Language System, Medical Language

The outline of Unified Medical Language System (UMLS) Knowledge Sources.

Journal of Information Processing and Management, 1998, Vol 41 (1), pp. 15-23. DOI: 10.1241/johokanri.41.15

Author(s): Koreni KAWANO

Keyword(s): Knowledge Sources, Language System, Unified Medical Language System, Medical Language

Unified Medical Language System

Electronic Health Record, 2012, pp. 145-152. DOI: 10.1002/9781118479612.ch16. Cited By ~ 1

Keyword(s): Language System, Unified Medical Language System, Medical Language

IAIMS and UMLS at Columbia-Presbyterian Medical Center

Medical Decision Making, 1991, Vol 11 (4_suppl), pp. S89-S93. DOI: 10.1177/0272989x9101104s17. Cited By ~ 4

Author(s): James J. Cimino, Soumitra Sengupta

Keyword(s): Information Management, Management System, Medical Center, Information Management System, Language System, Unified Medical Language System, Medical Language, Academic Information

The authors use an example to illustrate combining Integrated Academic Information Management System (IAIMS) components (applications) into an integral whole, to facilitate using the components simultaneously or in sequence. They examine a model for classifying IAIMS systems, proposing ways in which the Unified Medical Language System (UMLS) can be exploited in them.

Mining Biomedical Data Using MetaMap Transfer (MMTx) and the Unified Medical Language System (UMLS)

Gene Function Analysis - Methods in Molecular Biology™, 2007, pp. 153-169. DOI: 10.1007/978-1-59745-547-3_9. Cited By ~ 15

Author(s): John D. Osborne, Simon Lin, Lihua Julie Zhu, Warren A. Kibbe

Keyword(s): Biomedical Data, Language System, Unified Medical Language System, Medical Language

Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts

Journal of the American Medical Informatics Association, 2020, Vol 27 (10), pp. 1538-1546. DOI: 10.1093/jamia/ocaa136. Cited By ~ 1

Author(s): Yuqing Mao, Kin Wah Fung

Keyword(s): Word Sense Disambiguation, Graph Embedding, Semantic Relatedness, Word Sense, Medical Subject Headings, Network Graph, Convolutional Network, Language System, Unified Medical Language System, Medical Language

Abstract

Objective: The study sought to explore the use of deep learning techniques to measure the semantic relatedness between Unified Medical Language System (UMLS) concepts.

Materials and Methods: Concept sentence embeddings were generated for UMLS concepts by applying the word embedding models BioWordVec and various flavors of BERT to concept sentences formed by concatenating UMLS terms. Graph embeddings were generated by the graph convolutional networks and 4 knowledge graph embedding models, using graphs built from UMLS hierarchical relations. Semantic relatedness was measured by the cosine between the concepts’ embedding vectors. Performance was compared with 2 traditional path-based (shortest path and Leacock-Chodorow) measurements and the publicly available concept embeddings, cui2vec, generated from large biomedical corpora. The concept sentence embeddings were also evaluated on a word sense disambiguation (WSD) task. Reference standards used included the semantic relatedness and semantic similarity datasets from the University of Minnesota, concept pairs generated from the Standardized MedDRA Queries and the MeSH (Medical Subject Headings) WSD corpus.

Results: Sentence embeddings generated by BioWordVec outperformed all other methods used individually in semantic relatedness measurements. Graph convolutional network graph embedding uniformly outperformed path-based measurements and was better than some word embeddings for the Standardized MedDRA Queries dataset. When used together, combined word and graph embedding achieved the best performance in all datasets. For WSD, the enhanced versions of BERT outperformed BioWordVec.

Conclusions: Word and graph embedding techniques can be used to harness terms and relations in the UMLS to measure semantic relatedness between concepts. Concept sentence embedding outperforms path-based measurements and cui2vec, and can be further enhanced by combining with graph embedding.
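
The relatedness measure named in this abstract, the cosine between two concepts' embedding vectors, can be illustrated with a short sketch. The vectors below are made-up three-dimensional values for hypothetical concepts, not real BioWordVec, BERT, or graph embeddings, and the function name `cosine` is an assumption chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional embeddings for three hypothetical concepts.
concept_a = [1.0, 2.0, 0.0]
concept_b = [2.0, 4.0, 0.0]
concept_c = [0.0, 0.0, 3.0]

related = cosine(concept_a, concept_b)    # parallel vectors, close to 1
unrelated = cosine(concept_a, concept_c)  # orthogonal vectors, exactly 0
```

Because the cosine depends only on direction, `concept_a` and `concept_b` score as maximally related even though their magnitudes differ.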
