A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System

2020 ◽  
Vol 27 (10) ◽  
pp. 1568-1575 ◽  
Author(s):  
Fengbo Zheng ◽  
Jay Shi ◽  
Yuntao Yang ◽  
W Jim Zheng ◽  
Licong Cui

Objective: The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages UMLS knowledge to systematically identify missing hierarchical IS-A relations in the source terminologies.

Materials and Methods: Given a concept name in the UMLS, we first identify its base and secondary noun chunks. For each identified noun chunk, we generate replacement candidates that are more general than the noun chunk. Then, we replace the noun chunks with their replacement candidates to generate new potential concept names that may serve as supertypes of the original concept. If a newly generated name is an existing concept name in the same source terminology as the original concept, then a potentially missing IS-A relation between the original and the new concept is identified.

Results: Applying our transformation-based method to English-language concept names in the UMLS (2019AB release), we detected a total of 39 359 potentially missing IS-A relations in 13 source terminologies. Domain experts evaluated a random sample of 200 potentially missing IS-A relations identified in SNOMED CT (U.S. edition) and 100 in Gene Ontology. They confirmed 173 of 200 and 63 of 100, respectively, indicating that our method achieved a precision of 86.5% for SNOMED CT and 63% for Gene Ontology.

Conclusions: Our results showed that our transformation-based method is effective in identifying missing IS-A relations in the UMLS source terminologies.
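The chunk-replacement idea described in the Materials and Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy terminology, the hand-supplied noun chunks, and the `more_general` lookup are all hypothetical stand-ins (the abstract does not specify how chunks are parsed or how UMLS knowledge yields the more general candidates). The sketch also assumes one chunk is replaced at a time and that a suggestion is dropped when the IS-A relation is already recorded. The `precision` helper simply reproduces the arithmetic behind the reported figures.

```python
def suggest_missing_is_a(name, chunks, more_general, terminology, existing_is_a):
    """Return existing concept names that may be missing supertypes of `name`.

    All inputs are hypothetical stand-ins for UMLS-derived data:
      chunks        -- noun chunks already identified in `name`
      more_general  -- maps a chunk to candidate replacements that are more general
      terminology   -- concept names of the same source terminology
      existing_is_a -- (child, parent) pairs already present in the hierarchy
    """
    suggestions = set()
    for chunk in chunks:
        for general in more_general.get(chunk, ()):
            # Assumption: replace a single chunk occurrence per candidate name.
            candidate = name.replace(chunk, general, 1)
            if (candidate != name
                    and candidate in terminology
                    and (name, candidate) not in existing_is_a):
                suggestions.add(candidate)
    return sorted(suggestions)


def precision(confirmed, sampled):
    """Fraction of sampled suggestions that domain experts confirmed."""
    return confirmed / sampled


# Toy data, invented for illustration only.
terminology = {
    "infection of left kidney",
    "infection of kidney",
    "disorder of left kidney",
}
more_general = {"left kidney": ["kidney"], "infection": ["disorder"]}
existing_is_a = {("infection of left kidney", "disorder of left kidney")}

found = suggest_missing_is_a(
    "infection of left kidney",
    ["infection", "left kidney"],
    more_general,
    terminology,
    existing_is_a,
)
print(found)  # ['infection of kidney']
print(f"{precision(173, 200):.1%}, {precision(63, 100):.1%}")  # 86.5%, 63.0%
```

In this toy run, "disorder of left kidney" is generated but skipped because that relation is already recorded, while "infection of kidney" is reported as a potentially missing supertype.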

Author(s):  
Kuo-Chuan Huang ◽  
James Geller ◽  
Michael Halper ◽  
Gai Elhanan ◽  
Yehoshua Perl

Synonym identification during source terminology integration into the Unified Medical Language System (UMLS) is a labor-intensive task needed for every new release of the source. The piecewise synonym (PWS) methodology was previously used for the integration of a small source. The goal of this paper is to determine whether the piecewise synonym methodology with two control parameters scales to a much larger terminology (a subset of SNOMED CT), whether the control parameters are necessary to make the methodology viable, and whether the control parameters lead to any loss of matching results. Additional methods are used to limit the size of the dictionary used in the PWS generation methodology. The authors’ methodology discovered 41% of concepts not found by string matching. The necessity and effectiveness of the control parameters were confirmed. Furthermore, when comparing the results of experiments with and without control parameters, no matches were lost.


2004 ◽  
Vol 5 (4) ◽  
pp. 354-361 ◽  
Author(s):  
Jane Lomax ◽  
Alexa T. McCray

We have recently mapped the Gene Ontology (GO), developed by the Gene Ontology Consortium, into the National Library of Medicine's Unified Medical Language System (UMLS). GO has been developed for the purpose of annotating gene products in genome databases, and the UMLS has been developed as a framework for integrating large numbers of disparate terminologies, primarily for the purpose of providing better access to biomedical information sources. The mapping of GO to UMLS highlighted issues in both terminology systems. After some initial explorations and discussions between the UMLS and GO teams, the GO was integrated with the UMLS. Overall, a total of 23% of the GO terms either matched directly (3%) or linked (20%) to existing UMLS concepts. All GO terms now have a corresponding, official UMLS concept, and the entire vocabulary is available through the web-based UMLS Knowledge Source Server. The mapping of the Gene Ontology, with its focus on structures, processes and functions at the molecular level, to the existing broad coverage UMLS should contribute to linking the language and practices of clinical medicine to the language and practices of genomics.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Nada Boudjellal ◽  
Huaping Zhang ◽  
Asif Khan ◽  
Arshad Ahmad ◽  
Rashid Naseem ◽  
...  

The rapid growth of data in many areas, including the biomedical domain, calls for information extraction systems that can acquire much-needed knowledge about specific entities such as proteins, drugs, or diseases within a short time. Annotated corpora facilitate the process of building NLP systems. While extensive work has been done in this area for the English language, other languages such as Arabic seem to lack these resources, especially in the healthcare area. Therefore, in this work, we present a method to develop a silver-standard medical corpus for the Arabic language with a dictionary as a minimal supervision tool. The corpus contains 49,856 sentences tagged with 13 entity types corresponding to a subset of UMLS (Unified Medical Language System) concept types. Evaluation of a subset of the corpus showed that the annotation method achieved 90% accuracy.


1991 ◽  
Vol 11 (4_suppl) ◽  
pp. S89-S93 ◽  
Author(s):  
James J. Cimino ◽  
Soumitra Sengupta

The authors use an example to illustrate combining Integrated Academic Information Management System (IAIMS) components (applications) into an integral whole, to facilitate using the components simultaneously or in sequence. They examine a model for classifying IAIMS systems, proposing ways in which the Unified Medical Language System (UMLS) can be exploited in them.

