scholarly journals Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)–based ranking for concept normalization

2020 ◽  
Vol 27 (10) ◽  
pp. 1510-1519
Author(s):  
Dongfang Xu ◽  
Manoj Gopale ◽  
Jiacheng Zhang ◽  
Kris Brown ◽  
Edmon Begoli ◽  
...  

Abstract Objective Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks including relation extraction, information retrieval, etc. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization. Materials and Methods The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer. Results Our generate-and-rank system was third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model’s accuracy was increased to 83.56% via improvements to how training data are generated from UMLS and incorporation of our UMLS semantic type regularizer. Discussion Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training. Conclusions Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network–based ranking model to accurately link phrases in text to UMLS concepts.

2020 ◽  
Vol 21 (S16) ◽  
Author(s):  
Rui Xing ◽  
Jie Luo ◽  
Tengwei Song

Abstract Background Although biomedical publications and literature are growing rapidly, there still lacks structured knowledge that can be easily processed by computer programs. In order to extract such knowledge from plain text and transform them into structural form, the relation extraction problem becomes an important issue. Datasets play a critical role in the development of relation extraction methods. However, existing relation extraction datasets in biomedical domain are mainly human-annotated, whose scales are usually limited due to their labor-intensive and time-consuming nature. Results We construct BioRel, a large-scale dataset for biomedical relation extraction problem, by using Unified Medical Language System as knowledge base and Medline as corpus. We first identify mentions of entities in sentences of Medline and link them to Unified Medical Language System with Metamap. Then, we assign each sentence a relation label by using distant supervision. Finally, we adapt the state-of-the-art deep learning and statistical machine learning methods as baseline models and conduct comprehensive experiments on the BioRel dataset. Conclusions Based on the extensive experimental results, we have shown that BioRel is a suitable large-scale datasets for biomedical relation extraction, which provides both reasonable baseline performance and many remaining challenges for both deep learning and statistical methods.


2020 ◽  
Vol 27 (10) ◽  
pp. 1625-1638 ◽  
Author(s):  
Ling Zheng ◽  
Zhe He ◽  
Duo Wei ◽  
Vipina Keloth ◽  
Jung-Wei Fan ◽  
...  

Abstract Objective The study sought to describe the literature related to the development of methods for auditing the Unified Medical Language System (UMLS), with particular attention to identifying errors and inconsistencies of attributes of the concepts in the UMLS Metathesaurus. Materials and Methods We applied the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach by searching the MEDLINE database and Google Scholar for studies referencing the UMLS and any of several terms related to auditing, error detection, and quality assurance. A qualitative analysis and summarization of articles that met inclusion criteria were performed. Results Eighty-three studies were reviewed in detail. We first categorized techniques based on various aspects including concepts, concept names, and synonymy (n = 37), semantic type assignments (n = 36), hierarchical relationships (n = 24), lateral relationships (n = 12), ontology enrichment (n = 8), and ontology alignment (n = 18). We also categorized the methods according to their level of automation (ie, automated systematic, automated heuristic, or manual) and the type of knowledge used (ie, intrinsic or extrinsic knowledge). Conclusions This study is a comprehensive review of the published methods for auditing the various conceptual aspects of the UMLS. Categorizing the auditing techniques according to the various aspects will enable the curators of the UMLS as well as researchers comprehensive easy access to this wealth of knowledge (eg, for auditing lateral relationships in the UMLS). We also reviewed ontology enrichment and alignment techniques due to their critical use of and impact on the UMLS.


2017 ◽  
Vol 44 (5) ◽  
pp. 619-643 ◽  
Author(s):  
Gun-Woo Kim ◽  
Dong-Ho Lee

According to the growing interest in mobile healthcare, multi-document summarisation techniques are increasingly required to cope with health information overload and effectively deliver personalised online healthcare information. However, because of the peculiarities of medical terminology and the diversity of subtopics in health documents, multi-document summarisation must consider technical aspects that are different from those of the general domain. In this article, we propose a personalised health document summarisation system that provides a reliable personal health-related summary to general healthcare consumers via mobile devices. Our system generates a personalised summary from multiple online health documents by exploiting biomedical concepts, semantic types and semantic relations extracted from the Unified Medical Language System (UMLS) and analysing individual health records derived from mobile personal health record (PHR) applications. Furthermore, to increase the diversity and coverage of summarised results and to display them in a user-friendly manner on mobile devices, we create a summary that is categorised into subtopics by grouping semantically related sentences through topic-based clustering. The experimental evaluations demonstrate the effectiveness of our proposed system.


2020 ◽  
Vol 27 (10) ◽  
pp. 1556-1567
Author(s):  
Maxwell A Weinzierl ◽  
Ramon Maldonado ◽  
Sanda M Harabagiu

Abstract Objective We explored how knowledge embeddings (KEs) learned from the Unified Medical Language System (UMLS) Metathesaurus impact the quality of relation extraction on 2 diverse sets of biomedical texts. Materials and Methods Two forms of KEs were learned for concepts and relation types from the UMLS Metathesaurus, namely lexicalized knowledge embeddings (LKEs) and unlexicalized KEs. A knowledge embedding encoder (KEE) enabled learning either LKEs or unlexicalized KEs as well as neural models capable of producing LKEs for mentions of biomedical concepts in texts and relation types that are not encoded in the UMLS Metathesaurus. This allowed us to design the relation extraction with knowledge embeddings (REKE) system, which incorporates either LKEs or unlexicalized KEs produced for relation types of interest and their arguments. Results The incorporation of either LKEs or unlexicalized KE in REKE advances the state of the art in relation extraction on 2 relation extraction datasets: the 2010 i2b2/VA dataset and the 2013 Drug-Drug Interaction Extraction Challenge corpus. Moreover, the impact of LKEs is superior, achieving F1 scores of 78.2 and 82.0, respectively. Discussion REKE not only highlights the importance of incorporating knowledge encoded in the UMLS Metathesaurus in a novel way, through 2 possible forms of KEs, but it also showcases the subtleties of incorporating KEs in relation extraction systems. Conclusions Incorporating LKEs informed by the UMLS Metathesaurus in a relation extraction system operating on biomedical texts shows significant promise. We present the REKE system, which establishes new state-of-the-art results for relation extraction on 2 datasets when using LKEs.


1991 ◽  
Vol 11 (4_suppl) ◽  
pp. S89-S93 ◽  
Author(s):  
James J. Cimino ◽  
Soumitra Sengupta

The authors use an example to illustrate combining Integrated Academic Information Management System (IAIMS) components (applications) into an integral whole, to facilitate using the components simultaneously or in sequence. They examine a model for classifying IAIMS systems, proposing ways in which the Unified Medical Language System (UMLS) can be exploited in them.


Sign in / Sign up

Export Citation Format

Share Document