Sculpting the UMLS Refined Semantic Network

Author(s):  
Zhe He ◽  
C. Paul Morrey ◽  
Yehoshua Perl ◽  
Gai Elhanan ◽  
Ling Chen ◽  
...  

Background: The Refined Semantic Network (RSN) for the UMLS was previously introduced to complement the UMLS Semantic Network (SN). The RSN partitions the UMLS Metathesaurus (META) into disjoint groups of concepts. Each such group is semantically uniform. However, the RSN was initially an order of magnitude larger than the SN, which is undesirable because a semantic network must be compact to be useful. Most semantic types in the RSN represent combinations of semantic types in the UMLS SN. Such a “combination semantic type” is called an Intersection Semantic Type (IST). Many ISTs are assigned to very few concepts. Moreover, when those concepts were reviewed, many semantic type assignment inconsistencies were found. After those inconsistencies were corrected, many ISTs, among them some that contradicted UMLS rules, disappeared, which made the RSN smaller. Objective: The authors performed a longitudinal study with the goal of making the RSN compact. This goal was achieved by correcting inconsistencies and errors in the IST assignments in the UMLS, which additionally helped identify and correct ambiguities, inconsistencies, and errors in source terminologies widely used in the realm of public health. Methods: In this paper, we discuss the process and steps employed in this longitudinal study and the intermediate results at different stages. The sculpting process includes removing redundant semantic type assignments, expanding semantic type assignments, and removing illegitimate ISTs by auditing ISTs of small extents. However, the emphasis of this paper is not on the auditing methodologies employed during the process, since they were introduced in earlier publications, but on the strategy of employing them to transform the RSN into a compact network. For this paper we also performed a comprehensive audit of 168 “small ISTs” in the 2013AA version of the UMLS to finalize the longitudinal study.
Results: Over the years it was found that the editors of the UMLS introduced some new inconsistencies, which reintroduced unwarranted ISTs that had already been eliminated by earlier corrections. This slowed the transformation of the RSN into a compact network covering all necessary categories for the UMLS. The corrections suggested by an audit of the 2013AA version of the UMLS achieve a compact RSN of the same order of magnitude as the UMLS SN. The number of ISTs has been reduced to 336. We also demonstrate how auditing the semantic type assignments of UMLS concepts can expose other modeling errors in the UMLS source terminologies that are important for health informatics, e.g., SNOMED CT, LOINC, and RxNorm. Such errors would otherwise stay hidden. Conclusions: It is hoped that the UMLS curators will implement all required corrections and use the RSN along with the SN when maintaining and extending the UMLS. When used correctly, the RSN will help prevent the accidental introduction of inconsistent semantic type assignments into the UMLS. Furthermore, the RSN will in this way help expose other hidden errors and inconsistencies in the health informatics terminologies that are sources of the UMLS. Notably, the development of the RSN materializes the deeper, more refined Semantic Network for the UMLS that its designers originally envisioned but did not implement.
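
The partitioning idea in this abstract can be illustrated with a minimal Python sketch. It assumes that concept-to-semantic-type assignments are available as a plain dictionary. The concept identifiers, the toy assignments, and the `max_extent` threshold are all hypothetical, since the abstract does not state a cutoff for a “small” IST.

```python
from collections import defaultdict


def partition_by_type_set(assignments):
    """Group concept IDs by their exact set of semantic types.

    Every concept lands in exactly one group, so the groups are disjoint.
    """
    groups = defaultdict(set)
    for concept, types in assignments.items():
        groups[frozenset(types)].add(concept)
    return dict(groups)


def small_ists(groups, max_extent):
    """Return type combinations (2+ types) whose extent has at most max_extent concepts."""
    return {
        types: concepts
        for types, concepts in groups.items()
        if len(types) >= 2 and len(concepts) <= max_extent
    }


# Hypothetical toy assignments, not taken from any UMLS release.
assignments = {
    "C1": {"Enzyme", "Amino Acid, Peptide, or Protein"},
    "C2": {"Enzyme", "Amino Acid, Peptide, or Protein"},
    "C3": {"Disease or Syndrome"},
    "C4": {"Disease or Syndrome", "Manufactured Object"},
}

groups = partition_by_type_set(assignments)
flagged = small_ists(groups, max_extent=1)  # assumed threshold
```

In this toy data, only the combination assigned to `C4` would be handed to a reviewer, because it is the only multi-type group with a single concept.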

2016 ◽  
Vol 55 (02) ◽  
pp. 158-165 ◽  
Author(s):  
Y. Chen ◽  
Z. He ◽  
M. Halper ◽  
L. Chen ◽  
H. Gu

Summary Background: The Unified Medical Language System (UMLS) is one of the largest biomedical terminological systems, with over 2.5 million concepts in its Metathesaurus repository. The UMLS’s Semantic Network (SN) with its collection of 133 high-level semantic types serves as an abstraction layer on top of the Metathesaurus. In particular, the SN elaborates an aspect of the Metathesaurus’s concepts via the assignment of one or more types to each concept. Due to the scope and complexity of the Metathesaurus, errors are all but inevitable in this semantic-type assignment process. Objectives: To develop a semi-automated methodology to help assure the quality of semantic-type assignments within the UMLS. Methods: The methodology uses a cross-validation strategy involving SNOMED CT’s hierarchies in combination with UMLS semantic types. Semantically uniform, disjoint concept groups are generated programmatically by partitioning the collection of all concepts in the same SNOMED CT hierarchy according to their respective semantic-type assignments in the UMLS. Domain experts are then called upon to review the concepts in any group having a small number of concepts. It is our hypothesis that a semantic-type assignment combination applicable only to a very small number of concepts in a SNOMED CT hierarchy is an indicator of potential problems. Results: The methodology was applied to the UMLS 2013AA release along with the SNOMED CT from January 2013. An overall error rate of 33% was found for concepts proposed by the quality-assurance methodology. Supporting our hypothesis, that number was four times higher than the error rate found in control samples. Conclusion: The results show that the quality-assurance methodology can aid in effective and efficient identification of UMLS semantic-type assignment errors.
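
A rough Python sketch of the per-hierarchy grouping described in the Methods follows. It is an illustration under stated assumptions, not the authors' implementation: the input tuples, the hierarchy labels, and the `max_size` cutoff are hypothetical, and the abstract gives no exact definition of a “small” group.

```python
from collections import defaultdict


def small_groups_per_hierarchy(concepts, max_size):
    """Partition concepts by (hierarchy, exact semantic-type set) and keep small groups.

    concepts: iterable of (concept_id, hierarchy, semantic_types) tuples.
    Returns {(hierarchy, frozenset(types)): [concept_ids]} for every group
    containing at most max_size concepts.
    """
    groups = defaultdict(list)
    for concept_id, hierarchy, types in concepts:
        groups[(hierarchy, frozenset(types))].append(concept_id)
    return {key: ids for key, ids in groups.items() if len(ids) <= max_size}


# Hypothetical toy records, not taken from SNOMED CT or the UMLS.
concepts = [
    ("C1", "Procedure", {"Therapeutic or Preventive Procedure"}),
    ("C2", "Procedure", {"Therapeutic or Preventive Procedure"}),
    ("C3", "Procedure", {"Therapeutic or Preventive Procedure"}),
    ("C4", "Procedure", {"Finding"}),
    ("C5", "Clinical finding", {"Finding"}),
    ("C6", "Clinical finding", {"Finding"}),
]

to_review = small_groups_per_hierarchy(concepts, max_size=1)  # assumed cutoff
```

Grouping inside each hierarchy matters here: the type `Finding` is unremarkable in the toy "Clinical finding" hierarchy, but it is rare in "Procedure", so only `C4` is proposed for expert review.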


2018 ◽  
Vol 57 (01/02) ◽  
pp. 43-53 ◽  
Author(s):  
Zhe He ◽  
Duo Wei ◽  
Gai Elhanan ◽  
Yan Chen ◽  
Huanying Gu

Summary Background: The UMLS assigns semantic types to all its integrated concepts. The semantic types are widely used in various natural language processing tasks in the biomedical domain, such as named entity recognition, semantic disambiguation, and semantic annotation. Due to the size of the UMLS, erroneous semantic type assignments are hard to detect. It is imperative to devise automated techniques to identify errors and inconsistencies in semantic type assignments. Objectives: To design a methodology that performs programmatic checks to detect semantic type assignment errors for UMLS concepts with one or more SNOMED CT terms, and to evaluate concepts in a selected set of SNOMED CT hierarchies to verify our hypothesis that UMLS semantic type assignment errors may exist in concepts residing in semantically inconsistent groups. Methods: Our methodology is a four-stage process: 1) partitioning concepts in a SNOMED CT hierarchy into semantically uniform groups based on their assigned semantic tags; 2) partitioning the concepts in each group from 1) into disjoint sub-groups based on their semantic type assignments; 3) mapping each SNOMED CT semantic tag to one or more semantic types in the UMLS; 4) identifying semantically inconsistent groups, whose semantic tags and semantic types disagree according to the mapping from 3), and providing the concepts in such groups to domain experts for review. Results: We applied our method to the UMLS 2013AA release. Concepts in the semantically inconsistent groups of the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33% and 62.5%, respectively, far higher than the error rates of 0.6% and 1% in the semantically consistent groups of the two hierarchies. Conclusion: Concepts in semantically inconsistent groups are more likely to contain semantic type assignment errors. Our methodology can make auditing more efficient by focusing auditing resources on concepts in semantically inconsistent groups.
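
The four stages can be sketched in a few lines of Python. This is a hedged illustration only: the abstract does not spell out the exact inconsistency test, so the sketch assumes a group is inconsistent when any of its semantic types falls outside the set mapped to its semantic tag. The function name, the tag-to-type mapping, and the sample concepts are hypothetical.

```python
from collections import defaultdict


def inconsistent_groups(concepts, tag_to_types):
    """Return groups whose semantic types disagree with the tag-to-type mapping.

    concepts: iterable of (concept_id, semantic_tag, semantic_types) tuples.
    tag_to_types: stage 3 mapping from a semantic tag to the set of semantic
        types treated as compatible with it (an assumed input here).
    """
    # Stages 1 and 2: partition by semantic tag, then by exact semantic-type set.
    groups = defaultdict(list)
    for concept_id, tag, types in concepts:
        groups[(tag, frozenset(types))].append(concept_id)

    # Stage 4 (assumed rule): a group is inconsistent when at least one of its
    # semantic types is not among the types mapped to its tag.
    flagged = {}
    for (tag, types), members in groups.items():
        allowed = tag_to_types.get(tag, set())
        if not types <= allowed:
            flagged[(tag, types)] = members
    return flagged


# Hypothetical mapping and concepts, for illustration only.
tag_to_types = {
    "physical force": {"Natural Phenomenon or Process"},
    "record artifact": {"Intellectual Product"},
}
concepts = [
    ("C1", "physical force", {"Natural Phenomenon or Process"}),
    ("C2", "physical force", {"Natural Phenomenon or Process"}),
    ("C3", "physical force", {"Finding"}),
    ("C4", "record artifact", {"Intellectual Product"}),
]

for_review = inconsistent_groups(concepts, tag_to_types)
```

Only the group containing `C3` is returned, so a reviewer would look at one concept instead of four.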


2018 ◽  
pp. 553-576
Author(s):  
Rishi Kanth Saripalle

In the domain of biomedical and health informatics, ontologies are widely used to capture knowledge ranging from bioinformatics (genes, proteins, protein interactions, etc.) to clinical and healthcare informatics (diseases, symptoms, treatments, medications, etc.). Currently, one knowledge source that encapsulates a broad spectrum of medical knowledge is the Unified Medical Language System (UMLS), which can be defined as a compendium of diverse medical ontological standards. The primary components of the UMLS are the Semantic Network (UMLS-SN), designed by interconnecting well-defined semantic types with semantic relationships, and the Metathesaurus (UMLS-META), the base of the UMLS, which comprises millions of medical concepts from diverse medical standards. However, within the biomedical and health informatics community, the concepts of software engineering and domain modeling (using meta-models such as ERD, UML, and XML) are very successful in designing and implementing biomedical/health domain application models. Currently, the UMLS-SN is primarily employed for classifying medical concepts in UMLS-META, but UMLS-SN knowledge cannot be viewed or employed as a modeling framework for designing ontological or biomedical application models, and it is restricted to the UMLS environment. Thus, the impact of the biomedical semantics captured by UMLS-SN might be minimal in medical facilities, research and healthcare organizations that are highly influenced by software engineering, meta-models and domain model-based practices. In order to fill this gap, the author proposes a meta-modeling framework for UMLS-SN based on the UML Profile (built using the UML meta-model) that will result in a customized domain-specific meta-model. This specialized meta-model, which encapsulates the medical knowledge semantics of UMLS-SN, can then be employed for designing ontological models or relevant healthcare application models while remaining coherent with software meta-models and domain modeling practices.


2011 ◽  
Vol 35 (1) ◽  
pp. 112-127 ◽  
Author(s):  
Ekkehard König

This paper presents a detailed analysis of reflexive nominal compounds like self-assessment in English and their counterparts in nine other languages, whose number and use have strongly increased in these languages over the last several decades. The first component of these compounds is shown to be related to intensifiers like selbst in German and its cognate form self- in English, whose multiple uses also underlie different semantic types of reflexive compounds (self-help vs. self-control), whereas the second component typically derives from transitive verbs. Among the central problems discussed in this paper are the question of the productivity of these compounds and the possibility of deriving their meaning in a compositional fashion. The parameters of variation manifested by the sample of languages under comparison in this pilot study concern inter alia the form of the intensifier (native or borrowed, one or two), the semantic type, and the lexical category of the resultant compound.


2019 ◽  
Vol 5 (1) ◽  
pp. 79-90
Author(s):  
Ida Ayu Pristina Pidada ◽  
Mirsa Umiyati ◽  
Ni Wayan Kasni

A number of studies on verbs, one of the grammatical categories that serve to describe events, have explored their more distinct types in depth according to their semantic primitives under the natural semantic metalanguage approach. This research aims to describe the semantic types and specific roles of the verb ‘to carry’ in Balinese from the perspective of natural semantic metalanguage theory. This study is qualitative. The semantic types of the verb in question were first classified in order to ease the identification of their specific semantic roles. The results show that the semantic roles were restricted to agent, for arguments serving as the actor of the activity described by each semantic type of the verb ‘to carry’, and patient, for those serving as the target of that activity. This research identifies 21 Balinese verbs that are semantically closely related to the verb ‘to carry’: nèngtèng, ningting, nyangkol, nyangkil, nyuun, negen, ngandong, nenggolong, nyelet, nyelepit, ngabin, nampa, ngundit, nangal, nandan, nyekel, nikul, ngenyang, mundut, nyunggi, and ngayot.


1996 ◽  
Vol 20 (2) ◽  
pp. 365-380
Author(s):  
Renata Kozlowska-Heuchin

The subject of this article is the analysis of clauses of aim, cause, consequence and condition in French with a view to automatic processing. Our theoretical framework is that of lexicon-grammar. This study differs from the usual grammatical analyses. Here, the complex sentence is studied on the model of the simple sentence, defined as an operator accompanied by its arguments. The conjunctive phrase is our starting point for this study, and it is then shown that the noun around which it is formed is of predicative type and has the main clause and the subordinate clause as arguments. This is a predicate «of second order». Automatic processing requires extremely accurate notation of syntactic and semantic properties if ambiguity and polysemy are to be correctly handled. Descriptions based on syntactico-semantic features alone are insufficient, which is why the concept of « class of objects » is brought in. There are as many types of relations as there are semantic types of predicate. This is the reason why a semantic typology of predicates is sketched out, integrating lexical, syntactic and semantic components. It is shown that each semantic type can have its own appropriate lexical means of expression and specific syntactic behaviour.


2018 ◽  
Vol 25 (12) ◽  
pp. 1618-1625 ◽  
Author(s):  
George Hripcsak ◽  
Matthew E Levine ◽  
Ning Shang ◽  
Patrick B Ryan

Abstract Objective: To study the effect on patient cohorts of mapping condition (diagnosis) codes from source billing vocabularies to a clinical vocabulary. Materials and Methods: Nine International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) concept sets were extracted from eMERGE network phenotypes, translated to Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concept sets, and applied to patient data that were mapped from source ICD9-CM and ICD10-CM codes to SNOMED CT codes using Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) vocabulary mappings. The original ICD9-CM concept set and a concept set extended to ICD10-CM were used to create patient cohorts that served as gold standards. Results: Four phenotype concept sets could be translated to SNOMED CT without ambiguities and performed perfectly with respect to the gold standards. The other 5 lost performance when 2 or more ICD9-CM or ICD10-CM codes mapped to the same SNOMED CT code. The patient cohorts had a total error (false positive and false negative) of up to 0.15% compared to querying ICD9-CM source data and up to 0.26% compared to querying ICD9-CM and ICD10-CM data. Knowledge engineering was required to produce that performance; simple automated methods to generate concept sets had errors up to 10% (one outlier at 250%). Discussion: The translation of data from source vocabularies to SNOMED CT resulted in very small error rates that were an order of magnitude smaller than other error sources. Conclusion: It appears possible to map diagnoses from disparate vocabularies to a single clinical vocabulary and carry out research using a single set of definitions, thus improving efficiency and transportability of research.
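
Two quantities from this abstract lend themselves to a short Python sketch: source codes that collapse onto the same target code, and the total cohort error against a gold standard. The code identifiers below are placeholders, not real ICD or SNOMED CT codes, and the choice to divide the error by the gold-standard cohort size is an assumption, since the abstract does not give the denominator.

```python
from collections import defaultdict


def colliding_targets(mapping):
    """Return target codes that two or more source codes map to.

    mapping: dict from source code to target code (one target per source).
    """
    sources_by_target = defaultdict(set)
    for source, target in mapping.items():
        sources_by_target[target].add(source)
    return {t: s for t, s in sources_by_target.items() if len(s) >= 2}


def total_error(gold, cohort):
    """(false positives + false negatives) relative to the gold-standard size.

    The denominator is an assumption made for this sketch.
    """
    false_positives = len(cohort - gold)
    false_negatives = len(gold - cohort)
    return (false_positives + false_negatives) / len(gold)


# Placeholder codes and patient IDs, for illustration only.
mapping = {"SRC_A": "TGT_1", "SRC_B": "TGT_1", "SRC_C": "TGT_2"}
collisions = colliding_targets(mapping)

gold = {1, 2, 3, 4}      # patients selected by querying the source codes
cohort = {2, 3, 4, 5}    # patients selected by querying the mapped codes
error = total_error(gold, cohort)
```

With this normalization the error can exceed 100% when false positives outnumber the gold-standard cohort.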

