Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies

2015 ◽  
Vol 22 (3) ◽  
pp. 507-518 ◽  
Author(s):  
Christopher Ochs ◽  
James Geller ◽  
Yehoshua Perl ◽  
Yan Chen ◽  
Junchuan Xu ◽  
...  

Abstract Objective: Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA. Methods: An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN, called a subtaxonomy, for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA. Results: We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample. Discussion: The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject. Conclusions: An innovative methodology for scaling the derivation of AbNs and a TQA methodology were shown to perform successfully for the largest hierarchy of SNOMED.

2016 ◽  
Vol 55 (02) ◽  
pp. 158-165 ◽  
Author(s):  
Y. Chen ◽  
Z. He ◽  
M. Halper ◽  
L. Chen ◽  
H. Gu

Summary Background: The Unified Medical Language System (UMLS) is one of the largest biomedical terminological systems, with over 2.5 million concepts in its Metathesaurus repository. The UMLS’s Semantic Network (SN) with its collection of 133 high-level semantic types serves as an abstraction layer on top of the Metathesaurus. In particular, the SN elaborates an aspect of the Metathesaurus’s concepts via the assignment of one or more types to each concept. Due to the scope and complexity of the Metathesaurus, errors are all but inevitable in this semantic-type assignment process. Objectives: To develop a semi-automated methodology to help assure the quality of semantic-type assignments within the UMLS. Methods: The methodology uses a cross-validation strategy involving SNOMED CT’s hierarchies in combination with UMLS semantic types. Semantically uniform, disjoint concept groups are generated programmatically by partitioning the collection of all concepts in the same SNOMED CT hierarchy according to their respective semantic-type assignments in the UMLS. Domain experts are then called upon to review the concepts in any group having a small number of concepts. It is our hypothesis that a semantic-type assignment combination applicable only to a very small number of concepts in a SNOMED CT hierarchy is an indicator of potential problems. Results: The methodology was applied to the UMLS 2013AA release along with the SNOMED CT from January 2013. An overall error rate of 33% was found for concepts proposed by the quality-assurance methodology. Supporting our hypothesis, that number was four times higher than the error rate found in control samples. Conclusion: The results show that the quality-assurance methodology can aid in effective and efficient identification of UMLS semantic-type assignment errors.
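
The partitioning step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy concept identifiers, and the `max_size` threshold are all hypothetical, and the abstract does not say how small a group must be to trigger review.

```python
from collections import defaultdict


def partition_by_semantic_types(concepts):
    """Group concept ids by their exact combination of semantic types.

    `concepts` maps a concept id to the set of semantic types assigned to it.
    Concepts sharing the same combination land in the same (disjoint) group.
    """
    groups = defaultdict(list)
    for concept_id, semantic_types in concepts.items():
        groups[frozenset(semantic_types)].append(concept_id)
    return dict(groups)


def small_groups(groups, max_size):
    """Keep only groups with at most `max_size` concepts (candidates for review)."""
    return {
        combo: sorted(members)
        for combo, members in groups.items()
        if len(members) <= max_size
    }


# Hypothetical toy data standing in for one SNOMED CT hierarchy.
hierarchy_concepts = {
    "C1": {"Disease or Syndrome"},
    "C2": {"Disease or Syndrome"},
    "C3": {"Disease or Syndrome"},
    "C4": {"Finding"},
    "C5": {"Disease or Syndrome", "Finding"},
}

groups = partition_by_semantic_types(hierarchy_concepts)
flagged = small_groups(groups, max_size=1)  # threshold is an assumption
```

Here `flagged` holds the two rare combinations (`C4` and `C5`), which under the stated hypothesis would be handed to domain experts first.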


2017 ◽  
Vol 24 (4) ◽  
pp. 788-798 ◽  
Author(s):  
Licong Cui ◽  
Wei Zhu ◽  
Shiqiang Tao ◽  
James T Case ◽  
Olivier Bodenreider ◽  
...  

Abstract Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.
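
As a rough illustration of the structural part, the Python sketch below finds non-lattice pairs in a toy hierarchy by brute force. It assumes a definition that the abstract itself does not spell out: a pair of concepts is non-lattice when it has more than one maximal common descendant. The MapReduce algorithm and the lexical patterns are not reproduced, and all names here are hypothetical.

```python
from itertools import combinations


def down_set(node, children, cache):
    """Return the node together with all of its descendants (memoized)."""
    if node not in cache:
        result = {node}
        for child in children.get(node, ()):
            result |= down_set(child, children, cache)
        cache[node] = result
    return cache[node]


def non_lattice_pairs(children):
    """Map each non-lattice pair to its maximal common descendants.

    `children` maps a concept to its direct is-a children; the graph is
    assumed to be acyclic. Every pair is checked, so this only suits toy data.
    """
    cache = {}
    nodes = set(children) | {c for kids in children.values() for c in kids}
    pairs = {}
    for a, b in combinations(sorted(nodes), 2):
        common = down_set(a, children, cache) & down_set(b, children, cache)
        # A common descendant is maximal if no other common descendant sits above it.
        maximal = {
            d for d in common
            if not any(d in down_set(o, children, cache) for o in common if o != d)
        }
        if len(maximal) > 1:
            pairs[(a, b)] = sorted(maximal)
    return pairs


# Hypothetical toy hierarchy: A and B both subsume C and D, which share child E.
toy_children = {"A": ["C", "D"], "B": ["C", "D"], "C": ["E"], "D": ["E"]}
result = non_lattice_pairs(toy_children)
```

In this toy graph the pair `("A", "B")` is flagged because `C` and `D` are both maximal common descendants, the kind of structure that may point to a missing intermediate concept or is-a relation.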


2020 ◽  
Vol 20 (S10) ◽  
Author(s):  
Ankur Agrawal ◽  
Licong Cui

Abstract Biological and biomedical ontologies and terminologies are used to organize and store various domain-specific knowledge to provide standardization of terminology usage and to improve interoperability. The growing number of such ontologies and terminologies and their increasing adoption in clinical, research and healthcare settings call for effective and efficient quality assurance and semantic enrichment techniques of these ontologies and terminologies. In this editorial, we provide an introductory summary of nine articles included in this supplement issue for quality assurance and enrichment of biological and biomedical ontologies and terminologies. The articles cover a range of standards including SNOMED CT, National Cancer Institute Thesaurus, Unified Medical Language System, North American Association of Central Cancer Registries and OBO Foundry Ontologies.


2014 ◽  
Vol 22 (3) ◽  
pp. 628-639 ◽  
Author(s):  
Christopher Ochs ◽  
James Geller ◽  
Yehoshua Perl ◽  
Yan Chen ◽  
Ankur Agrawal ◽  
...  

Abstract Objective: Large and complex terminologies, such as Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT), are prone to errors and inconsistencies. Abstraction networks are compact summarizations of the content and structure of a terminology. Abstraction networks have been shown to support terminology quality assurance. In this paper, we introduce an abstraction network derivation methodology which can be applied to SNOMED CT target hierarchies whose classes are defined using only hierarchical relationships (ie, without attribute relationships) and similar description-logic-based terminologies. Methods: We introduce the tribal abstraction network (TAN), based on the notion of a tribe, a subhierarchy rooted at a child of a hierarchy root, assuming only the existence of concepts with multiple parents. The TAN summarizes a hierarchy that does not have attribute relationships using sets of concepts, called tribal units, that belong to exactly the same multiple tribes. Tribal units are further divided into refined tribal units which contain closely related concepts. A quality assurance methodology that utilizes TAN summarizations is introduced. Results: A TAN is derived for the Observable entity hierarchy of SNOMED CT, summarizing its content. A TAN-based quality assurance review of the concepts of the hierarchy is performed, and erroneous concepts are shown to appear more frequently in large refined tribal units than in small refined tribal units. Furthermore, more erroneous concepts appear in large refined tribal units of more tribes than of fewer tribes. Conclusions: In this paper we introduce the TAN for summarizing SNOMED CT target hierarchies. A TAN was derived for the Observable entity hierarchy of SNOMED CT. A quality assurance methodology utilizing the TAN was introduced and demonstrated.
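
The notions of tribe and tribal unit can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' derivation algorithm: the hierarchy, the function names, and the choice to also group concepts that belong to a single tribe are assumptions, and refined tribal units are not modeled.

```python
from collections import defaultdict


def tribes_of(concept, parents, tribe_roots, cache):
    """Return the set of tribe roots that are ancestors of (or equal to) the concept."""
    if concept not in cache:
        result = {concept} if concept in tribe_roots else set()
        for parent in parents.get(concept, ()):
            result |= tribes_of(parent, parents, tribe_roots, cache)
        cache[concept] = frozenset(result)
    return cache[concept]


def tribal_units(parents, root):
    """Group every non-root concept by the exact set of tribes it belongs to.

    `parents` maps a concept to its direct is-a parents (acyclic hierarchy).
    Tribe roots are the direct children of `root`.
    """
    tribe_roots = {c for c, ps in parents.items() if root in ps}
    cache = {}
    units = defaultdict(list)
    for concept in parents:
        if concept != root:
            units[tribes_of(concept, parents, tribe_roots, cache)].append(concept)
    return {tribes: sorted(members) for tribes, members in units.items()}


# Hypothetical toy hierarchy with root R and tribes rooted at A, B and C.
toy_parents = {
    "R": [],
    "A": ["R"], "B": ["R"], "C": ["R"],
    "a1": ["A"], "b1": ["B"],
    "ab1": ["A", "B"],     # multiple parents place it in two tribes
    "ab2": ["ab1"],
    "abc": ["ab1", "C"],
}
units = tribal_units(toy_parents, "R")
```

In the toy data, `ab1` and `ab2` share the tribe set {A, B} and so fall into one unit, while `abc` belongs to all three tribes and forms its own.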


2020 ◽  
Vol 27 (10) ◽  
pp. 1568-1575 ◽  
Author(s):  
Fengbo Zheng ◽  
Jay Shi ◽  
Yuntao Yang ◽  
W Jim Zheng ◽  
Licong Cui

Abstract Objective: The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages the UMLS knowledge to systematically identify missing hierarchical IS-A relations in the source terminologies. Materials and Methods: Given a concept name in the UMLS, we first identify its base and secondary noun chunks. For each identified noun chunk, we generate replacement candidates that are more general than the noun chunk. Then, we replace the noun chunks with their replacement candidates to generate new potential concept names that may serve as supertypes of the original concept. If a newly generated name is an existing concept name in the same source terminology as the original concept, then a potentially missing IS-A relation between the original and the new concept is identified. Results: Applying our transformation-based method to English-language concept names in the UMLS (2019AB release), a total of 39 359 potentially missing IS-A relations were detected in 13 source terminologies. Domain experts evaluated a random sample of 200 potentially missing IS-A relations identified in the SNOMED CT (U.S. edition) and 100 in Gene Ontology. A total of 173 of 200 and 63 of 100 potentially missing IS-A relations were confirmed by domain experts, indicating that our method achieved a precision of 86.5% and 63% for the SNOMED CT and Gene Ontology, respectively. Conclusions: Our results showed that our transformation-based method is effective in identifying missing IS-A relations in the UMLS source terminologies.
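
The replace-and-look-up idea in the Methods can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published method: noun chunking is replaced by plain substring matching, the `more_general` table is a hypothetical stand-in for the UMLS-derived replacement candidates, and `existing_isa` is assumed to already contain every known (including inferred) IS-A pair.

```python
def candidate_missing_isa(names, more_general, existing_isa):
    """Suggest (child, parent) name pairs that may be missing an IS-A relation.

    names:        concept names from one source terminology
    more_general: hypothetical map from a noun chunk to more general replacements
    existing_isa: set of (child, parent) name pairs already present
    """
    name_set = set(names)
    suggestions = []
    for name in names:
        for chunk, generals in more_general.items():
            if chunk not in name:
                continue  # crude stand-in for real noun-chunk detection
            for general in generals:
                new_name = name.replace(chunk, general, 1)
                if (
                    new_name != name
                    and new_name in name_set
                    and (name, new_name) not in existing_isa
                ):
                    suggestions.append((name, new_name))
    return suggestions


# Hypothetical toy terminology.
toy_names = ["fracture of femur", "fracture of bone", "fracture of lower limb"]
toy_more_general = {"femur": ["bone", "lower limb"]}
toy_existing = {("fracture of femur", "fracture of lower limb")}

suggested = candidate_missing_isa(toy_names, toy_more_general, toy_existing)
```

Only the pair without an existing relation is suggested here; in the study, such candidates were then reviewed by domain experts, which is where the reported precision figures come from.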


2012 ◽  
Vol 51 (06) ◽  
pp. 529-538 ◽  
Author(s):  
K. Rosenbeck Gøeg ◽  
A. Randorff Højen

Summary Clinical practice as well as research and quality assurance benefit from unambiguous clinical information resulting from the use of a common terminology like the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT). A common terminology is a necessity to enable consistent reuse of data and to support semantic interoperability. Managing use of terminology for large cross-specialty Electronic Health Record systems (EHR systems) or just beyond the level of single EHR systems requires that mappings are kept consistent. The objective of this study is to provide a clear methodology for SNOMED CT mapping to enhance applicability of SNOMED CT despite incompleteness and redundancy. Such mapping guidelines are presented based on an in-depth analysis of 14 different EHR templates retrieved from five Danish and Swedish EHR systems. Each mapping is assessed against defined quality criteria and mapping guidelines are specified. Future work will include guideline validation.


2021 ◽  
Vol 27 (1) ◽  
pp. 146045822198939
Author(s):  
Euisung Jung ◽  
Hemant Jain ◽  
Atish P Sinha ◽  
Carmelo Gaudioso

A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare informatics literature to identify disease terms with more than one word (multi-gram disease named entities). Although a lot of work has been done in the identification of protein- and gene-named entities in the biomedical field, not much research has been done on the recognition and resolution of terminologies in the clinical trial subject eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text mining analysis in the breast cancer domain, and evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology, which combines the knowledge of domain experts, terms from multiple online dictionaries, and the mining of text from sample clinical trials. Use of our methodology introduces 4243 unique lexicon items, which increase bigram entity match by 38.6% and trigram entity match by 41%. Our lexicon, which adds a significant number of new terms, is very useful for matching patients to clinical trials automatically based on eligibility matching. Beyond clinical trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text mining applications.


Author(s):  
AH Mirza ◽  
L McClelland ◽  
M Bentley ◽  
S Mazengarb ◽  
NS Jones

Clinical governance encompasses quality assurance, measures to ensure self-development, comparing standards and learning from errors and suboptimal results. The management of risk poses a significant challenge in itself, as to reduce it to zero would require practice coming to a standstill. The implementation of structures to provide a safe environment for the patients we treat remains one of the greatest challenges faced by healthcare organisations today. Additionally, the potential litigious outcome provides an added conspicuous incentive to not only continuously review and address adverse events but to ensure that our patients remain free from harm.

