Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies

Abstract Objective Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), which are compact summaries of a terminology. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA. Methods An innovative technique is presented for scaling TQA by creating, for large hierarchies, a new kind of subject-based AbN called a subtaxonomy. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA. Results We tested the TQA methodology on a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover a statistically significantly higher number of erroneous concepts than a control sample. Discussion The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying, within the subtaxonomy, groups of concepts with a higher likelihood of having errors. Scalability is achieved by reviewing a large hierarchy by subject. Conclusions An innovative methodology for scaling the derivation of AbNs, together with a TQA methodology, was shown to perform successfully for the largest hierarchy of SNOMED.
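
The subject-based idea can be illustrated with a minimal Python sketch. It assumes the area notion used by taxonomy-style AbNs (an area holds the concepts that have exactly the same set of attribute relationship types); the abstract does not spell out the grouping rule, and the function names and toy concepts below are hypothetical. Restricting derivation to the descendants of one subject root (here a toy "Bleeding") is what keeps the summary small even when the enclosing hierarchy is very large.

```python
from collections import defaultdict

def descendants_or_self(root, children):
    """Return root plus every concept reachable from it through child links."""
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def subject_subtaxonomy(subject_root, children, rel_types):
    """Group the concepts of one subject subhierarchy into areas.

    Assumption: an area holds the concepts that have exactly the same
    set of attribute relationship types.
    """
    areas = defaultdict(set)
    for concept in descendants_or_self(subject_root, children):
        areas[frozenset(rel_types.get(concept, ()))].add(concept)
    return dict(areas)

# Toy data (hypothetical): only the "Bleeding" subject is summarized,
# so "Pain" is left out even though it sits in the same large hierarchy.
children = {
    "Clinical finding": ["Bleeding", "Pain"],
    "Bleeding": ["Gastrointestinal hemorrhage", "Epistaxis"],
    "Gastrointestinal hemorrhage": ["Gastric ulcer with hemorrhage"],
}
rel_types = {
    "Bleeding": {"Associated morphology"},
    "Gastrointestinal hemorrhage": {"Associated morphology", "Finding site"},
    "Epistaxis": {"Associated morphology", "Finding site"},
    "Gastric ulcer with hemorrhage": {"Associated morphology", "Finding site"},
}
areas = subject_subtaxonomy("Bleeding", children, rel_types)
for rels, members in areas.items():
    print(sorted(rels), sorted(members))
```

The toy relationship assignments are illustrative only; real SNOMED CT concepts typically carry more attribute relationships.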

Quality Assurance of UMLS Semantic Type Assignments Using SNOMED CT Hierarchies

Methods of Information in Medicine ◽

10.3414/me14-01-0104 ◽

2016 ◽

Vol 55 (02) ◽

pp. 158-165 ◽

Cited By ~ 10

Author(s):

Y. Chen ◽

Z. He ◽

M. Halper ◽

L. Chen ◽

H. Gu

Keyword(s):

Quality Assurance ◽

Error Rate ◽

Semantic Network ◽

Snomed Ct ◽

Semantic Type ◽

Domain Experts ◽

Unified Medical Language System ◽

Type Assignment ◽

Semantic Types ◽

High Level

Summary Background: The Unified Medical Language System (UMLS) is one of the largest biomedical terminological systems, with over 2.5 million concepts in its Metathesaurus repository. The UMLS’s Semantic Network (SN), with its collection of 133 high-level semantic types, serves as an abstraction layer on top of the Metathesaurus. In particular, the SN elaborates an aspect of the Metathesaurus’s concepts via the assignment of one or more types to each concept. Due to the scope and complexity of the Metathesaurus, errors are all but inevitable in this semantic-type assignment process. Objectives: To develop a semi-automated methodology to help assure the quality of semantic-type assignments within the UMLS. Methods: The methodology uses a cross-validation strategy involving SNOMED CT’s hierarchies in combination with UMLS semantic types. Semantically uniform, disjoint concept groups are generated programmatically by partitioning the collection of all concepts in the same SNOMED CT hierarchy according to their respective semantic-type assignments in the UMLS. Domain experts are then called upon to review the concepts in any group having a small number of concepts. It is our hypothesis that a semantic-type assignment combination applicable only to a very small number of concepts in a SNOMED CT hierarchy is an indicator of potential problems. Results: The methodology was applied to the UMLS 2013AA release along with the January 2013 release of SNOMED CT. An overall error rate of 33% was found for concepts proposed by the quality-assurance methodology. Supporting our hypothesis, that number was four times higher than the error rate found in control samples. Conclusion: The results show that the quality-assurance methodology can aid in effective and efficient identification of UMLS semantic-type assignment errors.
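
The partitioning step described in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the cut-off `max_size` for a "small" group is a hypothetical parameter (the abstract does not give one), and the concept names and semantic-type assignments are invented, with "Aortic valve structure" deliberately given an implausible extra type.

```python
from collections import defaultdict

def semantic_type_groups(hierarchy_concepts, semantic_types):
    """Partition one hierarchy's concepts by their exact semantic-type combination."""
    groups = defaultdict(set)
    for concept in hierarchy_concepts:
        groups[frozenset(semantic_types[concept])].add(concept)
    return dict(groups)

def flag_small_groups(groups, max_size):
    """Return the concepts in groups with at most max_size members (review candidates)."""
    return {c for members in groups.values() if len(members) <= max_size for c in members}

# Hypothetical assignments for a few concepts of one SNOMED CT hierarchy.
semantic_types = {
    "Heart structure": {"Body Part, Organ, or Organ Component"},
    "Lung structure": {"Body Part, Organ, or Organ Component"},
    "Liver structure": {"Body Part, Organ, or Organ Component"},
    "Blood cell": {"Cell"},
    "Erythrocyte": {"Cell"},
    "Aortic valve structure": {"Body Part, Organ, or Organ Component", "Disease or Syndrome"},
}
groups = semantic_type_groups(semantic_types.keys(), semantic_types)
print(flag_small_groups(groups, max_size=1))  # {'Aortic valve structure'}
```

Because the groups are disjoint by construction, every concept is reviewed at most once, and only the rare combinations reach the experts.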

Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocw175 ◽

2017 ◽

Vol 24 (4) ◽

pp. 788-798 ◽

Cited By ~ 20

Author(s):

Licong Cui ◽

Wei Zhu ◽

Shiqiang Tao ◽

James T Case ◽

Olivier Bodenreider ◽

...

Keyword(s):

Quality Assurance ◽

Random Sample ◽

Visual Inspection ◽

Snomed Ct ◽

Structural Part ◽

Domain Experts ◽

Frequent Type ◽

Hierarchical Relations ◽

Objective Quality

Abstract Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.
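
The structural part can be illustrated with a brute-force Python sketch. It assumes the usual definition of a non-lattice pair (two concepts with more than one maximal common descendant); it enumerates all pairs on a single machine rather than using the MapReduce algorithm the authors describe, and omits the lexical-pattern filter. The toy hierarchy is hypothetical and mimics a missing intermediate concept.

```python
from itertools import combinations

def descendant_map(children):
    """Map every concept to the set of its proper descendants."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    memo = {}

    def desc(node):
        if node not in memo:
            result = set()
            for child in children.get(node, ()):
                result.add(child)
                result |= desc(child)
            memo[node] = result
        return memo[node]

    return {n: desc(n) for n in nodes}

def non_lattice_pairs(children):
    """Return (a, b, maximal common descendants) for pairs with more than one."""
    desc = descendant_map(children)
    down = {n: desc[n] | {n} for n in desc}  # descendants-or-self
    pairs = []
    for a, b in combinations(sorted(desc), 2):
        common = down[a] & down[b]
        # x is maximal if it is not a proper descendant of another common descendant
        maximal = {x for x in common if not any(x in desc[y] for y in common)}
        if len(maximal) > 1:
            pairs.append((a, b, maximal))
    return pairs

children = {
    "Clinical finding": ["Disorder of lung", "Infectious disease"],
    "Disorder of lung": ["Bacterial pneumonia", "Viral pneumonia"],
    "Infectious disease": ["Bacterial pneumonia", "Viral pneumonia"],
}
for a, b, mcds in non_lattice_pairs(children):
    print(a, "|", b, "->", sorted(mcds))
# Disorder of lung | Infectious disease -> ['Bacterial pneumonia', 'Viral pneumonia']
```

Inserting a hypothetical "Infectious disease of lung" below both parents and above both pneumonias leaves a single maximal common descendant, so the pair is no longer reported; this is the kind of missing-concept remediation such subgraphs suggest.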

Quality assurance and enrichment of biological and biomedical ontologies and terminologies

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01342-4 ◽

2020 ◽

Vol 20 (S10) ◽

Author(s):

Ankur Agrawal ◽

Licong Cui

Keyword(s):

Quality Assurance ◽

Cancer Registries ◽

Supplement Issue ◽

Biomedical Ontologies ◽

Snomed Ct ◽

Unified Medical Language System ◽

Domain Specific ◽

Healthcare Settings ◽

Domain Specific Knowledge ◽

Enrichment Techniques

Abstract Biological and biomedical ontologies and terminologies are used to organize and store domain-specific knowledge, to standardize terminology usage, and to improve interoperability. The growing number of such ontologies and terminologies and their increasing adoption in clinical, research and healthcare settings call for effective and efficient quality assurance and semantic enrichment techniques for these ontologies and terminologies. In this editorial, we provide an introductory summary of nine articles included in this supplement issue for quality assurance and enrichment of biological and biomedical ontologies and terminologies. The articles cover a range of standards including SNOMED CT, National Cancer Institute Thesaurus, Unified Medical Language System, North American Association of Central Cancer Registries and OBO Foundry Ontologies.

A Context-based Crowd Sourcing Tool for Quality Assurance of SNOMED CT

10.1109/bibm52615.2021.9669688 ◽

2021 ◽

Author(s):

Kashifuddin Qazi ◽

Ankur Agrawal

Keyword(s):

Quality Assurance ◽

Crowd Sourcing ◽

Snomed Ct

Lexically suggest, logically define: Quality assurance of the use of qualifiers and expected results of post-coordination in SNOMED CT

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2011.10.002 ◽

2012 ◽

Vol 45 (2) ◽

pp. 199-209 ◽

Cited By ~ 27

Author(s):

Alan Rector ◽

Luigi Iannone

Keyword(s):

Quality Assurance ◽

Snomed Ct

A tribal abstraction network for SNOMED CT target hierarchies without attribute relationships

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2014-003173 ◽

2014 ◽

Vol 22 (3) ◽

pp. 628-639 ◽

Cited By ~ 20

Author(s):

Christopher Ochs ◽

James Geller ◽

Yehoshua Perl ◽

Yan Chen ◽

Ankur Agrawal ◽

...

Keyword(s):

Quality Assurance ◽

Description Logic ◽

Snomed Ct ◽

Hierarchical Relationships ◽

Similar Description ◽

Systematized Nomenclature Of Medicine

Abstract Objective Large and complex terminologies, such as Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT), are prone to errors and inconsistencies. Abstraction networks are compact summarizations of the content and structure of a terminology. Abstraction networks have been shown to support terminology quality assurance. In this paper, we introduce an abstraction network derivation methodology which can be applied to SNOMED CT target hierarchies whose classes are defined using only hierarchical relationships (ie, without attribute relationships) and to similar description-logic-based terminologies. Methods We introduce the tribal abstraction network (TAN), based on the notion of a tribe, a subhierarchy rooted at a child of a hierarchy root, assuming only the existence of concepts with multiple parents. The TAN summarizes a hierarchy that does not have attribute relationships using sets of concepts, called tribal units, that belong to exactly the same multiple tribes. Tribal units are further divided into refined tribal units which contain closely related concepts. A quality assurance methodology that utilizes TAN summarizations is introduced. Results A TAN is derived for the Observable entity hierarchy of SNOMED CT, summarizing its content. A TAN-based quality assurance review of the concepts of the hierarchy is performed, and erroneous concepts are shown to appear more frequently in large refined tribal units than in small refined tribal units. Furthermore, more erroneous concepts appear in large refined tribal units that belong to more tribes than in those that belong to fewer tribes. Conclusions In this paper we introduce the TAN for summarizing SNOMED CT target hierarchies. A TAN was derived for the Observable entity hierarchy of SNOMED CT. A quality assurance methodology utilizing the TAN was introduced and demonstrated.
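
The tribal-unit grouping can be sketched in Python as follows. This is a minimal illustration of the definitions in the abstract (a tribe is the subhierarchy rooted at a child of the hierarchy root; a tribal unit holds the concepts belonging to exactly the same set of tribes). The function name and the toy hierarchy are hypothetical, and the further split into refined tribal units is not shown because the abstract does not define it precisely.

```python
from collections import defaultdict

def tribal_units(root, children):
    """Group non-root concepts by the exact set of tribes they belong to.

    A tribe is the subhierarchy rooted at one child of the hierarchy root.
    """
    tribes_of = defaultdict(set)  # concept -> roots of the tribes containing it
    for tribe_root in children.get(root, ()):
        seen, stack = {tribe_root}, [tribe_root]
        while stack:
            node = stack.pop()
            tribes_of[node].add(tribe_root)
            for child in children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
    units = defaultdict(set)
    for concept, tribes in tribes_of.items():
        units[frozenset(tribes)].add(concept)
    return dict(units)

# Hypothetical hierarchy: Y has parents in both tribes, so Y and its child W
# form a unit that belongs to tribes A and B at once.
children = {"Root": ["A", "B"], "A": ["X", "Y"], "B": ["Y", "Z"], "Y": ["W"]}
units = tribal_units("Root", children)
```

Only multiple-parent concepts such as Y can place concepts in more than one tribe, which matches the abstract's remark that the TAN assumes only the existence of such concepts.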

A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa123 ◽

2020 ◽

Vol 27 (10) ◽

pp. 1568-1575 ◽

Cited By ~ 1

Author(s):

Fengbo Zheng ◽

Jay Shi ◽

Yuntao Yang ◽

W Jim Zheng ◽

Licong Cui

Keyword(s):

Gene Ontology ◽

Information Systems ◽

Random Sample ◽

English Language ◽

Snomed Ct ◽

Domain Experts ◽

Language System ◽

Unified Medical Language System ◽

Medical Language ◽

Original Concept

Abstract Objective The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages the UMLS knowledge to systematically identify missing hierarchical IS-A relations in the source terminologies. Materials and Methods Given a concept name in the UMLS, we first identify its base and secondary noun chunks. For each identified noun chunk, we generate replacement candidates that are more general than the noun chunk. Then, we replace the noun chunks with their replacement candidates to generate new potential concept names that may serve as supertypes of the original concept. If a newly generated name is an existing concept name in the same source terminology as the original concept, then a potentially missing IS-A relation between the original and the new concept is identified. Results Applying our transformation-based method to English-language concept names in the UMLS (2019AB release), a total of 39 359 potentially missing IS-A relations were detected in 13 source terminologies. Domain experts evaluated a random sample of 200 potentially missing IS-A relations identified in SNOMED CT (U.S. edition) and 100 in Gene Ontology. A total of 173 of 200 and 63 of 100 potentially missing IS-A relations were confirmed by domain experts, indicating that our method achieved a precision of 86.5% for SNOMED CT and 63% for Gene Ontology. Conclusions Our results showed that our transformation-based method is effective in identifying missing IS-A relations in the UMLS source terminologies.
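
The replace-and-look-up step can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the noun chunks are passed in directly instead of coming from a chunker, the `generalizations` table stands in for however the method derives more general replacement candidates, and excluding names that are already ancestors (`known_ancestors`) is an assumption about what counts as "missing". All names and data are hypothetical.

```python
def potential_missing_isa(name, noun_chunks, generalizations, terminology_names, known_ancestors):
    """Return generated supertype names that exist but are not yet ancestors of name."""
    found = set()
    for chunk in noun_chunks:
        for general in generalizations.get(chunk, ()):
            candidate = name.replace(chunk, general, 1)  # swap in a more general chunk
            if candidate != name and candidate in terminology_names and candidate not in known_ancestors:
                found.add(candidate)
    return found

terminology_names = {"fracture of left femur", "fracture of femur", "injury of left femur"}
generalizations = {"fracture": ["injury"], "left femur": ["femur"]}  # hypothetical
print(potential_missing_isa(
    "fracture of left femur",
    noun_chunks=["fracture", "left femur"],  # assumed output of a noun-chunk step
    generalizations=generalizations,
    terminology_names=terminology_names,
    known_ancestors={"fracture of femur"},
))  # {'injury of left femur'}
```

Here "fracture of femur" is generated too, but it is skipped because the sketch treats it as an existing ancestor; "injury of left femur" exists in the same terminology without an IS-A link, so it is proposed for expert review.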

SNOMED CT Implementation

Methods of Information in Medicine ◽

10.3414/me11-02-0023 ◽

2012 ◽

Vol 51 (06) ◽

pp. 529-538 ◽

Cited By ~ 17

Author(s):

K. Rosenbeck Gøeg ◽

A. Randorff Højen

Keyword(s):

Quality Assurance ◽

Clinical Practice ◽

Electronic Health Record ◽

Quality Criteria ◽

Clinical Information ◽

Health Record ◽

Snomed Ct ◽

Depth Analysis ◽

Systematized Nomenclature Of Medicine ◽

Future Work

Summary Clinical practice, research, and quality assurance benefit from unambiguous clinical information resulting from the use of a common terminology like the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT). A common terminology is necessary to enable consistent reuse of data and to support semantic interoperability. Managing the use of terminology in large cross-specialty Electronic Health Record systems (EHR systems), or beyond the level of single EHR systems, requires that mappings be kept consistent. The objective of this study is to provide a clear methodology for SNOMED CT mapping to enhance the applicability of SNOMED CT despite incompleteness and redundancy. Such mapping guidelines are presented based on an in-depth analysis of 14 different EHR templates retrieved from five Danish and Swedish EHR systems. Each mapping is assessed against defined quality criteria, and mapping guidelines are specified. Future work will include guideline validation.

Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis

Health Informatics Journal ◽

10.1177/1460458221989392 ◽

2021 ◽

Vol 27 (1) ◽

pp. 146045822198939

Author(s):

Euisung Jung ◽

Hemant Jain ◽

Atish P Sinha ◽

Carmelo Gaudioso

Keyword(s):

Breast Cancer ◽

Clinical Trial ◽

Clinical Trials ◽

Text Mining ◽

Language Processing ◽

Snomed Ct ◽

Lexical Resources ◽

Named Entities ◽

Domain Experts ◽

Trial Subject

A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare informatics literature to identify disease terms with more than one word (multi-gram disease named entities). Although a lot of work has been done in the identification of protein- and gene-named entities in the biomedical field, not much research has been done on the recognition and resolution of terminologies in the clinical trial subject eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text mining analysis in the breast cancer domain, and evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology, which combines the knowledge of domain experts, terms from multiple online dictionaries, and the mining of text from sample clinical trials. Use of our methodology introduces 4243 unique lexicon items, which increase bigram entity match by 38.6% and trigram entity match by 41%. Our lexicon, which adds a significant number of new terms, is very useful for matching patients to clinical trials automatically based on eligibility matching. Beyond clinical trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text mining applications.
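
A rough way to picture the "bigram entity match" comparison is to count how many n-grams of a text appear in a lexicon, before and after adding specialized terms. The Python sketch below assumes lowercased whitespace tokenization and exact string matching; it is not the authors' evaluation procedure, and the base set merely stands in for a general vocabulary such as SNOMED CT. All terms are hypothetical examples.

```python
def ngrams(tokens, n):
    """All contiguous n-token sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def entity_matches(text, lexicon, n):
    """Count n-grams of text found in lexicon (lowercased whitespace tokens)."""
    tokens = text.lower().split()
    return sum(1 for gram in ngrams(tokens, n) if gram in lexicon)

text = "Patients with triple negative breast cancer after neoadjuvant chemotherapy"
base = {"breast cancer"}                                              # stand-in general vocabulary
specialized = base | {"triple negative", "neoadjuvant chemotherapy"}  # hypothetical additions
print(entity_matches(text, base, 2), entity_matches(text, specialized, 2))  # 1 3
```

Calling `entity_matches(text, lexicon, 3)` gives the trigram count in the same way.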

Clinical Governance in Action

Bulletin of The Royal College of Surgeons of England ◽

10.1308/147363513x13500508918458 ◽

2013 ◽

Vol 95 (1) ◽

pp. 1-4 ◽

Cited By ~ 1

Author(s):

AH Mirza ◽

L McClelland ◽

M Bentley ◽

S Mazengarb ◽

NS Jones

Keyword(s):

Quality Assurance ◽

Adverse Events ◽

Clinical Governance ◽

Learning From Errors ◽

Safe Environment ◽

Significant Challenge ◽

Self Development ◽

Management Of Risk

Clinical governance encompasses quality assurance, measures to ensure self-development, comparing standards and learning from errors and suboptimal results. The management of risk poses a significant challenge in itself, as to reduce it to zero would require practice coming to a standstill. The implementation of structures to provide a safe environment for the patients we treat remains one of the greatest challenges faced by healthcare organisations today. Additionally, the potential litigious outcome provides an added conspicuous incentive to not only continuously review and address adverse events but to ensure that our patients remain free from harm.
