Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

2017 ◽  
Vol 24 (4) ◽  
pp. 788-798 ◽  
Author(s):  
Licong Cui ◽  
Wei Zhu ◽  
Shiqiang Tao ◽  
James T Case ◽  
Olivier Bodenreider ◽  
...  

Abstract Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.
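The structural part of this method can be illustrated with a minimal sketch. It assumes the usual definition of a non-lattice pair: two concepts that share more than one maximal common descendant in the is-a hierarchy. The brute-force search below is for illustration only and is not the scalable MapReduce algorithm the abstract refers to; the function names and the toy hierarchy are hypothetical.

```python
from itertools import combinations


def descendants(children, node):
    """Return every concept reachable from `node` by following child edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def non_lattice_pairs(children):
    """Map each non-lattice pair to its set of maximal common descendants.

    `children` maps a concept to its direct children (inverse is-a edges).
    The hierarchy is assumed to be acyclic.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    desc = {n: descendants(children, n) for n in nodes}
    pairs = {}
    for a, b in combinations(sorted(nodes), 2):
        common = desc[a] & desc[b]
        # A common descendant is maximal when no other common descendant
        # is one of its ancestors.
        maximal = {c for c in common if not any(c in desc[d] for d in common)}
        if len(maximal) > 1:
            pairs[(a, b)] = maximal
    return pairs


# Hypothetical toy hierarchy: A and B both have C and D as children,
# and C and D both have E as a child.
toy = {"A": ["C", "D"], "B": ["C", "D"], "C": ["E"], "D": ["E"]}
result = non_lattice_pairs(toy)
```

In the toy hierarchy, A and B share the two maximal common descendants C and D, so (A, B) is reported as a non-lattice pair. The lexical patterns described in the abstract would then be checked on the subgraph spanned by such a pair.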

2016 ◽  
Vol 55 (02) ◽  
pp. 158-165 ◽  
Author(s):  
Y. Chen ◽  
Z. He ◽  
M. Halper ◽  
L. Chen ◽  
H. Gu

Summary Background: The Unified Medical Language System (UMLS) is one of the largest biomedical terminological systems, with over 2.5 million concepts in its Metathesaurus repository. The UMLS’s Semantic Network (SN) with its collection of 133 high-level semantic types serves as an abstraction layer on top of the Metathesaurus. In particular, the SN elaborates an aspect of the Metathesaurus’s concepts via the assignment of one or more types to each concept. Due to the scope and complexity of the Metathesaurus, errors are all but inevitable in this semantic-type assignment process. Objectives: To develop a semi-automated methodology to help assure the quality of semantic-type assignments within the UMLS. Methods: The methodology uses a cross-validation strategy involving SNOMED CT’s hierarchies in combination with UMLS semantic types. Semantically uniform, disjoint concept groups are generated programmatically by partitioning the collection of all concepts in the same SNOMED CT hierarchy according to their respective semantic-type assignments in the UMLS. Domain experts are then called upon to review the concepts in any group having a small number of concepts. It is our hypothesis that a semantic-type assignment combination applicable only to a very small number of concepts in a SNOMED CT hierarchy is an indicator of potential problems. Results: The methodology was applied to the UMLS 2013AA release along with the SNOMED CT from January 2013. An overall error rate of 33% was found for concepts proposed by the quality-assurance methodology. Supporting our hypothesis, that number was four times higher than the error rate found in control samples. Conclusion: The results show that the quality-assurance methodology can aid in effective and efficient identification of UMLS semantic-type assignment errors.
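The partitioning step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the concepts of one SNOMED CT hierarchy are given as a mapping from concept to its set of UMLS semantic types. The function name, the `max_size` threshold, and the toy data are hypothetical; the abstract only says that groups with "a small number of concepts" are sent for review.

```python
from collections import defaultdict


def small_groups(assignments, max_size=2):
    """Group concepts by their exact combination of semantic types and
    return only the groups with at most `max_size` members.

    `assignments` maps a concept identifier to a set of semantic types.
    `max_size` is an assumed cutoff, not a value taken from the paper.
    """
    groups = defaultdict(list)
    for concept, types in assignments.items():
        # frozenset makes the type combination hashable and order-independent.
        groups[frozenset(types)].append(concept)
    return {
        combo: sorted(members)
        for combo, members in groups.items()
        if len(members) <= max_size
    }


# Hypothetical assignments for concepts in a single hierarchy.
toy_assignments = {
    "concept_1": {"Finding"},
    "concept_2": {"Finding"},
    "concept_3": {"Finding"},
    "concept_4": {"Finding", "Disease or Syndrome"},
}
flagged = small_groups(toy_assignments, max_size=1)
```

Here only `concept_4` is flagged, because its type combination is shared by no other concept in the hierarchy. In the methodology described above, such concepts would be passed to domain experts rather than treated as errors automatically.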


2020 ◽  
Vol 27 (10) ◽  
pp. 1568-1575 ◽  
Author(s):  
Fengbo Zheng ◽  
Jay Shi ◽  
Yuntao Yang ◽  
W Jim Zheng ◽  
Licong Cui

Abstract Objective The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages the UMLS knowledge to systematically identify missing hierarchical IS-A relations in the source terminologies. Materials and Methods Given a concept name in the UMLS, we first identify its base and secondary noun chunks. For each identified noun chunk, we generate replacement candidates that are more general than the noun chunk. Then, we replace the noun chunks with their replacement candidates to generate new potential concept names that may serve as supertypes of the original concept. If a newly generated name is an existing concept name in the same source terminology as the original concept, then a potentially missing IS-A relation between the original and the new concept is identified. Results Applying our transformation-based method to English-language concept names in the UMLS (2019AB release), a total of 39 359 potentially missing IS-A relations were detected in 13 source terminologies. Domain experts evaluated a random sample of 200 potentially missing IS-A relations identified in the SNOMED CT (U.S. edition) and 100 in Gene Ontology. A total of 173 of 200 and 63 of 100 potentially missing IS-A relations were confirmed by domain experts, indicating that our method achieved a precision of 86.5% and 63% for the SNOMED CT and Gene Ontology, respectively. Conclusions Our results showed that our transformation-based method is effective in identifying missing IS-A relations in the UMLS source terminologies.
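The replace-and-look-up step of this method can be sketched in a few lines. The sketch makes strong simplifying assumptions: noun chunks are matched as whole-word substrings instead of being identified by a parser, and the more general replacement candidates come from a hand-written dictionary rather than from UMLS knowledge. The function name and all example terms are hypothetical.

```python
import re


def suggest_supertypes(name, generalizations, existing_names):
    """Return existing concept names obtained by replacing a chunk of
    `name` with a more general term.

    `generalizations` maps a chunk to a list of broader terms (assumed input).
    `existing_names` is the set of concept names in the same terminology.
    """
    suggestions = set()
    for chunk, broader_terms in generalizations.items():
        pattern = rf"\b{re.escape(chunk)}\b"
        if not re.search(pattern, name):
            continue
        for broader in broader_terms:
            # A function replacement avoids interpreting backslashes in `broader`.
            candidate = re.sub(pattern, lambda _match: broader, name)
            if candidate != name and candidate in existing_names:
                suggestions.add(candidate)
    return suggestions


# Hypothetical terminology and generalization dictionary.
existing = {"fracture of femur", "fracture of bone", "fracture of upper limb"}
broader_terms = {"femur": ["bone", "lower limb"]}
candidates = suggest_supertypes("fracture of femur", broader_terms, existing)
```

Only "fracture of bone" is suggested, since "fracture of lower limb" does not exist in the toy terminology. A full pipeline would also check that the suggested IS-A relation is not already present, directly or through inheritance, before reporting it as potentially missing.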


2015 ◽  
Vol 22 (3) ◽  
pp. 507-518 ◽  
Author(s):  
Christopher Ochs ◽  
James Geller ◽  
Yehoshua Perl ◽  
Yan Chen ◽  
Junchuan Xu ◽  
...  

Abstract Objective Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA. Methods An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN called a subtaxonomy for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA. Results We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample. Discussion The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject. Conclusions An innovative methodology for scaling the derivation of AbNs and a TQA methodology was shown to perform successfully for the largest hierarchy of SNOMED.


2020 ◽  
Vol 20 (S10) ◽  
Author(s):  
Ankur Agrawal ◽  
Licong Cui

Abstract Biological and biomedical ontologies and terminologies are used to organize and store various domain-specific knowledge to provide standardization of terminology usage and to improve interoperability. The growing number of such ontologies and terminologies and their increasing adoption in clinical, research and healthcare settings call for effective and efficient quality assurance and semantic enrichment techniques of these ontologies and terminologies. In this editorial, we provide an introductory summary of nine articles included in this supplement issue for quality assurance and enrichment of biological and biomedical ontologies and terminologies. The articles cover a range of standards including SNOMED CT, National Cancer Institute Thesaurus, Unified Medical Language System, North American Association of Central Cancer Registries and OBO Foundry Ontologies.


1994 ◽  
Vol 102 (2) ◽  
pp. 182-187 ◽  
Author(s):  
Mark E. Sherman ◽  
Mark H. Schiffman ◽  
Attila T. Lorincz ◽  
M. Michele Manos ◽  
David R. Scott ◽  
...  

2014 ◽  
Vol 05 (01) ◽  
pp. 127-152 ◽  
Author(s):  
E. Sundvall ◽  
K.R. Gøeg ◽  
A.R. Højen

Summary Inconsistent use of SNOMED CT concepts may reduce comparability of information in health information systems. Terminology implementation should be approached by common strategies for navigating and selecting proper concepts. This study aims to explore ways of illustrating common pathways and ancestors of particular sets of concepts, to support consistent use of SNOMED CT and also assess potential applications for such visualizations. The open source prototype presented is an interactive web-based re-implementation of the terminology visualization tool TermViz that provides an overview of concepts and their hierarchical relations. It provides terminological features such as interactively rearranging graphs, fetching more concept nodes, highlighting least common parents and shared pathways in merged graphs etc. Four teams of three to four people used the prototype to complete a terminology mapping task and then, in focus group interviews, discussed the user experience and potential future tool usage. Potential purposes discussed included SNOMED CT search and training, consistent selection of concepts and content management. The evaluation indicated that the tool may be useful in many contexts especially if integrated with existing systems, and that the graph layout needs further tuning and development. Citation: Højen AR, Sundvall E, Gøeg KR. Methods and applications for visualization of SNOMED CT concept sets. Appl Clin Inf 2014; 5: 127–152. http://dx.doi.org/10.4338/ACI-2013-09-RA-0071


Publications ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 17 ◽  
Author(s):  
Bo-Christer Björk ◽  
Sari Kanto-Karvonen ◽  
J. Tuomas Harviainen

Predatory journals are Open Access journals of highly questionable scientific quality. Such journals pretend to use peer review for quality assurance, and spam academics with requests for submissions, in order to collect author payments. In recent years predatory journals have received a lot of negative media. While much has been said about the harm that such journals cause to academic publishing in general, an overlooked aspect is how much articles in such journals are actually read and in particular cited, that is, whether they have any significant impact on the research in their fields. Other studies have already demonstrated that only some of the articles in predatory journals contain faulty and directly harmful results, while a lot of the articles present mediocre and poorly reported studies. We studied citation statistics over a five-year period in Google Scholar for 250 random articles published in such journals in 2014 and found an average of 2.6 citations per article, and that 56% of the articles had no citations at all. For comparison, a random sample of articles published in the approximately 25,000 peer reviewed journals included in the Scopus index had an average of 18.1 citations in the same period with only 9% receiving no citations. We conclude that articles published in predatory journals have little scientific impact.

