Improving Rare Disease Classification Using Imperfect Knowledge Graph

Author(s):  
Xuedong Li ◽  
Yue Wang ◽  
Dongwu Wang ◽  
Walter Yuan ◽  
Dezhong Peng ◽  
...  

Abstract Background Accurately recognizing rare diseases based on symptom descriptions is an important task in patient triage, early risk stratification, and targeted therapies. However, due to the very nature of rare diseases, the lack of historical data poses a great challenge to machine learning-based approaches. On the other hand, medical knowledge in automatically constructed knowledge graphs (KGs) has the potential to compensate for the lack of labeled training examples. This work aims to develop a rare disease classification algorithm that makes effective use of a knowledge graph, even when the graph is imperfect. Method We develop a text classification algorithm that represents a document as a combination of a “bag of words” and a “bag of knowledge terms,” where a “knowledge term” is a term shared between the document and the subgraph of the KG relevant to the disease classification task. We use two Chinese disease diagnosis corpora to evaluate the algorithm. The first one, HaoDaiFu, contains 51,374 chief complaints categorized into 805 diseases. The second data set, ChinaRe, contains 86,663 patient descriptions categorized into 44 disease categories. Results On the two evaluation data sets, the proposed algorithm delivers robust performance and outperforms a wide range of baselines, including resampling, deep learning, and feature selection approaches. Both a classification-based metric (macro-averaged F1 score) and a ranking-based metric (mean reciprocal rank) are used in the evaluation. Conclusion Medical knowledge in large-scale knowledge graphs can be effectively leveraged to improve rare disease classification models, even when the knowledge graph is incomplete.
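The document representation and the ranking metric described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `w:`/`k:` feature prefixes, and the toy English tokens are assumptions made for illustration, and the sketch assumes the text has already been tokenized and that the set of KG terms relevant to the task is given.

```python
from collections import Counter


def featurize(tokens, kg_terms):
    """Combine a bag of words with a bag of knowledge terms.

    tokens:   list of tokens from one document (tokenization is assumed done)
    kg_terms: set of terms found in the task-relevant KG subgraph (hypothetical input)
    """
    # Every token contributes a word feature.
    words = Counter(f"w:{t}" for t in tokens)
    # Tokens that also appear in the KG subgraph contribute a knowledge-term feature.
    knowledge = Counter(f"k:{t}" for t in tokens if t in kg_terms)
    return words + knowledge


def mean_reciprocal_rank(ranked_lists, gold_labels):
    """Average of 1/rank of the gold label; a missing gold label contributes 0."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_labels):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(gold_labels)


features = featurize(["fever", "rash", "fever"], {"rash"})
mrr = mean_reciprocal_rank([["a", "b"], ["b", "a"]], ["a", "a"])
```

In this toy run, `rash` is counted both as a word and as a knowledge term, and the gold label is ranked first in one list and second in the other.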


10.2196/18395 ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. e18395
Author(s):  
Qian Zhu ◽  
Dac-Trung Nguyen ◽  
Gioconda Alyea ◽  
Karen Hanson ◽  
Eric Sid ◽  
...  

Background Although many efforts have been made to develop comprehensive disease resources that capture rare disease information for the purpose of clinical decision making and education, there is no standardized protocol for defining and harmonizing rare diseases across multiple resources. This introduces data redundancy and inconsistency that may ultimately increase confusion and hinder the wide use of these resources. To overcome such encumbrances, we report our preliminary study to identify phenotypical similarity among genetic and rare diseases (GARD) that present similar clinical manifestations, and to support further data harmonization. Objective To support rare disease data harmonization, we aim to systematically identify phenotypically similar GARD diseases from a disease-oriented integrative knowledge graph and determine their similarity types. Methods We identified phenotypically similar GARD diseases programmatically with 2 methods: (1) we measured disease similarity by comparing disease mappings between GARD and other rare disease resources, incorporating manual assessment; (2) we derived clinical manifestations presenting among sibling diseases from disease classifications and prioritized the identified similar diseases based on their phenotypes and genotypes. Results For disease similarity comparison, approximately 87% (341/392) of the identified phenotypically similar disease pairs were validated; 80% (271/392) of these disease pairs were accurately identified as phenotypically similar based on similarity score. The evaluation result shows a high precision (94%) and a satisfactory quality (86% F measure). By deriving phenotypical similarity from Monarch Disease Ontology (MONDO) and Orphanet disease classification trees, we identified a total of 360 disease pairs with at least 1 shared clinical phenotype and gene, which were applied for prioritizing clinical relevance. 
A total of 662 phenotypically similar disease pairs were identified and will be applied for GARD data harmonization. Conclusions We successfully identified phenotypically similar rare diseases among the GARD diseases via 2 approaches, disease mapping comparison and phenotypical similarity derivation from disease classification systems. The results will not only direct GARD data harmonization in expanding translational science research but will also accelerate data transparency and consistency across different disease resources and terminologies, helping to build a robust and up-to-date knowledge resource on rare diseases.
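The second method in this abstract, finding sibling diseases that share at least 1 clinical phenotype and 1 gene, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary inputs, the placeholder identifiers, and the ranking by number of shared annotations are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def similar_sibling_pairs(siblings, phenotypes, genes):
    """Return sibling pairs sharing at least one phenotype AND at least one gene.

    siblings:   disease IDs under the same parent in a classification tree
    phenotypes: dict mapping disease ID -> set of phenotype terms
    genes:      dict mapping disease ID -> set of gene symbols
    """
    pairs = []
    for a, b in combinations(sorted(siblings), 2):
        shared_p = phenotypes.get(a, set()) & phenotypes.get(b, set())
        shared_g = genes.get(a, set()) & genes.get(b, set())
        if shared_p and shared_g:
            pairs.append((a, b, len(shared_p), len(shared_g)))
    # Assumed prioritization: more shared annotations first, then by ID for stability.
    pairs.sort(key=lambda p: (-(p[2] + p[3]), p[0], p[1]))
    return pairs


# Toy data with placeholder identifiers.
toy_phenotypes = {"D1": {"p1", "p2"}, "D2": {"p2", "p3"}, "D3": {"p3"}}
toy_genes = {"D1": {"G1"}, "D2": {"G1", "G2"}, "D3": {"G3"}}
toy_pairs = similar_sibling_pairs(["D1", "D2", "D3"], toy_phenotypes, toy_genes)
```

Here only D1 and D2 qualify: D2 and D3 share a phenotype but no gene, so that pair is excluded.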


2020 ◽  
Vol 102 ◽  
pp. 534-548
Author(s):  
Zhenfeng Lei ◽  
Yuan Sun ◽  
Y.A. Nanehkaran ◽  
Shuangyuan Yang ◽  
Md. Saiful Islam ◽  
...  

Author(s):  

Methemoglobinemia is a rare disorder of hemoglobin oxygenation that may be congenital or acquired (usually iatrogenic). Although the list of potentially methemoglobin-forming agents is very long, methemoglobinemia is still mostly described in case reports of patients admitted to emergency departments. The patient was transferred from a lower-level referral center to the hospital emergency department after consuming resin hardener in a suicide attempt, with suspected burns of the gastrointestinal tract. On admission, the patient presented with cyanosis, dyspnea, and brown-colored urine.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Qian Zhu ◽  
Ðắc-Trung Nguyễn ◽  
Timothy Sheils ◽  
Gioconda Alyea ◽  
Eric Sid ◽  
...  

Abstract Background Limited knowledge and unclear underlying biology of many rare diseases pose significant challenges to patients, clinicians, and scientists. To address these challenges, there is an urgent need to inspire and encourage scientists to propose and pursue innovative research studies that aim to uncover the genetic and molecular causes of more rare diseases and ultimately to identify effective therapeutic solutions. A clear understanding of current research efforts, knowledge/research gaps, and funding patterns as scientific evidence is crucial to systematically accelerating the pace of research discovery in rare diseases, which is the overarching goal of this study. Methods To semantically represent NIH funding data for rare diseases and advance its use in promoting rare disease research, we identified NIH-funded projects for rare diseases by mapping GARD diseases to projects based on project titles; subsequently, we presented and managed the identified projects in a knowledge graph using Neo4j software, hosted at NCATS, based on a predefined data model that captures semantics among the data. With this knowledge graph, we were able to perform several case studies to demonstrate scientific evidence generation for supporting rare disease research discovery. Results Of 5001 rare diseases belonging to 32 distinct disease categories, we identified 1294 diseases that are mapped to 45,647 distinct NIH-funded projects obtained from the NIH ExPORTER by implementing semantic annotation of project titles. To capture the semantic relationships present among the mapped research funding data, we defined a data model comprising seven primary classes and corresponding object and data properties. A Neo4j knowledge graph based on this predefined data model has been developed, and we performed multiple case studies over this knowledge graph to demonstrate its use in directing and promoting rare disease research. 
Conclusion We developed an integrative knowledge graph with rare disease funding data and demonstrated its use as a source from which we can effectively identify and generate scientific evidence to support rare disease research. With the success of this preliminary study, we plan to implement advanced computational approaches for analyzing more funding-related data, e.g., project abstracts and PubMed article abstracts, and for linking to other types of biomedical data to perform more sophisticated research gap analysis and identify opportunities for future research in rare diseases.
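The mapping of diseases to funded projects via project titles can be illustrated with a deliberately simplified Python sketch. The abstract describes semantic annotation of titles; the code below swaps in plain case-insensitive phrase matching instead, and the function name, project IDs, and titles are hypothetical examples rather than data from the study.

```python
import re
from collections import defaultdict


def map_diseases_to_projects(disease_names, project_titles):
    """Map each disease name to the IDs of projects whose title mentions it.

    disease_names:  iterable of disease names
    project_titles: dict mapping project ID -> project title
    Uses whole-phrase, case-insensitive matching (a stand-in for semantic annotation).
    """
    mapping = defaultdict(set)
    for disease in disease_names:
        pattern = re.compile(r"\b" + re.escape(disease) + r"\b", re.IGNORECASE)
        for project_id, title in project_titles.items():
            if pattern.search(title):
                mapping[disease].add(project_id)
    # Diseases with no matching title are simply absent from the result.
    return dict(mapping)


toy_titles = {
    "P1": "Gene therapy for cystic fibrosis lung disease",
    "P2": "Cardiac outcomes in adults",
    "P3": "CYSTIC FIBROSIS modulators in children",
}
toy_mapping = map_diseases_to_projects(["Cystic fibrosis", "Fabry disease"], toy_titles)
```

Each resulting disease-to-project link could then be stored as a relationship in a graph database; synonyms and abbreviations, which plain phrase matching misses, are outside the scope of this sketch.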


2006 ◽  
Vol 12 ◽  
pp. 93-94
Author(s):  
Khurshid Ahmad Khan ◽  
Stephen A. Brietzke
