Phenotypically Similar Rare Disease Identification from an Integrative Knowledge Graph for Data Harmonization: Preliminary Study

10.2196/18395 ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. e18395
Author(s):  
Qian Zhu ◽  
Dac-Trung Nguyen ◽  
Gioconda Alyea ◽  
Karen Hanson ◽  
Eric Sid ◽  
...  

Background Although many efforts have been made to develop comprehensive disease resources that capture rare disease information for the purpose of clinical decision making and education, there is no standardized protocol for defining and harmonizing rare diseases across multiple resources. This introduces data redundancy and inconsistency that may ultimately increase confusion and hinder the wide use of these resources. To overcome such encumbrances, we report our preliminary study to identify phenotypical similarity among genetic and rare diseases (GARD) that present similar clinical manifestations, and to support further data harmonization. Objective To support rare disease data harmonization, we aim to systematically identify phenotypically similar GARD diseases from a disease-oriented integrative knowledge graph and determine their similarity types. Methods We identified phenotypically similar GARD diseases programmatically with 2 methods: (1) we measured disease similarity by comparing disease mappings between GARD and other rare disease resources, incorporating manual assessment; (2) we derived clinical manifestations presenting among sibling diseases from disease classifications and prioritized the identified similar diseases based on their phenotypes and genotypes. Results For disease similarity comparison, approximately 87% (341/392) of the identified phenotypically similar disease pairs were validated; 80% (271/392) of these disease pairs were accurately identified as phenotypically similar based on similarity score. The evaluation result shows a high precision (94%) and a satisfactory quality (86% F measure). By deriving phenotypical similarity from Monarch Disease Ontology (MONDO) and Orphanet disease classification trees, we identified a total of 360 disease pairs with at least 1 shared clinical phenotype and gene, which were applied for prioritizing clinical relevance.
A total of 662 phenotypically similar disease pairs were identified and will be applied for GARD data harmonization. Conclusions We successfully identified phenotypically similar rare diseases among the GARD diseases via 2 approaches, disease mapping comparison and phenotypical similarity derivation from disease classification systems. The results will not only direct GARD data harmonization in expanding translational science research but will also accelerate data transparency and consistency across different disease resources and terminologies, helping to build a robust and up-to-date knowledge resource on rare diseases.
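
The second method in the abstract above (sibling diseases in a classification tree that share at least 1 phenotype and at least 1 gene) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the dictionary layout, and all disease, phenotype, and gene identifiers are hypothetical placeholders, and real MONDO or Orphanet trees would need their own loaders.

```python
from itertools import combinations

# Hypothetical toy data: placeholder identifiers, not real GARD, MONDO, or Orphanet records.
children_of = {
    "parent_1": ["disease_a", "disease_b", "disease_c"],
    "parent_2": ["disease_d", "disease_e"],
}
phenotypes = {
    "disease_a": {"seizures", "hypotonia"},
    "disease_b": {"seizures", "ataxia"},
    "disease_c": {"hearing loss"},
    "disease_d": {"anemia"},
    "disease_e": {"anemia", "jaundice"},
}
genes = {
    "disease_a": {"GENE1"},
    "disease_b": {"GENE1", "GENE2"},
    "disease_c": {"GENE3"},
    "disease_d": {"GENE4"},
    "disease_e": {"GENE5"},
}


def similar_sibling_pairs(children_of, phenotypes, genes):
    """Return sibling pairs sharing at least one phenotype and at least one gene."""
    pairs = []
    for siblings in children_of.values():
        # Siblings are diseases that sit under the same parent in the tree.
        for left, right in combinations(sorted(siblings), 2):
            shared_phenotypes = phenotypes.get(left, set()) & phenotypes.get(right, set())
            shared_genes = genes.get(left, set()) & genes.get(right, set())
            if shared_phenotypes and shared_genes:
                pairs.append((left, right, shared_phenotypes, shared_genes))
    return pairs


pairs = similar_sibling_pairs(children_of, phenotypes, genes)
for left, right, shared_phenotypes, shared_genes in pairs:
    print(left, right, sorted(shared_phenotypes), sorted(shared_genes))
```

In this toy data only one pair qualifies: disease_d and disease_e share a phenotype but no gene, so they are filtered out. The sizes of the two shared sets could serve as a simple prioritization key, although the abstract does not say how prioritization was actually done.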


Author(s):  
Xuedong Li ◽  
Yue Wang ◽  
Dongwu Wang ◽  
Walter Yuan ◽  
Dezhong Peng ◽  
...  

Abstract Background Accurately recognizing rare diseases based on symptom description is an important task in patient triage, early risk stratification, and targeted therapies. However, due to the very nature of rare diseases, the lack of historical data poses a great challenge to machine learning-based approaches. On the other hand, medical knowledge in automatically constructed knowledge graphs (KGs) has the potential to compensate for the lack of labeled training examples. This work aims to develop a rare disease classification algorithm that makes effective use of a knowledge graph, even when the graph is imperfect. Method We develop a text classification algorithm that represents a document as a combination of a “bag of words” and a “bag of knowledge terms,” where a “knowledge term” is a term shared between the document and the subgraph of the KG relevant to the disease classification task. We use two Chinese disease diagnosis corpora to evaluate the algorithm. The first one, HaoDaiFu, contains 51,374 chief complaints categorized into 805 diseases. The second data set, ChinaRe, contains 86,663 patient descriptions categorized into 44 disease categories. Results On the two evaluation data sets, the proposed algorithm delivers robust performance and outperforms a wide range of baselines, including resampling, deep learning, and feature selection approaches. Both a classification-based metric (macro-averaged F1 score) and a ranking-based metric (mean reciprocal rank) are used in evaluation. Conclusion Medical knowledge in large-scale knowledge graphs can be effectively leveraged to improve rare disease classification models, even when the knowledge graph is incomplete.
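
The document representation and the ranking metric named in the abstract above can be sketched as follows. This is a rough illustration under stated assumptions, not the published algorithm: the function names are hypothetical, the text is assumed to be already segmented into whitespace-separated tokens (real Chinese text would need a word segmenter), and `kg_terms` stands in for the terms of the task-relevant KG subgraph.

```python
from collections import Counter


def featurize(text, kg_terms):
    """Combine a bag of words with a bag of knowledge terms.

    A token counts as a knowledge term when it also appears in kg_terms,
    a hypothetical set of terms taken from the relevant KG subgraph.
    """
    tokens = text.lower().split()
    features = Counter(f"word={token}" for token in tokens)
    features.update(f"kg={token}" for token in tokens if token in kg_terms)
    return features


def mean_reciprocal_rank(ranked_lists, gold_labels):
    """Average of 1/rank of the gold label; a missing gold label contributes 0."""
    total = 0.0
    for ranking, gold in zip(ranked_lists, gold_labels):
        if gold in ranking:
            total += 1.0 / (ranking.index(gold) + 1)
    return total / len(gold_labels)


features = featurize("fever and joint pain fever", {"fever", "pain"})
mrr = mean_reciprocal_rank([["a", "b"], ["b", "a"], ["c"]], ["a", "a", "x"])
print(dict(features))
print(mrr)
```

Prefixing the two feature families keeps them in separate dimensions, so a downstream classifier can weight a knowledge-term occurrence differently from a plain word occurrence.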


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Qian Zhu ◽  
Ðắc-Trung Nguyễn ◽  
Timothy Sheils ◽  
Gioconda Alyea ◽  
Eric Sid ◽  
...  

Abstract Background Limited knowledge and unclear underlying biology of many rare diseases pose significant challenges to patients, clinicians, and scientists. To address these challenges, there is an urgent need to inspire and encourage scientists to propose and pursue innovative research studies that aim to uncover the genetic and molecular causes of more rare diseases and ultimately to identify effective therapeutic solutions. A clear understanding of current research efforts, knowledge/research gaps, and funding patterns as scientific evidence is crucial to systematically accelerate the pace of research discovery in rare diseases, which is an overarching goal of this study. Methods To semantically represent NIH funding data for rare diseases and advance its use in effectively promoting rare disease research, we identified NIH-funded projects for rare diseases by mapping GARD diseases to projects based on project titles; subsequently we presented and managed those identified projects in a knowledge graph using Neo4j software, hosted at NCATS, based on a predefined data model that captures semantics among the data. With this knowledge graph, we were able to perform several case studies to demonstrate scientific evidence generation for supporting rare disease research discovery. Results Of 5001 rare diseases belonging to 32 distinct disease categories, we identified 1294 diseases that are mapped to 45,647 distinct NIH-funded projects obtained from the NIH ExPORTER by implementing semantic annotation of project titles. To capture semantic relationships presenting amongst mapped research funding data, we defined a data model comprised of seven primary classes and corresponding object and data properties. A Neo4j knowledge graph based on this predefined data model has been developed, and we performed multiple case studies over this knowledge graph to demonstrate its use in directing and promoting rare disease research.
Conclusion We developed an integrative knowledge graph with rare disease funding data and demonstrated its use as a source from which we can effectively identify and generate scientific evidence to support rare disease research. With the success of this preliminary study, we plan to implement advanced computational approaches for analyzing more funding-related data, e.g., project abstracts and PubMed article abstracts, and linking to other types of biomedical data to perform more sophisticated research gap analysis and identify opportunities for future research in rare diseases.
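
The title-based mapping step described above can be approximated with a short sketch. The abstract refers to semantic annotation of project titles; the code below swaps in plain case-insensitive phrase matching as a much simpler stand-in, and the function name, project identifiers, and project titles are all made up for illustration.

```python
import re
from collections import defaultdict


def map_diseases_to_projects(disease_names, projects):
    """Map each disease name to the IDs of projects whose title mentions it.

    Plain phrase matching is a simplification of the semantic annotation
    mentioned in the abstract; synonyms and abbreviations are not handled.
    """
    mapping = defaultdict(list)
    for name in disease_names:
        # Word boundaries avoid matching the name inside a longer word.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for project_id, title in projects.items():
            if pattern.search(title):
                mapping[name].append(project_id)
    return dict(mapping)


# Hypothetical project records, not taken from NIH ExPORTER.
projects = {
    "P1": "Mechanisms of Rett Syndrome in neurons",
    "P2": "Airway biology in cystic fibrosis",
    "P3": "Diabetes outcomes study",
    "P4": "Gene therapy for Rett syndrome",
}
mapping = map_diseases_to_projects(["Rett syndrome", "Cystic fibrosis"], projects)
print(mapping)
```

Each disease-to-project entry in such a mapping would correspond to one relationship between a disease node and a project node when the result is loaded into a graph database.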


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Mercedes Guilabert ◽  
Alba Martínez-García ◽  
Marina Sala-González ◽  
Olga Solas ◽  
José Joaquín Mira

Abstract Objective To measure the experience of people living with a rare disease in order to identify objectives for optimal care of these patients. Methods A cross-sectional study was conducted in Spain involving patients associated with the Spanish Rare Diseases Federation [Federación Española de Enfermedades Raras] (FEDER). A modified version of the PREM IEXPAC [Instrumento para evaluar la Experiencia del Paciente Crónico] instrument was used (IEXPAC-rare-diseases). Scores ranged between 0 (worst experience) and 10 (best experience). Results A total of 261 patients with rare diseases and caregivers (in the case of the paediatric population) replied (response rate 54.4%); 232 (88.9%) were adult patients and 29 (11.1%) were caregivers of minor patients. Most respondents were male, 227 (87%), with an average age of 38 (SD 13.6) years. The mean time since confirmation of diagnosis was 7.8 (SD 8.0) years. The score in this PREM was 3.5 points out of 10 (95%CI 3.2–3.8, SD 2.0). Caregivers of paediatric patients scored higher, except for coordination of social and healthcare services. Conclusions There are wide and important areas for improvement in the care of patients with rare diseases. This study provides a first assessment of the experience of patients with rare diseases in Spain.


Author(s):  
Qian Zhu ◽  
Dac-Trung Nguyen ◽  
Eric Sid ◽  
Anne Pariser

Abstract Objective In this study, we aimed to evaluate the capability of the Unified Medical Language System (UMLS) as one data standard to support data normalization and harmonization of datasets that have been developed for rare diseases. Through analysis of data mappings between multiple rare disease resources and the UMLS, we propose suggested extensions of the UMLS that will enable its adoption as a global standard in rare disease. Methods We analyzed data mappings between the UMLS and existing datasets on over 7,000 rare diseases that were retrieved from four publicly accessible resources: Genetic And Rare Diseases Information Center (GARD), Orphanet, Online Mendelian Inheritance in Man (OMIM), and the Monarch Disease Ontology (MONDO). Two types of disease mappings were assessed: (1) curated mappings extracted from those four resources; and (2) established mappings generated by querying the rare disease-based integrative knowledge graph developed in the previous study. Results We found that 100% of OMIM concepts, and over 50% of concepts from GARD, MONDO, and Orphanet were normalized by the UMLS and accurately categorized into the appropriate UMLS semantic groups. We analyzed 58,636 UMLS mappings, which resulted in 3,876 UMLS concepts across these resources. Manual evaluation of a random set of 500 UMLS mappings demonstrated a high level of accuracy (99%) for those mappings, which consisted of 414 synonym mappings (82.8%), 76 subtype mappings (15.2%), and five sibling mappings (1%). Conclusion The mapping results in this study illustrate that the UMLS was able to accurately represent rare disease concepts and their associated information, such as genes and phenotypes, and can effectively be used to support data harmonization across existing resources developed for collecting rare disease data.
We recommend the adoption of the UMLS as a data standard for rare disease to enable the existing rare disease datasets to support future applications in clinical and community settings.
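
The manual evaluation figures in the abstract above can be reproduced with a few lines of arithmetic. The counts come from the abstract; treating synonym, subtype, and sibling mappings together as the accurate ones is an assumption made here because their sum matches the reported 99%, and the variable names are illustrative only.

```python
# Counts reported in the abstract for a random sample of 500 UMLS mappings.
sample_size = 500
counts = {"synonym": 414, "subtype": 76, "sibling": 5}

# Share of each mapping type within the sample, in percent.
percentages = {
    kind: round(100 * count / sample_size, 1) for kind, count in counts.items()
}

# Assumption: all three types count as accurate mappings.
accurate = sum(counts.values())
accuracy = round(100 * accurate / sample_size, 1)

print(percentages)
print(accurate, accuracy)
```

Under that assumption, the remaining 5 of the 500 sampled mappings are the ones not judged accurate.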


2019 ◽  
Vol 51 (01) ◽  
pp. 049-052
Author(s):  
Benedikt Hofmeister ◽  
Celina von Stülpnagel ◽  
Steffen Berweck ◽  
Angela Abicht ◽  
Gerhard Kluger ◽  
...  

Abstract Nicolaides–Baraitser syndrome (NCBRS) is a rare disease caused by a mutation in the SMARCA2 gene. Clinical features include craniofacial dysmorphia and abnormalities of the limbs, as well as intellectual disorder and often epilepsy. Hepatotoxicity is a rare complication of therapy with valproic acid (VPA), and a mutation of the polymerase γ (POLG) gene might lead to a higher sensitivity to hepatotoxicity. We present a patient with the coincidence of two rare diseases, NCBRS and additionally a POLG1 mutation, in combination with hepatotoxicity. The co-occurrence of two different genetic diseases in children is discussed with the help of a literature review.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Friederike Ehrhart ◽  
Egon L. Willighagen ◽  
Martina Kutmon ◽  
Max van Hoften ◽  
Leopold M. G. Curfs ◽  
...  

Abstract Here, we describe a dataset with information about monogenic, rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the scientific publications that first described the rare diseases, and of the publications that found the genes causing the diseases, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but through our effort to make the data interoperable and linked, we can now analyse these data. Our analysis revealed the timeline of rare disease and causative gene discovery and links them to developments in methods.


2021 ◽  
Vol 16 ◽  
Author(s):  
Erica Winter ◽  
Scott Schliebner

Characterized by small, highly heterogeneous patient populations, rare disease trials magnify the challenges often encountered in traditional clinical trials. In recent years, there have been increased efforts by stakeholders to improve drug development in rare diseases through novel approaches to clinical trial designs and statistical analyses. We highlight and discuss some of the current and emerging approaches aimed at overcoming challenges in rare disease clinical trials, with a focus on the ultimate stakeholder, the patient.


2021 ◽  
Vol 14 (10) ◽  
pp. e244916
Author(s):  
Saranya B Gomathy ◽  
Animesh Das ◽  
Awadh Kishor Pandit ◽  
Achal Kumar Srivastava

Wunderlich syndrome is a rare condition characterised by acute spontaneous non-traumatic renal haemorrhage into the subcapsular and perirenal spaces. Our patient, aged 30 years, with anti-GAD65-associated autoimmune encephalitis (AE), developed this complication following use of enoxaparin and was managed by selective glue embolisation of subsegmental branches of the right renal cortical arteries. Our patient had opsoclonus as one of the clinical manifestations, which has until now been described in only two patients with this AE. This patient received all forms of induction therapies (steroids, plasmapheresis, intravenous immunoglobulin and rituximab), following which she had good improvement in her clinical condition. The good response to immunotherapy is also a point of discussion, as this has been rarely associated with anti-GAD65 AE.

