Identifying digenic disease genes using machine learning in the undiagnosed diseases network

2020 ◽  
Author(s):  
Souhrid Mukherjee ◽  
Joy D Cogan ◽  
John H Newman ◽  
John A Phillips ◽  
Rizwan Hamid ◽  
...  

ABSTRACT Rare diseases affect hundreds of millions of people worldwide, and diagnosing their genetic causes is challenging. The Undiagnosed Diseases Network (UDN) was formed in 2014 to identify and treat novel rare genetic diseases, and despite many successes, more than half of UDN patients remain undiagnosed. The central hypothesis of this work is that many unsolved rare genetic disorders are caused by multiple variants in more than one gene. However, given the large number of variants in each individual genome, experimentally evaluating even just pairs of variants for their potential to cause disease is currently infeasible. To address this challenge, we developed DiGePred, a random forest classifier for identifying candidate digenic disease gene pairs using features derived from biological networks, genomics, evolutionary history, and functional annotations. We trained the DiGePred classifier using DIDA, the largest available database of known digenic disease-causing gene pairs, and several sets of non-digenic gene pairs, including variant pairs derived from unaffected relatives of UDN patients. DiGePred achieved high precision and recall in cross-validation and on a held-out test set (PR area under the curve >77%), and we further demonstrate its utility using novel digenic pairs from the recent literature. In contrast to other approaches, DiGePred also appropriately controls the number of false positives when applied in realistic clinical settings like the UDN. Finally, to facilitate the rapid screening of variant gene pairs for digenic disease potential, we freely provide the predictions of DiGePred on all human gene pairs. Our work facilitates the discovery of genetic causes for rare non-monogenic diseases by providing a means to rapidly evaluate variant gene pairs for the potential to cause digenic disease.
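The abstract above reports performance as the area under the precision-recall (PR) curve. As a rough illustration of what that metric measures, the sketch below computes one common estimate of it, average precision, from classifier scores and true labels. This is not the DiGePred code: the function name `average_precision` and the toy labels and scores are assumptions made up for this example, and the abstract does not say which PR-AUC estimator the authors used.

```python
def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve.

    labels: list of 0/1 values (1 = known digenic pair in this toy setting).
    scores: list of classifier scores, higher meaning "more likely digenic".
    Uses the step-wise average-precision sum: the mean of the precision
    values observed at the rank of each true positive.
    Tied scores are kept in input order, which is a simplification.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision among the top `rank` predictions.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


if __name__ == "__main__":
    # Hypothetical labels and scores, not data from the paper.
    toy_labels = [1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
    print(round(average_precision(toy_labels, toy_scores), 3))  # prints 0.833
```

A PR-based metric is a natural choice when positives are rare, as known digenic pairs are relative to all gene pairs, because precision is directly affected by the number of false positives.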

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Dustin Baldridge ◽  
Michael F. Wangler ◽  
Angela N. Bowman ◽  
Shinya Yamamoto ◽  
...  

Abstract Decreased sequencing costs have led to an explosion of genetic and genomic data. These data have revealed thousands of candidate human disease variants. Establishing which variants cause phenotypes and diseases, however, has remained challenging. Significant progress has been made, including advances by the National Institutes of Health (NIH)-funded Undiagnosed Diseases Network (UDN). However, 6000–13,000 additional disease genes remain to be identified. The continued discovery of rare diseases and their genetic underpinnings provides benefits to affected patients, of whom there are more than 400 million worldwide, and also advances understanding of the mechanisms of more common diseases. Platforms employing model organisms enable discovery of novel gene-disease relationships, help establish variant pathogenicity, and often lead to the exploration of underlying mechanisms of pathophysiology that suggest new therapies. The Model Organism Screening Center (MOSC) of the UDN is a unique resource dedicated to utilizing informatics and functional studies in model organisms, including worm (Caenorhabditis elegans), fly (Drosophila melanogaster), and zebrafish (Danio rerio), to aid in diagnosis. The MOSC has directly contributed to the diagnosis of challenging cases, including multiple patients with complex, multi-organ phenotypes. In addition, the MOSC provides a framework for how basic scientists and clinicians can collaborate to drive diagnoses. Customized experimental plans take into account patient presentations, specific genes and variant(s), and appropriateness of each model organism for analysis. The MOSC also generates bioinformatic and experimental tools and reagents for the wider scientific community. Two elements of the MOSC that have been instrumental in its success are (1) multidisciplinary teams with expertise in variant bioinformatics and in human and model organism genetics, and (2) mechanisms for ongoing communication with clinical teams.
Here we provide a position statement regarding the central role of model organisms for continued discovery of disease genes, and we advocate for the continuation and expansion of MOSC-type research entities as a Model Organisms Network (MON) to be funded through grant applications submitted to the NIH, family groups focused on specific rare diseases, other philanthropic organizations, industry partnerships, and other sources of support.


Author(s):  
Shilpa Nadimpalli Kobren ◽  
Dustin Baldridge ◽  
Matt Velinder ◽  
Joel B. Krier ◽  
...  

Abstract
Purpose: Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful.
Methods: We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols.
Results: We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases.
Conclusion: The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.


2021 ◽  
Vol 132 ◽  
pp. S187
Author(s):  
Laurie Findley ◽  
Jill Rosenfeld ◽  
Rebecca Spillman ◽  
Heidi Cope ◽  
Kelly Schoch ◽  
...  

2020 ◽  
Vol 129 (4) ◽  
pp. 243-254 ◽  
Author(s):  
D. Taruscio ◽  
G. Baynam ◽  
H. Cederroth ◽  
S.C. Groft ◽  
E.W. Klee ◽  
...  

2019 ◽  
Vol 28 (2) ◽  
pp. 194-201 ◽  
Author(s):  
Ellen F. Macnamara ◽  
Kelly Schoch ◽  
Emily G. Kelley ◽  
Elizabeth Fieg ◽  
Elly Brokamp ◽  
...  

2018 ◽  
Vol 196 ◽  
pp. 291-297.e2 ◽  
Author(s):  
Chloe M. Reuter ◽  
Elise Brimble ◽  
Colette DeFilippo ◽  
Annika M. Dries ◽  
Gregory M. Enns ◽  
...  

2013 ◽  
Vol 41 (20) ◽  
pp. 9209-9217 ◽  
Author(s):  
Hyun Wook Han ◽  
Jung Hun Ohn ◽  
Jisook Moon ◽  
Ju Han Kim
