genome interpretation
Recently Published Documents



2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Francisco M. De La Vega ◽  
Shimul Chowdhury ◽  
Barry Moore ◽  
Erwin Frise ◽  
Jeanette McCarthy ◽  
...  

Abstract
Background: Clinical interpretation of genetic variants in the context of the patient’s phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed genome interpretation by integrating predictive methods with the growing knowledge of genetic disease. Here we assess the diagnostic performance of Fabric GEM, a new, AI-based, clinical decision support tool for expediting genome interpretation.
Methods: We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole-genome or whole-exome sequencing (WGS, WES). We replicated our analyses in a separate cohort of 60 cases collected from five academic medical centers. For comparison, we also analyzed these cases with current state-of-the-art variant prioritization tools. Included in the comparisons were trio, duo, and singleton cases. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted from clinical notes by two means: manually and using an automated clinical natural language processing (CNLP) tool. Finally, 14 previously unsolved cases were reanalyzed.
Results: GEM ranked over 90% of the causal genes among the top or second candidate and prioritized for review a median of 3 candidate genes per case, using either manually curated or CNLP-derived phenotype descriptions. Ranking of trios and duos was unchanged when analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SVs as the top candidate and in 19 of 20 within the top five, irrespective of whether SV calls were provided or inferred ab initio by GEM using its own internal SV detection algorithm. GEM showed similar performance in the absence of parental genotypes. Analysis of 14 previously unsolved cases resulted in a novel finding for one case, candidates ultimately not advanced upon manual review for 3 cases, and no new findings for 10 cases.
Conclusions: GEM enabled diagnostic interpretation inclusive of all variant types through automated nomination of a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing cost and expediting case review.
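The ranking results above are reported as the share of cases whose causal gene falls within the top one or two candidates. A minimal sketch of how such a top-k rate and a median rank could be computed is given below. The function name `top_k_rate` and the example ranks are assumptions made for illustration; they are not GEM output or data from the study.

```python
from statistics import median

def top_k_rate(causal_ranks, k):
    """Fraction of cases whose causal gene is ranked at position k or better (1-based)."""
    if not causal_ranks:
        raise ValueError("causal_ranks must not be empty")
    return sum(1 for r in causal_ranks if r <= k) / len(causal_ranks)

# Hypothetical ranks of the causal gene in ten cases (illustrative only, not study data).
example_ranks = [1, 1, 2, 1, 3, 1, 2, 1, 7, 1]

top1 = top_k_rate(example_ranks, 1)   # share of cases solved by the top candidate
top2 = top_k_rate(example_ranks, 2)   # share solved by the top or second candidate
median_rank = median(example_ranks)   # typical position of the causal gene
```

Reporting the rate at several values of k alongside the median gives a sense of how much manual review a ranked list would still require.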


2021 ◽  
Author(s):  
Shameer Khader ◽  
Benjamin Glicksberg ◽  
Kipp W Johnson ◽  
Marcus Badgeley ◽  
Joel Dudley

A complete understanding of phenomic space is critical for elucidating genome-phenome relationships and assessing disease risk from genome sequencing. We developed a new genome interpretation metric called Pleiotropic Variability Score (PVS) to incorporate phenomic variability into variant interpretation. PVS uses ontologies of human diseases and medical phenotypes, namely the Human Phenotype Ontology (HPO) and the Disease Ontology (DO), to compute the similarities of disease and clinical phenotypes associated with a genetic variant based on semantic reasoning algorithms. We tested 78 unique semantic similarity methods and integrated six robust metrics to define the pleiotropy score of SNPs. We computed PVS for 12,541 SNPs (10,021 SNPs mapped to DO phenotypes and 8,569 SNPs mapped to HPO phenotypes) using a repertoire of 382 HPO and 317 DO unique phenotype terms compiled from the genotype-phenotype catalog. We validated the utility of PVS by computing pleiotropy using an electronic health record-linked genomic database (BioME, n=11,210) and generated allele-specific pleiotropy. Further, we demonstrate PVS application in personalized medicine using personalized pleiotropy score reports for individuals with genomic data that could potentially aid in variant interpretation. We further developed a software framework to incorporate PVS into VCF files and consolidate pleiotropy assessment as part of genome interpretation pipelines. As the genome-phenome catalogs are growing, PVS will be a useful metric to assess genetic variation to find SNPs with highly pleiotropic effects. Additionally, genome analysts can prioritize variants with varying degrees of pleiotropy for explorative studies to understand the specific roles of SNPs and pleiotropic hubs in mediating novel phenotypes and drug development.
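The idea of scoring a variant by how dissimilar its associated phenotypes are within an ontology can be illustrated with a small sketch. The code below is not the PVS algorithm: it swaps in a single, simple measure (Jaccard overlap of ancestor sets) where the abstract describes six integrated metrics, and the toy hierarchy, term names, and function names are all hypothetical.

```python
from itertools import combinations

# Toy is-a hierarchy (child -> parents); term names are hypothetical, not real HPO/DO IDs.
PARENTS = {
    "root": [],
    "cardiac": ["root"],
    "neuro": ["root"],
    "arrhythmia": ["cardiac"],
    "cardiomyopathy": ["cardiac"],
    "seizure": ["neuro"],
}

def ancestors(term):
    """Return the term together with all of its ancestors in the hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def jaccard_similarity(a, b):
    """Jaccard overlap of ancestor sets: 1.0 for identical terms, lower for distant ones."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

def phenotype_spread(terms):
    """Mean pairwise dissimilarity (1 - similarity) across a variant's phenotypes."""
    pairs = list(combinations(sorted(set(terms)), 2))
    if not pairs:
        return 0.0
    return sum(1 - jaccard_similarity(a, b) for a, b in pairs) / len(pairs)

# A variant linked to two cardiac terms spreads less than one linked to cardiac and neuro terms.
narrow = phenotype_spread(["arrhythmia", "cardiomyopathy"])
broad = phenotype_spread(["arrhythmia", "seizure"])
```

Under this toy measure, a variant whose phenotypes sit in different branches of the hierarchy receives a higher spread, which is the intuition behind treating it as more pleiotropic.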


2021 ◽  
Vol 12 ◽  
Author(s):  
Manuel Corpas ◽  
Karyn Megy ◽  
Vanisha Mistry ◽  
Antonio Metastasio ◽  
Edmund Lehmann

Although best practices have emerged on how to analyse and interpret personal genomes, the utility of whole genome screening remains underdeveloped. A large amount of information can be gathered from various types of analyses via whole genome sequencing including pathogenicity screening, genetic risk scoring, fitness, nutrition, and pharmacogenomic analysis. We recognize different levels of confidence when assessing the validity of genetic markers and apply rigorous standards for evaluation of phenotype associations. We illustrate the application of this approach on a family of five. By applying analyses of whole genomes from different methodological perspectives, we are able to build a more comprehensive picture to assist decision making in preventative healthcare and well-being management. Our interpretation and reporting outputs provide input for a clinician to develop a healthcare plan for the individual, based on genetic and other healthcare data.


Author(s):  
Yaqiong Wang ◽  
Aashish N. Adhikari ◽  
Uma Sunderam ◽  
Mark N. Kvale ◽  
Robert J. Currier ◽  
...  

Abstract
Motivation: Genome sequencing is being used routinely in clinical and research applications, but subsequent variant interpretation pipelines can vary widely. A systematic approach for exploring parameter choices and selection plays an important role in designing robust pipelines for specific clinical applications.
Results: We present a framework to be applied in scenarios with limited data whereby expert knowledge informs pipeline refinement. Starting from initial reference variant interpretation pipelines with commonly used parameters, we derived pipelines by perturbing the parameters one by one to determine which parameters can yield meaningful changes in a pipeline’s performance. We updated the reference pipeline by fixing the values of parameters that have a small impact on the pipeline’s performance. Then we conducted new rounds of perturbation as the process converged, yielding a stable and robust pipeline. We applied the framework for genetic disease prediction in de-identified exomes from a cohort of 138 individuals with rare Mendelian inborn errors of metabolism (IEMs) and systematically explored how perturbing different parameters affected the pipeline’s sensitivity and specificity. For this application, we perturbed commonly used parameters in variant interpretation pipelines, including choices of genes, variant callers, transcript models, databases of allele frequencies, databases of curated disease variants, and tools for variant impact prediction. Our analyses showed that the choice of variant callers, variant impact prediction tools, minor allele frequency (MAF) threshold, and MAF databases can meaningfully alter results from a pipeline. This work informs the development of exome analysis pipelines designed for newborn metabolic disorder screening and suggests the general application of perturbation analysis in genome interpretation pipeline design.
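One round of the one-at-a-time perturbation described above might look roughly like the following sketch. Everything here is an assumption for illustration: `perturb_one_at_a_time`, the parameter names, the alternative values, and `toy_sensitivity` (a stand-in for actually running a pipeline and measuring its sensitivity) do not come from the paper.

```python
def perturb_one_at_a_time(reference, alternatives, evaluate, min_delta):
    """Swap one parameter at a time and report those that shift the score by >= min_delta."""
    baseline = evaluate(reference)
    impactful = {}
    for name, options in alternatives.items():
        deltas = []
        for value in options:
            # Copy the reference settings and change only the parameter under test.
            candidate = dict(reference, **{name: value})
            deltas.append(abs(evaluate(candidate) - baseline))
        largest = max(deltas, default=0.0)
        if largest >= min_delta:
            impactful[name] = largest
    return impactful

# Hypothetical stand-in for running a pipeline and measuring sensitivity.
def toy_sensitivity(params):
    score = 0.80
    if params["caller"] == "caller_b":
        score += 0.10
    if params["maf_threshold"] == 0.05:
        score -= 0.15
    return score  # transcript_model deliberately has no effect in this toy

reference = {"caller": "caller_a", "maf_threshold": 0.01, "transcript_model": "model_x"}
alternatives = {
    "caller": ["caller_b"],
    "maf_threshold": [0.001, 0.05],
    "transcript_model": ["model_y"],
}
impactful = perturb_one_at_a_time(reference, alternatives, toy_sensitivity, min_delta=0.05)
```

Parameters absent from `impactful` would be fixed at their reference values, and the remaining ones carried into the next round of perturbation.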


2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Daniele Raimondi ◽  
Jaak Simm ◽  
Adam Arany ◽  
Piero Fariselli ◽  
Isabelle Cleynen ◽  
...  

Abstract: Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solve the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, due to the complexity of the challenge, the relative scarcity of the data and issues such as batch effects and data heterogeneity, which are confounding factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn’s disease (CD) patients which addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for datasets with small sample sizes. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on 3 CD case-control datasets, comparing the performance with the participants of previous CAGI challenges. We show that, notwithstanding the limited NN complexity, it outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction.
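A gene mutational burden representation, in its simplest form, collapses each sample's variants into per-gene counts so that every sample becomes a fixed-length feature vector. The sketch below shows that basic idea only; it is not the authors' feature construction. The function name, sample IDs, and input layout are hypothetical, and the gene symbols serve purely as column labels.

```python
from collections import Counter

def gene_burden_matrix(variants_by_sample, genes):
    """Build a samples x genes table of variant counts from per-sample gene hits."""
    matrix = {}
    for sample, hit_genes in variants_by_sample.items():
        counts = Counter(hit_genes)
        # One column per gene, in a fixed order, so rows are comparable across samples.
        matrix[sample] = [counts.get(g, 0) for g in genes]
    return matrix

# Hypothetical input: for each sample, the gene hit by each qualifying variant.
variants_by_sample = {
    "sample_1": ["NOD2", "NOD2", "ATG16L1"],
    "sample_2": ["IL23R"],
    "sample_3": [],
}
genes = ["ATG16L1", "IL23R", "NOD2"]
features = gene_burden_matrix(variants_by_sample, genes)
```

Because the vector length depends on the number of genes and not on the number of variants, such a representation keeps the input dimension fixed even when samples carry very different variant loads.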


2019 ◽  
Vol 40 (9) ◽  
pp. 1314-1320 ◽  
Author(s):  
Gregory McInnes ◽  
Roxana Daneshjou ◽  
Panagiotis Katsonis ◽  
Olivier Lichtarge ◽  
Rajgopal Srinivasan ◽  
...  

2019 ◽  
Vol 40 (9) ◽  
pp. 1197-1201 ◽  
Author(s):  
Gaia Andreoletti ◽  
Lipika R. Pal ◽  
John Moult ◽  
Steven E. Brenner

2019 ◽  
Author(s):  
Jing Zhang ◽  
Donghoon Lee ◽  
Vineet Dhiman ◽  
Peng Jiang ◽  
Jie Xu ◽  
...  

Abstract: ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.

