Annotation of Human Exome Gene Variants with Consensus Pathogenicity

Author(s):  
Victor Zharavin ◽  
James Balmford ◽  
Patrick Metzger ◽  
Melanie Boerries ◽  
Harald Binder ◽  
...  

Pathogenicity is unknown for the majority of human gene variants. In silico approaches can be used to prioritize sequenced somatic and germline variants. In this study, 84 million non-synonymous single nucleotide variants (SNVs) in the human coding genome were annotated using the consensus Variant Effect Prediction (cVEP) method. An algorithm, implemented as a stacked ensemble of supervised learners, combined 39 functional and conservation mutation impact scores from dbNSFP4.0. Adding a gene indispensability score, which accounts for differences in variant pathogenicity between essential and mutation-tolerant genes, improved the predictions. For each SNV, the consensus combination gives either a continuous pathogenicity score or a categorical score in five classes: pathogenic, likely pathogenic, uncertain significance, likely benign, and benign. The provided class database is intended for direct use in clinical practice. The trained prediction models were 5-fold cross-validated on the evidence-based categorical annotations from the ClinVar database. The input scores were ranked by their ability to predict pathogenicity. A two-step strategy using the rankings, scores and class annotations is suggested for filtering and prioritizing human exome mutations in clinical and biological applications of NGS technology.
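The two-step filtering and prioritization strategy is not spelled out in this abstract. The sketch below shows one possible reading, assuming that step one keeps variants by categorical class and step two orders the survivors by the continuous score, with higher scores assumed to mean more pathogenic. The `Variant` record, its field names, and the `prioritize` function are hypothetical and are not part of the published database or method.

```python
from dataclasses import dataclass

# The five categorical classes named in the abstract, most to least severe.
CLASSES = (
    "pathogenic",
    "likely pathogenic",
    "uncertain significance",
    "likely benign",
    "benign",
)


@dataclass(frozen=True)
class Variant:
    variant_id: str  # hypothetical identifier
    score: float     # continuous score; ASSUMPTION: higher = more pathogenic
    label: str       # one of CLASSES


def prioritize(variants, keep=("pathogenic", "likely pathogenic")):
    """Step 1: filter by class. Step 2: sort survivors by descending score."""
    for v in variants:
        if v.label not in CLASSES:
            raise ValueError(f"unknown class label: {v.label!r}")
    kept = [v for v in variants if v.label in keep]
    return sorted(kept, key=lambda v: v.score, reverse=True)


if __name__ == "__main__":
    # Made-up records for illustration only.
    demo = [
        Variant("v1", 0.20, "benign"),
        Variant("v2", 0.95, "pathogenic"),
        Variant("v3", 0.80, "likely pathogenic"),
    ]
    for v in prioritize(demo):
        print(v.variant_id, v.label, v.score)
```

Which classes to keep in the first step is left as a parameter, since a clinical pipeline may also want to retain variants of uncertain significance for review.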

Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 1076
Author(s):  
Victor Jaravine ◽  
James Balmford ◽  
Patrick Metzger ◽  
Melanie Boerries ◽  
Harald Binder ◽  
...  

A novel approach is developed to address the challenge of annotating with phenotypic effects those exome variants for which relevant empirical data are lacking or minimal. The predictive annotation method is implemented as a stacked ensemble of supervised base-learners, including distributed random forest and gradient boosting machines. Ensemble models were trained and cross-validated on evidence-based categorical variant effect annotations from the ClinVar database, and were applied to 84 million non-synonymous single nucleotide variants (SNVs). The consensus model combined 39 functional mutation impact scores, a cross-species conservation score, and a gene indispensability score. The indispensability score, which accounts for differences in variant pathogenicity between essential and mutation-tolerant genes, considerably improved the predictions. The consensus combination is consistent with as many input scores as possible while minimizing false predictions. The input scores are ranked based on their ability to predict effects. The score rankings and categorical phenotypic variant effect predictions are intended for direct use in clinical and biological applications to prioritize human exome variants and mutations.
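The abstract does not say which metric underlies the ranking of input scores. As an illustration only, the sketch below ranks scores by ROC AUC against binary labels, which is an assumption rather than the authors' stated procedure. The function names and the toy values are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties count as half a win (Mann-Whitney formulation).
    labels: 1 = pathogenic, 0 = benign.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_scores(score_table, labels):
    """Return (score_name, auc) pairs sorted from most to least predictive."""
    aucs = {name: roc_auc(vals, labels) for name, vals in score_table.items()}
    return sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up toy values; score names are placeholders, not dbNSFP columns.
    labels = [1, 1, 0, 0]
    table = {
        "score_a": [0.9, 0.8, 0.2, 0.1],
        "score_b": [0.9, 0.1, 0.8, 0.2],
        "score_c": [0.6, 0.4, 0.5, 0.1],
    }
    for name, auc in rank_scores(table, labels):
        print(name, auc)
```

The pairwise loop is quadratic in the number of variants, which is fine for a toy table but would need a rank-based implementation at the scale of a ClinVar training set.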


2021 ◽  
Author(s):  
Chang Li ◽  
Degui Zhi ◽  
Kai Wang ◽  
Xiaoming Liu

We present the pathogenicity prediction models MetaRNN and MetaRNN-indel to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs) using deep learning and context annotations. Employing independent test datasets, we demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. MetaRNN executables and precomputed scores are available at http://www.liulab.science/MetaRNN.


2020 ◽  
Vol 36 (20) ◽  
pp. 4977-4983 ◽  
Author(s):  
Jing-Bo Zhou ◽  
Yao Xiong ◽  
Ke An ◽  
Zhi-Qiang Ye ◽  
Yun-Dong Wu

Motivation: Despite lacking a folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of NGS has driven the development of disease-association prediction methods for decades. However, their performance on nsSNVs in IDRs remains inferior, possibly because nsSNVs from structured regions dominate the training data. A better-performing disease-association predictor built specifically for nsSNVs in IDRs is therefore highly desirable. Results: We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained using three ensemble learning algorithms on a training dataset containing only IDR nsSNVs. Evaluation on the two testing datasets shows that all three prediction models significantly outperform 17 other popular general predictors, achieving ACC between 0.856 and 0.868 and MCC between 0.713 and 0.737. IDRMutPred will prioritize disease-associated IDR germline nsSNVs more reliably than general predictors. Availability and implementation: The software is freely available at http://www.wdspdb.com/IDRMutPred. Supplementary information: Supplementary data are available at Bioinformatics online.
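For readers less familiar with the two reported metrics, accuracy (ACC) and the Matthews correlation coefficient (MCC) can both be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The function names are hypothetical and the counts in the usage example are invented for illustration; they are not taken from the IDRMutPred evaluation.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / math.sqrt(denom)


if __name__ == "__main__":
    # Invented counts: 45 true positives, 42 true negatives,
    # 8 false positives, 5 false negatives.
    print(round(accuracy(45, 42, 8, 5), 3))
    print(round(mcc(45, 42, 8, 5), 3))
```

MCC is often reported alongside ACC because it stays informative when the disease-associated and neutral classes are imbalanced.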


2018 ◽  
Author(s):  
Troy A. McDiarmid ◽  
Vinci Au ◽  
Aaron D. Loewen ◽  
Joseph Liang ◽  
Kota Mizumoto ◽  
...  

Our ability to sequence genomes has vastly surpassed our ability to interpret the genetic variation we discover. This presents a major challenge in the clinical setting, where the recent application of whole exome and whole genome sequencing has uncovered thousands of genetic variants of uncertain significance. Here, we present a strategy for targeted human gene replacement and phenomic characterization based on CRISPR-Cas9 genome engineering in the genetic model organism Caenorhabditis elegans that will facilitate assessment of the functional conservation of human genes and structure-function analysis of disease-associated variants with unprecedented precision. We validate our strategy by demonstrating that direct single-copy replacement of the C. elegans ortholog (daf-18) with the critical human disease-associated gene Phosphatase and Tensin Homolog (PTEN) is sufficient to rescue multiple phenotypic abnormalities caused by complete deletion of daf-18, including complex chemosensory and mechanosensory impairments. In addition, we used our strategy to generate animals harboring a single copy of the known pathogenic lipid phosphatase inactive PTEN variant (PTEN-G129E) and showed that our automated in vivo phenotypic assays could accurately and efficiently classify this missense variant as loss-of-function. The integrated nature of the human transgenes allows for analysis of both homozygous and heterozygous variants and greatly facilitates high-throughput precision medicine drug screens. By combining genome engineering with rapid and automated phenotypic characterization, our strategy streamlines identification of novel conserved gene functions in complex sensory and learning phenotypes that can be used as in vivo functional assays to decipher variants of uncertain significance.


2021 ◽  

Background: Dravet syndrome (DS) is a rare and severe epileptic syndrome of childhood with a prevalence of around 1/40,000 people worldwide. Approximately 80% of patients with DS carry pathogenic variants in SCN1A, which encodes an alpha subunit of a neural voltage-dependent sodium channel. SCN1A variants were also related to DS. Pathogenic variants in PCDH19, which encodes protocadherin 19, correlate with a disease similar to DS known as the DS-like phenotype. Objectives: To clarify the differences between DS and the DS-like phenotype according to the SCN1A and PCDH19 variants. Methodology: A review from March/2019 to November/2020 was conducted in the PubMed and VHL databases, following PRISMA criteria. Results: Nineteen studies were included, and the proportion of patients with DS carrying SCN1A variants was significantly greater than the proportion of patients with the DS-like phenotype harboring PCDH19 variants (76.6% vs. 23.4%). Considering SCN1A and PCDH19, 47 variants were pathogenic and 12 were of uncertain significance; 25% were deletions and 75% were single-nucleotide variants. Autism was predominantly observed in patients with the DS-like phenotype carrying PCDH19 variants compared with carriers of SCN1A variants (62.5% vs. 37.5%, p=0.044). In addition, a significant predisposition to hyperthermia during seizures was observed in patients with PCDH19 variants (p=0.003). There were no significant differences between the two groups in cognitive deficit, ataxia, behavior problems, or motor deficit. Conclusions: The study is the first to point out differences between DS and the DS-like phenotype according to the SCN1A and PCDH19 variants.


2021 ◽  
Vol 12 ◽  
Author(s):  
Peter Sparber ◽  
Svetlana Mikhaylova ◽  
Varvara Galkina ◽  
Yulia Itkis ◽  
Mikhail Skoblov

Pathogenic variants in the SCN1A gene are associated with a spectrum of epileptic disorders ranging in severity from familial febrile seizures to Dravet syndrome. Large proportions of reported pathogenic variants in SCN1A are annotated as missense variants and are often classified as variants of uncertain significance when no functional data are available. Although loss-of-function variants are associated with a more severe phenotype in SCN1A, the molecular mechanism of single nucleotide variants is often not clear, and genotype-phenotype correlations in SCN1A-related epilepsy remain uncertain. Coding variants can affect splicing by creating novel cryptic splicing sites in exons or by disrupting exonic cis-regulation elements crucial for proper pre-mRNA splicing. Here, we report a novel case of Dravet syndrome caused by an undescribed missense variant, c.4852G>A (p.(Gly1618Ser)). By midigene splicing assay, we demonstrated that the identified variant is in fact splice-affecting. To our knowledge, this is the first report on the functional investigation of a missense variant affecting splicing in Dravet syndrome.
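The reported notation can be checked for internal consistency. Under HGVS coding-sequence numbering, where c.1 is the A of the start codon, position c.4852 is the first base of codon 1618, and a G>A change at the first base of a glycine codon (GGN) gives AGN, which encodes serine when N is T or C. The sketch below works through that arithmetic; the function names are hypothetical, and the abstract does not say which glycine codon is present in the reference sequence.

```python
def codon_of(c_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    if c_pos < 1:
        raise ValueError("coding positions start at 1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


# Subset of the standard genetic code (DNA alphabet) needed for this check.
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "AGT": "Ser", "AGC": "Ser", "AGA": "Arg", "AGG": "Arg",
}


def first_base_g_to_a_outcomes():
    """For each glycine codon, the residue encoded after G>A at base 1."""
    outcomes = {}
    for codon, residue in CODON_TABLE.items():
        if residue == "Gly":
            outcomes[codon] = CODON_TABLE["A" + codon[1:]]
    return outcomes


if __name__ == "__main__":
    print(codon_of(4852))
    print(first_base_g_to_a_outcomes())
```

The check only confirms that the cDNA and protein descriptions agree with each other. It says nothing about splicing, which is the effect the study demonstrates experimentally.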

