variant effect prediction Latest Research Papers

Abstract The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.

Predicting functional consequences of mutations using molecular interaction network features

Human Genetics, DOI 10.1007/s00439-021-02329-5, 2021

Author(s): Kivilcim Ozturk, Hannah Carter

Keyword(s): Protein Sequence, Interaction Network, Single Amino Acid, Missense Mutations, Protein Activity, Missense Variants, Benchmark Datasets, Variant Effect, Context Specific, Variant Effect Prediction

Abstract Variant interpretation remains a central challenge for precision medicine. Missense variants are particularly difficult to understand as they change only a single amino acid in a protein sequence yet can have large and varied effects on protein activity. Numerous tools have been developed to identify missense variants with putative disease consequences from protein sequence and structure. However, biological function arises through higher order interactions among proteins and molecules within cells. We therefore sought to capture information about the potential of missense mutations to perturb protein interaction networks by integrating protein structure and interaction data. We developed 16 network-based annotations for missense mutations that provide orthogonal information to features classically used to prioritize variants. We then evaluated them in the context of a proven machine-learning framework for variant effect prediction across multiple benchmark datasets to demonstrate their potential to improve variant classification. Interestingly, network features resulted in larger performance gains for classifying somatic mutations than for germline variants, possibly due to different constraints on what mutations are tolerated at the cellular versus organismal level. Our results suggest that modeling variant potential to perturb context-specific interactome networks is a fruitful strategy to advance in silico variant effect prediction.
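The abstract does not name its 16 network-based annotations, so the following is only a guess at what one such annotation might look like: a minimal Python sketch that combines interaction data (who binds whom) with structural data (which residues sit at each binding interface) to describe a missense position. The toy interactome, the feature names, and both functions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy interactome. Each record is
# (protein_a, protein_b, residue positions on protein_a at the interface).
# Interface residues are only recorded for protein_a in this sketch.
INTERACTIONS = [
    ("P1", "P2", {10, 11, 45}),
    ("P1", "P3", {45, 46}),
    ("P1", "P4", {200}),
    ("P2", "P3", {5}),
]

def build_network(edges):
    """Return (neighbours, interfaces) lookup tables built from edge records."""
    neighbours = defaultdict(set)
    interfaces = defaultdict(dict)
    for a, b, residues_on_a in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
        interfaces[a][b] = residues_on_a
    return neighbours, interfaces

def network_features(protein, position, neighbours, interfaces):
    """Describe how a mutated residue relates to the protein's interactions."""
    degree = len(neighbours.get(protein, ()))
    # Partners whose recorded interface contains the mutated position.
    partners_hit = sorted(
        partner
        for partner, residues in interfaces.get(protein, {}).items()
        if position in residues
    )
    fraction = len(partners_hit) / degree if degree else 0.0
    return {
        "degree": degree,
        "interfaces_hit": len(partners_hit),
        "fraction_perturbed": fraction,
        "partners_hit": partners_hit,
    }

neighbours, interfaces = build_network(INTERACTIONS)
features = network_features("P1", 45, neighbours, interfaces)
print(features)
```

In this toy network, position 45 of P1 lies on two of its three recorded interfaces. A classifier could take such values as extra inputs next to sequence and structure features, which is the kind of combination the abstract evaluates.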

Erratum to: Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins

Briefings in Bioinformatics, DOI 10.1093/bib/bbab287, 2021

Author(s): Hideki Yamaguchi, Yutaka Saito

Keyword(s): Variant Effect, Variant Effect Prediction

MTBAN: An Enhanced Variant Effect Predictor Based on a Deep Generative Model

DOI 10.21203/rs.3.rs-649705/v1, 2021

Author(s): Ha Young Kim, Woosung Jeon, Dongsup Kim

Keyword(s): Genetic Diseases, Web Server, Predictive Ability, Generative Model, Prediction Tool, Convolutional Network, Variant Effect, Born Again, User Friendly, Variant Effect Prediction

Abstract The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.
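As a rough illustration of label-free variant scoring, the Python sketch below estimates per-column amino-acid frequencies from a toy alignment of related sequences and scores a substitution by the log-ratio of mutant to wild-type probability. It is a deliberately simplified, position-independent stand-in. It is not the autoregressive mutationTCN model or the Born-Again Networks distillation described in the abstract. The alignment, the pseudocount, and the function names are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(alignment, pseudocount=1.0):
    """Per-column amino-acid probabilities with additive smoothing."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for pos in range(length):
        # Count only standard residues; gaps and unknowns are ignored.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs

def log_odds_score(freqs, pos, wild_type, mutant):
    """log P(mutant) - log P(wild type) at a 0-based position.

    More negative values mean the substitution is rarer among the
    related sequences than the wild-type residue.
    """
    return math.log(freqs[pos][mutant]) - math.log(freqs[pos][wild_type])

# Hypothetical toy alignment of evolutionarily related sequences.
toy_alignment = ["MKVLA", "MKVLG", "MRVLA", "MKILA"]
freqs = column_frequencies(toy_alignment)

conserved_site = log_odds_score(freqs, 0, "M", "W")  # fully conserved column
variable_site = log_odds_score(freqs, 1, "K", "R")   # R is observed here
print(f"M1W: {conserved_site:.3f}  K2R: {variable_site:.3f}")
```

No deleterious or benign labels enter the calculation, so a score of this kind cannot overlap with a labeled benchmark. That is the property the abstract relies on when it says the model avoids data circularity.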

MutationTaster2021

Nucleic Acids Research, DOI 10.1093/nar/gkab266, 2021

Author(s): Robin Steinhaus, Sebastian Proft, Markus Schuelke, David N Cooper, Jana Marie Schwarz, ...

Keyword(s): Prediction Model, Clinical Phenotype, The Novel, User Friendliness, Major Overhaul, Splice Site Prediction, Disease Mutations, Many Sources, Variant Effect Prediction, Mutation Search

Abstract Here we present an update to MutationTaster, our DNA variant effect prediction tool. The new version uses a different prediction model and attains higher accuracy than its predecessor, especially for rare benign variants. In addition, we have integrated many sources of data that only became available after the last release (such as gnomAD and ExAC pLI scores) and changed the splice site prediction model. To more easily assess the relevance of detected known disease mutations to the clinical phenotype of the patient, MutationTaster now provides information on the diseases they cause. Further changes represent a major overhaul of the interfaces to increase user-friendliness whilst many changes under the hood have been designed to accelerate the processing of uploaded VCF files. We also offer an API for the rapid automated query of smaller numbers of variants from within other software. MutationTaster2021 integrates our disease mutation search engine, MutationDistiller, to prioritise variants from VCF files using the patient's clinical phenotype. The novel version is available at https://www.genecascade.org/MutationTaster2021/. This website is free and open to all users and there is no login requirement.

Predicting functional consequences of mutations using molecular interaction network features

DOI 10.1101/2021.03.05.433991, 2021

Author(s): Kivilcim Ozturk, Hannah Carter

Keyword(s): Protein Sequence, Interaction Network, Single Amino Acid, Missense Mutations, Protein Activity, Missense Variants, Benchmark Datasets, Variant Effect, Context Specific, Variant Effect Prediction

Abstract Variant interpretation remains a central challenge for precision medicine. Missense variants are particularly difficult to understand as they change only a single amino acid in protein sequence yet can have large and varied effects on protein activity. Numerous tools have been developed to identify missense variants with putative disease consequences from protein sequence and structure. However, biological function arises through higher order interactions among proteins and molecules within cells. We therefore sought to capture information about the potential of missense mutations to perturb protein interaction networks by integrating protein structure and interaction data. We developed 16 network-based annotations for missense mutations that provide orthogonal information to features classically used to prioritize variants. We then evaluated them in the context of a proven machine-learning framework for variant effect prediction across multiple benchmark datasets to demonstrate their potential to improve variant classification. Interestingly, network features resulted in larger performance gains for classifying somatic mutations than for germline variants, possibly due to different constraints on what mutations are tolerated at the cellular versus organismal level. Our results suggest that modeling variant potential to perturb context-specific interactome networks is a fruitful strategy to advance in silico variant effect prediction.

Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins

DOI 10.1101/2021.03.05.434175, 2021

Author(s): Hideki Yamaguchi, Yutaka Saito

Keyword(s): Language Processing, Structural Information, Level Structure, Domain Architecture, Fine Tuning, Homology Search, Learning Approaches, Variant Effect, The University, Variant Effect Prediction

Abstract Accurate variant effect prediction has broad impacts on protein engineering. Recent machine learning approaches toward this end are based on representation learning, by which feature vectors are learned and generated from unlabeled sequences. However, it is unclear how to effectively learn evolutionary properties of an engineering target protein from homologous sequences, taking into account the protein’s sequence-level structure called domain architecture (DA). Additionally, no optimal protocols are established for incorporating such properties into Transformer, the neural network well-known to perform the best in natural language processing research. This article proposes DA-aware evolutionary fine-tuning, or “evotuning”, protocols for Transformer-based variant effect prediction, considering various combinations of homology search, fine-tuning, and sequence vectorization strategies. We exhaustively evaluated our protocols on diverse proteins with different functions and DAs. The results indicated that our protocols achieved significantly better performances than previous DA-unaware ones. The visualizations of attention maps suggested that the structural information was incorporated by evotuning without direct supervision, possibly leading to better prediction accuracy.

Short descriptions of the authors: Hideki Yamaguchi is a PhD candidate at the Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo. Yutaka Saito, PhD, is a senior researcher at Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), and a visiting associate professor at The University of Tokyo.

Availability: https://github.com/dlnp2/evotuning_protocols_for_transformers

A missense variant effect prediction and annotation resource for SARS-CoV-2

DOI 10.1101/2021.02.24.432721, 2021

Author(s): Alistair Dunham, Gwendolyn M Jang, Monita Muralidharan, Danielle Swaney, Pedro Beltrao

Keyword(s): Structural Models, Missense Variant, Evolutionary Conservation, Single Amino Acid, Variant Frequency, Expert Analysis, Variant Effect, Complex Structural, The Impact, Variant Effect Prediction

Abstract The COVID19 pandemic is a global crisis severely impacting many people across the world. An important part of the response is monitoring viral variants and determining the impact they have on viral properties, such as infectivity, disease severity and interactions with drugs and vaccines. In this work we generate and make available computational variant effect predictions for all possible single amino-acid substitutions to SARS-CoV-2 in order to complement and facilitate experiments and expert analysis. The resulting dataset contains predictions from evolutionary conservation and protein and complex structural models, combined with viral phosphosites, experimental results and variant frequencies. We demonstrate predictions’ effectiveness by comparing them with expectations from variant frequency and prior experiments. We then identify higher frequency variants with significant predicted effects as well as finding variants measured to impact antibody binding that are least likely to impact other viral functions. A web portal is available at sars.mutfunc.com, where the dataset can be searched and downloaded.

CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores

Genome Medicine, DOI 10.1186/s13073-021-00835-9, 2021, Vol 13 (1)

Author(s): Philipp Rentzsch, Max Schubach, Jay Shendure, Martin Kircher

Keyword(s): Prediction Models, Splice Variants, Superior Performance, Data Set, Pathogenic Variants, Genome Wide, Donor And Acceptor, Human Proteins, Variant Effect, Variant Effect Prediction

Abstract

Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.

Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.

Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.

Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.

variant effect prediction
Recently Published Documents

Bioinformatics of Variant Effect Prediction

An enhanced variant effect predictor based on a deep generative model and the Born-Again Networks

Predicting functional consequences of mutations using molecular interaction network features

Erratum to: Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins

MTBAN: An Enhanced Variant Effect Predictor Based on a Deep Generative Model

MutationTaster2021

Predicting functional consequences of mutations using molecular interaction network features

Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins

A missense variant effect prediction and annotation resource for SARS-CoV-2

CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores
