An enhanced variant effect predictor based on a deep generative model and the Born-Again Networks

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ha Young Kim ◽  
Woosung Jeon ◽  
Dongsup Kim

Abstract The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.


2021 ◽  
Author(s):  
Asieh Amousoltani Arani ◽  
Mohammadreza Sehhati ◽  
Mohammad Amin Tabatabaiefar

A new feature space, which can discriminate deleterious variants, was constructed by the integration of various input data using the proposed supervised nonnegative matrix tri-factorization (sNMTF) algorithm.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Philipp Rentzsch ◽  
Max Schubach ◽  
Jay Shendure ◽  
Martin Kircher

Abstract Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance. Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.


2020 ◽  
Vol 48 (W1) ◽  
pp. W147-W153 ◽  
Author(s):  
Douglas E V Pires ◽  
Carlos H M Rodrigues ◽  
David B Ascher

Abstract Significant efforts have been invested into understanding and predicting the molecular consequences of mutations in protein coding regions; however, nearly all approaches have been developed using globular, soluble proteins. These methods have been shown to translate poorly to studying the effects of mutations in membrane proteins. To fill this gap, here we report mCSM-membrane, a user-friendly web server that can be used to analyse the impacts of mutations on membrane protein stability and the likelihood of them being disease associated. mCSM-membrane derives from our well-established mutation modelling approach that uses graph-based signatures to model protein geometry and physicochemical properties for supervised learning. Our stability predictor achieved correlations of up to 0.72 and 0.67 (on cross validation and blind tests, respectively), while our pathogenicity predictor achieved a Matthews correlation coefficient (MCC) of up to 0.77 and 0.73, outperforming previously described methods both in predicting changes in stability and in identifying pathogenic variants. mCSM-membrane will be an invaluable and dedicated resource for investigating the effects of single-point mutations on membrane proteins through a freely available, user-friendly web server at http://biosig.unimelb.edu.au/mcsm_membrane.
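For readers unfamiliar with the MCC reported above, the following is a minimal Python sketch of how the coefficient is computed from a binary confusion matrix. The function name `mcc` and the example counts are illustrative assumptions and do not come from the mCSM-membrane study.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    tp/tn/fp/fn are the counts of true positives, true negatives,
    false positives and false negatives.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts, chosen only to illustrate the calculation.
example = mcc(tp=90, tn=85, fp=15, fn=10)
print(round(example, 3))
```

The MCC ranges from −1 to 1 and uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when pathogenic and benign variants are unevenly represented.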


2014 ◽  
Vol 2014 ◽  
pp. 1-4 ◽  
Author(s):  
Santosh Kumar Upadhyay ◽  
Shailesh Sharma

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system facilitates targeted genome editing in organisms. Despite high demand for this system, finding a reliable tool for determining specific target sites in large genomic datasets has remained challenging. Here, we report SSFinder, a Python script that performs high-throughput detection of specific target sites in large nucleotide datasets. SSFinder is a user-friendly tool, compatible with Windows, Mac OS, and Linux operating systems, and freely available online.


Author(s):  
Shiqian He ◽  
Liang Kong ◽  
Jing Chen

Accurate detection of N6-methyladenine (6mA) sites by biochemical experiments helps to reveal their biological functions, but these wet-lab experiments are laborious and expensive. Therefore, it is necessary to introduce a powerful computational model to identify 6mA sites on a genomic scale, especially for plant genomes. In view of this, we propose iDNA6mA-Rice-DL, a deep learning model for the effective identification of 6mA sites in the rice genome. Traditional machine learning methods require features to be prepared in advance of the analysis. In contrast, our proposed model automatically encodes and extracts key DNA features through an embedding layer and several groups of dense layers. We use an independent dataset to evaluate the generalization ability of our model. An area under the receiver operating characteristic curve (auROC) of 0.98 with an accuracy of 95.96% was obtained. The experimental results demonstrate that our model performs well in predicting 6mA sites in the rice genome. A user-friendly local web server has been established. The Docker image of the local web server can be freely downloaded at https://hub.docker.com/r/his1server/idna6ma-rice-dl.


2021 ◽  
pp. 159-179
Author(s):  
Ariel José Berenstein ◽  
Franco Gino Brunello ◽  
Adrian Turjanski ◽  
Marcelo A. Martì

2019 ◽  
Vol 47 (W1) ◽  
pp. W52-W58 ◽  
Author(s):  
Ling Xu ◽  
Zhaobin Dong ◽  
Lu Fang ◽  
Yongjiang Luo ◽  
Zhaoyuan Wei ◽  
...  

Abstract OrthoVenn is a powerful web platform for the comparison and analysis of whole-genome orthologous clusters. Here we present an updated version, OrthoVenn2, which provides new features that facilitate the comparative analysis of orthologous clusters among up to 12 species. Additionally, this update offers improvements to data visualization and interpretation, including an occurrence pattern table for interrogating the overlap of each orthologous group for the queried species. Within the occurrence table, the functional annotations and summaries of the disjunctions and intersections of clusters between the chosen species can be displayed through an interactive Venn diagram. To facilitate a broader range of comparisons, a larger number of species, including vertebrates, metazoa, protists, fungi, plants and bacteria, have been added in OrthoVenn2. Finally, a stand-alone version is available to perform large dataset comparisons and to visualize results locally without limitation of species number. In summary, OrthoVenn2 is an efficient and user-friendly web server freely accessible at https://orthovenn2.bioinfotoolkits.net.


Author(s):  
Alessandro Brunelli ◽  
Silvia Cicconi ◽  
Herbert Decaluwe ◽  
Zalan Szanto ◽  
Pierre Emmanuel Falcoz

Abstract OBJECTIVES: To develop a simplified version of the Eurolung risk model to predict cardiopulmonary morbidity and 30-day mortality after lung resection from the ESTS database. METHODS: A total of 82 383 lung resections (63 681 lobectomies, 3617 bilobectomies, 7667 pneumonectomies and 7418 segmentectomies) recorded in the ESTS database (January 2007–December 2018) were analysed. Multiple imputations with chained equations were performed on the predictors included in the original Eurolung models. Stepwise selection was then applied for determining the best logistic model. To develop the parsimonious models, different models were tested, eliminating variables one by one starting from the least significant. The models’ prediction power was evaluated by estimating the area under the curve (AUC) with the 10-fold cross-validation technique. RESULTS: Cardiopulmonary morbidity model (Eurolung1): the best parsimonious Eurolung1 model contains 5 variables. The logit of the parsimonious Eurolung1 model was as follows: −2.852 + 0.021 × age + 0.472 × male − 0.015 × ppoFEV1 + 0.662 × thoracotomy + 0.324 × extended resection. Pooled AUC is 0.710 [95% confidence interval (CI) 0.677–0.743]. Mortality model (Eurolung2): the best parsimonious model contains 6 variables. The logit of the parsimonious Eurolung2 model was as follows: −6.350 + 0.047 × age + 0.889 × male − 0.055 × BMI − 0.010 × ppoFEV1 + 0.892 × thoracotomy + 0.983 × pneumonectomy. Pooled AUC is 0.737 (95% CI 0.702–0.770). An aggregate parsimonious Eurolung2 was also generated by repeating the logistic regression after categorization of the numeric variables. Patients were grouped into 7 risk classes showing incremental risk of mortality (P < 0.0001). CONCLUSIONS: We were able to develop simplified and updated versions of the Eurolung risk models retaining the predictive ability of the full original models. They represent a more user-friendly tool designed to inform the multidisciplinary discussion and shared decision-making process of lung resection candidates.
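The two logit expressions quoted in this abstract can be turned into predicted probabilities with the standard logistic transform, p = 1 / (1 + e^(−logit)). The Python sketch below does this using the coefficients exactly as stated in the abstract. The function names, the 0/1 coding of the binary predictors, and the units (age in years, ppoFEV1 as a percentage, BMI in kg/m²) are assumptions made for illustration, since the abstract does not specify them, and the patient values are hypothetical. It is not a clinical tool.

```python
import math

def _logistic(z: float) -> float:
    """Convert a logit (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def eurolung1_risk(age, male, ppo_fev1, thoracotomy, extended_resection):
    """Predicted cardiopulmonary morbidity (parsimonious Eurolung1 logit).

    Binary predictors are assumed to be coded 0/1.
    """
    logit = (-2.852 + 0.021 * age + 0.472 * male - 0.015 * ppo_fev1
             + 0.662 * thoracotomy + 0.324 * extended_resection)
    return _logistic(logit)

def eurolung2_risk(age, male, bmi, ppo_fev1, thoracotomy, pneumonectomy):
    """Predicted 30-day mortality (parsimonious Eurolung2 logit).

    Binary predictors are assumed to be coded 0/1.
    """
    logit = (-6.350 + 0.047 * age + 0.889 * male - 0.055 * bmi
             - 0.010 * ppo_fev1 + 0.892 * thoracotomy + 0.983 * pneumonectomy)
    return _logistic(logit)

# Hypothetical patient: 70-year-old man, BMI 25, ppoFEV1 60%, thoracotomy,
# no extended resection and no pneumonectomy.
morbidity = eurolung1_risk(age=70, male=1, ppo_fev1=60,
                           thoracotomy=1, extended_resection=0)
mortality = eurolung2_risk(age=70, male=1, bmi=25, ppo_fev1=60,
                           thoracotomy=1, pneumonectomy=0)
print(f"morbidity risk: {morbidity:.3f}, mortality risk: {mortality:.3f}")
```

For this hypothetical patient the Eurolung1 logit is −1.148 and the Eurolung2 logit is −3.254, corresponding to predicted risks of roughly 24% and 4%, respectively.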

