SCONES: Self-Consistent Neural Network for Protein Stability Prediction Upon Mutation

2021 ◽  
Author(s):  
Yashas Samaga B L ◽  
Shampa Raghunathan ◽  
U. Deva Priyakumar

Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods for determining stability are too laborious to reach the throughput required to scan the protein sequence space thoroughly. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation. These methods have been evaluated for symmetric consistency by testing with hypothetical reverse mutations. In this work, we propose transitive data augmentation, an evaluation of transitive consistency, and a new machine learning based method, the first of its kind, that incorporates both symmetric and transitive properties into the architecture. Our method, called SCONES, is an interpretable neural network that estimates a residue's contribution towards protein stability (ΔG) in its local structural environment. The difference between the independently predicted contributions of the reference and mutant residues in a missense mutation is reported as ΔΔG. We show that this self-consistent machine learning architecture is immune to many common biases in datasets, relies less on data than existing methods, and is robust to overfitting.
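The self-consistency property described in this abstract can be illustrated with a minimal sketch. It is not the SCONES network: a hypothetical lookup table `contribution` with made-up values stands in for the per-residue predictions, and the sign convention (mutant minus reference) is an assumption. The point is only that any predictor built as a difference of independently estimated contributions is symmetric and transitive by construction.

```python
# Toy illustration only: a lookup table stands in for the neural network.
# The numbers are made up and carry no physical meaning.
from itertools import permutations

# Hypothetical per-residue stability contributions (kcal/mol) at one site
contribution = {"ALA": -1.2, "GLY": -0.4, "VAL": -2.1}


def ddg(reference: str, mutant: str) -> float:
    """Predicted stability change for reference -> mutant at a fixed site.

    Sign convention (assumed): mutant contribution minus reference contribution.
    """
    return contribution[mutant] - contribution[reference]


# Symmetric consistency: the reverse mutation flips the sign.
for a, b in permutations(contribution, 2):
    assert abs(ddg(a, b) + ddg(b, a)) < 1e-12

# Transitive consistency: A -> B followed by B -> C equals A -> C.
for a, b, c in permutations(contribution, 3):
    assert abs(ddg(a, b) + ddg(b, c) - ddg(a, c)) < 1e-12
```

Because both checks hold for any values in the table, no amount of training data is needed to enforce them, which is one plausible reading of why such an architecture would rely less on data augmentation with reverse mutations.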


2020 ◽  
Vol 117 (31) ◽  
pp. 18869-18879 ◽  
Author(s):  
Christopher Culley ◽  
Supreeta Vijayakumar ◽  
Guido Zampieri ◽  
Claudio Angione

Metabolic modeling and machine learning are key components in the emerging next generation of systems and synthetic biology tools, targeting the genotype–phenotype–environment relationship. Rather than being used in isolation, it is becoming clear that their value is maximized when they are combined. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning–based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and we test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, therefore verifying its robustness to an additional independent dataset. Finally, we show that introducing mechanistic flux features improves the predictions also for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models, based on known biochemistry, can contribute with disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.


Microscopy ◽  
2020 ◽  
Vol 69 (2) ◽  
pp. 92-109 ◽  
Author(s):  
Teruyasu Mizoguchi ◽  
Shin Kiyohara

Materials characterization is indispensable for materials development. In particular, spectroscopy provides atomic configuration, chemical bonding and vibrational information, which are crucial for understanding the mechanism underlying the functions of a material. Despite its importance, the interpretation of spectra using human-driven methods, such as manual comparison of experimental spectra with reference/simulated spectra, is becoming difficult owing to the rapid increase in experimental spectral data. To overcome the limitations of such methods, we develop new data-driven approaches based on machine learning. Specifically, we use hierarchical clustering, a decision tree and a feedforward neural network to investigate the electron energy loss near edge structures (ELNES) spectrum, which is identical to the X-ray absorption near edge structure (XANES) spectrum. Hierarchical clustering and the decision tree are used to interpret and predict ELNES/XANES, while the feedforward neural network is used to obtain hidden information about the material structure and properties from the spectra. Further, we construct a prediction model that is robust against noise by data augmentation. Finally, we apply our method to noisy spectra and predict six properties accurately. In summary, the proposed approaches can pave the way for fast and accurate spectrum interpretation/prediction as well as local measurement of material functions.


Author(s):  
Michael Fortunato ◽  
Connor W. Coley ◽  
Brian Barnes ◽  
Klavs F. Jensen

This work presents efforts to augment the performance of data-driven machine learning algorithms for reaction template recommendation used in computer-aided synthesis planning software. Often, machine learning models designed to perform the task of prioritizing reaction templates or molecular transformations are focused on reporting high accuracy metrics for the one-to-one mapping of product molecules in reaction databases to the template extracted from the recorded reaction. The templates selected for inclusion in these machine learning models have previously been limited to those that appear frequently in the reaction databases, excluding potentially useful transformations. By augmenting open-access datasets of organic reactions with artificially calculated template applicability and pretraining a template relevance neural network on this augmented applicability dataset, we report an increase in the template applicability recall and an increase in the diversity of predicted precursors. The augmentation and pretraining effectively teach the neural network an increased set of templates that could theoretically lead to successful reactions for a given target. Even on a small dataset of well-curated reactions, the data augmentation and pretraining methods resulted in an increase in top-1 accuracy, especially for rare templates, indicating these strategies can be very useful for small datasets.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Octav Caldararu ◽  
Tom L. Blundell ◽  
Kasper P. Kepp

Background: Prediction of the change in fold stability (ΔΔG) of a protein upon mutation is of major importance to protein engineering and screening of disease-causing variants. Many prediction methods can use 3D structural information to predict ΔΔG. While the performance of these methods has been extensively studied, a new problem has arisen due to the abundance of crystal structures: How precise are these methods in terms of structure input used, which structure should be used, and how much does it matter? Thus, there is a need to quantify the structural sensitivity of protein stability prediction methods. Results: We computed the structural sensitivity of six widely used prediction methods by use of saturated computational mutagenesis on a diverse set of 87 structures of 25 proteins. Our results show that structural sensitivity varies massively and surprisingly falls into two very distinct groups, with methods that take detailed account of the local environment showing a sensitivity of ~0.6 to 0.8 kcal/mol, whereas machine-learning methods display much lower sensitivity (~0.1 kcal/mol). We also observe that the precision correlates with the accuracy for mutation-type-balanced data sets but not with the generally reported accuracy of the methods, indicating the importance of mutation-type balance in both contexts. Conclusions: The structural sensitivity of stability prediction methods varies greatly and is caused mainly by the models and less by the actual protein structural differences. As a new recommended standard, we therefore suggest that ΔΔG values are evaluated on three protein structures when available and the associated standard deviation reported, to emphasize not just the accuracy but also the precision of the method in a specific study. Our observation that machine-learning methods deemphasize structure may indicate that folded wild-type structures alone, without the folded mutant and unfolded structures, only add modest value for assessing protein stability effects, and that side-chain-sensitive methods overstate the significance of the folded wild-type structure.
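The reporting standard suggested in this abstract (ΔΔG evaluated on three structures, with the standard deviation reported) could be sketched as follows. The structure names and ΔΔG values are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator is intended.

```python
from statistics import mean, stdev

# Hypothetical ΔΔG predictions (kcal/mol) for one mutation, each computed
# from a different crystal structure of the same protein.
ddg_by_structure = {"structure_1": 1.4, "structure_2": 0.9, "structure_3": 1.9}

values = list(ddg_by_structure.values())
ddg_mean = mean(values)
ddg_sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)

# Report the prediction together with its structural spread.
print(f"ddG = {ddg_mean:.2f} +/- {ddg_sd:.2f} kcal/mol (n = {len(values)})")
```

Reporting the spread alongside the mean makes it visible when a prediction depends strongly on which input structure happened to be chosen.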


2021 ◽  
Vol 12 ◽  
Author(s):  
Mary M. Maleckar ◽  
Lena Myklebust ◽  
Julie Uv ◽  
Per Magne Florvaag ◽  
Vilde Strøm ◽  
...  

Background: Remodeling due to myocardial infarction (MI) significantly increases patient arrhythmic risk. Simulations using patient-specific models have shown promise in predicting personalized risk for arrhythmia. However, these are computationally intensive and time-consuming, hindering translation to clinical practice. Classical machine learning (ML) algorithms (such as K-nearest neighbors, Gaussian support vector machines, and decision trees) as well as neural network techniques, shown to increase prediction accuracy, can be used to predict occurrence of arrhythmia as predicted by simulations based solely on infarct and ventricular geometry. We present an initial combined image-based patient-specific in silico and machine learning methodology to assess risk for dangerous arrhythmia in post-infarct patients. Furthermore, we aim to demonstrate that simulation-supported data augmentation improves prediction models, combining patient data, computational simulation, and advanced statistical modeling, improving overall accuracy for arrhythmia risk assessment.
Methods: MRI-based computational models were constructed from 30 patients 5 days post-MI (the “baseline” population). In order to assess the utility of biophysical model-supported data augmentation for improving arrhythmia prediction, we augmented the virtual baseline patient population. Each patient ventricular and ischemic geometry in the baseline population was used to create a subfamily of geometric models, resulting in an expanded set of patient models (the “augmented” population). Arrhythmia induction was attempted via programmed stimulation at 17 sites for each virtual patient, corresponding to AHA LV segments, and the simulation outcomes, “arrhythmia” or “no-arrhythmia,” were used as ground truth for subsequent statistical prediction (machine learning, ML) models. For each patient geometric model, we measured and used choice data features: the myocardial volume and ischemic volume, as well as the segment-specific myocardial volume and ischemia percentage, as input to ML algorithms. For classical ML techniques (ML), we trained k-nearest neighbors, support vector machine, logistic regression, xgboost, and decision tree models to predict the simulation outcome from these geometric features alone. To explore neural network ML techniques, we trained both three- and four-hidden-layer multilayer perceptron feedforward neural networks (NN), again predicting simulation outcomes from these geometric features alone. ML and NN models were trained on 70% of randomly selected segments and the remaining 30% was used for validation for both baseline and augmented populations.
Results: Stimulation in the baseline population (30 patient models) resulted in reentry in 21.8% of sites tested; in the augmented population (129 total patient models) reentry occurred in 13.0% of sites tested. ML and NN models ranged in mean accuracy from 0.83 to 0.86 for the baseline population, improving to 0.88 to 0.89 in all cases.
Conclusion: Machine learning techniques, combined with patient-specific, image-based computational simulations, can provide key clinical insights with high accuracy rapidly and efficiently. In the case of sparse or missing patient data, simulation-supported data augmentation can be employed to further improve predictive results for patient benefit. This work paves the way for using data-driven simulations for prediction of dangerous arrhythmia in MI patients.


2020 ◽  
Vol 17 (6) ◽  
pp. 2645-2652
Author(s):  
Sachin Dahiya ◽  
Tarun Gulati

Plant disease severely affects crop production. Food security is a persistent challenge because the world's population is increasing rapidly. Plant diseases can be controlled at an early stage with the help of an automatic system that can detect a wide variety of diseases before they spread to the whole cultivation area. With the development of various machine learning and deep learning algorithms, it is now possible to design such a system. Deep neural networks such as convolutional neural networks (CNNs) can detect plant diseases with high accuracy. In this paper, we discuss deep learning techniques, CNNs and their parameters, data augmentation, transfer learning, and various factors that affect the performance of a DL model. Recent studies that apply machine intelligence to plant leaf disease detection are also discussed.


2020 ◽  
Vol 10 (6) ◽  
pp. 1997
Author(s):  
Xin Shu ◽  
Chang Liu ◽  
Tong Li

The output of the tactile sensing array on a gripper can be used to predict grasping stability. Some methods use traditional tactile features to make the decision, and some advanced methods use machine learning or deep learning to build a prediction model. However, these methods are all limited to a specific sensing array and share two disadvantages. First, the models do not perform well on different sensors. Second, they cannot perform inference on multiple sensors in an end-to-end manner. Thus, we aim to find the internal relationships among different sensors and infer the grasping stability of multiple sensors in an end-to-end way. In this paper, we propose the MM-CNN (mask multi-head convolutional neural network), which predicts grasping stability from the output of multiple sensors using a weight-sharing mechanism. We train this model and evaluate it on datasets we collected ourselves. The model achieves 99.49% and 94.25% prediction accuracy on two different sensing arrays, respectively. In addition, we show that the proposed structure is also applicable to other CNN backbones and can be easily integrated.


2021 ◽  
Vol 12 (2) ◽  
pp. 123
Author(s):  
A A JE Veggy Priyangka ◽  
I Made Surya Kumara

Indonesia is one of the countries where the majority of the population works in farming. The agricultural sector in Indonesia is supported by fertile land and a tropical climate. Rice is one of Indonesia's agricultural commodities. Rice production in Indonesia has decreased every year, so the factors affecting rice production are very significant. Rice disease is one of the factors causing the decline in rice production in Indonesia. Technological developments have made it easier to recognize the types of rice plant diseases. Machine learning is one of the technologies used to identify types of rice diseases. The rice plant disease classification system in this study uses the Convolutional Neural Network (CNN) method, a machine learning method used in object recognition. The method uses the VGG19 architecture, which has features to improve results. The images used as training and test data consist of 105 images, divided into training and test images. Parameters were tested by varying the number of epochs and the data augmentation. The system achieved a test accuracy of 95.24%.

