scholarly journals Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks

2020 ◽  
Author(s):  
Bian Li ◽  
Yucheng T. Yang ◽  
John A. Capra ◽  
Mark B. Gerstein

AbstractPredicting mutation-induced changes in protein thermodynamic stability (∆∆G) is of great interest in protein engineering, variant interpretation, and understanding protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network designed for structure-based prediction of ∆∆Gs upon point mutation. To leverage the image-processing power inherent in convolutional neural networks, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ∆∆G prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. However, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ∆∆Gs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D convolutional neural networks can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.Author SummaryThe thermodynamic stability of a protein, usually represented as the Gibbs free energy for the biophysical process of protein folding (∆G), is a fundamental thermodynamic quantity. Predicting mutation-induced changes in protein thermodynamic stability (∆∆G) is of great interest in protein engineering, variant interpretation, and understanding protein biophysics. However, predicting ∆∆Gs in an accurate and unbiased manner has been a long-standing challenge in the field of computational biology. In this work, we introduce ThermoNet, a deep, 3D-convolutional neural network designed for structure-based ∆∆G prediction. To leverage the image-processing power inherent in convolutional neural networks, we treat protein structures as if they were multi-channel 3D images. ThermoNet demonstrates performance comparable to the best available methods. However, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We also demonstrate that the presence of homologous proteins in commonly used training and testing sets for ∆∆G prediction methods has likely influenced previous performance estimates. Finally, we highlight the practical utility of ThermoNet by applying it to predicting the ∆∆Gs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar.

2020 ◽  
Vol 16 (11) ◽  
pp. e1008291
Author(s):  
Bian Li ◽  
Yucheng T. Yang ◽  
John A. Capra ◽  
Mark B. Gerstein

Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.


2019 ◽  
Vol 35 (17) ◽  
pp. 3208-3210 ◽  
Author(s):  
Yangzhen Wang ◽  
Feng Su ◽  
Shanshan Wang ◽  
Chaojuan Yang ◽  
Yonglu Tian ◽  
...  

Abstract Motivation Functional imaging at single-neuron resolution offers a highly efficient tool for studying the functional connectomics in the brain. However, mainstream neuron-detection methods focus on either the morphologies or activities of neurons, which may lead to the extraction of incomplete information and which may heavily rely on the experience of the experimenters. Results We developed a convolutional neural networks and fluctuation method-based toolbox (ImageCN) to increase the processing power of calcium imaging data. To evaluate the performance of ImageCN, nine different imaging datasets were recorded from awake mouse brains. ImageCN demonstrated superior neuron-detection performance when compared with other algorithms. Furthermore, ImageCN does not require sophisticated training for users. Availability and implementation ImageCN is implemented in MATLAB. The source code and documentation are available at https://github.com/ZhangChenLab/ImageCN. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Rishal Aggarwal ◽  
Akash Gupta ◽  
Vineeth Chelur ◽  
C. V. Jawahar ◽  
U. Deva Priyakumar

<div> A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site. A prerequisite to this is finding druggable and functionally relevant binding sites on the 3D structure of the protein. Although several methods for detecting binding sites have been developed beforehand, a majority of them surprisingly fail in the identification and ranking of binding sites accurately. The rapid adoption and success of deep learning algorithms in various sections of structural biology beckons the usage of such algorithms for accurate binding site detection. As a combination of geometry based software and deep learning, we report a novel framework, DeepPocket that utilises 3D convolutional neural networks for the rescoring of pockets identified by Fpocket and further segments these identified cavities on the protein surface. Apart from this, we also propose another dataset SC6K containing protein structures submitted in the Protein Data Bank (PDB) from January 2018 till February 2020 for ligand binding site (LBS) detection. DeepPocket's results on various binding site datasets and SC6K highlights its better performance over current state-of-the-art methods and good generalization ability over novel structures. </div><div><br></div>


Author(s):  
Sajid Nazir ◽  
Shushma Patel ◽  
Dilip Patel

The increased processing power of graphical processing units (GPUs) and the availability of large image datasets has fostered a renewed interest in extracting semantic information from images. Promising results for complex image categorization problems have been achieved using deep learning, with neural networks comprised of many layers. Convolutional neural networks (CNN) are one such architecture which provides more opportunities for image classification. Advances in CNN enable the development of training models using large labelled image datasets, but the hyper parameters need to be specified, which is challenging and complex due to the large number of parameters. A substantial amount of computational power and processing time is required to determine the optimal hyper parameters to define a model yielding good results. This article provides a survey of the hyper parameter search and optimization methods for CNN architectures.


2021 ◽  
Author(s):  
Anastasiya V Kulikova ◽  
Daniel J Diaz ◽  
James M Loy ◽  
Andrew D Ellington ◽  
Claus O Wilke

The fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely going to be correct or not. Predictions of consensus are less accurate, and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.


Symmetry ◽  
2020 ◽  
Vol 12 (3) ◽  
pp. 416
Author(s):  
Omar Bilalovic ◽  
Zikrija Avdagic ◽  
Samir Omanovic ◽  
Ingmar Besic ◽  
Vedad Letic ◽  
...  

Mathematical modelling to compute ground truth from 3D images is an area of research that can strongly benefit from machine learning methods. Deep neural networks (DNNs) are state-of-the-art methods design for solving these kinds of difficulties. Convolutional neural networks (CNNs), as one class of DNNs, can overcome special requirements of quantitative analysis especially when image segmentation is needed. This article presents a system that uses a cascade of CNNs with symmetric blocks of layers in chain, dedicated to 3D image segmentation from microscopic images of 3D nuclei. The system is designed through eight experiments that differ in following aspects: number of training slices and 3D samples for training, usage of pre-trained CNNs and number of slices and 3D samples for validation. CNNs parameters are optimized using linear, brute force, and random combinatorics, followed by voter and median operations. Data augmentation techniques such as reflection, translation and rotation are used in order to produce sufficient training set for CNNs. Optimal CNN parameters are reached by defining 11 standard and two proposed metrics. Finally, benchmarking demonstrates that CNNs improve segmentation accuracy, reliability and increased annotation accuracy, confirming the relevance of CNNs to generate high-throughput mathematical ground truth 3D images.


2019 ◽  
Author(s):  
Amelia J. Solon ◽  
Vernon J. Lawhern ◽  
Jonathan Touryan ◽  
Jonathan R. McDaniel ◽  
Anthony J. Ries ◽  
...  

AbstractDeep convolutional neural networks (CNN) have previously been shown to be useful tools for signal decoding and analysis in a variety of complex domains, such as image processing and speech recognition. By learning from large amounts of data, the representations encoded by these deep networks are often invariant to moderate changes in the underlying feature spaces. Recently, we proposed a CNN architecture that could be applied to electroencephalogram (EEG) decoding and analysis. In this article, we train our CNN model using data from prior experiments in order to later decode the P300 evoked response from an unseen, hold-out experiment. We analyze the CNN output as a function of the underlying variability in the P300 response and demonstrate that the CNN output is sensitive to the experiment-induced changes in the neural response. We then assess the utility of our approach as a means of improving the overall signal-to-noise ratio in the EEG record. Finally, we show an example of how CNN-based decoding can be applied to the analysis of complex data.


2021 ◽  
Author(s):  
Rishal Aggarwal ◽  
Akash Gupta ◽  
Vineeth Chelur ◽  
C. V. Jawahar ◽  
U. Deva Priyakumar

<div> A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site. A prerequisite to this is finding druggable and functionally relevant binding sites on the 3D structure of the protein. Although several methods for detecting binding sites have been developed beforehand, a majority of them surprisingly fail in the identification and ranking of binding sites accurately. The rapid adoption and success of deep learning algorithms in various sections of structural biology beckons the usage of such algorithms for accurate binding site detection. As a combination of geometry based software and deep learning, we report a novel framework, DeepPocket that utilises 3D convolutional neural networks for the rescoring of pockets identified by Fpocket and further segments these identified cavities on the protein surface. Apart from this, we also propose another dataset SC6K containing protein structures submitted in the Protein Data Bank (PDB) from January 2018 till February 2020 for ligand binding site (LBS) detection. DeepPocket's results on various binding site datasets and SC6K highlights its better performance over current state-of-the-art methods and good generalization ability over novel structures. </div><div><br></div>


2017 ◽  
Author(s):  
◽  
Son Phong Nguyen

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Computational protein structure prediction is very important for many applications in bioinformatics. Many prediction methods have been developed, including Modeller, HHpred, I-TASSER, Robetta, and MUFOLD. In the process of predicting protein structures, it is essential to accurately assess the quality of generated models. Consensus quality assessment (QA) methods, such as Pcons-net and MULTICOM-refine, which are based on structure similarity, performed well on QA tasks. The drawback of consensus QA methods is that they require a pool of diverse models to work well, which is not always available. More importantly, they cannot evaluate the quality of a single protein model, which is a very common task in protein predictions and other applications. Although many single-model quality assessment methods, such as ProQ2, MQAPmulti, OPUS-CA, DOPE, DFIRE, and RW, etc. have been developed to address that problem, their accuracy is not good enough for most real applications. In this dissertation, based on the idea of using C-[alpha] atoms distance matrix and deep learning methods, two methods have been proposed for assessing quality of protein structures. First, a novel algorithm based on deep learning techniques, called DL-Pro, is proposed. From training examples of distance matrices corresponding to good and bad models, DL-Pro learns a stacked autoencoder network as a classifier. In experiments on selected targets from the Critical Assessment of Structure Prediction (CASP) competition, DL-Pro obtained promising results, outperforming state-of-the-art energy/scoring functions, including OPUS-CA, DOPE, DFIRE, and RW. Second, a new method DeepCon-QA is developed to predict quality of single protein model. Based on the idea of using protein vector representation and distance matrix, DeepCon-QA was able to achieve comparable performance with the best state-of-the-art QA method in our experiments. It also takes advantage the strength of deep convolutional neural networks to “learn” and “understand” the input data to be able to predict output data precisely. On the other hand, this dissertation also proposes several new methods for solving loop modeling problem. Five new loop modeling methods based on machine learning techniques, called NearLooper, ConLooper, ResLooper, HyLooper1 and HyLooper2 are proposed. NearLooper is based on the nearest neighbor technique; ConLooper applies deep convolutional neural networks to predict Cα atoms distance matrix as an orientation-independent representation of protein structure; ResLooper uses residual neural networks instead of deep convolutional neural networks; HyLooper1 combines the results of NearLooper and ConLooper while HyLooper2 combines NearLooper and ResLooper. Three commonly used benchmarks for loop modeling are used to compare the performance between these methods and existing state-of-the-art methods. The experiment results show promising performance in which our best method improves existing state-of-the-art methods by 28% and 54% of average RMSD on two datasets while being comparable on the other one.


Sign in / Sign up

Export Citation Format

Share Document