scholarly journals Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks

2018 ◽  
Vol 35 (14) ◽  
pp. 2403-2410 ◽  
Author(s):  
Jack Hanson ◽  
Kuldip Paliwal ◽  
Thomas Litfin ◽  
Yuedong Yang ◽  
Yaoqi Zhou

Abstract Motivation Sequence-based prediction of one dimensional structural properties of proteins has been a long-standing subproblem of protein structure prediction. Recently, prediction accuracy has been significantly improved due to the rapid expansion of protein sequence and structure libraries and advances in deep learning techniques, such as residual convolutional networks (ResNets) and Long-Short-Term Memory Cells in Bidirectional Recurrent Neural Networks (LSTM-BRNNs). Here we leverage an ensemble of LSTM-BRNN and ResNet models, together with predicted residue-residue contact maps, to continue the push towards the attainable limit of prediction for 3- and 8-state secondary structure, backbone angles (θ, τ, ϕ and ψ), half-sphere exposure, contact numbers and solvent accessible surface area (ASA). Results The new method, named SPOT-1D, achieves similar, high performance on a large validation set and test set (≈1000 proteins in each set), suggesting robust performance for unseen data. For the large test set, it achieves 87% and 77% in 3- and 8-state secondary structure prediction and 0.82 and 0.86 in correlation coefficients between predicted and measured ASA and contact numbers, respectively. Comparison to current state-of-the-art techniques reveals substantial improvement in secondary structure and backbone angle prediction. In particular, 44% of 40-residue fragment structures constructed from predicted backbone Cα-based θ and τ angles are less than 6 Å root-mean-squared-distance from their native conformations, nearly 20% better than the next best. The method is expected to be useful for advancing protein structure and function prediction. Availability and implementation SPOT-1D and its data is available at: http://sparks-lab.org/. Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Vol 36 (20) ◽  
pp. 5021-5026 ◽  
Author(s):  
Gang Xu ◽  
Qinghua Wang ◽  
Jianpeng Ma

Abstract Motivation Predictions of protein backbone torsion angles (ϕ and ψ) and secondary structure from sequence are crucial subproblems in protein structure prediction. With the development of deep learning approaches, their accuracies have been significantly improved. To capture the long-range interactions, most studies integrate bidirectional recurrent neural networks into their models. In this study, we introduce and modify a recently proposed architecture named Transformer to capture the interactions between the two residues theoretically with arbitrary distance. Moreover, we take advantage of multitask learning to improve the generalization of neural network by introducing related tasks into the training process. Similar to many previous studies, OPUS-TASS uses an ensemble of models and achieves better results. Results OPUS-TASS uses the same training and validation sets as SPOT-1D. We compare the performance of OPUS-TASS and SPOT-1D on TEST2016 (1213 proteins) and TEST2018 (250 proteins) proposed in the SPOT-1D paper, CASP12 (55 proteins), CASP13 (32 proteins) and CASP-FM (56 proteins) proposed in the SAINT paper, and a recently released PDB structure collection from CAMEO (93 proteins) named as CAMEO93. On these six test sets, OPUS-TASS achieves consistent improvements in both backbone torsion angles prediction and secondary structure prediction. On CAMEO93, SPOT-1D achieves the mean absolute errors of 16.89 and 23.02 for ϕ and ψ predictions, respectively, and the accuracies for 3- and 8-state secondary structure predictions are 87.72 and 77.15%, respectively. In comparison, OPUS-TASS achieves 16.56 and 22.56 for ϕ and ψ predictions, and 89.06 and 78.87% for 3- and 8-state secondary structure predictions, respectively. In particular, after using our torsion angles refinement method OPUS-Refine as the post-processing procedure for OPUS-TASS, the mean absolute errors for final ϕ and ψ predictions are further decreased to 16.28 and 21.98, respectively. Availability and implementation The training and the inference codes of OPUS-TASS and its data are available at https://github.com/thuxugang/opus_tass. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2119-2125 ◽  
Author(s):  
Zongyang Du ◽  
Shuo Pan ◽  
Qi Wu ◽  
Zhenling Peng ◽  
Jianyi Yang

Abstract Motivation Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy in protein contact map prediction opens a new avenue to improve the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to explore to make better use of predicted contacts. Results We have developed a new contact-assisted threading algorithm named CATHER using both conventional sequential profiles and contact map predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER made significant improvement over other methods which only use either sequential profile or predicted contact map. Our method was ranked at the Top 10 among all 39 participated server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that it is promising to push forward the threading algorithms by using predicted contacts. Availability and implementation http://yanglab.nankai.edu.cn/CATHER/. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Vol 32 (6) ◽  
pp. 843-849 ◽  
Author(s):  
Rhys Heffernan ◽  
Abdollah Dehzangi ◽  
James Lyons ◽  
Kuldip Paliwal ◽  
Alok Sharma ◽  
...  

Abstract Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability and implementation: The method is available at http://sparks-lab.org. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i285-i291 ◽  
Author(s):  
Md Hossain Shuvo ◽  
Sutanu Bhattacharya ◽  
Debswapna Bhattacharya

Abstract Motivation Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction. Results We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide-range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep. Availability and implementation https://github.com/Bhattacharya-Lab/QDeep. Supplementary information Supplementary data are available at Bioinformatics online.


2012 ◽  
Vol 28 (23) ◽  
pp. 3058-3065 ◽  
Author(s):  
Jana Sperschneider ◽  
Amitava Datta ◽  
Michael J. Wise

Abstract Motivation Laboratory RNA structure determination is demanding and costly and thus, computational structure prediction is an important task. Single sequence methods for RNA secondary structure prediction are limited by the accuracy of the underlying folding model, if a structure is supported by a family of evolutionarily related sequences, one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements, which have highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots due to the computational complexity. Results A comparative pseudoknot prediction method called DotKnot-PW is introduced based on structural comparison of secondary structure elements and H-type pseudoknot candidates. DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support. Availability DotKnot-PW and the RNA structure test set are available at the web site http://dotknot.csse.uwa.edu.au/pw. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Badri Adhikari ◽  
Jie Hou ◽  
Jianlin Cheng

AbstractMotivationSignificant improvements in the prediction of protein residue-residue contacts are observed in the recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction.ResultsIn this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks – the first five predict contacts at 6, 7.5, 8, 8.5, and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in CASP10, 11, and 12 experiments, DNCON2 achieves mean precisions of 35%, 50%, and 53.4%, respectively, higher than 30.6% by MetaPSICOV on CASP10 dataset, 34% by MetaPSICOV on CASP11 dataset, and 46.3% by Raptor-X on CASP12 dataset, when top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts into training, two-level approach to prediction, use of the state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length.AvailabilityThe web server of DNCON2 is at http://sysbio.rnet.missouri.edu/dncon2/ where training and testing datasets as well as the predictions for CASP10, 11, and 12 free-modeling datasets can also be downloaded. Its source code is available at https://github.com/multicom-toolbox/DNCON2/[email protected] informationSupplementary data are available online.


ChemPhysChem ◽  
2014 ◽  
Vol 15 (15) ◽  
pp. 3378-3390 ◽  
Author(s):  
Falk Hoffmann ◽  
Ioan Vancea ◽  
Sanjay G. Kamat ◽  
Birgit Strodel

Sign in / Sign up

Export Citation Format

Share Document