scholarly journals DeepDist: real-value inter-residue distance prediction with deep residual convolutional network

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tianqi Wu ◽  
Zhiye Guo ◽  
Jie Hou ◽  
Jianlin Cheng

Abstract Background Driven by deep learning, inter-residue contact/distance prediction has been significantly improved and substantially enhanced ab initio protein structure prediction. Currently, most of the distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. Results To explore the potentials of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist’s real-value distance prediction is 0.896 Å2 when filtering out the predicted distance ≥ 16 Å, which is lower than 1.003 Å2 of DeepDist’s multi-class distance prediction. When distance predictions are converted into contact predictions at 8 Å threshold (the standard threshold in the field), the precision of top L/5 and L/2 contact predictions of DeepDist’s multi-class distance prediction is 79.3% and 66.1%, respectively, higher than 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. Conclusions DeepDist can predict inter-residue distances well and improve binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be directly used to reconstruct protein tertiary structures better than multi-class distance predictions due to the lower MSE. Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone.

Author(s):  
Tianqi Wu ◽  
Zhiye Guo ◽  
Jie Hou ◽  
Jianlin Cheng

AbstractMotivationDriven by deep learning techniques, inter-residue contact/distance prediction has been significantly improved and substantially enhanced ab initio protein structure prediction. Currently all the distance prediction methods classify inter-residue distances into multiple distance intervals (i.e. a multi-classification problem) instead of directly predicting real-value distances (i.e. a regression problem). The output of the former has to be converted into real-value distances in order to be used in tertiary structure prediction.ResultsTo explore the potentials of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. We demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone, indicating their complementarity. On 43 CASP13 hard domains, the average mean square error (MSE) of DeepDist’s real-value distance predictions is 0.896 Å when filtering out the predicted distance >=16 Å, which is lower than 1.003 Å of DeepDist’s multi-class distance predictions. When the predicted real-value distances are converted to binary contact predictions at 8Å threshold, the precisions of top L/5 and L/2 contact predictions are 78.6% and 64.5%, respectively, higher than the best results reported in the CASP13 experiment. These results demonstrate that the real-value distance prediction can predict inter-residue distances well and improve binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be directly used to reconstruct protein tertiary structures better than multi-class distance predictions due to the lower MSE.


2019 ◽  
Author(s):  
Jie Hou ◽  
Tianqi Wu ◽  
Renzhi Cao ◽  
Jianlin Cheng

AbstractPrediction of residue-residue distance relationships (e.g. contacts) has become the key direction to advance protein tertiary structure prediction since 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in 2012 CASP10 experiment. During 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, contact distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction, in addition to an update of other components such as template library, sequence database, and alignment tools. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based protein structure modeling in CASP13. Deep convolutional neural network can utilize global information in pairwise residue-residue features such as co-evolution scores to substantially improve inter-residue contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets from scratch. Deep learning also successfully integrated 1D structural features, 2D contact information, and 3D structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of MULTICOM system in the CASP13 experiment clearly shows that protein contact distance prediction and model selection driven by powerful deep learning holds the key of solving protein structure prediction problem. However, there are still major challenges in accurately predicting protein contact distance when there are few homologous sequences to generate co-evolutionary signals, folding proteins from noisy contact distances, and ranking models of hard targets.


2021 ◽  
Vol 15 (8) ◽  
pp. 821-830
Author(s):  
He Huang ◽  
Xinqi Gong

Proteins are large molecules consisting of a linear sequence of amino acids. Protein performs biological functions with specific 3D structures. The main factors that drive proteins to form these structures are constraint between residues. These constraints usually lead to important inter-residue relationships, including short-range inter-residue contacts and long-range interresidue distances. Thus, a highly accurate prediction of inter-residue contact and distance information is of great significance for protein tertiary structure computations. Some methods have been proposed for inter-residue contact prediction, most of which focus on contact map prediction and some reviews have summarized the progresses. However, inter-residue distance prediction is found to provide better guidance for protein structure prediction than contact map prediction in recent years. The methods for inter-residue distance prediction can be roughly divided into two types according to the consideration of distance value: one is based on multi-classification with discrete value and the other is based on regression with continuous value. Here, we summarize these algorithms and show that they have obtained good results. Compared to contact map prediction, distance map prediction is in its infancy. There is a lot to do in the future including improving distance map prediction precision and incorporating them into residue-residue distanceguided ab initio protein folding.


2021 ◽  
Author(s):  
Xiao Chen ◽  
Jian Liu ◽  
Zhiye Guo ◽  
Tianqi Wu ◽  
Jie Hou ◽  
...  

AbstractThe inter-residue contact prediction and deep learning showed the promise to improve the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). During the 2020 CASP14 experiment, we developed and tested several EMA predictors that used deep learning with the new features based on inter-residue distance/contact predictions as well as the existing model quality features. The average global distance test (GDT-TS) score loss of ranking CASP14 structural models by three multi-model MULTICOM EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) is 0.073, 0.079, and 0.081, respectively, which are ranked first, second, and third places out of 68 CASP14 EMA predictors. The single-model EMA predictor (MULTICOM-DEEP) is ranked 10th place among all the single-model EMA methods in terms of GDT_TS score loss. The results show that deep learning and contact/distance predictions are useful in ranking and selecting protein structural models.


2021 ◽  
Author(s):  
Jian Liu ◽  
Tianqi Wu ◽  
Zhiye Guo ◽  
Jie Hou ◽  
Jianlin Cheng

Substantial progresses in protein structure prediction have been made by utilizing deep-learning and residue-residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system in the three main aspects: (1) a new deep-learning based protein inter-residue distance predictor (DeepDist) to improve template-free (ab initio) tertiary structure prediction, (2) an enhanced template-based tertiary structure prediction method, and (3) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked 7th out of 146 predictors in protein tertiary structure prediction and ranked 3rd out of 136 predictors in inter-domain structure prediction. The results of MULTICOM demonstrate that the template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. The performance of template-free tertiary structure prediction largely depends on the accuracy of distance predictions that is closely related to the quality of multiple sequence alignments. The structural model quality assessment works reasonably well on targets for which a sufficient number of good models can be predicted, but may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed.


Author(s):  
Jian Liu ◽  
Tianqi Wu ◽  
Zhiye Guo ◽  
Jie Hou ◽  
Jianlin Cheng

Substantial progresses in protein structure prediction have been made by utilizing deep-learning and residue-residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system in three main aspects: (1) a new deep learning based protein inter-residue distance predictor (DeepDist) to improve template-free (ab initio) tertiary structure prediction, (2) an enhanced template-based tertiary structure prediction method, and (3) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked 7th out of 146 predictors in protein tertiary structure prediction and ranked 3rd out of 136 predictors in inter-domain structure predic-tion. The results of MULTICOM demonstrate that the template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. The performance of template-free tertiary structure prediction largely depends on the accuracy of distance pre-dictions that is closely related to the quality of multiple sequence alignments. The structural model quality assessment works reasonably well on targets for which a sufficient number of good models can be predicted, but may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed.


Sign in / Sign up

Export Citation Format

Share Document