scholarly journals Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13

2019 ◽  
Vol 87 (12) ◽  
pp. 1165-1178 ◽  
Author(s):  
Jie Hou ◽  
Tianqi Wu ◽  
Renzhi Cao ◽  
Jianlin Cheng
2019 ◽  
Author(s):  
Jie Hou ◽  
Tianqi Wu ◽  
Renzhi Cao ◽  
Jianlin Cheng

AbstractPrediction of residue-residue distance relationships (e.g. contacts) has become the key direction to advance protein tertiary structure prediction since 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in 2012 CASP10 experiment. During 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, contact distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction, in addition to an update of other components such as template library, sequence database, and alignment tools. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based protein structure modeling in CASP13. Deep convolutional neural network can utilize global information in pairwise residue-residue features such as co-evolution scores to substantially improve inter-residue contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets from scratch. Deep learning also successfully integrated 1D structural features, 2D contact information, and 3D structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of MULTICOM system in the CASP13 experiment clearly shows that protein contact distance prediction and model selection driven by powerful deep learning holds the key of solving protein structure prediction problem. However, there are still major challenges in accurately predicting protein contact distance when there are few homologous sequences to generate co-evolutionary signals, folding proteins from noisy contact distances, and ranking models of hard targets.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tianqi Wu ◽  
Zhiye Guo ◽  
Jie Hou ◽  
Jianlin Cheng

Abstract Background Driven by deep learning, inter-residue contact/distance prediction has been significantly improved and substantially enhanced ab initio protein structure prediction. Currently, most of the distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. Results To explore the potentials of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist’s real-value distance prediction is 0.896 Å2 when filtering out the predicted distance ≥ 16 Å, which is lower than 1.003 Å2 of DeepDist’s multi-class distance prediction. When distance predictions are converted into contact predictions at 8 Å threshold (the standard threshold in the field), the precision of top L/5 and L/2 contact predictions of DeepDist’s multi-class distance prediction is 79.3% and 66.1%, respectively, higher than 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. Conclusions DeepDist can predict inter-residue distances well and improve binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be directly used to reconstruct protein tertiary structures better than multi-class distance predictions due to the lower MSE. Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone.


BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Haicang Zhang ◽  
Yufeng Shen

Abstract Background Accurate prediction of protein structure is fundamentally important to understand biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for the proteins with only distant homologs available. Results We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning query sequence with template as the classical pixel classification problem in computer vision and naturally applies deep residual neural network in prediction. ThreaderAI first employs deep learning to predict residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds template-query alignment by applying a dynamic programming algorithm on the probability matrix. We evaluated our methods both in generating accurate template-query alignment and protein threading. Experimental results show that ThreaderAI outperforms currently popular template-based modelling methods HHpred, CNFpred, and the latest contact-assisted method CEthreader, especially on the proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the similarity of fold level from SCOPe data. And on CASP13’s TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9 and 8% in terms of TM-score, respectively. Conclusions These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.


2020 ◽  
Vol 118 (3) ◽  
pp. 292a
Author(s):  
Daisuke Kihara ◽  
Genki Terashi ◽  
Sai Raghavendra Maddhuri Venkata Subramaniya

2021 ◽  
Author(s):  
Tianqi Wu ◽  
Jian Liu ◽  
Zhiye Guo ◽  
Jie Hou ◽  
Jianlin Cheng

Abstract Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system - MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.


2020 ◽  
Author(s):  
Haicang Zhang ◽  
Yufeng Shen

AbstractAccurate prediction of protein structure is fundamentally important to understand biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for the proteins with only distant homologs available. We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning query sequence with template as the classical pixel classification problem in computer vision and naturally applies deep residual neural network in prediction. ThreaderAI first employs deep learning to predict residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residueresidue contacts, and then builds template-query alignment by applying a dynamic programming algorithm on the probability matrix. We evaluated our methods both in generating accurate template-query alignment and protein threading. Experimental results show that ThreaderAI outperforms currently popular template-based modelling methods HHpred, CNFpred, and the latest contact-assisted method CEthreader, especially on the proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56%, 13%, and 11%, respectively, on template-query pairs at the similarity of fold level from SCOPe data. And on CASP13’s TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16%, 9% and 8% in terms of TM-score, respectively. These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.Availabilityhttps://github.com/ShenLab/ThreaderAI


Sign in / Sign up

Export Citation Format

Share Document