Identification of residue pairing in interacting β-strands from a predicted residue contact map

Abstract Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information on residue pairing in β strands can be extracted from a noisy contact map, because β-β interactions produce characteristic contact patterns. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we introduce a novel ridge-detection-based β-β contact predictor, RDb2C, to identify residue pairing in β strands from any predicted residue contact map. The algorithm adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then uses a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb2C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), achieving F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves markedly higher performance, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. According to our tests on 61 mainly β proteins, improvement in β-β contact prediction can further improve structure prediction.

Availability: All source data and codes are available at http://166.111.152.91/Downloads.html or on GitHub at https://github.com/wzmao/RDb2C.

Author summary Due to their topological complexity, mainly β proteins are challenging targets in protein structure prediction. Knowledge of the pairing between β strands, especially the residue pairing pattern, can greatly facilitate the tertiary structure prediction of mainly β proteins. In this work, we developed a novel algorithm to identify the residue pairing in β strands from a predicted residue contact map. This method adopts the ridge detection technique to capture the characteristic pattern of β-β interactions from the map and then uses a multi-stage random forest framework to predict β-β contacts at the residue level. According to our tests, our method can effectively improve the prediction of β-β contacts even from a highly noisy contact map. Moreover, the refined β-β contact information can effectively improve the structural modeling of mainly β proteins.
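
The abstract does not say which ridge detector RDb2C uses, so the following is only a minimal sketch of one common formulation: score each cell of a contact map by the most negative eigenvalue of its local Hessian, estimated with finite differences. The function name `ridge_strength` and the toy map are hypothetical, and a practical detector would normally smooth the map first. The sketch relies on the fact that an antiparallel strand pair appears as a line of contacts perpendicular to the main diagonal.

```python
import math

def ridge_strength(cmap):
    """Return a map of ridge strengths for a 2D list of contact scores.

    Assumption: ridge strength is taken as -lambda_min of the 2x2 Hessian,
    clipped at zero. This is an illustrative choice, not RDb2C's own code.
    """
    n, m = len(cmap), len(cmap[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Second derivatives by central finite differences.
            hxx = cmap[i + 1][j] - 2 * cmap[i][j] + cmap[i - 1][j]
            hyy = cmap[i][j + 1] - 2 * cmap[i][j] + cmap[i][j - 1]
            hxy = (cmap[i + 1][j + 1] - cmap[i + 1][j - 1]
                   - cmap[i - 1][j + 1] + cmap[i - 1][j - 1]) / 4.0
            # Smaller eigenvalue of the symmetric matrix [[hxx, hxy], [hxy, hyy]].
            mean = (hxx + hyy) / 2.0
            diff = math.sqrt(((hxx - hyy) / 2.0) ** 2 + hxy ** 2)
            lam_min = mean - diff
            # A strongly negative curvature across the line marks a ridge.
            out[i][j] = max(0.0, -lam_min)
    return out

# Toy 8x8 map with one antiparallel-like line of contacts (i + j == 7).
toy = [[0.0] * 8 for _ in range(8)]
for i in range(1, 7):
    toy[i][7 - i] = 1.0
strength = ridge_strength(toy)
print(strength[3][4], strength[3][3])  # on the line vs. next to it
```

In this toy case the cell on the line scores higher than its off-line neighbour. The ridge scores would then be one input among others to a downstream classifier such as the random forest the abstract describes.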

Protein Contact Map Denoising Using Generative Adversarial Networks

10.1101/2020.06.26.174300 ◽

2020 ◽

Author(s):

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Genki Terashi ◽

Aashish Jain ◽

Yuki Kagaya ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Substantial Improvement ◽

Generative Adversarial Networks ◽

Sequence Information ◽

Contact Map ◽

Denoising Method ◽

Contact Prediction ◽

Adversarial Networks

Abstract Protein residue-residue contact prediction from protein sequence information has undergone substantial improvement in the past few years, which has made it a critical driving force for building correct protein tertiary structure models. Improving accuracy of contact predictions has, therefore, become the forefront of protein structure prediction. Here, we show a novel contact map denoising method, ContactGAN, which uses Generative Adversarial Networks (GAN) to refine predicted protein contact maps. ContactGAN was able to make a consistent and significant improvement over predictions made by recent contact prediction methods when tested on two datasets including protein structure modeling targets in CASP13. ContactGAN will be a valuable addition in the structure prediction pipeline to achieve an extra gain in contact prediction accuracy.

A Review of Protein Inter-residue Distance Prediction

Current Bioinformatics ◽

10.2174/1574893615999200425230056 ◽

2021 ◽

Vol 15 (8) ◽

pp. 821-830

Author(s):

He Huang ◽

Xinqi Gong

Keyword(s):

Structure Prediction ◽

Tertiary Structure ◽

Contact Map ◽

Distance Information ◽

Distance Map ◽

Large Molecules ◽

Residue Contact ◽

Linear Sequence ◽

Residue Contacts ◽

Distance Prediction

Proteins are large molecules consisting of a linear sequence of amino acids. Proteins perform biological functions through specific 3D structures. The main factors that drive proteins to form these structures are constraints between residues. These constraints usually lead to important inter-residue relationships, including short-range inter-residue contacts and long-range inter-residue distances. Thus, a highly accurate prediction of inter-residue contact and distance information is of great significance for protein tertiary structure computations. Some methods have been proposed for inter-residue contact prediction, most of which focus on contact map prediction, and some reviews have summarized the progress. However, in recent years inter-residue distance prediction has been found to provide better guidance for protein structure prediction than contact map prediction. The methods for inter-residue distance prediction can be roughly divided into two types according to how they treat the distance value: one is based on multi-class classification with discrete values and the other is based on regression with continuous values. Here, we summarize these algorithms and show that they have obtained good results. Compared to contact map prediction, distance map prediction is in its infancy. There is a lot to do in the future, including improving distance map prediction precision and incorporating distance maps into residue-residue distance-guided ab initio protein folding.
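
To make the distinction between discrete and continuous distance targets concrete, here is a small sketch of how a continuous distance can be turned into a class label for a multi-class predictor, and how a distance map reduces to a binary contact map. The bin edges are hypothetical, and the 8 Å cutoff is the threshold commonly used to define a contact, stated here as an assumption because the abstract does not give one.

```python
import bisect

# Hypothetical bin edges in angstroms; real predictors choose their own binning.
BIN_EDGES = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

def distance_to_class(distance, edges=BIN_EDGES):
    """Map a continuous distance to a discrete bin index in 0..len(edges)."""
    return bisect.bisect_right(edges, distance)

def distance_map_to_contact_map(dist_map, cutoff=8.0):
    """Binarize a distance map: a pair is a contact if its distance is below cutoff."""
    return [[d < cutoff for d in row] for row in dist_map]

print(distance_to_class(5.0))                                  # bin for 4 <= d < 6
print(distance_map_to_contact_map([[0.0, 9.2], [9.2, 0.0]]))
```

A regression-based method would instead predict the continuous value directly. A contact map keeps only one bit per residue pair, which is one way to see why a distance map can carry more guidance for structure modeling.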

ISSEC: inferring contacts among protein secondary structure elements using deep object detection

BMC Bioinformatics ◽

10.1186/s12859-020-03793-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Qi Zhang ◽

Jianwei Zhu ◽

Fusong Ju ◽

Lupeng Kong ◽

Shiwei Sun ◽

...

Keyword(s):

Secondary Structure ◽

Object Detection ◽

Structure Prediction ◽

Tertiary Structure ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Confidence Score ◽

Contact Map ◽

Residue Contact ◽

Residue Contacts

Abstract Background The formation of contacts among protein secondary structure elements (SSEs) is an important step in protein folding as it determines the topology of the protein tertiary structure; hence, inferring inter-SSE contacts is crucial to protein structure prediction. One existing strategy infers inter-SSE contacts directly from the predicted probabilities of inter-residue contacts without any preprocessing, and thus suffers from the excessive noise in the predicted inter-residue contacts. Another strategy first defines SSEs based on protein secondary structure prediction, and then judges whether each candidate SSE pair could form a contact. However, it is difficult to accurately determine the boundaries of SSEs because of errors in secondary structure prediction, and incorrectly deduced SSEs hinder the subsequent prediction of the contacts among them. Results We here report an accurate approach, called ISSEC, that infers inter-SSE contacts using the deep object detection technique. The design of ISSEC is based on the observation that, in the inter-residue contact map, contacting SSEs usually form rectangular regions with characteristic patterns. Therefore, ISSEC infers inter-SSE contacts by detecting such rectangular regions. Unlike the existing approach that directly uses the predicted probabilities of inter-residue contacts, ISSEC applies the deep convolution technique to extract high-level features from the inter-residue contacts. More importantly, ISSEC does not rely on pre-defined SSEs. Instead, ISSEC enumerates multiple candidate rectangular regions in the predicted inter-residue contact map, and for each region, ISSEC calculates a confidence score to measure whether it has characteristic patterns. ISSEC employs a greedy strategy to select non-overlapping regions with high confidence scores, and finally infers inter-SSE contacts according to these regions.
Conclusions Comprehensive experimental results suggest that ISSEC outperforms the state-of-the-art approaches in predicting inter-SSE contacts. We further demonstrated the successful application of ISSEC to improve the prediction of both inter-residue contacts and tertiary structure.
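
The greedy selection step described above can be sketched as follows. This is an illustration under stated assumptions, not ISSEC's code: rectangles are written as hypothetical `(row_start, row_end, col_start, col_end)` tuples with inclusive bounds, the confidence scores are made up, and `min_score` is an invented threshold.

```python
def overlaps(a, b):
    """True if two inclusive rectangles (r0, r1, c0, c1) share at least one cell."""
    return not (a[1] < b[0] or b[1] < a[0] or a[3] < b[2] or b[3] < a[2])

def greedy_select(candidates, min_score=0.5):
    """Pick non-overlapping rectangles in order of decreasing confidence.

    candidates: list of (score, rectangle) pairs.
    min_score: hypothetical confidence threshold below which regions are ignored.
    """
    chosen = []
    for score, rect in sorted(candidates, key=lambda c: c[0], reverse=True):
        if score < min_score:
            break  # remaining candidates are all less confident
        if all(not overlaps(rect, kept) for _, kept in chosen):
            chosen.append((score, rect))
    return chosen

candidates = [
    (0.9, (0, 4, 10, 14)),
    (0.8, (2, 6, 12, 16)),   # overlaps the first region, so it is skipped
    (0.7, (20, 24, 30, 34)),
    (0.3, (40, 44, 50, 54)), # below the confidence threshold
]
print(greedy_select(candidates))
```

Each retained rectangle would then be read as one inferred contact between the two sequence segments spanned by its rows and columns.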

Protein contact map refinement for improving structure prediction using generative adversarial networks

Bioinformatics ◽

10.1093/bioinformatics/btab220 ◽

2021 ◽

Author(s):

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Genki Terashi ◽

Aashish Jain ◽

Yuki Kagaya ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Substantial Improvement ◽

Supplementary Information ◽

Generative Adversarial Networks ◽

Contact Map ◽

Contact Prediction ◽

Adversarial Networks

Abstract Motivation Protein structure prediction remains one of the most important problems in computational biology and biophysics. In the past few years, protein residue–residue contact prediction has undergone substantial improvement, which has made it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has, therefore, become the forefront of protein structure prediction. Results We show a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GAN). ContactGAN made a significant improvement over predictions from recent contact prediction methods when tested on three datasets, including protein structure modeling targets in CASP13 and CASP14. We show an improvement in the precision of contact prediction, which translated into improved accuracy of protein tertiary structure models. On the other hand, the observed improvement over trRosetta was relatively small; the reasons for this are discussed. ContactGAN will be a valuable addition to the structure prediction pipeline to achieve an extra gain in contact prediction accuracy. Availability and implementation https://github.com/kiharalab/ContactGAN. Supplementary information Supplementary data are available at Bioinformatics online.
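
The abstract reports gains in contact prediction precision without saying which variant was measured. As a hedged illustration, the sketch below computes precision over the k highest-scoring predicted pairs, a widely used way to score contact maps. The function name, the `min_sep` sequence-separation filter, and the toy data are all assumptions made for the example.

```python
def top_k_precision(pred, native, k, min_sep=6):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    pred: square 2D list of predicted contact scores.
    native: square 2D list of booleans marking true contacts.
    min_sep: hypothetical minimum sequence separation |i - j| for a pair to count.
    """
    n = len(pred)
    scored = [(pred[i][j], i, j)
              for i in range(n) for j in range(i + min_sep, n)]
    scored.sort(reverse=True)  # highest score first
    top = scored[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if native[i][j])
    return hits / len(top)

# Toy 6-residue example with three confident predictions, two of them correct.
pred = [[0.1] * 6 for _ in range(6)]
native = [[False] * 6 for _ in range(6)]
pred[0][3], pred[0][4], pred[1][5] = 0.9, 0.8, 0.7
native[0][3] = native[1][5] = True
print(top_k_precision(pred, native, k=3, min_sep=3))
```

Comparing this value before and after refinement of the same predicted map is one simple way to quantify what a refinement step adds.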

Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.26194 ◽

2021 ◽

Author(s):

Ivan Anishchenko ◽

Minkyung Baek ◽

Hahnbeom Park ◽

Naozumi Hiranuma ◽

David E. Kim ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Tertiary Structure ◽

Tertiary Structure Prediction ◽

Protein Tertiary Structure ◽

Protein Tertiary Structure Prediction

Template-based prediction of protein structure with deep learning

BMC Genomics ◽

10.1186/s12864-020-07249-8 ◽

2020 ◽

Vol 21 (S11) ◽

Author(s):

Haicang Zhang ◽

Yufeng Shen

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Sequence ◽

Dynamic Programming Algorithm ◽

Tertiary Structure Prediction ◽

Protein Tertiary Structure ◽

Protein Threading ◽

Protein Tertiary Structure Prediction

Abstract Background Accurate prediction of protein structure is fundamentally important for understanding the biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for proteins with only distant homologs available. Results We propose a new template-based modeling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and applies a deep residual neural network to the prediction. ThreaderAI first employs deep learning to predict a residue-residue aligning probability matrix by integrating the sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method both in generating accurate template-query alignments and in protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modeling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs with fold-level similarity from SCOPe data. On CASP13’s TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9, and 8% in terms of TM-score, respectively.
Conclusions These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
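
The second stage described above, building an alignment from a residue-residue aligning probability matrix by dynamic programming, can be sketched roughly as follows. This is a generic global-alignment recurrence written for illustration, not ThreaderAI's implementation: the scoring (sum of aligned probabilities), the `gap` penalty parameter, and the toy matrix are all assumptions.

```python
def align(prob, gap=0.0):
    """Global alignment that maximizes the summed probability of aligned pairs.

    prob[i][j]: assumed probability that query residue i aligns to template
    position j. gap: hypothetical penalty charged for each skipped residue.
    Returns (best_score, list of aligned (query_index, template_index) pairs).
    """
    n, m = len(prob), len(prob[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + prob[i - 1][j - 1],  # align i with j
                           dp[i - 1][j] - gap,                      # skip query residue
                           dp[i][j - 1] - gap)                      # skip template position
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + prob[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs

# Toy matrix: 3 query residues against 4 template positions.
prob = [[0.9, 0.1, 0.0, 0.0],
        [0.1, 0.2, 0.8, 0.1],
        [0.0, 0.1, 0.1, 0.7]]
score, pairs = align(prob)
print(pairs)
```

On the toy matrix the traceback skips the second template position, which is the kind of gap placement a learned probability matrix is meant to guide.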
