RECURSIVE PROTEIN MODELING: A DIVIDE AND CONQUER STRATEGY FOR PROTEIN STRUCTURE PREDICTION AND ITS CASE STUDY IN CASP9

After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques — template-based modeling and template-free modeling — have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can significantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction.
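The divide step described above can be pictured with a small sketch. The abstract does not say how template-covered and template-free regions are identified, so the per-residue boolean mask `covered` and the helper names `split_regions` and `plan_modeling` are hypothetical, used only to illustrate assigning a modeling method to each contiguous region.

```python
from itertools import groupby


def split_regions(covered):
    """Split a per-residue coverage mask into contiguous runs.

    Returns a list of (start, end, is_covered) tuples with `end` exclusive.
    `covered` is an assumed input: True where a template aligns to the residue.
    """
    regions = []
    pos = 0
    for flag, run in groupby(covered):
        length = sum(1 for _ in run)
        regions.append((pos, pos + length, bool(flag)))
        pos += length
    return regions


def plan_modeling(covered):
    """Label each region with the modeling technique it would be sent to."""
    return [
        (start, end, "template-based" if is_cov else "template-free")
        for start, end, is_cov in split_regions(covered)
    ]


if __name__ == "__main__":
    mask = [True, True, False, False, False, True]
    for start, end, method in plan_modeling(mask):
        print(f"residues {start}-{end - 1}: {method}")
```

In a recursive scheme, a template-free region could itself be searched for templates again and split further; that loop is omitted here because the abstract gives no details about it.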

Recursive protein modeling: A divide and conquer strategy for protein structure prediction and its case study in CASP9

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) ◽

10.1109/bibmw.2011.6112398 ◽

2011 ◽

Cited By ~ 4

Author(s):

Jianlin Cheng ◽

Zheng Wang ◽

J. Eickholt ◽

Xin Deng

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Protein Modeling ◽

Divide And Conquer

Deep template-based protein structure prediction

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008954 ◽

2021 ◽

Vol 17 (5) ◽

pp. e1008954

Author(s):

Fandi Wu ◽

Jinbo Xu

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Conditional Random Fields ◽

3D Models ◽

Query Protein ◽

Distance Information ◽

Alternating Direction ◽

Template Free

Motivation: Protein structure prediction has been greatly improved by deep learning, but most efforts have been devoted to template-free modeling, and very few deep learning methods have been developed for TBM (template-based modeling), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. NDThreader then uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of a predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it, together with sequence coevolution information, into a deep ResNet to predict the inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as part of the RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.

Genetic Algorithm in Ab Initio Protein Structure Prediction Using Low Resolution Model: A Review

Biomedical Data and Applications - Studies in Computational Intelligence ◽

10.1007/978-3-642-02193-0_14 ◽

2009 ◽

pp. 317-342 ◽

Cited By ~ 17

Author(s):

Md. Tamjidul Hoque ◽

Madhu Chetty ◽

Abdul Sattar

Keyword(s):

Genetic Algorithm ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Low Resolution ◽

Resolution Model

A Multi-Objective Stochastic Optimization Approach for Decoy Generation in Template-Free Protein Structure Prediction

Biophysical Journal ◽

10.1016/j.bpj.2018.11.365 ◽

2019 ◽

Vol 116 (3) ◽

pp. 59a

Author(s):

Ahmed Bin Zaman ◽

Amarda Shehu

Keyword(s):

Protein Structure ◽

Stochastic Optimization ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Optimization Approach ◽

Free Protein ◽

Multi Objective ◽

Template Free

3P-012 Toward the Success of Template-Free Protein Structure Prediction: Dependency of Performance on the Fragment Libraries (Protein: Structure, The 47th Annual Meeting of the Biophysical Society of Japan)

Seibutsu Butsuri ◽

10.2142/biophys.49.s152_5 ◽

2009 ◽

Vol 49 (supplement) ◽

pp. S152

Author(s):

Shintaro Minami ◽

George Chikenji

Keyword(s):

Protein Structure ◽

Annual Meeting ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Free Protein ◽

Biophysical Society ◽

Template Free ◽

Fragment Libraries

Deep Template-based Protein Structure Prediction

10.1101/2020.12.26.424433 ◽

2020 ◽

Author(s):

Fandi Wu ◽

Jinbo Xu

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Random Fields ◽

Structure Prediction ◽

Conditional Random Fields ◽

3D Models ◽

Query Protein ◽

Supplementary Information ◽

Distance Information ◽

Alternating Direction

Motivation: TBM (template-based modeling) is a popular method for protein structure prediction. When very good templates are not available, it is challenging to identify the best templates, build accurate sequence-template alignments and construct 3D models from alignments. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. NDThreader then uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of a predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it, together with sequence co-evolution information, into a deep ResNet to predict the inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results on the CASP13 and CAMEO data show that our methods outperform existing ones such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as part of the RaptorX server, which obtained the best GDT score among all CASP14 servers on the 58 TBM targets. Availability and Implementation: Available as part of a web server at http://[email protected] Supplementary Information: Supplementary data are available online.

MULTICOM2: an open-source protein structure prediction system powered by deep learning and distance prediction

10.21203/rs.3.rs-339464/v1 ◽

2021 ◽

Author(s):

Tianqi Wu ◽

Jian Liu ◽

Zhiye Guo ◽

Jie Hou ◽

Jianlin Cheng

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Open Source ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Modeling Method ◽

Structure Modeling ◽

Prediction System ◽

Template Free

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
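For readers unfamiliar with the TM-score figures quoted above, the following is a minimal sketch of the standard formula for a single, already-superimposed alignment: each aligned residue pair at distance d contributes 1 / (1 + (d / d0)^2), and the sum is normalized by the target length, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The function names, the `d0_min` floor of 0.5 Å for short targets, and the assumption that the distances come from a fixed superposition are choices made for this illustration; the full TM-score also maximizes over superpositions, which is not done here.

```python
import math


def tm_d0(target_len, d0_min=0.5):
    """Length-dependent distance scale d0 (in angstroms).

    The floor `d0_min` is an assumption of this sketch so that very short
    targets do not yield a zero or negative scale.
    """
    if target_len <= 15:
        return d0_min
    return max(d0_min, 1.24 * (target_len - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_len):
    """TM-score for one fixed superposition.

    `distances` holds the distance (angstroms) between each aligned residue
    pair; residues that are not aligned simply contribute nothing.
    """
    d0 = tm_d0(target_len)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_len


if __name__ == "__main__":
    # A 100-residue target with every residue aligned 2 angstroms away.
    print(round(tm_score([2.0] * 100, 100), 3))
```

Because the sum is divided by the target length rather than the number of aligned pairs, a model that covers only half of the target perfectly scores 0.5, not 1.0.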

CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction

10.1101/2020.10.06.327585 ◽

2020 ◽

Author(s):

Fusong Ju ◽

Jianwei Zhu ◽

Bin Shao ◽

Lupeng Kong ◽

Tie-Yan Liu ◽

...

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Protein ◽

Spatial Proximity ◽

Multiple Sequence ◽

Variance Matrix

Protein functions are largely determined by the final details of their tertiary structures, and the structures could be accurately reconstructed based on inter-residue distances. Residue co-evolution has become the primary principle for estimating inter-residue distances since the residues in close spatial proximity tend to co-evolve. The widely-used approaches infer residue co-evolution using an indirect strategy, i.e., they first extract from the multiple sequence alignment (MSA) of query protein some handcrafted features, say, co-variance matrix, and then infer residue co-evolution using these features rather than the raw information carried by MSA. This indirect strategy always leads to considerable information loss and inaccurate estimation of inter-residue distances. Here, we report a deep neural network framework (called CopulaNet) to learn residue co-evolution directly from MSA without any handcrafted features. The CopulaNet consists of two key elements: i) an encoder to model context-specific mutation for each residue, and ii) an aggregator to model correlations among residues and thereafter infer residue co-evolutions. Using the CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrated the successful application of CopulaNet for estimating inter-residue distances and further predicting protein tertiary structure with improved accuracy and efficiency. Head-to-head comparison suggested that for 24 out of the 31 free modeling CASP13 domains, ProFOLD outperformed AlphaFold, one of the state-of-the-art prediction approaches.
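As an illustration of the kind of handcrafted feature the abstract contrasts CopulaNet with, the sketch below computes the covariance between two MSA columns as the joint residue-pair frequency minus the product of the single-column frequencies. This is a generic textbook-style statistic, not CopulaNet or any specific published pipeline; the function name `column_pair_covariance` and the toy alignment are hypothetical, and sequence weighting and pseudocounts are deliberately left out.

```python
from collections import Counter


def column_pair_covariance(msa, i, j):
    """Covariance between residue types at MSA columns i and j.

    `msa` is a list of equal-length aligned sequences. The result maps each
    (a, b) residue pair to f_ij(a, b) - f_i(a) * f_j(b), where the f values
    are plain observed frequencies (no weighting or pseudocounts).
    """
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)
    return {
        (a, b): fij[(a, b)] / n - (fi[a] / n) * (fj[b] / n)
        for a in fi
        for b in fj
    }


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 vary together (A with C, G with T).
    toy_msa = ["AC", "AC", "GT", "GT"]
    for pair, value in sorted(column_pair_covariance(toy_msa, 0, 1).items()):
        print(pair, value)
```

Positive values mark residue pairs that co-occur more often than independence would predict, which is the signal co-evolution methods read as a hint of spatial proximity.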

Decoy selection for protein structure prediction via extreme gradient boosting and ranking

BMC Bioinformatics ◽

10.1186/s12859-020-3523-9 ◽

2020 ◽

Vol 21 (S1) ◽

Author(s):

Nasrin Akhter ◽

Gopinath Chennupati ◽

Hristo Djidjev ◽

Amarda Shehu

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Energy Landscape ◽

Biologically Active ◽

Test Cases ◽

Consensus Methods ◽

Extreme Gradient Boosting ◽

Template Free

Background: Identifying one or more biologically-active/native decoys from millions of non-native decoys is one of the major challenges in computational structural biology. The extreme lack of balance in positive and negative samples (native and non-native decoys) in a decoy set makes the problem even more complicated. Consensus methods show varied success in handling the challenge of decoy selection despite some issues associated with clustering large decoy sets and decoy sets that do not show much structural similarity. Recent investigations into energy landscape-based decoy selection approaches show promise. However, lack of generalization over varied test cases remains a bottleneck for these methods. Results: We propose a novel decoy selection method, ML-Select, a machine learning framework that exploits the energy landscape associated with the structure space probed through template-free decoy generation. The proposed method outperforms both clustering and energy ranking-based methods while consistently offering better performance on varied test cases. Moreover, ML-Select shows promising results even for decoy sets consisting mostly of low-quality decoys. Conclusions: ML-Select is a useful method for decoy selection. This work suggests further research into more effective ways to adopt machine learning frameworks to achieve robust performance for decoy selection in template-free protein structure prediction.
