A database assisted protein structure prediction method via a swarm intelligence algorithm

Proteins play vital molecular roles in all living organisms. Determining protein structure experimentally is difficult, so theoretical prediction methods offer a useful alternative. Predicting the 3D structure of proteins is very important in biology, since it can lead to the discovery of useful drugs and enzymes, and it is currently considered an important research domain. Protein structure prediction is concerned with identifying a protein's tertiary structure. From the computational point of view, different models (protein representations) have been developed, along with efficient optimization methods, to predict protein structure. Bio-inspired computation is mostly used for the optimization step when solving for protein structure, and these algorithms have recently received great interest and attention in the literature. This chapter discusses the key features of five recently developed types of bio-inspired computational algorithms applied to protein structure prediction problems.
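To make the optimization step concrete, here is a minimal particle swarm optimization (PSO) sketch, PSO being one common swarm-intelligence algorithm. It is not the chapter's method: `toy_energy` is a hypothetical smooth stand-in over torsion-like angles rather than a real protein energy function, and the inertia and acceleration parameters (`w`, `c1`, `c2`) are typical values chosen for illustration.

```python
import math
import random

def toy_energy(angles):
    """Hypothetical stand-in for a protein energy function: a smooth periodic
    landscape over torsion-like angles (radians) with its minimum at 0."""
    return sum(1.0 - math.cos(a) for a in angles)

def pso_minimize(energy, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over angles restricted to [-pi, pi]."""
    rng = random.Random(seed)
    lo, hi = -math.pi, math.pi
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [energy(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = energy(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso_minimize(toy_energy, dim=6)
print(round(best_val, 4))
```

Swapping `toy_energy` for a real scoring function is where an actual protein structure prediction method would differ; the swarm update itself is the part this sketch illustrates.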

Study of Real-Valued Distance Prediction For Protein Structure Prediction with Deep Learning

10.1101/2020.11.26.400523 · 2020

Author(s): Jin Li, Jinbo Xu

Keyword(s): Protein Structure, Protein Structure Prediction, Structure Prediction, 3D Structure, Prediction Method, Structure Modeling, Contact Prediction, Real Value, 3D Structure Modeling, Distance Prediction

Inter-residue distance prediction by deep ResNet (convolutional residual neural network) has greatly advanced protein structure prediction. Currently, the most successful structure prediction methods predict distance by discretizing it into dozens of bins. Here we study how well real-valued distance can be predicted and how useful it is for 3D structure modeling, by comparing it with discrete-valued prediction based on the same deep ResNet. Unlike recent methods that predict only a single real value for the distance of an atom pair, we predict both the mean and the standard deviation of a distance and then employ a novel method to fold a protein using the predicted mean and deviation. Our findings include: 1) tested on the CASP13 FM (free-modeling) targets, our real-valued distance prediction obtains 81% precision on top L/5 long-range contact prediction, much better than the best CASP13 results (70%); 2) our real-valued prediction can predict correct folds for the same number of CASP13 FM targets as the best CASP13 group, despite generating only 20 decoys for each target; 3) our method greatly outperforms DeepDist, a very recent real-valued prediction method, in both contact prediction and 3D structure modeling; and 4) when the same deep ResNet is used, our real-valued distance prediction has 1-6% higher contact and distance accuracy than our own discrete-valued prediction, but yields less accurate 3D structure models.
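One way to connect a predicted mean and standard deviation to the top L/5 long-range contact precision quoted above is sketched below. It assumes, for illustration only, that each distance follows Normal(mean, std), so the contact probability is Φ((8 − μ)/σ); the abstract does not say how the authors turn their predictions into contact scores. Long-range pairs are taken as sequence separation ≥ 24 (the usual CASP convention), and `pred` and `true_dist` are hypothetical toy inputs.

```python
import math

def contact_probability(mean, std, cutoff=8.0):
    """P(d < cutoff) under the illustrative assumption d ~ Normal(mean, std)."""
    return 0.5 * (1.0 + math.erf((cutoff - mean) / (std * math.sqrt(2.0))))

def top_l5_long_range_precision(pred, true_dist, length, min_sep=24, cutoff=8.0):
    """Rank long-range pairs by contact probability; return precision of the top L/5.
    pred maps (i, j) -> (mean, std); true_dist maps (i, j) -> true distance in Å."""
    pairs = [p for p in pred if abs(p[0] - p[1]) >= min_sep]
    pairs.sort(key=lambda p: contact_probability(*pred[p], cutoff), reverse=True)
    top = pairs[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for p in top if true_dist[p] < cutoff)
    return hits / len(top)

# Hypothetical toy data: two long-range pairs and one short-range pair (ignored)
pred = {(0, 30): (5.0, 1.0), (1, 40): (12.0, 2.0), (2, 5): (4.0, 0.5)}
true_dist = {(0, 30): 6.0, (1, 40): 10.0, (2, 5): 4.0}
precision = top_l5_long_range_precision(pred, true_dist, length=10)
```

Predicting σ alongside μ lets the ranking reflect uncertainty: a pair with mean 9 Å but large σ can still outrank a confidently distant pair.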

New 6₃ knot and other knots in human proteome from AlphaFold predictions

10.1101/2021.12.30.474018 · 2022

Author(s): Agata Paulina Perlinska, Wanda Helena Niemyska, Bartosz Ambrozy Gren, Pawel Rubach, Joanna Ida Sulkowska

Keyword(s): Machine Learning, Protein Structure, Structure Prediction, Prediction Method, Human Proteome, Mathematical Notation, Complex Type, Human Proteins, Knotted Proteins, Structure Prediction Method

AlphaFold is a new, highly accurate machine learning protein structure prediction method that outperforms other methods. Recently, this method was used to predict the structure of 98.5% of human proteins. Here we analyze the structures of these AlphaFold-predicted human proteins for the presence of knots. We found that the human proteome contains 65 robustly knotted proteins, including the most complex type of knot yet reported in proteins. That knot type, denoted 6₃ in mathematical notation, would necessitate a more complex folding path than any knotted protein characterized to date. In some cases AlphaFold structure predictions are not highly accurate, which either makes their topology hard to verify or results in topological artifacts. We deposited the other structures we found (knotted, potentially knotted, and those with artifact knots) in a database available at: https://knotprot.cent.uw.edu.pl/alphafold.

Improving deep learning-based protein distance prediction in CASP14

10.1101/2021.02.02.429462 · 2021

Author(s): Zhiye Guo, Tianqi Wu, Jian Liu, Jie Hou, Jianlin Cheng

Keyword(s): Deep Learning, Protein Structure, Protein Structure Prediction, Structure Prediction, Prediction Method, Learning Method, Sequence Alignments, Evolutionary Features, Protein Distance, Distance Prediction

Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions (i.e. classifying distances between two residues into two categories: in contact (< 8 Angstrom) and not in contact otherwise) and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the F1 measure. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps.
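With a binned (discrete) distance predictor, a contact probability can be obtained by summing the probability mass of the bins below 8 Å, and the abstract's confidence signal is the average probability of the predicted contacts. The sketch below uses hypothetical bin edges (not the CASP14 intervals) and assumes "predicted contacts" means the top L/5 long-range pairs; both are illustrative choices rather than MULTICOM's documented procedure.

```python
import math

def contact_prob_from_bins(bin_probs, bin_edges, cutoff=8.0):
    """Sum the probability of distance bins lying entirely below `cutoff` (Å).
    `bin_edges` has len(bin_probs) + 1 entries; the last edge may be math.inf."""
    return sum(p for p, upper in zip(bin_probs, bin_edges[1:]) if upper <= cutoff)

def map_confidence(contact_probs, length, min_sep=24):
    """Average contact probability of the top-L/5 long-range pairs.
    `contact_probs` maps (i, j) -> probability that residues i and j are in contact."""
    long_range = sorted(
        (p for (i, j), p in contact_probs.items() if abs(i - j) >= min_sep),
        reverse=True,
    )
    top = long_range[: max(1, length // 5)]
    return sum(top) / len(top) if top else 0.0

edges = [0.0, 4.0, 6.0, 8.0, 10.0, math.inf]  # hypothetical bin edges in Å
probs = [0.1, 0.2, 0.3, 0.25, 0.15]           # one residue pair's predicted distribution
p_contact = contact_prob_from_bins(probs, edges)
confidence = map_confidence({(0, 30): 0.9, (1, 40): 0.5, (2, 5): 0.99}, length=10)
```

A score like `confidence` could then be compared across several predicted maps for the same target to pick one, which is the selection use the abstract describes.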

CGLFold: a contact-assisted de novo protein structure prediction using global exploration and loop perturbation sampling algorithm

Bioinformatics · 10.1093/bioinformatics/btz943 · 2019 · Vol 36 (8) · pp. 2443-2450

Author(s): Jun Liu, Xiao-Gen Zhou, Yang Zhang, Gui-Jun Zhang

Keyword(s): Protein Structure, Protein Structure Prediction, Structure Prediction, De Novo, Differential Evolution Algorithm, Prediction Method, Sampling Strategy, Conformational Space, Supplementary Information, Local Perturbation

Motivation: Regions that connect secondary structure elements in a protein are known as loops; even slight changes in a loop can have a dramatic effect on the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy.

Results: A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, fragment recombination and assembly are used to explore the massive conformational space and generate native-like topologies. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable cooperation between global exploration and local exploitation. Filtered contact information is used to construct the conformation selection model that guides the sampling. The proposed CGLFold is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and the success rate of conformational updates, and gradually improve conformation accuracy. CGLFold obtains models with a template modeling score (TM-score) ≥ 0.5 on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12.

Availability and implementation: The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold.

Supplementary information: Supplementary data are available at Bioinformatics online.
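The loop perturbation phase is solved by differential evolution (DE). A minimal DE/rand/1/bin sketch over loop torsion angles looks like this; `toy_loop_energy` and its target angles are hypothetical stand-ins for CGLFold's contact-based selection model, and wrapping angles back into [−π, π) is a design choice of this sketch, not something the abstract specifies.

```python
import math
import random

def toy_loop_energy(angles):
    """Hypothetical stand-in objective over loop torsion angles (radians).
    A real method would score whole conformations, e.g. against contact restraints."""
    target = [-1.0, 2.0, -1.2, 2.4]  # hypothetical 'native' loop angles
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, target))

def differential_evolution(f, dim, pop_size=20, gens=150, F=0.5, CR=0.9, seed=1):
    """Classic DE/rand/1/bin minimization over angles in [-pi, pi]."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-math.pi, math.pi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # mutation
                    v = (v + math.pi) % (2 * math.pi) - math.pi  # wrap periodic angle
                else:
                    v = pop[i][j]  # inherit from the target vector (binomial crossover)
                trial.append(v)
            ft = f(trial)
            if ft <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]

angles, energy = differential_evolution(toy_loop_energy, dim=4)
```

Greedy one-to-one replacement means no population slot ever gets worse, which suits the local exploitation role the loop phase plays next to the global fragment-assembly phase.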

Sequence Specific Dihedral Angle Distribution: Application in Protein Structure Prediction and Evaluation

Plant Tissue Culture and Biotechnology · 10.3329/ptcb.v19i2.5439 · 2009 · Vol 19 (2) · pp. 217-226

Author(s): S. M. Minhaz Ud-Dean, Mahdi Muhammad Moosa

Keyword(s): Protein Structure, Dihedral Angle, Protein Structure Prediction, Structure Prediction, Protein Structures, Angle Distribution, Ramachandran Plot, Specific Data, Specific Distribution, Structure Evaluation

Protein structure prediction and evaluation are major fields of computational biology. Estimating dihedral angles can provide information about the acceptability of both theoretically predicted and experimentally determined structures. Here we report the sequence-specific dihedral angle distribution of high-resolution protein structures available in the PDB and present Sasichandran, a tool for sequence-specific dihedral angle prediction and structure evaluation. This tool allows a protein structure in PDB format to be evaluated against the sequence-specific distribution of Ramachandran angles. Additionally, it allows retrieval of the most probable Ramachandran angles for a given sequence, along with the sequence-specific data.

Key words: Torsion angle, φ-ψ distribution, sequence-specific Ramachandran plot, Ramasekharan, protein structure appraisal

D.O.I. 10.3329/ptcb.v19i2.5439 Plant Tissue Cult. & Biotech. 19(2): 217-226, 2009 (December)
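Backbone φ is the dihedral over atoms C(i−1), N, CA, C, and ψ is the dihedral over N, CA, C, N(i+1). The sketch below computes a dihedral from four coordinates and then looks up the most probable φ/ψ pair for one sequence context from a 2D histogram. The 10° bin width and the idea of grouping observations per sequence context are assumptions for illustration; the abstract does not describe how Sasichandran bins or indexes its data.

```python
import math
from collections import Counter

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def most_probable_angles(observations, bin_width=10.0):
    """observations: (phi, psi) pairs in degrees for one sequence context.
    Returns the centre of the most populated phi/psi histogram bin."""
    counts = Counter((math.floor(phi / bin_width), math.floor(psi / bin_width))
                     for phi, psi in observations)
    (i, j), _ = counts.most_common(1)[0]
    return (i * bin_width + bin_width / 2, j * bin_width + bin_width / 2)

# Cis arrangement of four points gives a dihedral of 0 degrees
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
# Hypothetical observations for one context: two helical pairs, one beta pair
mode = most_probable_angles([(-63.0, -42.0), (-65.0, -44.0), (120.0, 130.0)])
```

Evaluating a structure would then amount to computing φ/ψ for each residue with `dihedral` and checking how populated its bin is in the matching sequence-specific histogram.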
