PREDICTING PROTEIN FOLDING RATE FROM AMINO ACID SEQUENCE

2011 ◽  
Vol 09 (01) ◽  
pp. 1-13 ◽  
Author(s):  
JIANXIU GUO ◽  
NINI RAO

Predicting protein folding rate from amino acid sequence is an important challenge in computational and molecular biology. Over the past few years, many methods have been developed to capture the correlation between folding rates and protein structures and sequences. In this paper, we present an effective method, a combined neural network-genetic algorithm approach, to predict protein folding rates from amino acid sequences alone, without any explicit structural information. The originality of this paper is that, for the first time, it tackles the effect of sequence order. The proposed method provides a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.80 and the standard error is 2.65 for 93 proteins, the largest such database of proteins yet studied, when evaluated with the leave-one-out jackknife test. The comparative results demonstrate that this correlation is better than that of most other methods, and suggest that sequence order information contributes importantly to the determination of protein folding rates.
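The evaluation protocol named in this abstract, a leave-one-out jackknife test scored by the correlation between predicted and experimental rates, can be sketched as follows. This is only an illustration: a one-feature least-squares line stands in for the paper's neural network-genetic algorithm model, and the function names and the single numeric feature are assumptions, not taken from the paper.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def jackknife_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)  # stand-in model (assumption)
        preds.append(slope * xs[i] + intercept)
    return preds

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)
```

Calling `pearson(jackknife_predictions(features, rates), rates)` on a list of per-protein features and experimental rates gives the jackknife correlation; because every prediction comes from a model that never saw the held-out protein, the score is not inflated by fitting to the test sample.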

2021 ◽  
Vol 12 ◽  
Author(s):  
Ruifang Li ◽  
Hong Li ◽  
Xue Feng ◽  
Ruifeng Zhao ◽  
Yongxia Cheng

Many works have reported that protein folding rates are influenced by the characteristics of amino acid sequences and protein structures. However, few reports address whether the corresponding mRNA sequences are related to the protein folding rates. An mRNA sequence is regarded as a kind of genetic language, and its vocabulary and phraseology must provide influential information regarding the protein folding rate. In the present work, linear regressions between the vocabulary and phraseology parameters of mRNA sequences and the corresponding protein folding rates were analyzed. The results indicated that the D2 (adjacent base-related information redundancy) values and the GC content of the corresponding mRNA sequences show significant negative relations with the protein folding rates, whereas the D1 (single base information redundancy) values show significant positive relations. In addition, the relationships between the parameters of the genetic language and the corresponding protein folding rates differ clearly among protein groups. Some useful parameters that are related to protein folding rates were found. The results indicate that, when predicting protein folding rates, the information from protein structures and their amino acid sequences is insufficient, and some information regulating the protein folding rates must be derived from the mRNA sequences.
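The abstract names three mRNA parameters (GC content, D1 and D2) without giving formulas. The sketch below uses one common information-theoretic reading, which is an assumption here and may differ from the authors' definitions: D1 is taken as 2 bits minus the Shannon entropy of single-base frequencies, and D2 as the mutual information between adjacent bases.

```python
import math
from collections import Counter

def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def d1(seq):
    """Assumed definition: log2(4) minus the single-base Shannon entropy (bits)."""
    counts = Counter(seq.upper())
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy

def d2(seq):
    """Assumed definition: mutual information (bits) between adjacent bases."""
    seq = seq.upper()
    pairs = Counter(zip(seq, seq[1:]))
    n = sum(pairs.values())
    first = Counter()   # marginal counts of the first base in each pair
    second = Counter()  # marginal counts of the second base in each pair
    for (a, b), c in pairs.items():
        first[a] += c
        second[b] += c
    total = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / n
        total += p_ab * math.log2(p_ab / ((first[a] / n) * (second[b] / n)))
    return total
```

Under these assumed definitions, a sequence with uniform base usage has D1 equal to 0, and a sequence whose adjacent bases are statistically independent has D2 equal to 0; both values grow as the sequence departs from randomness.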


2011 ◽  
Vol 37 (12) ◽  
pp. 1331-1338 ◽  
Author(s):  
Jian-Xiu GUO ◽  
Ni-Ni RAO ◽  
Guang-Xiong LIU ◽  
Jie LI ◽  
Yun-He WANG

2011 ◽  
Vol 378-379 ◽  
pp. 157-160
Author(s):  
Jian Xiu Guo ◽  
Ni Ni Rao

Understanding the relationship between amino acid sequences and folding rates of proteins is an important challenge in computational and molecular biology. None of the existing algorithms for predicting protein folding rates has taken sequence coupling effects into account. In this work, a novel algorithm was developed for predicting protein folding rates from amino acid sequences. The prediction was based on dipeptide composition, in which the sequence coupling effects are explicitly included through a series of conditional probability elements. On a non-redundant dataset of 99 proteins, the proposed method provided excellent agreement between the predicted and experimental folding rates when evaluated with the jackknife test. The correlation coefficient was 87.7% and the standard error was 2.04, which indicates an important contribution of sequence coupling effects to the determination of protein folding rates.
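One way to picture dipeptide composition expressed through conditional probability elements is the sketch below. It estimates, for every ordered pair of the 20 standard residues, the probability that residue b immediately follows residue a. The exact formulation used in the paper is not given in the abstract, so the function name and the normalization are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def conditional_dipeptide_features(seq):
    """Return a dict mapping each dipeptide 'ab' to an estimate of P(b follows a)."""
    seq = seq.upper()
    pair_counts = Counter(zip(seq, seq[1:]))
    # Count each residue only where it has a successor, so the
    # probabilities for a given first residue sum to at most 1.
    first_counts = Counter(seq[:-1])
    features = {}
    for a in AMINO_ACIDS:
        for b in AMINO_ACIDS:
            denom = first_counts[a]
            features[a + b] = pair_counts[(a, b)] / denom if denom else 0.0
    return features
```

The result is a fixed-length vector of 400 values per protein, which can be fed to any regression model; unlike plain amino acid composition, it changes when the residue order changes.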


2017 ◽  
Vol 15 (03) ◽  
pp. 1750009 ◽  
Author(s):  
Bruno Grisci ◽  
Márcio Dorn

The development of computational methods to accurately model three-dimensional protein structures from sequences of amino acid residues is becoming increasingly important to the structural biology field. This paper addresses the challenge of predicting the tertiary structure of a given amino acid sequence, which has been reported to belong to the NP-Complete class of problems. We present a new method, namely NEAT–FLEX, based on NeuroEvolution of Augmenting Topologies (NEAT) to extract structural features from (ABS) proteins that are determined experimentally. The proposed method manipulates structural information from the Protein Data Bank (PDB) and predicts the conformational flexibility (FLEX) of residues of a target amino acid sequence. This information may be used in three-dimensional structure prediction approaches as a way to reduce the conformational search space. The proposed method was tested with 24 different amino acid sequences. Evolving neural networks were compared against a traditional error back-propagation algorithm; results show that the proposed method is a powerful way to extract and represent structural information from protein molecules that are determined experimentally.


2020 ◽  
Vol 27 (4) ◽  
pp. 321-328 ◽  
Author(s):  
Yanru Li ◽  
Ying Zhang ◽  
Jun Lv

Background: Protein folding rate is mainly determined by the size of the conformational space to be searched, which in turn is dictated by factors such as the size, structure and amino acid sequence of a protein. It is important to integrate these factors effectively to form a more precise description of the conformational space, but there is no general paradigm for doing so beyond some intuitions and empirical rules. Therefore, at the present stage, predictions of the folding rate can be improved by finding new factors, which also gives some insight into this question. Objective: The purpose is to propose a new parameter that describes the size of the conformational space in order to improve the prediction accuracy of protein folding rates. Method: Based on the optimal set of amino acids in a protein, an effective cumulative backbone torsion angle (CBTAeff) was proposed to describe the size of the conformational space. A linear regression model was used to predict protein folding rate with CBTAeff as a parameter. The degree of correlation was described by the coefficient of determination and the mean absolute error (MAE) between the predicted folding rates and experimental observations. Results: A high correlation (coefficient of determination of 0.70 and MAE of 1.88) was achieved between the logarithm of experimental folding rates and (CBTAeff)^0.5 over 112 two- and multi-state folding proteins. Conclusion: The remarkable performance of our simplistic model demonstrates that CBTA based on the optimal set is the major determinant of the conformational space of natural proteins.
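The regression and the two scores reported in this abstract can be sketched as below, assuming the CBTAeff value of each protein is already available (the abstract does not say how it is computed, so it is treated as an input). The function names are hypothetical.

```python
import math

def fit_sqrt_model(cbta_eff, log_rates):
    """Least-squares fit of log folding rate = slope * sqrt(CBTAeff) + intercept."""
    xs = [math.sqrt(v) for v in cbta_eff]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(log_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, log_rates))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r2_and_mae(cbta_eff, log_rates, slope, intercept):
    """Coefficient of determination and mean absolute error of the fitted line."""
    preds = [slope * math.sqrt(v) + intercept for v in cbta_eff]
    mean_y = sum(log_rates) / len(log_rates)
    ss_res = sum((y - p) ** 2 for y, p in zip(log_rates, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in log_rates)
    mae = sum(abs(y - p) for y, p in zip(log_rates, preds)) / len(preds)
    return 1.0 - ss_res / ss_tot, mae
```

The coefficient of determination measures the share of variance in the log rates explained by the line, while the MAE stays in the units of the log rate, so the two scores are complementary.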


2020 ◽  
Vol 27 (4) ◽  
pp. 303-312 ◽  
Author(s):  
Ruifang Li ◽  
Hong Li ◽  
Sarula Yang ◽  
Xue Feng

Background: Protein folding rates are currently believed to be influenced by protein structure, environment, temperature, amino acid sequence and other factors. We have long been working to determine whether and in what ways mRNA affects the protein folding rate. In our previous research, a large number of palindromes drew our attention. Do these palindromes have an important influence on protein folding rates, and if so, by what mechanism? Very few studies have addressed these questions. Objective: This article aims to find out whether palindromes have an important influence on protein folding rates and, if so, by what mechanism. Method: The parameters of the palindromes were defined and calculated, and linear regression analysis was performed between the values of each parameter and the experimental protein folding rates. Furthermore, to compare the results for different kinds of proteins, the proteins were classified into two-state proteins and multi-state proteins, and the same linear regression analysis was performed for each group. Results: Protein folding rates were negatively correlated with palindrome frequencies for all proteins. An extremely significant negative linear correlation was found between palindrome densities and protein folding rates. In addition, bases that are used by several different palindromes simultaneously have an important effect on the relationship between palindrome density and protein folding rate. Conclusion: Palindromes have an important influence on protein folding rates, and bases shared simultaneously by different palindromes play a key role in this influence.
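The abstract does not define its palindrome parameters, so the sketch below rests on stated assumptions: a palindrome is taken to be a stretch of mRNA equal to its own reverse complement, the length bounds `min_len` and `max_len` are hypothetical, frequency is taken as palindromes per base, and density as the fraction of bases covered by at least one palindrome. The count of bases covered by more than one palindrome is one possible reading of the "repeatedly used bases".

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def is_rc_palindrome(s):
    """True if s equals its own reverse complement."""
    return s == s.translate(COMPLEMENT)[::-1]

def find_palindromes(seq, min_len=4, max_len=10):
    """Return (start, end) spans of reverse-complement palindromes.

    min_len is assumed to be even: a reverse-complement palindrome
    cannot have odd length, so only even lengths are scanned.
    """
    seq = seq.upper().replace("T", "U")
    hits = []
    for start in range(len(seq)):
        for length in range(min_len, max_len + 1, 2):
            end = start + length
            if end > len(seq):
                break
            if is_rc_palindrome(seq[start:end]):
                hits.append((start, end))
    return hits

def palindrome_stats(seq, min_len=4, max_len=10):
    """Frequency, density and shared-base count under the assumed definitions."""
    hits = find_palindromes(seq, min_len, max_len)
    coverage = [0] * len(seq)
    for start, end in hits:
        for i in range(start, end):
            coverage[i] += 1
    n = len(seq)
    return {
        "frequency": len(hits) / n,                      # palindromes per base
        "density": sum(1 for c in coverage if c) / n,    # covered fraction
        "shared_bases": sum(1 for c in coverage if c > 1),
    }
```

For the six-base sequence GAAUUC, for example, both the full sequence and its inner AAUU are palindromes under this definition, so the four inner bases are shared by two palindromes.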


2005 ◽  
Vol 03 (06) ◽  
pp. 1391-1409 ◽  
Author(s):  
LU-YONG WANG

Local structural information is thought to be frequently encoded in local amino acid sequences. Previous research indicated only that some positions in particular local structures have specific residue preferences. However, correlated pairwise replacements for interacting residues in recurrent local structural motifs from unrelated proteins have not been studied systematically. We introduce a new method that fuses statistical covariation analysis with local structure-based alignment. Systematic analysis of structure-based multiple alignments of recurrent local structures from unrelated proteins in a representative subset of the Protein Data Bank indicates that statistically significant covarying residue pairs exist in local structural motifs, in particular β-turns and helix caps. These residue pairs are mostly linked through polar functional groups with direct or indirect hydrogen bonding. Hydrophobic interaction is also a major factor constraining pairwise amino acid residue replacement in recurrent local structures. We also found correlated residue pairs that are not clearly linked by through-space interactions; the physical constraints underlying these covariations are less clear. The existence of sequence covariations in local structural motifs from unrelated proteins indicates that many relics of local relations are still retained in the tertiary structures after protein folding. It supports the notion that some local structural information is encoded in local sequences and that the local structural codes could play important roles in determining the native-state protein folding topology.
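Covariation between two positions of a multiple alignment is often scored with mutual information; the sketch below uses that measure purely as an illustration, since the abstract does not state which covariation statistic or significance test the paper applies. The function name and the gap handling are assumptions.

```python
import math
from collections import Counter

def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between alignment columns i and j.

    alignment is a list of equal-length aligned sequences; rows with a
    gap ('-') in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    col_i = Counter(a for a, _ in pairs)
    col_j = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        total += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return total
```

A score near zero means the residues at the two positions vary independently, whereas a high score means a substitution at one position tends to be accompanied by a specific substitution at the other. With small alignments the raw score is biased upward, so a significance test would be needed in practice.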

