COVARIATION ANALYSIS OF LOCAL AMINO ACID SEQUENCES IN RECURRENT PROTEIN LOCAL STRUCTURES

2005 ◽  
Vol 03 (06) ◽  
pp. 1391-1409 ◽  
Author(s):  
LU-YONG WANG

Local structural information is thought to be frequently encoded in local amino acid sequences. Previous research indicated only that some positions in particular local structures have specific residue preferences. However, correlated pairwise replacements for interacting residues in recurrent local structural motifs from unrelated proteins have not been studied systematically. We introduce a new method that combines statistical covariation analysis with local structure-based alignment. Systematic analysis of structure-based multiple alignments of recurrent local structures from unrelated proteins in a representative subset of the Protein Data Bank indicates that statistically significant covarying residue pairs exist in local structural motifs, in particular β-turns and helix caps. These residue pairs are mostly linked through polar functional groups with direct or indirect hydrogen bonding. Hydrophobic interaction is also a major factor constraining pairwise amino acid residue replacement in recurrent local structures. We also found correlated residue pairs that are not clearly linked by through-space interactions; the physical constraints underlying these covariations are less clear. Overall, the existence of sequence covariations in local structural motifs from unrelated proteins indicates that many relics of local relations are still retained in the tertiary structures after protein folding. It supports the notion that some local structural information is encoded in local sequences and that the local structural codes could play important roles in determining native-state protein folding topology.
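As a rough illustration of what a covariation statistic over two alignment columns can look like, the sketch below computes the mutual information between two columns of a multiple alignment. Mutual information is a commonly used covariation measure that is swapped in here for illustration; the abstract does not say which statistic the author used, and the function name and toy columns are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns.

    Each column is a string with one residue per aligned motif instance.
    Illustrative stand-in only; not necessarily the paper's statistic.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and equally long")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b) simplifies to n_ab * n / (n_a * n_b)
        mi += p_ab * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: the first pair covaries perfectly, the second is independent.
perfect = column_mutual_information("DDKK", "KKDD")
independent = column_mutual_information("DDKK", "DKDK")
```

In practice a raw score like this would still need a significance test (for example against shuffled columns), since small alignments inflate mutual information.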

2011 ◽  
Vol 09 (01) ◽  
pp. 1-13 ◽  
Author(s):  
JIANXIU GUO ◽  
NINI RAO

Predicting protein folding rates from amino acid sequences is an important challenge in computational and molecular biology. Over the past few years, many methods have been developed to capture the correlation between folding rates and protein structures and sequences. In this paper, we present an effective method, a combined neural network and genetic algorithm approach, to predict protein folding rates from amino acid sequences alone, without any explicit structural information. The originality of this paper is that, for the first time, it addresses the effect of sequence order. The proposed method provides a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.80 and the standard error is 2.65 for 93 proteins, the largest such dataset yet studied, when evaluated with a leave-one-out jackknife test. The comparative results demonstrate that this correlation is better than that of most other methods, and suggest that sequence order information makes an important contribution to the determination of protein folding rates.
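The leave-one-out jackknife evaluation mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: a one-variable least-squares line stands in for the paper's neural network and genetic algorithm model, and the single descriptor per protein and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Predict each protein's rate from a model trained on all the others."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


# Toy data: a hypothetical descriptor and exactly linear "folding rates".
descriptor = [0.0, 1.0, 2.0, 3.0]
rates = [1.0, 3.0, 5.0, 7.0]
held_out = leave_one_out_predictions(descriptor, rates)
correlation = pearson(held_out, rates)
```

The reported correlation is computed between the held-out predictions and the experimental values, so no protein contributes to the model that predicts it.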


2002 ◽  
Vol 184 (8) ◽  
pp. 2225-2234 ◽  
Author(s):  
Jason P. Folster ◽  
Terry D. Connell

ABSTRACT ChiA, an 88-kDa endochitinase encoded by the chiA gene of the gram-negative enteropathogen Vibrio cholerae, is secreted via the eps-encoded main terminal branch of the general secretory pathway (GSP), a mechanism which also transports cholera toxin. To localize the extracellular transport signal of ChiA that initiates transport of the protein through the GSP, a chimera comprised of ChiA fused at the N terminus with the maltose-binding protein (MalE) of Escherichia coli and fused at the C terminus with a 13-amino-acid epitope tag (E-tag) was expressed in strain 569B(chiA::Kanr), a chiA-deficient but secretion-competent mutant of V. cholerae. Fractionation studies revealed that blockage of the natural N terminus and C terminus of ChiA did not prevent secretion of the MalE-ChiA-E-tag chimera. To locate the amino acid sequences which encoded the transport signal, a series of truncations of ChiA were engineered. Secretion of the mutant polypeptides was curtailed only when ChiA was deleted from the N terminus beyond amino acid position 75 or from the C terminus beyond amino acid 555. A mutant ChiA comprised of only those amino acids was secreted by wild-type V. cholerae but not by an epsD mutant, establishing that amino acids 75 to 555 independently harbored sufficient structural information to promote secretion by the GSP of V. cholerae. Cys77 and Cys537, two cysteines located just within the termini of ChiA(75-555), were not required for secretion, indicating that those residues were not essential for maintaining the functional activity of the ChiA extracellular transport signal.


2011 ◽  
Vol 378-379 ◽  
pp. 157-160
Author(s):  
Jian Xiu Guo ◽  
Ni Ni Rao

Understanding the relationship between amino acid sequences and folding rates of proteins is an important challenge in computational and molecular biology. None of the existing algorithms for predicting protein folding rates takes sequence coupling effects into account. In this work, a novel algorithm was developed for predicting protein folding rates from amino acid sequences. The prediction is based on dipeptide composition, in which the sequence coupling effects are explicitly included through a series of conditional probability elements. On a non-redundant dataset of 99 proteins, the proposed method gave excellent agreement between the predicted and experimental folding rates when evaluated with the jackknife test. The correlation coefficient was 87.7% and the standard error was 2.04, which indicates an important contribution of sequence coupling effects to the determination of protein folding rates.
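One plausible reading of "conditional probability elements" over dipeptides is the probability that residue b immediately follows residue a. The sketch below estimates those values from a single sequence; it is an assumption about the general idea, not the authors' exact feature definition, and the function name is hypothetical.

```python
from collections import Counter


def conditional_dipeptide_probabilities(sequence):
    """Return {(a, b): P(b follows a)} estimated from adjacent residue pairs."""
    pairs = list(zip(sequence, sequence[1:]))
    pair_counts = Counter(pairs)
    # Count how often each residue appears as the first member of a pair.
    first_counts = Counter(a for a, _ in pairs)
    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Toy sequence: A is followed once by A and once by B.
probabilities = conditional_dipeptide_probabilities("AAB")
```

For the 20 standard amino acids this yields up to 400 elements per protein, which could then serve as inputs to a regression model.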


2021 ◽  
Vol 12 ◽  
Author(s):  
Ruifang Li ◽  
Hong Li ◽  
Xue Feng ◽  
Ruifeng Zhao ◽  
Yongxia Cheng

Many works have reported that protein folding rates are influenced by the characteristics of amino acid sequences and protein structures. However, few reports on the problem of whether the corresponding mRNA sequences are related to the protein folding rates can be found. An mRNA sequence is regarded as a kind of genetic language, and its vocabulary and phraseology must provide influential information regarding the protein folding rate. In the present work, linear regressions on the parameters of the vocabulary and phraseology of mRNA sequences and the corresponding protein folding rates were analyzed. The results indicated that D2 (the adjacent base-related information redundancy) values and the GC content values of the corresponding mRNA sequences exhibit significant negative relations with the protein folding rates, but D1 (the single base information redundancy) values exhibit significant positive relations with the protein folding rates. In addition, the results show that the relationships between the parameters of the genetic language and the corresponding protein folding rates are obviously different for different protein groups. Some useful parameters that are related to protein folding rates were found. The results indicate that when predicting protein folding rates, the information from protein structures and their amino acid sequences is insufficient, and some information for regulating the protein folding rates must be derived from the mRNA sequences.


1973 ◽  
pp. 275-291
Author(s):  
JONATHAN KING ◽  
MYEONG-HEE YU ◽  
JAVED SIDDIQI ◽  
CAMERON HAASE

2017 ◽  
Vol 15 (03) ◽  
pp. 1750009 ◽  
Author(s):  
Bruno Grisci ◽  
Márcio Dorn

The development of computational methods to accurately model three-dimensional protein structures from sequences of amino acid residues is becoming increasingly important to the structural biology field. This paper addresses the challenge of predicting the tertiary structure of a given amino acid sequence, which has been reported to belong to the NP-Complete class of problems. We present a new method, namely NEAT–FLEX, based on NeuroEvolution of Augmenting Topologies (NEAT) to extract structural features from (ABS) proteins that are determined experimentally. The proposed method manipulates structural information from the Protein Data Bank (PDB) and predicts the conformational flexibility (FLEX) of residues of a target amino acid sequence. This information may be used in three-dimensional structure prediction approaches as a way to reduce the conformational search space. The proposed method was tested with 24 different amino acid sequences. Evolving neural networks were compared against a traditional error back-propagation algorithm; results show that the proposed method is a powerful way to extract and represent structural information from protein molecules that are determined experimentally.


2021 ◽  
Vol 22 (4) ◽  
pp. 1955
Author(s):  
Aikaterini Kefala ◽  
Maria Amprazi ◽  
Efstratios Mylonas ◽  
Dina Kotsifaki ◽  
Mary Providaki ◽  
...  

Recurrent protein folding motifs include various types of helical bundles formed by α-helices that supercoil around each other. While specific patterns of amino acid residues (heptad repeats) characterize the highly versatile folding motif of four-α-helical bundles, the significance of the polypeptide chain directionality is not sufficiently understood, although it determines sequence patterns, helical dipoles, and other parameters for the folding and oligomerization processes of bundles. To investigate directionality aspects in sequence-structure relationships, we reversed the amino acid sequences of two well-characterized, highly regular four-α-helical bundle proteins and studied the folding, oligomerization, and structural properties of the retro-proteins, using Circular Dichroism Spectroscopy (CD), Size Exclusion Chromatography combined with Multi-Angle Laser Light Scattering (SEC-MALS), and Small Angle X-ray Scattering (SAXS). The comparison of the parent proteins with their retro-counterparts reveals that while the α-helical character of the parents is affected to varying degrees by sequence reversal, the folding states, oligomerization propensities, structural stabilities, and shapes of the new molecules strongly depend on the characteristics of the heptad repeat patterns. The highest similarities between parent and retro-proteins are associated with the presence of uninterrupted heptad patterns in helical bundle sequences.
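Sequence reversal and its effect on heptad registers can be illustrated with a short sketch. The helper names and the toy sequence are hypothetical; the register labels a to g follow the usual heptad convention, and the assumption that the sequence starts at position a is for illustration only.

```python
def retro_sequence(sequence):
    """Return the retro (reversed) version of an amino acid sequence."""
    return sequence[::-1]


def heptad_registers(length, start="a"):
    """Label consecutive positions with heptad registers a..g."""
    labels = "abcdefg"
    offset = labels.index(start)
    return "".join(labels[(offset + i) % 7] for i in range(length))


# Reversing two uninterrupted heptads reverses the register order as well,
# so the a and d positions stay periodic but swap their 3/4 spacing.
parent_registers = heptad_registers(14)
retro_registers = retro_sequence(parent_registers)
retro_toy = retro_sequence("MKLV")
```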


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tomasz Woźniak ◽  
Małgorzata Sajek ◽  
Jadwiga Jaruzelska ◽  
Marcin Piotr Sajek

Background: The functions of RNA molecules are mainly determined by their secondary structures. These functions can also be predicted using bioinformatic tools that enable the alignment of multiple RNAs to determine functional domains and/or classify RNA molecules into RNA families. However, the existing multiple RNA alignment tools that use structural information are slow when aligning long molecules and/or a large number of molecules. Therefore, a more rapid tool for multiple RNA alignment may improve the classification of known RNAs and help to reveal the functions of newly discovered RNAs. Results: Here, we introduce an extremely fast Python-based tool called RNAlign2D. It converts RNA sequences to pseudo-amino acid sequences, which incorporate structural information, and uses a customizable scoring matrix to align these RNA molecules via the multiple protein sequence alignment tool MUSCLE. Conclusions: RNAlign2D produces accurate RNA alignments in a very short time. The pseudo-amino acid substitution matrix approach utilized in RNAlign2D is applicable to virtually all protein aligners.
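The idea of folding structure into a protein-like alphabet can be sketched as follows. This is not RNAlign2D's actual encoding table: the mapping below is a hypothetical one that assigns each (nucleotide, dot-bracket symbol) pair its own amino acid letter, and the use of dot-bracket notation as the structure input is itself an assumption.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STRUCTURE_SYMBOLS = "()."          # dot-bracket: opening, closing, unpaired
AMINO_LETTERS = "ACDEFGHIKLMN"     # 12 letters, one per (base, symbol) pair

# Hypothetical table, e.g. ("A", "(") -> "A", ("A", ")") -> "C", ...
PSEUDO_CODE = dict(zip(product(NUCLEOTIDES, STRUCTURE_SYMBOLS), AMINO_LETTERS))


def to_pseudo_amino_acids(rna, dot_bracket):
    """Encode an RNA sequence and its secondary structure as one string."""
    if len(rna) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    return "".join(
        PSEUDO_CODE[(base, symbol)]
        for base, symbol in zip(rna.upper(), dot_bracket)
    )


encoded = to_pseudo_amino_acids("GCAU", "(..)")
```

Strings encoded this way could be written to FASTA and passed to a protein aligner together with a matching substitution matrix, which is the role the abstract assigns to MUSCLE.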


2020 ◽  
Vol 21 (S16) ◽  
Author(s):  
Guangjie Zhou ◽  
Jun Wang ◽  
Xiangliang Zhang ◽  
Maozu Guo ◽  
Guoxian Yu

Background: Maize (Zea mays ssp. mays L.) is the most widely grown and highest-yielding crop in the world, as well as an important model organism for fundamental research on gene function. The functions of maize proteins are annotated using the Gene Ontology (GO), which has more than 40000 terms and organizes them in a directed acyclic graph (DAG). Accurately annotating a maize protein with the relevant GO terms from such a large number of candidates is a huge challenge. Some deep learning models have been proposed to predict protein function, but the effectiveness of these approaches is unsatisfactory. One major reason is that they inadequately utilize the GO hierarchy. Results: To use the knowledge encoded in the GO hierarchy, we propose a deep Graph Convolutional Network (GCN) based model (DeepGOA) to predict GO annotations of proteins. DeepGOA first quantifies the correlations (or edges) between GO terms and updates the edge weights of the DAG by leveraging GO annotations and the hierarchy, then learns the semantic representations and latent inter-relations of GO terms by applying a GCN to the updated DAG. Meanwhile, a Convolutional Neural Network (CNN) is used to learn the feature representation of amino acid sequences with respect to the semantic representations. After that, DeepGOA computes the dot product of the two representations, which enables the whole network to be trained end-to-end coherently. Extensive experiments show that DeepGOA can effectively integrate GO structural information and amino acid information, and then annotate proteins accurately. Conclusions: Experiments on the Maize PH207 inbred line and a Human protein sequence dataset show that DeepGOA outperforms the state-of-the-art deep-learning-based methods. The ablation study shows that the GCN can employ the knowledge of GO and boost performance. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=DeepGOA.
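The final scoring step, a dot product between a protein representation and each GO term representation, can be sketched in plain Python. The vectors, term labels, and function name below are hypothetical, and squashing the dot product with a sigmoid is an assumption; the abstract states only that a dot product of the two representations is computed.

```python
import math


def annotation_scores(protein_vector, term_vectors):
    """Score each term by the dot product with the protein representation.

    protein_vector: list of floats (stand-in for the CNN output).
    term_vectors: {term_label: list of floats} (stand-in for GCN outputs).
    """
    scores = {}
    for term, vector in term_vectors.items():
        if len(vector) != len(protein_vector):
            raise ValueError("representations must share one dimension")
        dot = sum(p * t for p, t in zip(protein_vector, vector))
        # Sigmoid (assumed) maps the raw score to a value between 0 and 1.
        scores[term] = 1.0 / (1.0 + math.exp(-dot))
    return scores


toy_scores = annotation_scores(
    [1.0, 0.0],
    {"term_a": [2.0, 5.0], "term_b": [-2.0, 5.0]},
)
```

Because both representations feed a single differentiable score, a loss on that score can propagate gradients into both networks, which is what makes end-to-end training possible.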

