Assessing the Impact of Secondary Structure and Solvent Accessibility on Protein Evolution

Genetics ◽  
1998 ◽  
Vol 149 (1) ◽  
pp. 445-458 ◽  
Author(s):  
Nick Goldman ◽  
Jeffrey L Thorne ◽  
David T Jones

Abstract Empirically derived models of amino acid replacement are employed to study the association between various physical features of proteins and evolution. The strengths of these associations are statistically evaluated by applying the models of protein evolution to 11 diverse sets of protein sequences. Parametric bootstrap tests indicate that the solvent accessibility status of a site has a particularly strong association with the process of amino acid replacement that it experiences. Significant association between secondary structure environment and the amino acid replacement process is also observed. Careful description of the length distribution of secondary structure elements and of the organization of secondary structure and solvent accessibility along a protein did not always significantly improve the fit of the evolutionary models to the data sets that were analyzed. As indicated by the strength of the association of both solvent accessibility and secondary structure with amino acid replacement, the process of protein evolution—both above and below the species level—will not be well understood until the physical constraints that affect protein evolution are identified and characterized.

2004 ◽  
Vol 02 (02) ◽  
pp. 333-342 ◽  
Author(s):  
WEI-MOU ZHENG

Simple hidden Markov models are proposed for predicting the secondary structure of a protein from its amino acid sequence. Since the length of protein conformation segments varies in a narrow range, we ignore the duration effect of length distribution, and focus on inclusion of short-range correlations of residues and of conformation states in the models. Conformation-independent and -dependent amino acid coarse-graining schemes are designed for the models by means of proper mutual information. We compare models of different levels of complexity, and establish a practical model with a high prediction accuracy.
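As a rough illustration of the general technique, and not the author's model, the sketch below decodes a three-state (helix, strand, coil) first-order hidden Markov model with the Viterbi algorithm. The state set, the function name `viterbi`, and all `TOY_*` probabilities are made-up assumptions for demonstration; the coarse-graining schemes and residue correlations described in the abstract are not reproduced.

```python
import math
from typing import Dict

STATES = ["H", "E", "C"]  # helix, strand, coil (assumed three-state alphabet)

# Hypothetical toy parameters, chosen only so the example is easy to follow.
TOY_START: Dict[str, float] = {s: 1.0 / 3.0 for s in STATES}
TOY_TRANS: Dict[str, Dict[str, float]] = {
    p: {s: (0.8 if p == s else 0.1) for s in STATES} for p in STATES
}
TOY_EMIT: Dict[str, Dict[str, float]] = {
    "H": {"A": 0.90, "V": 0.05, "G": 0.05},
    "E": {"A": 0.05, "V": 0.90, "G": 0.05},
    "C": {"A": 0.05, "V": 0.05, "G": 0.90},
}


def viterbi(sequence: str,
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]],
            floor: float = 1e-6) -> str:
    """Return the most probable state path for `sequence` (log-space Viterbi)."""
    if not sequence:
        return ""

    def log_emit(state: str, residue: str) -> float:
        # Residues missing from the emission table get a small floor probability.
        return math.log(emit[state].get(residue, floor))

    # Initialise with the start distribution and the first residue.
    scores = {s: math.log(start[s]) + log_emit(s, sequence[0]) for s in STATES}
    backpointers = []

    for residue in sequence[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for ending in state s at this position.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores[s] = (scores[best_prev] + math.log(trans[best_prev][s])
                             + log_emit(s, residue))
            pointers[s] = best_prev
        backpointers.append(pointers)
        scores = new_scores

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

With the toy parameters, `viterbi("AAVVGG", TOY_START, TOY_TRANS, TOY_EMIT)` assigns one state label per residue; real parameters would have to be estimated from proteins of known structure.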


Biology ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 64 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding, amino acid trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
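The two data-preparation steps described in this abstract, splitting alignment columns by relative solvent accessibility and recoding amino acids into a reduced alphabet, can be sketched roughly as follows. This is not the authors' pipeline: the 0.25 exposure threshold, the function names, and the choice of the six-state Dayhoff recoding are assumptions made for illustration, since the abstract does not state which cutoff or recoding scheme was used.

```python
from typing import Dict, List, Sequence, Tuple

# Six-state Dayhoff grouping (assumed here; the abstract does not name the scheme).
DAYHOFF6_GROUPS = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}
RECODE = {aa: code for code, group in DAYHOFF6_GROUPS.items() for aa in group}


def split_sites_by_rsa(rsa: Sequence[float],
                       threshold: float = 0.25) -> Tuple[List[int], List[int]]:
    """Return (exposed, buried) column indices; the threshold is a hypothetical cutoff."""
    exposed = [i for i, value in enumerate(rsa) if value >= threshold]
    buried = [i for i, value in enumerate(rsa) if value < threshold]
    return exposed, buried


def subset_alignment(alignment: Dict[str, str], columns: Sequence[int]) -> Dict[str, str]:
    """Keep only the requested columns of every aligned sequence."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def recode(sequence: str) -> str:
    """Map residues to Dayhoff-6 classes; gaps and unknown symbols become '-'."""
    return "".join(RECODE.get(aa, "-") for aa in sequence.upper())
```

For example, given a per-column list of accessibility values, `split_sites_by_rsa` yields two index lists that `subset_alignment` turns into separate exposed and buried alignments, each of which can then be passed through `recode` before tree estimation.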


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Yajun Sheng ◽  
Xingye Qiu ◽  
Chen Zhang ◽  
Jun Xu ◽  
Yanping Zhang ◽  
...  

Protein quaternary structure is very important to biological processes. Predicting its attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods do not consider enough amino acid properties. To this end, we propose a hybrid method, Quad-PRE, to predict protein quaternary structure attributes using amino acid properties, predicted secondary structure, predicted relative solvent accessibility, and position-specific scoring matrix profiles and motifs. Empirical evaluation on an independent dataset shows that Quad-PRE achieves a higher overall accuracy of 81.7%, with especially high accuracies of 92.8%, 93.3%, and 90.6% in discriminating trimers, hexamers, and octamers, respectively. Our model also reveals that all six feature sets are important to the prediction, and that a hybrid method is the best strategy so far. The results indicate that the proposed method can classify protein quaternary structure attributes effectively.


2019 ◽  
Author(s):  
Ana Filipa Moutinho ◽  
Fernanda Fontes Trancoso ◽  
Julien Yann Dutheil

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intra-molecular level is poorly understood. To address this, we analysed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function and protein-protein interactions. We found that the relative solvent accessibility is a major driver of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signalling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by inter-molecular interactions, with host-pathogen coevolution likely playing a major role.


Genes ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 553 ◽  
Author(s):  
Joseph Ahrens ◽  
Jordon Rahaman ◽  
Jessica Siltberg-Liberles

Various structural and functional constraints govern the evolution of protein sequences. As a result, the relative rates of amino acid replacement among sites within a protein can vary significantly. Previous large-scale work on Metazoan (Animal) protein sequence alignments indicated that amino acid replacement rates are partially driven by a complex interaction among three factors: intrinsic disorder propensity; secondary structure; and functional domain involvement. Here, we use sequence-based predictors to evaluate the effects of these factors on site-specific sequence evolutionary rates within four eukaryotic lineages: Metazoans; Plants; Saccharomycete Fungi; and Alveolate Protists. Our results show broad, consistent trends across all four Eukaryote groups. In all four lineages, there is a significant increase in amino acid replacement rates when comparing: (i) disordered vs. ordered sites; (ii) random coil sites vs. sites in secondary structures; and (iii) inter-domain linker sites vs. sites in functional domains. Additionally, within Metazoans, Plants, and Saccharomycetes, there is a strong confounding interaction between intrinsic disorder and secondary structure—alignment sites exhibiting both high disorder propensity and involvement in secondary structures have very low average rates of sequence evolution. Analysis of gene ontology (GO) terms revealed that in all four lineages, a high fraction of sequences containing these conserved, disordered-structured sites are involved in nucleic acid binding. We also observe notable differences in the statistical trends of Alveolates, where intrinsically disordered sites are more variable than in other Eukaryotes and the statistical interactions between disorder and other factors are less pronounced.


2019 ◽  
Vol 36 (9) ◽  
pp. 2013-2028 ◽  
Author(s):  
Ana Filipa Moutinho ◽  
Fernanda Fontes Trancoso ◽  
Julien Yann Dutheil

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intramolecular level are poorly understood. To address this, we analyzed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene, and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function, and protein–protein interactions. We found that the relative solvent accessibility is a major determinant of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signaling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by intermolecular interactions, with host–pathogen coevolution likely playing a major role.


2009 ◽  
Vol 83 (9) ◽  
pp. 4605-4615 ◽  
Author(s):  
Philippa C. Matthews ◽  
Alasdair J. Leslie ◽  
Aris Katzourakis ◽  
Hayley Crawford ◽  
Rebecca Payne ◽  
...  

ABSTRACT The selection of escape mutations has a major impact on immune control of infections with viruses such as human immunodeficiency virus (HIV). Viral evasion of CD8+ T-cell responses leaves predictable combinations of escape mutations, termed HLA “footprints.” The most clearly defined footprints are those associated with HLA alleles that are linked with successful control of HIV, such as HLA-B*57. Here we investigated the extent to which HLA footprint sites in HIV type 1 (HIV-1) are associated with viral evolution among and within clades. First, we examined the extent to which amino acid differences between HIV-1 clades share identity with sites of HLA-mediated selection pressure and observed a strong association, in particular with respect to sites of HLA-B selection (P < 10⁻⁶). Similarly, the sites of amino acid variability within a clade were found to overlap with sites of HLA-selected mutation. Second, we studied the impact of HLA selection on interclade phylogeny. Removing the sites of amino acid variability did not significantly affect clade-specific clustering, reflecting the central role of founder effects in establishing distinct clades. However, HLA footprints may underpin founder strains, and we show that amino acid substitutions between clades alter phylogeny, underlining a potentially substantial role for HLA in driving ongoing viral evolution. Finally, we investigated the impact of HLA selection on within-clade phylogeny and demonstrate that even a single HLA allele footprint can result in significant phylogenetic clustering of sequences. In conclusion, these data highlight the fact that HLA can be a strong selection force for both intra- and interclade HIV evolution at a population level.


2018 ◽  
Vol 15 (1) ◽  
pp. 45-54 ◽  
Author(s):  
Bishnupriya Panda ◽  
Babita Majhi ◽  
Abhimanyu Thakur

Background: Proteins are highly versatile macromolecules that play a crucial role in many biological processes. Amino acid sequence has long been used to predict protein secondary structure. However, the major methods for predicting protein secondary structure class have not considered the impact of Gaussian noise on the sequence representation of amino acids, which is one of the important constraints for the functionality of a protein. Methods: In the present research, the prediction of protein secondary structure class was accomplished by the integrated application of the Stockwell transformation and Amino Acid Composition (AAC) on the equivalent Electron-ion Interaction Potential (EIIP) representation of the raw amino acid sequence. The introduced method was evaluated using 4 benchmark datasets of low sequence homology, namely PDB25, 498, 277, and 204. Furthermore, the random forest algorithm with the out-of-bag error estimate, and the Support Vector Machine (SVM) with k-fold cross validation, demonstrated the high feature representation potential of our reported approach. Results: The overall prediction accuracy for the PDB25, 498, 277, and 204 datasets with the random forest classifier was 92.5%, 94.79%, 92.45%, and 88.04%, respectively, whereas with SVM, the results were 84.66%, 95.32%, 89.29%, and 84.37%, respectively. An integrated-order-function-frequency-time (OFFT) model has been proposed for the prediction of protein secondary structure class. For the first time, we reported the effect of Gaussian noise on the prediction accuracy of protein secondary structure class and proposed a robust integrated-OFFT model, which is effectively noise resistant.
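Of the features named in this abstract, Amino Acid Composition is the simplest to illustrate: it is the 20-dimensional vector of residue frequencies in a sequence. The sketch below covers only that feature and makes its own assumptions (the function name, the alphabetical residue ordering, and the choice to skip non-standard symbols); the EIIP encoding and Stockwell transformation used by the authors are not reproduced.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical one-letter codes


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue, in AMINO_ACIDS order.

    Non-standard symbols (for example 'X' or gaps) are ignored, which is an
    assumption of this sketch rather than something the abstract specifies.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]
```

A vector produced this way could be concatenated with other features and passed to a classifier such as a random forest or an SVM.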

