The Impact of Protein Architecture on Adaptive Evolution

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intramolecular level are poorly understood. To address this, we analyzed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene, and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function, and protein–protein interactions. We found that the relative solvent accessibility is a major determinant of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signaling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by intermolecular interactions, with host–pathogen coevolution likely playing a major role.
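
As a rough illustration of what a "rate of adaptive substitutions" means, the sketch below uses the classic McDonald-Kreitman-style estimator rather than the distribution-of-fitness-effects models fitted in the study. The function names, the argument names, and the counts in the usage note are hypothetical and chosen only for illustration.

```python
def mk_alpha(dn, ds, pn, ps):
    """Proportion of adaptive nonsynonymous substitutions.

    Simple McDonald-Kreitman-style estimate: alpha = 1 - (Ds * Pn) / (Dn * Ps),
    where Dn/Ds are nonsynonymous/synonymous divergence counts and Pn/Ps are
    nonsynonymous/synonymous polymorphism counts. This is NOT the DFE-based
    estimator used in the paper; it ignores slightly deleterious mutations.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def adaptive_rate(dn, ds, pn, ps, n_sites_nonsyn, n_sites_syn):
    """Rate of adaptive substitutions, omega_a = alpha * omega.

    omega is the per-site ratio (Dn / Ln) / (Ds / Ls); the site counts
    Ln and Ls are assumed to be supplied by the caller.
    """
    if ds <= 0 or n_sites_nonsyn <= 0 or n_sites_syn <= 0:
        raise ValueError("ds and site counts must be positive")
    omega = (dn / n_sites_nonsyn) / (ds / n_sites_syn)
    return mk_alpha(dn, ds, pn, ps) * omega
```

With made-up counts such as Dn = 40, Ds = 100, Pn = 20, Ps = 100, `mk_alpha` gives 0.5, meaning that half of the nonsynonymous substitutions would be attributed to positive selection under this simple model.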

The impact of protein architecture on adaptive evolution

10.1101/560185 ◽

2019 ◽

Author(s):

Ana Filipa Moutinho ◽

Fernanda Fontes Trancoso ◽

Julien Yann Dutheil

Keyword(s):

Amino Acid ◽

Adaptive Evolution ◽

Protein Function ◽

Population Genomics ◽

Solvent Accessibility ◽

Protein Biosynthesis ◽

Relative Solvent Accessibility ◽

Adaptive Mutations ◽

Protein Architecture ◽

The Impact

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intra-molecular level are poorly understood. To address this, we analysed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function and protein-protein interactions. We found that the relative solvent accessibility is a major driver of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signalling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by inter-molecular interactions, with host-pathogen coevolution likely playing a major role.

SphereCon—a method for precise estimation of residue relative solvent accessible area from limited structural information

Bioinformatics ◽

10.1093/bioinformatics/btaa159 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3372-3378

Author(s):

Alexander Gress ◽

Olga V Kalinina

Keyword(s):

Protein Function ◽

Structural Information ◽

Solvent Accessibility ◽

Three Dimensional ◽

Structural Data ◽

Supplementary Information ◽

Dimensional Structure ◽

Relative Solvent Accessibility ◽

Precise Measure ◽

The Impact

Abstract Motivation: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations and their pathogenicity, and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. Availability and implementation: https://github.com/kalininalab/spherecon. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
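
The geometric ingredient mentioned above, the volume shared by a sphere and a cone, has a closed form in the simplest case. The sketch below assumes the cone's apex sits at the sphere's centre, so the intersection is a spherical sector with volume V = (2π/3) r³ (1 − cos θ), where θ is the cone's half-angle. This is only an illustration of the geometry, not the SphereCon implementation; the function names are hypothetical.

```python
import math


def spherical_sector_volume(radius, half_angle):
    """Volume of the intersection of a sphere with a cone whose apex is at
    the sphere's centre (a spherical sector).

    radius: sphere radius (any length unit, must be non-negative).
    half_angle: cone half-angle in radians, between 0 and pi.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    if not 0.0 <= half_angle <= math.pi:
        raise ValueError("half_angle must lie in [0, pi]")
    return (2.0 * math.pi / 3.0) * radius ** 3 * (1.0 - math.cos(half_angle))


def sector_fraction(half_angle):
    """Fraction of the full sphere volume covered by the sector; it depends
    only on the half-angle: (1 - cos(theta)) / 2."""
    if not 0.0 <= half_angle <= math.pi:
        raise ValueError("half_angle must lie in [0, pi]")
    return (1.0 - math.cos(half_angle)) / 2.0
```

A half-angle of π/2 yields a hemisphere and a half-angle of π recovers the full sphere, which gives a quick sanity check on the formula.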

Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root

Biology ◽

10.3390/biology9040064 ◽

2020 ◽

Vol 9 (4) ◽

pp. 64 ◽

Cited By ~ 6

Author(s):

Akanksha Pandey ◽

Edward L. Braun

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Solvent Accessibility ◽

Phylogenetic Signal ◽

Phylogenetic Analyses ◽

Sister Group ◽

Striking Difference ◽

Relative Solvent Accessibility ◽

Protein Datasets ◽

The Impact

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding, amino acid trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
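
The partitioning step described above can be pictured with a minimal sketch. It assumes an alignment held as a dictionary of equal-length sequences and one relative solvent accessibility value per column; the function name, the data layout, and the 0.25 cutoff are assumptions made for illustration and are not taken from the authors' pipeline.

```python
def split_sites_by_rsa(alignment, rsa, threshold=0.25):
    """Split alignment columns into exposed and buried subsets.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rsa: list with one relative solvent accessibility value per column.
    threshold: hypothetical cutoff; columns with RSA >= threshold are
        treated as exposed, the rest as buried.
    Returns (exposed, buried), two dicts with the same taxon names.
    """
    n_cols = len(rsa)
    for name, seq in alignment.items():
        if len(seq) != n_cols:
            raise ValueError(f"sequence {name!r} does not match RSA length")

    exposed_cols = [i for i, value in enumerate(rsa) if value >= threshold]
    buried_cols = [i for i, value in enumerate(rsa) if value < threshold]

    def pick(seq, cols):
        # Concatenate the residues found at the selected column indices.
        return "".join(seq[i] for i in cols)

    exposed = {name: pick(seq, exposed_cols) for name, seq in alignment.items()}
    buried = {name: pick(seq, buried_cols) for name, seq in alignment.items()}
    return exposed, buried
```

Each of the two returned sub-alignments could then be passed to a separate tree search, which is the kind of comparison between exposed and buried sites that the abstract reports.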

Faculty Opinions recommendation of The use of orthologous sequences to predict the impact of amino acid substitutions on protein function.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.3558956.3485081 ◽

2010 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Amino Acid ◽

Protein Function ◽

Amino Acid Substitutions ◽

The Impact

Assessing the Impact of Secondary Structure and Solvent Accessibility on Protein Evolution

Genetics ◽

10.1093/genetics/149.1.445 ◽

1998 ◽

Vol 149 (1) ◽

pp. 445-458 ◽

Cited By ~ 21

Author(s):

Nick Goldman ◽

Jeffrey L Thorne ◽

David T Jones

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Protein Evolution ◽

Solvent Accessibility ◽

Strong Association ◽

Length Distribution ◽

Parametric Bootstrap ◽

Amino Acid Replacement ◽

Physical Constraints ◽

The Impact

Abstract Empirically derived models of amino acid replacement are employed to study the association between various physical features of proteins and evolution. The strengths of these associations are statistically evaluated by applying the models of protein evolution to 11 diverse sets of protein sequences. Parametric bootstrap tests indicate that the solvent accessibility status of a site has a particularly strong association with the process of amino acid replacement that it experiences. Significant association between secondary structure environment and the amino acid replacement process is also observed. Careful description of the length distribution of secondary structure elements and of the organization of secondary structure and solvent accessibility along a protein did not always significantly improve the fit of the evolutionary models to the data sets that were analyzed. As indicated by the strength of the association of both solvent accessibility and secondary structure with amino acid replacement, the process of protein evolution—both above and below the species level—will not be well understood until the physical constraints that affect protein evolution are identified and characterized.

Searching for signatures of positive selection in cytochrome b gene associated with subterranean lifestyle in fast-evolving arvicolines (Arvicolinae, Cricetidae, Rodentia)

BMC Ecology and Evolution ◽

10.1186/s12862-021-01819-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Olga V. Bondareva ◽

Nadezhda A. Potapova ◽

Kirill A. Konovalov ◽

Tatyana V. Petrova ◽

Natalia I. Abramson

Keyword(s):

Amino Acid ◽

Oxidative Phosphorylation ◽

Adaptive Evolution ◽

Complex Structure ◽

Subterranean Rodents ◽

Amino Acid Sequence Variation ◽

Tertiary Protein Structure ◽

Metabolic Performance ◽

The Impact ◽

Subterranean Species

Abstract Background: Mitochondrial genes encode proteins involved in oxidative phosphorylation. Variations in lifestyle and ecological niche can be directly reflected in metabolic performance. Subterranean rodents represent a good model for testing hypotheses on adaptive evolution driven by important ecological shifts. Voles and lemmings of the subfamily Arvicolinae (Rodentia: Cricetidae) provide a good example for studies of adaptive radiation. This is the youngest group within the order Rodentia showing the fastest rates of diversification, including the transition to the subterranean lifestyle in several phylogenetically independent lineages. Results: We evaluated the signatures of selection in the mitochondrial cytochrome b (cytB) gene in 62 Arvicolinae species characterized by either subterranean or surface-dwelling lifestyle by assessing amino acid sequence variation, exploring the functional consequences of the observed variation in the tertiary protein structure, and estimating selection pressure. Our analysis revealed that: (1) three of the convergent amino acid substitutions were found among phylogenetically distant subterranean species; (2) these substitutions may have an influence on the protein complex structure; (3) cytB showed an increased ω and evidence of relaxed selection in subterranean lineages, relative to non-subterranean ones; and (4) eight protein domains possess an increased nonsynonymous substitution ratio in subterranean species. Conclusions: Our study provides insights into the adaptive evolution of the cytochrome b gene in the Arvicolinae subfamily and its potential implications in the molecular mechanism of adaptation. We present a framework for future characterizations of the impact of specific mutations on the function, physiology, and interactions of the mtDNA-encoded proteins involved in oxidative phosphorylation.

How much does Ne vary among species?

10.1101/861849 ◽

2019 ◽

Cited By ~ 2

Author(s):

Nicolas Galtier ◽

Marjolaine Rousselle

Keyword(s):

Amino Acid ◽

Population Genomics ◽

Species Variation ◽

Effective Population ◽

Gamma Model ◽

Frequency Spectra ◽

Synonymous Mutations ◽

Evolutionary Force ◽

Order Of Magnitude ◽

The Impact

Abstract Genetic drift is an important evolutionary force of strength inversely proportional to Ne, the effective population size. The impact of drift on genome diversity and evolution is known to vary among species, but quantifying this effect is a difficult task. Here we assess the magnitude of variation in drift power among species of animals via its effect on the mutation load – which implies also inferring the distribution of fitness effects of deleterious mutations (DFE). To this aim, we analyze the non-synonymous (amino-acid changing) and synonymous (amino-acid conservative) allele frequency spectra in a large sample of metazoan species, with a focus on the primates vs. fruit flies contrast. We show that a Gamma model of the DFE is not suitable due to strong differences in estimated shape parameters among taxa, while adding a class of lethal mutations essentially solves the problem. Using the Gamma + lethal model and assuming that the mean deleterious effect of non-synonymous mutations is shared among species, we estimate that the power of drift varies by a factor of at least 500 between large-Ne and small-Ne species of animals, i.e., an order of magnitude more than the among-species variation in genetic diversity. Our results are relevant to Lewontin’s paradox while further questioning the meaning of the Ne parameter in population genomics.

Faculty Opinions recommendation of The use of orthologous sequences to predict the impact of amino acid substitutions on protein function.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.3558956.3265054 ◽

2010 ◽

Author(s):

Michael Wagner

Keyword(s):

Amino Acid ◽

Protein Function ◽

Amino Acid Substitutions ◽

The Impact

Quad-PRE: A Hybrid Method to Predict Protein Quaternary Structure Attributes

Computational and Mathematical Methods in Medicine ◽

10.1155/2014/715494 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Yajun Sheng ◽

Xingye Qiu ◽

Chen Zhang ◽

Jun Xu ◽

Yanping Zhang ◽

...

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Hybrid Method ◽

Quaternary Structure ◽

Biological Process ◽

Solvent Accessibility ◽

Empirical Evaluation ◽

Relative Solvent Accessibility ◽

Independent Dataset ◽

Scoring Matrix

Protein quaternary structure is very important to biological processes. Predicting its attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods do not consider enough properties of amino acids. To this end, we propose a hybrid method, Quad-PRE, to predict protein quaternary structure attributes using the properties of amino acids, predicted secondary structure, predicted relative solvent accessibility, and position-specific scoring matrix profiles and motifs. Empirical evaluation on an independent dataset shows that Quad-PRE achieves a higher overall accuracy of 81.7%, with especially high accuracies of 92.8%, 93.3%, and 90.6% in discriminating trimers, hexamers, and octamers, respectively. Our model also reveals that all six feature sets are important to the prediction, and that a hybrid method is currently the best strategy. The results indicate that the proposed method can classify protein quaternary structure attributes effectively.

How Much Does Ne Vary Among Species?

Genetics ◽

10.1534/genetics.120.303622 ◽

2020 ◽

Vol 216 (2) ◽

pp. 559-572 ◽

Cited By ~ 2

Author(s):

Nicolas Galtier ◽

Marjolaine Rousselle

Keyword(s):

Amino Acid ◽

Population Genomics ◽

Species Variation ◽

Effective Population ◽

Gamma Model ◽

Frequency Spectra ◽

Fitness Effects ◽

Evolutionary Force ◽

Order Of Magnitude ◽

The Impact

Genetic drift is an important evolutionary force of strength inversely proportional to Ne, the effective population size. The impact of drift on genome diversity and evolution is known to vary among species, but quantifying this effect is a difficult task. Here we assess the magnitude of variation in drift power among species of animals via its effect on the mutation load – which implies also inferring the distribution of fitness effects of deleterious mutations. To this aim, we analyze the nonsynonymous (amino-acid changing) and synonymous (amino-acid conservative) allele frequency spectra in a large sample of metazoan species, with a focus on the primates vs. fruit flies contrast. We show that a Gamma model of the distribution of fitness effects is not suitable due to strong differences in estimated shape parameters among taxa, while adding a class of lethal mutations essentially solves the problem. Using the Gamma + lethal model and assuming that the mean deleterious effect of nonsynonymous mutations is shared among species, we estimate that the power of drift varies by a factor of at least 500 between large-Ne and small-Ne species of animals, i.e., an order of magnitude more than the among-species variation in genetic diversity. Our results are relevant to Lewontin’s paradox while further questioning the meaning of the Ne parameter in population genomics.
