Detecting consistent patterns of directional adaptation using differential selection codon models

2016 ◽  
Author(s):  
Sahar Parto ◽  
Nicolas Lartillot

Abstract Background Phylogenetic codon models are often used to characterize the selective regimes acting on protein-coding sequences. Recent methodological developments have led to models that account for the interplay between mutation and selection by explicitly modelling the amino acid fitness landscape along the sequence. However, thus far, most of these models have assumed that the fitness landscape is constant over time. Fluctuations of the fitness landscape may often be random or depend on complex and unknown factors. However, some organisms may be subject to systematic changes in selective pressure, resulting in reproducible molecular adaptations across independent lineages subject to similar conditions. Results Here, we developed a codon-based differential selection model, which aims to detect and quantify the fine-grained consistent patterns of adaptation at the protein-coding level, as a function of external conditions experienced by the organism under investigation. The model parameterizes the global mutational pressure, as well as the site- and condition-specific amino acid selective preferences. This phylogenetic model is implemented in a Bayesian MCMC framework. After validation with simulations, we applied our method to a dataset of HIV sequences from patients with known HLA genetic background. Our differential selection model detects and characterizes differentially selected coding positions specifically associated with two different HLA alleles. Conclusion Our differential selection model is able to identify consistent molecular adaptations as a function of repeated changes in the environment of the organism. These models can be applied to many other problems, ranging from viral adaptation to evolution of life-history strategies in plants or animals.

BMC Biology ◽  
2019 ◽  
Vol 17 (1) ◽  
Author(s):  
Frida Belinky ◽  
Itamar Sela ◽  
Igor B. Rogozin ◽  
Eugene V. Koonin

Abstract Background Single nucleotide substitutions in protein-coding genes can be divided into synonymous (S), with little fitness effect, and non-synonymous (N) ones that alter amino acids and thus generally have a greater effect. Most of the N substitutions are affected by purifying selection that eliminates them from evolving populations. However, additional mutations of nearby bases potentially could alleviate the deleterious effect of single substitutions, making them subject to positive selection. To elucidate the effects of selection on double substitutions in all codons, it is critical to differentiate selection from mutational biases. Results We addressed the evolutionary regimes of within-codon double substitutions in 37 groups of closely related prokaryotic genomes from diverse phyla by comparing the fractions of double substitutions within codons to those of the equivalent double S substitutions in adjacent codons. Under the assumption that substitutions occur one at a time, all within-codon double substitutions can be represented as “ancestral-intermediate-final” sequences (where “intermediate” refers to the first single substitution and “final” refers to the second substitution) and can be partitioned into four classes: (1) SS, S intermediate–S final; (2) SN, S intermediate–N final; (3) NS, N intermediate–S final; and (4) NN, N intermediate–N final. We found that the selective pressure on the second substitution markedly differs among these classes of double substitutions. Analogous to single S (synonymous) substitutions, SS double substitutions evolve neutrally, whereas analogous to single N (non-synonymous) substitutions, SN double substitutions are subject to purifying selection. In contrast, NS show positive selection on the second step because the original amino acid is recovered. 
The NN double substitutions are heterogeneous and can be subject to either purifying or positive selection, or evolve neutrally, depending on the amino acid similarity between the final or intermediate and the ancestral states. Conclusions The results of the present, comprehensive analysis of the evolutionary landscape of within-codon double substitutions reaffirm the largely conservative regime of protein evolution. However, the second step of a double substitution can be subject to positive selection when the first step is deleterious. Such positive selection can result in frequent crossing of valleys on the fitness landscape.
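The four-class partition described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: it uses the standard genetic code (NCBI translation table 1), it reads both labels as being relative to the ancestral codon (which is how "the original amino acid is recovered" for NS paths), and it treats stop codons as just another state. The function names `classify_path` and `double_substitution_paths` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def hamming(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_path(ancestral, intermediate, final):
    """Label an ancestral -> intermediate -> final path as SS, SN, NS or NN.

    Assumption: both letters compare against the ancestral amino acid.
    """
    if hamming(ancestral, intermediate) != 1 or hamming(intermediate, final) != 1:
        raise ValueError("each step must change exactly one nucleotide")
    if hamming(ancestral, final) != 2:
        raise ValueError("ancestral and final codons must differ at two positions")
    first = "S" if CODON_TABLE[intermediate] == CODON_TABLE[ancestral] else "N"
    second = "S" if CODON_TABLE[final] == CODON_TABLE[ancestral] else "N"
    return first + second


def double_substitution_paths(ancestral, final):
    """Return {intermediate codon: class} for the two one-step-at-a-time paths."""
    diff = [i for i in range(3) if ancestral[i] != final[i]]
    if len(diff) != 2:
        raise ValueError("codons must differ at exactly two positions")
    paths = {}
    for i in diff:
        # Apply only one of the two changes to obtain the intermediate codon.
        intermediate = ancestral[:i] + final[i] + ancestral[i + 1:]
        paths[intermediate] = classify_path(ancestral, intermediate, final)
    return paths
```

For example, `double_substitution_paths("AGT", "TCT")` considers a change between the two serine codon families: both intermediates (`TGT`, Cys and `ACT`, Thr) are non-synonymous, and the final codon restores serine, so both paths fall in the NS class.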


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Andrew M. Ritchie ◽  
Tristan L. Stark ◽  
David A. Liberles

Abstract Background Recovering the historical patterns of selection acting on a protein coding sequence is a major goal of evolutionary biology. Mutation-selection models address this problem by explicitly modelling fixation rates as a function of site-specific amino acid fitness values. However, they are restricted in their utility for investigating directional evolution because they require prior knowledge of the locations of fitness changes in the lineages of a phylogeny. Results We apply a modified mutation-selection methodology that relaxes assumptions of equilibrium and time-reversibility. Our implementation allows us to identify branches where adaptive or compensatory shifts in the fitness landscape have taken place, signalled by a change in amino acid fitness profiles. Through simulation and analysis of an empirical data set of β-lactamase genes, we test our ability to recover the position of adaptive events within the tree and successfully reconstruct initial codon frequencies and fitness profile parameters generated under the non-stationary model. Conclusion We demonstrate successful detection of selective shifts and identification of the affected branch on partitions of 300 codons or more. We successfully reconstruct fitness parameters and initial codon frequencies in simulated data and demonstrate that failing to account for non-equilibrium evolution can increase the error in fitness profile estimation. We also demonstrate reconstruction of plausible shifts in amino acid fitnesses in the bacterial β-lactamase family and discuss some caveats for interpretation.
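As a rough illustration of how a fixation rate can be written as a function of fitness values, the sketch below uses the Halpern-Bruno-style form that is common in the mutation-selection literature: the substitution rate is the mutation rate multiplied by S / (1 - exp(-S)), where S is the scaled fitness difference between the new and the current state. Whether this abstract's implementation uses exactly this form is an assumption; the function and parameter names are hypothetical.

```python
import math


def relative_fixation_rate(delta_f):
    """Fixation rate of a mutation relative to a neutral one.

    delta_f is the scaled fitness difference (new state minus current state).
    Assumed form: delta_f / (1 - exp(-delta_f)), which tends to 1 as delta_f -> 0.
    """
    if abs(delta_f) < 1e-12:
        return 1.0  # neutral limit
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x
    return delta_f / (-math.expm1(-delta_f))


def substitution_rate(mutation_rate, fitness_from, fitness_to):
    """Mutation rate scaled by the relative fixation rate of the change."""
    return mutation_rate * relative_fixation_rate(fitness_to - fitness_from)
```

Under this form, a change towards a fitter amino acid is fixed faster than a neutral one and the reverse change more slowly, with the two factors differing by exp(S). A shift in the fitness profile on a branch therefore changes which substitutions are favoured.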


2017 ◽  
Author(s):  
Dariya K Sydykova ◽  
Claus O Wilke

Many applications require the calculation of site-specific evolutionary rates from alignments of amino-acid sequences. For example, catalytic residues in enzymes and interface regions in protein complexes can be inferred from observed relative rates. While numerous approaches exist to calculate amino-acid rates, it is not entirely clear what physical quantities the inferred rates represent and how these rates relate to the underlying fitness landscape of the evolving protein. Further, amino-acid rates can be calculated in the context of different amino-acid exchangeability matrices, such as JTT, LG, or WAG, and again it is not known how the choice of the matrix influences the physical interpretation of the inferred rates. Here, we develop a theory of measurement for site-specific evolutionary rates, by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation–selection model. We demonstrate that the measurement process can only recover the true expected rates of the mutation–selection model if rates are measured relative to a naïve exchangeability matrix, in which all exchangeabilities are equal to one. Rate measurements using other matrices are quantitatively close but not mathematically correct. Our results demonstrate that insights obtained from phylogenetic-tree inference do not necessarily apply to rate inference, and best practices for the former may be deleterious for the latter.


2018 ◽  
Author(s):  
Dariya K. Sydykova ◽  
Claus O. Wilke

In the field of molecular evolution, we commonly calculate site-specific evolutionary rates from alignments of amino-acid sequences. For example, catalytic residues in enzymes and interface regions in protein complexes can be inferred from observed relative rates. While numerous approaches exist to calculate amino-acid rates, it is not entirely clear what physical quantities the inferred rates represent and how these rates relate to the underlying fitness landscape of the evolving proteins. Further, amino-acid rates can be calculated in the context of different amino-acid exchangeability matrices, such as JTT, LG, or WAG, and again it is not well understood how the choice of the matrix influences the physical interpretation of the inferred rates. Here, we develop a theory of measurement for site-specific evolutionary rates, by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation–selection model. We demonstrate that for realistic analysis settings the measurement process will recover the true expected rates of the mutation–selection model if rates are measured relative to a naïve exchangeability matrix, in which all exchangeabilities are equal to 1/19. We also show that rate measurements using other matrices are quantitatively close but in general not mathematically equivalent. Our results demonstrate that insights obtained from phylogenetic-tree inference do not necessarily apply to rate inference, and best practices for the former may be deleterious for the latter. Significance Statement Maximum likelihood inference is widely used to infer model parameters from sequence data in an evolutionary context. One major challenge in such inference procedures is the problem of having to identify the appropriate model used for inference. Model parameters usually are meaningful only to the extent that the model is appropriately specified and matches the process that generated the data. 
However, in practice, we don’t know what process generated the data, and most models in actual use are misspecified. To circumvent this problem, we show here that we can employ maximum likelihood inference to make defined and meaningful measurements on arbitrary processes. Our approach uses misspecification as a deliberate strategy, and this strategy results in robust and meaningful parameter inference.
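One way to read the value 1/19 in this abstract is sketched below. The sketch assumes that the exchangeabilities are used directly as the off-diagonal entries of a 20-state rate matrix (a Jukes-Cantor-like construction); the abstract does not state this parameterization, so treat it as an assumption. Since each amino acid has 19 possible replacements, the total rate of leaving any state is then exactly one substitution per unit time. The name `naive_rate_matrix` is hypothetical.

```python
N_AMINO_ACIDS = 20


def naive_rate_matrix(n_states=N_AMINO_ACIDS):
    """Rate matrix with every off-diagonal entry equal to 1 / (n_states - 1).

    Assumption: exchangeabilities enter the matrix directly as rates, so each
    row has n_states - 1 equal off-diagonal entries and a diagonal of -1.
    """
    off_diagonal = 1.0 / (n_states - 1)
    return [
        [-1.0 if i == j else off_diagonal for j in range(n_states)]
        for i in range(n_states)
    ]


matrix = naive_rate_matrix()
# Total rate of leaving the first state: 19 entries of 1/19 each.
leaving_rate = sum(rate for j, rate in enumerate(matrix[0]) if j != 0)
```

With this normalization, a site-specific rate estimated relative to the matrix can be read on a scale where 1 corresponds to the baseline of one substitution per unit time.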


2019 ◽  
Author(s):  
Frida Belinky ◽  
Itamar Sela ◽  
Igor B. Rogozin ◽  
Eugene V. Koonin

Abstract Single nucleotide substitutions in protein-coding genes can be divided into synonymous (S), with little fitness effect, and non-synonymous (N) ones that alter amino acids and thus generally have a greater effect. Most of the N substitutions are affected by purifying selection that eliminates them from evolving populations. However, additional mutations of nearby bases can modulate the deleterious effect of single substitutions and thus might be subject to positive selection. To elucidate the effects of selection on double substitutions in all codons, it is critical to differentiate selection from mutational biases. We approached this problem by comparing the fractions of double substitutions within codons to those of the equivalent double S substitutions in adjacent codons. Under the assumption that substitutions occur one at a time, all within-codon double substitutions can be represented as “ancestral-intermediate-final” sequences and can be partitioned into 4 classes: 1) SS: S intermediate – S final, 2) SN: S intermediate – N final, 3) NS: N intermediate – S final, 4) NN: N intermediate – N final. We found that the selective pressure on the second substitution markedly differs among these classes of double substitutions. Analogous to single S substitutions, SS evolve neutrally whereas, analogous to single N substitutions, SN are subject to purifying selection. In contrast, NS show positive selection on the second step because the original amino acid is recovered. The NN double substitutions are heterogeneous and can be subject to either purifying or positive selection, or evolve neutrally, depending on the amino acid similarity between the final or intermediate and the ancestral states. The general trend is that the second mutation compensates for the deleterious effect of the first one, resulting in frequent crossing of valleys on the fitness landscape.


Genetics ◽  
2000 ◽  
Vol 155 (1) ◽  
pp. 431-449 ◽  
Author(s):  
Ziheng Yang ◽  
Rasmus Nielsen ◽  
Nick Goldman ◽  
Anne-Mette Krabbe Pedersen

Abstract Comparison of relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provides a means for understanding the mechanisms of molecular sequence evolution. The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important indicator of selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and have different underlying ω ratios. We develop models that account for heterogeneous ω ratios among amino acid sites and apply them to phylogenetic analyses of protein-coding DNA sequences. These models are useful for testing for adaptive molecular evolution and identifying amino acid sites under diversifying selection. Ten data sets of genes from nuclear, mitochondrial, and viral genomes are analyzed to estimate the distributions of ω among sites. In all data sets analyzed, the selective pressure indicated by the ω ratio is found to be highly heterogeneous among sites. Previously unsuspected Darwinian selection is detected in several genes in which the average ω ratio across sites is <1, but in which some sites are clearly under diversifying selection with ω > 1. Genes undergoing positive selection include the β-globin gene from vertebrates, mitochondrial protein-coding genes from hominoids, the hemagglutinin (HA) gene from human influenza virus A, and HIV-1 env, vif, and pol genes. Tests for the presence of positively selected sites and their subsequent identification appear quite robust to the specific distributional form assumed for ω and can be achieved using any of several models we implement. However, we encountered difficulties in estimating the precise distribution of ω among sites from real data sets.
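The interpretation of ω given in this abstract, and the reason a gene-wide average can hide positively selected sites, can be illustrated with a short Python sketch. It works on point estimates only and performs no statistical test, unlike the likelihood models the abstract describes. The tolerance `tol` and the per-site values in `site_omegas` are hypothetical and chosen only for illustration.

```python
def omega(dn, ds):
    """Nonsynonymous/synonymous rate ratio, omega = dN / dS."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selective_regime(w, tol=1e-9):
    """Map an omega value to the regime named in the abstract.

    tol is a hypothetical tolerance for treating omega as equal to 1.
    """
    if abs(w - 1.0) <= tol:
        return "neutral"
    return "purifying" if w < 1.0 else "positive"


# Made-up per-site omega values: most sites are conserved, one is not.
site_omegas = [0.1, 0.05, 2.5, 0.2]
average_omega = sum(site_omegas) / len(site_omegas)
positive_sites = [i for i, w in enumerate(site_omegas) if selective_regime(w) == "positive"]
```

Here the average across sites is below 1, which on its own would suggest purifying selection, yet the site at index 2 has ω > 1. This is the situation the site-heterogeneous models are designed to detect.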


Author(s):  
Shaoxiong Zhou ◽  
Shengrong Gong ◽  
Shan Zhong ◽  
Wei Pan ◽  
Wenhao Ying

2016 ◽  
Author(s):  
Claudia Bank ◽  
Sebastian Matuszewski ◽  
Ryan T. Hietpas ◽  
Jeffrey D. Jensen

Abstract The study of fitness landscapes, which aims at mapping genotypes to fitness, is receiving ever-increasing attention. Novel experimental approaches combined with NGS methods enable accurate and extensive studies of the fitness effects of mutations – allowing us to test theoretical predictions and improve our understanding of the shape of the true underlying fitness landscape, and its implications for the predictability and repeatability of evolution. Here, we present a uniquely large multi-allelic fitness landscape comprised of 640 engineered mutants that represent all possible combinations of 13 amino-acid changing mutations at six sites in the heat-shock protein Hsp90 in Saccharomyces cerevisiae under elevated salinity. Despite a prevalent pattern of negative epistasis in the landscape, we find that the global fitness peak is reached via four positively epistatic mutations. Combining traditional and extending recently proposed theoretical and statistical approaches, we quantify features of the global multi-allelic fitness landscape. Using subsets of the data, we demonstrate that extrapolation beyond a known part of the landscape is difficult owing to both local ruggedness and amino-acid specific epistatic hotspots, and that inference is additionally confounded by the non-random choice of mutations for experimental fitness landscapes. Author Summary The study of fitness landscapes is fundamentally concerned with understanding the relative roles of stochastic and deterministic processes in adaptive evolution. Here, the authors present a uniquely large and complete multi-allelic intragenic fitness landscape of 640 systematically engineered mutations in yeast Hsp90. Using a combination of traditional and recently proposed theoretical approaches, they study the accessibility of the global fitness peak, and the potential for predictability of the fitness landscape topography. 
They report local ruggedness of the landscape and the existence of epistatic hotspot mutations, which together make extrapolation and hence predictability inherently difficult, if mutation-specific information is not considered.


2020 ◽  
Author(s):  
Shiwani Limbu

Abstract Kinesins of class 13 (kinesin-13s), also known as KinI family proteins, are non-motile microtubule-binding kinesin proteins. Mitotic centromere-associated kinesin (MCAK), a member of the KinI protein family, diffuses along the microtubule and plays a key role in microtubule depolymerization. Here we demonstrate the role of evolutionary selection in the MCAK protein-coding region in regulating its dynamics associated with microtubule binding and stability. Our results indicate that evolutionary selection within the MCAK motor domain at amino acid position 440 in the orders Carnivora and Artiodactyla results in a significant change in the dynamics of the α-helix and loop 11, indicating its likely impact on the microtubule binding and depolymerization process. Furthermore, evolutionary selection at amino acid positions 600, 617 and 698 is likely to affect MCAK stability. A deeper understanding of evolutionary selection in MCAK can reveal the mechanism associated with changes in microtubule dynamics within eutherian mammals.

