A codon model of nucleotide substitution with selection on synonymous codon usage

2014 ◽  
Author(s):  
Laura Kubatko ◽  
Premal Shah ◽  
Radu Herbei ◽  
Michael Gilchrist

The quality of phylogenetic inference from protein-coding genes depends, in part, on how realistically the codon substitution process is modeled. Here we propose a new mechanistic model that combines the standard M0 substitution model of Yang (1997) with a simplified model from Gilchrist (2007) that includes selection on synonymous substitutions as a function of codon-specific nonsense error rates. We tested the new model on 104 protein-coding genes in brewer's yeast and used the Akaike information criterion (AIC) to compare its fit with that of the standard M0 model and of the mutation-selection model of Yang and Nielsen (2008). Our new model provided a significantly better fit in approximately 85% of the cases relative to the basic M0 model and in approximately 25% of the cases relative to the M0 model with estimated codon frequencies, but in only a few cases relative to the mutation-selection model. However, our model includes a parameter that can be interpreted as a measure of the rate of protein production, and the estimates of this parameter were highly correlated with an independent measure of protein production for the yeast genes considered here. Finally, we found that in some cases the new model favored a different phylogeny for a subset of the genes considered, indicating that the choice of substitution model may affect the estimated phylogeny.
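
The AIC comparison described above reduces to AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; the model with the lowest AIC is preferred. A minimal sketch, assuming each model's maximized log-likelihood and parameter count are already available from a fitting program. The model labels, log-likelihoods and parameter counts below are hypothetical, not results from the study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def preferred_model(fits: dict) -> str:
    """Return the name of the model with the smallest AIC.

    `fits` maps a model name to (maximized log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical values for a single gene (illustration only, not study results).
fits = {
    "M0": (-2500.0, 3),
    "M0+selection": (-2480.0, 4),
}
for name, (lnl, k) in fits.items():
    print(name, aic(lnl, k))
print("preferred:", preferred_model(fits))
```

Here the extra parameter costs 2 AIC units but the 20-unit gain in log-likelihood lowers AIC by 38, so the richer hypothetical model is preferred.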

Author(s):  
Nicolas Rodrigue ◽  
Thibault Latrille ◽  
Nicolas Lartillot

In recent years, codon substitution models based on the mutation–selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have focused either on detecting global signals of adaptive regimes (across the entire gene) or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for the site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach is more sensitive than traditional methods. However, more study is required to assess the impact of potential model violations on the method and to gain a better empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.


2016 ◽  
Author(s):  
Benjamin D Kaehler ◽  
Von Bing Yap ◽  
Gavin A Huttley

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of non-synonymous substitutions to the rate of neutral evolution, typically assumed to be the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied blindly in the presence of changing sequence composition, the number of substitutions is systematically overestimated. We extend these findings to codon substitution models and further demonstrate that the ratio of non-synonymous to synonymous substitution rates tends to be underestimated across three data sets of insects, mammals, and vertebrates. Our basis for comparison is a non-stationary codon substitution model that allows sequence composition to change. Model selection and model fit results demonstrate that our new model tends to fit the data better. Direct measurement of non-stationarity shows that the bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.
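
The stationarity and time-reversibility assumptions discussed above can be checked directly on a rate matrix Q with equilibrium frequencies pi: stationarity requires pi Q = 0, and reversibility requires detailed balance, pi_i q_ij = pi_j q_ji. The sketch below uses an HKY85-style nucleotide matrix as a stand-in for a reversible model (an assumption for illustration; it is not the authors' codon model, and the base frequencies are hypothetical). Perturbing a single rate breaks both properties, so under the perturbed process the base composition would drift away from pi over time.

```python
from itertools import product

BASES = "TCAG"
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def hky_rate_matrix(pi, kappa):
    """Unnormalized HKY85-style rates: q_ij = kappa * pi_j (transition) or pi_j (transversion)."""
    q = {}
    for i, j in product(BASES, repeat=2):
        if i != j:
            q[(i, j)] = (kappa if (i, j) in TRANSITIONS else 1.0) * pi[j]
    for i in BASES:
        # The diagonal entry makes each row sum to zero.
        q[(i, i)] = -sum(q[(i, j)] for j in BASES if j != i)
    return q


def is_stationary(q, pi, tol=1e-12):
    """pi is stationary if (pi Q)_j = sum_i pi_i q_ij = 0 for every j."""
    return all(abs(sum(pi[i] * q[(i, j)] for i in BASES)) <= tol for j in BASES)


def is_reversible(q, pi, tol=1e-12):
    """Detailed balance: pi_i q_ij == pi_j q_ji for every ordered pair i != j."""
    return all(abs(pi[i] * q[(i, j)] - pi[j] * q[(j, i)]) <= tol
               for i, j in product(BASES, repeat=2) if i != j)


pi = {"T": 0.3, "C": 0.2, "A": 0.3, "G": 0.2}  # hypothetical base frequencies
q = hky_rate_matrix(pi, kappa=2.0)
print(is_stationary(q, pi), is_reversible(q, pi))  # True True

# Double one rate (A -> G), then restore the zero row sum for A.
q2 = dict(q)
q2[("A", "G")] *= 2
q2[("A", "A")] = -sum(q2[("A", j)] for j in BASES if j != "A")
print(is_stationary(q2, pi), is_reversible(q2, pi))  # False False
```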


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1945
Author(s):  
Olga Bondareva ◽  
Evgeny Genelt-Yanovskiy ◽  
Tatyana Petrova ◽  
Semen Bodrov ◽  
Antonina Smorkatcheva ◽  
...  

This study evaluates signatures of selection in the evolution of the mitochondrial DNA of voles (subfamily Arvicolinae) during the colonization of subterranean environments. Codon-substitution models were used to compare the mitochondrial protein-coding genes of eight subterranean vole species (Prometheomys schaposchnikowi; three species of the genus Ellobius: Ellobius talpinus, Ellobius fuscocapillus and Ellobius lutescens; two species of the genus Terricola: Terricola subterraneus and Terricola daghestanicus; Lasiopodomys mandarinus; and Hyperacrius fertilis) with those of their closest aboveground relatives. The highest number of selection signatures was detected in the genes ATP8 and CYTB. Relaxed selection was observed in most mitochondrial protein-coding genes of the subterranean species, and the largest number of genes under relaxed selection was found in mole voles (genus Ellobius). The number of selection signatures was independent of the evolutionary age of the lineage but matched the degree of specialization to the subterranean niche. Common trends in selective pressure were shared by evolutionarily ancient, highly specialized subterranean rodent families and by phylogenetically young vole lineages. This suggests that signatures of adaptation in individual mitochondrial protein-coding genes associated with the colonization of the subterranean niche may appear within a rather short evolutionary timespan.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Yu-Xuan Sun ◽  
Lei Wang ◽  
Guo-Qing Wei ◽  
Cen Qian ◽  
Li-Shang Dai ◽  
...  

The complete mitochondrial genome (mitogenome) of Leucoma salicis (Lepidoptera: Lymantriidae) was sequenced and annotated. It is a circular molecule of 15,334 bp containing the 37 genes usually present in insect mitogenomes. All protein-coding genes (PCGs) are initiated by ATN codons except cox1, which is initiated by CGA. Three of the 13 PCGs had an incomplete termination codon (T or TA), while the others terminated with TAA. The relative synonymous codon usage of the 13 PCGs was consistent with that of published lepidopteran sequences. All tRNA genes had typical cloverleaf secondary structures except tRNASer(AGN), in which the dihydrouridine (DHU) arm could not form a stable stem-loop structure. The 325-bp A + T-rich region had several distinctive features, including the motif ‘ATAGA’ followed by an 18-bp poly-T stretch, a microsatellite-like (AT)7 element, and an 11-bp poly-A stretch immediately upstream of tRNAMet. Relationships among 32 insect species were determined using Maximum Likelihood (ML), Neighbor Joining (NJ) and Bayesian Inference (BI) phylogenetic methods. These analyses confirm that L. salicis belongs to Lymantriidae and that Lymantriidae is a member of Noctuoidea, forming a sister taxon to Erebidae, Nolidae and Noctuidae, and is most closely related to Erebidae.
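
Relative synonymous codon usage (RSCU), mentioned above, is the observed count of a codon divided by the count expected if all synonymous codons for its amino acid were used equally, so a value of 1 means no bias. A minimal sketch, assuming a hypothetical two-amino-acid excerpt of the codon table (Phe and Gly, whose codons are the same in the standard and invertebrate mitochondrial codes) and a toy sequence; a real analysis would cover every amino acid in the appropriate genetic code.

```python
from collections import Counter

# Hypothetical excerpt of a codon table: amino acid -> synonymous codons.
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def codon_counts(seq):
    """Count in-frame codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))


def rscu(counts, synonyms):
    """RSCU of codon j for amino acid i: x_ij divided by the mean count over its n_i synonyms."""
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence: TTT TTT TTC GGA GGA GGA GGT
r = rscu(codon_counts("TTTTTTTTCGGAGGAGGAGGT"), SYNONYMS)
print(r)
```

For each amino acid the RSCU values sum to its number of synonymous codons (here 2 for Phe and 4 for Gly).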


2019 ◽  
Author(s):  
Zhen Peng ◽  
Yehuda Ben-Shahar

Protein-coding DNA sequences are thought to affect phenotypes primarily via the peptides they encode. Yet emerging data suggest that synonymous mutations, although they do not alter protein sequences, can cause phenotypic changes. We have previously shown that signatures of selection on gene-specific codon usage bias are common in the genomes of diverse eukaryotic species. Thus, synonymous codon usage, like amino acid usage, is likely a regular target of natural selection. Consequently, here we propose the hypothesis that, at least for some protein-coding genes, codon clusters with biased synonymous codon usage patterns might represent "hidden" nucleic-acid-level functional domains that affect the action of the corresponding proteins via diverse hypothetical mechanisms. To test our hypothesis, we used computational approaches to identify over 3,000 putatively functional codon clusters (PFCCs) with biased usage patterns in about 1,500 protein-coding genes in the Drosophila melanogaster genome. Our data suggest that these PFCCs are likely associated with specific categories of gene function, including enrichment in genes that encode membrane-bound and secreted proteins. Yet the majority of the PFCCs we identified are not associated with previously annotated functional protein domains. Although the specific functional significance of most of these PFCCs remains unknown, we show that in the highly conserved family of voltage-gated sodium channels, the presence of rare-codon cluster(s) in the nucleic-acid region encoding the cytoplasmic loop that constitutes the inactivation gate is conserved across paralogs as well as across orthologs from distant animal species. Together, our findings suggest that codon clusters with biased usage patterns likely represent "hidden" nucleic-acid-level functional domains that cannot be simply predicted from the amino acid sequences they encode. Therefore, it is likely that on the evolutionary timescale, protein-coding DNA sequences are shaped by both amino-acid-dependent and codon-usage-dependent selective forces.
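
One simple way to flag clusters of rare codons along a coding sequence is a sliding-window scan that reports windows whose fraction of rare codons exceeds a threshold. The sketch below is a toy illustration, not the authors' PFCC identification pipeline: the set of "rare" codons, the window size and the threshold are all hypothetical parameters chosen for the example.

```python
def rare_codon_windows(seq, rare, window=10, threshold=0.5):
    """Return (start codon index, rare fraction) for every window at or above threshold.

    `rare` is a hypothetical set of codons treated as rare; windows slide one codon at a time.
    """
    codons = [seq[i:i + 3].upper() for i in range(0, len(seq) - len(seq) % 3, 3)]
    hits = []
    for start in range(len(codons) - window + 1):
        frac = sum(c in rare for c in codons[start:start + window]) / window
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Toy sequence: 5 common codons, a run of 4 "rare" codons, 5 common codons.
seq = "GCT" * 5 + "CGA" * 4 + "GCT" * 5
hits = rare_codon_windows(seq, rare={"CGA", "AGG"}, window=4, threshold=0.75)
print(hits)
```

Overlapping hits such as these would normally be merged into a single cluster spanning the rare-codon run.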


2021 ◽  
Author(s):  
Asif U Tamuri ◽  
Mario dos Reis

We use first principles of population genetics to model protein evolution under persistent positive selection (PPS). PPS may occur when organisms are subjected to persistent environmental change, during adaptive radiations, or in host-pathogen interactions. Our mutation-selection model indicates that protein evolution under PPS is an irreversible Markov process, and thus proteins under PPS show a strongly asymmetrical distribution of selection coefficients among amino acid substitutions. Our model shows that the criterion ω > 1 (where ω is the ratio of non-synonymous to synonymous codon substitution rates) for detecting positive selection is conservative and indeed arbitrary, because in real proteins many mutations are highly deleterious and are removed by selection even at positively selected sites. We use a penalized-likelihood implementation of our model to successfully detect PPS in plant RuBisCO and influenza HA proteins. By directly estimating selection coefficients at protein sites, our inference procedure bypasses the need to use ω as a surrogate measure of selection and improves our ability to detect molecular adaptation in proteins.
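
Why ω > 1 is a conservative criterion can be illustrated with the standard diffusion approximation, in which a mutation with population-scaled selection coefficient S (for example 4N_e s in a diploid population) is substituted at S / (1 - e^(-S)) times the neutral rate. A minimal sketch, assuming equal mutation rates among the possible amino acid changes, neutral synonymous changes, and hypothetical S values; this is not the authors' penalized-likelihood model.

```python
import math


def relative_rate(s):
    """Substitution rate relative to a neutral mutation: S / (1 - exp(-S)).

    S is a population-scaled selection coefficient; the limit at S = 0 is 1.
    """
    if abs(s) < 1e-8:
        return 1.0
    return s / -math.expm1(-s)  # -expm1(-s) == 1 - exp(-s), computed stably


# Hypothetical site: 1 of 10 possible amino acid changes is beneficial (S = +2),
# the other 9 are strongly deleterious (S = -10). With equal mutation rates and
# neutral synonymous changes, omega at the site is the mean relative rate.
coefficients = [2.0] + [-10.0] * 9
omega = sum(relative_rate(s) for s in coefficients) / len(coefficients)
print(round(relative_rate(2.0), 3), round(omega, 3))  # 2.313 0.232
```

Even though one change at this site is beneficial and is fixed faster than a neutral mutation, the nine strongly deleterious alternatives pull ω well below 1.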


Author(s):  
Olga Bondareva ◽  
Evgeny Genelt-Yanovskiy ◽  
Tatyana Petrova ◽  
Semen Bodrov ◽  
Antonina Smorkatcheva ◽  
...  

The current study evaluates selection signals in the evolution of mitochondrial DNA of voles (subfamily Arvicolinae) during the colonization of subterranean environments. Codon-substitution models were used to compare the mitochondrial protein-coding genes of eight subterranean vole species (Prometheomys schaposchnikowi; three species of the genus Ellobius: E. talpinus, E. fuscocapillus and E. lutescens; two species of the genus Terricola: T. subterraneus and T. daghestanicus; Lasiopodomys mandarinus; and Hyperacrius fertilis) with those of their closest aboveground relatives. The highest number of selection signatures was detected in the genes ATP8 and CYTB. Relaxed selection was observed in most mtDNA protein-coding genes. Signatures of adaptive evolution of mitochondrial genes related to the subterranean niche were most pronounced in mole voles (genus Ellobius). The number of selection signatures was independent of the evolutionary age of the lineage but matched the degree of specialization to the subterranean niche. Common trends in selective pressure were shared by evolutionarily ancient, highly specialized subterranean rodent families and by phylogenetically young vole lineages. This suggests that signatures of adaptation in individual mitochondrial protein-coding genes associated with the colonization of the subterranean niche may appear within a rather short evolutionary timespan.


