scholarly journals Quantifying Codon Usage in Signal Peptides: Gene Expression and Amino Acid Usage Explain Apparent Selection for Inefficient Codons

2018 ◽  
Author(s):  
Alexander L. Cope ◽  
Robert L. Hettich ◽  
Michael A. Gilchrist

AbstractThe Sec secretion pathway is found across all domains of life. A critical feature of Sec secreted proteins is the signal peptide, a short peptide with distinct physicochemical properties located at the N-terminus of the protein. Previous work indicates signal peptides are biased towards translationally inefficient codons, which is hypothesized to be an adaptation driven by selection to improve the efficacy and efficiency of the protein secretion mechanisms. We investigate codon usage in the signal peptides of E. coli using the Codon Adaptation Index (CAI), the tRNA Adaptation Index (tAI), and the ribosomal overhead cost formulation of the stochastic evolutionary model of protein production rates (ROC-SEMPPR). Comparisons between signal peptides and 5’-end of cytoplasmic proteins using CAI and tAI are consistent with a preference for inefficient codons in signal peptides. Simulations reveal these differences are due to amino acid usage and gene expression - we find these differences disappear when accounting for both factors. In contrast, ROC-SEMPPR, a mechanistic population genetics model capable of separating the effects of selection and mutation bias, shows codon usage bias (CUB) of the signal peptides is indistinguishable from the 5’-ends of cytoplasmic proteins. Additionally, we find CUB at the 5’-ends is weaker than later segments of the gene. Results illustrate the value in using models grounded in population genetics to interpret genetic data. We show failure to account for mutation bias and the effects of gene expression on the efficacy of selection against translation inefficiency can lead to a misinterpretation of codon usage patterns.

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alexander Schmitz ◽  
Fuzhong Zhang

Abstract Background Cell-to-cell variation in gene expression strongly affects population behavior and is key to multiple biological processes. While codon usage is known to affect ensemble gene expression, how codon usage influences variation in gene expression between single cells is not well understood. Results Here, we used a Sort-seq based massively parallel strategy to quantify gene expression variation from a green fluorescent protein (GFP) library containing synonymous codons in Escherichia coli. We found that sequences containing codons with higher tRNA Adaptation Index (TAI) scores, and higher codon adaptation index (CAI) scores, have higher GFP variance. This trend is not observed for codons with high Normalized Translation Efficiency Index (nTE) scores nor from the free energy of folding of the mRNA secondary structure. GFP noise, or squared coefficient of variance (CV2), scales with mean protein abundance for low-abundant proteins but does not change at high mean protein abundance. Conclusions Our results suggest that the main source of noise for high-abundance proteins is likely not originating at translation elongation. Additionally, the drastic change in mean protein abundance with small changes in protein noise seen from our library implies that codon optimization can be performed without concerning gene expression noise for biotechnology applications.


2021 ◽  
Author(s):  
Alexander L Cope ◽  
Premal Shah

Patterns of non-uniform usage of synonymous codons (codon bias) varies across genes in an organism and across species from all domains of life. The bias in codon usage is due to a combination of both non-adaptive (e.g. mutation biases) and adaptive (e.g. natural selection for translation efficiency/accuracy) evolutionary forces. Most population genetics models quantify the effects of mutation bias and selection on shaping codon usage patterns assuming a uniform mutation bias across the genome. However, mutation biases can vary both along and across chromosomes due to processes such as biased gene conversion, potentially obfuscating signals of translational selection. Moreover, estimates of variation in genomic mutation biases are often lacking for non-model organisms. Here, we combine an unsupervised learning method with a population genetics model of synonymous codon bias evolution to assess the impact of intragenomic variation in mutation bias on the strength and direction of natural selection on synonymous codon usage across 49 Saccharomycotina budding yeasts. We find that in the absence of a priori information, unsupervised learning approaches can be used to identify regions evolving under different mutation biases. We find that the impact of intragenomic variation in mutation bias varies widely, even among closely-related species. We show that the overall strength and direction of selection on codon usage can be underestimated by failing to account for intragenomic variation in mutation biases. Interestingly, genes falling into clusters identified by machine learning are also often physically clustered across chromosomes, consistent with processes such as biased gene conversion. Our results indicate the need for more nuanced models of sequence evolution that systematically incorporate the effects of variable mutation biases on codon frequencies.


Microbiology ◽  
2003 ◽  
Vol 149 (9) ◽  
pp. 2585-2596 ◽  
Author(s):  
Joshua T. Herbeck ◽  
Dennis P. Wall ◽  
Jennifer J. Wernegreen

Wigglesworthia glossinidia brevipalpis, the obligate bacterial endosymbiont of the tsetse fly Glossina brevipalpis, is characterized by extreme genome reduction and AT nucleotide composition bias. Here, multivariate statistical analyses are used to test the hypothesis that mutational bias and genetic drift shape synonymous codon usage and amino acid usage of Wigglesworthia. The results show that synonymous codon usage patterns vary little across the genome and do not distinguish genes of putative high and low expression levels, thus indicating a lack of translational selection. Extreme AT composition bias across the genome also drives relative amino acid usage, but predicted high-expression genes (ribosomal proteins and chaperonins) use GC-rich amino acids more frequently than do low-expression genes. The levels and configuration of amino acid differences between Wigglesworthia and Escherichia coli were compared to test the hypothesis that the relatively GC-rich amino acid profiles of high-expression genes reflect greater amino acid conservation at these loci. This hypothesis is supported by reduced levels of protein divergence at predicted high-expression Wigglesworthia genes and similar configurations of amino acid changes across expression categories. Combined, the results suggest that codon and amino acid usage in the Wigglesworthia genome reflect a strong AT mutational bias and elevated levels of genetic drift, consistent with expected effects of an endosymbiotic lifestyle and repeated population bottlenecks. However, these impacts of mutation and drift are apparently attenuated by selection on amino acid composition at high-expression genes.


2018 ◽  
Vol 19 (12) ◽  
pp. 4010
Author(s):  
Zhaocai Li ◽  
Wen Hu ◽  
Xiaoan Cao ◽  
Ping Liu ◽  
Youjun Shang ◽  
...  

The family of Chlamydiaceae contains a group of obligate intracellular bacteria that can infect a wide range of hosts. The evolutionary trend of members in this family is a hot topic, which benefits our understanding of the cross-infection of these pathogens. In this study, 14 whole genomes of 12 Chlamydia species were used to investigate the nucleotide, codon, and amino acid usage bias by synonymous codon usage value and information entropy method. The results showed that all the studied Chlamydia spp. had A/T rich genes with over-represented A or T at the third positions and G or C under-represented at these positions, suggesting that nucleotide usages influenced synonymous codon usages. The overall codon usage trend from synonymous codon usage variations divides the Chlamydia spp. into four separate clusters, while amino acid usage divides the Chlamydia spp. into two clusters with some exceptions, which reflected the genetic diversity of the Chlamydiaceae family members. The overall codon usage pattern represented by the effective number of codons (ENC) was significantly positively correlated to gene GC3 content. A negative correlation exists between ENC and the codon adaptation index for some Chlamydia species. These results suggested that mutation pressure caused by nucleotide composition constraint played an important role in shaping synonymous codon usage patterns. Furthermore, codon usage of T3ss and Pmps gene families adapted to that of the corresponding genome. Taken together, analyses help our understanding of evolutionary interactions between nucleotide, synonymous codon, and amino acid usages in genes of Chlamydiaceae family members.


Gene ◽  
2005 ◽  
Vol 352 ◽  
pp. 109-117 ◽  
Author(s):  
Jörg Schaber ◽  
Claude Rispe ◽  
Jennifer Wernegreen ◽  
Andreas Buness ◽  
François Delmotte ◽  
...  

10.29007/87r9 ◽  
2020 ◽  
Author(s):  
Zhixiu Lu ◽  
Michael Gilchrist ◽  
Scott Emrich

Codon usage bias has been known to reflect the expression level of a protein-coding gene under the evolutionary theory that selection favors certain synonymous codons. Although measuring the effect of selection in simple organisms such as yeast and E. coli has proven to be effective and accurate, codon-based methods perform less well in plants and humans. In this paper, we extend a prior method that incorporates another evolutionary factor, namely mutation bias and its effect on codon usage. Our results indicate that prediction of gene expression is significantly improved under our framework, and suggests that quantification of mutation bias is essential for fully understanding synonymous codon usage. We also propose an improved method, namely MLE-Φ, with much greater computation efficiency and a wider range of applications. An implementation of this method is provided at https://github.com/luzhixiu1996/MLE- Phi.


Sign in / Sign up

Export Citation Format

Share Document