Quantifying Codon Usage in Signal Peptides: Gene Expression and Amino Acid Usage Explain Apparent Selection for Inefficient Codons

Abstract: The Sec secretion pathway is found across all domains of life. A critical feature of Sec-secreted proteins is the signal peptide, a short peptide with distinct physicochemical properties located at the N-terminus of the protein. Previous work indicates signal peptides are biased towards translationally inefficient codons, which is hypothesized to be an adaptation driven by selection to improve the efficacy and efficiency of the protein secretion mechanisms. We investigate codon usage in the signal peptides of E. coli using the Codon Adaptation Index (CAI), the tRNA Adaptation Index (tAI), and the ribosomal overhead cost formulation of the stochastic evolutionary model of protein production rates (ROC-SEMPPR). Comparisons between signal peptides and the 5’-ends of cytoplasmic proteins using CAI and tAI are consistent with a preference for inefficient codons in signal peptides. Simulations reveal these differences are due to amino acid usage and gene expression: they disappear when both factors are accounted for. In contrast, ROC-SEMPPR, a mechanistic population genetics model capable of separating the effects of selection and mutation bias, shows that the codon usage bias (CUB) of signal peptides is indistinguishable from that of the 5’-ends of cytoplasmic proteins. Additionally, we find CUB at the 5’-ends is weaker than in later segments of the gene. These results illustrate the value of using models grounded in population genetics to interpret genetic data. We show that failure to account for mutation bias and the effects of gene expression on the efficacy of selection against translation inefficiency can lead to misinterpretation of codon usage patterns.
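For readers unfamiliar with CAI, the following is a minimal sketch of how the index is usually computed: each codon gets a relative adaptiveness weight (its count in a reference gene set divided by the count of the most frequent synonymous codon), and a gene's CAI is the geometric mean of those weights. The function names, the toy codon families, and the 0.5 pseudocount for unseen codons are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def codons(seq):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3].upper() for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, families):
    """Weight each codon by its count relative to the most used synonym.

    `families` is a list of lists of synonymous codons (hypothetical input;
    a real analysis would derive it from the genetic code table).
    """
    counts = Counter(c for s in reference_seqs for c in codons(s))
    weights = {}
    for family in families:
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumed pseudocount of 0.5 so unseen codons do not give log(0).
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the codons in `seq` that have a weight."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

Because the weights depend only on the reference set, a region rich in amino acids whose codons carry low weights will score lower regardless of selection, which is the kind of amino acid usage effect the abstract describes.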



Biochimica et Biophysica Acta (BBA) - Biomembranes ◽

10.1016/j.bbamem.2018.09.010 ◽

2018 ◽

Vol 1860 (12) ◽

pp. 2479-2485 ◽

Cited By ~ 2

Author(s):

Alexander L. Cope ◽

Robert L. Hettich ◽

Michael A. Gilchrist

Keyword(s):

Gene Expression ◽

Amino Acid ◽

Codon Usage ◽

Amino Acid Usage ◽

Signal Peptides ◽

Selection For


Massively parallel gene expression variation measurement of a synonymous codon library

BMC Genomics ◽

10.1186/s12864-021-07462-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alexander Schmitz ◽

Fuzhong Zhang

Keyword(s):

Gene Expression ◽

Codon Usage ◽

Single Cells ◽

Massively Parallel ◽

Protein Abundance ◽

Translation Efficiency ◽

Gene Expression Variation ◽

Expression Variation ◽

Change In Mean ◽

Adaptation Index

Abstract

Background: Cell-to-cell variation in gene expression strongly affects population behavior and is key to multiple biological processes. While codon usage is known to affect ensemble gene expression, how codon usage influences variation in gene expression between single cells is not well understood.

Results: Here, we used a Sort-seq based massively parallel strategy to quantify gene expression variation from a green fluorescent protein (GFP) library containing synonymous codons in Escherichia coli. We found that sequences containing codons with higher tRNA Adaptation Index (tAI) scores and higher Codon Adaptation Index (CAI) scores have higher GFP variance. This trend is observed neither for codons with high Normalized Translation Efficiency Index (nTE) scores nor for the free energy of folding of the mRNA secondary structure. GFP noise, measured as the squared coefficient of variation (CV²), scales with mean protein abundance for low-abundance proteins but does not change at high mean protein abundance.

Conclusions: Our results suggest that the main source of noise for high-abundance proteins is unlikely to originate at translation elongation. Additionally, the large change in mean protein abundance accompanied by only small changes in protein noise in our library implies that, for biotechnology applications, codon optimization can be performed without concern for gene expression noise.
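The noise measure named in this abstract, the squared coefficient of variation, is the variance divided by the squared mean. A minimal sketch follows; the function name and the choice of the population (rather than sample) variance are assumptions, since the abstract does not say which estimator was used.

```python
from statistics import fmean, pvariance


def noise_cv2(values):
    """Squared coefficient of variation: variance / mean**2.

    Uses the population variance (an assumption; a sample variance would
    differ slightly for small numbers of cells).
    """
    mu = fmean(values)
    if mu == 0:
        raise ValueError("CV^2 is undefined when the mean is zero")
    return pvariance(values, mu) / mu ** 2
```

Dividing by the squared mean makes the measure dimensionless, so noise can be compared between library members whose mean fluorescence differs by orders of magnitude.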


Intragenomic variation in mutation biases causes underestimation of selection on synonymous codon usage

10.1101/2021.10.29.466462 ◽

2021 ◽

Author(s):

Alexander L Cope ◽

Premal Shah

Keyword(s):

Population Genetics ◽

Natural Selection ◽

Codon Usage ◽

Codon Bias ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Mutation Bias ◽

Biased Gene Conversion ◽

Intragenomic Variation ◽

The Impact

Patterns of non-uniform usage of synonymous codons (codon bias) vary across genes in an organism and across species from all domains of life. The bias in codon usage is due to a combination of both non-adaptive (e.g. mutation biases) and adaptive (e.g. natural selection for translation efficiency/accuracy) evolutionary forces. Most population genetics models quantify the effects of mutation bias and selection on shaping codon usage patterns assuming a uniform mutation bias across the genome. However, mutation biases can vary both along and across chromosomes due to processes such as biased gene conversion, potentially obfuscating signals of translational selection. Moreover, estimates of variation in genomic mutation biases are often lacking for non-model organisms. Here, we combine an unsupervised learning method with a population genetics model of synonymous codon bias evolution to assess the impact of intragenomic variation in mutation bias on the strength and direction of natural selection on synonymous codon usage across 49 Saccharomycotina budding yeasts. We find that in the absence of a priori information, unsupervised learning approaches can be used to identify regions evolving under different mutation biases. We find that the impact of intragenomic variation in mutation bias varies widely, even among closely related species. We show that the overall strength and direction of selection on codon usage can be underestimated by failing to account for intragenomic variation in mutation biases. Interestingly, genes falling into clusters identified by machine learning are also often physically clustered across chromosomes, consistent with processes such as biased gene conversion. Our results indicate the need for more nuanced models of sequence evolution that systematically incorporate the effects of variable mutation biases on codon frequencies.
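To make the interplay of mutation bias and selection concrete, here is a hedged sketch of a mutation-selection codon model. The abstract gives no equations; the multinomial-logit form below, in which the probability of a synonymous codon is proportional to exp(-ΔM - Δη·φ), is an assumed parameterization in the style of ROC-type models, and the names `delta_m`, `delta_eta`, and `phi` are illustrative.

```python
import math


def codon_probabilities(delta_m, delta_eta, phi):
    """Expected frequencies of the synonymous codons for one amino acid.

    delta_m[i]   : mutation bias term of codon i relative to a reference codon
    delta_eta[i] : selection term of codon i relative to the same reference
    phi          : expression level of the gene (scales the effect of selection)
    The reference codon has delta_m = delta_eta = 0 (assumed convention).
    """
    scores = [-(dm + de * phi) for dm, de in zip(delta_m, delta_eta)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

In this form, codon frequencies at low expression are set almost entirely by the mutation terms, so a region-specific shift in `delta_m` that the model treats as genome-wide can be mistaken for a change in selection.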


Gene Expression Levels Are Correlated with Synonymous Codon Usage, Amino Acid Composition, and Gene Architecture in the Red Flour Beetle, Tribolium castaneum

Molecular Biology and Evolution ◽

10.1093/molbev/mss184 ◽

2012 ◽

Vol 29 (12) ◽

pp. 3755-3766 ◽

Cited By ~ 28

Author(s):

Anna Williford ◽

Jeffery P. Demuth

Keyword(s):

Gene Expression ◽

Amino Acid ◽

Codon Usage ◽

Amino Acid Composition ◽

Tribolium Castaneum ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Red Flour Beetle ◽

Gene Architecture ◽

Gene Expression Levels


Impact of bias discrepancy and amino acid usage on estimates of the effective number of codons used in a gene, and a test for selection on codon usage

Gene ◽

10.1016/j.gene.2007.12.001 ◽

2008 ◽

Vol 410 (1) ◽

pp. 82-88 ◽

Cited By ~ 12

Author(s):

Anders Fuglsang

Keyword(s):

Amino Acid ◽

Codon Usage ◽

Amino Acid Usage ◽

Effective Number ◽

Effective Number Of Codons


Gene expression level influences amino acid usage, but not codon usage, in the tsetse fly endosymbiont Wigglesworthia

Microbiology ◽

10.1099/mic.0.26381-0 ◽

2003 ◽

Vol 149 (9) ◽

pp. 2585-2596 ◽

Cited By ~ 37

Author(s):

Joshua T. Herbeck ◽

Dennis P. Wall ◽

Jennifer J. Wernegreen

Keyword(s):

Amino Acid ◽

Codon Usage ◽

Genetic Drift ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Tsetse Fly ◽

High Expression ◽

Amino Acid Usage ◽

Mutational Bias ◽

Low Expression

Wigglesworthia glossinidia brevipalpis, the obligate bacterial endosymbiont of the tsetse fly Glossina brevipalpis, is characterized by extreme genome reduction and AT nucleotide composition bias. Here, multivariate statistical analyses are used to test the hypothesis that mutational bias and genetic drift shape synonymous codon usage and amino acid usage of Wigglesworthia. The results show that synonymous codon usage patterns vary little across the genome and do not distinguish genes of putative high and low expression levels, thus indicating a lack of translational selection. Extreme AT composition bias across the genome also drives relative amino acid usage, but predicted high-expression genes (ribosomal proteins and chaperonins) use GC-rich amino acids more frequently than do low-expression genes. The levels and configuration of amino acid differences between Wigglesworthia and Escherichia coli were compared to test the hypothesis that the relatively GC-rich amino acid profiles of high-expression genes reflect greater amino acid conservation at these loci. This hypothesis is supported by reduced levels of protein divergence at predicted high-expression Wigglesworthia genes and similar configurations of amino acid changes across expression categories. Combined, the results suggest that codon and amino acid usage in the Wigglesworthia genome reflect a strong AT mutational bias and elevated levels of genetic drift, consistent with expected effects of an endosymbiotic lifestyle and repeated population bottlenecks. However, these impacts of mutation and drift are apparently attenuated by selection on amino acid composition at high-expression genes.


Synonymous Codon Usages as an Evolutionary Dynamic for Chlamydiaceae

International Journal of Molecular Sciences ◽

10.3390/ijms19124010 ◽

2018 ◽

Vol 19 (12) ◽

pp. 4010

Author(s):

Zhaocai Li ◽

Wen Hu ◽

Xiaoan Cao ◽

Ping Liu ◽

Youjun Shang ◽

...

Keyword(s):

Amino Acid ◽

Codon Usage ◽

Synonymous Codon ◽

Family Members ◽

Synonymous Codon Usage ◽

Amino Acid Usage ◽

Codon Usage Pattern ◽

Evolutionary Trend ◽

Mutation Pressure ◽

Wide Range

The family Chlamydiaceae contains a group of obligate intracellular bacteria that can infect a wide range of hosts. The evolutionary trends of members of this family are an active research topic, and studying them improves our understanding of cross-infection by these pathogens. In this study, 14 whole genomes of 12 Chlamydia species were used to investigate nucleotide, codon, and amino acid usage bias using synonymous codon usage values and an information entropy method. The results showed that all the studied Chlamydia spp. had A/T-rich genes, with A or T over-represented and G or C under-represented at third codon positions, suggesting that nucleotide usage influences synonymous codon usage. The overall codon usage trend derived from synonymous codon usage variation divides the Chlamydia spp. into four separate clusters, while amino acid usage divides them into two clusters with some exceptions, reflecting the genetic diversity of the Chlamydiaceae family members. The overall codon usage pattern, represented by the effective number of codons (ENC), was significantly positively correlated with gene GC3 content. A negative correlation exists between ENC and the codon adaptation index for some Chlamydia species. These results suggest that mutation pressure caused by nucleotide composition constraints played an important role in shaping synonymous codon usage patterns. Furthermore, the codon usage of the T3SS and Pmp gene families was adapted to that of the corresponding genome. Taken together, these analyses improve our understanding of the evolutionary interactions between nucleotide, synonymous codon, and amino acid usage in genes of Chlamydiaceae family members.
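GC3, the quantity correlated with ENC above, is the fraction of G or C bases at third codon positions. A minimal sketch, assuming an in-frame coding sequence as input (the function name is illustrative):

```python
def gc3(seq):
    """Fraction of G or C at the third position of each complete codon.

    Assumes `seq` is an in-frame coding sequence; a trailing partial codon
    contributes nothing because it has no third base.
    """
    thirds = seq.upper()[2::3]  # bases at indices 2, 5, 8, ...
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For an A/T-rich genome such as those described here, GC3 is low for most genes, and differences in GC3 between genes give a simple first look at how far nucleotide composition alone explains codon choice.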


Gene expression levels influence amino acid usage and evolutionary rates in endosymbiotic bacteria

Gene ◽

10.1016/j.gene.2005.04.003 ◽

2005 ◽

Vol 352 ◽

pp. 109-117 ◽

Cited By ~ 18

Author(s):

Jörg Schaber ◽

Claude Rispe ◽

Jennifer Wernegreen ◽

Andreas Buness ◽

François Delmotte ◽

...

Keyword(s):

Gene Expression ◽

Amino Acid ◽

Evolutionary Rates ◽

Amino Acid Usage ◽

Expression Levels ◽

Endosymbiotic Bacteria ◽

Gene Expression Levels


Analysis of the Relationship between Genomic GC Content and Patterns of Base Usage, Codon Usage and Amino Acid Usage in Prokaryotes: Similar GC Content Adopts Similar Compositional Frequencies Regardless of the Phylogenetic Lineages

PLoS ONE ◽

10.1371/journal.pone.0107319 ◽

2014 ◽

Vol 9 (9) ◽

pp. e107319 ◽

Cited By ~ 13

Author(s):

Hui-Qi Zhou ◽

Lu-Wen Ning ◽

Hui-Xiong Zhang ◽

Feng-Biao Guo

Keyword(s):

Amino Acid ◽

Codon Usage ◽

Gc Content ◽

Amino Acid Usage ◽

Phylogenetic Lineages ◽

The Relationship ◽

Base Usage ◽

Genomic Gc Content


Analysis of Mutation Bias in Shaping Codon Usage Bias and Its Association with Gene Expression Across Species

10.29007/87r9 ◽

2020 ◽

Author(s):

Zhixiu Lu ◽

Michael Gilchrist ◽

Scott Emrich

Keyword(s):

Gene Expression ◽

Codon Usage ◽

Codon Usage Bias ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Mutation Bias ◽

Protein Coding ◽

E Coli ◽

Synonymous Codons ◽

Computation Efficiency

Codon usage bias is known to reflect the expression level of a protein-coding gene, under the evolutionary theory that selection favors certain synonymous codons. Although measuring the effect of selection in simple organisms such as yeast and E. coli has proven effective and accurate, codon-based methods perform less well in plants and humans. In this paper, we extend a prior method to incorporate another evolutionary factor, namely mutation bias, and its effect on codon usage. Our results indicate that prediction of gene expression is significantly improved under our framework, and suggest that quantification of mutation bias is essential for fully understanding synonymous codon usage. We also propose an improved method, MLE-Φ, with much greater computational efficiency and a wider range of applications. An implementation of this method is provided at https://github.com/luzhixiu1996/MLE-Phi.
