A novel framework for evaluating the performance of codon usage bias metrics

The unequal utilization of synonymous codons affects numerous cellular processes including translation rates, protein folding and mRNA degradation. In order to understand the biological impact of variable codon usage bias (CUB) between genes and genomes, it is crucial to be able to accurately measure CUB for a given sequence. A large number of metrics have been developed for this purpose, but there is currently no way of systematically testing the accuracy of individual metrics or knowing whether metrics provide consistent results. This lack of standardization can result in false-positive and false-negative findings if underpowered or inaccurate metrics are applied as tools for discovery. Here, we show that the choice of CUB metric impacts both the significance and measured effect sizes in numerous empirical datasets, raising questions about the generality of findings in published research. To bring about standardization, we developed a novel method to create synthetic protein-coding DNA sequences according to different models of codon usage. We use these benchmark sequences to identify the most accurate and robust metrics with regard to sequence length, GC content and amino acid heterogeneity. Finally, we show how our benchmark can aid the development of new metrics by providing feedback on their performance compared to the state of the art.
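The idea of generating synthetic coding sequences under a codon usage model can be sketched as follows. This is a minimal illustration only, not the method of the paper: the function name `synthesize_cds` and the codon weights are hypothetical, and the model shown covers just two amino acids.

```python
import random

# Hypothetical codon weights for two amino acids (an assumption for this sketch).
# A real model would cover all 61 sense codons.
CODON_WEIGHTS = {
    "K": {"AAA": 0.8, "AAG": 0.2},  # lysine, biased towards AAA
    "F": {"TTT": 0.5, "TTC": 0.5},  # phenylalanine, unbiased
}

def synthesize_cds(protein, weights, rng):
    """Sample one codon per residue according to the per-amino-acid weights."""
    codons = []
    for aa in protein:
        choices = list(weights[aa].keys())
        probs = list(weights[aa].values())
        codons.append(rng.choices(choices, weights=probs, k=1)[0])
    return "".join(codons)

rng = random.Random(0)  # fixed seed so repeated runs give the same sequence
seq = synthesize_cds("KFKKFK", CODON_WEIGHTS, rng)
print(seq)
```

Because the true bias of such a sequence is known by construction, a CUB metric can be scored on how well it recovers that bias as sequence length or composition is varied.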

Comparative analysis of codon usage patterns in SARS-CoV-2, its mutants and other respiratory viruses

10.1101/2021.03.03.433699 ◽

2021 ◽

Author(s):

Neetu Tyagi ◽

Rahila Sardar ◽

Dinesh Gupta

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

Gc Content ◽

Respiratory Illness ◽

Respiratory Viruses ◽

Nucleotide Composition ◽

Health Crisis ◽

Study Results ◽

Usage Patterns ◽

The Difference

Abstract The Coronavirus disease 2019 (COVID-19) outbreak caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) poses a worldwide human health crisis, causing respiratory illness with a high mortality rate. To investigate the factors governing codon usage bias (CUB) in respiratory viruses, a CUB analysis was performed on SARS-CoV-2 isolates from different geographical locations (~62K), including two recently emerged strains, VUI202012/01 from the United Kingdom (UK) and 501.Y.V2 from South Africa (SA), alongside other respiratory viruses. The analysis includes RSCU analysis, GC content calculation, ENC analysis, dinucleotide frequency and neutrality plot analysis. We were motivated to conduct the study to fulfil two primary aims: first, to identify the differences in codon usage bias amongst all SARS-CoV-2 genomes and, second, to compare their CUB properties with those of other respiratory viruses. A biased nucleotide composition was found, as most of the highly preferred codons were A/U-ending in all the respiratory viruses studied here. Compared with the human host, the RSCU analysis led to the identification of 11 over-represented codons and 9 under-represented codons in SARS-CoV-2 genomes. Correlation analysis of ENC and GC3s revealed that mutational pressure is the leading force determining the CUB. The results of the present study yield a better understanding of codon usage preferences in SARS-CoV-2 genomes and identify possible evolutionary determinants of the biases found among the respiratory viruses, thus unveiling a unique feature of SARS-CoV-2 evolution and adaptation. To the best of our knowledge, this is the first attempt at a comparative CUB analysis of worldwide SARS-CoV-2 genomes, including newly emerged strains and other respiratory viruses.
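For readers unfamiliar with RSCU (relative synonymous codon usage), the standard definition divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code and an in-frame DNA sequence; the function name `rscu` is illustrative and not taken from the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Return RSCU per sense codon for an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, RSCU is undefined
        for c in codons:
            # observed count divided by the mean count of the synonymous family
            values[c] = counts[c] * len(codons) / total
    return values

values = rscu("AAAAAAAAG")  # codons: AAA, AAA, AAG (all lysine)
print(round(values["AAA"], 3), round(values["AAG"], 3))
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage; values above or below 1 indicate over- or under-representation.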

Coupling Between Protein Level Selection and Codon Usage Optimization in the Evolution of Bacteria and Archaea

mBio ◽

10.1128/mbio.00956-14 ◽

2014 ◽

Vol 5 (2) ◽

Cited By ~ 25

Author(s):

Wenqi Ran ◽

David M. Kristensen ◽

Eugene V. Koonin

Keyword(s):

Codon Usage ◽

Protein Level ◽

Codon Usage Bias ◽

Protein Sequence ◽

Gc Content ◽

Protein Sequences ◽

Microbial Evolution ◽

Fine Tuning ◽

Selection For ◽

Genomic Gc Content

ABSTRACT The relationship between the selection affecting codon usage and selection on protein sequences of orthologous genes in diverse groups of bacteria and archaea was examined by using the Alignable Tight Genome Clusters database of prokaryote genomes. The codon usage bias is generally low, with 57.5% of the gene-specific optimal codon frequencies (Fopt) being below 0.55. This apparent weak selection on codon usage contrasts with the strong purifying selection on amino acid sequences, with 65.8% of the gene-specific dN/dS ratios being below 0.1. For most of the genomes compared, a limited but statistically significant negative correlation between Fopt and dN/dS was observed, which is indicative of a link between selection on protein sequence and selection on codon usage. The strength of the coupling between the protein level selection and codon usage bias showed a strong positive correlation with the genomic GC content. Combined with previous observations on the selection for GC-rich codons in bacteria and archaea with GC-rich genomes, these findings suggest that selection for translational fine-tuning could be an important factor in microbial evolution that drives the evolution of genome GC content away from mutational equilibrium. This type of selection is particularly pronounced in slowly evolving, “high-status” genes. A significantly stronger link between the two aspects of selection is observed in free-living bacteria than in parasitic bacteria and in genes encoding metabolic enzymes and transporters than in informational genes. These differences might reflect the special importance of translational fine-tuning for the adaptability of gene expression to environmental changes. The results of this work establish the coupling between protein level selection and selection for translational optimization as a distinct and potentially important factor in microbial evolution.
IMPORTANCE Selection affects the evolution of microbial genomes at many levels, including both the structure of proteins and the regulation of their production. Here we demonstrate the coupling between the selection on protein sequences and the optimization of codon usage in a broad range of bacteria and archaea. The strength of this coupling varies over a wide range and strongly and positively correlates with the genomic GC content. The cause(s) of the evolution of high GC content is a long-standing open question, given the universal mutational bias toward AT. We propose that optimization of codon usage could be one of the key factors that determine the evolution of GC-rich genomes. This work establishes the coupling between selection at the level of protein sequence and at the level of codon choice optimization as a distinct aspect of genome evolution.
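A gene-specific frequency of optimal codons can be illustrated as the fraction of a gene's codons that belong to a set of optimal codons, counted only over amino acids for which an optimal codon is defined. The sketch below is a simplified illustration under that assumption; the sets `OPTIMAL` and `CONSIDERED` are hypothetical, and the way the study determined optimal codons is not reproduced here.

```python
def f_opt(cds, optimal, considered):
    """Fraction of scored codons that are optimal; None if nothing can be scored."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    scored = [c for c in codons if c in considered]
    if not scored:
        return None
    return sum(c in optimal for c in scored) / len(scored)

# Hypothetical example: optimal codons defined for lysine and phenylalanine only.
OPTIMAL = {"AAA", "TTC"}
CONSIDERED = {"AAA", "AAG", "TTT", "TTC"}

# Codons: AAA (optimal), AAG (not optimal), TTC (optimal), ATG (not scored).
value = f_opt("AAAAAGTTCATG", OPTIMAL, CONSIDERED)
print(round(value, 3))
```

A value near 0.5 for a two-codon family indicates no preference, which is why values below 0.55 are read in the abstract as low bias.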

First Complete Genome Sequence of Brucella abortus 2308 isolated from an abortion storm in a dairy farm in India

10.21203/rs.3.rs-420448/v1 ◽

2021 ◽

Author(s):

Amit Kumar ◽

Malyaj R Prajapati ◽

Surendra Upadhyay ◽

Anamika Bhordia ◽

Vinod Kumar Singh ◽

...

Keyword(s):

Genome Sequence ◽

Dna Sequences ◽

Brucella Abortus ◽

Complete Genome Sequence ◽

Complete Genome ◽

Messenger Rna ◽

Gc Content ◽

Dairy Farm ◽

Rrna Genes ◽

Sequence Length

Abstract The present report communicates the first complete genome sequence of a Brucella abortus 2308 strain isolated from an abortion storm on a dairy farm located at Kanpur, Uttar Pradesh, India. The outbreak caused last-trimester abortions in 32 of 100 cows on the farm over a period of 60 days. The bacteria were isolated in pure culture from the placentas of cows that aborted. The genome sequence is 3,285,606 bp long with a GC content of 57.25%, an N50 value of 296,426 and an L50 value of 4, and contains 3,119 coding DNA sequences (CDSs), 49 tRNAs, 1 transfer-messenger RNA (tmRNA), and 3 rRNA genes. It is the first report of Brucella abortus 2308 isolation and complete genome sequence from the Indian subcontinent.
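The N50 and L50 statistics quoted above follow a standard definition: sort the contigs from longest to shortest, accumulate their lengths until at least half of the total assembly length is covered, then report the length of the last contig added (N50) and how many contigs were needed (L50). A minimal sketch, with hypothetical contig lengths rather than the assembly from the report:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, rank
    raise ValueError("no contig lengths given")

# Hypothetical contig lengths; total is 290, so half is 145.
# 80 + 70 = 150 reaches the halfway point with the second contig.
n50, l50 = n50_l50([30, 80, 20, 70, 50, 40])
print(n50, l50)  # prints: 70 2
```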

Insights into Comparative Genomics, Codon Usage Bias, and Phylogenetic Relationship of Species from Biebersteiniaceae and Nitrariaceae Based on Complete Chloroplast Genomes

Plants ◽

10.3390/plants9111605 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1605

Author(s):

Xiaofeng Chi ◽

Faqi Zhang ◽

Qi Dong ◽

Shilong Chen

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

High Similarity ◽

Peganum Harmala ◽

Protein Coding ◽

Variable Regions ◽

Chloroplast Genomes ◽

Nitraria Sibirica ◽

Effective Number Of Codons ◽

Relationship Of

Biebersteiniaceae and Nitrariaceae, two small families, were recently classified in Sapindales. Taxonomic and phylogenetic relationships within Sapindales are still poorly resolved and controversial. In the current study, we compared the chloroplast genomes of five species (Biebersteinia heterostemon, Peganum harmala, Nitraria roborowskii, Nitraria sibirica, and Nitraria tangutorum) from Biebersteiniaceae and Nitrariaceae. High similarity was detected in the gene order, content and orientation of the five chloroplast genomes; 13 highly variable regions were identified among the five species. An accelerated substitution rate was found in the protein-coding genes, especially clpP. The effective number of codons (ENC), parity rule 2 (PR2), and neutrality plots together revealed that the codon usage bias is affected by mutation and selection. The phylogenetic analysis strongly supported (Nitrariaceae (Biebersteiniaceae + The Rest)) relationships in Sapindales. Our findings can provide useful information for analyzing phylogeny and molecular evolution within Biebersteiniaceae and Nitrariaceae.
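In an ENC plot, observed ENC values are usually compared with the curve expected when codon usage is shaped by GC3s composition alone, given by Wright's approximation ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s. The sketch below evaluates that curve; the function name `expected_enc` is illustrative, and whether the study used exactly this form is an assumption.

```python
def expected_enc(gc3s):
    """Expected ENC under GC3s composition alone (Wright's approximation)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)

for s in (0.0, 0.5, 1.0):
    print(s, expected_enc(s))
```

Genes falling well below this curve have stronger bias than composition alone would predict, which is commonly read as a sign of selection acting alongside mutation.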

Mutation Profiles, Glycosylation Site Distribution and Codon Usage Bias of HPV16

10.1101/2021.03.04.434005 ◽

2021 ◽

Author(s):

Zhihua Ou ◽

Wei Liu ◽

Junhua LI ◽

Hongli Du

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

Vaccine Development ◽

Gc Content ◽

Surface Protein ◽

Nucleotide Composition ◽

Glycosylation Site ◽

Protein Glycosylation ◽

Adaptation Mechanism ◽

Site Distribution

Human papillomavirus type 16 (HPV16) is the most prevalent HPV type causing cervical cancers. Herein, using 1,597 full genomes of HPV16, we systematically investigated the mutation profiles, surface protein glycosylation sites and codon usage bias of the eight open reading frames (ORFs) of HPV16 genomes from different lineages and sublineages. Multiple lineage- or sublineage-specific mutation sites were identified. Glycosylation analysis showed that HPV16 lineage D contained the highest number of unique potential glycosylation sites in both the L1 and L2 capsid proteins, which might lead to its antigenic distance from other HPV16 lineages. Nucleotide composition of HPV16 showed that the overall AT content was higher than GC content at the 3rd codon position. Relatively high ENC values suggested that the HPV16 ORFs did not have strong codon usage bias. Codon usage in most of the HPV16 ORFs, except for L2, was mainly governed by natural selection pressure such as translational pressure. HPV16 shared only some of its preferred codons with humans, which might help reduce competition for translational resources. These findings may help increase our understanding of the heterogeneity between HPV16 lineages and sublineages, and of the adaptation mechanism of HPV in human cells, which might facilitate HPV classification and improve vaccine development and application.

Codon-Optimized Fluorescent Proteins Designed for Expression in Low-GC Gram-Positive Bacteria

Applied and Environmental Microbiology ◽

10.1128/aem.02066-08 ◽

2009 ◽

Vol 75 (7) ◽

pp. 2099-2110 ◽

Cited By ~ 46

Author(s):

Inka Sastalla ◽

Kannie Chim ◽

Gordon Y. C. Cheung ◽

Andrei P. Pomerantsev ◽

Stephen H. Leppla

Keyword(s):

Codon Usage ◽

Dna Sequences ◽

Fluorescent Protein ◽

Protective Antigen ◽

Fluorescent Proteins ◽

Yellow Fluorescent Protein ◽

Gc Content ◽

Virulence Plasmid ◽

Positive Bacterium ◽

Gram Positive

ABSTRACT Fluorescent proteins have wide applications in biology. However, not all of these proteins are properly expressed in bacteria, especially if the codon usage and genomic GC content of the host organism are not ideal for high expression. In this study, we analyzed the DNA sequences of multiple fluorescent protein genes with respect to codons and GC content and compared them to a low-GC gram-positive bacterium, Bacillus anthracis. We found high discrepancies for cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and the photoactivatable green fluorescent protein (PAGFP), but not GFP, with regard to GC content and codon usage. Concomitantly, when the proteins were expressed in B. anthracis, CFP- and YFP-derived fluorescence was undetectable microscopically, a phenomenon caused not by lack of gene transcription or degradation of the proteins but by lack of protein expression. To improve expression in bacteria with low genomic GC contents, we synthesized a codon-optimized gfp and constructed optimized photoactivatable pagfp, cfp, and yfp, which were in contrast to nonoptimized genes highly expressed in B. anthracis and in another low-GC gram-positive bacterium, Staphylococcus aureus. Using optimized GFP as a reporter, we were able to monitor the activity of the protective antigen promoter of B. anthracis and confirm its dependence on bicarbonate and regulators present on virulence plasmid pXO1.
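One simple way to codon-optimize a gene for a host is to replace every codon with the synonymous codon the host uses most often. The sketch below illustrates that "one amino acid, one codon" strategy only; it is not claimed to be the procedure used in the study, and the host codon counts are hypothetical values chosen to mimic an AT-rich genome.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def preferred_codons(host_counts):
    """Map each amino acid to the synonymous codon with the highest host count."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or host_counts.get(codon, 0) > host_counts.get(best[aa], 0):
            best[aa] = codon
    return best

def recode(cds, best):
    """Swap every codon for the host-preferred synonym, keeping the protein unchanged."""
    usable = len(cds) - len(cds) % 3
    return "".join(best[CODON_TABLE[cds[i:i + 3]]] for i in range(0, usable, 3))

# Hypothetical codon counts for a low-GC host (assumption for illustration).
HOST_COUNTS = {"AAA": 90, "AAG": 10, "GGT": 60, "GGC": 5, "GGA": 30, "GGG": 5}

best = preferred_codons(HOST_COUNTS)
optimized = recode("AAGGGC", best)  # Lys-Gly written with GC-rich codons
print(optimized)  # prints: AAAGGT
```

Practical optimization tools usually also weigh factors such as mRNA structure and restriction sites, which this sketch ignores.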

Comparisons of Coding and Noncoding Sequences to infer the Origin of Codon usage Bias

10.1101/174359 ◽

2017 ◽

Author(s):

Prashant Mainali ◽

Sobita Pathak

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

Gc Content ◽

Mutation Bias ◽

Translation Process ◽

Noncoding Regions ◽

Coding Regions ◽

Synonymous Codons ◽

Shed Light

ABSTRACT Codon usage bias is the preferential use of a subset of synonymous codons during translation. In this paper, normalized entropy and GC content were compared between the coding regions of Escherichia coli K12 and noncoding regions (ncRNA, rRNA) of various organisms to shed light on the origin of codon usage bias. The normalized entropy of the coding regions was found to be significantly higher than that of the noncoding regions, suggesting a role for the translation process in shaping codon usage bias. Further, when the position-specific GC content of both coding and noncoding regions was analyzed, the GC2 content in coding regions was lower than GC1 and GC3, while in noncoding regions the GC1, GC2 and GC3 contents were approximately equal. This discrepancy is explained by biased mutation coupled with the presence or absence of selection pressure. GC content accumulates in sequences due to mutation bias in the DNA repair and recombination processes. In noncoding regions, such mutations are harmful and are thus selected against, whereas in coding regions, owing to the degeneracy of codons, a mutation at the third codon position (GC3) is neutral and hence not selected against. Thus, GC content accumulates in coding regions, and codon usage bias arises.
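Position-specific GC content (GC1, GC2, GC3) is the fraction of G or C bases at the first, second and third positions of each codon. A minimal sketch, assuming an in-frame DNA sequence; the function name `positional_gc` is illustrative.

```python
def positional_gc(cds):
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore trailing bases that do not form a codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3].upper()
        result.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(result)

# Codons GAT, GCT, GAC: first positions G,G,G; second A,C,A; third T,T,C.
gc1, gc2, gc3 = positional_gc("GATGCTGAC")
print(round(gc1, 3), round(gc2, 3), round(gc3, 3))
```

For a noncoding sequence there is no reading frame, so the three values are computed the same way from an arbitrary starting offset and are expected to be similar.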

Analysis of codon usage bias reveals optimal codons in Elaeis guineensis

Biodiversitas Journal of Biological Diversity ◽

10.13057/biodiv/d211138 ◽

2020 ◽

Vol 21 (11) ◽

Author(s):

Redi Aditama ◽

Zulfikar Achmad Tanjung ◽

Widyartini Made Sudania ◽

Yogo Adhi Nugroho ◽

Condro Utomo ◽

...

Keyword(s):

Codon Usage ◽

Oil Palm ◽

Codon Usage Bias ◽

Elaeis Guineensis ◽

Synonymous Codon ◽

Gc Content ◽

Synonymous Codon Usage ◽

Mutational Bias ◽

Optimal Codons ◽

Good Ability

Abstract. Aditama R, Tanjung ZA, Sudania WM, Nugroho YA, Utomo C, Liwang T. 2020. Analysis of codon usage bias reveals optimal codons in Elaeis guineensis. Biodiversitas 21: 5331-5337. Codon usage bias of the oil palm genome was reported employing several indices, including GC content, relative synonymous codon usage (RSCU), the effective number of codons (ENC), and codon adaptation index (CAI). A unimodal distribution of GC content was observed, matching the characteristics of non-grass monocots. Correspondence analysis (COA) on synonymous codon usage bias showed that the main axis was strongly driven by GC content. The ENC and neutrality plots of oil palm genes indicated that natural selection played a more vital role than mutational bias in shaping codon usage bias. A positive correlation between calculated CAI and experimental data on oil palm gene expression was detected, indicating the good ability of this index. Finally, eighteen codons were defined as “optimal codons” that may provide a useful reference for heterogeneous expression and genome editing studies.
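The codon adaptation index (CAI) is conventionally the geometric mean of the relative adaptiveness of a gene's codons, where each codon's relative adaptiveness is its frequency in a reference set of highly expressed genes divided by the frequency of the most-used synonymous codon. The sketch below follows that definition under simplifying assumptions: the reference counts are hypothetical, single-codon amino acids and stop codons are skipped, and codons absent from the reference are ignored rather than given a pseudo-count.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def relative_adaptiveness(ref_counts):
    """w = codon count / count of the most-used synonymous codon in the reference."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # stops, Met and Trp carry no synonymous choice
        top = max(ref_counts.get(c, 0) for c in codons)
        for c in codons:
            if top > 0 and ref_counts.get(c, 0) > 0:
                w[c] = ref_counts[c] / top
    return w

def cai(cds, w):
    """Geometric mean of w over the scored codons; None if no codon can be scored."""
    usable = len(cds) - len(cds) % 3
    logs = [math.log(w[cds[i:i + 3]]) for i in range(0, usable, 3) if cds[i:i + 3] in w]
    return math.exp(sum(logs) / len(logs)) if logs else None

# Hypothetical reference counts from highly expressed genes (assumption).
w = relative_adaptiveness({"AAA": 80, "AAG": 20})
score = cai("AAAAAG", w)  # w values 1.0 and 0.25, geometric mean 0.5
print(round(score, 3))
```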

Comprehensive Analysis and Comparison on the Codon Usage Pattern of Whole Mycobacterium tuberculosis Coding Genome from Different Area

BioMed Research International ◽

10.1155/2018/3574976 ◽

2018 ◽

Vol 2018 ◽

pp. 1-7 ◽

Cited By ~ 6

Author(s):

Li Gun ◽

Ren Yumiao ◽

Pan Haixian ◽

Zhang Liang

Keyword(s):

Mycobacterium Tuberculosis ◽

Codon Usage ◽

Codon Usage Bias ◽

Gc Content ◽

Codon Usage Pattern ◽

Usage Pattern ◽

Strong Negative Correlation ◽

Synonymous Codons ◽

The Relationship ◽

Adaptation Index

The unequal use of synonymous codons is common in Mycobacterium tuberculosis. Codon usage bias not only plays an important regulatory role at the level of gene expression, but also helps improve the accuracy and efficiency of translation. Meanwhile, the codon usage pattern of the Mycobacterium tuberculosis genome is important for interpreting the evolutionary characteristics of the species. To investigate this codon usage pattern, 12 Mycobacterium tuberculosis genomes from different areas were downloaded from GenBank. The correlations between G3, GC12, whole GC content, codon adaptation index, codon bias index, and so on were calculated for these genomes. The ENC-plot, the relationship between A3/(A3+T3) and G3/(G3+C3), the GC12 versus GC3 plot, and the RSCU of the overall and separate genomes all show that codon usage bias exists in all 12 Mycobacterium tuberculosis genomes. Lastly, the relationship between CBI and the equalization of ENC shows a strong negative correlation between them. The relationship between protein length and GC content (GC3 and GC12) shows that differences in GC content may be more pronounced in shorter proteins. These results show that the codon usage bias present in Mycobacterium tuberculosis genomes could be used for further study of their evolution.
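The A3/(A3+T3) versus G3/(G3+C3) relationship mentioned above gives the coordinates of a gene on a parity rule 2 (PR2) plot, where a point at (0.5, 0.5) means A and T, and G and C, are used equally at the third codon position. The sketch below computes the two ratios over all third positions, which is a simplification: such plots are often restricted to fourfold degenerate codons. The function name `pr2_point` is illustrative.

```python
def pr2_point(cds):
    """Return (A3/(A3+T3), G3/(G3+C3)) over all third codon positions."""
    usable = len(cds) - len(cds) % 3
    third = cds[2:usable:3].upper()
    a, t, g, c = (third.count(base) for base in "ATGC")
    if a + t == 0 or g + c == 0:
        return None  # a ratio is undefined when its denominator is zero
    return a / (a + t), g / (g + c)

# Codons GCA, GCT, GCG, GCC, GCA: third positions A, T, G, C, A.
at_bias, gc_bias = pr2_point("GCAGCTGCGGCCGCA")
print(round(at_bias, 3), round(gc_bias, 3))
```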

A Comparison of Synonymous Codon Usage Bias Patterns in DNA and RNA Virus Genomes: Quantifying the Relative Importance of Mutational Pressure and Natural Selection

BioMed Research International ◽

10.1155/2013/406342 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 21

Author(s):

Youhua Chen

Keyword(s):

Natural Selection ◽

Codon Usage ◽

Total Variation ◽

Codon Usage Bias ◽

Rna Viruses ◽

Gc Content ◽

Relative Importance ◽

Mutational Pressure ◽

Dna And Rna ◽

Usage Patterns

Codon usage bias patterns have been broadly explored for many viruses. However, the relative importance of mutation pressure and natural selection is still under debate. In the present study, I tried to resolve controversial issues in determining the principal factors shaping codon usage patterns for DNA and RNA viruses, respectively, by examining over 38000 ORFs. Variation partitioning showed that 27% and 21% of the total variation in codon usage patterns could be attributed to mutational pressure, while 5% and 6% could be explained by natural selection, for DNA and RNA viruses, respectively. Furthermore, the combined effect of mutational pressure and natural selection on the codon usage patterns of viruses is substantial (explaining 10% and 8% of total variation of codon usage patterns). With respect to GC variation, GC content is always negatively and significantly correlated with aromaticity. Interestingly, the signs of the significant correlations between GC, gene lengths, and hydrophobicity are completely opposite between DNA and RNA viruses, being positive for DNA viruses and negative for RNA viruses. Finally, the GC12 versus G3s plot suggests that natural selection is more important than mutational pressure in influencing the GC content at the first and second codon positions.
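A plot of GC12 against third-position GC content is commonly summarized by the slope of a least-squares regression line across genes: a slope near 1 is usually read as mutational pressure acting equally on all codon positions, and a slope near 0 as selection constraining the first two positions. The sketch below computes that slope from per-gene values; the numbers are hypothetical and are not data from the study.

```python
def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical per-gene values: third-position GC on x, GC12 on y.
gc3_values = [0.2, 0.4, 0.6, 0.8]
gc12_values = [0.42, 0.44, 0.46, 0.48]
slope = regression_slope(gc3_values, gc12_values)
print(round(slope, 3))
```

With these illustrative numbers GC12 barely changes as third-position GC rises, the pattern that would be interpreted as selection dominating at the first and second codon positions.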
