Position-dependent Codon Usage Bias in the Human Transcriptome

2021
Author(s): Kaavya Subramanian, Nathan Waugh, Cole Shanks, David A Hendrix

All life depends on the reliable translation of RNA to protein according to complex interactions between translation machinery and RNA sequence features. While ribosomal occupancy and codon frequencies vary across coding regions, well-established metrics for computing coding potential of RNA do not capture such positional dependence. Here, we investigate position-dependent codon usage bias (PDCUB), which dynamically accounts for the position of protein-coding signals embedded within coding regions. We demonstrate the existence of PDCUB in the human transcriptome, and show that it can be used to predict translation-initiating codons with greater accuracy than other models. We further show that observed PDCUB is not accounted for by other common metrics, including position-dependent GC content, consensus sequences, and the presence of signal peptides in the translation product. More importantly, PDCUB defines a spectrum of translational efficiency supported by ribosomal occupancy and tRNA adaptation index (tAI). High PDCUB scores correspond to a tAI-defined translational ramp and low ribosomal occupancy, while low PDCUB scores exhibit a translational valley and the highest ribosomal occupancy. Finally, we examine the relationship between PDCUB intensity and functional enrichment. We find that transcripts with start codons showing the highest PDCUB are enriched for functions relating to the regulation of synaptic signaling and plasticity, as well as skeletal, heart, and nervous-system development. Furthermore, transcripts with high PDCUB are depleted for functions related to immune response and detection of chemical stimulus. These findings lay important groundwork for advances in our understanding of the regulation of translation, the calculation of coding potential, and the classification of RNA transcripts.
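The abstract does not say how PDCUB is scored. The sketch below assumes one plausible reading: estimate codon frequencies at each codon position downstream of annotated start codons, then score a candidate start by summing the log-odds of its downstream codons against position-independent background frequencies. The function names (`positional_frequencies`, `background_frequencies`, `pdcub_score`), the pseudocount, and the toy sequences are all hypothetical, and this is not presented as the authors' model.

```python
import math
from collections import Counter


def codons(seq):
    """Split a sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def positional_frequencies(cds_list, n_positions):
    """Codon frequencies at codon positions 0..n_positions-1 (the start codon is position 0)."""
    counts = [Counter() for _ in range(n_positions)]
    for cds in cds_list:
        for pos, codon in enumerate(codons(cds)[:n_positions]):
            counts[pos][codon] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({k: v / total for k, v in c.items()} if total else {})
    return freqs


def background_frequencies(cds_list):
    """Position-independent codon frequencies pooled over all codons of all sequences."""
    c = Counter(codon for cds in cds_list for codon in codons(cds))
    total = sum(c.values())
    return {k: v / total for k, v in c.items()}


def pdcub_score(candidate, pos_freqs, bg_freqs, pseudo=1e-6):
    """Sum of log(positional / background) over the codons of a candidate start region."""
    score = 0.0
    for pos, codon in enumerate(codons(candidate)[:len(pos_freqs)]):
        p = pos_freqs[pos].get(codon, 0.0) + pseudo
        q = bg_freqs.get(codon, 0.0) + pseudo
        score += math.log(p / q)
    return score


# Toy training set (hypothetical sequences, not real transcripts).
training = ["ATGGCCAAA", "ATGGCCGAA", "ATGGCCAAG"]
pos_f = positional_frequencies(training, 3)
bg_f = background_frequencies(training)
print(pdcub_score("ATGGCCAAA", pos_f, bg_f) > pdcub_score("ATGTTTTTT", pos_f, bg_f))
```

A higher score means the codons after the candidate start look more like the position-specific profile of real start regions than like coding sequence in general, which is the intuition behind using positional codon bias to pick translation-initiating codons.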

2017
Author(s): Prashant Mainali, Sobita Pathak

Codon usage bias is the preferential use of a subset of synonymous codons during translation. In this paper, the normalized entropy and GC content of the coding regions of Escherichia coli K-12 were compared with those of noncoding regions (ncRNA, rRNA) from various organisms to shed light on the origin of codon usage bias. The normalized entropy of the coding regions was significantly higher than that of the noncoding regions, suggesting that the translation process plays a role in shaping codon usage bias. Further, when position-specific GC content was analyzed, GC2 in coding regions was lower than GC1 and GC3, whereas in noncoding regions GC1, GC2, and GC3 were approximately equal. This discrepancy is explained by biased mutation coupled with the presence or absence of selection pressure. GC content accumulates in sequences because of mutation bias in DNA repair and recombination. In noncoding regions such mutations are harmful and are therefore selected against, whereas in coding regions, owing to codon degeneracy, a mutation at the third codon position (GC3) is neutral and hence not selected against. GC content therefore accumulates in coding regions, giving rise to codon usage bias.
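Position-specific GC content (GC1, GC2, GC3) is the G+C fraction at the first, second, and third position of each codon. A minimal sketch, assuming the input is an in-frame coding sequence; the function name `positional_gc` is hypothetical:

```python
def positional_gc(cds):
    """Return (GC1, GC2, GC3), the G+C fraction at codon positions 1, 2 and 3."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    gc = [0, 0, 0]
    for i in range(n_codons * 3):  # ignore any trailing partial codon
        if cds[i] in "GC":
            gc[i % 3] += 1
    return tuple(count / n_codons for count in gc)


print(positional_gc("GCAGCTGCG"))  # GC1 = 1.0, GC2 = 1.0, GC3 = 1/3
```

Applied to noncoding RNA, the same function simply treats the sequence as consecutive triplets, which is what makes the coding versus noncoding comparison of GC1, GC2, and GC3 possible.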


Author(s): Davide Arella, Maddalena Dilucca, Andrea Giansanti

In each genome, synonymous codons are used with different frequencies; this general phenomenon is known as codon usage bias. It has been previously recognised that codon usage bias could affect cellular fitness and might be associated with the ecology of microbial organisms. In this exploratory study, we investigated the relationship between codon usage bias, lifestyles (thermophiles vs. mesophiles; pathogenic vs. non-pathogenic; halophilic vs. non-halophilic; aerobic vs. anaerobic and facultative) and habitats (aquatic, terrestrial, host-associated, specialised, multiple) of 615 microbial organisms (544 bacteria and 71 archaea). Principal component analysis revealed that species with given phenotypic traits and living in similar environmental conditions have similar codon preferences, as represented by the relative synonymous codon usage (RSCU) index, and similar spectra of tRNA availability, as gauged by the tRNA gene copy number (tGCN). Moreover, by measuring the average tRNA adaptation index (tAI) for each genome, an index that can be associated with translational efficiency, we observed that organisms able to live in multiple habitats, including facultative organisms, mesophiles and pathogenic bacteria, are characterised by reduced translational efficiency, consistent with their need to adapt to different environments. Our results show that synonymous codon choices might be under strong translational selection, which modulates the choice of codons to match tRNA availability differently, depending on the organism's lifestyle needs. To our knowledge, this is the first large-scale study that examines the role of codon bias and translational efficiency in the adaptation of microbial organisms to the environment in which they live.
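RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark over-used codons and values below 1 under-used ones. A minimal sketch using the standard genetic code and skipping stop codons; the names `CODON_TABLE` and `rscu` are hypothetical:

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), indexed in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """RSCU_ij = x_ij / (sum_j x_ij / n_i) for every sense codon of amino acids present in cds."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonyms[aa].append(codon)
    result = {}
    for aa, family in synonyms.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(family)
        for c in family:
            result[c] = counts[c] / expected
    return result


print(rscu("GCTGCTGCCGCAATG")["GCT"])  # 2.0
```

Single-codon amino acids (Met, Trp) always get RSCU 1.0, which is why they are often excluded from multivariate analyses such as the PCA described above.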


2021
Author(s): Neetu Tyagi, Rahila Sardar, Dinesh Gupta

The coronavirus disease 2019 (COVID-19) outbreak, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), poses a worldwide human health crisis, causing respiratory illness with a high mortality rate. To investigate the factors governing codon usage bias (CUB) in respiratory viruses, CUB analysis was performed on all the respiratory viruses, including ~62K SARS-CoV-2 isolates from different geographical locations and two recently emerged strains: VUI202012/01 from the United Kingdom (UK) and 501.Y.V2 from South Africa (SA). The analysis included RSCU analysis, GC content calculation, ENC analysis, dinucleotide frequency, and neutrality plot analysis. The study had two primary aims: first, to identify differences in codon usage bias among SARS-CoV-2 genomes, and second, to compare their CUB properties with those of other respiratory viruses. Nucleotide composition was biased: most of the highly preferred codons were A/U-ending in all the respiratory viruses studied. Compared with the human host, RSCU analysis identified 11 over-represented and 9 under-represented codons in SARS-CoV-2 genomes. Correlation analysis of ENC and GC3s revealed that mutational pressure is the leading force determining CUB. These results yield a better understanding of codon usage preferences in SARS-CoV-2 genomes and identify possible evolutionary determinants of the biases found among respiratory viruses, revealing a unique feature of SARS-CoV-2 evolution and adaptation. To the best of our knowledge, this is the first comparative CUB analysis of worldwide SARS-CoV-2 genomes, including newly emerged strains, together with other respiratory viruses.
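The ENC versus GC3s comparison rests on Wright's expected-ENC curve, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s; genes lying on the curve have codon usage explainable by mutational bias alone. A minimal sketch of the curve and of the commonly used ratio (ENCexp − ENCobs) / ENCexp; the function names are hypothetical:

```python
def expected_enc(gc3s):
    """Expected ENC if codon usage were shaped only by GC3s mutational bias (Wright's curve)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction in [0, 1]")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(enc_observed, gc3s):
    """(ENCexp - ENCobs) / ENCexp; values near 0 mean mutation bias alone can explain ENC."""
    expected = expected_enc(gc3s)
    return (expected - enc_observed) / expected


print(expected_enc(0.5))  # 60.5
```

A gene far below the curve (large positive ratio) has more bias than GC3s predicts, which is usually read as evidence for selection; points clustering on the curve support the mutational-pressure conclusion stated above.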


mBio, 2014, Vol 5 (2)
Author(s): Wenqi Ran, David M. Kristensen, Eugene V. Koonin

ABSTRACT The relationship between the selection affecting codon usage and selection on protein sequences of orthologous genes in diverse groups of bacteria and archaea was examined by using the Alignable Tight Genome Clusters database of prokaryote genomes. The codon usage bias is generally low, with 57.5% of the gene-specific optimal codon frequencies (Fopt) being below 0.55. This apparent weak selection on codon usage contrasts with the strong purifying selection on amino acid sequences, with 65.8% of the gene-specific dN/dS ratios being below 0.1. For most of the genomes compared, a limited but statistically significant negative correlation between Fopt and dN/dS was observed, which is indicative of a link between selection on protein sequence and selection on codon usage. The strength of the coupling between the protein level selection and codon usage bias showed a strong positive correlation with the genomic GC content. Combined with previous observations on the selection for GC-rich codons in bacteria and archaea with GC-rich genomes, these findings suggest that selection for translational fine-tuning could be an important factor in microbial evolution that drives the evolution of genome GC content away from mutational equilibrium. This type of selection is particularly pronounced in slowly evolving, "high-status" genes. A significantly stronger link between the two aspects of selection is observed in free-living bacteria than in parasitic bacteria and in genes encoding metabolic enzymes and transporters than in informational genes. These differences might reflect the special importance of translational fine-tuning for the adaptability of gene expression to environmental changes. The results of this work establish the coupling between protein level selection and selection for translational optimization as a distinct and potentially important factor in microbial evolution.
IMPORTANCE Selection affects the evolution of microbial genomes at many levels, including both the structure of proteins and the regulation of their production. Here we demonstrate the coupling between the selection on protein sequences and the optimization of codon usage in a broad range of bacteria and archaea. The strength of this coupling varies over a wide range and strongly and positively correlates with the genomic GC content. The cause(s) of the evolution of high GC content is a long-standing open question, given the universal mutational bias toward AT. We propose that optimization of codon usage could be one of the key factors that determine the evolution of GC-rich genomes. This work establishes the coupling between selection at the level of protein sequence and at the level of codon choice optimization as a distinct aspect of genome evolution.
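Fopt is the fraction of codons that are "optimal" among codons for amino acids that have a designated optimal codon. How optimal codons are chosen for each genome is not stated here, so the sketch takes the set as an input; the example set `{"CTG", "GCG"}` and the function name `fopt` are hypothetical:

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), indexed in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def fopt(cds, optimal_codons):
    """Optimal codons / all codons for amino acids covered by the optimal-codon set."""
    cds = cds.upper()
    covered = {CODON_TABLE[c] for c in optimal_codons}  # amino acids with an optimal codon
    n_opt = n_total = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if CODON_TABLE.get(codon) in covered:
            n_total += 1
            n_opt += codon in optimal_codons
    return n_opt / n_total if n_total else float("nan")


print(fopt("CTGCTTGCGGCAATG", {"CTG", "GCG"}))  # 0.5
```

In the example, ATG (Met) is ignored because Met has no entry in the hypothetical optimal set, leaving two optimal codons out of four counted.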


2021, pp. 1450-1458
Author(s): Sharanagouda S. Patil, Uma Bharathi Indrabalan, Kuralayanapalya Puttahonnappa Suresh, Bibek Ranjan Shome

Background and Aim: Classical swine fever (CSF), caused by CSF virus (CSFV), is a highly contagious disease of pigs that causes 100% mortality in susceptible adult pigs and piglets. The high mortality rate causes huge economic losses to pig farmers. CSFV has a positive-sense RNA genome 12.3 kb in length, flanked by untranslated regions at the 5' and 3' ends. The genome encodes a large polyprotein of 3900 amino acids encoding 11 viral proteins. The codons of the polyprotein are formed from different combinations of three nucleotides, which help the infectious agent evolve and adapt to the host environment. This study employed various methods to estimate the changes occurring during CSFV evolution by analyzing its codon usage pattern. Materials and Methods: Viral evolution is widely studied by analyzing nucleotide composition and coding regions/codons with various methods. A total of 115 complete coding regions of CSFV, including one complete genome from our laboratory (MH734359), were analyzed using various methods for estimating codon usage bias and evolution. This study elaborates on the factors that influence the codon usage pattern. Results: The effective number of codons (ENC) and relative synonymous codon usage showed the presence of codon usage bias. The mononucleotide A occurs at a higher frequency than the other mononucleotides (G, C, and T). The dinucleotides CG and CC are underrepresented and overrepresented, respectively. The codon CGT was underrepresented and AGG was overrepresented. A codon adaptation index (CAI) value of 0.71 was obtained, indicating a similarity in codon usage bias. The principal component analysis, ENC plot, neutrality plot, and Parity Rule 2 plot produced in this article indicate that CSFV codon usage is biased.
Mutational pressure and natural selection are the important factors that influence codon usage bias. Conclusion: The study provides useful information on the codon usage of CSFV and may be used to understand the virus's adaptation to the host environment and its evolution. Further, such findings help in new gene discovery, design of primers/probes, design of transgenes, determination of the origin of species, and prediction of the gene expression level and gene function of CSFV. To the best of our knowledge, this is the first codon usage bias study involving such a large number of complete CSFV sequences, including one CSFV sequence from India.
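Under- and over-representation of dinucleotides such as CG and CC is usually judged with the relative abundance (odds ratio) ρ(XY) = f(XY) / (f(X)·f(Y)), comparing the observed dinucleotide frequency with the frequency expected from base composition alone. A minimal sketch; the function name is hypothetical, and the often-cited cutoffs of about 0.78 (under) and 1.23 (over) are a convention rather than something stated in this abstract:

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """rho(XY) = f(XY) / (f(X) * f(Y)) for all 16 dinucleotides whose bases occur in seq."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two nucleotides")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx, fy = mono[x] / n, mono[y] / n
            if fx and fy:
                ratios[x + y] = (di[x + y] / (n - 1)) / (fx * fy)
    return ratios


r = dinucleotide_odds_ratios("ACGUACGU")
print(round(r["CG"], 3), r["CC"])  # 4.571 0.0
```

Normalizing by base composition matters for an A-rich genome like the one described: a dinucleotide can be rare simply because its bases are rare, and ρ separates that effect from genuine under-representation.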


2018, Vol 15 (138), pp. 20170667
Author(s): Sophia S. Liu, Adam J. Hockenberry, Michael C. Jewett, Luís A. N. Amaral

The unequal utilization of synonymous codons affects numerous cellular processes including translation rates, protein folding and mRNA degradation. In order to understand the biological impact of variable codon usage bias (CUB) between genes and genomes, it is crucial to be able to accurately measure CUB for a given sequence. A large number of metrics have been developed for this purpose, but there is currently no way of systematically testing the accuracy of individual metrics or knowing whether metrics provide consistent results. This lack of standardization can result in false-positive and false-negative findings if underpowered or inaccurate metrics are applied as tools for discovery. Here, we show that the choice of CUB metric impacts both the significance and measured effect sizes in numerous empirical datasets, raising questions about the generality of findings in published research. To bring about standardization, we developed a novel method to create synthetic protein-coding DNA sequences according to different models of codon usage. We use these benchmark sequences to identify the most accurate and robust metrics with regard to sequence length, GC content and amino acid heterogeneity. Finally, we show how our benchmark can aid the development of new metrics by providing feedback on their performance compared to the state of the art.
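The general idea of building synthetic coding sequences from a codon usage model can be illustrated by back-translating a protein and sampling each codon from weighted synonymous choices. This is a minimal sketch of that idea only, not the authors' benchmark procedure; the weighting scheme, the names `synthetic_cds` and `gc3_model`, and the example protein are hypothetical:

```python
import random

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), indexed in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def synthetic_cds(protein, codon_weights, rng):
    """Back-translate `protein`, sampling each codon from a synonymous-codon usage model.

    Codons missing from `codon_weights` get weight 1.0, so an empty dict means
    uniform synonymous usage (no codon bias).
    """
    out = []
    for aa in protein.upper():
        candidates = SYNONYMS[aa]
        weights = [codon_weights.get(c, 1.0) for c in candidates]
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return "".join(out)


rng = random.Random(42)  # seeded so the benchmark set is reproducible
gc3_model = {c: 4.0 for c in CODON_TABLE if c[2] in "GC"}  # hypothetical GC3-rich model
print(synthetic_cds("MKLAG*", gc3_model, rng))
```

Because the generating model is known, a CUB metric computed on such sequences can be checked against the bias that was actually put in, which is the point of a synthetic benchmark.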


2008, Vol 16 (02), pp. 241-253
Author(s): QIANLI HUANG, YONG LI, JESSE LI-LING, HUIFANG HUANG, XUEPING CHEN, ...

To better understand the evolutionary and molecular mechanisms by which alternative splicing causes human disease, we systematically compared the pattern, distribution, and density of disease-associated mutations, as well as the influence of codon usage bias on single mutations, between alternatively and constitutively spliced genes, using large datasets of human disease genes. The results indicated that: 1. The most common single-mutation patterns in alternatively and constitutively spliced genes are C/T (25.17% and 22.81%, respectively) and G/A (21.54% and 22.73%, respectively), suggesting that both types of disease genes are prone to C → T and G → A mutations. 2. Transitions predominate over transversions in both alternatively (62.88% versus 37.12%) and constitutively (64.41% versus 35.59%) spliced disease genes. 3. At the second codon position, transitions and transversions differ significantly between the two types of genes. 4. Single mutations tend to occur preferentially when the upstream neighboring nucleotide is C or G in human disease genes. 5. Codon usage bias and synonymous codon usage strongly influence single mutations in both alternatively and constitutively spliced genes; GC content and gene length also clearly influence such mutations. Our results seem to imply that disease-associated mutations within the coding regions of alternatively spliced human disease genes arise through different mechanisms from those in constitutively spliced genes. Such findings may facilitate understanding of the molecular mechanisms by which alternative splicing causes human disease, and the development of gene therapies for such diseases.
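A transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other single-base change is a transversion. A minimal sketch of the classification behind percentages like those above; the function names and the toy mutation list are hypothetical:

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Label a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref == alt or ref not in valid or alt not in valid:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_fraction(mutations):
    """Fraction of (ref, alt) pairs that are transitions."""
    kinds = Counter(classify_substitution(r, a) for r, a in mutations)
    total = sum(kinds.values())
    if total == 0:
        raise ValueError("no mutations given")
    return kinds["transition"] / total


print(transition_fraction([("C", "T"), ("G", "A"), ("A", "T"), ("G", "C")]))  # 0.5
```

C → T and G → A, the two most common patterns reported above, are both transitions, which is consistent with the overall transition excess.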


2021
Author(s): Zhihua Ou, Wei Liu, Junhua LI, Hongli Du

Human papillomavirus type 16 (HPV16) is the most prevalent HPV type causing cervical cancer. Herein, using 1,597 full HPV16 genomes, we systematically investigated the mutation profiles, surface protein glycosylation sites, and codon usage bias of the eight open reading frames (ORFs) of HPV16 genomes from different lineages and sublineages. Multiple lineage- or sublineage-specific mutation sites were identified. Glycosylation analysis showed that HPV16 lineage D contained the highest number of unique potential glycosylation sites in both the L1 and L2 capsid proteins, which might account for its antigenic distance from other HPV16 lineages. Nucleotide composition analysis showed that overall AT content was higher than GC content at the third codon position. Relatively high ENC values suggested that the HPV16 ORFs did not have strong codon usage bias. Most HPV16 ORFs, except L2, were mainly governed by natural selection pressure such as translational pressure. HPV16 shared only some of its preferred codons with humans, which might help reduce competition for translational resources. These findings may help increase our understanding of the heterogeneity between HPV16 lineages and sublineages and of the adaptation mechanism of HPV in human cells, which might facilitate HPV classification and improve vaccine development and application.
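The abstract does not name the glycosylation predictor used. A common first pass for potential N-linked sites is a scan for the sequon N-X-S/T, where X is any residue except proline; dedicated predictors refine this with structural context. A minimal sketch of that simplified scan, with a hypothetical function name and toy peptide:

```python
import re

# N-X-S/T with X != P; the zero-width lookahead lets overlapping sequons (e.g. "NNST") both match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sequons(protein):
    """1-based positions of Asn residues that start an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(n_glycosylation_sequons("MNASNPTNNST"))  # [2, 8, 9]
```

Comparing such site lists across aligned L1 or L2 sequences from different lineages is one way to find lineage-unique potential sites.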


2020, Vol 21 (11)
Author(s): Redi Aditama, Zulfikar Achmad Tanjung, Widyartini Made Sudania, Yogo Adhi Nugroho, Condro Utomo, ...

Abstract. Aditama R, Tanjung ZA, Sudania WM, Nugroho YA, Utomo C, Liwang T. 2020. Analysis of codon usage bias reveals optimal codons in Elaeis guineensis. Biodiversitas 21: 5331-5337. The codon usage bias of the oil palm genome was characterized using several indices, including GC content, relative synonymous codon usage (RSCU), the effective number of codons (ENC), and the codon adaptation index (CAI). A unimodal distribution of GC content was observed, matching the characteristics of non-grass monocots. Correspondence analysis (COA) of synonymous codon usage showed that the main axis was strongly driven by GC content. The ENC and neutrality plots of oil palm genes indicated that natural selection played a more important role than mutational bias in shaping codon usage bias. A positive correlation between calculated CAI and experimental oil palm gene expression data was detected, indicating that the index performs well. Finally, eighteen codons were defined as “optimal codons” that may provide a useful reference for heterologous expression and genome editing studies.
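CAI is the geometric mean of each codon's relative adaptiveness w, where w is the codon's count in a reference set of highly expressed genes divided by the count of its most-used synonym; Met, Trp, and stop codons are conventionally excluded. A minimal sketch; the reference sequence, the function names, and the floor applied to zero weights (an arbitrary assumption here) are hypothetical:

```python
import math
from collections import Counter

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), indexed in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_adaptiveness(reference_cds):
    """w = count(codon) / count(most-used synonymous codon) over a reference gene set."""
    counts = Counter()
    for s in reference_cds:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    best = {}
    for codon, aa in CODON_TABLE.items():
        best[aa] = max(best.get(aa, 0), counts[codon])
    return {c: counts[c] / best[aa] for c, aa in CODON_TABLE.items() if best[aa] > 0}


def cai(cds, weights, floor=0.01):
    """Geometric mean of w over scored codons; zero weights are raised to `floor`."""
    cds = cds.upper()
    logs = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in "MW*" or codon not in weights:
            continue  # skip Met, Trp, stops, and amino acids absent from the reference
        logs.append(math.log(max(weights[codon], floor)))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["CTGCTGCTGCTTGCCGCC"])  # toy reference set
print(cai("CTGGCC", w), cai("CTTGCC", w))  # 1.0 and sqrt(1/3)
```

Validating CAI against measured expression, as described above, amounts to correlating `cai` values for many genes with their expression levels.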


2018, Vol 2018, pp. 1-7
Author(s): Li Gun, Ren Yumiao, Pan Haixian, Zhang Liang

The unequal use of synonymous codons is common in Mycobacterium tuberculosis. Codon usage bias not only plays an important regulatory role in gene expression but also helps improve the accuracy and efficiency of translation. Meanwhile, the codon usage pattern of the Mycobacterium tuberculosis genome is important for interpreting the evolutionary characteristics of the species. To investigate this codon usage pattern, 12 Mycobacterium tuberculosis genomes from different areas were downloaded from GenBank. Correlations among G3, GC12, overall GC content, codon adaptation index, codon bias index (CBI), and other indices of the Mycobacterium tuberculosis genomes were calculated. The ENC plot, the relationship between A3/(A3+T3) and G3/(G3+C3), the GC12 versus GC3 plot, and the RSCU of the overall and individual genomes all show that codon usage bias exists in all 12 Mycobacterium tuberculosis genomes. Lastly, CBI and ENC show a strong negative correlation. The relationship between protein length and GC content (GC3 and GC12) suggests that differences in GC content may be more pronounced in shorter proteins. These results show that the codon usage bias in Mycobacterium tuberculosis genomes could be used for further study of their evolution.
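The A3/(A3+T3) versus G3/(G3+C3) comparison is a Parity Rule 2 (PR2) plot: with no mutational or selective bias at third positions, both ratios sit at 0.5. A minimal sketch computing one gene's PR2 coordinates; for simplicity it uses all third codon positions, whereas analyses often restrict to fourfold-degenerate sites, and the function name is hypothetical:

```python
from collections import Counter


def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) from the third base of every complete codon."""
    cds = cds.upper()
    third = Counter(cds[i + 2] for i in range(0, len(cds) - 2, 3))
    gc = third["G"] + third["C"]
    at = third["A"] + third["T"]
    if gc == 0 or at == 0:
        raise ValueError("need both G/C and A/T at third positions")
    return third["G"] / gc, third["A"] / at


print(pr2_coordinates("GAGGAGGAAGAT"))  # (1.0, 0.5)
```

Plotting these coordinates for every gene and checking how far the cloud sits from the center (0.5, 0.5) shows whether third-position usage departs from parity.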

