Codon Usage Bias Levels Predict Taxonomic Identity and Genetic Composition

2020 ◽  
Author(s):  
Bohdan B. Khomtchouk

Abstract In this study, we investigate how an organism’s codon usage bias levels can serve as a predictor and classifier of various genomic and evolutionary features across the three domains of life (archaea, bacteria, eukarya). We perform secondary analysis of existing genetic datasets to build several artificial intelligence (AI) and machine learning models trained on over 13,000 organisms that show it is possible to accurately predict an organism’s DNA type (nuclear, mitochondrial, chloroplast) and taxonomic identity simply using its genetic code (64 codon usage frequencies). By leveraging advanced AI and machine learning methods to accurately identify evolutionary origins and genetic composition from codon usage patterns, our study suggests that the genetic code can be utilized to train accurate machine learning classifiers of taxonomic and phylogenetic features. Our dataset and analyses are made publicly available on GitHub and the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Codon+usage) to facilitate open-source reproducibility and community engagement.
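As a rough illustration of the "64 codon usage frequencies" feature representation mentioned above, the following Python sketch turns one coding sequence into a 64-element frequency vector. The function name `codon_frequencies`, the codon ordering, and the toy sequence are assumptions made for illustration; the abstract does not describe how the published dataset was actually computed.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order (RNA alphabet). The ordering is an
# arbitrary choice for this sketch, not the layout of the published dataset.
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]


def codon_frequencies(cds: str) -> list[float]:
    """Return the 64 codon usage frequencies of one coding sequence."""
    seq = cds.upper().replace("T", "U")
    # Split into complete in-frame triplets; a trailing partial codon is dropped.
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Ignore triplets containing ambiguous bases such as N.
    counts = Counter(t for t in triplets if t in CODONS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy example (hypothetical sequence): AUG GCU GCU AAA UAA
features = codon_frequencies("ATGGCTGCTAAATAA")
```

A vector like `features` could then serve as one row of input to any standard classifier; the choice of model is left open here.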

Biomolecules ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 912
Author(s):  
Saadullah Khattak ◽  
Mohd Ahmar Rauf ◽  
Qamar Zaman ◽  
Yasir Ali ◽  
Shabeen Fatima ◽  
...  

The ongoing outbreak of coronavirus disease (COVID-19) is significantly influenced by global heterogeneity in the genome organization of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The causes of this global heterogeneity across the whole SARS-CoV-2 genome are not well characterized, owing to the lack of comparative studies with a sample size from around the globe large enough to reduce the standard deviation to an acceptable margin of error. To better understand the SARS-CoV-2 genome architecture, we performed a comprehensive analysis of codon usage bias in sixty (60) strains to get a snapshot of its global heterogeneity. Our study shows a relatively low codon usage bias in the SARS-CoV-2 viral genome globally, with nearly all the over-preferred codons being A/U-ended. We concluded that the SARS-CoV-2 genome is primarily shaped by mutation pressure; however, marginal selection pressure cannot be overlooked. Within the A/U-rich genomes of SARS-CoV-2, the standard deviation in GC content (42.91% ± 5.84%) and the GC3 value (30.14% ± 6.93%) points towards global heterogeneity of the virus. The fact that several SARS-CoV-2 strains originated from different viral lineages at the same geographic location also supports this. Taken together, these findings suggest that the root ancestry of the global genomes differs, with different genome-level adaptation to the host. This research may provide new insights into the codon patterns, host adaptation, and global heterogeneity of SARS-CoV-2.
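The GC and GC3 figures quoted above are reported as a mean ± standard deviation across strains. A minimal Python sketch of that kind of summary is given below, assuming GC3 means the G+C fraction at third codon positions of an in-frame coding sequence. The function names and the two toy sequences are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def gc_content(seq: str) -> float:
    """Fraction of G or C bases over the whole sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of G or C bases at third codon positions (in-frame CDS assumed)."""
    third_positions = cds.upper()[2::3]  # indices 2, 5, 8, ...
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Toy coding sequences standing in for per-strain genomes (hypothetical).
strains = ["ATGGCCAAATAA", "ATGAAATTTTAA"]

gc_values = [100 * gc_content(s) for s in strains]
gc3_values = [100 * gc3_content(s) for s in strains]

# Sample mean and sample standard deviation, in percent.
gc_summary = (mean(gc_values), stdev(gc_values))
gc3_summary = (mean(gc3_values), stdev(gc3_values))
```

A larger standard deviation across strains in such a summary is the kind of signal the abstract reads as heterogeneity.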


2011 ◽  
Vol 57 (12) ◽  
pp. 1016-1023 ◽  
Author(s):  
Xue Lian Luo ◽  
Jian Guo Xu ◽  
Chang Yun Ye

In this study, we analysed synonymous codon usage in Shigella flexneri 2a strain 301 (Sf301) and performed a comparative analysis of synonymous codon usage patterns in Sf301 and other strains of Shigella and Escherichia coli. Although there was significant variation in codon usage bias among different Sf301 genes, there was a slight but observable codon usage bias that could primarily be attributed to mutational pressure and translational selection. In addition, the relative abundance of dinucleotides in Sf301 was observed to be independent of the overall base composition but was still caused by differential mutational pressure; this also shaped codon usage. By comparing the relative synonymous codon usage values across different Shigella and E. coli strains, we suggested that the synonymous codon usage pattern in the Shigella genomes was strain specific. This study represents a comprehensive analysis of Shigella codon usage patterns and provides a basic understanding of the mechanisms underlying codon usage bias.


2021 ◽  
Author(s):  
Neetu Tyagi ◽  
Rahila Sardar ◽  
Dinesh Gupta

Abstract The Coronavirus disease 2019 (COVID-19) outbreak caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) poses a worldwide human health crisis, causing respiratory illness with a high mortality rate. To investigate the factors governing codon usage bias in respiratory viruses, we performed codon usage bias (CUBs) analysis on all the respiratory viruses, including SARS-CoV-2 isolates from different geographical locations (~62K) and two recently emerged strains, VUI202012/01 from the United Kingdom (UK) and 501.Y.V2 from South Africa (SA). The analysis includes RSCU analysis, GC content calculation, ENC analysis, dinucleotide frequency and neutrality plot analysis. We were motivated to conduct the study to fulfil two primary aims: first, to identify the difference in codon usage bias amongst all SARS-CoV-2 genomes and, secondly, to compare their CUBs properties with other respiratory viruses. A biased nucleotide composition was found, as most of the highly preferred codons were A/U-ending in all the respiratory viruses studied here. Compared with the human host, the RSCU analysis led to the identification of 11 over-represented codons and 9 under-represented codons in SARS-CoV-2 genomes. Correlation analysis of ENC and GC3s revealed that mutational pressure is the leading force determining the CUBs. The results of the present study yield a better understanding of codon usage preferences for SARS-CoV-2 genomes and uncover the possible evolutionary determinants responsible for the biases found among the respiratory viruses, thus unveiling a unique feature of SARS-CoV-2 evolution and adaptation. To the best of our knowledge, this is the first attempt at comparative CUBs analysis on the worldwide genomes of SARS-CoV-2, including newly emerged strains and other respiratory viruses.
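Relative synonymous codon usage (RSCU), the first analysis listed above, is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below follows that common definition under the standard genetic code; the function name `rscu` and the toy sequence are assumptions, and the abstract does not specify the software or settings the authors used.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU per codon: observed count / (family total / family size)."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, Met and Trp carry no synonymous choice
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy example (hypothetical): four alanine codons, three GCT and one GCC.
example = rscu("GCTGCTGCTGCC")
```

Values above 1 indicate a codon used more often than expected under equal usage, and values below 1 indicate the opposite.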


Author(s):  
Prajakta P Kokate ◽  
Stephen M Techtmann ◽  
Thomas Werner

Abstract Codon usage bias, where certain codons are used more frequently than their synonymous counterparts, is an interesting phenomenon influenced by three evolutionary forces: mutation, selection, and genetic drift. To better understand how these evolutionary forces affect codon usage bias, an extensive study to detect how codon usage patterns change across species is required. This study investigated 668 single-copy orthologous genes independently in 29 Drosophila species to determine how the codon usage patterns change with phylogenetic distance. We found a strong correlation between phylogenetic distance and codon usage bias and observed striking differences in codon preferences between the two subgenera Drosophila and Sophophora. As compared to the subgenus Sophophora, species of the subgenus Drosophila showed reduced codon usage bias and a reduced preference specifically for codons ending with C, except for codons with G in the second position. We found that codon usage patterns in all species were influenced by the nucleotides in the codon's 2nd and 3rd positions rather than the biochemical properties of the amino acids encoded. We detected a concordance between preferred codons and preferred dinucleotides (at positions 2 and 3 of codons). Furthermore, we observed an association between speciation, codon preferences, and dinucleotide preferences. Our study provides the foundation to understand how selection acts on dinucleotides to influence codon usage bias.


Viruses ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 1087 ◽  
Author(s):  
Sheng-Lin Shi ◽  
Run-Xi Xia

All iflavirus members belong to the unique genus, Iflavirus, of the family, Iflaviridae. The host taxa and sequence identities of these viruses are diverse. A codon usage bias, maintained by a balance between selection, mutation, and genetic drift, exists in a wide variety of organisms. We characterized the codon usage patterns of 44 iflavirus genomes that were isolated from the classes, Insecta, Arachnida, Mammalia, and Malacostraca. Iflaviruses lack a strong codon usage bias when they are evaluated using an effective number of codons. The odds ratios of the majority of dinucleotides are within the normal range. However, the dinucleotides at the 1st–2nd codon positions are more biased than those at the 2nd–3rd codon positions. Plots of effective numbers of codons, relative neutrality analysis, and PR2 bias analysis all indicate that selection pressure dominates mutations in shaping codon usage patterns in the family, Iflaviridae. When these viruses were grouped into their host taxa, we found that the indices, including the nucleotide composition, effective number of codons, relative synonymous codon usage, and the influencing factors behind the codon usage patterns, all show that there are non-significant differences between the six host-taxa-groups. Our results disagree with our assumption that diverse viruses should possess diverse codon usage patterns, suggesting that the nucleotide composition and codon usage in the family, Iflaviridae, are not host taxa-specific signatures.
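The dinucleotide odds ratio mentioned above is usually computed as the observed frequency of a dinucleotide divided by the product of the frequencies of its two bases. A small Python sketch of that ratio over all overlapping positions follows; the function name and toy input are assumptions, and the codon-position-specific variant (1st–2nd versus 2nd–3rd positions) used in the study is not reproduced here.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Odds ratio f(XY) / (f(X) * f(Y)) for every dinucleotide present in seq."""
    seq = seq.upper()
    n = len(seq)
    # Single-base frequencies over the whole sequence.
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }


# Toy example (hypothetical sequence).
ratios = dinucleotide_odds_ratios("ACAC")
```

A ratio near 1 means the dinucleotide occurs about as often as its base composition predicts; ratios far from 1 mark over- or under-represented dinucleotides.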


Viruses ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 331 ◽  
Author(s):  
Kajal Biswas ◽  
Supratik Palchoudhury ◽  
Prosenjit Chakraborty ◽  
Utpal Bhattacharyya ◽  
Dilip Ghosh ◽  
...  

Citrus tristeza virus (CTV), a member of the aphid-transmitted closterovirus group, is the causal agent of the notorious tristeza disease in several citrus species worldwide. The codon usage patterns of viruses reflect the evolutionary changes that optimize their survival and their adaptation to the external environment and their hosts. The codon usage adaptation of CTV to specific citrus hosts remains to be studied; thus, its role in CTV evolution is not clearly understood. Therefore, to better explain the host–virus interaction and evolutionary history of CTV, the codon usage patterns of the coat protein (CP) genes of 122 CTV isolates originating from three economically important citrus hosts (55 isolates from Citrus sinensis, 38 from C. reticulata, and 29 from C. aurantifolia) were studied using several codon usage indices and multivariate statistical methods. The present study shows that CTV displays low codon usage bias (CUB) and higher genomic stability. Neutrality plot and relative synonymous codon usage analyses revealed that the overall influence of natural selection was more profound than that of mutation pressure in shaping the CUB of CTV. High-frequency codon analysis and codon adaptation index values show that CTV has host-specific codon usage patterns, resulting in higher adaptability of CTV isolates originating from C. reticulata (Cr-CTV), and low adaptability in the isolates originating from C. aurantifolia (Ca-CTV) and C. sinensis (Cs-CTV). The combination of codon analysis of CTV with citrus genealogy suggests that CTV evolved in C. reticulata or other Citrus progenitors. The outcome of the study enhances the understanding of the factors involved in viral adaptation, evolution, and fitness toward their hosts. This information should help devise better management strategies for CTV.
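The codon adaptation index (CAI) referred to above is commonly computed as the geometric mean of per-codon relative adaptiveness weights derived from a reference gene set of the host. The Python sketch below follows that common formulation. The function names, the 0.5 pseudocount for codons unseen in the reference set, and the toy sequences are assumptions for illustration, not the procedure or data of the study.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds: str) -> list[str]:
    """Split a CDS into complete in-frame codons (DNA alphabet)."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_cds: list[str]) -> dict[str, float]:
    """Weight of each codon = its count / count of the most used synonym."""
    counts = Counter(c for cds in reference_cds for c in split_codons(cds))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # stops, Met and Trp are excluded from CAI
        best = max(counts[c] for c in codons)
        if best == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # Assumption: unseen codons get a pseudocount of 0.5 to avoid log(0).
            weights[c] = max(counts[c], 0.5) / best
    return weights


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of the weights of the codons in cds that have a weight."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set (hypothetical host genes) and toy query genes.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
well_adapted = cai("GCTGCT", w)
less_adapted = cai("GCTGCC", w)
```

Comparing CAI values of the same viral gene against reference sets from different hosts is one way host-specific adaptation, as described in the abstract, can be quantified.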


2019 ◽  
Author(s):  
Ying Wang ◽  
Lin Yao ◽  
Jinfeng Fan ◽  
Xueying Zhang ◽  
Changhong Guo ◽  
...  

Abstract Background: Codon usage pattern is an important evolutionary feature of genomes and is widely observed in many organisms. Stylonychia lemnae is a classical model single-celled eukaryote, and a quintessential ciliate typified by dimorphic nuclei: a germline micronucleus and a vegetative macronucleus. Analysis of the codon usage pattern of the S. lemnae macronucleus genome helps in understanding evolution at the molecular level and is significant for mRNA translation, transgene design, and new gene discovery. Results: The codons of the macronucleus genome sequence of S. lemnae were analyzed and 20,750 coding sequences (CDS) were screened. The overall codon usage of S. lemnae is similar and slightly biased. The value of effective number of codons (ENC) showed that the overall extent of codon usage bias in S. lemnae is relatively high. Nucleotide analysis showed that the overall codon usage is biased toward A- and U-ending codons. The phylogenetic analysis indicated that ciliates have independent evolutionary origins from a common ancestor. The RSCU analysis showed that the codon usage pattern of S. lemnae is more similar to that of Tetrahymena thermophila and Paramecium caudatum. Correlation analysis, the ENC-GC3s plot, and the PR2 plot indicated that the codon usage patterns of S. lemnae are influenced by both mutational pressure and natural selection, and neutrality plot analysis showed that those two factors play major roles. Conclusions: Codon usage patterns in eukaryotes are determined not only by translational efficiency but also by the genome. Our study is the first attempt to evaluate the codon usage pattern of the S. lemnae macronucleus genome to better understand the evolutionary changes. These results lay the groundwork for further research on the molecular evolution of S. lemnae.
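In an ENC-GC3s plot such as the one mentioned above, observed ENC values are usually compared against the curve expected when codon usage is shaped by GC3s composition alone. A Python sketch of that commonly used expected curve (Wright's approximation) is shown below; the function name is an assumption, and the abstract does not state which formula or software the authors applied.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC under compositional constraint only.

    Commonly used approximation: ENC = 2 + s + 29 / (s^2 + (1 - s)^2),
    where s is the GC3s fraction between 0 and 1.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Points of the reference curve for plotting against observed (GC3s, ENC) pairs.
curve = [(s / 10, expected_enc(s / 10)) for s in range(11)]
```

Genes falling well below this curve are typically read as being under forces other than mutational bias, which matches the abstract's conclusion that both mutation and selection contribute.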


2011 ◽  
Vol 204-210 ◽  
pp. 649-662 ◽  
Author(s):  
Ying Wu ◽  
An Chun Cheng ◽  
Ming Shu Wang ◽  
De Kang Zhu ◽  
Xiao Yue Chen

The analysis of codon usage may improve our understanding of the evolution and pathogenesis of duck enteritis virus (DEV) and allow reengineering of target genes to improve their expression for gene therapy. In this study, we calculated the codon usage bias in the DEV UL55 gene and performed a comparative analysis of synonymous codon usage patterns in 26 other related viruses using the EMBOSS CUSP program and CodonW online. Moreover, statistical methods were used to investigate the correlations of these related parameters. By comparing synonymous codon usage patterns in different viruses, we observed that the synonymous codon usage pattern in these viruses is virus specific and phylogenetically conserved, with a strong bias towards codons with A and T at the third codon position. Phylogenetic analysis based on codon usage pattern suggested that the DEV UL55 gene clustered with the avian Alphaherpesvirus but diverged to form a single branch. The neutrality plot suggested that GC12 and GC3s adopt the same mutation pattern; meanwhile, the ENC plot revealed that the genetic heterogeneity in UL55 genes is constrained by the G+C content, while translational selection and gene length have little or no effect on the variations of synonymous codon usage in these virus genes. Furthermore, we compared the codon preferences of DEV with those of E. coli, yeast and Homo sapiens. The data suggested that a eukaryotic system such as the human system may be more suitable for the expression of the DEV UL55 gene in vitro. If the yeast or E. coli expression systems are to be used for the expression of the DEV UL55 gene, codon optimization of the DEV UL55 gene may be required.
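A neutrality plot of the kind mentioned above is typically built by regressing GC12 (mean G+C fraction at first and second codon positions) on GC3 across genes. The Python sketch below computes both quantities and an ordinary least-squares slope. The function names and toy genes are hypothetical, and plain GC3 is used here rather than GC3s, which is a simplification.

```python
def gc_fraction(bases: str) -> float:
    """Fraction of G or C in a string of bases."""
    return sum(b in "GC" for b in bases) / len(bases)


def gc12_gc3(cds: str) -> tuple[float, float]:
    """Return (GC12, GC3) for an in-frame coding sequence."""
    seq = cds.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # drop a trailing partial codon
    gc1 = gc_fraction(seq[0::3])
    gc2 = gc_fraction(seq[1::3])
    gc3 = gc_fraction(seq[2::3])
    return (gc1 + gc2) / 2, gc3


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Toy genes (hypothetical), one point per gene on the neutrality plot.
genes = ["ATGGCC", "ATGAAA", "AAATTT"]
points = [gc12_gc3(g) for g in genes]
slope = ols_slope([gc3 for _, gc3 in points], [gc12 for gc12, _ in points])
```

A slope near 1 is usually interpreted as mutation pressure acting similarly on all codon positions, while a slope near 0 points to selective constraint on the first two positions.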


Viruses ◽  
2018 ◽  
Vol 10 (11) ◽  
pp. 604 ◽  
Author(s):  
Naveen Kumar ◽  
Diwakar Kulkarni ◽  
Benhur Lee ◽  
Rahul Kaushik ◽  
Sandeep Bhatia ◽  
...  

Hendra virus (HeV) and Nipah virus (NiV) are among a group of emerging bat-borne paramyxoviruses that have crossed the species barrier several times, infecting several hosts with a high fatality rate in human beings. Despite the fatal nature of their infection, a comprehensive study to explore their evolution and adaptation in different hosts is lacking. A study of codon usage patterns in henipaviruses may provide some fruitful insight into their evolutionary processes of synonymous codon usage and host-adapted evolution. Here, we performed a systematic evolutionary and codon usage bias analysis of henipaviruses. We found a low codon usage bias in the coding sequences of henipaviruses and that natural selection, mutation pressure, and nucleotide composition shape the codon usage patterns of henipaviruses, with natural selection being more important than the others. Also, henipaviruses showed the highest level of adaptation to bats of the genus Pteropus in the codon adaptation index (CAI), relative codon deoptimization index (RCDI), and similarity index (SiD) analyses. Furthermore, a comparison to recently identified henipa-like viruses indicated a high tRNA adaptation index of henipaviruses for human beings, mainly due to F, G and L proteins. Consequently, the study concedes the substantial emergence of henipaviruses in human beings, particularly when paired with frequent exposure to direct/indirect bat excretions.


2019 ◽  
Vol 35 (2) ◽  
pp. 319-327
Author(s):  
Matthew L Jockers ◽  
Fernando Nascimento ◽  
George H Taylor

Abstract The judgments by members of the US Supreme Court in the 2000 case of Bush versus Gore remain controversial to the present. We use text mining and machine learning methods to compare the word usage patterns of Supreme Court Justices in order to explore the likely authorship of both the anonymous 5-4 per curiam decision in this case and the concurrence that is attributed to Chief Justice Rehnquist, with Scalia and Thomas joining. An analysis of high and medium frequency words suggests that Justice Kennedy was likely the main contributor to the per curiam decision. A similar analysis of the concurrence, however, suggests that Justice Scalia may have played a more central role than the document’s purported author, Justice Rehnquist. Our analysis indicates that while Chief Justice Rehnquist was likely to have been the crafter of the document, much of the more forceful language of the concurrence resonates more clearly with a vocabulary that is indicative of Justice Scalia.

