Selection on Synonymous Mutations Revealed by 1135 Genomes of Arabidopsis thaliana

Synonymous mutations do not change the encoded amino acid, but they do change synonymous codon usage. In the genomes of different organisms, gene conversion is biased toward GC irrespective of mutation bias. This trend is especially pronounced in coding regions, possibly because G/C-ending codons are preferred over A/T-ending ones. If G/C-ending codons are advantageous, then synonymous mutations that change A/T to G/C would be “optimal” compared with the opposite ones, and in theory one should observe signals of positive selection on these optimal synonymous mutations. The recently released single-nucleotide polymorphism (SNP) data from the 1001 Genomes Project of Arabidopsis thaliana provide an unprecedented opportunity to test this hypothesis. I took full advantage of the SNP data from 1,135 A. thaliana lines and concluded that synonymous mutations in natural populations are not strictly neutral: synonymous mutations that increase GC content (from A/T to G/C) tend to have higher derived allele frequencies (DAFs) and are therefore likely to be positively selected. This study broadens our knowledge of the selection patterns of synonymous mutations and should be of interest to evolutionary biologists. One-sentence summary: In 1,135 genomes of Arabidopsis thaliana, synonymous mutations that increase GC content tend to have higher derived allele frequencies (DAFs) and are likely to be positively selected.
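The DAF comparison described in this abstract can be illustrated with a minimal sketch. It assumes each synonymous SNP is already polarized into an ancestral and a derived allele and comes with a count of genomes carrying the derived allele. The function names and the toy counts are hypothetical and are not taken from the paper, whose actual pipeline may differ.

```python
from statistics import mean

AT = {"A", "T"}
GC = {"G", "C"}


def classify(ancestral, derived):
    """Label a SNP by its effect on GC content."""
    if ancestral in AT and derived in GC:
        return "AT_to_GC"   # increases GC content
    if ancestral in GC and derived in AT:
        return "GC_to_AT"   # decreases GC content
    return "GC_neutral"     # A<->T or G<->C, GC content unchanged


def derived_allele_frequency(derived_count, n_genomes):
    """Fraction of sampled genomes that carry the derived allele."""
    return derived_count / n_genomes


def mean_daf_by_class(snps, n_genomes):
    """Average DAF per mutation class for (ancestral, derived, count) tuples."""
    groups = {}
    for ancestral, derived, count in snps:
        daf = derived_allele_frequency(count, n_genomes)
        groups.setdefault(classify(ancestral, derived), []).append(daf)
    return {label: mean(dafs) for label, dafs in groups.items()}


# Hypothetical toy data: (ancestral allele, derived allele, derived count).
toy_snps = [("A", "G", 400), ("T", "C", 200), ("G", "A", 50), ("C", "T", 150)]
print(mean_daf_by_class(toy_snps, n_genomes=1000))
```

A real analysis would compare the two DAF distributions with a statistical test; the sketch compares means only to keep the idea visible.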


Comparative Analysis of Genomic and Transcriptome Sequences Reveals Divergent Patterns of Codon Bias in Wheat and Its Ancestor Species

Frontiers in Genetics ◽

10.3389/fgene.2021.732432 ◽

2021 ◽

Vol 12 ◽

Author(s):

Chenkang Yang ◽

Qi Zhao ◽

Ying Wang ◽

Jiajia Zhao ◽

Ling Qiao ◽

...

Keyword(s):

Triticum Aestivum ◽

Codon Usage ◽

Hexaploid Wheat ◽

Triticum Turgidum ◽

Synonymous Codon ◽

Regression Line ◽

Aegilops Tauschii ◽

GC Content ◽

Gene Families ◽

Synonymous Codon Usage

Synonymous codon usage shows a characteristic pattern of preference in each organism. This codon usage bias is thought to have evolved for efficient protein synthesis. Synonymous codon usage was studied in genes of the hexaploid wheat Triticum aestivum (AABBDD) and its progenitor species, Triticum urartu (AA), Aegilops tauschii (DD), and Triticum turgidum (AABB). Triticum aestivum exhibited a stronger usage bias for G/C-ending codons than did the three progenitor species, and the difference was especially pronounced relative to T. turgidum and Ae. tauschii. High GC content is a primary factor influencing codon usage in T. aestivum. Neutrality analysis showed a significant positive correlation (p<0.001) between GC12 and GC3 in the four species with regression line slopes near zero (0.16–0.20), suggesting that the effect of mutation on codon usage was only 16–20%. The GC3s values of genes were associated with gene length and distribution density within chromosomes. tRNA abundance data indicated that codon preference corresponded to the relative abundance of isoaccepting tRNAs in the four species. Both mutation and selection have affected synonymous codon usage in hexaploid wheat and its progenitor species. GO enrichment showed that GC-biased genes were commonly enriched in physiological processes such as photosynthesis and response to acid chemical. In certain gene families with important functions, the codon usage of a small proportion of genes has changed during the evolution of T. aestivum.
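The neutrality analysis mentioned in this abstract regresses GC12 (the mean GC fraction at codon positions 1 and 2) on GC3 across genes; a slope near 1 points to mutation pressure, a slope near 0 to selection. The following is a minimal sketch of that calculation, assuming in-frame coding sequences as plain strings. The function names are illustrative and not from the paper.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3): the GC fraction at each codon position."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(n_codons * 3):  # trailing partial codon is ignored
        if cds[i] in "GC":
            counts[i % 3] += 1
    return tuple(c / n_codons for c in counts)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def neutrality_slope(cds_list):
    """Slope of the neutrality plot: GC12 (y axis) against GC3 (x axis)."""
    gc3, gc12 = [], []
    for cds in cds_list:
        g1, g2, g3 = gc_by_position(cds)
        gc3.append(g3)
        gc12.append((g1 + g2) / 2)
    return ols_slope(gc3, gc12)
```

Under this reading, the reported slopes of 0.16–0.20 correspond to the abstract's statement that mutation accounts for only 16–20% of the effect on codon usage.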


Analysis of codon usage bias reveals optimal codons in Elaeis guineensis

Biodiversitas Journal of Biological Diversity ◽

10.13057/biodiv/d211138 ◽

2020 ◽

Vol 21 (11) ◽

Author(s):

Redi Aditama ◽

Zulfikar Achmad Tanjung ◽

Widyartini Made Sudania ◽

Yogo Adhi Nugroho ◽

Condro Utomo ◽

...

Keyword(s):

Codon Usage ◽

Oil Palm ◽

Codon Usage Bias ◽

Elaeis Guineensis ◽

Synonymous Codon ◽

GC Content ◽

Synonymous Codon Usage ◽

Mutational Bias ◽

Optimal Codons ◽

Good Ability

Abstract. Aditama R, Tanjung ZA, Sudania WM, Nugroho YA, Utomo C, Liwang T. 2020. Analysis of codon usage bias reveals optimal codons in Elaeis guineensis. Biodiversitas 21: 5331-5337. Codon usage bias of the oil palm genome was characterized using several indices, including GC content, relative synonymous codon usage (RSCU), the effective number of codons (ENC), and the codon adaptation index (CAI). A unimodal distribution of GC content was observed, matching the characteristics of non-grass monocots. Correspondence analysis (COA) of synonymous codon usage bias showed that the main axis was strongly driven by GC content. The ENC and neutrality plots of oil palm genes indicated that natural selection played a more vital role than mutational bias in shaping codon usage bias. A positive correlation between calculated CAI and experimental oil palm gene expression data was detected, indicating that this index performs well. Finally, eighteen codons were defined as “optimal codons” that may provide a useful reference for heterogeneous expression and genome editing studies.
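Of the indices listed in this abstract, RSCU is the simplest to compute: for each codon it is the observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch assuming the standard genetic code and an in-frame coding sequence; it is an illustration, not the authors' code.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Relative synonymous codon usage for every sense codon seen in `cds`.

    RSCU = count(codon) * (number of synonymous codons) / count(amino acid).
    Amino acids absent from the sequence are omitted from the result.
    """
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*":  # stop codons are excluded
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values
```

An RSCU above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.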


The complete chloroplast genome sequence of Morus cathayana and Morus multicaulis, and comparative analysis within genus Morus L

PeerJ ◽

10.7717/peerj.3037 ◽

2017 ◽

Vol 5 ◽

pp. e3037 ◽

Cited By ~ 11

Author(s):

Wei Qing Kong ◽

Jin Hong Yang

Keyword(s):

Codon Usage ◽

Evolutionary Biology ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Mutational Bias ◽

Coding Region ◽

Variable Regions ◽

Standard Curve ◽

Gene Coding ◽

Effective Number Of Codons

Trees in the genus Morus belong to the family Moraceae. To better understand the species status of genus Morus and to provide information for studies on evolutionary biology within the genus, the complete chloroplast (cp) genomes of M. cathayana and M. multicaulis were sequenced. The plastomes of the two species are 159,265 bp and 159,103 bp, respectively, with 83 and 82 simple sequence repeats (SSRs), respectively. Similar to the SSRs of the M. mongolica and M. indica cp genomes, more than 70% are mononucleotides, ten are in coding regions, and one exhibits nucleotide content polymorphism. Results for codon usage and relative synonymous codon usage show a strong bias towards NNA and NNT codons in the two cp genomes. Analysis of a plot of the effective number of codons (ENc) for five Morus spp. cp genomes showed that most genes follow the standard curve, but several genes have ENc values below the expected curve. The results indicate that both natural selection and mutational bias have contributed to the codon bias. Ten highly variable regions were identified among the five Morus spp. cp genomes, and 154 single-nucleotide polymorphism mutation events were accurately located in the gene coding region.
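The "standard curve" in an ENc plot is usually taken to be Wright's (1990) expectation for a gene whose codon usage is shaped only by GC3s composition: ENc = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s. The sketch below assumes that this is the curve meant here; the helper names and the optional tolerance parameter are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENc under GC3s composition alone (Wright's null curve)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def below_curve(gc3s, observed_enc, tolerance=0.0):
    """True if a gene's observed ENc falls below the expected curve.

    Genes below the curve show more codon bias than GC3s composition
    alone predicts, which is commonly read as a sign of selection.
    """
    return observed_enc < expected_enc(gc3s) - tolerance
```

Genes lying on or near the curve are consistent with mutational bias alone, which matches the abstract's conclusion that both forces contribute.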


Mutation bias shapes gene evolution in Arabidopsis thaliana

10.1101/2020.06.17.156752 ◽

2020 ◽

Cited By ~ 1

Author(s):

J. Grey Monroe ◽

Thanvi Srikant ◽

Pablo Carbonell-Bejerano ◽

Moises Exposito-Alonso ◽

Mao-Lun Weng ◽

...

Keyword(s):

Arabidopsis Thaliana ◽

De Novo ◽

Gene Evolution ◽

GC Content ◽

Mutation Rates ◽

Rate Variation ◽

Recent Theory ◽

De Novo Mutations ◽

Coding Region ◽

Mutational Hotspots

Classical evolutionary theory maintains that mutation rate variation between genes should be random with respect to fitness [1–4], and evolutionary optimization of genic mutation rates remains controversial [3,5]. However, it has now become known that cytogenetic (DNA sequence + epigenomic) features influence local mutation probabilities [6], which is predicted by more recent theory to be a prerequisite for beneficial mutation rates between different classes of genes to readily evolve [7]. To test this possibility, we used de novo mutations in Arabidopsis thaliana to create a high-resolution predictive model of mutation rates as a function of cytogenetic features across the genome. As expected, mutation rates are significantly predicted by features such as GC content, histone modifications, and chromatin accessibility. Deeper analyses of predicted mutation rates reveal effects of introns and untranslated exon regions in distancing coding sequences from mutational hotspots at the start and end of transcribed regions in A. thaliana. Finally, predicted coding region mutation rates are significantly lower in genes where mutations are more likely to be deleterious, supported by numerous estimates of evolutionary and functional constraint. These findings contradict neutral expectations that mutation probabilities are independent of fitness consequences. Instead, they are consistent with the evolution of lower mutation rates in functionally constrained loci due to cytogenetic features, with important implications for evolutionary biology [8].


Comparative Analysis of Human Coronaviruses Focusing on Nucleotide Variability and Synonymous Codon Usage Pattern

10.1101/2020.07.28.224386 ◽

2020 ◽

Author(s):

Jayanta Kumar Das ◽

Swarup Roy

Keyword(s):

Codon Position ◽

Synonymous Codon ◽

Linear Regression Analysis ◽

GC Content ◽

Codon Usage Pattern ◽

P Value ◽

Coding Region ◽

Human Coronavirus ◽

Amino Acid Group ◽

Nucleotide Variability

Abstract. The prevailing pandemic caused by SARSCoV-2 has drawn great attention to discovering its evolutionary origin. We perform an exploratory study to understand the variability of the whole coding region of possible proximal evolutionary neighbours of SARSCoV-2. We consider seven human coronavirus strains from six different species as candidates for our study. First, we observe considerable nucleotide variability across the candidate strains. We did not find significant variation in GC content across the strains at the first and second codon positions. However, we see large variability in GC content at the third codon position (GC3), and the pairwise mean GC contents of (SARSCoV, MERSCoV) and (SARSCoV-2, hCoV229E) are quite close. While observing the relative abundance of dinucleotide features, we find a shared typical genetic pattern, i.e., high usage of the GC and CT nucleotide pairs at the first two positions (P12) of codons and the last two positions (P23) of codons, respectively. We also observe a low abundance of the CG pair that might help in their evolution bio-process. Secondly, considering the RSCU score, we find a substantial similarity among the mild class coronaviruses, i.e., hCoVOC43, hCoVHKU1, and hCoVNL63, based on their codon hits with high RSCU value (≥ 1.5), and the minimum number of codon hits (count of 9) is observed for MERSCoV. We see seven codons, ATT, ACT, TCT, CCT, GTT, GCT and GGT, with high RSCU value, which are common to all seven strains. These codons are mostly from the aliphatic and hydroxyl amino acid groups. A phylogenetic tree built using the RSCU feature reveals proximity between hCoVOC43 and hCoV229E (mild). Thirdly, we perform linear regression analysis among GC content at different codon positions and ENC value. We observe a strong correlation (significant p-value) between GC2 and GC3 for SARSCoV-2, hCoV229E and hCoVNL63, and between GC1 and GC3 for hCoV229E, hCoVNL63 and SARSCoV. We believe that our findings will help in understanding the mechanism of human coronavirus.
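The P12 and P23 dinucleotide usage described in this abstract can be tallied with a short sketch: for each in-frame codon, count the pair formed by positions 1–2 and the pair formed by positions 2–3, then convert counts to frequencies. The function name is hypothetical, and the authors' exact abundance measure may differ from the plain frequencies used here.

```python
from collections import Counter


def codon_dinucleotide_freqs(cds):
    """Return (P12, P23) dinucleotide frequency dicts for an in-frame CDS.

    P12 holds the pairs at codon positions 1-2, P23 those at positions 2-3.
    Frequencies are per codon; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    p12, p23 = Counter(), Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        p12[codon[:2]] += 1
        p23[codon[1:]] += 1
    n_codons = sum(p12.values())
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    return (
        {pair: n / n_codons for pair, n in p12.items()},
        {pair: n / n_codons for pair, n in p23.items()},
    )
```

For example, the toy sequence GCTGCTACT has GC as its most frequent P12 pair and CT as its only P23 pair.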


Comparative study of the hemagglutinin and neuraminidase genes of influenza A virus H3N2, H9N2, and H5N1 subtypes using bioinformatics techniques

Canadian Journal of Microbiology ◽

10.1139/w07-044 ◽

2007 ◽

Vol 53 (7) ◽

pp. 830-839 ◽

Cited By ~ 5

Author(s):

Insung Ahn ◽

Hyeon S. Son

Keyword(s):

Codon Usage ◽

Influenza A Virus ◽

Host Species ◽

Influenza A ◽

Synonymous Codon ◽

GC Content ◽

Synonymous Codon Usage ◽

Codon Usage Pattern ◽

Evolutionary Patterns ◽

Over Time

To investigate the genomic patterns of influenza A virus subtypes, such as H3N2, H9N2, and H5N1, we collected 1842 sequences of the hemagglutinin and neuraminidase genes from the NCBI database and parsed them into 7 categories: accession number, host species, sampling year, country, subtype, gene name, and sequence. The sequences that were isolated from the human, avian, and swine populations were extracted and stored in a MySQL® database for intensive analysis. The GC content and relative synonymous codon usage (RSCU) values were calculated using Java code. Correspondence analysis of the RSCU values yielded the unique codon usage pattern (CUP) of each subtype and revealed no extreme differences among the human, avian, and swine isolates. H5N1 subtype viruses exhibited little variation in CUPs compared with other subtypes, suggesting that the H5N1 CUP has not yet undergone significant changes within each host species. Moreover, some observations may be relevant to CUP variation that has occurred over time among the H3N2 subtype viruses isolated from humans. All the sequences were divided into 3 groups over time, and each group seemed to have preferred synonymous codon patterns for each amino acid, especially for arginine, glycine, leucine, and valine. The bioinformatics technique we introduce in this study may be useful in predicting the evolutionary patterns of pandemic viruses.


Recombination, meiotic expression and human codon usage

eLife ◽

10.7554/elife.27344 ◽

2017 ◽

Vol 6 ◽

Cited By ~ 23

Author(s):

Fanny Pouyet ◽

Dominique Mouchiroud ◽

Laurent Duret ◽

Marie Sémon

Keyword(s):

Codon Usage ◽

Large Scale ◽

Synonymous Codon ◽

GC Content ◽

Synonymous Codon Usage ◽

Translation Efficiency ◽

Functional Categories ◽

Human Genes ◽

Biased Gene Conversion ◽

Mammalian Genomes

Synonymous codon usage (SCU) varies widely among human genes. In particular, genes involved in different functional categories display a distinct codon usage, which was interpreted as evidence that SCU is adaptively constrained to optimize translation efficiency in distinct cellular states. We demonstrate here that SCU is not driven by constraints on tRNA abundance, but by large-scale variation in GC-content, caused by meiotic recombination, via the non-adaptive process of GC-biased gene conversion (gBGC). Expression in meiotic cells is associated with a strong decrease in recombination within genes. Differences in SCU among functional categories reflect differences in levels of meiotic transcription, which is linked to variation in recombination and therefore in gBGC. Overall, the gBGC model explains 70% of the variance in SCU among genes. We argue that the strong heterogeneity of SCU induced by gBGC in mammalian genomes precludes any optimization of the tRNA pool to the demand in codon usage.


Reduction of the Rate of Poliovirus Protein Synthesis through Large-Scale Codon Deoptimization Causes Attenuation of Viral Virulence by Lowering Specific Infectivity

Journal of Virology ◽

10.1128/jvi.00738-06 ◽

2006 ◽

Vol 80 (19) ◽

pp. 9687-9696 ◽

Cited By ~ 245

Author(s):

Steffen Mueller ◽

Dimitris Papamichail ◽

J. Robert Coleman ◽

Steven Skiena ◽

Eckard Wimmer

Keyword(s):

Large Scale ◽

De Novo ◽

Synonymous Codon ◽

Direct Analysis ◽

Synonymous Codon Usage ◽

Wild Type ◽

Coding Region ◽

Virus Particles ◽

Viral Virulence ◽

Wide Range

ABSTRACT Exploring the utility of de novo gene synthesis with the aim of designing stably attenuated polioviruses (PV), we followed two strategies to construct PV variants containing synthetic replacements of the capsid coding sequences either by deoptimizing synonymous codon usage (PV-AB) or by maximizing synonymous codon position changes of the existing wild-type (wt) poliovirus codons (PV-SD). Despite 934 nucleotide changes in the capsid coding region, PV-SD RNA produced virus with wild-type characteristics. In contrast, no viable virus was recovered from PV-AB RNA carrying 680 silent mutations, due to a reduction of genome translation and replication below a critical level. After subcloning of smaller portions of the AB capsid coding sequence into the wt background, several viable viruses were obtained with a wide range of phenotypes corresponding to their efficiency of directing genome translation. Surprisingly, when inoculated with equal infectious doses (PFU), even the most replication-deficient viruses appeared to be as pathogenic in PV-sensitive CD155tg (transgenic) mice as the PV(M) wild type. However, infection with equal amounts of virus particles revealed a neuroattenuated phenotype over 100-fold. Direct analysis indicated a striking reduction of the specific infectivity of PV-AB-type virus particles. Due to the distribution effect of many silent mutations over large genome segments, codon-deoptimized viruses should have genetically stable phenotypes, and they may prove suitable as attenuated substrates for the production of poliovirus vaccines.


A Comparative Analysis of Synonymous Codon Usage Bias Pattern in Human Albumin Superfamily

The Scientific World JOURNAL ◽

10.1155/2014/639682 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 18

Author(s):

Hoda Mirsafian ◽

Adiratna Mat Ripen ◽

Aarti Singh ◽

Phaik Hwan Teo ◽

Amir Feisal Merican ◽

...

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

Synonymous Codon ◽

Evolutionary Relationship ◽

GC Content ◽

Protein Secondary Structure ◽

Human Albumin ◽

Synonymous Codon Usage ◽

Domains Of Life ◽

Synonymous Codon Usage Bias

Synonymous codon usage bias is an inevitable phenomenon in organismic taxa across the three domains of life. Though the frequency of codon usage is not equal across species or within the genome of a single species, the phenomenon is nonrandom and tissue-specific. Several factors such as GC content, nucleotide distribution, protein hydropathy, protein secondary structure, and translational selection are reported to contribute to codon usage preference. Synonymous codon usage patterns can be helpful in revealing the expression pattern of genes as well as the evolutionary relationship between sequences. In this study, synonymous codon usage bias patterns were determined for the evolutionarily close proteins of the albumin superfamily, namely, albumin, α-fetoprotein, afamin, and vitamin D-binding protein. Our study demonstrated that the genes of the four albumin superfamily members have low GC content and high values of effective number of codons (ENC), suggesting high expressivity of these genes and less bias in codon usage preferences. This study also provided evidence that the albumin superfamily members are not subjected to mutational selection pressure.


ΦX174 Attenuation by Whole Genome Codon Deoptimization

Genome Biology and Evolution ◽

10.1093/gbe/evaa214 ◽

2020 ◽

Author(s):

James T Van Leuven ◽

Martina M Ederer ◽

Katelyn Burleigh ◽

LuAnn Scott ◽

Randall A Hughes ◽

...

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Viral Fitness ◽

Live Attenuated Vaccine ◽

Protein Coding ◽

Viral Growth ◽

Synonymous Mutations ◽

Genes Encoding ◽

Codon Deoptimization

Abstract Natural selection acting on synonymous mutations in protein-coding genes influences genome composition and evolution. In viruses, introducing synonymous mutations in genes encoding structural proteins can drastically reduce viral growth, providing a means to generate potent, live attenuated vaccine candidates. However, an improved understanding of what compositional features are under selection and how combinations of synonymous mutations affect viral growth is needed to predictably attenuate viruses and make them resistant to reversion. We systematically recoded all non-overlapping genes of the bacteriophage ΦX174 with codons rarely used in its E. coli host. The fitness of recombinant viruses decreases as additional deoptimizing mutations are made to the genome, although not always linearly, and not consistently across genes. Combining deoptimizing mutations may reduce viral fitness more or less than expected from the effect size of the constituent mutations and we point out difficulties in untangling correlated compositional features. We test our model by optimizing the same genes and find that the relationship between codon usage and fitness does not hold for optimization, suggesting that wild-type ΦX174 is at a fitness optimum. This work highlights the need to better understand how selection acts on patterns of synonymous codon usage across the genome and provides a convenient system to investigate the genetic determinants of virulence.
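The recoding idea in this abstract, replacing codons with synonymous codons that the host rarely uses, can be sketched as follows. The sketch assumes the standard genetic code and a caller-supplied host codon usage table. The function names and the usage values in the example are hypothetical, and the authors' actual recoding procedure may differ.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(cds):
    """Translate an in-frame CDS; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X")
        for i in range(0, len(cds) - len(cds) % 3, 3)
    )


def deoptimize(cds, host_usage):
    """Swap each codon for its least-used synonym in `host_usage`.

    `host_usage` maps codon -> usage frequency in the host. Stop codons,
    unknown codons, and codons whose synonyms are all missing from the
    table are left unchanged. A trailing partial codon is dropped.
    """
    cds = cds.upper()
    recoded = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        candidates = []
        if aa is not None and aa != "*":
            candidates = [c for c in SYNONYMS[aa] if c in host_usage]
        recoded.append(min(candidates, key=host_usage.get) if candidates else codon)
    return "".join(recoded)


# Hypothetical usage values for illustration only, not real E. coli data.
toy_usage = {"GCT": 0.2, "GCC": 0.3, "GCA": 0.1, "GCG": 0.4, "AAA": 0.7, "AAG": 0.3}
print(deoptimize("ATGGCGAAA", toy_usage))
```

Because only synonymous codons are swapped, `translate` returns the same protein before and after recoding, which is the property that makes this approach suitable for attenuation.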
