COUSIN (COdon Usage Similarity INdex): A Normalized Measure of Codon Usage Preferences

Abstract Codon Usage Preferences (CUPrefs) describe the unequal usage of synonymous codons at the gene, chromosome, or genome levels. Numerous indices have been developed to evaluate CUPrefs, either in absolute terms or with respect to a reference. We introduce the normalized index COUSIN (for COdon Usage Similarity INdex), that compares the CUPrefs of a query against those of a reference and normalizes the output over a Null Hypothesis of random codon usage. The added value of COUSIN is to be easily interpreted, both quantitatively and qualitatively. An eponymous software written in Python3 is available for local or online use (http://cousin.ird.fr). This software allows for an easy and complete analysis of CUPrefs via COUSIN, includes seven other indices, and provides additional features such as statistical analyses, clustering, and CUPrefs optimization for gene expression. We illustrate the flexibility of COUSIN and highlight its advantages by analyzing the complete coding sequences of eight divergent genomes. Strikingly, COUSIN captures a bimodal distribution in the CUPrefs of human and chicken genes hitherto unreported with such precision. COUSIN opens new perspectives to uncover CUPrefs specificities in genomes in a practical, informative, and user-friendly way.
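The abstract does not state the COUSIN formula, so the sketch below does not attempt to reproduce it. It only illustrates the underlying notion of "unequal usage of synonymous codons" with the classical relative synonymous codon usage (RSCU) statistic, assuming the standard genetic code; the function name `rscu` and the table construction are illustrative choices, not part of the COUSIN software.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return RSCU values for every codon of each amino acid observed in `cds`.

    RSCU = observed codon count / (amino-acid count / number of synonymous codons).
    A value of 1.0 means the codon is used as often as expected under equal usage.
    Single-codon amino acids (Met, Trp) and stop codons are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    result = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result
```

For example, `rscu("GCTGCTGCTGCC")` reports that GCT is used three times as often as expected under equal usage of the four alanine codons, while GCA and GCG are not used at all.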


COUSIN (COdon Usage Similarity INdex): A normalized measure of Codon Usage Preferences

10.1101/600361 ◽

2019 ◽

Author(s):

Jérôme Bourret ◽

Samuel Alizon ◽

Ignacio G. Bravo

Keyword(s):

Codon Usage ◽

Similarity Index ◽

Bimodal Distribution ◽

Nucleotide Composition ◽

Genomic Region ◽

Reference Dataset ◽

Precise Location ◽

Synonymous Codons ◽

Local Use ◽

Genome Scale

Abstract Codon Usage Preferences (CUPrefs) describe the unequal usage of synonymous codons at the gene, genomic region or genome scale. Numerous indices have been developed to measure the CUPrefs of a sequence. We introduce a normalized index to calculate CUPrefs called COUSIN for COdon Usage Similarity INdex. This index compares the CUPrefs of a query against those of a reference dataset and normalizes the output over a Null Hypothesis of random codon usage. COUSIN results can be easily interpreted, quantitatively and qualitatively. We exemplify the use of COUSIN and highlight its advantages with an analysis on the complete coding sequences of eight divergent genomes, two of them with extreme nucleotide composition. Strikingly, COUSIN captures a hitherto unreported bimodal distribution in CUPrefs in genes in the human and in the chicken genomes. We show that this bimodality can be explained by the global nucleotide composition bias of the chromosome in which the gene resides, and by the precise location within the chromosome. Our results highlight the power of the COUSIN index and uncover unexpected characteristics of the CUPrefs in human and chicken. An eponymous tool written in Python3 to calculate COUSIN is available for online or local use.


ProkSeq for complete analysis of RNA-Seq data from prokaryotes

Bioinformatics ◽

10.1093/bioinformatics/btaa1063 ◽

2020 ◽

Author(s):

A K M Firoj Mahmud ◽

Nicolas Delhomme ◽

Soumyadeep Nandi ◽

Maria Fällman

Keyword(s):

Gene Expression ◽

Pathogenic Bacteria ◽

Supplementary Information ◽

Complete Analysis ◽

Rna Seq ◽

Differential Gene ◽

User Friendly ◽

Multiple Samples ◽

Data Analysis Pipeline

Abstract Summary: Since its introduction, RNA-Seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, tools for studying gene expression, determining differential gene expression, performing downstream pathway analysis, and normalizing data collected in extreme biological conditions are still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-Seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results. Availability and implementation: ProkSeq is implemented in Python and is published under the MIT source license. The pipeline is available as a Docker container https://hub.docker.com/repository/docker/snandids/prokseq-v2.0, or can be used through Anaconda: https://anaconda.org/snandiDS/prokseq. The code is available on Github: https://github.com/snandiDS/prokseq and detailed user documentation, including a manual and tutorial, can be found at https://prokseqV20.readthedocs.io Supplementary information: Supplementary data are available at Bioinformatics online.


Codon Usage Bias Prefers AT Bases in Coding Sequences Among the Essential Genes of Haemophilus influenzae

Notulae Scientia Biologicae ◽

10.15835/nsb649386 ◽

2014 ◽

Vol 6 (4) ◽

pp. 417-421 ◽

Cited By ~ 2

Author(s):

Chakraborty SUPRIYO ◽

Paul PROSENJIT ◽

Tarikul Huda MAZUMDER

Keyword(s):

Gene Expression ◽

Haemophilus Influenzae ◽

Codon Usage ◽

Codon Usage Bias ◽

Compositional Analysis ◽

Revealed Preference ◽

Nucleotide Composition ◽

Essential Genes ◽

Coding Sequences ◽

Codon Positions

The base composition at three different codon positions in relation to codon usage bias and gene expressivity was studied in a sample of twenty-five essential genes from Haemophilus influenzae. ENC, CBI and Fop were used to quantify the variation in codon usage bias for the coding sequences (cds). CAI was used to estimate the level of gene expression of the cds selected in the present study. To find out the relationship between the extent of codon bias and nucleotide composition, the values of A, T, G, C and GC were compared with the A3, T3, G3, C3 and GC3 values, respectively. The results showed relatively weak codon usage bias among the cds of Haemophilus influenzae. This, in turn, implies that the essential genes prefer to use a set of restricted codons. However, the base compositional analysis of essential genes in Haemophilus influenzae revealed a preference of AT to GC bases within their coding sequences, and this preference might affect gene expression, as indicated by the relatively high CAI values of the coding sequences.
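The position-specific composition values mentioned above (for example GC3, the GC fraction at third codon positions) can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the function name `gc_by_codon_position` is hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS.

    A trailing partial codon is ignored. Keys are "GC1", "GC2" and "GC3".
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    result = {}
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this codon position
        if bases:
            result[f"GC{pos + 1}"] = sum(b in "GC" for b in bases) / len(bases)
        else:
            result[f"GC{pos + 1}"] = float("nan")
    return result
```

Comparing `GC3` with the overall GC fraction of the same sequence is the kind of comparison the abstract describes between GC and GC3.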


Selection at the Amino Acid Level Can Influence Synonymous Codon Usage: Implications for the Study of Codon Adaptation in Plastid Genes

Genetics ◽

10.1093/genetics/159.1.347 ◽

2001 ◽

Vol 159 (1) ◽

pp. 347-358

Author(s):

Brian R Morton

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Amino Acid Level ◽

Synonymous Codon Usage ◽

Noncoding Dna ◽

Translation Rate ◽

Coding Sequences ◽

Synonymous Codons ◽

Synonymous Sites ◽

Translation Accuracy

Abstract A previously employed method that uses the composition of noncoding DNA as the basis of a test for selection between synonymous codons in plastid genes is reevaluated. The test requires the assumption that in the absence of selective differences between synonymous codons the composition of silent sites in coding sequences will match the composition of noncoding sites. It is demonstrated here that this assumption is not necessarily true and, more generally, that using compositional properties to draw inferences about selection on silent changes in coding sequences is much more problematic than commonly assumed. This is so because selection on nonsynonymous changes can influence the composition of synonymous sites (i.e., codon usage) in a complex manner, meaning that the composition biases of different silent sites, including neutral noncoding DNA, are not comparable. These findings also draw into question the commonly utilized method of investigating how selection to increase translation accuracy influences codon usage. The work then focuses on implications for studies that assess codon adaptation, which is selection on codon usage to enhance translation rate, in plastid genes. A new test that does not require the use of noncoding DNA is proposed and applied. The results of this test suggest that far fewer plastid genes display codon adaptation than previously thought.


The 3-Base Periodicity and Codon Usage of Coding Sequences Are Correlated with Gene Expression at the Level of Transcription Elongation

PLoS ONE ◽

10.1371/journal.pone.0021590 ◽

2011 ◽

Vol 6 (6) ◽

pp. e21590 ◽

Cited By ~ 15

Author(s):

Edoardo Trotta

Keyword(s):

Gene Expression ◽

Codon Usage ◽

Transcription Elongation ◽

Coding Sequences ◽

Base Periodicity


ProkSeq for complete analysis of RNA-seq data from prokaryotes

10.1101/2020.06.09.135822 ◽

2020 ◽

Cited By ~ 2

Author(s):

A K M Firoj Mahmud ◽

Soumyadeep Nandi ◽

Maria Fällman

Keyword(s):

Gene Expression ◽

Pathogenic Bacteria ◽

Complete Analysis ◽

Rna Seq ◽

Link Type ◽

Eukaryotic Genes ◽

Differential Gene ◽

User Friendly ◽

Multiple Samples

Abstract Summary: Since its introduction, RNA-seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, the current tools for assessing gene expression have been designed around the structures of eukaryotic genes. There are a few stand-alone tools designed for prokaryotes, and they require improvement. A well-defined pipeline for prokaryotes that includes all the necessary tools for quality control, determination of differential gene expression, downstream pathway analysis, and normalization of data collected in extreme biological conditions is still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results, and it produces publication-quality figures. Availability and implementation: ProkSeq is implemented in Python and is published under the ISC open source license. The tool and a detailed user manual are hosted at Docker: https://hub.docker.com/repository/docker/snandids/prokseq-v2.1, Anaconda: https://anaconda.org/snandiDS/prokseq; Github: https://github.com/snandiDS/prokseq.


Analysis of Mutation Bias in Shaping Codon Usage Bias and Its Association with Gene Expression Across Species

10.29007/87r9 ◽

2020 ◽

Author(s):

Zhixiu Lu ◽

Michael Gilchrist ◽

Scott Emrich

Keyword(s):

Gene Expression ◽

Codon Usage ◽

Codon Usage Bias ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Mutation Bias ◽

Protein Coding ◽

E Coli ◽

Synonymous Codons ◽

Computation Efficiency

Codon usage bias has been known to reflect the expression level of a protein-coding gene under the evolutionary theory that selection favors certain synonymous codons. Although measuring the effect of selection in simple organisms such as yeast and E. coli has proven to be effective and accurate, codon-based methods perform less well in plants and humans. In this paper, we extend a prior method that incorporates another evolutionary factor, namely mutation bias and its effect on codon usage. Our results indicate that prediction of gene expression is significantly improved under our framework, and suggest that quantification of mutation bias is essential for fully understanding synonymous codon usage. We also propose an improved method, namely MLE-Φ, with much greater computation efficiency and a wider range of applications. An implementation of this method is provided at https://github.com/luzhixiu1996/MLE-Phi.


Phylodynamics and Codon Usage Pattern Analysis of Broad Bean Wilt Virus 2

Viruses ◽

10.3390/v13020198 ◽

2021 ◽

Vol 13 (2) ◽

pp. 198

Author(s):

Zhen He ◽

Zhuozhuo Dong ◽

Lang Qin ◽

Haifeng Gan

Keyword(s):

Codon Usage ◽

Broad Bean ◽

Synonymous Codon ◽

Similarity Index ◽

Full Length ◽

Codon Usage Pattern ◽

Usage Pattern ◽

Coding Sequences ◽

Wilt Virus ◽

Broad Bean Wilt Virus

Broad bean wilt virus 2 (BBWV2), which belongs to the genus Fabavirus of the family Secoviridae, is an important pathogen that causes damage to broad bean, pepper, yam, spinach and other economically important ornamental and horticultural crops worldwide. Previously, only limited reports have shown the genetic variation of BBWV2. Meanwhile, the detailed evolutionary changes, synonymous codon usage bias and host adaptation of this virus are largely unclear. Here, we performed comprehensive analyses of the phylodynamics, reassortment, composition bias and codon usage pattern of BBWV2 using forty-two complete genome sequences of BBWV2 isolates together with two other full-length RNA1 sequences and six full-length RNA2 sequences. Both recombination and reassortment had a significant influence on the genomic evolution of BBWV2. Through phylogenetic analysis we detected three and four lineages based on the ORF1 and ORF2 nonrecombinant sequences, respectively. The evolutionary rates of the two BBWV2 ORF coding sequences were 8.895 × 10⁻⁴ and 4.560 × 10⁻⁴ subs/site/year, respectively. We found a relatively conserved and stable genomic composition with a lower codon usage choice in the two BBWV2 protein coding sequences. ENC-plot and neutrality plot analyses showed that natural selection is the key factor shaping the codon usage pattern of BBWV2. Strong correlations between BBWV2 and broad bean and pepper were observed from similarity index (SiD), codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analyses. Our study is the first to evaluate the phylodynamics, codon usage patterns and adaptive evolution of a fabavirus, and our results may be useful for the understanding of the origin of this virus.
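The abstract does not describe how its ENC-plot was built. As a hedged illustration, an ENC-plot conventionally compares each gene's observed effective number of codons against the value expected when codon usage is determined by GC3 composition alone, using Wright's (1990) approximation ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3 fraction. The function name `expected_enc` is an illustrative choice.

```python
def expected_enc(gc3):
    """Expected effective number of codons when only GC3 composition shapes usage.

    Implements Wright's approximation: 2 + s + 29 / (s**2 + (1 - s)**2),
    with s the GC fraction at third codon positions (between 0 and 1).
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)
```

Genes whose observed ENC falls well below this curve are usually read as being shaped by factors other than composition, which is the reasoning behind attributing codon usage patterns to selection.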


The Added Value of Graphical Input and Display for Electron Lens Design

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100179701 ◽

1990 ◽

Vol 48 (1) ◽

pp. 190-191

Author(s):

B. Lencova ◽

G. Wisselink

Keyword(s):

Flux Density ◽

Numerical Data ◽

Added Value ◽

Lens Design ◽

Step Size ◽

Magnetic Lens ◽

Flux Lines ◽

Fine Mesh ◽

Electron Lens ◽

User Friendly

Recent progress in computer technology enables the calculation of lens fields and focal properties on commonly available computers such as IBM ATs. If we add to this the use of graphics, we greatly increase the applicability of design programs for electron lenses. Most programs for field computation are based on the finite element method (FEM). They are written in Fortran 77, so that they are easily transferred from PCs to larger machines. The design process has recently been made significantly more user friendly by adding input programs written in Turbo Pascal, which allows a flexible implementation of computer graphics. The input programs have not only menu driven input and modification of numerical data, but also graphics editing of the data. The input programs create files which are subsequently read by the Fortran programs. From the main menu of our magnetic lens design program, further options are chosen by using function keys or numbers. Some options (lens initialization and setting, fine mesh, current densities, etc.) open other menus where computation parameters can be set or numerical data can be entered with the help of a simple line editor. The "draw lens" option enables graphical editing of the mesh - see fig. 1. The geometry of the electron lens is specified in terms of coordinates and indices of a coarse quadrilateral mesh. In this mesh, the fine mesh with smoothly changing step size is calculated by an automeshing procedure. The options shown in fig. 1 allow modification of the number of coarse mesh lines, change of coordinates of mesh points or lines, and specification of lens parts. Interactive and graphical modification of the fine mesh can be called from the fine mesh menu. Finally, the lens computation can be called. Our FEM program allows up to 8000 mesh points on an AT computer.
Another menu allows the display of computed results stored in output files and graphical display of axial flux density, flux density in magnetic parts, and the flux lines in magnetic lenses - see fig. 2. A series of several lens excitations with user specified or default magnetization curves can be calculated and displayed in one session.


Massively parallel gene expression variation measurement of a synonymous codon library

BMC Genomics ◽

10.1186/s12864-021-07462-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alexander Schmitz ◽

Fuzhong Zhang

Keyword(s):

Gene Expression ◽

Codon Usage ◽

Single Cells ◽

Massively Parallel ◽

Protein Abundance ◽

Translation Efficiency ◽

Gene Expression Variation ◽

Expression Variation ◽

Change In Mean ◽

Adaptation Index

Abstract Background: Cell-to-cell variation in gene expression strongly affects population behavior and is key to multiple biological processes. While codon usage is known to affect ensemble gene expression, how codon usage influences variation in gene expression between single cells is not well understood. Results: Here, we used a Sort-seq based massively parallel strategy to quantify gene expression variation from a green fluorescent protein (GFP) library containing synonymous codons in Escherichia coli. We found that sequences containing codons with higher tRNA Adaptation Index (TAI) scores, and higher codon adaptation index (CAI) scores, have higher GFP variance. This trend is not observed for codons with high Normalized Translation Efficiency Index (nTE) scores nor from the free energy of folding of the mRNA secondary structure. GFP noise, or squared coefficient of variance (CV²), scales with mean protein abundance for low-abundant proteins but does not change at high mean protein abundance. Conclusions: Our results suggest that the main source of noise for high-abundance proteins is likely not originating at translation elongation. Additionally, the drastic change in mean protein abundance with small changes in protein noise seen from our library implies that codon optimization can be performed without concern for gene expression noise in biotechnology applications.
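The noise measure named in the abstract, the squared coefficient of variation, is the variance divided by the squared mean. A minimal sketch follows, assuming per-cell expression values are available as a plain list and using the population variance; the function name `noise_cv2` and the choice of population over sample variance are assumptions, not details taken from the paper.

```python
from statistics import mean, pvariance


def noise_cv2(values):
    """Return the squared coefficient of variation: variance / mean**2.

    `values` is a non-empty sequence of per-cell measurements (e.g. fluorescence).
    Population variance is used here; this is an illustrative choice.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("CV^2 is undefined when the mean is zero")
    return pvariance(values) / m ** 2
```

Because the variance is divided by the squared mean, two variants with very different mean abundance can still be compared on the same dimensionless noise scale.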
