COUSIN (COdon Usage Similarity INdex): A Normalized Measure of Codon Usage Preferences

2019 ◽  
Vol 11 (12) ◽  
pp. 3523-3528 ◽  
Author(s):  
Jérôme Bourret ◽  
Samuel Alizon ◽  
Ignacio G Bravo

Abstract Codon Usage Preferences (CUPrefs) describe the unequal usage of synonymous codons at the gene, chromosome, or genome levels. Numerous indices have been developed to evaluate CUPrefs, either in absolute terms or with respect to a reference. We introduce the normalized index COUSIN (for COdon Usage Similarity INdex), that compares the CUPrefs of a query against those of a reference and normalizes the output over a Null Hypothesis of random codon usage. The added value of COUSIN is to be easily interpreted, both quantitatively and qualitatively. An eponymous software written in Python3 is available for local or online use (http://cousin.ird.fr). This software allows for an easy and complete analysis of CUPrefs via COUSIN, includes seven other indices, and provides additional features such as statistical analyses, clustering, and CUPrefs optimization for gene expression. We illustrate the flexibility of COUSIN and highlight its advantages by analyzing the complete coding sequences of eight divergent genomes. Strikingly, COUSIN captures a bimodal distribution in the CUPrefs of human and chicken genes hitherto unreported with such precision. COUSIN opens new perspectives to uncover CUPrefs specificities in genomes in a practical, informative, and user-friendly way.
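As a rough illustration of what codon usage preferences look like in practice, the sketch below tabulates, for each amino acid, the relative frequency of its synonymous codons in a coding sequence. It is not the COUSIN formula, which the abstract does not spell out. The function name `synonymous_frequencies` and the two-amino-acid codon table are hypothetical choices made for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny codon table (two amino acids only);
# a real analysis would use the full genetic code.
CODON_TABLE = {
    "AAA": "K", "AAG": "K",                          # lysine
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
}


def synonymous_frequencies(cds):
    """Return {amino_acid: {codon: relative frequency}} for codons in CODON_TABLE."""
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group the raw counts by the amino acid each codon encodes.
    by_aa = defaultdict(dict)
    for codon, n in counts.items():
        by_aa[CODON_TABLE[codon]][codon] = n

    # Normalize within each synonymous family so frequencies sum to 1.
    return {
        aa: {codon: n / sum(group.values()) for codon, n in group.items()}
        for aa, group in by_aa.items()
    }


query = synonymous_frequencies("AAAAAGAAAGCTGCC")
```

Comparing such a table for a query gene against the same table for a reference set is the kind of input a reference-based index works from; how COUSIN weights and normalizes that comparison is described in the paper, not here.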

2019 ◽  
Author(s):  
Jérôme Bourret ◽  
Samuel Alizon ◽  
Ignacio G. Bravo

Abstract Codon Usage Preferences (CUPrefs) describe the unequal usage of synonymous codons at the gene, genomic region or genome scale. Numerous indices have been developed to measure the CUPrefs of a sequence. We introduce a normalized index to calculate CUPrefs called COUSIN for COdon Usage Similarity INdex. This index compares the CUPrefs of a query against those of a reference dataset and normalizes the output over a Null Hypothesis of random codon usage. COUSIN results can be easily interpreted, quantitatively and qualitatively. We exemplify the use of COUSIN and highlight its advantages with an analysis on the complete coding sequences of eight divergent genomes, two of them with extreme nucleotide composition. Strikingly, COUSIN captures a hitherto unreported bimodal distribution in CUPrefs in genes in the human and in the chicken genomes. We show that this bimodality can be explained by the global nucleotide composition bias of the chromosome in which the gene resides, and by the precise location within the chromosome. Our results highlight the power of the COUSIN index and uncover unexpected characteristics of the CUPrefs in human and chicken. An eponymous tool written in Python3 to calculate COUSIN is available for online or local use.


Author(s):  
A K M Firoj Mahmud ◽  
Nicolas Delhomme ◽  
Soumyadeep Nandi ◽  
Maria Fällman

Abstract Summary Since its introduction, RNA-Seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, tools for studying gene expression, determining differential gene expression, performing downstream pathway analysis, and normalizing data collected in extreme biological conditions are still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-Seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results. Availability and implementation ProkSeq is implemented in Python and is published under the MIT open source license. The pipeline is available as a Docker container https://hub.docker.com/repository/docker/snandids/prokseq-v2.0, or can be used through Anaconda: https://anaconda.org/snandiDS/prokseq. The code is available on Github: https://github.com/snandiDS/prokseq and detailed user documentation, including a manual and tutorial, can be found at https://prokseqV20.readthedocs.io. Supplementary information Supplementary data are available at Bioinformatics online.


2014 ◽  
Vol 6 (4) ◽  
pp. 417-421 ◽  
Author(s):  
Chakraborty SUPRIYO ◽  
Paul PROSENJIT ◽  
Tarikul Huda MAZUMDER

The base composition at three different codon positions in relation to codon usage bias and gene expressivity was studied in a sample of twenty-five essential genes from Haemophilus influenzae. ENC, CBI and Fop were used to quantify the variation in codon usage bias for the coding sequences (cds). CAI was used to estimate the level of gene expression of the cds selected in the present study. To find out the relationship between the extent of codon bias and nucleotide composition, the values of A, T, G, C and GC were compared with the A3, T3, G3, C3 and GC3 values, respectively. The results showed relatively weak codon usage bias among the cds of Haemophilus influenzae. This, in turn, implies that the essential genes prefer to use a set of restricted codons. However, the base compositional analysis of essential genes in Haemophilus influenzae revealed a preference for AT over GC bases within their coding sequences, and this preference might affect gene expression, as indicated by the relatively high CAI values of the coding sequences.
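The comparison of overall GC with GC3 described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `gc_content` and `gc3_content` and the example sequence are hypothetical, and the sequence is assumed to be an in-frame coding sequence.

```python
def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of G or C bases at third codon positions of an in-frame CDS."""
    # Indices 2, 5, 8, ... are third positions; a trailing partial codon
    # contributes nothing because it has no third base.
    third_positions = cds.upper()[2::3]
    return gc_content(third_positions)


example_cds = "ATGGCCAAA"  # hypothetical three-codon sequence
overall_gc = gc_content(example_cds)
third_position_gc = gc3_content(example_cds)
```

A3, T3, G3 and C3 follow the same pattern, counting a single base instead of the G or C pair at the third positions.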


Genetics ◽  
2001 ◽  
Vol 159 (1) ◽  
pp. 347-358
Author(s):  
Brian R Morton

Abstract A previously employed method that uses the composition of noncoding DNA as the basis of a test for selection between synonymous codons in plastid genes is reevaluated. The test requires the assumption that in the absence of selective differences between synonymous codons the composition of silent sites in coding sequences will match the composition of noncoding sites. It is demonstrated here that this assumption is not necessarily true and, more generally, that using compositional properties to draw inferences about selection on silent changes in coding sequences is much more problematic than commonly assumed. This is so because selection on nonsynonymous changes can influence the composition of synonymous sites (i.e., codon usage) in a complex manner, meaning that the composition biases of different silent sites, including neutral noncoding DNA, are not comparable. These findings also draw into question the commonly utilized method of investigating how selection to increase translation accuracy influences codon usage. The work then focuses on implications for studies that assess codon adaptation, which is selection on codon usage to enhance translation rate, in plastid genes. A new test that does not require the use of noncoding DNA is proposed and applied. The results of this test suggest that far fewer plastid genes display codon adaptation than previously thought.


Author(s):  
A K M Firoj Mahmud ◽  
Soumyadeep Nandi ◽  
Maria Fällman

Abstract Summary Since its introduction, RNA-seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, the current tools for assessing gene expression have been designed around the structures of eukaryotic genes. There are a few stand-alone tools designed for prokaryotes, and they require improvement. A well-defined pipeline for prokaryotes that includes all the necessary tools for quality control, determination of differential gene expression, downstream pathway analysis, and normalization of data collected in extreme biological conditions is still lacking. Here we describe ProkSeq, a user-friendly, fully automated RNA-seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data, and visualizing data and results, and it produces publication-quality figures. Availability and implementation ProkSeq is implemented in Python and is published under the ISC open source license. The tool and a detailed user manual are hosted at Docker: https://hub.docker.com/repository/docker/snandids/prokseq-v2.1, Anaconda: https://anaconda.org/snandiDS/prokseq; Github: https://github.com/snandiDS/prokseq.


10.29007/87r9 ◽  
2020 ◽  
Author(s):  
Zhixiu Lu ◽  
Michael Gilchrist ◽  
Scott Emrich

Codon usage bias has been known to reflect the expression level of a protein-coding gene under the evolutionary theory that selection favors certain synonymous codons. Although measuring the effect of selection in simple organisms such as yeast and E. coli has proven to be effective and accurate, codon-based methods perform less well in plants and humans. In this paper, we extend a prior method that incorporates another evolutionary factor, namely mutation bias and its effect on codon usage. Our results indicate that prediction of gene expression is significantly improved under our framework, and suggest that quantification of mutation bias is essential for fully understanding synonymous codon usage. We also propose an improved method, namely MLE-Φ, with much greater computational efficiency and a wider range of applications. An implementation of this method is provided at https://github.com/luzhixiu1996/MLE-Phi.


Viruses ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 198
Author(s):  
Zhen He ◽  
Zhuozhuo Dong ◽  
Lang Qin ◽  
Haifeng Gan

Broad bean wilt virus 2 (BBWV2), which belongs to the genus Fabavirus of the family Secoviridae, is an important pathogen that causes damage to broad bean, pepper, yam, spinach and other economically important ornamental and horticultural crops worldwide. Previously, only limited reports have shown the genetic variation of BBWV2. Meanwhile, the detailed evolutionary changes, synonymous codon usage bias and host adaptation of this virus are largely unclear. Here, we performed comprehensive analyses of the phylodynamics, reassortment, composition bias and codon usage pattern of BBWV2 using forty-two complete genome sequences of BBWV2 isolates together with two other full-length RNA1 sequences and six full-length RNA2 sequences. Both recombination and reassortment had a significant influence on the genomic evolution of BBWV2. Through phylogenetic analysis we detected three and four lineages based on the ORF1 and ORF2 nonrecombinant sequences, respectively. The evolutionary rates of the two BBWV2 ORF coding sequences were 8.895 × 10⁻⁴ and 4.560 × 10⁻⁴ subs/site/year, respectively. We found a relatively conserved and stable genomic composition with a lower codon usage choice in the two BBWV2 protein coding sequences. ENC-plot and neutrality plot analyses showed that natural selection is the key factor shaping the codon usage pattern of BBWV2. Strong correlations between BBWV2 and broad bean and pepper were observed from similarity index (SiD), codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analyses. Our study is the first to evaluate the phylodynamics, codon usage patterns and adaptive evolution of a fabavirus, and our results may be useful for the understanding of the origin of this virus.
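An ENC-plot compares each gene's observed ENC against the value expected if codon usage were shaped by GC3 composition alone. The abstract does not say which null curve was used; the sketch below assumes the commonly used curve from Wright (1990), ENC = 2 + s + 29 / (s² + (1 − s)²) with s the GC3 fraction. The function name `expected_enc` is hypothetical.

```python
def expected_enc(gc3):
    """Expected ENC under GC3-only constraint (Wright's null curve, assumed here)."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


# Sample the null curve at GC3 = 0.0, 0.1, ..., 1.0 for plotting.
null_curve = [(step / 10, expected_enc(step / 10)) for step in range(11)]
```

Genes whose observed ENC falls well below this curve are usually read as being shaped by more than compositional constraint, which is the kind of argument the abstract summarizes.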


Author(s):  
B. Lencova ◽  
G. Wisselink

Recent progress in computer technology enables the calculation of lens fields and focal properties on commonly available computers such as IBM ATs. If we add to this the use of graphics, we greatly increase the applicability of design programs for electron lenses. Most programs for field computation are based on the finite element method (FEM). They are written in Fortran 77, so that they are easily transferred from PCs to larger machines. The design process has recently been made significantly more user friendly by adding input programs written in Turbo Pascal, which allows a flexible implementation of computer graphics. The input programs have not only menu driven input and modification of numerical data, but also graphics editing of the data. The input programs create files which are subsequently read by the Fortran programs. From the main menu of our magnetic lens design program, further options are chosen by using function keys or numbers. Some options (lens initialization and setting, fine mesh, current densities, etc.) open other menus where computation parameters can be set or numerical data can be entered with the help of a simple line editor. The "draw lens" option enables graphical editing of the mesh - see fig. 1. The geometry of the electron lens is specified in terms of coordinates and indices of a coarse quadrilateral mesh. In this mesh, the fine mesh with smoothly changing step size is calculated by an automeshing procedure. The options shown in fig. 1 allow modification of the number of coarse mesh lines, change of coordinates of mesh points or lines, and specification of lens parts. Interactive and graphical modification of the fine mesh can be called from the fine mesh menu. Finally, the lens computation can be called. Our FEM program allows up to 8000 mesh points on an AT computer.
Another menu allows the display of computed results stored in output files and graphical display of axial flux density, flux density in magnetic parts, and the flux lines in magnetic lenses - see fig. 2. A series of several lens excitations with user specified or default magnetization curves can be calculated and displayed in one session.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alexander Schmitz ◽  
Fuzhong Zhang

Abstract Background Cell-to-cell variation in gene expression strongly affects population behavior and is key to multiple biological processes. While codon usage is known to affect ensemble gene expression, how codon usage influences variation in gene expression between single cells is not well understood. Results Here, we used a Sort-seq based massively parallel strategy to quantify gene expression variation from a green fluorescent protein (GFP) library containing synonymous codons in Escherichia coli. We found that sequences containing codons with higher tRNA Adaptation Index (TAI) scores, and higher codon adaptation index (CAI) scores, have higher GFP variance. This trend is not observed for codons with high Normalized Translation Efficiency Index (nTE) scores nor from the free energy of folding of the mRNA secondary structure. GFP noise, or squared coefficient of variation (CV²), scales with mean protein abundance for low-abundance proteins but does not change at high mean protein abundance. Conclusions Our results suggest that the main source of noise for high-abundance proteins likely does not originate at translation elongation. Additionally, the drastic change in mean protein abundance with small changes in protein noise seen from our library implies that codon optimization can be performed without concern for gene expression noise in biotechnology applications.
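The noise measure named above, the squared coefficient of variation, is the variance divided by the squared mean. A minimal sketch, assuming per-cell fluorescence readings are available as a list of numbers; the function name `noise_cv2` and the example values are hypothetical, and the population variance is used here, whereas the study may have used a different estimator.

```python
from statistics import mean, pvariance


def noise_cv2(values):
    """Squared coefficient of variation: variance / mean**2."""
    average = mean(values)
    if average == 0:
        raise ValueError("CV^2 is undefined when the mean is zero")
    # Population variance is an assumption; swap in statistics.variance
    # for the sample estimator.
    return pvariance(values) / average ** 2


# Hypothetical per-cell GFP readings for one library variant.
gfp_readings = [90, 100, 110]
gfp_noise = noise_cv2(gfp_readings)
```

Because CV² is dimensionless, it lets variants with very different mean expression be compared on the same scale, which is what the abundance-versus-noise relationship in the abstract relies on.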

