corseq: fast and efficient identification of favoured codons from next generation sequencing reads

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5099 ◽  
Author(s):  
Salvatore Camiolo ◽  
Andrea Porceddu

Background Transgene expression can be optimized by designing coding sequences that follow the synonymous codon usage of genes highly expressed in the host organism. Identifying these so-called “favoured codons” generally requires access to either the genome or the coding sequences, together with expression data.

Results Here we describe corseq, a fast and reliable software tool that detects favoured codons directly from RNAseq data without prior knowledge of the genomic sequence or gene annotation. The tool infers the codons preferentially used in highly expressed genes while estimating transcript abundance with a new k-mer-based approach. corseq is implemented in Python and runs under any operating system. The software requires the Biopython 1.65 library (or later versions) and is available under the ‘GNU General Public License version 3’ at the project webpage https://sourceforge.net/projects/corseq/files.

Conclusion corseq is a faster, easy-to-use alternative for detecting favoured codons in non-model organisms.
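The abstract does not describe how corseq's k-mer approach works, so the following Python sketch is only a generic illustration of k-mer-based abundance estimation and is not corseq's algorithm. It assumes candidate transcripts (for example from a de novo assembly) are already available; the function names `estimate_abundance` and `most_abundant`, the use of the median, the default k = 21 and the top-5% cutoff are all hypothetical choices.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all sequencing reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read.upper(), k))
    return counts


def estimate_abundance(transcripts, reads, k=21):
    """Give each transcript the median read count of its own k-mers.

    The median (an assumption of this sketch) keeps a few k-mers shared
    with more abundant transcripts from inflating the estimate.
    """
    counts = count_read_kmers(reads, k)
    abundance = {}
    for name, seq in transcripts.items():
        values = [counts[km] for km in kmers(seq.upper(), k)]
        abundance[name] = median(values) if values else 0
    return abundance


def most_abundant(abundance, fraction=0.05):
    """Names of the top `fraction` of transcripts ranked by abundance."""
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]


# Toy usage: transcript "a" is covered by five reads, "b" by one.
transcripts = {"a": "ACGTACGTAC", "b": "TTTTGGGGCC"}
reads = ["ACGTACGTAC"] * 5 + ["TTTTGGGGCC"]
abundance = estimate_abundance(transcripts, reads, k=4)
print(abundance)                      # {'a': 10, 'b': 1}
print(most_abundant(abundance, 0.5))  # ['a']
```

The transcripts returned by `most_abundant` would form the "highly expressed" set whose synonymous codon usage is then compared with the rest.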

2009 ◽  
Vol 2009 ◽  
pp. 1-11 ◽  
Author(s):  
Sameer Hassan ◽  
Vasantha Mahalingam ◽  
Vanaja Kumar

Synonymous codon usage in the protein-coding genes of thirty-two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. Compositional bias was identified as one of the major factors influencing codon usage. Codons ending in C or G are preferred in highly expressed genes, with C-ending codons strongly preferred over G-ending codons. A strong negative correlation between the effective number of codons (Nc) and GC3s content was also observed, showing that codon usage is affected by gene nucleotide composition. Translational selection, operating at the level of translational accuracy, was also found to play a role in shaping codon usage. A high level of heterogeneity is seen among and between the genomes. Gene length was also found to influence codon usage in 11 of the 32 phage genomes. Mycobacteriophage Cooper was identified as the most highly biased genome, with better translation efficiency that compares well with the host-specific tRNA genes.
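To make the Nc-GC3s relationship concrete, here is a minimal Python sketch that computes GC3s for one coding sequence together with Wright's (1990) expected Nc, Nc = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3s, which is the curve usually drawn on Nc plots. The GC3s definition used here (third-position G+C over codons of amino acids that have synonyms, excluding Met, Trp and stop codons) and the function names are assumptions of the sketch, not the study's exact procedure.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons per amino acid (stop codons are grouped under "*").
DEGENERACY = {}
for aa in CODE.values():
    DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def gc3s(seq):
    """G+C fraction at third positions of synonymously variable codons.

    Met, Trp (single-codon amino acids) and stop codons are skipped, as are
    codons containing characters other than A/C/G/T.
    """
    seq = seq.upper().replace("U", "T")
    gc = total = 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODE.get(codon)
        if aa is None or aa == "*" or DEGENERACY[aa] < 2:
            continue
        total += 1
        gc += codon[2] in "GC"
    return gc / total if total else None


def expected_nc(s):
    """Wright's (1990) expected Nc when codon bias reflects GC3s alone."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


print(gc3s("GCCGCGGCA"))  # 2 of 3 Ala codons end in G/C
print(expected_nc(0.5))   # 60.5
```

Genes falling well below this curve carry more bias than composition alone predicts, which is where explanations such as translational selection come in.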


Genetics ◽  
1991 ◽  
Vol 129 (3) ◽  
pp. 897-907 ◽  
Author(s):  
M Bulmer

Abstract It is argued that the bias in synonymous codon usage observed in unicellular organisms is due to a balance between the forces of selection and mutation in a finite population, with greater bias in highly expressed genes reflecting stronger selection for efficiency of translation. A population genetic model is developed taking into account population size and selective differences between synonymous codons. A biochemical model is then developed to predict the magnitude of selective differences between synonymous codons in unicellular organisms in which growth rate (or possibly growth yield) can be equated with fitness. Selection can arise from differences in either the speed or the accuracy of translation. A model for the effect of translation speed on fitness is considered in detail, and a similar model for accuracy more briefly. Under some circumstances the model successfully predicts a difference in the degree of bias between the beginning of the gene and the rest of the gene, as observed in Escherichia coli, but it grossly overestimates the amount of bias. Possible reasons for this discrepancy are discussed.
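The abstract gives no equations, so purely as an illustration, here is a Python sketch of a widely used two-codon selection-mutation-drift equilibrium, p / (1 - p) = (u / v) * exp(S), where p is the frequency of the preferred codon, u and v are the mutation rates towards and away from it, and S is a scaled selection coefficient. Whether S equals 2Ns or 4Ns depends on ploidy and on how population size is defined, so the sketch treats S as a single parameter; it is not presented as Bulmer's exact model, and the function name is hypothetical.

```python
import math


def preferred_codon_frequency(S, u, v):
    """Equilibrium frequency of the preferred codon in a two-codon model.

    Uses the weak-selection result for a finite population,
        p / (1 - p) = (u / v) * exp(S),
    rewritten in logistic form to avoid overflow for large positive S.
      u: mutation rate from the unpreferred to the preferred codon
      v: mutation rate from the preferred to the unpreferred codon
      S: scaled selection coefficient (proportional to N * s)
    """
    return 1.0 / (1.0 + (v / u) * math.exp(-S))


# Stronger selection (larger S, as expected for highly expressed genes)
# pushes codon usage further from the mutational equilibrium u / (u + v).
for S in (0.0, 1.0, 2.0, 4.0):
    print(S, round(preferred_codon_frequency(S, u=1.0, v=2.0), 3))
```

With S = 0 the frequency reduces to the purely mutational value u / (u + v); as S grows it approaches 1.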


2020 ◽  
Author(s):  
Mark G. Sterken ◽  
Ruud H.P. Wilbers ◽  
Pjotr Prins ◽  
Basten L. Snoek ◽  
George M. Giambasu ◽  
...  

Abstract The redundancy of the genetic code allows for a regulatory layer that optimizes protein synthesis by modulating the translation and degradation of mRNAs. Patterns of synonymous codon usage in highly expressed genes have been studied in many species, but rarely in conjunction with mRNA secondary structure. Here, we analyzed over 2,000 expression profiles covering a range of strains, treatments, and developmental stages of five model species (Escherichia coli, Arabidopsis thaliana, Saccharomyces cerevisiae, Caenorhabditis elegans, and Mus musculus). By comparative analyses of genes constitutively expressed at high and low levels, we revealed a conserved shift in codon usage and in predicted mRNA secondary structures. Highly abundant transcripts and proteins, as well as high protein-per-transcript ratios, were consistently associated with less variable and shorter stretches of weak mRNA secondary structure (loops). Genome-wide recoding showed that the codons with the highest relative increase in highly expressed genes, often C-ending and not necessarily the most frequent, enhanced the formation of uniform loop sizes. Our results point to a general selective force contributing to the optimal expression of abundant proteins: less variable secondary structures promote regular ribosome trafficking with fewer detrimental collisions, thereby increasing mRNA stability and translation efficiency.
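As a rough illustration of comparing codon usage between genes expressed at high and low levels, the Python sketch below ranks codons by the change in relative synonymous codon usage (RSCU). The abstract does not say which statistic defined the "relative increase", so the RSCU difference is a stand-in chosen for this sketch. The function names are hypothetical, and mRNA secondary structure is not modelled.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codon_counts(seqs):
    """Count in-frame codons across a list of coding sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            if seq[i:i + 3] in CODE:
                counts[seq[i:i + 3]] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / count expected under equal synonymous usage."""
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) < 2:
            continue  # skip stops and single-codon amino acids (Met, Trp)
        total = sum(counts[c] for c in codons)
        if total:
            for c in codons:
                values[c] = counts[c] * len(codons) / total
    return values


def rscu_shift(high_seqs, low_seqs):
    """Codons ranked by RSCU(highly expressed) - RSCU(lowly expressed)."""
    high = rscu(codon_counts(high_seqs))
    low = rscu(codon_counts(low_seqs))
    shifts = [(c, high[c] - low[c]) for c in high.keys() & low.keys()]
    return sorted(shifts, key=lambda item: item[1], reverse=True)


print(rscu_shift(["GCCGCCGCCGCC"], ["GCAGCTGCCGCG"])[0])  # ('GCC', 3.0)
```

Ranking by a difference in RSCU, rather than by raw frequency, matches the abstract's point that the codons enriched in highly expressed genes are not necessarily the most frequent ones.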


Viruses ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 462 ◽  
Author(s):  
Spyros Lytras ◽  
Joseph Hughes

Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called synonymous dinucleotide usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method to two sets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. The SDU shows that the vertebrate viruses consistently exhibit greater under-representation of CpG dinucleotides at all three codon positions in both datasets. Compared with existing metrics for dinucleotide quantification, the SDU allows a statistical interpretation of its values by comparing them to a null expectation based on the codon table. Here we apply the method to viruses, but coding sequences of other living organisms can be analysed in the same way.
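The abstract states the SDU null hypothesis (equal synonymous codon usage) but not its formula, so the following Python sketch computes only a simplified observed/expected ratio in that spirit. It is not DinuQ's implementation, and `dinucleotide_ratios` and its frame labels are hypothetical. For each of the three dinucleotide frames (codon positions 1-2, positions 2-3, and the bridge from position 3 of one codon to position 1 of the next), the expected count is obtained by averaging over the synonymous codons of the encoded amino acids.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def synonymous_fraction(aa, test):
    """Fraction of the codons for `aa` that satisfy `test`."""
    codons = SYNONYMS[aa]
    return sum(1 for c in codons if test(c)) / len(codons)


def dinucleotide_ratios(seq, dinuc="CG"):
    """Observed / expected count of `dinuc` in three codon frames.

    The expectation assumes every synonymous codon of the encoded amino
    acid is equally likely. A frame whose expected count is zero is
    reported as None. Assumes a clean in-frame A/C/G/T coding sequence.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    aas = [CODE[c] for c in codons]
    first, second = dinuc
    obs = {"pos1_2": 0, "pos2_3": 0, "bridge": 0}
    exp = {"pos1_2": 0.0, "pos2_3": 0.0, "bridge": 0.0}
    for codon, aa in zip(codons, aas):
        obs["pos1_2"] += codon[0:2] == dinuc
        obs["pos2_3"] += codon[1:3] == dinuc
        exp["pos1_2"] += synonymous_fraction(aa, lambda c: c[0:2] == dinuc)
        exp["pos2_3"] += synonymous_fraction(aa, lambda c: c[1:3] == dinuc)
    for i in range(len(codons) - 1):
        # Bridge frame: last base of codon i followed by first base of codon i+1.
        obs["bridge"] += codons[i][2] == first and codons[i + 1][0] == second
        exp["bridge"] += (synonymous_fraction(aas[i], lambda c: c[2] == first)
                          * synonymous_fraction(aas[i + 1], lambda c: c[0] == second))
    return {k: obs[k] / exp[k] if exp[k] > 0 else None for k in obs}


print(dinucleotide_ratios("GCGGCC"))  # {'pos1_2': None, 'pos2_3': 2.0, 'bridge': 0.0}
```

Values below 1 indicate under-representation relative to the equal-usage null, and None marks a frame where the encoded protein makes the dinucleotide impossible under any synonymous choice.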


Viruses ◽  
2019 ◽  
Vol 11 (8) ◽  
pp. 752 ◽  
Author(s):  
Zhen He ◽  
Haifeng Gan ◽  
Xinyan Liang

Potato virus M (PVM) is a member of the genus Carlavirus of the family Betaflexiviridae and causes large economic losses in nightshade crops. Several previous studies have elucidated the population structure, evolutionary timescale and adaptive evolution of PVM. However, the synonymous codon usage pattern of PVM remains unclear. In this study, we performed comprehensive analyses of the codon usage and composition of PVM based on 152 nucleotide sequences of the coat protein (CP) gene and 125 sequences of the cysteine-rich nucleic acid binding protein (NABP) gene. We observed that the PVM CP and NABP coding sequences were GC- and AU-rich, respectively, whereas U- and G-ending codons, respectively, were preferred in the PVM CP and NABP coding sequences. The low codon usage bias of the PVM CP and NABP coding sequences indicated a relatively stable and conserved genomic composition. Natural selection and mutation pressure shaped the codon usage patterns of PVM, with natural selection being the most important factor. Codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analyses revealed that PVM was best adapted to pepino, followed by tomato and potato. Moreover, similarity index (SiD) analysis showed that pepino had a greater impact on PVM than tomato and potato. Our study is the first attempt to evaluate the codon usage pattern of the PVM CP and NABP genes to better understand the evolutionary changes of a carlavirus.
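For reference, the codon adaptation index mentioned above is the geometric mean of each codon's relative adaptiveness w, computed from a reference set of host genes. A minimal Python sketch follows. The reference sequences, the 0.5 pseudocount for codons absent from the reference (a common convention) and the function names are assumptions. It does not reproduce the study's host reference sets or its RCDI and SiD analyses.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codon_counts(seqs):
    """Count in-frame codons across a list of coding sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            if seq[i:i + 3] in CODE:
                counts[seq[i:i + 3]] += 1
    return counts


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = count / count of the most used synonym, from a reference set.

    Codons absent from the reference receive `pseudocount` so that log(w)
    stays finite. Met, Trp and stop codons are skipped.
    """
    counts = codon_counts(reference_seqs)
    w = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) < 2:
            continue
        adjusted = {c: counts[c] or pseudocount for c in codons}
        best = max(adjusted.values())
        for c in codons:
            w[c] = adjusted[c] / best
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq that have a w value."""
    seq = seq.upper()
    logs = [math.log(w[seq[i:i + 3]])
            for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in w]
    return math.exp(sum(logs) / len(logs)) if logs else None


w = relative_adaptiveness(["GCCGCCGCA"])  # hypothetical host reference set
print(round(cai("GCAGCC", w), 4))         # 0.7071
```

Amino acids absent from the reference set get w = 1 for every codon, so they do not pull the index down.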


2004 ◽  
Vol 12 (01) ◽  
pp. 91-103
Author(s):  
Fei Ma ◽
Yonglong Zhuang ◽
Liming Chen ◽
Luping Lin ◽
Yanda Li ◽
...  

It is becoming clear that alternative splicing plays an important role in expanding protein diversity. However, previous studies of codon usage did not distinguish alternatively spliced genes from non-alternatively spliced genes. Do codon usage patterns differ between them? To address this question, we systematically compared synonymous codon usage patterns between alternatively and non-alternatively spliced genes by analyzing large datasets from the human genome. The results indicated:
(1) There are highly significant differences in average Nc values between non-alternatively spliced genes and both the longer and the shorter isoform genes, and the level of codon usage bias in non-alternatively spliced genes is to some extent higher than in alternatively spliced genes.
(2) Very extensive heterogeneity of G+C content at silent third codon positions (GC3s) was evident among these genes, and there are highly significant differences in average GC3s values between non-alternatively spliced genes and both the longer and the shorter isoform genes.
(3) Nc-plots and correspondence analysis reveal that codon usage bias is mainly dominated by mutation bias, and no correlation between gene expression level and synonymous codon usage bias is found in human genes.
(4) Overall codon usage analysis indicated that the usage of C-ending codons differs highly significantly between the longer isoform genes and both the non-alternatively spliced genes and the shorter isoform genes, whereas there is no significant difference in C-ending codon usage between the shorter isoform genes and non-alternatively spliced genes.
Finally, our results seem to imply that alternatively spliced genes may originate from non-alternatively spliced genes, may be created by DNA mutation or gene fusion, and may be retained through natural selection and adaptive evolution.
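Since these comparisons rest on average Nc values, here is a minimal Python sketch of Wright's (1990) effective number of codons for the standard genetic code. Published tools differ in details such as how sparse amino acids are handled. The choices here (skip amino acids seen fewer than twice, estimate F3 from F2 and F4 when Ile is absent, cap the result at 61, including when a class average F is 0) and the function names are assumptions of the sketch.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codon_counts(seq):
    """Count in-frame triplets of a single coding sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def homozygosity(counts, codons):
    """Wright's F for one amino acid: (n * sum(p_i^2) - 1) / (n - 1)."""
    n = sum(counts[c] for c in codons)
    if n < 2:
        return None  # too few observations to estimate F
    return (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)


def effective_number_of_codons(seq):
    """Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 for the standard genetic code."""
    counts = codon_counts(seq)
    by_class = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) < 2:
            continue
        f = homozygosity(counts, codons)
        if f is not None:
            by_class.setdefault(len(codons), []).append(f)
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile missing: average F2, F4
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        return None  # some degeneracy class has no usable amino acids
    if min(mean_f[k] for k in (2, 3, 4, 6)) <= 0:
        return 61.0  # assumption of this sketch: F = 0 means no bias
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)  # Nc cannot exceed the 61 sense codons


# Extreme bias: every amino acid always uses one codon, giving Nc = 20.
biased = "".join(codons[0] * 2 for aa, codons in SYNONYMS.items()
                 if aa != "*" and len(codons) > 1)
print(effective_number_of_codons(biased))  # 20.0
```

Nc ranges from 20 (one codon per amino acid) to 61 (all synonymous codons used equally), so a higher average Nc corresponds to weaker codon usage bias.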


Genetics ◽  
2001 ◽  
Vol 159 (1) ◽  
pp. 347-358
Author(s):  
Brian R Morton

Abstract A previously employed method that uses the composition of noncoding DNA as the basis of a test for selection between synonymous codons in plastid genes is reevaluated. The test requires the assumption that, in the absence of selective differences between synonymous codons, the composition of silent sites in coding sequences will match the composition of noncoding sites. It is demonstrated here that this assumption is not necessarily true and, more generally, that using compositional properties to draw inferences about selection on silent changes in coding sequences is much more problematic than commonly assumed. This is because selection on nonsynonymous changes can influence the composition of synonymous sites (i.e., codon usage) in a complex manner, meaning that the compositional biases of different silent sites, including neutral noncoding DNA, are not comparable. These findings also call into question the commonly used method of investigating how selection to increase translation accuracy influences codon usage. The work then focuses on the implications for studies that assess codon adaptation, that is, selection on codon usage to enhance translation rate, in plastid genes. A new test that does not require the use of noncoding DNA is proposed and applied. The results of this test suggest that far fewer plastid genes display codon adaptation than previously thought.
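The kind of comparison the reevaluated test relies on can be sketched in Python. Base counts at silent sites (here, fourfold-degenerate third codon positions) are tested against the composition of noncoding DNA with a Pearson chi-square statistic. This is an illustrative assumption about how such a test might be set up, not the procedure of the earlier studies or the new test proposed in the paper, and the function names are hypothetical. As the abstract argues, a significant deviation need not imply selection between synonymous codons, because selection on nonsynonymous changes can also shift silent-site composition.

```python
from collections import Counter

# First two codon bases that make the third position fourfold degenerate
# in the standard code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
CHI2_CRITICAL_DF3_ALPHA05 = 7.815  # chi-square critical value, 3 df, alpha 0.05


def fourfold_site_counts(cds_list):
    """Base counts at fourfold-degenerate third codon positions."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            if cds[i:i + 2] in FOURFOLD_PREFIXES:
                counts[cds[i + 2]] += 1
    return counts


def chi_square_vs_noncoding(site_counts, noncoding_seqs):
    """Pearson chi-square of silent-site counts against noncoding composition.

    Expected counts assume silent sites mirror noncoding base frequencies;
    the noncoding DNA must contain all four bases (3 degrees of freedom).
    """
    background = Counter(b for s in noncoding_seqs for b in s.upper() if b in "ACGT")
    bg_total = sum(background.values())
    n = sum(site_counts[b] for b in "ACGT")
    stat = 0.0
    for base in "ACGT":
        expected = n * background[base] / bg_total
        stat += (site_counts[base] - expected) ** 2 / expected
    return stat


sites = fourfold_site_counts(["GCCGCCGCCGCC"])
stat = chi_square_vs_noncoding(sites, ["ACGT"])
print(stat, stat > CHI2_CRITICAL_DF3_ALPHA05)  # 12.0 True
```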

