The shiftability of protein coding genes: the genetic code was optimized for frameshift tolerating

Author(s):  
Xiaolong Wang ◽  
Xuxiang Wang ◽  
Gang Chen ◽  
Jianye Zhang ◽  
Yongqiang Liu ◽  
...  

The genetic code defines the relationship between a protein and its coding DNA sequence. It was presumed that most frameshifts would yield non-functional, truncated or cytotoxic products. In this study, we report that in E. coli, a frameshifted β-lactamase (bla) gene is still functional if all of its internal stop codons are read through or replaced by sense codons. By analyzing a large dataset including all available protein-coding genes in major model organisms, we demonstrate that in any species, and for any protein-coding gene, the three translational products from the three reading frames are always similar to each other, with constant ~50% similarity and ~100% coverage, and that this similarity is predefined by the genetic code rather than by the sequences themselves. Because a coding gene can likely be translated into three isoforms, one from each of the three reading frames, we propose a new gene expression paradigm, “one transcript, three translations”, which amends the traditional “one gene, one/multiple peptides” hypotheses. Finally, we conclude that the genetic code was optimized for frameshift tolerance in early evolution, which endows every protein-coding gene with shiftability, an inherent and everlasting ability to tolerate frameshift mutations that serves as an innate mechanism for cells to deal with the frameshift problem.
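
The idea of reading one coding sequence in three frames can be illustrated with a minimal Python sketch. It assumes the standard genetic code and DNA input; the function names `translate` and `three_frame_translations` are hypothetical, and the sketch does not reproduce the authors' similarity or coverage analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Stop codons are written as '*'; codons with non-ACGT letters as 'X'.
    Trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translations(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, peptide in enumerate(three_frame_translations("ATGGCCTAA")):
        print(frame, peptide)
```

Comparing the three outputs for similarity would need an alignment step with a substitution matrix, which is left out here.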


2020 ◽  
Vol 21 (11) ◽  
pp. 1068-1077
Author(s):  
Xiaochao Sun ◽  
Bin Yang ◽  
Qunye Zhang

Many studies have shown that the spatial distribution of genes within a single chromosome exhibits distinct patterns. However, little is known about the characteristics of inter-chromosomal distribution of genes (including protein-coding genes, processed transcripts and pseudogenes) in different genomes. In this study, we explored these issues using the available genomic data of both human and model organisms. Moreover, we also analyzed the distribution pattern of protein-coding genes that have been associated with 14 common diseases and the insertion/deletion mutations and single nucleotide polymorphisms detected by whole genome sequencing in an acute promyelocytic leukemia patient. We obtained the following novel findings. Firstly, inter-chromosomal distribution of genes displays a nonstochastic pattern and the gene densities in different chromosomes are heterogeneous. This kind of heterogeneity is observed in genomes of both lower and higher species. Secondly, protein-coding genes involved in certain biological processes tend to be enriched in one or a few chromosomes. Our findings add new insights into the spatial distribution of genes and disease-related genes across chromosomes. These results could be useful in improving the efficiency of disease-associated gene screening studies by targeting specific chromosomes.


2021 ◽  
Vol 22 (4) ◽  
pp. 1876
Author(s):  
Frida Belinky ◽  
Ishan Ganguly ◽  
Eugenia Poliakov ◽  
Vyacheslav Yurchenko ◽  
Igor B. Rogozin

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1 (this is a signature of purifying selection in protein-coding regions). We also found that double substitutions in codons, where an intermediate step is a nonsense substitution, show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: such codons are likely to cause a substantial decrease of protein expression levels.
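
Locating in-frame stop codons inside a coding sequence can be sketched in a few lines of Python. This is an illustration only, assuming the three standard stop codons (TAA, TAG, TGA) and a sequence that starts in frame; the function name `internal_stop_positions` is hypothetical and is not taken from the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds):
    """Return 0-based start positions of in-frame stop codons in `cds`.

    The last complete codon is skipped, because a stop codon there is the
    normal terminator rather than a premature (internal) stop.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    last_codon_start = len(cds) - len(cds) % 3 - 3
    return [
        i
        for i in range(0, last_codon_start, 3)
        if cds[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    # ATG TAA GCC TGA: one internal stop at position 3, terminator at 9.
    print(internal_stop_positions("ATGTAAGCCTGA"))
```

Estimating dN/dS for the genes flagged this way requires aligned orthologs and a substitution model, which this sketch does not attempt.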


2015 ◽  
Vol 1 ◽  
pp. e33 ◽  
Author(s):  
Elisha D. Roberson

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3′GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3′GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3′GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3′GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3′GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3′GG editing sites in any species with an available genome sequence.
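
A motif scan of this kind can be sketched with a regular expression. The sketch below is not the ngg2 implementation: it assumes that a 3′GG site is a 20-nt protospacer ending in GG, immediately followed by an NGG PAM, and it scans only the forward strand of an in-memory string. The names `SITE` and `find_3gg_sites` are hypothetical.

```python
import re

# Assumed site layout: 18 arbitrary bases + GG (the protospacer), then NGG (PAM).
# The lookahead makes matches zero-width, so overlapping sites are all reported.
SITE = re.compile(r"(?=([ACGT]{18}GG)[ACGT]GG)")


def find_3gg_sites(seq):
    """Return (start, protospacer) pairs for forward-strand 3'GG sites."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in SITE.finditer(seq)]


if __name__ == "__main__":
    example = "CC" + "A" * 18 + "GG" + "TGG"
    for start, protospacer in find_3gg_sites(example):
        print(start, protospacer)
```

A fuller scan would also search the reverse complement and check each protospacer for uniqueness in the genome, as the abstract's "single match" and "unique" counts imply.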


mBio ◽  
2016 ◽  
Vol 7 (6) ◽  
Author(s):  
Adi Oron-Gottesman ◽  
Martina Sauert ◽  
Isabella Moll ◽  
Hanna Engelberg-Kulka

ABSTRACT Escherichia coli mazEF is an extensively studied stress-induced toxin-antitoxin (TA) system. The toxin MazF is an endoribonuclease that cleaves RNAs at ACA sites. Thereby, under stress, the induced MazF generates a stress-induced translation machinery (STM), composed of MazF-processed mRNAs and selective ribosomes that specifically translate the processed mRNAs. Here, we further characterized the STM system, finding that MazF cleaves only ACA sites located in the open reading frames of processed mRNAs, while out-of-frame ACAs are resistant. This in-frame ACA cleavage of MazF seems to depend on MazF binding to an extracellular-death-factor (EDF)-like element in ribosomal protein bS1 (bacterial S1), apparently causing MazF to be part of STM ribosomes. Furthermore, due to the in-frame MazF cleavage of ACAs under stress, a bias occurs in the reading of the genetic code, causing the amino acid threonine to be encoded only by its synonymous codons ACC, ACU, or ACG, instead of by ACA. IMPORTANCE The genetic code is a universal characteristic of all living organisms. It defines the set of rules by which nucleotide triplets specify which amino acid will be incorporated into a protein. Our results represent the first report of a stress-induced bias in the reading of the genetic code. We found that in E. coli, under stress, the amino acid threonine is encoded only by its synonymous codons ACC, ACU, or ACG, instead of by ACA. This is because under stress, MazF generates a stress-induced translation machinery (STM) in which MazF cleaves in-frame ACA sites of the processed mRNAs.
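
The distinction between in-frame and out-of-frame ACA sites can be made concrete with a small Python sketch. It assumes the input string begins at the first base of the start codon, so that positions divisible by three are codon starts; the function name `aca_sites_by_frame` is hypothetical, and the sketch only classifies positions and says nothing about actual cleavage.

```python
def aca_sites_by_frame(orf):
    """Split ACA occurrences in an ORF into in-frame and out-of-frame sites.

    Returns two lists of 0-based positions. A site is in frame when ACA
    occupies a whole codon, i.e. its start position is a multiple of 3.
    """
    orf = orf.upper().replace("T", "U")  # accept DNA or RNA spelling
    in_frame, out_of_frame = [], []
    start = orf.find("ACA")
    while start != -1:
        (in_frame if start % 3 == 0 else out_of_frame).append(start)
        # Advance by one base so overlapping sites (ACACA) are both found.
        start = orf.find("ACA", start + 1)
    return in_frame, out_of_frame


if __name__ == "__main__":
    # AUG ACA GAC AUG UAA: ACA codon at 3, out-of-frame ACA at 7.
    print(aca_sites_by_frame("AUGACAGACAUGUAA"))
```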


2019 ◽  
Vol 8 (40) ◽  
Author(s):  
James E. Corban ◽  
Jacob Gramer ◽  
Russell Moreland ◽  
Mei Liu ◽  
Jolene Ramsey

Escherichia coli is a Gram-negative bacterium often found in animal intestinal tracts. Here, we present the genome of the Guernseyvirinae-like E. coli 4s siphophage Snoke. The 44.4-kb genome contains 81 protein-coding genes, for which 33 functions were predicted. The capsid morphogenesis gene in Snoke contains a large intein.


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Janaina de Freitas Nascimento ◽  
Steven Kelly ◽  
Jack Sunter ◽  
Mark Carrington

Selective transcription of individual protein coding genes does not occur in trypanosomes and the cellular copy number of each mRNA must be determined post-transcriptionally. Here, we provide evidence that codon choice directs the levels of constitutively expressed mRNAs. First, a novel codon usage metric, the gene expression codon adaptation index (geCAI), was developed that maximised the relationship between codon choice and the measured abundance for a transcriptome. Second, geCAI predictions of mRNA levels were tested using differently coded GFP transgenes and were successful over a 25-fold range, similar to the variation in endogenous mRNAs. Third, translation was necessary for the accelerated mRNA turnover resulting from codon choice. Thus, in trypanosomes, the information determining the levels of most mRNAs resides in the open reading frame and translation is required to access this information.


2000 ◽  
Vol 23 (4) ◽  
pp. 745-752 ◽  
Author(s):  
Sérgio Luiz Pereira

With the advent of DNA sequencing techniques, the organization of the vertebrate mitochondrial genome has been shown to vary between higher taxonomic levels. The most conserved gene order is found in placental mammals, turtles, fishes, some lizards and Xenopus. Birds, other species of lizards, crocodilians, marsupial mammals, snakes, tuatara, lamprey, some other amphibians and one species of fish have gene orders that are less conserved. The most probable mechanism for new gene rearrangements seems to be tandem duplication and multiple deletion events, always associated with tRNA sequences. Some new rearrangements seem to be typical of monophyletic groups, and data from these groups may be useful for answering phylogenetic questions involving vertebrate higher taxonomic levels. Other features, such as the secondary structure of tRNA and the start and stop codons of protein-coding genes, may also be useful in comparisons of vertebrate mitochondrial genomes.


2021 ◽  
Vol 9 (1) ◽  
pp. 129
Author(s):  
Katelyn McNair ◽  
Carol L. Ecale Zhou ◽  
Brian Souza ◽  
Stephanie Malfatti ◽  
Robert A. Edwards

One of the main steps in gene-finding in prokaryotes is determining which open reading frames encode a protein, and which occur by chance alone. There are many different methods to differentiate the two; the most prevalent approach is using shared homology with a database of known genes. This method presents many pitfalls, most notably the catch that you only find genes that you have seen before. The four most popular prokaryotic gene-prediction programs (GeneMark, Glimmer, Prodigal, PHANOTATE) all use a protein-coding training model to predict protein-coding genes, with the latter three allowing for the training model to be created ab initio from the input genome. Different methods are available for creating the training model, and to increase the accuracy of such tools, we present here GOODORFS, a method for identifying protein-coding genes within a set of all possible open reading frames (ORFs). Our workflow begins with taking the amino acid frequencies of each ORF, calculating an entropy density profile (EDP), using KMeans to cluster the EDPs, and then selecting the cluster with the lowest variation as the coding ORFs. To test the efficacy of our method, we ran GOODORFS on 14,179 annotated phage genomes, and compared our results to the initial training-set creation step of four other similar methods (Glimmer, MED2, PHANOTATE, Prodigal). We found that GOODORFS was the most accurate (0.94) and had the best F1-score (0.85), while Glimmer had the highest precision (0.92) and PHANOTATE had the highest recall (0.96).
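
The first two workflow steps (amino acid frequencies, then an entropy density profile) can be sketched as follows. The abstract does not define the EDP, so this sketch assumes the form s_i = -(p_i log p_i) / H, where p_i is the frequency of amino acid i and H is the Shannon entropy of the frequencies; it may differ from what GOODORFS computes. The function name `entropy_density_profile` is hypothetical, and the KMeans clustering step is omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def entropy_density_profile(protein):
    """Return a 20-element EDP for a protein string (assumed definition).

    Each element is the share of the sequence's Shannon entropy contributed
    by one amino acid, so the profile sums to 1.
    """
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    freqs = [counts[aa] / total for aa in AMINO_ACIDS]
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:
        raise ValueError("EDP is undefined for a single residue type")
    return [term / entropy for term in terms]


if __name__ == "__main__":
    profile = entropy_density_profile("MKTAYIAKQR")
    print(len(profile), round(sum(profile), 6))
```

The resulting 20-dimensional vectors, one per ORF, are the kind of input a clustering step such as KMeans would take.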

