A novel pH-regulated, unusual 603 bp overlapping protein coding gene pop is encoded antisense to ompA in Escherichia coli O157:H7 (EHEC)

10.1101/852251 ◽

2019 ◽

Author(s):

Barbara Zehentner ◽

Zachary Ardern ◽

Michaela Kreitmeier ◽

Siegfried Scherer ◽

Klaus Neuhaus

Keyword(s):

Antisense Rna ◽

Prokaryotic Genome ◽

Antisense Transcription ◽

Ribosome Profiling ◽

Escherichia Coli O157 ◽

Overlapping Genes ◽

Western Blots ◽

Protein Coding ◽

Reading Frame ◽

Growth Experiments

Antisense transcription is well known in bacteria. However, translation of antisense RNAs is typically not considered, as the implied overlapping coding at a DNA locus is assumed to be highly improbable. Therefore, such overlapping genes are systematically excluded in prokaryotic genome annotation. Here we report an exceptional 603 bp long open reading frame completely embedded in antisense to the gene of the outer membrane protein ompA. Ribosomal profiling revealed translation of the mRNA, and the protein was detected in Western blots. A σ70 promoter, transcription start site, Shine-Dalgarno motif and rho-independent terminator were experimentally validated. A pH-dependent phenotype conferred by the protein was shown in competitive overexpression growth experiments of a translationally arrested mutant versus wild type. We designate this novel gene pop (pH-regulated overlapping protein-coding gene). Increasing evidence based on ribosome profiling indicates translation of antisense RNA, suggesting that more overlapping genes of unknown function may exist in bacteria.
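As an illustration of what "embedded in antisense" means computationally, the following minimal sketch scans the reverse-complement strand of a sequence for open reading frames. It is not the authors' method: the function names, the ATG-only start set, and the standard stop codons are assumptions made for the example, and the reported coordinates refer to the reverse-complement strand.

```python
# Hypothetical helper names; start/stop codon sets are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def antisense_orfs(seq, min_length=0, starts=("ATG",),
                   stops=("TAA", "TAG", "TGA")):
    """List ORFs on the strand antisense to `seq`.

    Each hit is (start, end, length) in reverse-complement coordinates,
    with the stop codon included in the length.
    """
    rc = reverse_complement(seq)
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rc) - 2, 3):
            codon = rc[i:i + 3]
            if start is None and codon in starts:
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in stops:
                length = i + 3 - start
                if length >= min_length:
                    found.append((start, i + 3, length))
                start = None                   # look for the next ORF in frame
    return found

# Toy sense strand whose reverse complement reads ATG AAA TAA.
hits = antisense_orfs("TTATTTCAT")
```

For a real locus one would set `min_length` to the size of interest (for example 603 for the ORF reported here) and map the coordinates back to the sense strand.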

A de novo evolved gene in the house mouse regulates female pregnancy cycles

eLife ◽

10.7554/elife.44392 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 4

Author(s):

Chen Xie ◽

Cemalettin Bekpen ◽

Sven Künzel ◽

Maryam Keshavarz ◽

Rebecca Krebs-Wheaton ◽

...

Keyword(s):

House Mouse ◽

De Novo ◽

Specific Protein ◽

Ribosome Profiling ◽

Mass Spectrometry Data ◽

Preimplantation Embryos ◽

Protein Coding ◽

Reading Frame ◽

Protein Coding Genes ◽

New Genes

The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter times and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.

uORF-Tools – Workflow for the determination of translation-regulatory upstream open reading frames

10.1101/415018 ◽

2018 ◽

Cited By ~ 1

Author(s):

Anica Scholz ◽

Florian Eggenhofer ◽

Rick Gelhausen ◽

Björn Grüning ◽

Kathi Zarnack ◽

...

Keyword(s):

Ribosome Profiling ◽

Open Reading Frames ◽

Annotation File ◽

Inhibitory Effects ◽

Protein Coding ◽

Reading Frame ◽

Upstream Open Reading Frames ◽

Induced Changes ◽

Reading Frames

Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome-protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS, also known as the 5’ untranslated region) preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools identifies uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs that have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed, and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing time and resources.
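The ratio described in this abstract can be illustrated with a small calculation. The sketch below is not taken from uORF-Tools; the function name, the pseudocount, and the read counts are assumptions chosen for the example. It computes the log2 change in the main-ORF to uORF RPF ratio between a control and a stimulated condition.

```python
import math

def orf_ratio_log2_change(main_ctrl, uorf_ctrl, main_stim, uorf_stim,
                          pseudocount=1.0):
    """log2 change of the main-ORF/uORF RPF ratio (stimulus vs. control).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for ORFs without reads.
    """
    ratio_ctrl = (main_ctrl + pseudocount) / (uorf_ctrl + pseudocount)
    ratio_stim = (main_stim + pseudocount) / (uorf_stim + pseudocount)
    return math.log2(ratio_stim / ratio_ctrl)

# Hypothetical RPF counts for one transcript.
change = orf_ratio_log2_change(main_ctrl=99, uorf_ctrl=49,
                               main_stim=399, uorf_stim=49)
```

A positive value means that, under the stimulus, relatively more ribosomes occupy the main ORF than the uORF, which is the pattern one would look for when searching for uORFs whose inhibitory effect is relieved.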

Does everything now make (anti)sense?

Biochemical Society Transactions ◽

10.1042/bst0341148 ◽

2006 ◽

Vol 34 (6) ◽

pp. 1148-1150 ◽

Cited By ~ 14

Author(s):

J.A. Timmons ◽

L. Good

Keyword(s):

Gene Expression ◽

Antisense Rna ◽

Functional Annotation ◽

Rna World ◽

Antisense Transcription ◽

Functional Importance ◽

Protein Coding ◽

Species Diversification ◽

Mammalian Genomes ◽

Powerful Mechanism

The data generated by the FANTOM (Functional Annotation of Mouse) consortium, Compugen and Affymetrix have collectively provided evidence that most of the mammalian genomes are actively transcribed. The emergence of an antisense RNA world brings new practical complexities to the study and detection of gene expression. However, we also need to address the fundamental questions regarding the functional importance of these molecules. In this brief paper, we focus on non-coding natural antisense transcription, as it appears to be a potentially powerful mechanism for extending the complexity of the protein coding genome, which is currently unable to explain inter-species diversification.

A spectral analysis approach to detect actively translated open reading frames in high-resolution ribosome profiling data

10.1101/031625 ◽

2015 ◽

Author(s):

Lorenzo Calviello ◽

Neelanjan Mukherjee ◽

Emanuel Wyler ◽

Henrik Zauber ◽

Antje Hirsekorn ◽

...

Keyword(s):

Spectral Analysis ◽

Gene Expression Regulation ◽

De Novo ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Mass Spectrometry Data ◽

Hek293 Cells ◽

Protein Coding ◽

Reading Frame ◽

Reading Frames

RNA sequencing protocols allow for quantifying gene expression regulation at each individual step, from transcription to protein synthesis. Ribosome Profiling (Ribo-seq) maps the positions of translating ribosomes over the entire transcriptome. Despite its great potential, a rigorous statistical approach to identify translated regions by means of the characteristic three-nucleotide periodicity of Ribo-seq data is not yet available. To fill this gap, we developed RiboTaper, which quantifies the significance of periodic Ribo-seq reads via spectral analysis methods. We applied RiboTaper on newly generated, deep Ribo-seq data in HEK293 cells, to derive an extensive map of translation that covers Open Reading Frame (ORF) annotations for more than 11,000 protein-coding genes. We also find distinct ribosomal signatures for several hundred detected upstream ORFs and ORFs in annotated non-coding genes (ncORFs). Mass spectrometry data confirms that RiboTaper achieves excellent coverage of the cellular proteome and validates dozens of novel peptide products. Collectively, RiboTaper (available at https://ohlerlab.mdc-berlin.de/software/) is a powerful method for comprehensive de novo identification of actively used ORFs in the human genome.
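The three-nucleotide periodicity mentioned here can be made concrete with a simple spectral calculation. The sketch below is a simplified stand-in, not RiboTaper itself: it uses a plain FFT periodogram via NumPy rather than a tapered spectral estimate with a significance test, and the function name and toy count profile are assumptions for illustration.

```python
import numpy as np

def period3_fraction(psite_counts):
    """Fraction of non-DC spectral power at 1/3 cycles per nucleotide.

    `psite_counts` is a per-nucleotide read-count profile along a candidate
    ORF. Values near 1 indicate a strong codon-level (period-3) signal.
    """
    x = np.asarray(psite_counts, dtype=float)
    n = len(x) - len(x) % 3        # trim so that 1/3 falls exactly on an FFT bin
    if n < 3:
        return 0.0
    x = x[:n] - x[:n].mean()       # remove the mean (DC component)
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power[1:].sum()
    return float(power[n // 3] / total) if total > 0 else 0.0

# Toy profile: most reads fall on the first nucleotide of each codon.
periodic = period3_fraction([10, 2, 1] * 30)
flat = period3_fraction([5] * 90)
```

A profile dominated by in-frame reads concentrates its power at the 1/3 frequency, whereas a flat profile does not; a real analysis would additionally need a statistical test to decide whether the observed peak is significant.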

A Coding Sequence-Embedded Principle Governs Translational Reading Frame Fidelity

Research ◽

10.1155/2018/7089174 ◽

2018 ◽

Vol 2018 ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Ji Wan ◽

Xiangwei Gao ◽

Yuanhui Mao ◽

Xingqian Zhang ◽

Shu-Bing Qian

Keyword(s):

18S Rrna ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Start Codon ◽

Data Sets ◽

Protein Coding ◽

Reading Frame ◽

Codon Composition ◽

Sequence Elements ◽

Correct Reading

Upon initiation at a start codon, the ribosome must maintain the correct reading frame for hundreds of codons in order to produce functional proteins. While some sequence elements are able to trigger programmed ribosomal frameshifting (PRF), very little is known about how the ribosome normally prevents spontaneous frameshift errors that can have dire consequences if uncorrected. Using high resolution ribosome profiling data sets, we discovered that the translating ribosome uses the 3′ end of 18S rRNA to scan the AUG-like codons after the decoding process. The postdecoding mRNA:rRNA interaction not only contributes to predominant translational pausing, but also provides a retrospective mechanism to safeguard the ribosome in the correct reading frame. Partially eliminating the AUG-like “sticky” codons in the reporter message leads to increased +1 frameshift errors. Remarkably, mutating the highly conserved CAU triplet of 18S rRNA globally changes the codon “stickiness”. Further supporting the role of “sticky” sequences in reading frame maintenance, the codon composition of open reading frames is highly optimized across eukaryotic genomes. These results suggest an important layer of information embedded within the protein-coding sequences that instructs the ribosome to ensure reading frame fidelity during translation.

Unprecedentedly efficient CUG initiation of an overlapping reading frame in POLG mRNA yields novel protein POLGARF

10.1101/2020.03.06.980391 ◽

2020 ◽

Author(s):

G Loughran ◽

AV Zhdanov ◽

MS Mikhaylova ◽

FN Rozov ◽

PN Datskevich ◽

...

Keyword(s):

Dna Polymerase ◽

De Novo ◽

Gene Evolution ◽

Ribosome Profiling ◽

Dual Coding ◽

Protein Coding ◽

Reading Frame ◽

Serum Stimulation ◽

Functional Investigation ◽

Mitochondrial Dna Polymerase

While near-cognate codons are frequently used for translation initiation in eukaryotes, their efficiencies are usually low (<10% compared to an AUG in optimal context). Here we describe a rare case of highly efficient near-cognate initiation. A CUG triplet located in the 5’ leader of POLG mRNA initiates almost as efficiently (~60-70%) as an AUG in optimal context. This CUG directs translation of a conserved 260 triplet-long overlapping ORF, which we call POLGARF (POLG Alternative Reading Frame). Translation of a short upstream ORF 5’ of this CUG governs the ratio between DNA polymerase and POLGARF produced from a single POLG mRNA. Functional investigation of POLGARF points to extracellular signalling. While unprocessed POLGARF resides in the nucleoli together with its interacting partner C1QBP, serum stimulation results in rapid secretion of a POLGARF C-terminal fragment. Phylogenetic analysis shows that POLGARF evolved ~160 million years ago due to an MIR transposition into the 5’ leader sequence of the mammalian POLG gene, which became fixed in placental mammals. The discovery of POLGARF unveils a previously undescribed mechanism of de novo protein-coding gene evolution.

Significance Statement: In this study, we describe a previously unknown mechanism of de novo protein-coding gene evolution. We show that the POLG gene, which encodes the catalytic subunit of mitochondrial DNA polymerase, is in fact a dual coding gene. Ribosome profiling, phylogenetic conservation, and reporter construct analyses all demonstrate that POLG mRNA possesses a conserved CUG codon which serves as a start of translation for an exceptionally long overlapping open reading frame (260 codons in human) present in all placental mammals. We called the protein encoded in this alternative reading frame POLGARF. We provide evidence that the evolution of POLGARF began with the insertion of an MIR transposable element of the SINE family.

Noncoding AUG circRNAs constitute an abundant and conserved subclass of circles

Life Science Alliance ◽

10.26508/lsa.201900398 ◽

2019 ◽

Vol 2 (3) ◽

pp. e201900398 ◽

Cited By ~ 23

Author(s):

Lotte VW Stagsted ◽

Katrine M Nielsen ◽

Iben Daugaard ◽

Thomas B Hansen

Keyword(s):

Ribosome Profiling ◽

Start Codon ◽

Circular Rnas ◽

Protein Coding ◽

Reading Frame ◽

Flanking Sequences ◽

Bona Fide ◽

Translational Start Codon ◽

Comprehensive Classification ◽

Translational Start

Circular RNAs (circRNAs) are a subset of noncoding RNAs previously considered as products of missplicing. Now, circRNAs are considered functional molecules, although to date, only few functions have been experimentally validated. Here, based on RNA sequencing from the ENCODE consortium, we identify and characterize a subset of circRNAs, coined AUG circRNAs, encompassing the annotated translational start codon from the protein-coding host genes. AUG circRNAs are more abundantly expressed and conserved than other groups of circRNAs, and they display flanking sequences that suggest an Alu-independent mechanism of biogenesis. The AUG circRNAs contain part of bona fide open reading frame, and in the recent years, several studies have reported cases of circRNA translation. However, using thorough cross-species analysis, extensive ribosome profiling, proteomics analyses, and experimental data on a selected panel of AUG circRNAs, we observe no indications of translation of AUG circRNAs or any other circRNAs. Our data provide a comprehensive classification of circRNAs and, collectively, the data suggest that the AUG circRNAs constitute an abundant subclass of circRNAs produced independently of primate-specific Alu elements.

Translation of 5′ leaders is pervasive in genes resistant to eIF2 repression

eLife ◽

10.7554/elife.03971 ◽

2015 ◽

Vol 4 ◽

Cited By ~ 171

Author(s):

Dmitry E Andreev ◽

Patrick BF O'Connor ◽

Ciara Fahey ◽

Elaine M Kenny ◽

Ilya M Terenin ◽

...

Keyword(s):

Ribosome Profiling ◽

Open Reading Frames ◽

Translation Initiation Factor ◽

Upstream Open Reading Frame ◽

Initiation Factor ◽

Translation Control ◽

Functional Protein ◽

Protein Coding ◽

Reading Frame ◽

Initiation Factor 2

Eukaryotic cells rapidly reduce protein synthesis in response to various stress conditions. This can be achieved by the phosphorylation-mediated inactivation of a key translation initiation factor, eukaryotic initiation factor 2 (eIF2). However, the persistent translation of certain mRNAs is required for deployment of an adequate stress response. We carried out ribosome profiling of cultured human cells under conditions of severe stress induced with sodium arsenite. Although this led to a 5.4-fold general translational repression, the protein coding open reading frames (ORFs) of certain individual mRNAs exhibited resistance to the inhibition. Nearly all resistant transcripts possess at least one efficiently translated upstream open reading frame (uORF) that represses translation of the main coding ORF under normal conditions. Site-specific mutagenesis of two identified stress resistant mRNAs (PPP1R15B and IFRD1) demonstrated that a single uORF is sufficient for eIF2-mediated translation control in both cases. Phylogenetic analysis suggests that at least two regulatory uORFs (namely, in SLC35A4 and MIEF1) encode functional protein products.

Disrupting upstream translation in mRNAs is associated with human disease

Nature Communications ◽

10.1038/s41467-021-21812-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

David S. M. Lee ◽

Joseph Park ◽

Andrew Kromer ◽

Aris Baras ◽

Daniel J. Rader ◽

...

Keyword(s):

Protein Expression ◽

Biological Significance ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Protein Coding ◽

Stop Codons ◽

Human Genes ◽

Strong Negative Selection ◽

Disease Associations ◽

Reading Frames

Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.

Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets

BMC Bioinformatics ◽

10.1186/s12859-021-04180-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

T. M. Porter ◽

M. Hajibabaei

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Dna Barcoding ◽

Profile Analysis ◽

Hidden Markov ◽

Open Reading Frame ◽

Protein Coding ◽

Reading Frame ◽

Frame Length ◽

Open Reading Frame Length

Background: Pseudogenes are non-functional copies of protein-coding genes that typically follow a different molecular evolutionary path compared to functional genes. The inclusion of pseudogene sequences in DNA barcoding and metabarcoding analysis can lead to misleading results. None of the most widely used bioinformatic pipelines for processing marker gene (metabarcode) high-throughput sequencing data specifically accounts for the presence of pseudogenes in protein-coding marker genes. The purpose of this study is to develop a method to screen for nuclear mitochondrial DNA segments (nuMTs) in large COI datasets. We do this by: (1) describing gene and nuMT characteristics from an artificial COI barcode dataset, (2) showing the impact of two different pseudogene removal methods on perturbed community datasets with simulated nuMTs, and (3) incorporating a pseudogene filtering step in a bioinformatic pipeline that can be used to process Illumina paired-end COI metabarcode sequences. Open reading frame length and sequence bit scores from hidden Markov model (HMM) profile analysis were used to detect pseudogenes.

Results: Our simulations showed that it was more difficult to identify nuMTs from shorter amplicon sequences, such as those typically used in metabarcoding, than from the full-length DNA barcodes used in the construction of barcode libraries. It was also more difficult to identify nuMTs in datasets with a high percentage of nuMTs. Existing bioinformatic pipelines used to process metabarcode sequences already remove some nuMTs, especially in the rare sequence removal step, but the addition of a pseudogene filtering step can remove up to 5% of sequences even when other filtering steps are in place.

Conclusions: Open reading frame length filtering, alone or combined with hidden Markov model profile analysis, can be used to effectively screen out apparent pseudogenes from large datasets. There is more to learn from COI nuMTs, such as their frequency in DNA barcoding and metabarcoding studies, their taxonomic distribution, and their evolution. Thus, we encourage the submission of verified COI nuMTs to public databases to facilitate future studies.
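The idea of open reading frame length filtering can be sketched as follows. This is a simplified illustration and not the pipeline described in the abstract: it applies a fixed minimum-length threshold, and the function names, the threshold, and the default stop codons (TAA and TAG, as in the invertebrate mitochondrial genetic code) are assumptions made for the example.

```python
def longest_orf_length(seq, stop_codons=("TAA", "TAG")):
    """Length (nt) of the longest stop-free stretch over the 3 forward frames.

    The default stop codons are an assumption (invertebrate mitochondrial
    code); pass a different set for other taxa.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in stop_codons:
                run = 0                 # a stop codon ends the current stretch
            else:
                run += 3
                best = max(best, run)
    return best

def filter_putative_pseudogenes(seqs, min_orf_length):
    """Keep sequences whose longest ORF reaches `min_orf_length` (hypothetical cutoff)."""
    return {name: s for name, s in seqs.items()
            if longest_orf_length(s) >= min_orf_length}

# Toy amplicons: "b" carries an in-frame stop codon that shortens its ORF.
toy = {"a": "ATGGCCGCAGGA", "b": "ATGTAAGCCGCA"}
kept = filter_putative_pseudogenes(toy, min_orf_length=12)
```

In practice the cutoff would be derived from the length distribution of the dataset rather than fixed in advance, and sequences passing this step could still be checked against an HMM profile.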