Unusually efficient CUG initiation of an overlapping reading frame in POLG mRNA yields novel protein POLGARF

2020 ◽  
Vol 117 (40) ◽  
pp. 24936-24946 ◽  
Author(s):  
Gary Loughran ◽  
Alexander V. Zhdanov ◽  
Maria S. Mikhaylova ◽  
Fedor N. Rozov ◽  
Petr N. Datskevich ◽  
...  

While near-cognate codons are frequently used for translation initiation in eukaryotes, their efficiencies are usually low (<10% compared to an AUG in optimal context). Here, we describe a rare case of highly efficient near-cognate initiation. A CUG triplet located in the 5′ leader of POLG messenger RNA (mRNA) initiates almost as efficiently (∼60 to 70%) as an AUG in optimal context. This CUG directs translation of a conserved 260-triplet-long overlapping open reading frame (ORF), which we call POLGARF (POLG Alternative Reading Frame). Translation of a short upstream ORF 5′ of this CUG governs the ratio between POLG (the catalytic subunit of mitochondrial DNA polymerase) and POLGARF synthesized from a single POLG mRNA. Functional investigation of POLGARF suggests a role in extracellular signaling. While unprocessed POLGARF localizes to the nucleoli together with its interacting partner C1QBP, serum stimulation results in rapid cleavage and secretion of a POLGARF C-terminal fragment. Phylogenetic analysis shows that POLGARF evolved ∼160 million years ago due to a mammalian-wide interspersed repeat (MIR) transposition into the 5′ leader sequence of the mammalian POLG gene, which became fixed in placental mammals. This discovery of POLGARF unveils a previously undescribed mechanism of de novo protein-coding gene evolution.
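As a rough illustration of what a CUG-initiated overlapping ORF means at the sequence level, the sketch below scans an mRNA string for CUG triplets and reports the open reading frame each one would open. It is a minimal sketch, not the authors' pipeline: the function names, the `min_codons` threshold, and the toy sequence are assumptions made for illustration, and an ORF with no in-frame stop codon is simply reported up to the end of the sequence.

```python
# Hypothetical helper names; only the standard genetic-code stop codons are assumed.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_from_start(mrna: str, start: int) -> str:
    """Return the codons read from `start` up to (excluding) the first in-frame stop.

    If no in-frame stop codon is found, the ORF runs to the last complete codon.
    """
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return "".join(codons)


def find_cug_orfs(sequence: str, min_codons: int = 2):
    """List (position, frame, length_in_codons) for every CUG-initiated ORF.

    `frame` is the start position modulo 3, so ORFs with different frame
    values overlap each other in different reading frames.
    """
    mrna = sequence.upper().replace("T", "U")  # accept DNA or RNA input
    results = []
    for pos in range(len(mrna) - 2):
        if mrna[pos:pos + 3] == "CUG":
            n_codons = len(orf_from_start(mrna, pos)) // 3
            if n_codons >= min_codons:
                results.append((pos, pos % 3, n_codons))
    return results


if __name__ == "__main__":
    # Toy sequence, not the real POLG 5' leader.
    print(find_cug_orfs("AACTGGCCAAATAAGG"))
```

Sequence scanning of this kind only shows that a reading frame is open; whether a given CUG actually initiates translation depends on evidence such as ribosome profiling and reporter assays, as described in the abstract.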

2020 ◽  
Author(s):  
G Loughran ◽  
AV Zhdanov ◽  
MS Mikhaylova ◽  
FN Rozov ◽  
PN Datskevich ◽  
...  

While near-cognate codons are frequently used for translation initiation in eukaryotes, their efficiencies are usually low (<10% compared to an AUG in optimal context). Here we describe a rare case of highly efficient near-cognate initiation. A CUG triplet located in the 5′ leader of POLG mRNA initiates almost as efficiently (~60-70%) as an AUG in optimal context. This CUG directs translation of a conserved 260-triplet-long overlapping ORF, which we call POLGARF (POLG Alternative Reading Frame). Translation of a short upstream ORF 5′ of this CUG governs the ratio between DNA polymerase and POLGARF produced from a single POLG mRNA. Functional investigation of POLGARF points to extracellular signalling. While unprocessed POLGARF resides in the nucleoli together with its interacting partner C1QBP, serum stimulation results in rapid secretion of a POLGARF C-terminal fragment. Phylogenetic analysis shows that POLGARF evolved ~160 million years ago due to an MIR transposition into the 5′ leader sequence of the mammalian POLG gene, which became fixed in placental mammals. The discovery of POLGARF unveils a previously undescribed mechanism of de novo protein-coding gene evolution.

Significance Statement: In this study, we describe a previously unknown mechanism of de novo protein-coding gene evolution. We show that the POLG gene, which encodes the catalytic subunit of mitochondrial DNA polymerase, is in fact a dual-coding gene. Ribosome profiling, phylogenetic conservation, and reporter construct analyses all demonstrate that POLG mRNA possesses a conserved CUG codon that serves as the translation start site for an exceptionally long overlapping open reading frame (260 codons in human) present in all placental mammals. We called the protein encoded in this alternative reading frame POLGARF. We provide evidence that the evolution of POLGARF was initiated by the insertion of an MIR transposable element of the SINE family.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Chen Xie ◽  
Cemalettin Bekpen ◽  
Sven Künzel ◽  
Maryam Keshavarz ◽  
Rebecca Krebs-Wheaton ◽  
...  

The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter times and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.


2015 ◽  
Author(s):  
Lorenzo Calviello ◽  
Neelanjan Mukherjee ◽  
Emanuel Wyler ◽  
Henrik Zauber ◽  
Antje Hirsekorn ◽  
...  

RNA sequencing protocols allow for quantifying gene expression regulation at each individual step, from transcription to protein synthesis. Ribosome Profiling (Ribo-seq) maps the positions of translating ribosomes over the entire transcriptome. Despite its great potential, a rigorous statistical approach to identify translated regions by means of the characteristic three-nucleotide periodicity of Ribo-seq data is not yet available. To fill this gap, we developed RiboTaper, which quantifies the significance of periodic Ribo-seq reads via spectral analysis methods. We applied RiboTaper to newly generated, deep Ribo-seq data in HEK293 cells to derive an extensive map of translation that covers Open Reading Frame (ORF) annotations for more than 11,000 protein-coding genes. We also find distinct ribosomal signatures for several hundred detected upstream ORFs and ORFs in annotated non-coding genes (ncORFs). Mass spectrometry data confirm that RiboTaper achieves excellent coverage of the cellular proteome and validate dozens of novel peptide products. Collectively, RiboTaper (available at https://ohlerlab.mdc-berlin.de/software/ ) is a powerful method for comprehensive de novo identification of actively used ORFs in the human genome.
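The three-nucleotide periodicity mentioned above can be illustrated with a small spectral calculation. The sketch below is not RiboTaper and does not reproduce its statistical testing; it swaps in a plain discrete Fourier transform evaluated at a single frequency (1/3 cycles per nucleotide) and reports what fraction of the signal's variance sits at that frequency. The function name and the toy read-count profiles are assumptions made for illustration.

```python
import cmath


def periodicity_score(counts, period=3):
    """Fraction of the variance of `counts` carried by the given integer period.

    `counts` is a per-nucleotide profile of read counts (for example, P-site
    positions along a candidate ORF). The profile is trimmed to a multiple of
    `period` so the target frequency falls exactly on a DFT bin. For a real
    signal the power at frequency 1/period appears twice in the spectrum
    (the bin and its conjugate), hence the factor 2; this holds for period >= 3.
    """
    n = len(counts) - len(counts) % period
    if n == 0:
        return 0.0
    mean = sum(counts[:n]) / n
    centered = [c - mean for c in counts[:n]]
    total = sum(x * x for x in centered)
    if total == 0:
        return 0.0  # flat profile: no variance, no periodicity
    # Single DFT coefficient at frequency 1/period.
    coef = sum(x * cmath.exp(-2j * cmath.pi * k / period)
               for k, x in enumerate(centered))
    return 2 * abs(coef) ** 2 / (n * total)


if __name__ == "__main__":
    translated_like = [10, 0, 0] * 10   # reads pile up on one codon position
    alternating = [5, 0] * 6            # period-2 pattern, no codon periodicity
    print(periodicity_score(translated_like))
    print(periodicity_score(alternating))
```

A score near 1 means nearly all variation follows the codon rhythm; a score near 0 means none does. Real data would additionally need a significance test, which is the part the abstract attributes to RiboTaper's spectral analysis.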


2016 ◽  
Author(s):  
Tomislav Domazet-Lošo ◽  
Anne-Ruxandra Carvunis ◽  
M. Mar Albà ◽  
Martin Sebastijan Šestak ◽  
Robert Bakarić ◽  
...  

Phylostratigraphy is a computational framework for dating the emergence of sequences (usually genes) in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and its impact on phylostratigraphic inferences. These simulations called into question some of our previously published work on patterns of gene emergence and evolution inferred from phylostratigraphy. Here, we re-assessed these simulations and found major problems including unrealistic parameter choices, irreproducibility, statistical flaws and partial representation of results. We found that, even with a possible overall BLAST false negative rate of 5-15%, the large majority (>74%) of sequences assigned to a recent evolutionary origin by phylostratigraphy is unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on our previous findings, we repeated our analyses, this time excluding all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support our published inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis.


2017 ◽  
Author(s):  
Steven T. Hill ◽  
Rachael Kuintzle ◽  
Amy Teegarden ◽  
Erich Merrill ◽  
Padideh Danaee ◽  
...  

The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limited by current scientific knowledge, deep learning methods can independently discover complex biological rules in the data de novo. We trained a gated recurrent neural network (RNN) on human messenger RNA (mRNA) and long noncoding RNA (lncRNA) sequences. Our model, mRNA RNN (mRNN), surpasses state-of-the-art methods at predicting protein-coding potential. To understand what mRNN learned, we probed the network and uncovered several context-sensitive codons highly predictive of coding potential. Our results suggest that gated RNNs can learn complex and long-range patterns in full-length human transcripts, making them ideal for performing a wide range of difficult classification tasks and, most importantly, for harvesting new biological insights from the rising flood of sequencing data.


2017 ◽  
Author(s):  
Jorge Ruiz-Orera ◽  
José Luis Villanueva-Cañas ◽  
William Blevins ◽  
M. Mar Albà

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to be true protein-coding genes (and thus previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.


2020 ◽  
Vol 117 (6) ◽  
pp. 3185-3191 ◽  
Author(s):  
Douglas L. Huseby ◽  
Gerrit Brandis ◽  
Lisa Praski Alzrigat ◽  
Diarmaid Hughes

A fundamental feature of life is that ribosomes read the genetic code in messenger RNA (mRNA) as triplets of nucleotides in a single reading frame. Mutations that shift the reading frame generally cause gene inactivation and in essential genes cause loss of viability. Here we report and characterize a +1-nt frameshift mutation, centrally located in rpoB, an essential gene encoding the beta-subunit of RNA polymerase. Mutant Escherichia coli carrying this mutation are viable and highly resistant to rifampicin. Genetic and proteomic experiments reveal a very high rate (5%) of spontaneous frameshift suppression occurring on a heptanucleotide sequence downstream of the mutation. Production of active protein is stimulated to 61–71% of wild-type level by a feedback mechanism increasing translation initiation. The phenomenon described here could have broad significance for predictions of phenotype from genotype. Several frameshift mutations have been reported in rpoB in rifampicin-resistant clinical isolates of Mycobacterium tuberculosis (Mtb). These mutations have never been experimentally validated, and no mechanisms of action have been proposed. This work shows that frameshift mutations in rpoB can be a mutational mechanism generating antibiotic resistance. Our analysis further suggests that genetic elements supporting productive frameshifting could rapidly evolve de novo, even in essential genes.


2019 ◽  
Vol 37 (4) ◽  
pp. 1148-1164
Author(s):  
Liam Abrahams ◽  
Laurence D Hurst

Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.
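The link between stop codon density and pseudo-ORF length can be made concrete with a short calculation. The abstract does not state the exact SCD formula, so the sketch below assumes one plausible definition: the fraction of all overlapping trinucleotide windows (across the three forward frames) that are TAA, TAG or TGA. The function names and the toy sequence are likewise hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def _as_dna(seq: str) -> str:
    """Upper-case the sequence and convert RNA (U) to DNA (T) notation."""
    return seq.upper().replace("U", "T")


def stop_codon_density(seq: str) -> float:
    """Assumed SCD: stop trinucleotides divided by all overlapping 3-nt windows."""
    seq = _as_dna(seq)
    windows = len(seq) - 2
    if windows <= 0:
        return 0.0
    hits = sum(seq[i:i + 3] in STOPS for i in range(windows))
    return hits / windows


def longest_pseudo_orf(seq: str) -> int:
    """Longest run of consecutive non-stop codons over the three forward frames."""
    seq = _as_dna(seq)
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                run = 0          # a stop codon ends the current stop-free run
            else:
                run += 1
                best = max(best, run)
    return best


if __name__ == "__main__":
    toy = "ATGTAAGGC"  # toy sequence, not a real lincRNA exon
    print(stop_codon_density(toy), longest_pseudo_orf(toy))
```

Under this definition, fewer stop trinucleotides leave longer stop-free runs in every frame, which is the intuition behind pseudo-ORF lengths exceeding null expectations when SCD is low.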


2015 ◽  
Vol 43 (5) ◽  
pp. 867-873 ◽  
Author(s):  
Erich Bornberg-Bauer ◽  
Jonathan Schmitz ◽  
Magdalena Heberlein

Proteins are the workhorses of the cell and, over billions of years, they have evolved an amazing plethora of extremely diverse and versatile structures with equally diverse functions. Evolutionary emergence of new proteins and transitions between existing ones are believed to be rare or even impossible. However, recent advances in comparative genomics have repeatedly identified some 10%–30% of all genes as lacking any detectable similarity to existing proteins. Even after careful scrutiny, some of those orphan genes contain protein-coding reading frames with detectable transcription and translation. Thus some proteins seem to have emerged from previously non-coding ‘dark genomic matter’. These ‘de novo’ proteins tend to be disordered, fast evolving and weakly expressed, but they also rapidly assume novel and physiologically important functions. Here we review mechanisms by which ‘de novo’ proteins might be created, under which circumstances they may become fixed and why they are elusive. We propose a ‘grow slow and moult’ model in which first a reading frame is extended, coding for an initially disordered and non-globular appendage which, over time, becomes more structured and may also become associated with other proteins.

