A de novo evolved gene in the house mouse regulates female pregnancy cycles

The de novo emergence of new genes has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here, we identify a set of house mouse-specific protein-coding genes and assess their translation by ribosome profiling and mass spectrometry data. We functionally analyze one of them, Gm13030, which is specifically expressed in females in the oviduct. The interruption of the reading frame affects the transcriptional network in the oviducts at a specific stage of the estrous cycle. This includes the upregulation of Dcpp genes, which are known to stimulate the growth of preimplantation embryos. As a consequence, knockout females have their second litters after shorter intervals and have a higher infanticide rate. Given that Gm13030 shows no signs of positive selection, our findings support the hypothesis that a de novo evolved gene can directly adopt a function without much sequence adaptation.

A spectral analysis approach to detect actively translated open reading frames in high-resolution ribosome profiling data

10.1101/031625

2015

Author(s): Lorenzo Calviello, Neelanjan Mukherjee, Emanuel Wyler, Henrik Zauber, Antje Hirsekorn, ...

Keyword(s): Spectral Analysis, Gene Expression Regulation, De Novo, Ribosome Profiling, Open Reading Frames, Mass Spectrometry Data, HEK293 Cells, Protein Coding, Reading Frame, Reading Frames

RNA sequencing protocols allow for quantifying gene expression regulation at each individual step, from transcription to protein synthesis. Ribosome Profiling (Ribo-seq) maps the positions of translating ribosomes over the entire transcriptome. Despite its great potential, a rigorous statistical approach to identify translated regions by means of the characteristic three-nucleotide periodicity of Ribo-seq data is not yet available. To fill this gap, we developed RiboTaper, which quantifies the significance of periodic Ribo-seq reads via spectral analysis methods. We applied RiboTaper to newly generated, deep Ribo-seq data in HEK293 cells to derive an extensive map of translation that covers Open Reading Frame (ORF) annotations for more than 11,000 protein-coding genes. We also find distinct ribosomal signatures for several hundred detected upstream ORFs and ORFs in annotated non-coding genes (ncORFs). Mass spectrometry data confirm that RiboTaper achieves excellent coverage of the cellular proteome and validate dozens of novel peptide products. Collectively, RiboTaper (available at https://ohlerlab.mdc-berlin.de/software/ ) is a powerful method for comprehensive de novo identification of actively used ORFs in the human genome.
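
The idea behind the spectral approach is that actively translating ribosomes leave a three-nucleotide periodicity in the Ribo-seq P-site signal, which appears as spectral power at a frequency of 1/3 cycles per nucleotide. The sketch below illustrates only that idea, using a plain discrete Fourier transform to compare power at 1/3 with the average power at the other Fourier frequencies. It is not RiboTaper's method, which relies on multitaper spectral estimation and a formal statistical test; the function names and the per-nucleotide P-site-count input are illustrative assumptions.

```python
import cmath
import math


def power_at(signal, freq):
    """Periodogram value |DFT|^2 / n of `signal` at `freq` (cycles per nucleotide)."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq * t) for t, x in enumerate(signal))
    return abs(coeff) ** 2 / n


def codon_periodicity_ratio(psite_counts):
    """Power at 1/3 divided by mean power at the other Fourier frequencies k/n (0 < k <= n/2).

    `psite_counts` holds P-site read counts per nucleotide along an ORF (assumed input format).
    """
    n = len(psite_counts)
    if n < 6:
        raise ValueError("need at least two codons of signal")
    mean = sum(psite_counts) / n
    centered = [c - mean for c in psite_counts]  # drop the zero-frequency (mean) component
    target = power_at(centered, 1 / 3)
    # Background: every Fourier frequency k/n up to Nyquist, except 1/3 itself
    background_freqs = [k / n for k in range(1, n // 2 + 1) if 3 * k != n]
    background = sum(power_at(centered, f) for f in background_freqs) / len(background_freqs)
    if background > 0:
        return target / background
    return math.inf if target > 0 else 0.0
```

A ratio near 1 (for example, a single isolated read spike, whose power is spread evenly over all frequencies) indicates no particular enrichment at the codon frequency, whereas an in-frame signal such as `[5, 1, 0] * 20` gives a very large ratio. A real caller would still need a significance test and careful handling of short ORFs, where few frequencies are available.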

Faculty Opinions recommendation of New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.725762623.793527098

2017

Author(s): Erich Bornberg-Bauer

Keyword(s): De Novo, Protein Coding, Coding Sequence, Protein Coding Genes, Evolutionary Innovation, New Genes

Unprecedentedly efficient CUG initiation of an overlapping reading frame in POLG mRNA yields novel protein POLGARF

10.1101/2020.03.06.980391

2020

Author(s): G Loughran, AV Zhdanov, MS Mikhaylova, FN Rozov, PN Datskevich, ...

Keyword(s): DNA Polymerase, De Novo, Gene Evolution, Ribosome Profiling, Dual Coding, Protein Coding, Reading Frame, Serum Stimulation, Functional Investigation, Mitochondrial DNA Polymerase

Abstract While near-cognate codons are frequently used for translation initiation in eukaryotes, their efficiencies are usually low (<10% compared to an AUG in optimal context). Here we describe a rare case of highly efficient near-cognate initiation. A CUG triplet located in the 5’ leader of POLG mRNA initiates almost as efficiently (~60-70%) as an AUG in optimal context. This CUG directs translation of a conserved 260-codon overlapping ORF, which we call POLGARF (POLG Alternative Reading Frame). Translation of a short upstream ORF 5’ of this CUG governs the ratio between DNA polymerase and POLGARF produced from a single POLG mRNA. Functional investigation of POLGARF points to a role in extracellular signalling. While unprocessed POLGARF resides in the nucleoli together with its interacting partner C1QBP, serum stimulation results in rapid secretion of a POLGARF C-terminal fragment. Phylogenetic analysis shows that POLGARF evolved ~160 million years ago due to an MIR transposition into the 5’ leader sequence of the mammalian POLG gene, which became fixed in placental mammals. The discovery of POLGARF unveils a previously undescribed mechanism of de novo protein-coding gene evolution.

Significance Statement: In this study, we describe a previously unknown mechanism of de novo protein-coding gene evolution. We show that the POLG gene, which encodes the catalytic subunit of mitochondrial DNA polymerase, is in fact a dual-coding gene. Ribosome profiling, phylogenetic conservation, and reporter construct analyses all demonstrate that POLG mRNA possesses a conserved CUG codon which serves as a start of translation for an exceptionally long overlapping open reading frame (260 codons in human) present in all placental mammals. We called the protein encoded in this alternative reading frame POLGARF. We provide evidence that the evolution of POLGARF began with the insertion of an MIR transposable element of the SINE family.
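
The dual-coding arrangement described above can be illustrated with a toy sequence: an upstream near-cognate CUG opens a reading frame offset from the main AUG, and each frame runs until its own first in-frame stop codon, so the two ORFs share nucleotides without sharing codons. The sketch below uses an invented 16-nt sequence, not POLG, and the helpers `orf_length_from` and `frame_offset` are hypothetical. It only reads frames; it says nothing about how efficiently a given CUG is actually used for initiation.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def orf_length_from(seq, start):
    """Number of sense codons from `start` (start codon included) up to the first in-frame stop.

    Returns None if the frame runs off the end of `seq` without meeting a stop codon.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA letters
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


def frame_offset(start, main_start):
    """Reading frame (0, 1 or 2) of a start site relative to the main ORF's start codon."""
    return (start - main_start) % 3


# Invented toy mRNA (hypothetical): a CUG at position 0 and the main AUG at position 4.
toy = "CUGCAUGCCUAGCUAA"
cug_frame = frame_offset(0, 4)        # 2: the CUG-initiated frame is shifted relative to the AUG
cug_codons = orf_length_from(toy, 0)  # CUG CAU GCC, then UAG
aug_codons = orf_length_from(toy, 4)  # AUG CCU AGC, then UAA
```

Here both frames encode three codons, and nucleotides 4 to 11 are read in both frames, which is the same kind of overlap, on a vastly smaller scale, that the 260-codon POLGARF frame has with the POLG coding sequence.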

A Depletion of Stop Codons in lincRNA is Owing to Transfer of Selective Constraint from Coding Sequences

Molecular Biology and Evolution

10.1093/molbev/msz299

2019

Vol 37 (4)

pp. 1148-1164

Author(s): Liam Abrahams, Laurence D Hurst

Keyword(s): De Novo, Stop Codon, Noncoding RNAs, Selective Constraint, Protein Coding, Coding Sequences, Reading Frame, Stop Codons, Protein Coding Genes, Exonic Splice

Abstract Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.
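
The two quantities at the heart of this argument, stop codon density and the length of the longest stop-free stretch (pseudo-ORF), can be computed along the following lines. The definitions are assumptions for illustration: SCD is taken here as stop trinucleotides per overlapping trinucleotide position with all three forward frames pooled, and the null is a simple mononucleotide shuffle, which preserves base composition but not dinucleotide structure. The paper's exact definitions and controls may differ, and the function names are hypothetical.

```python
import random

STOPS = ("TAA", "TAG", "TGA")


def stop_codon_density(seq):
    """Stop trinucleotides per overlapping trinucleotide position, all three frames pooled."""
    s = seq.upper().replace("U", "T")
    n_positions = len(s) - 2
    if n_positions <= 0:
        return 0.0
    hits = sum(1 for i in range(n_positions) if s[i:i + 3] in STOPS)
    return hits / n_positions


def longest_stop_free_run(seq):
    """Longest run of consecutive non-stop codons in any of the three forward frames."""
    s = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(s) - 2, 3):
            if s[i:i + 3] in STOPS:
                run = 0  # a stop codon ends the current pseudo-ORF
            else:
                run += 1
                best = max(best, run)
    return best


def shuffled_null(seq, n_shuffles=100, seed=0):
    """Longest stop-free runs in composition-preserving (mononucleotide) shuffles of `seq`."""
    rng = random.Random(seed)
    letters = list(seq.upper())
    runs = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        runs.append(longest_stop_free_run("".join(letters)))
    return runs
```

Comparing `longest_stop_free_run(seq)` with the distribution returned by `shuffled_null(seq)` gives a rough sense of whether a transcript's pseudo-ORFs are longer than its base composition alone would predict, which is the kind of comparison the abstract describes.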

Studying the dawn of de novo gene emergence in mice reveals fast integration of new genes into functional networks

10.1101/510214

2019

Cited By ~ 3

Author(s): Chen Xie, Cemalettin Bekpen, Sven Künzel, Maryam Keshavarz, Rebecca Krebs-Wheaton, ...

Keyword(s): De Novo, Expression Patterns, Transcriptional Networks, Protein Coding, Protein Coding Genes, New Genes, De Novo Gene, Intergenic Sequences, Genomic Analyses, New Protein

Abstract The de novo emergence of new transcripts has been well documented through genomic analyses. However, a functional analysis, especially of very young protein-coding genes, is still largely lacking. Here we focus on three loci that have evolved from previously intergenic sequences in the house mouse (Mus musculus) and are not present in its closest relatives. We have obtained knockouts and analyzed their phenotypes, including a deep transcriptomic analysis based on a dedicated power analysis. We show that the transcriptional networks are significantly disturbed in the knockouts and that all three genes have effects on phenotypes that are related to their expression patterns. These include behavioral effects, skeletal differences and the regulation of the reproductive cycle in females. Substitution analysis suggests that all three genes have directly obtained an activity, without new adaptive substitutions. Our findings support the hypothesis that de novo genes can quickly adopt functions without extensive adaptation.

Impact statement: New protein-coding genes emerging out of non-coding sequences can become directly functional without signatures of adaptive protein changes.

De novo annotation and characterization of the translatome with ribosome profiling data

10.1101/137216

2017

Author(s): Zhengtao Xiao, Rongyao Huang, Yuling Chen, Haiteng Deng, Xuerui Yang

Keyword(s): De Novo, Simulated Data, Ribosome Profiling, Mass Spectrometry Data, Protein Coding, Coding Regions, Genome Wide, RNA Fragments, Cell Type Specific, Stress Signals

Abstract By capturing and sequencing the RNA fragments protected by translating ribosomes, ribosome profiling sketches the landscape of translation at subcodon resolution. We developed a new method, RiboCode, which uses ribosome profiling data to assess the translation of each RNA transcript genome-wide. As shown by multiple tests with simulated data and cell type-specific QTI-seq and mass spectrometry data, RiboCode exhibits superior efficiency, sensitivity, and accuracy for de novo annotation of the translatome, which covers various types of novel ORFs in previously annotated coding and non-coding regions, as well as overlapping ORFs. Finally, to showcase its application, we applied RiboCode to a published ribosome profiling dataset and assembled the context-dependent translatomes of yeast under normal conditions, heat shock, and oxidative stress. Comparisons among these translatomes revealed stress-activated novel upstream and downstream ORFs, some of which are associated with potential translational dysregulation of the main protein-coding ORFs in response to the stress signals.
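
Frame-aware ORF callers of this kind rest on the observation that, in an actively translated ORF, P-sites pile up on the first nucleotide of each codon (frame 0) rather than on frames 1 or 2. The sketch below shows that idea with per-codon frame counts and a one-sided exact sign test against each off-frame. It deliberately swaps in a very simple test and does not reproduce RiboCode's actual statistic; the function names and the half-open `[orf_start, orf_end)` coordinate convention are assumptions.

```python
import math


def frame_counts(psite_positions, orf_start, orf_end):
    """Per-codon P-site counts [frame0, frame1, frame2] for an ORF spanning [orf_start, orf_end)."""
    n_codons = (orf_end - orf_start) // 3
    counts = [[0, 0, 0] for _ in range(n_codons)]
    for pos in psite_positions:
        offset = pos - orf_start
        if 0 <= offset < 3 * n_codons:
            counts[offset // 3][offset % 3] += 1
    return counts


def sign_test_p(counts, off_frame):
    """One-sided exact sign test that frame-0 counts exceed counts in `off_frame` (1 or 2)."""
    wins = sum(1 for c in counts if c[0] > c[off_frame])
    losses = sum(1 for c in counts if c[0] < c[off_frame])
    n = wins + losses  # codons with tied counts carry no information and are dropped
    if n == 0:
        return 1.0
    # P(X >= wins) for X ~ Binomial(n, 1/2)
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def orf_frame_p(counts):
    """Conservative summary: the ORF must beat both off-frames, so report the larger p-value."""
    return max(sign_test_p(counts, 1), sign_test_p(counts, 2))
```

For ten codons that each carry one frame-0 P-site and nothing else, `orf_frame_p` returns 1/1024; if every read falls in frame 1 instead, it returns 1.0. Taking the larger of the two off-frame p-values is a simple, conservative way to require that frame 0 dominate both alternatives.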

New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation

Philosophical Transactions of the Royal Society B: Biological Sciences

10.1098/rstb.2014.0332

2015

Vol 370 (1678)

pp. 20140332

Cited By ~ 72

Author(s): Aoife McLysaght, Daniele Guerzoni

Keyword(s): De Novo, Complex Structure, Protein Coding, Functional Roles, Protein Coding Genes, Evolutionary Innovation, New Genes, De Novo Genes, Novel Protein

The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to the genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. It is now evident that, far from being an extremely rare occurrence, de novo origination supplies a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations.

From de novo to ‘de nono’: The majority of novel protein coding genes identified with phylostratigraphy are old genes or recent duplicates

Genome Biology and Evolution

10.1093/gbe/evy231

2018

Cited By ~ 2

Author(s): Claudio Casola

Keyword(s): De Novo, Protein Coding, Protein Coding Genes, Novel Protein

Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558

2019

Author(s): Ryan Bracewell, Anita Tran, Kamalakar Chatla, Doris Bachtrog

Keyword(s): X Chromosome, Genome Assembly, De Novo, Pericentromeric Region, Species Group, Chromosome 15, Protein Coding, Protein Coding Genes, Long Read, Chromosome Level

Abstract The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot chromosome, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which are largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes, and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata, and its pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our findings suggest that a metacentric ancestral X fused to a telocentric Muller D to create the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

Draft genome assembly data of Anoxybacillus sp. strain MB8 isolated from Tattapani hot springs, India

10.1101/2021.06.09.447659

2021

Author(s): VISHNU PRASOODANAN P K, Shruti S. Menon, Rituja Saxena, Prashant Waiker, Vineet K Sharma

Keyword(s): Hot Springs, De Novo, Draft Genome, GC Content, Central India, Glycoside Hydrolases, rRNA Gene, Aerobic Bacterium, Protein Coding, Protein Coding Genes

The discovery of novel thermophiles has opened promising applications in the field of biotechnology. Because of their thermal stability, thermophiles can survive harsh industrial processes, which makes them important to characterize and study. Members of Anoxybacillus are alkaline-tolerant thermophiles and have been extensively isolated from manure, dairy-processing plants, and geothermal hot springs. This article reports the assembled data of an aerobic bacterium, Anoxybacillus sp. strain MB8, isolated from the Tattapani hot springs in Central India, whose 16S rRNA gene shares 97% identity (99% coverage) with Anoxybacillus kamchatkensis strain G10. The de novo assembly and annotation of the genome of Anoxybacillus sp. strain MB8 comprises 2,898,780 bp (in 190 contigs) with a GC content of 41.8% and includes 2,976 protein-coding genes, 1 rRNA operon, 73 tRNAs, 1 tmRNA and 10 CRISPR arrays. The predicted protein-coding genes have been classified into 21 eggNOG categories. KEGG Automatic Annotation Server (KAAS) analysis indicated the presence of the assimilatory sulfate reduction pathway, the nitrate reduction pathway, and genes for glycoside hydrolases (GHs) and glycosyltransferases (GTs). GHs and GTs have widespread applications in the baking and food industry for bread manufacturing, and in the paper, detergent and cosmetic industries. Hence, Anoxybacillus sp. strain MB8 holds the potential to be screened and characterized for such commercially relevant enzymes.
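
Summary figures like those reported above (total assembly length, number of contigs, GC content) can be recomputed directly from an assembly FASTA file. A minimal sketch, assuming the assembly is available as a plain multi-FASTA string; the function names are hypothetical, and GC content is computed here over unambiguous A/C/G/T bases only, so `N` gaps do not dilute it. Annotation pipelines may count differently, so small discrepancies from published figures would not be surprising.

```python
def read_fasta(text):
    """Parse a multi-FASTA string into a list of sequences (headers are discarded)."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def assembly_summary(contigs):
    """Total length, contig count and GC percentage (over A/C/G/T only) of an assembly."""
    total_bp = sum(len(c) for c in contigs)
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    acgt = sum(sum(c.upper().count(b) for b in "ACGT") for c in contigs)
    return {
        "total_bp": total_bp,
        "n_contigs": len(contigs),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }
```

For example, `assembly_summary(read_fasta(">c1\nGGCC\n>c2\nAATT\nGCNN\n"))` reports 12 bp in 2 contigs with 60% GC, because the two `N` bases count toward length but not toward the GC denominator.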
