De Novo Genes are “Frozen Accidents” which Escaped Rapid Turnover of Pervasively Transcribed ORFs

2017 ◽  
Author(s):  
Jonathan Schmitz ◽  
Kristian Ullrich ◽  
Erich Bornberg-Bauer

Abstract: A recent surge of studies suggested that many novel genes arise de novo from previously non-coding DNA and not by duplication. However, since most studies concentrated on longer evolutionary time scales and rarely considered protein structural properties, it remains unclear how these properties are shaped by evolution, depend on genetic mechanisms and influence gene survival. Here we compare open reading frames (ORFs) from high coverage transcriptomes from mouse and another four mammals covering 160 million years of evolution. We find that novel ORFs pervasively emerge from intergenic and intronic regions but are rapidly lost again, while relatively fewer arise from duplications but are retained over much longer times. Surprisingly, disorder and other protein properties of young ORFs do not change with gene age. Only length and nucleotide composition change, probably to avoid aggregation. Thus de novo genes resemble frozen accidents of randomly emerged ORFs which survived initial purging, likely because they are functional.

2014 ◽  
Author(s):  
John Stewart Taylor

In 2009 Knowles and McLysaght reported the discovery of three human genes derived from non-coding DNA. They provided evidence that these genes, CLLU1, C22orf45, and DNAH10OS, were transcribed and translated; they identified orthologous non-coding DNA in chimpanzee (Pan troglodytes) and macaque (Macaca mulatta); and for each gene they located the critical “enabler” mutations that extended the open reading frames (ORFs), allowing the production of a protein. These genes had no BLASTp hits in any other genome and were considered to be novel human genes, possibly responsible for human-specific traits. Since the discovery of these genes, new high quality Denisovan and Neanderthal genomes have been reported. I used these resources in an effort to determine whether or not CLLU1, C22orf45, and DNAH10OS were truly human-specific.


2015 ◽  
Author(s):  
David E Weinberg ◽  
Premal Shah ◽  
Stephen W Eichhorn ◽  
Jeffrey A Hussmann ◽  
Joshua B Plotkin ◽  
...  

Ribosome-footprint profiling provides genome-wide snapshots of translation, but technical challenges can confound its analysis. Here, we use improved methods to obtain ribosome-footprint profiles and mRNA abundances that more faithfully reflect gene expression in Saccharomyces cerevisiae. Our results support proposals that both the beginning of coding regions and codons matching rare tRNAs are more slowly translated. They also indicate that emergent polypeptides with as few as three basic residues within a 10-residue window tend to slow translation. With the improved mRNA measurements, the variation attributable to translational control in exponentially growing yeast was less than previously reported, and most of this variation could be predicted with a simple model that considered mRNA abundance, upstream open reading frames, cap-proximal structure and nucleotide composition, and lengths of the coding and 5′-untranslated regions. Collectively, our results reveal key features of translational control in yeast and provide a framework for executing and interpreting ribosome-profiling studies.
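
The "three basic residues within a 10-residue window" criterion mentioned in this abstract can be illustrated with a simple sliding-window scan. The following is only a minimal sketch, not the authors' analysis: the function name `basic_windows` is hypothetical, and treating lysine (K) and arginine (R) as the basic residues is an assumption, since the abstract does not say which residues were counted.

```python
# Assumption: "basic" means lysine (K) and arginine (R); the abstract does not specify.
BASIC_RESIDUES = frozenset("KR")


def basic_windows(protein, window=10, min_basic=3):
    """Return start indices of windows containing at least min_basic basic residues."""
    hits = []
    # Slide a fixed-width window along the sequence, one residue at a time.
    for start in range(max(len(protein) - window + 1, 0)):
        segment = protein[start:start + window]
        if sum(residue in BASIC_RESIDUES for residue in segment) >= min_basic:
            hits.append(start)
    return hits
```

Sequences shorter than the window yield no hits in this sketch; whether partial windows at the ends of a protein should count is a design choice the abstract leaves open.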


1999 ◽  
Vol 10 (04) ◽  
pp. 635-643 ◽  
Author(s):  
AGNIESZKA GIERLIK ◽  
PAWEŁ MACKIEWICZ ◽  
MARIA KOWALCZUK ◽  
STANISŁAW CEBRAT ◽  
MIROSŁAW R. DUDEK

Coding sequences of DNA generate Open Reading Frames (ORFs) inside them with much higher frequency than random DNA sequences do, especially in the antisense strand. This is a specific feature of the genetic code. Since coding sequences are selected for their length, the generated ORFs are indirect results of this selection and their length is also influenced by selection. That is why ORFs found in any genome, even those much longer than ORFs spontaneously generated in random DNA sequences, should be considered as two different sets of ORFs: the first coding for proteins, the second generated by the coding ORFs. Even intergenic sequences possess greater capacity for generating ORFs than random DNA sequences of the same nucleotide composition, which suggests that intergenic sequences were generated from coding sequences by recombinational mechanisms.
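
To make the idea of ORFs generated on the sense and antisense strands concrete, here is a minimal sketch of an ORF scan. It assumes an ORF runs from an ATG codon to the first in-frame stop codon, which may differ from the definition used in the paper; the names `find_orfs` and `reverse_complement` are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for each ATG...stop ORF in the three forward frames.

    Assumption: an ORF starts at the first ATG after the previous stop and
    ends at the next in-frame stop codon; `end` is exclusive and includes the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop (the ATG is included).
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None
    return orfs
```

Calling `find_orfs(reverse_complement(seq))` scans the antisense strand with the same logic, which is one way to compare ORF counts between strands or against shuffled sequences of the same composition.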


2018 ◽  
Author(s):  
Lisa K. Johnson ◽  
Harriet Alexander ◽  
C. Titus Brown

Abstract
Background: De novo transcriptome assemblies are required prior to analyzing RNAseq data from a species without an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using different workflows, or “pipelines”, on the resulting assemblies are poorly understood. Here, a pipeline was programmatically automated and used to assemble and annotate raw transcriptomic short read data collected by the Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP). The resulting transcriptome assemblies were evaluated and compared against assemblies that were previously generated with a different pipeline developed by the National Center for Genome Research (NCGR).
Results: New transcriptome assemblies contained the majority of previous contigs as well as new content. On average, 7.8% of the annotated contigs in the new assemblies were novel gene names not found in the previous assemblies. Taxonomic trends were observed in the assembly metrics, with assemblies from the Dinoflagellata and Ciliophora phyla showing a higher percentage of open reading frames and number of contigs than transcriptomes from other phyla.
Conclusions: Given current bioinformatics approaches, there is no single ‘best’ reference transcriptome for a particular set of raw data. As the optimum transcriptome is a moving target, improving (or not) with new tools and approaches, automated and programmable pipelines are invaluable for managing the computationally-intensive tasks required for re-processing large sets of samples with revised pipelines and ensuring a common evaluation workflow is applied to all samples. Thus, re-assembling existing data with new tools using automated and programmable pipelines may yield more accurate identification of taxon-specific trends across samples in addition to novel and useful products for the community.
Key Points: Re-assembly with new tools can yield new results. Automated and programmable pipelines can be used to process arbitrarily many samples. Analyzing many samples using a common pipeline identifies taxon-specific trends.


2019 ◽  
Vol 109 (2) ◽  
pp. 222-224 ◽  
Author(s):  
Margarita Gomila ◽  
Eduardo Moralejo ◽  
Antonio Busquets ◽  
Guillem Segui ◽  
Diego Olmo ◽  
...  

Xylella fastidiosa is a plant-pathogenic bacterium that causes serious diseases in many crops of economic importance and is a quarantine organism in the European Union. This study reports a de novo-assembled draft genome sequence of the first isolates causing Pierce’s disease in Europe: X. fastidiosa subsp. fastidiosa strains XYL1732/17 and XYL2055/17. Both strains were isolated from grapevines (Vitis vinifera) showing Pierce’s disease symptoms at two different locations in Mallorca, Spain. The XYL1732/17 genome is 2,444,109 bp long, with a G+C content of 51.5%; it contains 2,359 open reading frames and 48 tRNA genes. The XYL2055/17 genome is 2,456,780 bp long, with a G+C content of 51.5%; it contains 2,384 open reading frames and 48 tRNA genes.


2020 ◽  
Vol 12 (11) ◽  
pp. 2183-2195
Author(s):  
Daniel Dowling ◽  
Jonathan F Schmitz ◽  
Erich Bornberg-Bauer

Abstract: In addition to known genes, much of the human genome is transcribed into RNA. Chance formation of novel open reading frames (ORFs) can lead to the translation of myriad new proteins. Some of these ORFs may yield advantageous adaptive de novo proteins. However, widespread translation of noncoding DNA can also produce hazardous protein molecules, which can misfold and/or form toxic aggregates. The dynamics of how de novo proteins emerge from potentially toxic raw materials and what influences their long-term survival are unknown. Here, using transcriptomic data from human and five other primates, we generate a set of transcribed human ORFs at six conservation levels to investigate which properties influence the early emergence and long-term retention of these expressed ORFs. As these taxa diverged from each other relatively recently, we present a fine scale view of the evolution of novel sequences over recent evolutionary time. We find that novel human-restricted ORFs are preferentially located on GC-rich gene-dense chromosomes, suggesting their retention is linked to pre-existing genes. Sequence properties such as intrinsic structural disorder and aggregation propensity, which have been proposed to play a role in survival of de novo genes, remain unchanged over time. Even very young sequences code for proteins with low aggregation propensities, suggesting that genomic regions with many novel transcribed ORFs are concomitantly less likely to produce ORFs which code for harmful toxic proteins. Our data indicate that the survival of these novel ORFs is largely stochastic rather than shaped by selection.


2006 ◽  
Vol 188 (17) ◽  
pp. 6261-6268 ◽  
Author(s):  
Jonathon P. Audia ◽  
Herbert H. Winkler

Abstract: The obligate intracytoplasmic pathogen Rickettsia prowazekii relies on the transport of many essential compounds from the cytoplasm of the eukaryotic host cell in lieu of de novo synthesis, an evolutionary outcome undoubtedly linked to obligatory growth in this metabolite-replete niche. The paradigm for the study of rickettsial transport systems is the ATP/ADP translocase Tlc1, which exchanges bacterial ADP for host cell ATP as a source of energy, rather than as a source of adenylate. Interestingly, the R. prowazekii genome encodes four open reading frames that are highly homologous to the well-characterized ATP/ADP translocase Tlc1. Therefore, by annotation, the R. prowazekii genome encodes a total of five ATP/ADP translocases: Tlc1, Tlc2, Tlc3, Tlc4, and Tlc5. We have confirmed by quantitative reverse transcriptase PCR that mRNAs corresponding to all five tlc homologues are expressed in R. prowazekii growing in L-929 cells and have shown their heterologous protein expression in Escherichia coli, suggesting that none of the tlc genes are pseudogenes in the process of evolutionary meltdown. However, we demonstrate by heterologous expression in E. coli that only Tlc1 functions as an ATP/ADP transporter. A survey of nucleotides and nucleosides has determined that Tlc4 transports CTP, UTP, and GDP. Intriguingly, although GTP was not transported by Tlc4, it was an inhibitor of CTP and UTP uptake and demonstrated a Ki similar to that of GDP. In addition, we demonstrate that Tlc5 transports GTP and GDP. We postulate that Tlc4 and Tlc5 serve the primary function of maintaining intracellular pools of nucleotides for rickettsial nucleic acid biosynthesis and do not provide the cell with nucleoside triphosphates as an energy source, as is the case for Tlc1. Although heterologous expression of Tlc2 and Tlc3 was observed in E. coli, we were unable to identify substrates for these proteins.


2015 ◽  
Author(s):  
Lorenzo Calviello ◽  
Neelanjan Mukherjee ◽  
Emanuel Wyler ◽  
Henrik Zauber ◽  
Antje Hirsekorn ◽  
...  

RNA sequencing protocols allow for quantifying gene expression regulation at each individual step, from transcription to protein synthesis. Ribosome Profiling (Ribo-seq) maps the positions of translating ribosomes over the entire transcriptome. Despite its great potential, a rigorous statistical approach to identify translated regions by means of the characteristic three-nucleotide periodicity of Ribo-seq data is not yet available. To fill this gap, we developed RiboTaper, which quantifies the significance of periodic Ribo-seq reads via spectral analysis methods. We applied RiboTaper on newly generated, deep Ribo-seq data in HEK293 cells, to derive an extensive map of translation that covers Open Reading Frame (ORF) annotations for more than 11,000 protein-coding genes. We also find distinct ribosomal signatures for several hundred detected upstream ORFs and ORFs in annotated non-coding genes (ncORFs). Mass spectrometry data confirms that RiboTaper achieves excellent coverage of the cellular proteome and validates dozens of novel peptide products. Collectively, RiboTaper (available at https://ohlerlab.mdc-berlin.de/software/) is a powerful method for comprehensive de novo identification of actively used ORFs in the human genome.
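
The three-nucleotide periodicity referred to in this abstract can be illustrated with a plain discrete Fourier transform evaluated at frequency 1/3. This is a deliberately simplified sketch and not RiboTaper's own procedure or significance test: the function name `frame_periodicity` and the input format (a list of per-nucleotide read counts along one ORF) are assumptions made for illustration.

```python
import cmath

PERIOD = 3  # codon length in nucleotides


def frame_periodicity(counts):
    """Return the fraction of signal variance carried by the 1/3 frequency.

    `counts` is assumed to hold one read count per nucleotide position.
    The result lies between 0 (no 3-nt periodicity) and 1 (purely 3-nt periodic).
    """
    usable = len(counts) - len(counts) % PERIOD  # trim to whole codons
    if usable == 0:
        return 0.0
    signal = counts[:usable]
    mean = sum(signal) / usable
    centered = [c - mean for c in signal]
    energy = sum(x * x for x in centered)
    if energy == 0:
        return 0.0  # flat coverage carries no periodic signal
    # DFT coefficient at frequency 1/3 cycles per nucleotide.
    coeff = sum(x * cmath.exp(-2j * cmath.pi * t / PERIOD)
                for t, x in enumerate(centered))
    # By Parseval's theorem the total power is usable * energy; the factor 2
    # accounts for the mirrored coefficient at frequency 2/3.
    return 2 * abs(coeff) ** 2 / (usable * energy)
```

A score near 1 indicates that reads pile up at the same codon position throughout the region, as expected for translating ribosomes; turning such a score into a p-value would require a statistical test, which this sketch does not attempt.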


2018 ◽  
Author(s):  
Matthew A. Conte ◽  
Rajesh Joshi ◽  
Emily C. Moore ◽  
Sri Pratima Nandamuri ◽  
William J. Gammerdinger ◽  
...  

Abstract
Background: African cichlid fishes are well known for their rapid radiations and are a model system for studying evolutionary processes. Here we compare multiple, high-quality, chromosome-scale genome assemblies to understand the genetic mechanisms underlying cichlid diversification and study how genome structure evolves in rapidly radiating lineages.
Results: We re-anchored our recent assembly of the Nile tilapia (Oreochromis niloticus) genome using a new high-density genetic map. We developed a new de novo genome assembly of the Lake Malawi cichlid, Metriaclima zebra, using high-coverage PacBio sequencing, and anchored contigs to linkage groups (LGs) using four different genetic maps. These new anchored assemblies allow the first chromosome-scale comparisons of African cichlid genomes. Large intra-chromosomal structural differences (~2-28 Mbp) among species are common, while inter-chromosomal differences are rare (<10 Mbp total). Placement of the centromeres within chromosome-scale assemblies identifies large structural differences that explain many of the karyotype differences among species. Structural differences are also associated with unique patterns of recombination on sex chromosomes. Structural differences on LG9, LG11 and LG20 are associated with reductions in recombination, indicative of inversions between the rock- and sand-dwelling clades of Lake Malawi cichlids. M. zebra has a larger number of recent transposable element (TE) insertions compared to O. niloticus, suggesting that several TE families have a higher rate of insertion in the haplochromine cichlid lineage.
Conclusion: This study identifies novel structural variation among East African cichlid genomes and provides a new set of genomic resources to support research on the mechanisms driving cichlid adaptation and speciation.

