Comparative transcriptomics and ribo-seq: Looking at de novo gene emergence in Saccharomycotina

2017 ◽  
Author(s):  
William Blevins ◽  
Mar Albà ◽  
Lucas Carey

In de novo gene emergence, a segment of non-coding DNA undergoes a series of changes that enable transcription, potentially leading to a new protein that could eventually acquire a novel function. Due to their recent origins, young de novo genes have no homology with other genes. Furthermore, de novo genes may not initially be under the same selective constraints as other genes. Dozens of de novo genes have recently been identified in many diverse species; however, the mechanisms leading to their appearance are not yet well understood. To study this phenomenon, we performed deep RNA sequencing (RNA-seq) on 11 species of yeast from the phylum Ascomycota in both rich media and oxidative stress conditions. Furthermore, we performed ribosome profiling (Ribo-seq) experiments in both conditions with S. cerevisiae. These data were used to classify the conservation of genes at different depths in the yeast phylogeny. Hundreds of genes in each species were novel (unannotated), and many were identified as putative de novo genes; these candidates were then tested for signals of translation using our Ribo-seq data. We show that putative de novo genes have different properties relative to phylogenetically conserved genes. This comparative phylotranscriptomic analysis advances our understanding of de novo gene origins.


2018 ◽  
Author(s):  
Claudio Casola

Abstract The evolution of novel protein-coding genes from noncoding regions of the genome is one of the most compelling lines of evidence for genetic innovation in nature. One popular approach to identifying de novo genes is phylostratigraphy, which consists of determining the approximate time of origin (age) of a gene based on its distribution along a species phylogeny. Several studies have revealed significant flaws in determining the age of genes, including de novo genes, using phylostratigraphy alone. However, the rate of false positives in de novo gene surveys based on phylostratigraphy remains unknown. Here, I re-analyze the findings from three studies, two of which identified tens to hundreds of rodent-specific de novo genes by adopting a phylostratigraphy-centered approach. Most of the putative de novo genes discovered in these investigations are no longer included in recently updated mouse gene sets. Using a combination of synteny information and sequence similarity searches, I show that about 60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with non-rodent mammals. These results led to an estimated rate of ∼12 de novo genes per million years in mouse. Contrary to a previous study (Wilson et al. 2017), I found no evidence supporting the preadaptation hypothesis of de novo gene formation. Nearly half of the de novo genes confirmed in this study are within older genes, indicating that co-option of preexisting regulatory regions and a higher GC content may facilitate the origin of novel genes.
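The core idea of phylostratigraphy described above can be illustrated with a minimal Python sketch. It is not the pipeline used in any of the studies mentioned: the lineage, the clade names, and the function names are hypothetical, and the homology hits are assumed to come from an upstream similarity search. A gene is assigned to the oldest clade in which a homolog was detected, and genes restricted to the youngest clades become de novo candidates that still need independent checks such as synteny.

```python
from typing import List, Set

# Hypothetical lineage for a mouse gene, ordered from youngest to oldest clade.
LINEAGE: List[str] = ["Mus", "Rodentia", "Mammalia", "Vertebrata"]


def assign_phylostratum(homolog_clades: Set[str],
                        lineage: List[str] = LINEAGE) -> str:
    """Return the oldest clade of `lineage` in which a homolog was detected.

    A gene with no detected homolog outside the focal taxon falls into the
    youngest stratum (the first element of `lineage`).
    """
    oldest_index = 0
    for index, clade in enumerate(lineage):
        if clade in homolog_clades:
            oldest_index = index
    return lineage[oldest_index]


def is_young_candidate(stratum: str,
                       young_strata: Set[str] = frozenset({"Mus", "Rodentia"})) -> bool:
    """Flag genes whose stratum is lineage-restricted.

    This is only a first-pass filter: as the abstract stresses, such
    candidates can be false positives without synteny and duplication checks.
    """
    return stratum in young_strata


if __name__ == "__main__":
    # Illustrative, made-up hit sets for two hypothetical genes.
    for name, hits in {"geneA": {"Mus"}, "geneB": {"Mus", "Mammalia"}}.items():
        stratum = assign_phylostratum(hits)
        print(name, stratum, is_young_candidate(stratum))
```

The sketch deliberately leaves out homology detection itself, since missed homologs at that step are exactly the source of the false positives the abstract discusses.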


Genetics ◽  
2021 ◽  
Author(s):  
Julie M Cridland ◽  
Alex C Majane ◽  
Li Zhao ◽  
David J Begun

Abstract Early work on de novo gene discovery in Drosophila was consistent with the idea that many such genes have male-biased patterns of expression, including a large number expressed in the testis. However, there has been little formal analysis of variation in the abundance and properties of de novo genes expressed in different tissues. Here we investigate the population biology of recently evolved de novo genes expressed in the D. melanogaster accessory gland, a somatic male tissue that plays an important role in male and female fertility and the post-mating response of females. We use the same collection of inbred lines used previously to identify testis-expressed de novo genes, which allows direct cross-tissue comparisons of these genes in two tissues of male reproduction. Using RNA-seq data, we identify candidate de novo genes located in annotated intergenic and intronic sequence and determine the properties of these genes, including chromosomal location, expression, abundance, and coding capacity. Generally, we find major differences between the tissues in terms of gene abundance and expression, though other properties such as transcript length and chromosomal distribution are more similar. We also explore differences between regulatory mechanisms of de novo genes in the two tissues and how such differences may interact with selection to produce differences in D. melanogaster de novo genes expressed in the two tissues.


2021 ◽  
Author(s):  
Somya Mani ◽  
Tsvi Tlusty

Contrary to long-held views, recent evidence indicates that de novo birth of genes is not only possible but is surprisingly prevalent: a substantial fraction of eukaryotic genomes is composed of orphan genes, which show no homology with any conserved genes. In addition, a remarkably large proportion of orphan genes likely originated de novo from non-genic regions. Here, using a parsimonious mathematical model, we investigate the probability and timescale of de novo gene birth due to spontaneous mutations. We trace how an initially non-genic locus accumulates beneficial mutations to become a gene. We sample across a wide range of biologically feasible distributions of fitness effects (DFE) of mutations, and calculate the conditions conducive to gene birth. We find that in a time frame of millions of years, gene birth is highly likely for a wide range of DFEs. Moreover, when we allow DFEs to fluctuate, which is expected given the long time frame, gene birth in the model becomes practically inevitable. This supports the idea that gene birth is a ubiquitous process and should occur in a wide variety of organisms. Our results also demonstrate that intergenic regions are not inactive and silent but are more like dynamic storehouses of potential genes.
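To make the kind of model described above more concrete, here is a toy Python sketch. It is not the authors' model, whose details are not given in the abstract. Everything in it is an assumption made for illustration: a normal DFE, the standard diffusion approximation for the fixation probability of a new mutation in a diploid population, an additive fitness contribution for the locus, and an arbitrary "gene birth" threshold.

```python
import math
import random
from typing import Optional


def fixation_probability(s: float, pop_size: int) -> float:
    """Diffusion approximation for a new mutation with selection coefficient s
    in a diploid population of size pop_size (initial frequency 1/(2N)):
    (1 - exp(-2s)) / (1 - exp(-4Ns))."""
    if abs(s) < 1e-12:
        return 1.0 / (2 * pop_size)  # neutral limit
    exponent = -4.0 * pop_size * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious; also avoids float overflow
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def simulate_gene_birth(dfe_mean: float, dfe_sd: float, pop_size: int,
                        threshold: float, max_steps: int,
                        rng: random.Random) -> Optional[int]:
    """Return the step at which the locus first reaches `threshold`, or None.

    Each step proposes one mutation with an effect drawn from a normal DFE
    (an assumption); the effect is added to the locus only if it fixes.
    """
    locus_fitness = 0.0
    for step in range(1, max_steps + 1):
        s = rng.gauss(dfe_mean, dfe_sd)
        if rng.random() < fixation_probability(s, pop_size):
            locus_fitness += s
        if locus_fitness >= threshold:
            return step
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameters: mostly deleterious mutations, rare beneficial ones.
    result = simulate_gene_birth(dfe_mean=-0.01, dfe_sd=0.01, pop_size=1000,
                                 threshold=0.1, max_steps=100_000, rng=rng)
    print("gene birth at step:", result)
```

Repeating the simulation over many DFE parameter pairs, and letting the parameters drift between steps, would mimic the sampling and fluctuation experiments the abstract describes, though only in spirit.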


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
William R. Blevins ◽  
Jorge Ruiz-Orera ◽  
Xavier Messeguer ◽  
Bernat Blasco-Moreno ◽  
José Luis Villanueva-Cañas ◽  
...  

Abstract De novo gene origination has been recently established as an important mechanism for the formation of new genes. In organisms with a large genome, intergenic and intronic regions provide plenty of raw material for new transcriptional events to occur, but little is known about how de novo transcripts originate in more densely packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown in two different conditions. We find that about half of the de novo transcripts are expressed from regions which already harbor other genes in the opposite orientation; these transcripts show similar expression changes in response to stress as their overlapping counterparts, and some appear to translate small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Surajit Bhattacharya ◽  
Hayk Barseghyan ◽  
Emmanuèle C. Délot ◽  
Eric Vilain

Abstract

Background: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity.

Results: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient’s phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset), allowing the user to assess the effects of SVs on the transcriptome. Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. We evaluated nanotatoR’s annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM.

Conclusions: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
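The overlap percentages and nearest-gene distances mentioned in the Results can be illustrated with a short Python sketch. This is not nanotatoR's code (the package is written in R) and does not reproduce its API. The type and function names are hypothetical, coordinates are assumed to follow the half-open BED convention, and expressing overlap as a percentage of the gene length (rather than the SV length) is an assumption.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Genomic interval using the BED convention: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlap_bp(sv: Interval, gene: Interval) -> int:
    """Number of bases shared by the SV and the gene (0 if none)."""
    if sv.chrom != gene.chrom:
        return 0
    return max(0, min(sv.end, gene.end) - max(sv.start, gene.start))


def percent_gene_overlapped(sv: Interval, gene: Interval) -> float:
    """Percentage of the gene's length that is covered by the SV."""
    return 100.0 * overlap_bp(sv, gene) / (gene.end - gene.start)


def distance_to_gene(sv: Interval, gene: Interval) -> Optional[int]:
    """Gap in bases between SV and gene; 0 if they overlap or touch,
    None if they lie on different chromosomes."""
    if sv.chrom != gene.chrom:
        return None
    return max(0, gene.start - sv.end, sv.start - gene.end)


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    sv = Interval("chr1", 100, 200)
    for gene in (Interval("chr1", 150, 250), Interval("chr1", 300, 400)):
        print(gene, percent_gene_overlapped(sv, gene), distance_to_gene(sv, gene))
```

A filtration step of the kind the abstract describes could then keep only SVs whose overlap percentage or distance passes a user-chosen cutoff.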

