High temporal resolution RNA-seq time course data reveals mammalian lncRNA activation mirrors neighbouring protein-coding genes

2021 ◽  
Author(s):  
Walter Muskovic ◽  
Eva Slavich ◽  
Ben Maslen ◽  
Dominik Kaczorowski ◽  
Joseph Cursons ◽  
...  

Background: The advent of next-generation sequencing revealed extensive transcription beyond protein-coding genes, identifying tens of thousands of long non-coding RNAs (lncRNAs). Selected functional examples raised the possibility that lncRNAs, as a class, may maintain broad regulatory roles. Compellingly, lncRNA expression is strongly linked with adjacent protein-coding gene expression, suggesting a potential cis-regulatory function. Evidence for these regulatory roles may be obtained through careful examination of the precise timing of lncRNA expression relative to adjacent protein-coding genes. Results: Where causal cis-regulatory relationships exist, lncRNA activation is expected to precede changes in adjacent target gene expression. Using an RNA-seq time course of uniquely high temporal resolution, we profiled the expression dynamics of several thousand lncRNAs and protein-coding genes in synchronized, transitioning human cells. Our findings reveal lncRNAs are expressed synchronously with adjacent protein-coding genes. Analysis of lipopolysaccharide-activated mouse dendritic cells revealed the same temporal relationship observed in transitioning human cells. Conclusion: Our findings suggest broad-scale cis-regulatory roles for lncRNAs are not common. The strong association between lncRNAs and adjacent genes may instead indicate an origin as transcriptional by-products from active protein-coding gene promoters and enhancers.
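The timing argument in this abstract (a cis-regulatory lncRNA should be activated before its target gene) can be illustrated with a lagged correlation between two expression profiles. The following is a minimal sketch under stated assumptions, not the authors' analysis: the function names `pearson` and `best_lag`, the `max_lag` parameter and the toy Gaussian profiles are all hypothetical and chosen only for illustration.

```python
from math import exp, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def best_lag(lnc, gene, max_lag=3):
    """Return (lag, r) for the shift that maximises the correlation.

    A positive lag means the lncRNA profile leads the gene profile by
    that many time points; a lag of zero means synchronous expression.
    Both profiles are assumed to have the same length and sampling times.
    """
    n = len(lnc)
    best, best_r = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = lnc[:n - lag], gene[lag:]   # compare lnc[t] with gene[t + lag]
        else:
            a, b = lnc[-lag:], gene[:n + lag]  # compare lnc[t] with gene[t - |lag|]
        r = pearson(a, b)
        if r > best_r:                         # NaN never wins this comparison
            best, best_r = lag, r
    return best, best_r


# Toy profiles (hypothetical data): a pulse of expression, and the same
# pulse shifted two time points earlier.
gene = [exp(-((i - 10) ** 2) / 8) for i in range(20)]
lnc_leading = [exp(-((i - 8) ** 2) / 8) for i in range(20)]

print(best_lag(lnc_leading, gene)[0])  # 2: the lncRNA leads
print(best_lag(gene, gene)[0])         # 0: synchronous
```

Under this sketch, the synchronous expression reported in the abstract would correspond to a best lag of zero for most lncRNA and gene pairs.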

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Mikhail Pomaznoy ◽  
Ashu Sethi ◽  
Jason Greenbaum ◽  
Bjoern Peters

Abstract: RNA-seq methods are widely utilized for transcriptomic profiling of biological samples. However, there are known caveats of this technology which can skew gene expression estimates. Specifically, if the library preparation protocol does not retain RNA strand information, then some genes can be erroneously quantitated. Although strand-specific protocols have been established, a significant portion of RNA-seq data is generated in a non-strand-specific manner. We used a comprehensive stranded RNA-seq dataset of 15 blood cell types to identify genes for which expression would be erroneously estimated if strand information were not available. We found that about 10% of all genes and 2.5% of protein-coding genes have a two-fold or higher difference in estimated expression when the strand information of the reads is ignored. We used parameters of the read alignments of these genes to construct a machine learning model that can identify which genes in an unstranded dataset might have incorrect expression estimates and which ones do not. We also show that differential expression analysis of genes with biased expression estimates in unstranded read data can be recovered by limiting the reads considered to those which span exonic boundaries. The resulting approach is implemented as a package available at https://github.com/mikpom/uslcount.
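The two-fold criterion mentioned in this abstract can be sketched as a simple comparison of per-gene counts obtained with and without strand information. This is only an illustration of that comparison step, not the uslcount package or its machine learning model; the function name `flag_strand_sensitive`, the `fold` and `pseudocount` parameters, the gene names and the counts are all hypothetical.

```python
def flag_strand_sensitive(stranded, unstranded, fold=2.0, pseudocount=1.0):
    """Return genes whose unstranded estimate differs from the stranded
    estimate by at least `fold` in either direction.

    `stranded` and `unstranded` map gene name -> read count. A pseudocount
    (an assumption of this sketch) avoids division by zero.
    """
    flagged = {}
    for gene, s in stranded.items():
        u = unstranded.get(gene)
        if u is None:
            continue  # gene not quantified in the unstranded run
        ratio = (u + pseudocount) / (s + pseudocount)
        if ratio >= fold or ratio <= 1.0 / fold:
            flagged[gene] = ratio
    return flagged


# Hypothetical counts: GENE_B and GENE_C pick up antisense reads
# when strand information is ignored.
stranded = {"GENE_A": 100, "GENE_B": 40, "GENE_C": 9}
unstranded = {"GENE_A": 110, "GENE_B": 120, "GENE_C": 29}

print(sorted(flag_strand_sensitive(stranded, unstranded)))  # ['GENE_B', 'GENE_C']
```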


2016 ◽  
Author(s):  
Runxuan Zhang ◽  
Cristiane P. G. Calixto ◽  
Yamile Marquez ◽  
Peter Venhuizen ◽  
Nikoleta A. Tzioutziou ◽  
...  

Background: Alternative splicing is the major post-transcriptional mechanism by which gene expression is regulated and affects a wide range of processes and responses in most eukaryotic organisms. RNA-sequencing (RNA-seq) can generate genome-wide quantification of individual transcript isoforms to identify changes in expression and alternative splicing. RNA-seq is an essential modern tool but its ability to accurately quantify transcript isoforms depends on the diversity, completeness and quality of the transcript information. Results: We have developed a new Reference Transcript Dataset for Arabidopsis (AtRTD2) for RNA-seq analysis containing over 82k non-redundant transcripts, whereby 74,194 transcripts originate from 27,667 protein-coding genes. A total of 13,524 protein-coding genes have at least one alternatively spliced transcript in AtRTD2 such that about 60% of the 22,453 protein-coding, intron-containing genes in Arabidopsis undergo alternative splicing. More than 600 putative U12 introns were identified in more than 2,000 transcripts. AtRTD2 was generated from transcript assemblies of ca. 8.5 billion pairs of reads from 285 RNA-seq data sets obtained from 129 RNA-seq libraries and merged along with the previous version, AtRTD, and Araport11 transcript assemblies. AtRTD2 increases the diversity of transcripts and through application of stringent filters represents the most extensive and accurate transcript collection for Arabidopsis to date. We have demonstrated a generally good correlation of alternative splicing ratios from RNA-seq data analysed by Salmon and experimental data from high resolution RT-PCR. However, we have observed inaccurate quantification of transcript isoforms for genes with multiple transcripts which have variation in the lengths of their UTRs. This variation is not effectively corrected in RNA-seq analysis programmes and will therefore impact RNA-seq analyses generally. To address this, we have tested different genome-wide modifications of AtRTD2 to improve transcript quantification and alternative splicing analysis. As a result, we release AtRTD2-QUASI specifically for use in Quantification of Alternatively Spliced Isoforms and demonstrate that it out-performs other available transcriptomes for RNA-seq analysis. Conclusions: We have generated a new transcriptome resource for RNA-seq analyses in Arabidopsis (AtRTD2) designed to address quantification of different isoforms and alternative splicing in gene expression studies. Experimental validation of alternative splicing changes identified inaccuracies in transcript quantification due to UTR length variation. To solve this problem, we also release a modified reference transcriptome, AtRTD2-QUASI, for quantification of transcript isoforms, which shows high correlation with experimental data.


2016 ◽  
Vol 2016 ◽  
pp. 1-14 ◽  
Author(s):  
Yan Ren ◽  
Christian I. Hong ◽  
Sookkyung Lim ◽  
Seongho Song

Identification of rhythmic gene expression from metabolic cycles to circadian rhythms is crucial for understanding the gene regulatory networks and functions of these biological processes. Recently, two algorithms, JTK_CYCLE and ARSER, have been developed to estimate periodicity of rhythmic gene expression. JTK_CYCLE performs well for long or less noisy time series, while ARSER performs well for detecting a single rhythmic category. However, observing gene expression at high temporal resolution is not always feasible, and many scientists are interested in exploring both ultradian and circadian rhythmic categories simultaneously. In this paper, a new algorithm, named autoregressive Bayesian spectral regression (ABSR), is proposed. It estimates the period of time-course experimental data and classifies gene expression profiles into multiple rhythmic categories simultaneously. Through the simulation studies, it is shown that ABSR substantially improves the accuracy of periodicity estimation and clustering of rhythmic categories as compared to JTK_CYCLE and ARSER for the data with low temporal resolution. Moreover, ABSR is insensitive to rhythmic patterns. This new scheme is applied to existing time-course mouse liver data to estimate period of rhythms and classify the genes into ultradian, circadian, and arrhythmic categories. It is observed that 49.2% of the circadian profiles detected by JTK_CYCLE with 1-hour resolution are also detected by ABSR with only 4-hour resolution.


2021 ◽  
Author(s):  
D. Kersebaum ◽  
S.‐C. Fabig ◽  
M. Sendel ◽  
A. C. Muntean ◽  
R. Baron ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Verônica R. de Melo Costa ◽  
Julianus Pfeuffer ◽  
Annita Louloupi ◽  
Ulf A. V. Ørom ◽  
Rosario M. Piro

Background: Introns are generally removed from primary transcripts to form mature RNA molecules in a post-transcriptional process called splicing. An efficient splicing of primary transcripts is an essential step in gene expression and its misregulation is related to numerous human diseases. Thus, to better understand the dynamics of this process and the perturbations that might be caused by aberrant transcript processing it is important to quantify splicing efficiency. Results: Here, we introduce SPLICE-q, a fast and user-friendly Python tool for genome-wide SPLICing Efficiency quantification. It supports studies focusing on the implications of splicing efficiency in transcript processing dynamics. SPLICE-q uses aligned reads from strand-specific RNA-seq to quantify splicing efficiency for each intron individually and allows the user to select different levels of restrictiveness concerning the introns’ overlap with other genomic elements such as exons of other genes. We applied SPLICE-q to globally assess the dynamics of intron excision in yeast and human nascent RNA-seq. We also show its application using total RNA-seq from a patient-matched prostate cancer sample. Conclusions: Our analyses illustrate that SPLICE-q is suitable to detect a progressive increase of splicing efficiency throughout a time course of nascent RNA-seq and it might be useful when it comes to understanding cancer progression beyond mere gene expression levels. SPLICE-q is available at: https://github.com/vrmelo/SPLICE-q
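As a rough illustration of per-intron splicing efficiency, the sketch below computes the fraction of junction-covering reads that are split (spliced) rather than unsplit (still spanning the exon-intron boundary). The abstract does not state the formula, so this ratio is an assumed definition for illustration and should not be read as SPLICE-q's actual implementation or API; the function name `splicing_efficiency` and its arguments are hypothetical.

```python
def splicing_efficiency(split_5p, unsplit_5p, split_3p, unsplit_3p):
    """Fraction of reads at an intron's two splice junctions that are split.

    split_*   : reads spliced across the junction (intron already removed)
    unsplit_* : reads running continuously across the exon-intron boundary
    Returns None when no reads cover either junction.
    This ratio is an assumed definition used only for this sketch.
    """
    total = split_5p + unsplit_5p + split_3p + unsplit_3p
    if total == 0:
        return None
    return (split_5p + split_3p) / total


# Hypothetical read counts for one intron at two time points of a
# nascent RNA-seq time course.
early = splicing_efficiency(split_5p=5, unsplit_5p=15, split_3p=5, unsplit_3p=15)
late = splicing_efficiency(split_5p=30, unsplit_5p=10, split_3p=30, unsplit_3p=10)
print(early, late)  # 0.25 0.75
```

With these toy counts the value rises from the early to the late time point, which is the kind of progressive increase the abstract describes.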


2017 ◽  
Author(s):  
Cristina Cruz ◽  
Monica Della Rosa ◽  
Christel Krueger ◽  
Qian Gao ◽  
Lucy Field ◽  
...  

Abstract: Transcription of protein coding genes is accompanied by recruitment of COMPASS to promoter-proximal chromatin, which deposits di- and tri-methylation on histone H3 lysine 4 (H3K4) to form H3K4me2 and H3K4me3. Here we determine the importance of COMPASS in maintaining gene expression across lifespan in budding yeast. We find that COMPASS mutations dramatically reduce replicative lifespan and cause widespread gene expression defects. Known repressive functions of H3K4me2 are progressively lost with age, while hundreds of genes become dependent on H3K4me3 for full expression. Induction of these H3K4me3 dependent genes is also impacted in young cells lacking COMPASS components including the H3K4me3-specific factor Spp1. Remarkably, the genome-wide occurrence of H3K4me3 is progressively reduced with age despite widespread transcriptional induction, minimising the normal positive correlation between promoter H3K4me3 and gene expression. Our results provide clear evidence that H3K4me3 is required to attain normal expression levels of many genes across organismal lifespan.


2021 ◽  
Author(s):  
Dennis A Sun ◽  
Nipam H Patel

Abstract: Emerging research organisms enable the study of biology that cannot be addressed using classical “model” organisms. The development of novel data resources can accelerate research in such animals. Here, we present new functional genomic resources for the amphipod crustacean Parhyale hawaiensis, facilitating the exploration of gene regulatory evolution using this emerging research organism. We use Omni-ATAC-Seq, an improved form of the Assay for Transposase-Accessible Chromatin coupled with next-generation sequencing (ATAC-Seq), to identify accessible chromatin genome-wide across a broad time course of Parhyale embryonic development. This time course encompasses many major morphological events, including segmentation, body regionalization, gut morphogenesis, and limb development. In addition, we use short- and long-read RNA-Seq to generate an improved Parhyale genome annotation, enabling deeper classification of identified regulatory elements. We leverage a variety of bioinformatic tools to discover differential accessibility, predict nucleosome positioning, infer transcription factor binding, cluster peaks based on accessibility dynamics, classify biological functions, and correlate gene expression with accessibility. Using a Minos transposase reporter system, we demonstrate the potential to identify novel regulatory elements using this approach, including distal regulatory elements. This work provides a platform for the identification of novel developmental regulatory elements in Parhyale, and offers a framework for performing such experiments in other emerging research organisms.

Primary Findings
- Omni-ATAC-Seq identifies cis-regulatory elements genome-wide during crustacean embryogenesis
- Combined short- and long-read RNA-Seq improves the Parhyale genome annotation
- ImpulseDE2 analysis identifies dynamically regulated candidate regulatory elements
- NucleoATAC and HINT-ATAC enable inference of nucleosome occupancy and transcription factor binding
- Fuzzy clustering reveals peaks with distinct accessibility and chromatin dynamics
- Integration of accessibility and gene expression reveals possible enhancers and repressors
- Omni-ATAC can identify known and novel regulatory elements

