A Unified Probabilistic Modeling Framework for Eukaryotic Transcription Based on Nascent RNA Sequencing Data

2021 ◽  
Author(s):  
Adam Siepel

Abstract: Nascent RNA sequencing protocols, such as GRO-seq and PRO-seq, are now widely used in the study of eukaryotic transcription, and these experimental techniques have given rise to a variety of statistical and machine-learning methods for data analysis. These computational methods, however, are generally designed to address specialized signal-processing or prediction tasks, rather than directly describing the dynamics of RNA polymerases as they move along the DNA template. Here, I introduce a general probabilistic model that describes the kinetics of transcription initiation, elongation, pause release, and termination, as well as the generation of sequencing read counts. I show that this generative model enables estimation of separate rates of initiation, pause-release, and termination, up to a proportionality constant. Furthermore, if applied to time-course data in a nonequilibrium setting, the model can be used to estimate elongation rates. This model additionally leads naturally to likelihood ratio tests for differences between genes, conditions, or species in various rates of interest. A version of the model in which read counts are assumed to be Poisson-distributed leads to convenient, closed-form solutions for parameter estimates and likelihood ratio tests. I present extensions to Bayesian inference and to a generalized linear model that can be used to discover genomic features associated with rates of elongation. Finally, I address technicalities concerning estimation of library size, normalization and sequencing replicates. Altogether, this modeling framework enables a unified treatment of many common tasks in the analysis of nascent RNA sequencing data.
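The abstract notes that Poisson-distributed read counts give closed-form estimates and likelihood ratio tests. As a rough illustration of that general idea only, and not the model from the paper, the sketch below runs a generic two-sample Poisson likelihood ratio test on read counts. The function name `poisson_lrt` and the `exposure` arguments (for instance, region length multiplied by a library-size scaling factor) are assumptions introduced here for illustration.

```python
import math


def poisson_lrt(count_a, exposure_a, count_b, exposure_b):
    """Generic likelihood ratio test for equal Poisson rates in two samples.

    Hypothetical helper, not taken from the paper. Each count is modeled
    as Poisson(rate * exposure). Returns (test statistic, p-value).
    """

    def loglik(count, mean):
        # Poisson log-likelihood without log(count!), which cancels in the ratio.
        if mean == 0.0:
            return 0.0 if count == 0 else float("-inf")
        return count * math.log(mean) - mean

    # Closed-form maximum-likelihood rates: separate per sample, and pooled.
    rate_a = count_a / exposure_a
    rate_b = count_b / exposure_b
    pooled = (count_a + count_b) / (exposure_a + exposure_b)

    ll_alt = loglik(count_a, rate_a * exposure_a) + loglik(count_b, rate_b * exposure_b)
    ll_null = loglik(count_a, pooled * exposure_a) + loglik(count_b, pooled * exposure_b)

    statistic = max(2.0 * (ll_alt - ll_null), 0.0)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    stat, p = poisson_lrt(count_a=100, exposure_a=1.0, count_b=200, exposure_b=1.0)
    print(f"statistic={stat:.2f}, p={p:.3g}")
```

The p-value relies on the usual asymptotic chi-square approximation with one degree of freedom, so it is only a reasonable guide when counts are not very small.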

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Katla Kristjánsdóttir ◽  
Alexis Dziubek ◽  
Hyun Min Kang ◽  
Hojoong Kwak

Abstract: Enhancer RNAs (eRNA) are unstable non-coding RNAs, transcribed bidirectionally from active regulatory sequences, whose expression levels correlate with enhancer activity. We use capped-nascent-RNA sequencing to efficiently capture bidirectional transcription initiation across several human lymphoblastoid cell lines (Yoruba population) and detect ~75,000 eRNA transcription sites with high sensitivity and specificity. The use of nascent-RNA sequencing sidesteps the confounding effect of eRNA instability. We identify quantitative trait loci (QTLs) associated with the level and directionality of eRNA expression. High-resolution analyses of these two types of QTLs reveal distinct positions of enrichment at the central transcription factor (TF) binding regions and at the flanking eRNA initiation regions, both of which are associated with mRNA expression QTLs. These two regions—the central TF-binding footprint and the eRNA initiation cores—define a bipartite architecture of enhancers, inform enhancer function, and can be used as an indicator of the significance of non-coding regulatory variants.


2021 ◽  
Author(s):  
Yixin Zhao ◽  
Noah Dukler ◽  
Gilad Barshad ◽  
Shushan Toneyan ◽  
Charles G. Danko ◽  
...  

Abstract: Quantification of mature-RNA isoform abundance from RNA-seq data has been extensively studied, but much less attention has been devoted to quantifying the abundance of distinct precursor RNAs based on nascent RNA sequencing data. Here we address this problem with a new computational method called Deconvolution of Expression for Nascent RNA sequencing data (DENR). DENR models the nascent RNA read counts at each locus as a mixture of user-provided isoforms. The performance of the baseline algorithm is enhanced by the use of machine-learning predictions of transcription start sites (TSSs) and an adjustment for the typical “shape profile” of read counts along a transcription unit. We show using simulated data that DENR clearly outperforms simple read-count-based methods for estimating the abundances of both whole genes and isoforms. By applying DENR to previously published PRO-seq data from K562 and CD4+ T cells, we find that transcription of multiple isoforms per gene is widespread, and the dominant isoform frequently makes use of an internal TSS. We also identify > 200 genes whose dominant isoforms make use of different TSSs in these two cell types. Finally, we apply DENR and StringTie to newly generated PRO-seq and RNA-seq data, respectively, for human CD4+ T cells and CD14+ monocytes, and show that entropy at the pre-RNA level makes a disproportionate contribution to overall isoform diversity, especially across cell types. Altogether, DENR is the first computational tool to enable abundance quantification of pre-RNA isoforms based on nascent RNA sequencing data, and it reveals high levels of pre-RNA isoform diversity in human cells.


2022 ◽  
Vol 3 (1) ◽  
pp. 101036
Author(s):  
Adelina Rabenius ◽  
Sajitha Chandrakumaran ◽  
Lea Sistonen ◽  
Anniina Vihervaara

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Weipeng Mo ◽  
Bo Liu ◽  
Hong Zhang ◽  
Xianhao Jin ◽  
Dongdong Lu ◽  
...  

Abstract: Background. The dynamic process of transcription termination produces transient RNA intermediates that are difficult to distinguish from each other via short-read sequencing methods. Results. Here, we use single-molecule nascent RNA sequencing to characterize the various forms of transient RNAs during termination at genome-wide scale in wildtype Arabidopsis and in atxrn3, fpa, and met1 mutants. Our data reveal a wide range of termination windows among genes, ranging from ~50 nt to over 1000 nt. We also observe efficient termination before downstream tRNA genes, suggesting that chromatin structure around the promoter region of tRNA genes may block pol II elongation. 5′ Cleaved readthrough transcription in atxrn3 with delayed termination can run into downstream genes to produce normally spliced and polyadenylated mRNAs in the absence of their own transcription initiation. Consistent with previous reports, we also observe long chimeric transcripts with cryptic splicing in the fpa mutant, but loss of CG DNA methylation has no obvious impact on termination in the met1 mutant. Conclusions. Our method is applicable to establish a comprehensive termination landscape in a broad range of species.


1992 ◽  
Vol 43 (5) ◽  
pp. 1229 ◽  
Author(s):  
K.R. Rowling ◽  
DD Reid

Estimates of von Bertalanffy growth parameters were made for mature (>3-year-old) gemfish (Rexea solandri), using otoliths sampled biennially from commercial catches between 1980 and 1986. During this period, there was a significant decline in the mean length of mature gemfish in the catch. Large variations in growth-parameter estimates were found over the period sampled (e.g. L∞ ranged between 87.3 and 130.3 cm for males and between 113.1 and 134.7 cm for females). Likelihood-ratio tests showed many of the growth-parameter estimates to be significantly different between the years sampled. Inclusion in the analyses of data for juvenile (1- to 3-year-old) fish considerably reduced both the standard errors of the parameter estimates and the significance of the variations between them. Comparison of parameters estimated by following the growth of individual cohorts spawned in 1975 and 1981 suggested that the variations in the parameters for the 1980-86 data were caused by differences in the length composition of the samples rather than by changes in the growth rate of the fish. The results suggest the need for care when comparing growth parameters estimated for different populations, especially when there are large differences in length composition. The best estimates of growth parameters for gemfish were considered to be those derived from the aggregate data for the whole of the period sampled, including the data for juveniles. For male gemfish these best estimates (with asymptotic standard errors in parentheses) were L∞ = 97.5 (0.8) cm, K = 0.212 (0.005) year⁻¹ and t₀ = -0.54 (0.05) years, and for females the estimates were L∞ = 109.4 (0.6) cm, K = 0.180 (0.003) year⁻¹ and t₀ = -0.63 (0.04) years. Likelihood-ratio tests showed these estimates of L∞ and K to be significantly different between the sexes.
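To make the reported parameters concrete, the sketch below evaluates the growth curve at a few ages, assuming the standard three-parameter von Bertalanffy form L(t) = L∞ (1 − exp(−K (t − t₀))). The abstract does not state which parameterization was fitted, so that form is an assumption, and the function and constant names are hypothetical. The parameter values are the best estimates quoted above.

```python
import math


def von_bertalanffy_length(age_years, l_inf_cm, k_per_year, t0_years):
    """Expected length (cm) at a given age under the standard von Bertalanffy curve."""
    return l_inf_cm * (1.0 - math.exp(-k_per_year * (age_years - t0_years)))


# Best estimates quoted in the abstract: (L_inf in cm, K per year, t0 in years).
MALE_PARAMS = (97.5, 0.212, -0.54)
FEMALE_PARAMS = (109.4, 0.180, -0.63)

if __name__ == "__main__":
    for age in (1, 3, 5, 10):
        male_cm = von_bertalanffy_length(age, *MALE_PARAMS)
        female_cm = von_bertalanffy_length(age, *FEMALE_PARAMS)
        print(f"age {age:>2} y: male {male_cm:5.1f} cm, female {female_cm:5.1f} cm")
```

Under this form, length is zero at age t₀ and approaches L∞ as age increases, which is why differences in the length composition of a sample can shift the fitted L∞ and K.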


2020 ◽  
Author(s):  
Mo Huang ◽  
Zhaojun Zhang ◽  
Nancy R. Zhang

Abstract: Confounding variation, such as batch effects, is a pervasive issue in single-cell RNA sequencing experiments. While methods exist for aligning cells across batches, it is yet unclear how to correct for other types of confounding variation which may be observed at the subject level, such as age and sex, and at the cell level, such as library size and other measures of cell quality. On the specific problem of batch alignment, many questions still persist despite recent advances: Existing methods can effectively align batches in low-dimensional representations of cells, yet their effectiveness in aligning the original gene expression matrices is unclear. Nor is it clear how batch correction can be performed alongside data denoising, the former treating technical biases due to experimental stratification while the latter treating technical variation due inherently to the random sampling that occurs during library construction and sequencing. Here, we propose SAVERCAT, a method for dimension reduction and denoising of single-cell gene expression data that can flexibly adjust for arbitrary observed covariates. We benchmark SAVERCAT against existing single-cell batch correction methods and show that while it matches the best of the field in low-dimensional cell alignment, it significantly improves upon existing methods on the task of batch correction in the high-dimensional expression matrix. We also demonstrate the ability of SAVERCAT to effectively integrate batch correction and denoising through a data down-sampling experiment. Finally, we apply SAVERCAT to a single-cell study of Alzheimer’s disease where batch is confounded with the contrast of interest, and demonstrate how adjusting for covariates other than batch allows for more interpretable analysis.


2018 ◽  
Author(s):  
Athma A. Pai ◽  
Joseph Paggi ◽  
Karen Adelman ◽  
Christopher B. Burge

Abstract: Recursive splicing, a process by which a single intron is removed from pre-mRNA transcripts in multiple distinct segments, has been observed in a small subset of Drosophila melanogaster introns. However, detection of recursive splicing requires observation of splicing intermediates which are inherently unstable, making it difficult to study. Here we developed new computational approaches to identify recursively spliced introns and applied them, in combination with existing methods, to nascent RNA sequencing data from Drosophila S2 cells. These approaches identified hundreds of novel sites of recursive splicing, expanding the catalog of recursively spliced fly introns by 4-fold. Recursive sites occur in most very long (> 40 kb) fly introns, including many genes involved in morphogenesis and development, and tend to occur near the midpoints of introns. Suggesting a possible function for recursive splicing, we observe that fly introns with recursive sites are spliced more accurately than comparably sized non-recursive introns.


2021 ◽  
Author(s):  
Adelina Rabenius ◽  
Sajitha Chandrakumaran ◽  
Lea Sistonen ◽  
Anniina Vihervaara

Nascent RNA-sequencing tracks transcription at nucleotide resolution. The genomic distribution of engaged transcription complexes, in turn, uncovers functional genomic regions. Here, we provide data-analytical steps to 1) identify transcribed regulatory elements de novo genome-wide, 2) quantify engaged transcription complexes at enhancers, promoter-proximal regions, divergent transcripts, gene bodies and termination windows, and 3) measure distribution of transcription machineries and regulatory proteins across functional genomic regions. This protocol follows RNA synthesis and genome-regulation in mammals, as demonstrated in human K562 erythroleukemia cells.

