Quantifying RNA Synthesis at Rate-Limiting Steps of Transcription Using Nascent RNA-Sequencing Data

Nascent RNA-sequencing tracks transcription at nucleotide resolution. The genomic distribution of engaged transcription complexes, in turn, uncovers functional genomic regions. Here, we provide data-analytical steps to 1) identify transcribed regulatory elements de novo genome-wide, 2) quantify engaged transcription complexes at enhancers, promoter-proximal regions, divergent transcripts, gene bodies and termination windows, and 3) measure the distribution of transcription machineries and regulatory proteins across functional genomic regions. This protocol follows RNA synthesis and genome regulation in mammals, as demonstrated in human K562 erythroleukemia cells.
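
As a rough illustration of step 2, the sketch below counts engaged transcription complexes per functional region and normalizes by region length. It is not the protocol's own code: the function names, the region coordinates and the representation of each complex as a single integer position are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # half-open interval [start, end) on one strand


def count_in_region(positions: List[int], region: Region) -> int:
    """Count mapped complex positions that fall inside the region."""
    start, end = region
    return sum(1 for p in positions if start <= p < end)


def region_densities(positions: List[int],
                     regions: Dict[str, Region]) -> Dict[str, float]:
    """Return complexes per nucleotide for each named region."""
    densities = {}
    for name, (start, end) in regions.items():
        length = end - start
        densities[name] = count_in_region(positions, (start, end)) / length
    return densities


# Hypothetical gene: 100-nt promoter-proximal window, 1000-nt gene body.
example_regions = {"promoter_proximal": (0, 100), "gene_body": (100, 1100)}
example_positions = [20, 45, 50, 60, 300, 700]
example_densities = region_densities(example_positions, example_regions)
```

Length normalization makes short windows, such as promoter-proximal regions, comparable with long gene bodies.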

STAR Protocols, 2022, Vol 3 (1), pp. 101036. DOI: 10.1016/j.xpro.2021.101036

Author(s): Adelina Rabenius, Sajitha Chandrakumaran, Lea Sistonen, Anniina Vihervaara

Keyword(s): RNA Sequencing, RNA Synthesis, Sequencing Data, Nascent RNA, Rate Limiting

csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows

Nucleic Acids Research, 2015, Vol 44 (5), pp. e45-e45. DOI: 10.1093/nar/gkv1191. Cited by ~120

Author(s): Aaron T.L. Lun, Gordon K. Smyth

Keyword(s): De Novo, Massively Parallel Sequencing, Real Data, Sequencing Data, Scientific Application, Sliding Windows, Treatment Conditions, Bioconductor Project, Differential Binding, Genomic Regions

Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding (DB). This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome. It exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets, and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo DB analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project.
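
The window-based strategy can be sketched in a few lines. This is an illustration only, not csaw's R interface: the function names and parameters are hypothetical, and a plain count threshold stands in for the per-window statistical test and the region-level false discovery rate control described in the abstract.

```python
from typing import List, Tuple


def window_counts(read_starts: List[int], chrom_length: int,
                  width: int = 100, step: int = 50) -> List[Tuple[int, int, int]]:
    """Count reads whose start lies in each sliding window [start, start + width)."""
    windows = []
    for start in range(0, max(chrom_length - width, 0) + 1, step):
        end = start + width
        n = sum(1 for pos in read_starts if start <= pos < end)
        windows.append((start, end, n))
    return windows


def merge_windows(windows: List[Tuple[int, int]],
                  max_gap: int = 0) -> List[Tuple[int, int]]:
    """Cluster overlapping or nearby windows (start, end) into regions."""
    regions: List[Tuple[int, int]] = []
    for start, end in sorted(windows):
        if regions and start - regions[-1][1] <= max_gap:
            # Extend the current region to cover this window.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Toy data: keep windows with at least 2 reads (a stand-in for a real test).
counts = window_counts([10, 20, 30, 260, 270], chrom_length=400)
selected = [(s, e) for s, e, n in counts if n >= 2]
regions = merge_windows(selected)
```

Because the windows overlap, a single enriched site is usually covered by several of them, which is why they are clustered into regions before reporting.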

The rate and spectrum of mosaic mutations during embryogenesis revealed by RNA sequencing of 49 tissues

2019. DOI: 10.1101/687822. Cited by ~1

Author(s): Francesc Muyas, Luis Zapata, Roderic Guigó, Stephan Ossowski

Keyword(s): RNA Sequencing, De Novo, Genetic Disorders, Adult Life, Diagnostic Procedures, Cancer Predisposition, Sequencing Data, Individual Study, Similar Frequency, Mutational Spectrum

Background: Mosaic mutations acquired during early embryogenesis can lead to severe early-onset genetic disorders and cancer predisposition, but are often undetectable in blood samples. The rate and mutational spectrum of embryonic mosaic mutations (EMMs) have only been studied in a few tissues, and their contribution to genetic disorders is unknown. Therefore, we investigated how frequently mosaic mutations occur during embryogenesis across all germ layers and tissues.

Results: Using RNA sequencing data from the Genotype-Tissue Expression (GTEx) cohort comprising 49 normal tissues and 570 individuals, we found that newborns on average harbour 0.5 to 1 EMMs in the exome affecting multiple organs (1.3230 × 10⁻⁸ per nucleotide per individual), a frequency similar to that reported for germline de novo mutations. Our multi-tissue, multi-individual study design allowed us to distinguish mosaic mutations acquired during different stages of embryogenesis and adult life, as well as to provide insights into the rate and spectrum of mosaic mutations. We observed that EMMs are dominated by a mutational signature associated with spontaneous deamination of methylated cytosines and the number of cell divisions. After birth, cells continue to accumulate somatic mutations, which can lead to the development of cancer. Investigation of the mutational spectrum of the gastrointestinal tract revealed a mutational pattern associated with the food-borne carcinogen aflatoxin, a signature that has so far only been reported in liver cancer.

Conclusion: In summary, our multi-tissue, multi-individual study reveals a surprisingly high number of embryonic mosaic mutations in coding regions, implying novel hypotheses and diagnostic procedures for investigating genetic causes of disease and cancer predisposition.

Repurposing RNA sequencing for discovery of RNA modifications in clinical cohorts

Science Advances, 2021, Vol 7 (32), pp. eabd2605. DOI: 10.1126/sciadv.abd2605

Author(s): Kar-Tong Tan, Ling-Wen Ding, Chan-Shuo Wu, Daniel G. Tenen, Henry Yang

Keyword(s): RNA Sequencing, Cancer Progression, De Novo, RNA Modification, RNA Modifications, Multiple Cancer, Patients With Cancer, Statistical Framework, Cancer Types, Nucleotide Resolution

The study of RNA modifications in large clinical cohorts can reveal relationships between the epitranscriptome and human diseases, although this is especially challenging. We developed ModTect (https://github.com/ktan8/ModTect), a statistical framework to identify RNA modifications de novo by standard RNA-sequencing with deletion and mis-incorporation signals. We show that ModTect can identify both known (N1-methyladenosine) and previously unknown types of mRNA modifications (N2,N2-dimethylguanosine) at nucleotide-resolution. Applying ModTect to 11,371 patient samples and 934 cell lines across 33 cancer types, we show that the epitranscriptome was dysregulated in patients across multiple cancer types and was additionally associated with cancer progression and survival outcomes. Some types of RNA modification were also more disrupted than others in patients with cancer. Moreover, RNA modifications contribute to multiple types of RNA-DNA sequence differences, which unexpectedly escape detection by Sanger sequencing. ModTect can thus be used to discover associations between RNA modifications and clinical outcomes in patient cohorts.

SNV identification from single-cell RNA sequencing data

Human Molecular Genetics, 2019, Vol 28 (21), pp. 3569-3583. DOI: 10.1093/hmg/ddz207. Cited by ~3

Author(s): Patricia M Schnepp, Mengjie Chen, Evan T Keller, Xiang Zhou

Keyword(s): DNA Sequencing, Single Cell, RNA Sequencing, Single Cells, Specific Gene, Sequencing Data, Single Nucleotide Variants, Single Cell RNA Sequencing, Sequencing Studies, Genomic Regions

Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type-specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial and complementary information for the detection of functional SNVs, maximizing the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in either bulk or single-cell DNA sequencing data. In both pipelines, we examined various parameter settings to determine the accuracy of the final SNV call set and provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following GATK Best Practices resulted in the highest number of SNVs identified with a high concordance. In individual single cells, Monovar resulted in better quality SNVs, even though none of the pipelines analyzed is capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across different functional genomic regions. Our results open doors for novel ways to leverage the use of scRNA-seq for the future investigation of SNV function.

YerA41, a Yersinia ruckeri Bacteriophage: Determination of a Non-Sequencable DNA Bacteriophage Genome via RNA-Sequencing

Viruses, 2020, Vol 12 (6), pp. 620. DOI: 10.3390/v12060620

Author(s): Katarzyna Leskinen, Maria I. Pajunen, Miguel Vincente Gomez-Raya Vilanova, Saija Kiljunen, Andrew Nelson, ...

Keyword(s): RNA Sequencing, Transcriptional Control, De Novo, Genomic Sequence, PCR Amplification, Yersinia Ruckeri, Sequencing Data, Bacterial Gene, Sequencing Technologies, Bacterial Gene Expression

YerA41 is a Myoviridae bacteriophage that was originally isolated due to its ability to infect Yersinia ruckeri bacteria, the causative agent of enteric redmouth disease of salmonid fish. Several attempts to determine its genomic DNA sequence using traditional and next-generation sequencing technologies failed, indicating that the phage genome is modified in such a way that it is an unsuitable template for PCR amplification and for conventional sequencing. To determine the YerA41 genome sequence, we performed RNA-sequencing from phage-infected Y. ruckeri cells at different time points post-infection. The host-genome specific reads were subtracted and de novo assembly was performed on the remaining unaligned reads. This resulted in nine phage-specific scaffolds with a total length of 143 kb that shared only low-level and scattered identity to known sequences deposited in DNA databases. Annotation of the sequences revealed 201 predicted genes, most of which found no homologs in the databases. Proteome studies identified altogether 63 phage particle-associated proteins. The RNA-sequencing data were used to characterize the transcriptional control of YerA41 and to investigate its impact on the bacterial gene expression. Overall, our results indicate that RNA-sequencing can be successfully used to obtain the genomic sequence of non-sequencable phages, providing simultaneous information about the phage–host interactions during the process of infection.

De Novo Assembly of a Bell Pepper Endornavirus Genome Sequence Using RNA Sequencing Data

Genome Announcements, 2015, Vol 3 (2). DOI: 10.1128/genomea.00061-15. Cited by ~5

Author(s): Yeonhwa Jo, Hoseng Choi, Won Kyong Cho

Keyword(s): RNA Sequencing, Genome Sequence, De Novo Assembly, De Novo, Bell Pepper, Sequencing Data

Genome wide efficiency profiling reveals modulation of maintenance and de novo methylation by Tets

2020. DOI: 10.1101/2020.08.06.236307

Author(s): Pascal Giehr, Charalampos Kyriakopoulos, Karl Nordström, Abduhlrahman Salhab, Fabian Müller, ...

Keyword(s): DNA Methylation, Molecular Mechanisms, De Novo, Epigenetic Modification, Embryonic Stem, Regulatory Elements, Sequencing Data, Reduced Representation, Genome Wide, Global And Local

Background: DNA methylation is an essential epigenetic modification which is set and maintained by DNA methyltransferases (Dnmts) and removed via active and passive mechanisms involving Tet-mediated oxidation. While the molecular mechanisms of these enzymes are well studied, their interplay in shaping cell-specific methylomes remains less well understood. In our work we model the activities of Tets and Dnmts at single CpGs across the genome using a novel type of high-resolution sequencing data.

Results: To accurately measure 5mC and 5hmC levels at single CpGs we developed RRHPoxBS, a reduced representation hairpin oxidative bisulfite sequencing approach. Using this method we mapped the methylomes and hydroxymethylomes of wild-type and Tet triple knockout mouse embryonic stem cells. These comprehensive datasets were then used to develop an extended Hidden Markov model allowing us i) to determine the symmetrical methylation and hydroxymethylation state at millions of individual CpGs, and ii) to infer the maintenance and de novo methylation efficiencies of Dnmts and the hydroxylation efficiencies of Tets at individual CpG positions. We find that Tets exhibit their highest activity around unmethylated regulatory elements, i.e. active promoters and enhancers. Furthermore, we find that Tets' presence has a profound effect on the global and local maintenance and de novo methylation activities by the Dnmts, not only substantially contributing to a universal demethylation of the genome but also shaping the overall methylation landscape.

Conclusions: Our analysis demonstrates that a fine-tuned and locally controlled interplay between Tets and Dnmts is important to modulate de novo and maintenance activities of Dnmts across the genome. Tet activities contribute to DNA methylation patterning in the following ways: they oxidize 5mC, they locally shield DNA from accidental de novo methylation, and at the same time they modulate maintenance and de novo methylation efficiencies of Dnmts across the genome.

Deconvolution of Expression for Nascent RNA Sequencing Data (DENR) Highlights Pre-RNA Isoform Diversity in Human Cells

2021. DOI: 10.1101/2021.03.16.435537

Author(s): Yixin Zhao, Noah Dukler, Gilad Barshad, Shushan Toneyan, Charles G. Danko, ...

Keyword(s): T Cells, RNA Sequencing, Cell Types, Transcription Unit, Human Cells, Computational Method, RNA Seq, Sequencing Data, Isoform Diversity, Nascent RNA

Quantification of mature-RNA isoform abundance from RNA-seq data has been extensively studied, but much less attention has been devoted to quantifying the abundance of distinct precursor RNAs based on nascent RNA sequencing data. Here we address this problem with a new computational method called Deconvolution of Expression for Nascent RNA sequencing data (DENR). DENR models the nascent RNA read counts at each locus as a mixture of user-provided isoforms. The performance of the baseline algorithm is enhanced by the use of machine-learning predictions of transcription start sites (TSSs) and an adjustment for the typical “shape profile” of read counts along a transcription unit. We show using simulated data that DENR clearly outperforms simple read-count-based methods for estimating the abundances of both whole genes and isoforms. By applying DENR to previously published PRO-seq data from K562 and CD4+ T cells, we find that transcription of multiple isoforms per gene is widespread, and the dominant isoform frequently makes use of an internal TSS. We also identify > 200 genes whose dominant isoforms make use of different TSSs in these two cell types. Finally, we apply DENR and StringTie to newly generated PRO-seq and RNA-seq data, respectively, for human CD4+ T cells and CD14+ monocytes, and show that entropy at the pre-RNA level makes a disproportionate contribution to overall isoform diversity, especially across cell types. Altogether, DENR is the first computational tool to enable abundance quantification of pre-RNA isoforms based on nascent RNA sequencing data, and it reveals high levels of pre-RNA isoform diversity in human cells.
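
The idea of treating read counts at a locus as a mixture of isoforms can be illustrated with a small non-negative least-squares fit. This is a sketch under stated assumptions, not DENR's actual algorithm or interface: the function name, the binary design matrix and the multiplicative-update solver are all chosen here for illustration, and the TSS predictions and shape-profile adjustment mentioned in the abstract are left out.

```python
from typing import List


def deconvolve(design: List[List[float]], counts: List[float],
               iterations: int = 1000) -> List[float]:
    """Estimate non-negative isoform abundances x so that design @ x ~ counts.

    design[i][j] is the expected signal of isoform j in bin i per unit abundance.
    Uses multiplicative updates, which keep every abundance non-negative.
    """
    n_bins, n_iso = len(design), len(design[0])
    x = [1.0] * n_iso
    for _ in range(iterations):
        # Fitted counts per bin under the current abundances.
        fitted = [sum(design[i][j] * x[j] for j in range(n_iso))
                  for i in range(n_bins)]
        for j in range(n_iso):
            num = sum(design[i][j] * counts[i] for i in range(n_bins))
            den = sum(design[i][j] * fitted[i] for i in range(n_bins))
            if den > 0:
                x[j] *= num / den
    return x


# Hypothetical locus with 4 bins: isoform A spans all bins,
# isoform B starts at an internal TSS and spans only the last two.
design = [[1, 0], [1, 0], [1, 1], [1, 1]]
counts = [5, 5, 8, 8]
abundances = deconvolve(design, counts)  # approaches [5, 3]
```

In this toy case the extra signal in the last two bins is attributed to the isoform with the internal TSS rather than inflating the estimate for the full-length isoform.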

Stability of DNA methylation and chromatin accessibility in structurally diverse maize genomes

2021. DOI: 10.1101/2021.03.10.434810

Author(s): Jaclyn M Noshay, Zhikai Liang, Peng Zhou, Peter A Crisp, Alexandre P Marand, ...

Keyword(s): DNA Methylation, Sequence Variation, Reference Genome, De Novo, Regulatory Elements, Chromatin Accessibility, Genome Context, Genomic Regions, Genome Assemblies, Accessible Chromatin

Accessible chromatin and unmethylated DNA are associated with many genes and cis-regulatory elements. Attempts to understand natural variation for accessible chromatin regions (ACRs) and unmethylated regions (UMRs) often rely upon alignments to a single reference genome. This limits the ability to assess regions that are absent in the reference genome assembly and to monitor how nearby structural variants influence variation in chromatin state. In this study, de novo genome assemblies for four maize inbreds (B73, Mo17, Oh43 and W22) are utilized to assess chromatin accessibility and DNA methylation patterns in a pan-genome context. UMRs and ACRs are identified more accurately when chromatin data are aligned to the matched genome rather than a single reference genome. While there are UMRs and ACRs present within genomic regions that are not shared between genotypes, these features are substantially enriched within shared regions, as determined by chromosomal alignments. Characterization of UMRs present within shared genomic regions reveals that most UMRs maintain the unmethylated state in other genotypes, with only a small number being polymorphic between genotypes. However, the majority of UMRs between genotypes only exhibit partial overlaps, suggesting that the boundaries between methylated and unmethylated DNA are dynamic. This instability is not solely due to sequence variation, as these partially overlapping UMRs are frequently found within genomic regions that lack sequence variation. The ability to compare chromatin properties among individuals with structural variation enables pan-epigenome analyses to study the sources of variation for accessible chromatin and unmethylated DNA.

Article summary: Regions of the genome that have accessible chromatin or unmethylated DNA are often associated with cis-regulatory elements. We assessed chromatin accessibility and DNA methylation in four structurally diverse maize genomes. There are accessible or unmethylated regions within the non-shared portions of the genomes, but these features are depleted within these regions. Evaluating the dynamics of methylation and accessibility between genotypes reveals conservation of features, albeit with variable boundaries, suggesting some instability of the precise edges of unmethylated regions.
