Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR

2021 ◽  
Author(s):  
Robert A. Policastro ◽  
Daniel J. McDonald ◽  
Volker P. Brendel ◽  
Gabriel E. Zentner

Abstract Heterogeneity in transcription initiation has important consequences for transcript stability and translation, and shifts in transcription start site (TSS) usage are prevalent in various disease and developmental contexts. Accordingly, numerous methods for global TSS profiling have been developed, including our recently published Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a method to profile transcription start sites (TSSs) on a genome-wide scale with minimal cost and time. In parallel to our development of STRIPE-seq, we built TSRexploreR, an R package for end-to-end analysis of TSS mapping data. TSRexploreR provides functions for TSS and TSR detection, normalization, correlation, visualization, and differential TSS/TSR analysis. TSRexploreR is highly interoperable, accepting the data structures of TSS and TSR sets generated by several existing tools for processing and alignment of TSS mapping data, such as CAGEr for Cap Analysis of Gene Expression (CAGE) data. Lastly, TSRexploreR implements a novel approach for the detection of shifts in TSS distribution.

2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Robert A Policastro ◽  
Daniel J McDonald ◽  
Volker P Brendel ◽  
Gabriel E Zentner

Abstract Heterogeneity in transcription initiation has important consequences for transcript stability and translation, and shifts in transcription start site (TSS) usage are prevalent in various developmental, metabolic, and disease contexts. Accordingly, numerous methods for global TSS profiling have been developed, including most recently Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a method to profile transcription start sites (TSSs) on a genome-wide scale with significant cost and time savings compared to previous methods. In anticipation of more widespread adoption of STRIPE-seq and related methods for construction of promoter atlases and studies of differential gene expression, we built TSRexploreR, an R package for end-to-end analysis of TSS mapping data. TSRexploreR provides functions for TSS and transcription start region (TSR) detection, normalization, correlation, visualization, and differential TSS/TSR analyses. TSRexploreR is highly interoperable, accepting the data structures of TSS and TSR sets generated by several existing tools for processing and alignment of TSS mapping data, such as CAGEr for Cap Analysis of Gene Expression (CAGE) data. Lastly, TSRexploreR implements a novel approach for the detection of shifts in TSS distribution.


2021 ◽  
Vol 12 ◽  
Author(s):  
Huiyuan Wang ◽  
Sheng Liu ◽  
Xiufang Dai ◽  
Yongkang Yang ◽  
Yunjun Luo ◽  
...  

Populus trichocarpa (P. trichocarpa) is a model tree for the investigation of wood formation. In recent years, researchers have generated a large amount of high-throughput sequencing data in P. trichocarpa. However, no comprehensive database that provides multi-omics associations for the investigation of secondary growth in response to diverse stresses has been reported. Therefore, we developed a public repository that presents comprehensive measurements of gene expression and post-transcriptional regulation by integrating 144 RNA-Seq, 33 ChIP-seq, and six single-molecule real-time (SMRT) isoform sequencing (Iso-seq) libraries prepared from tissues subjected to different stresses. All the samples from different studies were analyzed to obtain gene expression, co-expression networks, and differentially expressed genes (DEGs) using unified parameters, which allowed comparison of results from different studies and treatments. In addition to gene expression, we also identified and deposited pre-processed data about alternative splicing (AS), alternative polyadenylation (APA) and alternative transcription initiation (ATI). The post-transcriptional regulation, differential expression, and co-expression network datasets were integrated into a new P. trichocarpa Stem Differentiating Xylem (PSDX) database, which further highlights gene families of RNA-binding proteins and stress-related genes. The PSDX also provides tools for data query and visualization, a genome browser, and a BLAST option for sequence-based query. Much of the data is also available for bulk download. The availability of PSDX contributes to research on secondary growth in response to stresses in P. trichocarpa, which will provide new insights that can be useful for the improvement of stress tolerance in woody plants.


Author(s):  
Masahiko Imashimizu ◽  
Yuji Tokunaga ◽  
Ariel Afek ◽  
Hiroki Takahashi ◽  
Nobuo Shimamoto ◽  
...  

In the process of transcription initiation by RNA polymerase, promoter DNA sequences affect multiple reaction pathways determining the productivity of transcription. However, the question of how the molecular mechanism of transcription initiation depends on sequence properties of promoter DNA remains poorly understood. Here, combining the statistical mechanical approach with high-throughput sequencing results, we characterize abortive transcription and pausing during transcription initiation by Escherichia coli RNA polymerase at a genome-wide level. Our results suggest that initially transcribed sequences enriched with thymine bases represent the signal inducing abortive transcription. On the other hand, certain repetitive sequence elements broadly embedded in promoter regions constitute the signal inducing pausing. Both signals decrease the productivity of transcription initiation. Based on solution NMR and in vitro transcription measurements, we also suggest that repetitive sequence elements of promoter DNA modulate the rigidity of its double-stranded form, which profoundly influences the reaction coordinates of the productive initiation via pausing.


2018 ◽  
Author(s):  
Nevena Cvetesic ◽  
Harry G. Leitch ◽  
Malgorzata Borkowska ◽  
Ferenc Müller ◽  
Piero Carninci ◽  
...  

Abstract Cap analysis of gene expression (CAGE) is a methodology for genome-wide quantitative mapping of mRNA 5’ ends to precisely capture transcription start sites at a single nucleotide resolution. In combination with high-throughput sequencing, CAGE has revolutionized our understanding of rules of transcription initiation, led to discovery of new core promoter sequence features and discovered transcription initiation at enhancers genome-wide. The biggest limitation of CAGE is that even the most recently improved version (nAnT-iCAGE) still requires large amounts of total cellular RNA (5 micrograms), preventing its application to scarce biological samples such as those from early embryonic development or rare cell types. Here, we present SLIC-CAGE, a Super-Low Input Carrier-CAGE approach to capture 5’ ends of RNA polymerase II transcripts from as little as 5-10 ng of total RNA. The dramatic increase in sensitivity is achieved by specially designed, selectively degradable carrier RNA. We demonstrate the ability of SLIC-CAGE to generate data for genome-wide promoterome with 1000-fold less material than required by existing CAGE methods by generating a complex, high quality library from mouse embryonic day (E) 11.5 primordial germ cells.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Lakshay Anand ◽  
Carlos M. Rodriguez Lopez

Abstract Background The recent advancements in high-throughput sequencing have resulted in the availability of annotated genomes, as well as of multi-omics data for many living organisms. This has increased the need for graphic tools that allow the concurrent visualization of genomes and feature-associated multi-omics data on single publication-ready plots. Results We present chromoMap, an R package, developed for the construction of interactive visualizations of chromosomes/chromosomal regions, mapping of any chromosomal feature with known coordinates (e.g., protein coding genes, transposable elements, non-coding RNAs, microsatellites, etc.), and chromosomal regional characteristics (e.g., genomic feature density, gene expression, DNA methylation, chromatin modifications, etc.) of organisms with a genome assembly. ChromoMap can also integrate multi-omics data (genomics, transcriptomics and epigenomics) in relation to their occurrence across chromosomes. ChromoMap takes tab-delimited files (BED-like) or alternatively R objects to specify the genomic coordinates of the chromosomes and elements to annotate. Rendered chromosomes are composed of continuous windows of a given range, which, on hover, display detailed information about the elements annotated within that range. By adjusting parameters of a single function, users can generate a variety of plots that can be saved either as static images or as HTML documents. Conclusions ChromoMap’s flexibility allows for concurrent visualization of genomic data in each strand of a given chromosome, or of more than one homologous chromosome, allowing the comparison of multi-omic data between genotypes (e.g. species, varieties, etc.) or between homologous chromosomes of phased diploid/polyploid genomes. chromoMap is an extensive tool that can potentially be used in various bioinformatics analysis pipelines for genomic visualization of multi-omics data.


Biomolecules ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 1299 ◽  
Author(s):  
Masahiko Imashimizu ◽  
Yuji Tokunaga ◽  
Ariel Afek ◽  
Hiroki Takahashi ◽  
Nobuo Shimamoto ◽  
...  

In the process of transcription initiation by RNA polymerase, promoter DNA sequences affect multiple reaction pathways determining the productivity of transcription. However, the question of how the molecular mechanism of transcription initiation depends on the sequence properties of promoter DNA remains poorly understood. Here, combining the statistical mechanical approach with high-throughput sequencing results, we characterize abortive transcription and pausing during transcription initiation by Escherichia coli RNA polymerase at a genome-wide level. Our results suggest that initially transcribed sequences, when enriched with thymine bases, contain the signal for inducing abortive transcription, whereas certain repetitive sequence elements embedded in promoter regions constitute the signal for inducing pausing. Both signals decrease the productivity of transcription initiation. Based on solution NMR and in vitro transcription measurements, we suggest that repetitive sequence elements within the promoter DNA modulate the nonlocal base pair stability of its double-stranded form. This stability profoundly influences the reaction coordinates of the productive initiation via pausing.


2017 ◽  
Author(s):  
Gemma B. Danks ◽  
Pavla Navratilova ◽  
Boris Lenhard ◽  
Eric Thompson

Abstract Development is largely driven by transitions between transcriptional programs. The initiation of transcription at appropriate sites in the genome is a key component of this and yet few rules governing selection are known. Here, we used cap analysis of gene expression (CAGE) to generate bp-resolution maps of transcription start sites (TSSs) across the genome of Oikopleura dioica, a member of the closest living relatives to vertebrates. Our TSS maps revealed promoter features in common with vertebrates, as well as striking differences, and uncovered key roles for core promoter elements in the regulation of development. During spermatogenesis there is a genome-wide shift in mode of transcription initiation characterized by a novel core promoter element. This element was associated with > 70% of transcription in the testis, including the male-specific use of cryptic internal promoters within operons. In many cases this led to the exclusion of trans-splice sites, revealing a novel mechanism for regulating which mRNAs receive the spliced leader. During oogenesis the cell cycle regulator, E2F1, has been co-opted in regulating maternal transcription in endocycling nurse nuclei. In addition, maternal promoters lack the TATA-like element found in vertebrates and have broad, rather than sharp, architectures with ordered nucleosomes. Promoters of ribosomal protein genes lack the highly conserved TCT initiator. We also report an association between DNA methylation on transcribed gene bodies and the TATA-box, which indicates that this ancient promoter motif may play a role in selecting DNA for transcription-associated methylation in invertebrate genomes.


2020 ◽  
Author(s):  
Robert A. Policastro ◽  
R. Taylor Raborn ◽  
Volker P. Brendel ◽  
Gabriel E. Zentner

Abstract Accurate mapping of transcription start sites (TSSs) is key for understanding transcriptional regulation. However, current protocols for genome-wide TSS profiling are laborious and/or expensive. We present Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a simple, rapid, and cost-effective protocol for sequencing capped RNA 5’ ends from as little as 50 ng total RNA. Including depletion of uncapped RNA and SPRI bead cleanups, a STRIPE-seq library can be constructed in about five hours. We demonstrate application of STRIPE-seq to TSS profiling in yeast and human cells and show that it can also be effectively used for measuring transcript levels and differential gene expression analysis. In conjunction with our ready-to-use computational analysis workflows, STRIPE-seq is a straightforward, efficient means by which to probe the landscape of transcriptional initiation.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Arnaud Liehrmann ◽  
Guillem Rigaill ◽  
Toby Dylan Hocking

Abstract Background Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions such as the Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect the peaks more accurately than algorithms which rely on natural assumptions. Conclusion The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xue Lin ◽  
Yingying Hua ◽  
Shuanglin Gu ◽  
Li Lv ◽  
Xingyu Li ◽  
...  

Abstract Background Genomic localized hypermutation regions have been found in cancers and are reported to be related to cancer prognosis. This genomic localized hypermutation is quite different from the usual somatic mutations in its frequency of occurrence and genomic density. It resembles a “violent storm” of mutations, which is just what the Greek word “kataegis” means. Results There is a need for a lightweight and simple-to-use toolkit to identify and visualize the localized hypermutation regions in a genome. Thus we developed the R package “kataegis” to meet this need. The package uses only three steps to identify the genomic hypermutation regions, i.e., i) read in the variation files in standard formats; ii) calculate the inter-mutational distances; iii) identify the hypermutation regions with appropriate parameters, and finally one step to visualize the nucleotide contents and spectra of both the foci and flanking regions, and the genomic landscape of these regions. Conclusions The kataegis package is available on Bioconductor/GitHub (https://github.com/flosalbizziae/kataegis), which provides a lightweight and simple-to-use toolkit for quickly identifying and visualizing the genomic hypermutation regions.
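
The identification steps listed in the abstract above (compute inter-mutational distances, then group closely spaced mutations) can be illustrated with a minimal Python sketch. This is not the kataegis package's code or API: the function names, the input format of (chromosome, position) pairs, and the thresholds `max_distance=1000` and `min_mutations=6` are all assumptions chosen for illustration only.

```python
from collections import defaultdict


def group_by_chromosome(mutations):
    """Group (chrom, pos) pairs by chromosome and sort positions ascending."""
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)
    return {chrom: sorted(positions) for chrom, positions in by_chrom.items()}


def intermutation_distances(mutations):
    """Return, per chromosome, the distance from each mutation to the previous one."""
    grouped = group_by_chromosome(mutations)
    return {
        chrom: [cur - prev for prev, cur in zip(positions, positions[1:])]
        for chrom, positions in grouped.items()
    }


def find_hypermutation_regions(mutations, max_distance=1000, min_mutations=6):
    """Report runs of consecutive mutations whose neighbouring gaps are all
    <= max_distance and that contain at least min_mutations mutations.

    Both thresholds are illustrative assumptions, not package defaults.
    Each region is returned as (chrom, start, end, mutation_count).
    """
    regions = []
    grouped = group_by_chromosome(mutations)
    for chrom in sorted(grouped):
        positions = grouped[chrom]
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] <= max_distance:
                run.append(pos)  # still inside the same dense run
            else:
                if len(run) >= min_mutations:
                    regions.append((chrom, run[0], run[-1], len(run)))
                run = [pos]  # start a new run at this mutation
        if len(run) >= min_mutations:
            regions.append((chrom, run[0], run[-1], len(run)))
    return regions


if __name__ == "__main__":
    # Hypothetical toy input: six tightly spaced mutations on chr1, plus sparse ones.
    example = [("chr1", p) for p in (100, 300, 450, 900, 1200, 1500, 50000, 90000)]
    example += [("chr2", 10), ("chr2", 5000)]
    print(intermutation_distances(example)["chr1"])
    print(find_hypermutation_regions(example))
```

A simple gap threshold is used here because it is easy to follow; a real analysis would more likely rely on the package itself, which may use a different segmentation rule and parameters.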

