CAGEfightR: Cap Analysis of Gene Expression (CAGE) in R/Bioconductor

2018
Authors: Malte Thodberg, Axel Thieffry, Kristoffer Vitting-Seerup, Robin Andersson, Albin Sandelin

We developed the CAGEfightR R/Bioconductor package for analyzing CAGE data. CAGEfightR allows fast and memory-efficient identification of transcription start sites (TSSs) and predicted enhancers. Downstream analyses, including annotation, quantification, visualization and TSS shape statistics, are implemented in easy-to-use functions. The package is freely available at https://bioconductor.org/packages/CAGEfightR
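The sketch below illustrates TSS identification and one shape statistic. It groups CAGE tag start sites (CTSSs) that share a chromosome and strand into tag clusters whenever neighboring positions are at most `merge_dist` bp apart. Each cluster's peak is reported as its dominant TSS, and the interquantile range (IQR) of its tag distribution is computed. This is a minimal Python illustration under assumed parameters (`merge_dist=20`, 10% and 90% quantiles). It is not CAGEfightR's R implementation, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical input: (chromosome, strand, 1-based position) -> CAGE tag count.
CTSS = Dict[Tuple[str, str, int], int]


@dataclass
class TagCluster:
    chrom: str
    strand: str
    start: int  # first CTSS in the cluster (inclusive)
    end: int    # last CTSS in the cluster (inclusive)
    peak: int   # position with the most tags (dominant TSS)
    score: int  # total tags in the cluster
    iqr: int    # distance between the lower- and upper-quantile positions


def shape_iqr(counts: Dict[int, int], lower: float = 0.1, upper: float = 0.9) -> int:
    """Distance between the positions where the cumulative tag count first
    reaches the lower and the upper quantile (0 for a single-position cluster)."""
    total = sum(counts.values())
    cumulative = 0
    q_low = q_high = None
    for pos in sorted(counts):
        cumulative += counts[pos]
        if q_low is None and cumulative >= lower * total:
            q_low = pos
        if q_high is None and cumulative >= upper * total:
            q_high = pos
    return q_high - q_low


def _summarize(chrom: str, strand: str, group: List[int], ctss: CTSS) -> TagCluster:
    counts = {pos: ctss[(chrom, strand, pos)] for pos in group}
    peak = max(group, key=lambda pos: counts[pos])  # first maximum wins ties
    return TagCluster(chrom, strand, group[0], group[-1], peak,
                      sum(counts.values()), shape_iqr(counts))


def cluster_ctss(ctss: CTSS, merge_dist: int = 20) -> List[TagCluster]:
    """Merge same-strand CTSSs whose gap to the previous CTSS is <= merge_dist."""
    by_strand: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for chrom, strand, pos in ctss:
        by_strand[(chrom, strand)].append(pos)

    clusters: List[TagCluster] = []
    for (chrom, strand), positions in sorted(by_strand.items()):
        positions.sort()
        group = [positions[0]]
        for pos in positions[1:]:
            if pos - group[-1] > merge_dist:  # gap too large: close the cluster
                clusters.append(_summarize(chrom, strand, group, ctss))
                group = []
            group.append(pos)
        clusters.append(_summarize(chrom, strand, group, ctss))
    return clusters
```

For example, plus-strand tags at positions 100, 105 and 110 merge into one cluster that peaks at 105. A tag at 200 starts a new cluster because the 90 bp gap exceeds `merge_dist`.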

2019, Vol 20 (1)
Authors: Malte Thodberg, Axel Thieffry, Kristoffer Vitting-Seerup, Robin Andersson, Albin Sandelin

Background: 5′-end sequencing assays, and Cap Analysis of Gene Expression (CAGE) in particular, have been instrumental in studying transcriptional regulation. 5′-end methods provide genome-wide maps of transcription start sites (TSSs) at base-pair resolution. Because active enhancers often feature bidirectional TSSs, such data can also be used to predict enhancer candidates. Few mature and comprehensive computational tools are currently available for analyzing 5′-end data, which hinders efficient analysis of new and existing datasets.

Results: We present CAGEfightR, a framework for analysis of CAGE and other 5′-end data implemented as an R/Bioconductor package. CAGEfightR can import data from BigWig files and allows fast and memory-efficient prediction and analysis of TSSs and enhancers. Downstream analyses include quantification, normalization, annotation with transcript and gene models, TSS shape statistics, linking TSSs to enhancers via co-expression, identification of enhancer clusters, and genome-browser-style visualization. Although CAGEfightR was built to analyze CAGE data, we demonstrate its utility for analyzing nascent RNA 5′-end data (PRO-Cap). CAGEfightR is implemented using standard Bioconductor classes. This makes it easy to learn, use and combine with other Bioconductor packages, for example popular differential expression tools such as limma, DESeq2 and edgeR.

Conclusions: CAGEfightR provides a single, scalable and easy-to-use framework for comprehensive downstream analysis of 5′-end data. It is designed to be interoperable with other Bioconductor packages, which makes hundreds of mature transcriptomic analysis tools available for 5′-end data. CAGEfightR is freely available via Bioconductor: bioconductor.org/packages/CAGEfightR.
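Enhancer prediction from 5′-end data rests on divergent transcription: minus-strand TSSs lie upstream of a shared midpoint and plus-strand TSSs lie downstream of it. The Python sketch below scores how balanced these two arms are around candidate midpoints. The window size, the balance formula and the threshold are illustrative assumptions. CAGEfightR's own bidirectional clustering and balance statistic may be defined differently.

```python
from typing import Dict, List, Tuple

# Hypothetical input for one chromosome: (strand, position) -> tag count.
Signal = Dict[Tuple[str, int], int]


def strand_sums(signal: Signal, midpoint: int, window: int) -> Tuple[int, int]:
    """Minus-strand tags upstream of, and plus-strand tags downstream of,
    the midpoint: the two arms of divergent (bidirectional) transcription."""
    upstream_minus = sum(c for (s, p), c in signal.items()
                         if s == "-" and midpoint - window <= p < midpoint)
    downstream_plus = sum(c for (s, p), c in signal.items()
                          if s == "+" and midpoint < p <= midpoint + window)
    return upstream_minus, downstream_plus


def balance(signal: Signal, midpoint: int, window: int = 200) -> float:
    """1.0 when both arms carry equal signal, 0.0 when either arm is empty."""
    left, right = strand_sums(signal, midpoint, window)
    if left == 0 or right == 0:
        return 0.0
    return 1.0 - abs(left - right) / (left + right)


def enhancer_candidates(signal: Signal, midpoints: List[int],
                        window: int = 200, min_balance: float = 0.5) -> List[int]:
    """Keep candidate midpoints whose divergent signal is sufficiently balanced."""
    return [m for m in midpoints if balance(signal, m, window) >= min_balance]
```

Requiring signal on both arms filters out ordinary unidirectional promoters. The balance threshold then discards sites where one arm dominates.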


Authors: Masaki Suimye Morioka, Hideya Kawaji, Hiromi Nishiyori-Sueki, Mitsuyoshi Murata, Miki Kojima-Ishiyama, ...

2022, Vol 5 (4), pp. e202101234
Authors: Sonal Dahale, Jorge Ruiz-Orera, Jan Silhavy, Norbert Hübner, Sebastiaan van Heesch, ...

The role of alternative promoter usage in tissue-specific gene expression is well established; however, its role in complex diseases is poorly understood. To study this role, we performed cap analysis of gene expression (CAGE) sequencing on the left ventricle of two rat strains. The first was a rat model of hypertension, the spontaneously hypertensive rat (SHR). The second was a normotensive strain, Brown Norway. We identified 26,560 CAGE-defined transcription start sites in the rat left ventricle, including 1,970 novel cardiac transcription start sites. We also identified 28 genes with alternative promoter usage between SHR and Brown Norway, which could give rise to protein isoforms that differ at the amino terminus between the two strains. In addition, we identified 475 promoter-switching events that alter the length of the 5′ UTR. The shift in Insr promoter usage was significantly associated with insulin levels and blood pressure across a panel of HXB/BXH recombinant inbred rat strains. This suggests that hyperinsulinemia due to insulin resistance might lead to hypertension in SHR. Our study provides preliminary evidence for a role of alternative promoter usage in complex diseases.
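A minimal heuristic for the kind of promoter switch described above computes each TSS's share of a gene's CAGE signal in each strain. It then flags a switch when the dominant TSS changes by a meaningful margin. The sketch assumes pooled tag counts per TSS for each strain. The `min_shift` threshold and the function names are hypothetical. A real analysis would typically test differential TSS usage statistically across biological replicates rather than apply a fixed cutoff.

```python
from typing import Dict, Tuple

# Hypothetical input: TSS identifier -> pooled CAGE tag count for one gene in one strain.
TssCounts = Dict[str, float]


def relative_usage(counts: TssCounts) -> Dict[str, float]:
    """Fraction of the gene's tags attributed to each TSS."""
    total = sum(counts.values())
    return {tss: (c / total if total else 0.0) for tss, c in counts.items()}


def promoter_switch(strain_a: TssCounts, strain_b: TssCounts,
                    min_shift: float = 0.2) -> Tuple[str, str, bool]:
    """Return the dominant TSS in each strain and whether a switch is called.

    A switch is called when the dominant TSS differs between strains and the
    strain-B dominant TSS gains at least min_shift in relative usage.
    """
    usage_a = relative_usage(strain_a)
    usage_b = relative_usage(strain_b)
    dominant_a = max(usage_a, key=usage_a.get)
    dominant_b = max(usage_b, key=usage_b.get)
    gain = usage_b[dominant_b] - usage_a.get(dominant_b, 0.0)
    return dominant_a, dominant_b, dominant_a != dominant_b and gain >= min_shift
```

Suppose two TSSs receive 80 and 20 tags in one strain and 30 and 70 in the other. In that case, `promoter_switch` reports a switch from the first TSS to the second.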


BMC Genomics, 2014, Vol 15 (1), pp. 982
Authors: Masayuki Yasuda, Yuji Tanaka, Koji M Nishiguchi, Morin Ryu, Satoru Tsuda, ...

F1000Research, 2019, Vol 8, pp. 886
Authors: Malte Thodberg, Albin Sandelin

Cap Analysis of Gene Expression (CAGE) is one of the most popular 5′-end sequencing methods. In a single experiment, CAGE can be used to locate and quantify the expression of both transcription start sites (TSSs) and enhancers. This workflow is a case study on how to use the CAGEfightR package to orchestrate analysis of CAGE data within the Bioconductor project. Starting from BigWig files, the workflow covers three groups of analyses. The first is basic CAGE analyses, such as identifying, quantifying and annotating TSSs and enhancers. The second is advanced analyses, such as finding interacting TSS-enhancer pairs and enhancer clusters. The third is differential expression and alternative TSS usage analyses. R code, discussion and references are interleaved to provide guidelines for future CAGE studies of the same kind.
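Finding interacting TSS-enhancer pairs rests on co-expression: a TSS and a nearby enhancer whose normalized signals rise and fall together across samples are candidate interacting pairs. The sketch below, written in Python rather than the workflow's R, first normalizes raw counts to tags per million (TPM). It then keeps pairs within an assumed maximum distance whose Pearson correlation exceeds an assumed threshold. The names, the 50 kb distance and the 0.5 cutoff are illustrative assumptions. CAGEfightR's own linking function may use a different correlation measure.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input: feature ID -> raw CAGE tag counts, one value per sample.
Counts = Dict[str, List[float]]


def tags_per_million(counts: Counts) -> Counts:
    """Scale each sample (column) to tags per million over all features.
    Assumes every sample has a nonzero library size."""
    n_samples = len(next(iter(counts.values())))
    library = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {f: [row[j] / library[j] * 1e6 for j in range(n_samples)]
            for f, row in counts.items()}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def link_tss_enhancers(counts: Counts, positions: Dict[str, int],
                       tss_ids: List[str], enhancer_ids: List[str],
                       max_dist: int = 50_000, min_r: float = 0.5
                       ) -> List[Tuple[str, str, float]]:
    """Pair each TSS with nearby enhancers whose normalized expression is
    positively correlated across samples."""
    tpm = tags_per_million(counts)
    links = []
    for t in tss_ids:
        for e in enhancer_ids:
            if abs(positions[t] - positions[e]) <= max_dist:
                r = pearson(tpm[t], tpm[e])
                if r >= min_r:
                    links.append((t, e, r))
    return links
```

Applying the distance filter before computing correlations keeps the number of tested pairs manageable. It also encodes the assumption that most regulatory links are local.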


2021, Vol 3 (3)
Authors: Isaac Shamie, Sascha H Duttke, Karen J la Cour Karottki, Claudia Z Han, Anders H Hansen, ...

Chinese hamster ovary (CHO) cells are widely used for producing biopharmaceuticals, and engineering gene expression in CHO cells is key to improving drug quality and affordability. However, engineering gene expression or activating silent genes requires accurate annotation of the underlying regulatory elements and transcription start sites (TSSs). Unfortunately, most TSSs in the published Chinese hamster genome sequence were computationally predicted and are frequently inaccurate. Here, we use nascent transcription start site sequencing methods to revise TSS annotations for 15,308 Chinese hamster genes and 3,034 non-coding RNAs, based on experimental data from CHO-K1 cells and 10 hamster tissues. With the same methods, we also capture tens of thousands of putative transcribed enhancer regions. Our revised TSSs improve upon the RefSeq annotation by revealing core sequence features of gene regulation, such as the TATA box and the Initiator. As exemplified by targeting the glycosyltransferase gene Mgat3, they also facilitate the activation of silent genes by CRISPRa. Together, we envision that our revised annotation and data will provide a rich resource for the CHO community, improve genome engineering efforts and aid comparative and evolutionary studies.
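Because the revised TSSs reveal core promoter elements such as the TATA box and the Initiator, a small scan illustrates what checking for these elements involves. This Python sketch searches a transcript-oriented sequence for the commonly cited TATA consensus TATAWAWR within an assumed window upstream of the TSS. It also checks for the classic Initiator consensus YYANWYY, with its A at +1. The window bounds and function names are assumptions. Published analyses often use position weight matrices or species-refined consensus sequences rather than exact IUPAC patterns.

```python
import re
from typing import Dict, Pattern, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_regex(consensus: str) -> Pattern[str]:
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


TATA_BOX = consensus_regex("TATAWAWR")   # commonly cited TATA consensus
INITIATOR = consensus_regex("YYANWYY")   # classic Inr consensus, A at +1


def core_promoter_motifs(seq: str, tss: int,
                         tata_region: Tuple[int, int] = (-40, -15)) -> Dict[str, bool]:
    """Check a sense-oriented sequence for a TATA box and an Initiator.

    seq must run 5'->3' along the transcript (reverse-complement minus-strand
    promoters first); tss is the 0-based index of the +1 base. A TATA match
    must lie entirely within [tss + tata_region[0], tss + tata_region[1]).
    """
    seq = seq.upper()
    start = max(0, tss + tata_region[0])
    end = max(0, tss + tata_region[1])
    has_tata = TATA_BOX.search(seq, start, end) is not None
    # YYANWYY puts the +1 A at its third base, so the motif starts at tss - 2.
    has_inr = tss >= 2 and INITIATOR.match(seq, tss - 2) is not None
    return {"TATA": has_tata, "Inr": has_inr}
```

Anchoring the Initiator check at `tss - 2` relies on single-base TSS positions. This is exactly the resolution that nascent and CAGE-style 5′-end data provide.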


Genetics, 2021
Authors: John M Schoelz, Justina X Feng, Nicole C Riddle

Drosophila Heterochromatin Protein 1a (HP1a) is essential for heterochromatin formation and is involved in transcriptional silencing. However, certain loci require HP1a for their transcription. One model posits that HP1a acts as a transcriptional silencer within euchromatin and as an activator within heterochromatin. Yet HP1a has also been observed to activate a set of euchromatic genes. Therefore, it is not clear whether, or how, chromatin context informs the function of HP1 proteins. To understand the role of HP1 proteins in transcription, we examined the genome-wide binding profiles of HP1a and of two other Drosophila HP1 family members, HP1B and HP1C. Our aim was to determine whether coordinated binding of these proteins is associated with specific transcriptional outcomes. We found that HP1 proteins share many of their endogenous binding targets. These genes are marked by active histone modifications and are expressed at higher levels than non-target genes in both heterochromatin and euchromatin. In addition, HP1 binding targets displayed increased RNA polymerase pausing compared to non-target genes. Specifically, co-localization of HP1B and HP1C was associated with the highest levels of polymerase pausing and gene expression. Analysis of HP1 null mutants suggests that these proteins coordinate activity at transcription start sites to regulate transcription. Depletion of HP1B or HP1C alters the expression of protein-coding genes bound by HP1 family members. Our data broaden the understanding of the mechanism of transcriptional activation by HP1a. They also highlight the need to consider particular protein-protein interactions, rather than broader chromatin context, to predict the impacts of HP1 at transcription start sites.
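Polymerase pausing of the kind reported above is commonly summarized as a pausing index. This is the density of engaged RNA polymerase near the TSS divided by its density across the gene body. The minimal Python version below assumes per-base coverage along a plus-strand gene from an assay such as PRO-seq or Pol II ChIP-seq. It uses illustrative window boundaries: TSS −50 to +300 bp for the promoter-proximal region. The exact definition and windows used in the study are not given in this abstract.

```python
import math
from typing import List, Tuple


def mean_density(coverage: List[float], start: int, end: int) -> float:
    """Average per-base signal over the half-open interval [start, end),
    clipped to the bounds of the coverage vector."""
    start, end = max(start, 0), min(end, len(coverage))
    if end <= start:
        raise ValueError("empty interval")
    return sum(coverage[start:end]) / (end - start)


def pausing_index(coverage: List[float], tss: int, tes: int,
                  promoter: Tuple[int, int] = (-50, 300)) -> float:
    """Promoter-proximal density divided by gene-body density.

    coverage holds per-base polymerase signal along a plus-strand gene,
    indexed by 0-based coordinate; the gene body runs from
    tss + promoter[1] to tes (the transcription end site).
    """
    proximal = mean_density(coverage, tss + promoter[0], tss + promoter[1])
    body = mean_density(coverage, tss + promoter[1], tes)
    if body == 0:
        return math.inf if proximal > 0 else math.nan
    return proximal / body
```

Densities are used rather than raw sums so that long gene bodies do not dominate the ratio. A gene with uniform coverage therefore has an index of 1.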


2021
Authors: Jill E Moore, Xiao-Ou Zhang, Shaimae I Elhajjajy, Kaili Fan, Fairlie Reese, ...

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated, and they lack information on cell type-specific usage. We therefore sought to generate a collection of experimentally validated TSSs. To do so, we integrated RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which yielded approximately 50 thousand representative RAMPAGE peaks. These peaks were primarily proximal to GENCODE-annotated TSSs and were concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3′ ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. These updated TSS annotations were also supported by epigenomic and other transcriptomic datasets. To demonstrate the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI GWAS catalog and identified new candidate GWAS genes. Overall, our work demonstrates the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.
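The paired-end linking step can be illustrated with a simple voting rule. Take the read pairs whose 5′ ends fall in a peak, and count how many of their mate 3′ ends land in each annotated transcript's exons. Then assign the peak to the best-supported transcript. This Python sketch is a hypothetical simplification that uses exon-overlap voting with an assumed `min_fraction` cutoff. It is not the classification procedure used to build the RAMPAGE collection.

```python
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # half-open genomic interval [start, end)


def in_exons(pos: int, exons: List[Exon]) -> bool:
    """True if the position falls inside any of the transcript's exons."""
    return any(start <= pos < end for start, end in exons)


def assign_peak_to_transcript(mate_ends: List[int],
                              transcripts: Dict[str, List[Exon]],
                              min_fraction: float = 0.5) -> Optional[str]:
    """Return the transcript whose exons contain the most mate 3' ends.

    mate_ends are the genomic 3' end positions of the mates of reads whose
    5' ends fall in one RAMPAGE peak. None is returned when no transcript
    accounts for at least min_fraction of them.
    """
    if not mate_ends or not transcripts:
        return None
    support = {tx: sum(in_exons(p, exons) for p in mate_ends)
               for tx, exons in transcripts.items()}
    best = max(support, key=support.get)
    return best if support[best] / len(mate_ends) >= min_fraction else None
```

Voting on exon overlap lets shared first exons stay ambiguous while downstream exons break the tie. This is why mate 3′ ends are informative in addition to peak positions.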

