Accurate estimation of intrinsic biases for improved analysis of chromatin accessibility sequencing data using SELMA

2021 ◽  
Author(s):  
Shengen Shawn Hu ◽  
Lin Liu ◽  
Qi Li ◽  
Wenjing Ma ◽  
Michael J Guertin ◽  
...  

Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound chromatin accessibility profiling data analysis. Existing computational tools are limited in their ability to account for such intrinsic biases. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. We show that transcription factor binding inference from DNase footprints can be improved by incorporating estimated biases using SELMA. We also demonstrate improved cell clustering of single-cell ATAC-seq data by considering the SELMA-estimated bias effect. SELMA can be applied to existing bioinformatics tools to improve the analysis of chromatin accessibility sequencing data.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sarah E. Pierce ◽  
Jeffrey M. Granja ◽  
William J. Greenleaf

Abstract Chromatin accessibility profiling can identify putative regulatory regions genome-wide; however, pooled single-cell methods for assessing the effects of regulatory perturbations on accessibility are limited. Here, we report a modified droplet-based single-cell ATAC-seq protocol for perturbing and evaluating dynamic single-cell epigenetic states. This method (Spear-ATAC) enables simultaneous read-out of chromatin accessibility profiles and integrated sgRNA spacer sequences from thousands of individual cells at once. Spear-ATAC profiling of 104,592 cells representing 414 sgRNA knock-down populations reveals the temporal dynamics of epigenetic responses to regulatory perturbations in cancer cells and the associations between transcription factor binding profiles.


2019 ◽  
Author(s):  
Arif Harmanci ◽  
Akdes Serin Harmanci ◽  
Jyothishmathi Swaminathan ◽  
Vidya Gopalakrishnan

Abstract Motivation: Functional genomics experiments generate genome-wide signal profiles that are dense information sources for annotating regulatory elements. These profiles measure epigenetic activity at nucleotide resolution, and they exhibit distinctive patterns as they fluctuate along the genome. The most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint the locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by the majority of analysis pipelines. Results: We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell line-specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys. Availability: EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI Supplementary information: Supplementary data are available at Bioinformatics online.


2010 ◽  
Vol 21 (3) ◽  
pp. 447-455 ◽  
Author(s):  
R. Pique-Regi ◽  
J. F. Degner ◽  
A. A. Pai ◽  
D. J. Gaffney ◽  
Y. Gilad ◽  
...  

2021 ◽  
Vol 118 (20) ◽  
pp. e2026754118
Author(s):  
Chun-Ping Yu ◽  
Chen-Hao Kuo ◽  
Chase W. Nelson ◽  
Chi-An Chen ◽  
Zhi Thong Soh ◽  
...  

Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density ±2 kb around transcription start sites (TSSs) with a peak at −50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (−1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 (including 115 completely) tethered binding TF pairs, indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.


2018 ◽  
Author(s):  
Peter Ulz ◽  
Samantha Perakis ◽  
Qing Zhou ◽  
Tina Moser ◽  
Jelena Belic ◽  
...  

Abstract Deregulation of transcription factors (TFs) is an important driver of tumorigenesis. We developed and validated a minimally invasive method for assessing TF activity based on cell-free DNA sequencing and nucleosome footprint analysis. We analyzed whole genome sequencing data for >1,000 cell-free DNA samples from cancer patients and healthy controls using a newly developed bioinformatics pipeline that infers accessibility of TF binding sites from cell-free DNA fragmentation patterns. We observed patient-specific as well as tumor-specific patterns, including accurate prediction of tumor subtypes in prostate cancer, with important clinical implications for the management of patients. Furthermore, we show that cell-free DNA TF profiling is capable of early detection of colorectal carcinomas. Our approach for mapping tumor-specific transcription factor binding in vivo based on blood samples makes a key part of the noncoding genome amenable to clinical analysis.


2020 ◽  
Author(s):  
Liliang Yang ◽  
Kaizhen Wang ◽  
Wenjing Guo ◽  
Xian Chen ◽  
Qinglong Guo ◽  
...  

Abstract Background: RNA polymerase II subunit K (POLR2K) is one of the multiple subunits of RNA polymerase II (Pol II), whose biological function is to synthesize mRNA. Aberrant POLR2K expression is related to carcinogenesis. However, the underlying role of POLR2K in bladder cancer has not been explored. In the current study, we analyze the function of POLR2K and its regulatory network in bladder cancer. Methods: Public sequencing data were obtained from GEO and TCGA to investigate POLR2K expression and its regulatory network in bladder cancer (BLCA) using GEPIA, Oncomine, and the cBioPortal online tool. LinkedOmics was employed to identify genes displaying significantly differential expression patterns and to perform GO and KEGG analyses. After differential genes were assigned and ranked, GSEA analyses were performed to obtain target networks for transcription factors, miRNAs, and kinases that could regulate the POLR2K-associated gene network. Subsequent functional network analyses were used to identify cancer-relevant pathways. Moreover, the POLR2K gene was verified, by ChIP-seq in the MCF-7 cell line, to have transcription factor binding evidence in the ENCODE Transcription Factor Binding Site Profiles dataset. Conclusions: The current study implies that the POLR2K gene is overexpressed and often amplified in BLCA, providing the first evidence that POLR2K deregulation, in particular increased transcription, may promote BLCA. These findings uncover unique expression patterns of POLR2K and its potential regulatory networks in BLCA, contributing greatly to the study of the role of POLR2K in cancer development.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Lianggang Huang ◽  
Xuejie Li ◽  
Liangbo Dong ◽  
Bin Wang ◽  
Li Pan

Abstract Background: The identification of open chromatin regions and transcription factor binding sites (TFBs) is an important step in understanding the regulation of gene expression in diverse species. ATAC-seq is a technique used for this purpose, providing high-resolution measurements of chromatin accessibility revealed through integration of Tn5 transposase. However, the existence of cell walls in filamentous fungi and the associated difficulty in purifying nuclei have precluded the routine application of this technique, leading to a lack of experimentally determined and computationally inferred data on the identity of genome-wide cis-regulatory elements (CREs) and TFBs. In this study, we constructed an ATAC-seq platform suitable for filamentous fungi and generated ATAC-seq libraries of Aspergillus niger and Aspergillus oryzae grown under a variety of conditions. Results: We applied the ATAC-seq assay for filamentous fungi to delineate the syntenic orthologues and differentially changed chromatin accessibility regions among different Aspergillus species, under different culture conditions, and among specific TF-deleted strains. The syntenic orthologues of accessible regions were responsible for the conserved functions across Aspergillus species, while regions differentially changed between culture conditions and TF mutants drove differential gene expression programs. Importantly, we suggest criteria to determine TFBs through the analysis of unbalanced cleavage of distinct TF-bound DNA strands by Tn5 transposase. Based on these criteria, we constructed data libraries of the in vivo genomic footprint of A. niger under distinct conditions, and generated a database of novel transcription factor binding motifs through comparison of footprints in TF-deleted strains. Furthermore, we validated the novel TFBs in vivo through an artificial synthetic minimal promoter system.
Conclusions: We characterized the chromatin accessibility regions of filamentous fungi species, and identified a complete TFBs map by ATAC-seq, which provides valuable data for future analyses of transcriptional regulation in filamentous fungi.


2011 ◽  
Vol 12 (4) ◽  
pp. R34 ◽  
Author(s):  
Xiao-Yong Li ◽  
Sean Thomas ◽  
Peter J Sabo ◽  
Michael B Eisen ◽  
John A Stamatoyannopoulos ◽  
...  
