Sensitive and robust assessment of ChIP-seq read distribution using a strand-shift profile

2017 ◽  
Author(s):  
Ryuichiro Nakato ◽  
Katsuhiko Shirahige

Abstract Chromatin immunoprecipitation followed by sequencing (ChIP-seq) can detect read-enriched DNA loci for point-source (e.g., transcription factor binding) and broad-source factors (e.g., various histone modifications). Although numerous quality metrics for ChIP-seq data have been developed, the ‘peaks’ thus obtained are still difficult to assess with respect to signal-to-noise ratio (S/N) and the percentage of false positives. We developed a quality-assessment tool for ChIP-seq data, SSP (strand-shift profile), that quantifies S/N and peak reliability without peak calling. We validated SSP in depth using ≥ 1,000 publicly available ChIP-seq datasets along with virtual data to demonstrate that SSP is quantifiable and sensitive to different S/Ns for both point- and broad-source factors. Moreover, SSP is consistent among cell types and with respect to variance of sequencing depth, and identifies low-quality samples that cannot be identified by quality metrics currently available. Finally, we show that “hidden-duplicate reads” cause aberrantly high S/Ns, and SSP provides an additional metric to avoid them, which can also contribute to estimation of peak mode (point- or broad-source) of samples.
Availability: https://github.com/rnakato/SSP

2020 ◽  
Vol 48 (8) ◽  
pp. e43-e43 ◽  
Author(s):  
Guanjue Xiang ◽  
Cheryl A Keller ◽  
Belinda Giardine ◽  
Lin An ◽  
Qunhua Li ◽  
...  

Abstract Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real biological variation from raw epigenomic data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions, assuming that the same scale factor is applicable to both background and peak regions. While such methods adjust for differences in sequencing depths, they do not address differences in the signal-to-noise ratios across different experiments. We developed a new data normalization method, called S3norm, that normalizes the sequencing depths and signal-to-noise ratios across different data sets simultaneously by a monotonic nonlinear transformation. We show empirically that the epigenomic data normalized by our method, compared to existing methods, can better capture real biological variation, such as impact on gene expression regulation.


2019 ◽  
Vol 2 (4) ◽  
pp. e201900318 ◽  
Author(s):  
Junaid Akhtar ◽  
Piyush More ◽  
Steffen Albrecht ◽  
Federico Marini ◽  
Waldemar Kaiser ◽  
...  

Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (ChIP-Seq) is a powerful technique to study transcriptional regulation. However, the requirement of millions of cells to generate results with a high signal-to-noise ratio precludes its use in the study of small cell populations. Here, we present a tagmentation-assisted fragmentation ChIP (TAF-ChIP) and sequencing method to generate high-quality histone profiles from low cell numbers. The data obtained from the TAF-ChIP approach are amenable to standard tools for ChIP-Seq analysis, owing to their high signal-to-noise ratio. The epigenetic profiles from the TAF-ChIP approach showed high agreement with conventional ChIP-Seq datasets, thereby underlining the utility of this approach.


2020 ◽  
Author(s):  
Hayato Anzawa ◽  
Hitoshi Yamagata ◽  
Kengo Kinoshita

Abstract Background: Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite their potential for robust and accurate assessment of signal-to-noise ratio (S/N) owing to their independence from peak calling, it remains unclear what aspects of quality such strand cross-correlation profiles actually measure. Results: We introduced a simple model to simulate the mapped read density of ChIP-seq and then derived the theoretical maximum and minimum of cross-correlation coefficients between strands. The results suggest that the maximum coefficient of typical ChIP-seq samples is directly proportional to the number of total mapped reads and the square of the ratio of signal reads, and inversely proportional to the number of peaks and the length of read-enriched regions. Simulation analysis supported our results, and evaluation using 790 ChIP-seq datasets obtained from the public database demonstrated high consistency between calculated cross-correlation coefficients and coefficients estimated from the theoretical relations and peak calling results. In addition, we found that mappability-bias correction improved sensitivity, enabling differentiation of maximum coefficients from the noise level. Based on these insights, we proposed virtual S/N (VSN), a novel peak call-free metric for S/N assessment. We also developed PyMaSC, a tool to calculate strand cross-correlation and VSN efficiently. VSN achieved the most consistent S/N estimation for various ChIP targets and sequencing read depths. Furthermore, we demonstrated that a combination of VSN and pre-existing peak calling results enables estimation of the number of detectable peaks for posterior experiments and assessment of peak calling results.
Conclusions: We present the first theoretical insights into the strand cross-correlation, and the results reveal the potential and the limitations of strand cross-correlation analysis. Our quality assessment framework using VSN provides peak call-independent QC and will help in the evaluation of peak call analysis in ChIP-seq experiments.
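
As a rough illustration of what a strand cross-correlation profile is, the sketch below correlates forward-strand read counts with reverse-strand read counts shifted by each candidate distance. It is a minimal toy example, not the PyMaSC implementation: the function name `strand_cross_correlation`, the per-base count arrays, and the synthetic peak positions are all assumptions made for illustration, and no mappability correction or VSN calculation is attempted.

```python
import numpy as np


def strand_cross_correlation(forward, reverse, max_shift):
    """Pearson correlation between strands for shifts 0..max_shift.

    forward, reverse: per-base counts of read 5' ends on each strand
    (hypothetical inputs; real tools derive them from alignments).
    """
    forward = np.asarray(forward, dtype=float)
    reverse = np.asarray(reverse, dtype=float)
    if forward.shape != reverse.shape:
        raise ValueError("forward and reverse must have the same length")
    n = forward.size
    if not 0 <= max_shift < n - 1:
        raise ValueError("max_shift must satisfy 0 <= max_shift < n - 1")

    coeffs = np.empty(max_shift + 1)
    for d in range(max_shift + 1):
        # Pair forward position x with reverse position x + d.
        f = forward[: n - d]
        r = reverse[d:]
        coeffs[d] = np.corrcoef(f, r)[0, 1]
    return coeffs


# Toy data: three "binding sites" whose reverse-strand reads sit
# 150 bp downstream of the forward-strand reads (assumed fragment length).
genome_length = 2000
fragment_length = 150
forward_counts = np.zeros(genome_length)
reverse_counts = np.zeros(genome_length)
for site in (300, 800, 1400):
    forward_counts[site] = 10
    reverse_counts[site + fragment_length] = 10

profile = strand_cross_correlation(forward_counts, reverse_counts, max_shift=300)
best_shift = int(np.argmax(profile))
print(best_shift)
```

In this noise-free toy case the profile peaks at the assumed fragment length; with real data the height of that peak relative to the background level is what S/N-oriented metrics build on.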


2019 ◽  
Author(s):  
Hayato Anzawa ◽  
Hitoshi Yamagata ◽  
Kengo Kinoshita

Abstract Background: Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite their potential for robust and accurate assessment of the signal-to-noise ratio (S/N) of ChIP-seq samples, it remains unclear what aspects of quality such strand cross-correlation profiles actually measure. Results: We introduced a simple model to simulate the mapped read density of ChIP-seq and then derived the theoretical maximum and minimum of cross-correlation coefficients between strands. The results suggest that the maximum coefficient of typical ChIP-seq samples is directly proportional to the number of total mapped reads and the square of the ratio of signal reads, and inversely proportional to the number of peaks and the length of read-enriched regions. We also developed PyMaSC to efficiently generate strand cross-correlation profiles. Simulation analysis supported our results, and evaluation using 790 ChIP-seq datasets obtained from the public database demonstrated high consistency between calculated cross-correlation coefficients and coefficients estimated from the theoretical relations and peak calling results. In addition, we found that mappability-bias correction improved sensitivity, enabling differentiation of maximum coefficients from the noise level. Conclusions: We present the first theoretical insights into strand cross-correlation, and the results reveal the potential and the limitations of strand cross-correlation analysis. Our work will help in the establishment of better QC metrics using strand cross-correlation.
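
The proportionality stated in the Results can be written compactly. The symbols here are introduced only for illustration and are not the authors' notation: $C_{\max}$ is the maximum cross-correlation coefficient, $N$ the number of total mapped reads, $\rho$ the ratio of signal reads, $P$ the number of peaks, and $L$ the length of read-enriched regions.

```latex
C_{\max} \;\propto\; \frac{N \,\rho^{2}}{P \, L}
```

Read this way, doubling the sequencing depth $N$ would double $C_{\max}$, whereas doubling the signal ratio $\rho$ would quadruple it, all else being equal.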


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Hayato Anzawa ◽  
Hitoshi Yamagata ◽  
Kengo Kinoshita

Abstract Background Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite their potential for robust and accurate assessment of signal-to-noise ratio (S/N) owing to their independence from peak calling, it remains unclear what aspects of quality such strand cross-correlation profiles actually measure. Results We introduced a simple model to simulate the mapped read density of ChIP-seq and then derived the theoretical maximum and minimum of cross-correlation coefficients between strands. The results suggest that the maximum coefficient of typical ChIP-seq samples is directly proportional to the number of total mapped reads and the square of the ratio of signal reads, and inversely proportional to the number of peaks and the length of read-enriched regions. Simulation analysis supported our results, and evaluation using 790 ChIP-seq datasets obtained from the public database demonstrated high consistency between calculated cross-correlation coefficients and coefficients estimated from the theoretical relations and peak calling results. In addition, we found that mappability-bias correction improved sensitivity, enabling differentiation of maximum coefficients from the noise level. Based on these insights, we proposed virtual S/N (VSN), a novel peak call-free metric for S/N assessment. We also developed PyMaSC, a tool to calculate strand cross-correlation and VSN efficiently. VSN achieved the most consistent S/N estimation for various ChIP targets and sequencing read depths. Furthermore, we demonstrated that a combination of VSN and pre-existing peak calling results enables estimation of the number of detectable peaks for posterior experiments and assessment of peak calling results.
Conclusions We present the first theoretical insights into the strand cross-correlation, and the results reveal the potential and the limitations of strand cross-correlation analysis. Our quality assessment framework using VSN provides peak call-independent QC and will help in the evaluation of peak call analysis in ChIP-seq experiments.


2017 ◽  
Vol 149 (7) ◽  
pp. 689-701 ◽  
Author(s):  
Luba A. Astakhova ◽  
Darya A. Nikolaeva ◽  
Tamara V. Fedotkina ◽  
Victor I. Govardovskii ◽  
Michael L. Firsov

The absolute sensitivity of vertebrate retinas is set by a background noise, called dark noise, which originates from several different cell types and is generated by different molecular mechanisms. The major share of dark noise is produced by photoreceptors and consists of two components, discrete and continuous. Discrete noise is generated by spontaneous thermal activations of visual pigment. These events are indistinguishable from real single-photon responses (SPRs) and might be considered an equivalent of the signal. Continuous noise is produced by spontaneous fluctuations of the catalytic activity of the cGMP phosphodiesterase. This masks both SPRs and spontaneous SPR-like responses. Circadian rhythms affect photoreceptors, among other systems, by periodically increasing intracellular cAMP levels ([cAMP]in), which increases the size and changes the shape of SPRs. Here, we show that forskolin, a tool that increases [cAMP]in, affects the magnitude and frequency spectrum of the continuous and discrete components of dark noise in photoreceptors. By changing both components of rod signaling, the signal and the noise, cAMP is able to increase the photoreceptor signal-to-noise ratio twofold. We propose that this results in a substantial improvement of signal detection, without compromising noise rejection, at the rod bipolar cell synapse.


2021 ◽  
Author(s):  
Bofeng Liu ◽  
Fengling Chen ◽  
Wei Xie

Several chromatin immunocleavage-based (ChIC) methods using Tn5 transposase have been developed to profile histone modifications and transcription factor binding. A recent preprint by Wang et al. raised potential concerns that these methods are prone to open chromatin bias. While we appreciate the authors alerting the community to this issue, it has been previously described and discussed by Henikoff and colleagues in the original CUT&Tag paper. However, as described for CUT&Tag, the signal-to-noise ratio is essential for Tn5-based profiling methods and all antibody-based enrichment assays. Based on this notion, we would like to point out a major analysis issue in Wang et al. that caused a complete loss or dramatic reduction of enrichment at true targets for datasets generated by Tn5-based methods, which in turn artificially enhanced the relative enrichment of potential open chromatin bias. This analysis issue is caused by the distinct background normalizations applied to ChIP-based (chromatin immunoprecipitation) data and Tn5-based data in Wang et al. Only the normalization for Tn5-based data, but not that for ChIP-seq-based data, yielded such effects. Distorting the signal-to-noise ratio in this way would consequently lead to misleading results.


2018 ◽  
Author(s):  
Junaid Akhtar ◽  
Piyush More ◽  
Steffen Albrecht ◽  
Federico Marini ◽  
Waldemar Kaiser ◽  
...  

Abstract Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (ChIP-Seq) is a powerful technique to study transcriptional regulation. However, the requirement of millions of cells to generate results with a high signal-to-noise ratio precludes its use in the study of small cell populations. Here, we present a Tagmentation-Assisted Fragmentation ChIP (TAF-ChIP) and sequencing method to generate high-quality results from low cell numbers. The data obtained from the TAF-ChIP approach are amenable to standard tools for ChIP-Seq analysis, owing to their high signal-to-noise ratio. The epigenetic profiles from the TAF-ChIP approach showed high agreement with conventional ChIP-Seq datasets, thereby underlining the utility of this approach.


2018 ◽  
Author(s):  
Guanjue Xiang ◽  
Cheryl A. Keller ◽  
Belinda Giardine ◽  
Lin An ◽  
Qunhua Li ◽  
...  

Abstract Quantitative comparison of epigenomic data across multiple cell types or experimental conditions is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios in the data from different experiments can hinder our ability to identify real biological variation from raw epigenomic data. Proper normalization is required prior to data analysis to gain meaningful insights. Most existing methods for data normalization standardize signals by rescaling either background regions or peak regions, assuming that the same scale factor is applicable to both background and peak regions. While such methods adjust for differences in sequencing depths, they do not address differences in the signal-to-noise ratios across different experiments. We developed a new data normalization method, called S3norm, that normalizes the sequencing depths and signal-to-noise ratios across different data sets simultaneously by a monotonic nonlinear transformation. We show empirically that the epigenomic data normalized by our method, compared to existing methods, can better capture real biological variation, such as impact on gene expression regulation.


2018 ◽  
Vol 2 (1) ◽  
pp. e201800115 ◽  
Author(s):  
Abhishek A Singh ◽  
Karianne Schuurman ◽  
Ekaterina Nevedomskaya ◽  
Suzan Stelloo ◽  
Simon Linder ◽  
...  

Chromatin immunoprecipitation (ChIP)-seq analyses of transcription factors in clinical specimens are challenging due to technical limitations and low quantities of starting material, often resulting in low enrichment and a poor signal-to-noise ratio. Here, we present an optimized protocol for transcription factor ChIP-seq analyses in human tissue, yielding an ∼100% success rate for all transcription factors analyzed. As proof of concept and to illustrate the general applicability of the approach, human tissue from breast, prostate, and endometrial cancers was analyzed. In addition to standard formaldehyde fixation, disuccinimidyl glutarate was included in the procedure, greatly increasing data quality. To illustrate the sensitivity of the optimized protocol, we provide high-quality ChIP-seq data for three independent factors (AR, FOXA1, and H3K27ac) from a single core needle prostate cancer biopsy specimen. In summary, double cross-linking strongly improved transcription factor ChIP-seq quality on human tumor samples, further facilitating and enhancing translational research on limited amounts of tissue.

