Sensitive and robust assessment of ChIP-seq read distribution using a strand-shift profile
Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) can detect read-enriched DNA loci for point-source factors (e.g., transcription factor binding) and broad-source factors (e.g., various histone modifications). Although numerous quality metrics for ChIP-seq data have been developed, the ‘peaks’ thus obtained are still difficult to assess with respect to signal-to-noise ratio (S/N) and the percentage of false positives.

We developed a quality-assessment tool for ChIP-seq data, SSP (strand-shift profile), that quantifies S/N and peak reliability without peak calling. We validated SSP in depth using ≥ 1,000 publicly available ChIP-seq datasets along with virtual data to demonstrate that SSP is quantifiable and sensitive to different S/Ns for both point- and broad-source factors. Moreover, SSP is consistent among cell types and with respect to variance of sequencing depth, and identifies low-quality samples that cannot be identified by quality metrics currently available. Finally, we show that “hidden-duplicate reads” cause aberrantly high S/Ns, and SSP provides an additional metric to avoid them, which can also contribute to estimation of peak mode (point- or broad-source) of samples.

Availability: https://github.com/rnakato/SSP