Crunch: Integrated processing and modeling of ChIP-seq data in terms of regulatory motifs
Although it has become routine for experimental groups to apply ChIP-seq technology to quantitatively characterize the genome-wide binding of transcription factors (TFs), computational analysis procedures remain far from standardized, making it difficult to meaningfully compare ChIP-seq results across experiments. In addition, while genome-wide binding patterns must ultimately be determined by local constellations of binding sites in the DNA, current analysis is typically limited to a standard search for enriched motifs in ChIP-seq peaks. Here we present Crunch, a completely automated computational method that performs all ChIP-seq analysis from quality control through read mapping and peak detection, and integrates comprehensive modeling of the ChIP signal in terms of known and novel binding motifs, quantifying the contribution of each motif and annotating which combinations of motifs explain each binding peak. Applying Crunch to 128 ChIP-seq datasets from the ENCODE project, we find that TFs naturally separate into ‘solitary TFs’, for which a single motif explains the ChIP peaks, and ‘co-binding TFs’, for which multiple motifs co-occur within peaks. Moreover, for most datasets the motifs that Crunch identified de novo outperform known motifs, and both the set of co-binding motifs and the top motif of solitary TFs are consistent across experiments and cell lines. Crunch is implemented as a web server (crunch.unibas.ch), enabling standardized analysis of any collection of ChIP-seq datasets by simply uploading raw sequencing data. Results are provided both in a graphical interface and as downloadable files.