MAGIC: A tool for predicting transcription factor and cofactor binding sites in gene sets using ENCODE data
ABSTRACT
Transcriptomic profiling is an immensely powerful hypothesis-generating tool. Whether one is comparing an experimental versus a control condition or collecting transcriptomes from cohorts of disease tissue, it is often necessary to determine which transcription factors (TFs) and cofactors drive programs of gene expression in the datasets. Most available tools rely on searching for TF binding motifs near the promoters of genes in a gene set. This approach can work well for TFs with extended recognition elements but is less useful for shorter elements and does not work at all for cofactors. The Encyclopedia Of DNA Elements (ENCODE) archives ChIPseq tracks of 169 TFs and cofactors assayed in 91 cell lines. The algorithm presented herein, Multiple Aligned Genomic Integration of ChIP (MAGIC), uses ENCODE ChIPseq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene sets. Compared with oPOSSUM, a commonly used web resource, MAGIC more accurately predicted the TFs and cofactors that drive gene changes in three settings: (1) a cell line expressing or lacking REST, (2) breast tumors divided along PAM50 designations, and (3) whole-brain samples from WT mice or mice lacking CTCF in a particular neuronal subtype. In summary, MAGIC is a standalone application that runs on OSX machines and has a simple interface that produces meaningful predictions of which TFs and cofactors are enriched in a gene set.
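To make the idea of "statistical enrichment of a factor in a gene set" concrete, the following is a minimal sketch and not MAGIC's actual statistic, which the abstract does not specify. It assumes ChIPseq signal has already been reduced to a yes/no call per gene (a simplification introduced here) and applies a one-sided hypergeometric test as a stand-in technique. The function name `hypergeom_enrichment` and all gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_enrichment(gene_set, bound_genes, background):
    """One-sided hypergeometric test for over-representation.

    Returns (overlap, p) where p is the probability of seeing at least
    `overlap` bound genes when drawing len(gene_set) genes at random,
    without replacement, from the background.
    NOTE: illustrative stand-in only; not the statistic used by MAGIC.
    """
    background = set(background)
    gene_set = set(gene_set) & background   # query genes present in background
    bound = set(bound_genes) & background   # genes called "bound" by the factor
    n_total = len(background)
    n_bound = len(bound)
    n_query = len(gene_set)
    overlap = len(gene_set & bound)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_bound, i) * comb(n_total - n_bound, n_query - i)
        for i in range(overlap, min(n_bound, n_query) + 1)
    )
    return overlap, tail / comb(n_total, n_query)


# Toy data (hypothetical): 20 background genes, 5 of them bound by one factor.
background = [f"gene{i}" for i in range(20)]
bound_by_factor = background[:5]
query = ["gene0", "gene1", "gene2", "gene10"]

overlap, p_value = hypergeom_enrichment(query, bound_by_factor, background)
print(f"overlap={overlap}, p={p_value:.4f}")
```

In this toy case three of the four query genes are bound, although only a quarter of the background is, so the tail probability is small. A test on binarized calls discards signal strength, which is one reason an approach working directly on ChIPseq signal might use a different statistic.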