Genomic and bioinformatics tools to understand the biology of signal transducers and activators of transcription
Abstract
The signal transducer and activator of transcription (STAT) family is activated by cytokines and conveys biochemical signals to the genome by binding to specific regulatory sequences called IFN-γ-activated sequence (GAS) motifs. Because the common GAS motif (TTCnnnGAA) contains only six conserved nucleotides, the mammalian genome harbors hundreds of thousands of copies of this sequence. However, it is not possible to predict from sequence alone which specific GAS motifs are bound by STATs and are of functional significance. Here, we apply several layers of statistical, bioinformatics and experimental analyses to narrow down the number of GAS sites that might be of biological relevance. In particular, we determined the number of bona fide GAS motifs by utilizing publicly available genome-wide STAT5 ChIP-seq data sets. Fewer than 10% of GAS motifs within the mouse genome are recognized by STAT5 in vivo, and only a small portion of them are shared across different cell types. However, even bona fide STAT5 binding did not predict that the respective gene was under cytokine-STAT control. Therefore, additional bioinformatics, genomic and epigenetic parameters, such as patterns of histone modifications, are required to more reliably predict the behavior of cytokine-STAT regulatory networks.
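
To make the motif-counting idea concrete, the following is a minimal sketch of how TTCnnnGAA occurrences could be located in a sequence and compared against ChIP-seq peak intervals. It is not the pipeline used in this study: the function names `find_gas_motifs` and `fraction_in_peaks` are hypothetical, and the peak list is assumed to be sorted, non-overlapping, half-open (start, end) intervals on a single sequence. Because TTCnnnGAA matches its own reverse complement, scanning one strand is sufficient. A lookahead is used so that overlapping occurrences are all reported.

```python
import re
from bisect import bisect_right

# Lookahead so that overlapping motifs (e.g. TTCTTCGAAGAA) are all reported.
GAS_PATTERN = re.compile(r"(?=(TTC[ACGT]{3}GAA))")
MOTIF_LEN = 9


def find_gas_motifs(sequence):
    """Return 0-based start positions of every TTCnnnGAA occurrence."""
    return [m.start() for m in GAS_PATTERN.finditer(sequence.upper())]


def fraction_in_peaks(motif_starts, peaks):
    """Fraction of motifs lying entirely inside a peak.

    Assumption: `peaks` is a sorted list of non-overlapping, half-open
    (start, end) intervals on the same sequence as the motif positions.
    """
    if not motif_starts:
        return 0.0
    peak_starts = [start for start, _ in peaks]
    bound = 0
    for pos in motif_starts:
        # Index of the last peak that starts at or before the motif start.
        i = bisect_right(peak_starts, pos) - 1
        if i >= 0 and pos + MOTIF_LEN <= peaks[i][1]:
            bound += 1
    return bound / len(motif_starts)
```

Given motif positions for a chromosome and a peak list parsed from a ChIP-seq peak file, `fraction_in_peaks` yields the share of motifs that fall within a peak, the kind of quantity summarized above as fewer than 10% for STAT5 in the mouse genome. Requiring the whole motif to lie inside the peak is a design choice of this sketch; other overlap criteria are possible.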