Synthetic and genomic regulatory elements reveal aspects of cis-regulatory grammar in mouse embryonic stem cells
In embryonic stem cells (ESCs), a core network of transcription factors establishes and maintains the gene expression program necessary to grow indefinitely in cell culture and generate all three primary germ layers. To understand how interactions between four key pluripotency transcription factors (TFs), SOX2, POU5F1 (OCT4), KLF4, and ESRRB, contribute to cis-regulation in mouse ESCs, we assayed two massively parallel reporter assay (MPRA) libraries composed of different combinations of binding sites for these TFs. One library was an exhaustive set of synthetic cis-regulatory elements and the second was a set of genomic sequences with comparable configurations of binding sites. Comparisons between the libraries allowed us to determine the regulatory grammar requirements for these binding sites in constrained synthetic contexts versus genomic sequence contexts. We found that binding site quality is a common attribute of active elements in both the synthetic and genomic contexts. For synthetic regulatory elements, the level of expression is mostly determined by the number of binding sites but is tuned by a grammar that includes position effects. Surprisingly, this grammar appears to play only a small role in setting the output levels of genomic sequences. The relative activity of genomic sequences is best explained by the predicted affinity of binding sites, regardless of identity, and optimized spacing between sites. Our findings highlight the need for detailed examinations of complex sequence space when trying to understand cis-regulatory grammar in the genome.