Estimating Motifs Under Order Restrictions

Author(s):  
Erik W van Zwet ◽  
Katherina J Kechris ◽  
Peter J Bickel ◽  
Michael B. Eisen

Transcription factors and many other DNA-binding proteins recognize more than one specific sequence. Among the sequences recognized by a given DNA-binding protein, different positions exhibit varying degrees of conservation, because base pairs that are more extensively contacted by the protein tend to be more conserved. This observation can be used in the discovery of transcription factor binding sites, and here we present a rigorous means of exploiting it. In particular, we constrain the order of the information (entropy) in the columns of the position specific weight matrix (PWM) that characterizes the motif being sought. We then show how to compute the maximum likelihood estimate of a PWM under such order restrictions. This computation is easily integrated with the EM algorithm or the Gibbs sampler to enhance performance in the search for motifs in unaligned sequences. We demonstrate our method on a well-known data set of binding sites of the transcription factor Crp in E. coli.
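The order restriction above is stated in terms of the information content of PWM columns. The sketch below is only an illustration under our own assumptions (a uniform background, already-aligned sites, made-up counts). It computes the unconstrained maximum likelihood PWM (per-column base frequencies), the information of each column in bits, and checks whether a proposed column order holds. It does not implement the paper's constrained maximum likelihood computation, which is what would adjust the PWM when the unconstrained estimate violates the order.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts):
    """Unconstrained maximum likelihood PWM: per-column relative base frequencies."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES)
        pwm.append({b: col[b] / total for b in BASES})
    return pwm

def column_information(pwm):
    """Information (bits) of each column against a uniform background: 2 - entropy."""
    info = []
    for col in pwm:
        entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
        info.append(2.0 - entropy)
    return info

def satisfies_order(info, order):
    """True if info[order[0]] >= info[order[1]] >= ... for the given column indices."""
    return all(info[a] >= info[b] for a, b in zip(order, order[1:]))

# Hypothetical base counts for a 3-column motif (not data from the paper).
counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},  # fully conserved column
    {"A": 5, "C": 5, "G": 0, "T": 0},   # two equally likely bases
    {"A": 3, "C": 2, "G": 3, "T": 2},   # nearly uniform column
]
info = column_information(pwm_from_counts(counts))
print([round(x, 3) for x in info])       # → [2.0, 1.0, 0.029]
print(satisfies_order(info, [0, 1, 2]))  # → True
```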

2021 ◽  
Vol 49 (7) ◽  
pp. 3856-3875
Author(s):  
Marina Kulik ◽  
Melissa Bothe ◽  
Gözde Kibar ◽  
Alisa Fuchs ◽  
Stefanie Schöne ◽  
...  

Abstract The glucocorticoid (GR) and androgen (AR) receptors execute unique functions in vivo, yet have nearly identical DNA binding specificities. To identify mechanisms that facilitate functional diversification among these transcription factor paralogs, we studied them in an equivalent cellular context. Analysis of chromatin and sequence suggests that divergent binding, and the corresponding gene regulation, are driven by the different abilities of AR and GR to interact with relatively inaccessible chromatin. Divergent genomic binding patterns can also result from subtle differences in DNA binding preference between AR and GR. Furthermore, the sequence composition of large regions (>10 kb) surrounding selectively occupied binding sites differs significantly, indicating a role for the sequence environment in guiding AR and GR to distinct binding sites. A comparison of shared binding sites shows that the specificity paradox can also be resolved by differences in the events that occur downstream of receptor binding. Specifically, shared binding sites display receptor-specific enhancer activity, cofactor recruitment and changes in histone modifications. Genomic deletion of shared binding sites demonstrates their contribution to directing receptor-specific gene regulation. Together, these data suggest that differences in genomic occupancy, as well as divergence in the events that occur downstream of receptor binding, direct functional diversification among transcription factor paralogs.


PLoS ONE ◽  
2009 ◽  
Vol 4 (10) ◽  
pp. e7526 ◽  
Author(s):  
Alfredo Mendoza-Vargas ◽  
Leticia Olvera ◽  
Maricela Olvera ◽  
Ricardo Grande ◽  
Leticia Vega-Alvarado ◽  
...  

1994 ◽  
Vol 14 (5) ◽  
pp. 3292-3309
Author(s):  
M Lopez ◽  
P Oettgen ◽  
Y Akbarali ◽  
U Dendorfer ◽  
T A Libermann

The ets gene family encodes a group of proteins which function as transcription factors under physiological conditions and, if aberrantly expressed, can cause cellular transformation. We have recently identified two regulatory elements in the murine immunoglobulin heavy-chain (IgH) enhancer, π and μB, which exhibit striking similarity to binding sites for ets-related proteins. To identify ets-related transcriptional regulators expressed in pre-B lymphocytes that may interact with either the π or the μB site, we used a PCR approach with degenerate oligonucleotides encoding sequences conserved in all members of the ets family. We have cloned the gene for a new ets-related transcription factor, ERP (ets-related protein), from the murine pre-B cell line BASC 6C2 and from mouse lung tissue. The ERP protein contains a region of high homology with the ETS DNA-binding domain common to all members of the ets transcription factor/oncoprotein family. Three additional, smaller regions show homology to the ELK-1 and SAP-1 genes, a subgroup of the ets gene family that interacts with the serum response factor. Full-length ERP by itself exhibits only negligible DNA-binding activity. Removal of the carboxy terminus enables ERP to interact with a variety of ets-binding sites, including the E74 site, the IgH enhancer π site, and the lck promoter ets site, suggesting a carboxy-terminal negative regulatory domain. At least three ERP-related transcripts are expressed in a variety of tissues. However, within the B-cell lineage, ERP is highly expressed primarily at early stages of B-lymphocyte development, and expression declines drastically upon B-cell maturation, correlating with the enhancer activity of the IgH π site. These data suggest that ERP might play a role in B-cell development and in IgH gene regulation.
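The cloning strategy above relies on degenerate oligonucleotides, in which some positions allow more than one base. As a sketch of what such a primer represents, the following expands standard IUPAC ambiguity codes into every concrete sequence the primer encodes and counts them. The primer string is hypothetical and is not one of the oligonucleotides used in the study.

```python
import math
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(oligo):
    """Number of distinct sequences encoded by a degenerate oligo."""
    return math.prod(len(IUPAC[c]) for c in oligo.upper())

def expand(oligo):
    """All concrete sequences encoded by a degenerate oligo, in itertools.product order."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in oligo.upper()))]

primer = "GGRAAR"  # hypothetical degenerate primer
print(degeneracy(primer))  # → 4
print(expand(primer))      # → ['GGAAAA', 'GGAAAG', 'GGGAAA', 'GGGAAG']
```

Degeneracy multiplies across positions, so each added ambiguous position quickly increases the number of distinct primers in the pool.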


2018 ◽  
Vol 39 (3) ◽  
Author(s):  
Kyle T. Helzer ◽  
Mary Szatkowski Ozers ◽  
Mark B. Meyer ◽  
Nancy A. Benkusky ◽  
Natalia Solodin ◽  
...  

ABSTRACT Posttranslational modifications are key regulators of protein function, providing cues that can alter protein interactions and cellular location. Phosphorylation of estrogen receptor α (ER) at serine 118 (pS118-ER) occurs in response to multiple stimuli and is involved in modulating ER-dependent gene transcription. While the cistrome of ER is well established, surprisingly little is understood about how phosphorylation impacts ER-DNA binding activity. To define the pS118-ER cistrome, chromatin immunoprecipitation sequencing was performed on pS118-ER and ER in MCF-7 cells treated with estrogen. pS118-ER occupied a subset of ER binding sites that were associated with an active enhancer mark, acetylated H3K27. Unlike ER sites, pS118-ER sites were enriched in GRHL2 DNA binding motifs, and estrogen treatment increased GRHL2 recruitment to sites occupied by pS118-ER. Additionally, pS118-ER occupancy sites showed greater enrichment of full-length estrogen response elements relative to ER sites. In an in vitro DNA binding array of genomic binding sites, pS118-ER was more commonly associated with direct than with indirect DNA binding events. These results indicate that phosphorylation of ER at serine 118 promotes direct DNA binding at active enhancers and is a distinguishing mark for associated transcription factor complexes on chromatin.
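The abstract reports that pS118-ER sites, a subset of ER sites, were enriched in GRHL2 motifs, but it does not state which test was used. One common way to ask whether a subset of peaks carries a motif more often than its parent set is a one-sided hypergeometric test. The sketch below assumes that framing; all counts are made up for illustration and are not from the study.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: all sites in the background set (e.g., all ER-bound sites)
    K: background sites containing the motif
    n: sites in the subset being tested (e.g., pS118-ER-occupied sites)
    k: subset sites containing the motif
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer sums, divided once at the end

# Hypothetical counts, for illustration only (not from the study).
N, K, n, k = 2000, 300, 400, 90
expected = n * K / N  # 60 motif-containing sites expected by chance
p_value = hypergeom_upper_tail(k, N, K, n)
print(f"observed {k}, expected {expected:.0f}, one-sided p = {p_value:.2e}")
```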


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 283-283
Author(s):  
Andre M. Pilon ◽  
Elliott H. Margulies ◽  
Hatice Ozel Abaan ◽  
Amy Werner-Allen ◽  
Tim M. Townes ◽  
...  

Abstract Erythroid Kruppel-Like Factor (EKLF; KLF1) is the founding member of the Kruppel family of transcription factors, with three C2H2 zinc fingers that bind a 9-base consensus sequence (NCNCNCCCN). The functions of EKLF, first identified as an activator of the beta-globin locus, include gene activation and chromatin remodeling. Our knowledge of genes regulated by EKLF is limited, as EKLF-deficient mice die by embryonic day 15 (E15) due to severe anemia. Analysis of E13.5 wild-type and EKLF-deficient fetal liver (FL) erythroid cells revealed that EKLF-deficient cells fail to complete terminal erythroid maturation (Pilon et al. submitted). Coupling chromatin immunoprecipitation with ultra-high-throughput massively parallel sequencing (ChIP-seq) is increasingly used to map protein-DNA interactions in vivo on a genome-wide scale. ChIP-seq allows simultaneous analysis of transcription factor binding in every region of the genome, defining an “interactome”. To elucidate direct EKLF-dependent effects on erythropoiesis, we have combined ChIP-seq with expression array (“transcriptome”) analyses, reasoning that integrating ChIP-seq and microarray data can provide detailed knowledge of the role of EKLF in erythropoiesis. Chromatin was isolated from E13.5 FL cells of mice whose endogenous EKLF gene was replaced with a fully functional HA-tagged EKLF gene. ChIP was performed using a highly specific, high-affinity anti-HA antibody. A library of EKLF-bound FL chromatin enriched by anti-HA IP was created and subjected to fluorescent in situ sequencing on a Solexa 1G platform, providing 36-base signatures that were mapped to unique sites in the mouse genome, defining the EKLF “interactome.” The frequency with which a given signature appears provides a measurable peak of enrichment. We performed three biological/technical replicates and analyzed each data set individually as well as the combined data. 
To validate ChIP-seq results, we examined the locus of a known EKLF target gene, α-hemoglobin stabilizing protein (AHSP). Peaks corresponded to previously identified DNase hypersensitive sites, regions of histone hyperacetylation, and sites of promoter occupancy determined by ChIP-PCR. A genome-wide analysis focusing on the regions with the highest EKLF occupancy revealed a set of 531 locations where high levels of EKLF binding occur. Of these sites, 119 (22%) are located 10 kb or more from the nearest gene and are classified as intergenic EKLF binding sites. Another 78 sites (14.6%) are within 10 kb of an annotated RefSeq gene. A plurality of the binding sites, 222 (42%), are within RefSeq coordinates and are classified as intragenic EKLF binding sites. Microarray profiling of mRNA from sorted, matched populations of E13.5 WT and EKLF-deficient FL erythroid progenitor cells showed dysregulation of >3000 genes (p<0.05). Ingenuity Pathways Analysis (IPA) of the >3000 dysregulated mRNAs indicated significant alteration of a cell cycle-control network centered on the transcription factor E2f2. We confirmed significantly decreased E2f2 mRNA and protein levels by real-time PCR and Western blot, respectively; demonstrated by cell cycle analysis that EKLF-deficient FL cells accumulate in G0/G1; and verified EKLF binding to motifs within the E2f2 promoter by ChIP-PCR and analysis of the ChIP-seq data. We hypothesized that only a subset of the 3000 dysregulated genes would be direct EKLF targets. We limited the ChIP-seq library to display the top 5% most frequently represented fragments across the genome, and applied this criterion to the network of dysregulated mRNAs in the IPA cell cycle network. ChIP-seq identified peaks of EKLF association with 60% of the loci in this pathway. 
However, consistent with the role of EKLF as a transcriptional activator, 95% of the occupied genomic loci corresponded to mRNAs whose expression in EKLF-deficient FL cells was significantly decreased (p<0.05). The majority (59%) of these EKLF-bound sites were located at intragenic sites (i.e., introns), while a minority (15% and 26%) were found adjacent to the genes or in intergenic regions, respectively. We have shown that both the AHSP and E2f2 loci require EKLF to become activated and sensitive to DNase I digestion in erythroid cells. Based on the increased frequency of intragenic EKLF-binding sites, particularly in genes of the cell cycle network, we propose that the occupancy of intragenic sites by EKLF may facilitate chromatin modification.
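The 9-base EKLF consensus NCNCNCCCN can be searched for directly in a sequence. The sketch below is a plain pattern scan on both strands, assuming N matches any of A, C, G or T. It is not the ChIP-seq peak analysis described in the abstract, and the test sequences are made up.

```python
import re

# Only the codes needed for this consensus; N matches any base.
CODE_TO_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def scan(seq, consensus="NCNCNCCCN"):
    """Return sorted (start, strand) pairs for every consensus match on either strand.

    Starts are 0-based forward-strand coordinates; the lookahead lets matches overlap.
    """
    seq = seq.upper()
    width = len(consensus)
    pattern = re.compile("(?=" + "".join(CODE_TO_REGEX[c] for c in consensus) + ")")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # A match at position p of the reverse complement covers forward positions
    # len(seq) - p - width through len(seq) - p - 1.
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)

print(scan("TTCCACACCCTTT"))  # → [(2, '+')]
print(scan("AAAGGGTGTGGAA"))  # → [(2, '-')]
```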


2016 ◽  
Vol 212 (6) ◽  
pp. 633-646 ◽  
Author(s):  
Carlo Randise-Hinchliff ◽  
Robert Coukos ◽  
Varun Sood ◽  
Michael Chas Sumner ◽  
Stefan Zdraljevic ◽  
...  

In budding yeast, targeting of active genes to the nuclear pore complex (NPC) and interchromosomal clustering is mediated by transcription factor (TF) binding sites in the gene promoters. For example, the binding sites for the TFs Put3, Ste12, and Gcn4 are necessary and sufficient to promote positioning at the nuclear periphery and interchromosomal clustering. However, in all three cases, gene positioning and interchromosomal clustering are regulated. Under uninducing conditions, local recruitment of the Rpd3(L) histone deacetylase by transcriptional repressors blocks Put3 DNA binding. This is a general function of yeast repressors: 16 of 21 repressors blocked Put3-mediated subnuclear positioning; 11 of these required Rpd3. In contrast, Ste12-mediated gene positioning is regulated independently of DNA binding, through mitogen-activated protein kinase phosphorylation of the Dig2 inhibitor, and Gcn4-dependent targeting is up-regulated by increasing Gcn4 protein levels. These different regulatory strategies provide either qualitative switch-like control or quantitative control of gene positioning over different time scales.


1997 ◽  
Vol 17 (12) ◽  
pp. 6994-7007 ◽  
Author(s):  
Y Tao ◽  
R F Kassatly ◽  
W D Cress ◽  
J M Horowitz

The product of the retinoblastoma (Rb) susceptibility gene, Rb-1, regulates the activity of a wide variety of transcription factors, such as E2F, in a cell cycle-dependent fashion. E2F is a heterodimeric transcription factor composed of two subunits, each encoded by one of two related gene families, denoted E2F and DP. Five E2F genes, E2F-1 through E2F-5, and two DP genes, DP-1 and DP-2, have been isolated from mammals, and heterodimeric complexes of these proteins are expressed in most, if not all, vertebrate cells. It is not yet clear whether E2F/DP complexes regulate overlapping and/or specific cellular genes. Moreover, little is known about whether Rb regulates all or a subset of E2F-dependent genes. Using recombinant E2F, DP, and Rb proteins prepared in baculovirus-infected cells and a repetitive immunoprecipitation-PCR procedure (CASTing), we have identified consensus DNA-binding sites for E2F-1/DP-1, E2F-1/DP-2, E2F-4/DP-1, and E2F-4/DP-2 complexes as well as an Rb/E2F-1/DP-1 trimeric complex. Our data indicate that (i) E2F, DP, and Rb proteins each influence the selection of E2F-binding sites; (ii) E2F sites differ with respect to their intrinsic DNA-bending properties; (iii) E2F/DP complexes induce distinct degrees of DNA bending; and (iv) complex-specific E2F sites selected in vitro function distinctly as regulators of cell cycle-dependent transcription in vivo. These data indicate that the specific sequence of an E2F site may determine its role in transcriptional regulation and suggest that Rb/E2F complexes may regulate subsets of E2F-dependent cellular genes.
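CASTing yields a pool of selected sites from which a consensus is read off. One simple way to do that last step, sketched here under our own assumptions rather than as the procedure used in the study, is a rule-based IUPAC consensus: a single base where one base clearly dominates a column, a two-base code where two bases together dominate, and N otherwise. The thresholds (more than half and more than twice the runner-up for a single base; more than three quarters for a pair) are illustrative choices, and the aligned sites below are made up.

```python
from collections import Counter

# IUPAC two-base ambiguity codes, keyed by the unordered base pair.
PAIR_CODE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def consensus(sites):
    """Rule-based IUPAC consensus of equal-length aligned sites (A/C/G/T only)."""
    n = len(sites)
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    out = []
    for i in range(length):
        ranked = Counter(s[i] for s in sites).most_common()
        b1, c1 = ranked[0]
        c2 = ranked[1][1] if len(ranked) > 1 else 0
        if c1 / n > 0.5 and c1 > 2 * c2:
            out.append(b1)  # one base clearly dominates
        elif len(ranked) > 1 and (c1 + c2) / n > 0.75:
            out.append(PAIR_CODE[frozenset((b1, ranked[1][0]))])  # two bases dominate
        else:
            out.append("N")
    return "".join(out)

# Hypothetical selected sites, already aligned (not data from the study).
sites = ["TTTCGCGC", "TTTGGCGC", "TTTCGCGC", "TTAGGCGC"]
print(consensus(sites))  # → TTTSGCGC
```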


PLoS Genetics ◽  
2020 ◽  
Vol 16 (11) ◽  
pp. e1009189
Author(s):  
Alejandro Martin-Trujillo ◽  
Nihir Patel ◽  
Felix Richter ◽  
Bharati Jadhav ◽  
Paras Garg ◽  
...  

Although DNA methylation is the best characterized epigenetic mark, the mechanism by which it is targeted to specific regions in the genome remains unclear. Recent studies have revealed that local DNA methylation profiles might be dictated by cis-regulatory DNA sequences that mainly operate via DNA-binding factors. Consistent with this finding, we have recently shown that disruption of CTCF-binding sites by rare single nucleotide variants (SNVs) can underlie cis-linked DNA methylation changes in patients with congenital anomalies. These data raise the hypothesis that rare genetic variation at transcription factor binding sites (TFBSs) might contribute to local DNA methylation patterning. In this work, by combining genome-wide DNA methylation profiles from blood and whole-genome-sequencing-derived SNVs from 247 unrelated individuals with 133 predicted TFBS motifs derived from ENCODE ChIP-Seq data, we observed an association between the disruption of binding sites for multiple TFs by rare SNVs and extreme DNA methylation values at both local and, to a lesser extent, distant CpGs. While the majority of these changes affected only single CpGs, 24% were associated with multiple outlier CpGs within ±1kb of the disrupted TFBS. Interestingly, disruption of functionally constrained sites within TF motifs leads to larger DNA methylation changes at nearby CpG sites. Altogether, these findings suggest that rare SNVs at TFBS negatively influence TF-DNA binding, which can lead to an altered local DNA methylation profile. Furthermore, subsequent integration of DNA methylation and RNA-Seq profiles from cardiac tissues enabled us to observe an association between rare SNV-directed DNA methylation and outlier expression of nearby genes. 
In conclusion, our findings not only provide insights into the effect of rare genetic variation at TFBS on shaping local DNA methylation and its consequences on genome regulation, but also provide a rationale to incorporate DNA methylation data to interpret the functional role of rare variants.
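One common way to quantify how an SNV disrupts a predicted TFBS is the change in position weight matrix log-odds score between the reference and alternative alleles. The abstract does not say which scoring the authors used, so the sketch below is an illustration under our own assumptions (a uniform 0.25 background and a made-up four-column PWM with no zero probabilities). It shows why a substitution at a highly constrained motif position is predicted to weaken binding more than one at a degenerate position.

```python
import math

BASES = "ACGT"

def log_odds(pwm, background=0.25):
    """Convert PWM probabilities to log2 odds against a uniform background."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in pwm]

def site_score(site, lom):
    """Sum of per-position log-odds for a site of the same length as the matrix."""
    return sum(col[base] for col, base in zip(lom, site))

def snv_effect(site, pos, alt, lom):
    """Score change when site[pos] is replaced by alt; negative means a weaker match."""
    mutant = site[:pos] + alt + site[pos + 1:]
    return site_score(mutant, lom) - site_score(site, lom)

# Hypothetical PWM: columns 0, 1 and 3 are constrained, column 2 is uninformative.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
lom = log_odds(pwm)
print(round(snv_effect("AGTC", 0, "C", lom), 2))  # constrained position → -2.81
print(snv_effect("AGTC", 2, "A", lom))            # degenerate position → 0.0
```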


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Dave W Anderson ◽  
Alesia N McKeown ◽  
Joseph W Thornton

Complexes of specifically interacting molecules, such as transcription factor proteins (TFs) and the DNA response elements (REs) they recognize, control most biological processes, but little is known about the functional and evolutionary effects of epistatic interactions across molecular interfaces. We experimentally characterized all combinations of genotypes in the joint protein-DNA sequence space defined by a historical transition in TF-RE specificity that occurred some 500 million years ago in the DNA-binding domain of an ancient steroid hormone receptor. We found that rampant epistasis within and between the two molecules was essential to specific TF-RE recognition and to the evolution of a novel TF-RE complex with unique derived specificity. Permissive and restrictive epistatic mutations across the TF-RE interface opened and closed potential evolutionary paths accessible by the other, making the evolution of each molecule contingent on its partner's history and allowing a molecular complex with novel specificity to evolve.
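The study characterizes every combination of protein and DNA genotypes and detects epistasis across the interface. As a hedged illustration, not the authors' statistical model, the sketch enumerates a joint genotype space in which each protein or DNA site is either ancestral (0) or derived (1), and computes the standard pairwise epistasis term on an additive scale (for example, log affinity). The site counts and phenotype values are hypothetical.

```python
from itertools import product

def joint_genotypes(protein_sites, dna_sites):
    """Every combination of ancestral (0) / derived (1) states across protein and DNA sites."""
    return list(product((0, 1), repeat=protein_sites + dna_sites))

def pairwise_epistasis(y00, y10, y01, y11):
    """Deviation from additivity for two mutations on an additive scale.

    y00: both ancestral; y10: first mutation only; y01: second only; y11: both.
    Zero means the two mutations' effects simply add.
    """
    return y11 - y10 - y01 + y00

print(len(joint_genotypes(2, 1)))  # 2 protein sites + 1 DNA site → 8 genotypes

# Hypothetical log-affinities: each mutation alone weakens binding,
# but the protein mutation plus the DNA mutation binds better than either.
print(pairwise_epistasis(y00=0.0, y10=-1.0, y01=-1.0, y11=0.5))  # → 2.5
```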

