Motif analysis in co-expression networks reveals regulatory elements in plants: The peach as a model

Author(s):  
Najla Ksouri ◽  
Jaime A. Castro-Mondragón ◽  
Francesc Montardit-Tardà ◽  
Jacques van Helden ◽  
Bruno Contreras-Moreira ◽  
...  

Abstract: Identification of functional regulatory elements encoded in plant genomes is a fundamental need to understand gene regulation. While much attention has been given to model species such as Arabidopsis thaliana, little is known about regulatory motifs in other plant genera. Here, we describe an accurate bottom-up approach using the online workbench RSAT::Plants for a versatile ab-initio motif discovery taking Prunus persica as a model. These predictions rely on the construction of a co-expression network to generate modules with similar expression trends and assess the effect of increasing upstream region length on the sensitivity of motif discovery. Applying two discovery algorithms, 18 out of 45 modules were found to be enriched in motifs typical of well-known transcription factor families (bHLH, bZip, BZR, CAMTA, DOF, E2FE, AP2-ERF, Myb-like, NAC, TCP, WRKY) and a novel motif. Our results indicate that a small number of input sequences and a short promoter length are preferable to minimize the amount of uninformative signals in peach. The spatial distribution of TF binding sites revealed an unbalanced distribution where motifs tend to lie around the transcriptional start site region. The reliability of this approach was also benchmarked in Arabidopsis thaliana, where it recovered the expected motifs from promoters of genes containing ChIPseq peaks. Overall, this paper presents a glimpse of the peach regulatory components at genome scale and provides a general protocol that can be applied to many other species. Additionally, an RSAT Docker container was released to facilitate similar analyses on other species or to reproduce our results.

One sentence summary: Motif prediction depends on the promoter size. A proximal promoter region defined as an interval of -500 bp to +200 bp seems to be the adequate stretch to predict de novo regulatory motifs in peach.
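The proximal promoter interval mentioned above (-500 bp to +200 bp around the transcription start site) can be illustrated with a small strand-aware coordinate calculation. This is only a sketch under stated assumptions: the `Gene` record, the `promoter_window` function and the 1-based inclusive coordinate convention are hypothetical choices made for illustration and are not part of RSAT::Plants, which has its own sequence retrieval tools.

```python
from typing import NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (not an RSAT data structure)."""
    chrom: str
    tss: int     # transcription start site, 1-based (assumed convention)
    strand: str  # "+" or "-"


def promoter_window(gene: Gene, upstream: int = 500, downstream: int = 200,
                    chrom_len: Optional[int] = None) -> Tuple[str, int, int]:
    """Return (chrom, start, end) on the forward strand, 1-based inclusive,
    spanning -upstream .. +downstream relative to the TSS."""
    if gene.strand == "+":
        start, end = gene.tss - upstream, gene.tss + downstream
    elif gene.strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = gene.tss - downstream, gene.tss + upstream
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    start = max(1, start)            # clip at the chromosome start
    if chrom_len is not None:
        end = min(end, chrom_len)    # clip at the chromosome end if known
    return gene.chrom, start, end


print(promoter_window(Gene("chr1", 1000, "+")))
print(promoter_window(Gene("chr1", 1000, "-")))
```

Widening `upstream` in such a helper is one simple way to reproduce the kind of comparison the abstract describes, where longer upstream regions are tested for their effect on motif discovery.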

Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 1277-1277
Author(s):  
Hongfang Wang ◽  
Chongzhi Zang ◽  
Len Taing ◽  
Hoifung Wong ◽  
Yumi Yashiro-Ohtani ◽  
...  

Abstract 1277: NOTCH1 regulates gene expression by forming transcription activation complexes with the DNA-binding factor RBPJ and gain-of-function NOTCH1 mutations are common in human and murine T lymphoblastic leukemia/lymphoma (T-LL). Via ChIP-seq studies of T-LL cells with constitutive Notch activation, we previously showed that NOTCH1/RBPJ binding sites in T-LL genomes are highly enriched for motifs corresponding to Ets factors and Runx factors. In this study, we determined the relationship of NOTCH1, RBPJ, ETS1, GABPA and RUNX1 binding sites in human T-LL cells by performing ChIP-Seq for each of these factors, as well as the chromatin marks H3K4me1, H3K4me3, and H3K27me3, and aligning the resulting sequences to human genome reference hg19 using programs available through Cistrome. Peak calling was performed with MACS2, and motif analysis was performed using SeqPos, which relies on JASPAR, TRANSFAC, Protein Binding Microarray (PBM), Yeast-1-hybrid (y1h), and human protein-DNA interaction (hPDI) databases to find known motifs and can also perform de novo motif discovery. Our analysis showed even more pervasive overlap of NOTCH1/RBPJ binding with ETS1/GABPA and RUNX1 factor binding than was predicted by motif analysis, in part due to binding of Ets factors and RUNX1 to non-canonical sequences. Heat-map analysis with K-means clustering on NOTCH1 binding regions identified three major classes of RBPJ/NOTCH1: class 1, characterized by high NOTCH/RBPJ signals, binding of the cofactors ZNF143, ETS1 and GABPA, high H3K4me3 signals, localization to promoters, and binding motifs for ZNF143; class 2, characterized by low NOTCH/RBPJ signals, binding of the cofactors ETS1, GABPA and RUNX1, high H3K4me3 signals, and Ets factor and CREB binding motifs; and class 3, characterized by high NOTCH/RBPJ signals, binding of RUNX1 and ETS1 cofactors, high H3K4me1 signals, intergenic localization (consistent with enhancers), and motifs for RUNX factors, ETS factors, and RBPJ.
Of note, the nearest binding sites to the most responsive NOTCH1 target genes (defined as >2 fold stimulation when NOTCH1 was activated following release of gamma-secretase inhibitor (GSI) blockade by drug washout) were preferentially associated with Class 3 sites. Furthermore, shRNA knockdown of Ets factors and RUNX1 in T-LL cell lines induced apoptosis and reduced cell proliferation, implicating these factors in maintenance of T-LL growth and survival. Combination of knockdown of either Ets factors or RUNX1 with GSI treatment resulted in a more severe phenotype in terms of apoptosis and cell growth compared to the knockdown or GSI treatment alone. In summary, our studies represent a step forward towards genome-wide understanding of how Notch works in concert with other transcription factors to regulate the transcriptome of T-LL cells. Disclosures: No relevant conflicts of interest to declare.
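The overlap between binding sites of two factors, as discussed in this abstract, can be quantified in a simple way as the fraction of peaks from one factor that intersect at least one peak from another. The sketch below is a generic interval-overlap calculation and not the Cistrome or MACS2 procedure used by the authors; the function name `overlap_fraction` and the half-open `(chrom, start, end)` peak representation are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Iterable, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open [start, end)


def overlap_fraction(peaks_a: Iterable[Peak], peaks_b: Iterable[Peak]) -> float:
    """Fraction of peaks in A that overlap at least one peak in B."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    # Per chromosome: sorted starts plus the running maximum of end positions.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts, max_ends, running = [], [], float("-inf")
        for start, end in intervals:
            running = max(running, end)
            starts.append(start)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)

    total = hits = 0
    for chrom, start, end in peaks_a:
        total += 1
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        n = bisect_left(starts, end)  # number of B peaks starting before `end`
        # Overlap exists if any of those B peaks ends after `start`.
        if n > 0 and max_ends[n - 1] > start:
            hits += 1
    return hits / total if total else 0.0


notch1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
runx1 = [("chr1", 150, 160), ("chr1", 600, 700), ("chr3", 10, 20)]
print(overlap_fraction(notch1, runx1))
```

The toy coordinates are invented; with real data the two lists would come from the peak caller's output for each factor.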


2020 ◽  
Author(s):  
Yichao Li ◽  
Maxwell Mullin ◽  
Yingnan Zhang ◽  
Frank Drews ◽  
Lonnie Welch ◽  
...  

Abstract: Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized three family members of the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins, the moderately glycosylated extensins, and the lightly glycosylated proline-rich proteins. However, the mechanism of pollen-specific HRGP expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences that identified 15 transcriptional cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined the public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery with various filters identified 15 conserved promoter elements between A. thaliana and A. lyrata. Known motif analysis revealed the pollen-related transcription factor GATA12 and the brassinosteroid (BR) signaling pathway regulator BZR1. Lastly, we performed a machine learning regression analysis and demonstrated that the identified 15 motifs captured HRGP gene expression in pollen well (R=0.61). In conclusion, we performed the integrative analysis as the first-of-its-kind study to identify cis-regulatory motifs in pollen-specific HRGP genes and shed light on their transcriptional regulation in pollen.


2018 ◽  
Author(s):  
Niklas Bruse ◽  
Simon J. van Heeringen

Abstract. Background: Transcription factors (TFs) bind to specific DNA sequences, TF motifs, in cis-regulatory sequences and control the expression of the diverse transcriptional programs encoded in the genome. The concerted action of TFs within the chromatin context enables precise temporal and spatial expression patterns. To understand how TFs control gene expression it is essential to model TF binding. TF motif information can help to interpret the exact role of individual regulatory elements, for instance to predict the functional impact of non-coding variants. Findings: Here we present GimmeMotifs, a comprehensive computational framework for TF motif analysis. Compared to the previously published version, this release adds a whole range of new functionality and analysis methods. It now includes tools for de novo motif discovery, motif scanning and sequence analysis, motif clustering, calculation of performance metrics and visualization. Included with GimmeMotifs is a non-redundant database of clustered motifs. Compared to other motif databases, this collection of motifs shows competitive performance in discriminating bound from unbound sequences. Using our de novo motif discovery pipeline we find large differences in performance between de novo motif finders on ChIP-seq data. Using an ensemble method such as the one implemented in GimmeMotifs will generally result in improved motif identification compared to a single motif finder. Finally, we demonstrate maelstrom, a new ensemble method that enables comparative analysis of TF motifs between multiple high-throughput sequencing experiments, such as ChIP-seq or ATAC-seq. Using a collection of ~200 H3K27ac ChIP-seq data sets we identify TFs that play a role in hematopoietic differentiation and lineage commitment. Conclusion: GimmeMotifs is a fully-featured and flexible framework for TF motif analysis. It contains both command-line tools as well as a Python API and is freely available at: https://github.com/vanheeringen-lab/gimmemotifs.
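Motif scanning, one of the tasks listed in this abstract, generally means sliding a position weight matrix along a sequence and reporting windows whose score passes a threshold. The following is a generic, library-free sketch of that idea and deliberately does not use or imitate the GimmeMotifs API; the function names, the pseudocount, the uniform background and the toy CACGTG count matrix are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import math
from typing import Dict, List, Tuple

Column = Dict[str, float]


def pfm_to_log_odds(pfm: List[Column], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Column]:
    """Convert per-position base counts into log2-odds scores
    against a uniform background (an assumed simplification)."""
    pwm = []
    for col in pfm:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq: str, pwm: List[Column], threshold: float) -> List[Tuple[int, float]]:
    """Return (0-based position, score) for forward-strand windows
    scoring at or above the threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, round(score, 3)))
    return hits


# Toy count matrix for the consensus CACGTG (10 observations per position).
toy_pfm = [{base: 10} for base in "CACGTG"]
pwm = pfm_to_log_odds(toy_pfm)
print(scan("TTCACGTGAA", pwm, threshold=5.0))
```

A real analysis would also scan the reverse complement and choose the threshold from a score distribution rather than a fixed cutoff.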


Author(s):  
Charles E. Grant ◽  
Timothy L. Bailey

Abstract: XSTREME is a web-based tool for performing comprehensive motif discovery and analysis in DNA, RNA or protein sequences, as well as in sequences in user-defined alphabets. It is designed for both very large and very small datasets. XSTREME is similar to the MEME-ChIP tool, but expands upon its capabilities in several ways. Like MEME-ChIP, XSTREME performs two types of de novo motif discovery, and also performs motif enrichment analysis of the input sequences using databases of known motifs. Unlike MEME-ChIP, which ranks motifs based on their enrichment in the centers of the input sequences, XSTREME uses enrichment anywhere in the sequences for this purpose. Consequently, XSTREME is more appropriate for motif-based analysis of sequences regardless of how the motifs are distributed within the sequences. XSTREME uses the MEME and STREME algorithms for motif discovery, and the recently developed SEA algorithm for motif enrichment analysis. The interactive HTML output produced by XSTREME includes highly accurate motif significance estimates, plots of the positional distribution of each motif, and histograms of the number of motif matches in each sequence. XSTREME is easy to use via its web server at https://meme-suite.org, and is fully integrated with the widely-used MEME Suite of sequence analysis tools, which can be freely downloaded at the same web site for non-commercial use.
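Enrichment "anywhere in the sequences" can be pictured as comparing how many input sequences contain a motif match against how many control sequences do. The sketch below computes a one-sided Fisher exact test p-value for such a 2x2 table. It is a textbook test written with the standard library only and should not be read as the exact procedure inside SEA or XSTREME; the function name and the example counts are assumptions.

```python
from math import comb


def fisher_enrichment_p(pos_hits: int, pos_total: int,
                        neg_hits: int, neg_total: int) -> float:
    """One-sided Fisher exact test: probability of seeing at least
    `pos_hits` motif-containing sequences in the input set if matches
    were distributed at random between input and control sets."""
    hits = pos_hits + neg_hits          # sequences with a match, both sets
    total = pos_total + neg_total       # all sequences
    denom = comb(total, pos_total)
    upper = min(hits, pos_total)
    # Sum the hypergeometric tail P(X = k) for k = pos_hits .. upper.
    tail = sum(comb(hits, k) * comb(total - hits, pos_total - k)
               for k in range(pos_hits, upper + 1))
    return min(1.0, tail / denom)


# Hypothetical counts: 40 of 100 input sequences and 15 of 100 control
# sequences contain at least one match to the motif.
print(fisher_enrichment_p(40, 100, 15, 100))
```

Because the test only asks whether a sequence contains a match, it is insensitive to where in the sequence the match falls, which is the property the abstract contrasts with center-based ranking.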


Plants ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 1751
Author(s):  
Yichao Li ◽  
Maxwell Mullin ◽  
Yingnan Zhang ◽  
Frank Drews ◽  
Lonnie R. Welch ◽  
...  

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized protein sequence signatures for three family members in the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). However, the mechanism of pollen-specific HRGP gene expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences to identify cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined the public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery identified 15 conserved promoter elements between A. thaliana and A. lyrata. Motif scanning revealed two pollen-related transcription factors: GATA12 and brassinosteroid (BR) signaling pathway regulator BZR1. Finally, we performed a regression analysis and demonstrated that the 15 motifs provided a good model of HRGP gene expression in pollen (R = 0.61). In conclusion, we performed the first integrative analysis of cis-regulatory motifs in pollen-specific HRGP genes, revealing important insights into transcriptional regulation in pollen tissue.
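The final step described here, relating motif occurrences to expression and reporting a correlation R, can be illustrated with a small linear model. The abstract does not specify which regression method the authors used, so the ordinary least squares fit below (with a tiny ridge term for numerical stability) is an assumption, as are all function names and the toy data; the numbers have no connection to the reported R = 0.61.

```python
from math import sqrt
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float],
               ridge: float = 1e-6) -> List[float]:
    """Least squares fit; returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], r)) for r in X]


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Toy data: rows are genes, columns are counts of two hypothetical motifs.
motif_counts = [[0, 1], [1, 0], [2, 1], [3, 3], [1, 2]]
expression = [0.4, 3.1, 4.6, 5.4, 2.2]
coef = fit_linear(motif_counts, expression)
print(pearson_r(predict(coef, motif_counts), expression))
```

An R computed on the same genes used for fitting tends to be optimistic; with only 13 genes and 15 motifs, as in the study, some form of cross-validation or regularization would be needed for a meaningful estimate.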


2015 ◽  
Vol 35 (suppl_1) ◽  
Author(s):  
Nicholas T Hogan ◽  
Casey E Romanoski ◽  
Michael T Lam ◽  
Christopher K Glass

Introduction: Sequence-specific transcription factors bind DNA regulatory elements and play a key role in establishing cellular identity. Studies comparing macrophages to B cells have revealed that small numbers of such collaborative or lineage-determining transcription factors (LDTF) establish distinct enhancers in each cell type. These factors also allow for the binding of signal dependent transcription factors. Here we present data which suggest members of the AP-1, ETS, and STAT transcription factor families serve as collaborative transcriptional regulators in human aortic endothelial cells (HAEC). Hypothesis: We hypothesize that a set of AP-1 and ETS transcription factors collaborate to establish key endothelial cell enhancers. Methods: Working in HAEC, we measured poised and active enhancers using ChIP-seq for the epigenetic histone modifications H3K4me2 and H3K27Ac, performed motif analysis, and measured transcription factor binding for candidate factors. Knockdowns of JUN, ERG, and STAT3 followed by RNA-seq were used to evaluate altered enhancer function and gene targets of candidate factors. Results: Our de novo motif analysis revealed that motifs for ETS and AP-1 transcription factors are highly enriched at HAEC enhancers. ChIP-seq experiments for JUN, JUNB, ERG, and STAT3 showed between 8,000 and 55,000 intergenic peaks for each factor. Together these peaks bind 50% of poised enhancers, with a subset co-localizing at these sites. Gene ontology analysis showed that gene targets of these enhancers are involved in endothelial-specific functions. Further, knockdown of JUN, ERG, and STAT3 resulted in a twofold or greater change in expression of hundreds of HAEC transcripts. Conclusion: The genome-wide pattern of JUN, JUNB, ERG, and STAT3 co-localization at enhancers in HAEC suggests these factors serve as key regulators that collaboratively modulate endothelial-specific gene expression. 
Further investigation of candidate lineage-determining transcription factors using pro-atherogenic signals could reveal regulatory mechanisms of disease-relevant endothelial transcriptional programs.


2010 ◽  
Vol 08 (02) ◽  
pp. 219-246 ◽  
Author(s):  
ARVIND RAO ◽  
DAVID J. STATES ◽  
ALFRED O. HERO ◽  
JAMES DOUGLAS ENGEL

Gene regulation in eukaryotes involves a complex interplay between the proximal promoter and distal genomic elements (such as enhancers) which work in concert to drive precise spatio-temporal gene expression. The experimental localization and characterization of gene regulatory elements is a very complex and resource-intensive process. The computational identification of regulatory regions that confer spatiotemporally specific tissue-restricted expression of a gene is thus an important challenge for computational biology. One of the most popular strategies for enhancer localization from DNA sequence is the use of conservation-based prefiltering and more recently, the use of canonical (transcription factor motifs) or de novo tissue-specific sequence motifs. However, there is an ongoing effort in the computational biology community to further improve the fidelity of enhancer predictions from sequence data by integrating other, complementary genomic modalities. In this work, we propose a framework that complements existing methodologies for prospective enhancer identification. The methods in this work are derived from two key insights: (i) that chromatin modification signatures can discriminate proximal and distally located regulatory regions and (ii) the notion of promoter-enhancer cross-talk (as assayed in 3C/5C experiments) might have implications in the search for regulatory sequences that co-operate with the promoter to yield tissue-restricted, gene-specific expression.


Endocrinology ◽  
2005 ◽  
Vol 146 (12) ◽  
pp. 5321-5331 ◽  
Author(s):  
Martina Fink ◽  
Jure Ačimovič ◽  
Tadeja Režen ◽  
Nataša Tanšek ◽  
Damjana Rozman

Lanosterol 14α-demethylase (CYP51) responds to cholesterol feedback regulation through sterol regulatory element binding proteins (SREBPs). The proximal promoter of CYP51 contains a conserved region with clustered regulatory elements: GC box, cAMP-response elements (CRE-like), and sterol regulatory element (SRE). In lipid-rich (SREBP-poor) conditions, the CYP51 mRNA drops gradually, the promoter activity is diminished, and no DNA-protein complex is observed at the CYP51-SRE1 site. The majority of cAMP-dependent transactivation is mediated through a single CRE (CYP51-CRE2). Exposure of JEG-3 cells to forskolin, a mediator of the cAMP-dependent signaling pathway, provokes an immediate early response of CYP51, which has not been described before for any cholesterogenic gene. The CYP51 mRNA increases up to 4-fold in 2 h and drops to basal level after 4 h. The inducible cAMP early repressor (ICER) is involved in attenuation of transcription. Overexpressed CRE-binding protein (CREB)/CRE modulator (CREM) transactivates the mouse/human CYP51 promoters containing CYP51-CRE2 independently of SREBPs, and ICER decreases the CREB-induced transcription. Besides the increased CYP51 mRNA, forskolin affects the de novo sterol biosynthesis in JEG-3 cells. An increased consumption of lanosterol, a substrate of CYP51, is observed together with modulation of the postlanosterol cholesterogenesis, indicating that cAMP-dependent stimuli cross-talk with cholesterol feedback regulation. CRE-2 is essential for cAMP-dependent transactivation, whereas SRE seems to be less important. Interestingly, when CREB is not limiting, the increasing amounts of SREBP-1a fail to transactivate the CYP51 promoter above the CREB-only level, suggesting that hormones might have an important role in regulating cholesterogenesis in vivo.


1997 ◽  
Vol 327 (2) ◽  
pp. 507-512 ◽  
Author(s):  
Weei-Yuarn HUANG ◽  
Jin-Jer CHEN ◽  
N.-L. SHIH ◽  
Choong-Chin LIEW

Using nuclei isolated from neonatal cardiomyocytes, we have mapped the DNase I hypersensitive sites (DHSs) residing within the 5ʹ-upstream regions of the hamster cardiac myosin heavy-chain (MyHC) gene. Two cardiac-specific DHSs within the 5 kb upstream region of the cardiac MyHC gene were identified. One of the DHSs was mapped to the -2.3 kb (β-2.3 kb) region and the other to the proximal promoter region. We further localized the β-2.3 kb site to a range of 250 bp. Multiple, conserved, muscle regulatory motifs were found within the β-2.3 kb site, consisting of three E-boxes, one AP-2 site, one CArG motif, one CT/ACCC box and one myocyte-specific enhancer factor-2 site. This cluster of regulatory elements is strikingly similar to a cluster found in the enhancer of the mouse muscle creatine kinase gene (-1256 to -1050). The specific interaction of the motifs within the β-2.3 kb site and the cardiac nuclear proteins was demonstrated using gel mobility-shift assays and footprinting analysis. In addition, transfection analysis revealed a significant increase in chloramphenicol acetyltransferase activity when the β-2.3 kb site was linked to a heterologous promoter. These results suggest that previously undefined regulatory elements of the β-MyHC gene may be associated with the β-2.3 kb site.

