GimmeMotifs: an analysis framework for transcription factor motif analysis

2018
Author(s): Niklas Bruse, Simon J. van Heeringen

Abstract Background: Transcription factors (TFs) bind to specific DNA sequences, TF motifs, in cis-regulatory sequences and control the expression of the diverse transcriptional programs encoded in the genome. The concerted action of TFs within the chromatin context enables precise temporal and spatial expression patterns. To understand how TFs control gene expression it is essential to model TF binding. TF motif information can help to interpret the exact role of individual regulatory elements, for instance to predict the functional impact of non-coding variants. Findings: Here we present GimmeMotifs, a comprehensive computational framework for TF motif analysis. Compared to the previously published version, this release adds a whole range of new functionality and analysis methods. It now includes tools for de novo motif discovery, motif scanning and sequence analysis, motif clustering, calculation of performance metrics and visualization. Included with GimmeMotifs is a non-redundant database of clustered motifs. Compared to other motif databases, this collection of motifs shows competitive performance in discriminating bound from unbound sequences. Using our de novo motif discovery pipeline we find large differences in performance between de novo motif finders on ChIP-seq data. Using an ensemble method such as the one implemented in GimmeMotifs will generally result in improved motif identification compared to a single motif finder. Finally, we demonstrate maelstrom, a new ensemble method that enables comparative analysis of TF motifs between multiple high-throughput sequencing experiments, such as ChIP-seq or ATAC-seq. Using a collection of ~200 H3K27ac ChIP-seq data sets we identify TFs that play a role in hematopoietic differentiation and lineage commitment. Conclusion: GimmeMotifs is a fully-featured and flexible framework for TF motif analysis. It contains both command-line tools and a Python API and is freely available at: https://github.com/vanheeringen-lab/gimmemotifs.

Author(s): Najla Ksouri, Jaime A. Castro-Mondragón, Francesc Montardit-Tardà, Jacques van Helden, Bruno Contreras-Moreira, ...

Abstract Identification of functional regulatory elements encoded in plant genomes is a fundamental need to understand gene regulation. While much attention has been given to model species such as Arabidopsis thaliana, little is known about regulatory motifs in other plant genera. Here, we describe an accurate bottom-up approach using the online workbench RSAT::Plants for a versatile ab-initio motif discovery taking Prunus persica as a model. These predictions rely on the construction of a co-expression network to generate modules with similar expression trends and assess the effect of increasing upstream region length on the sensitivity of motif discovery. Applying two discovery algorithms, 18 out of 45 modules were found to be enriched in motifs typical of well-known transcription factor families (bHLH, bZip, BZR, CAMTA, DOF, E2FE, AP2-ERF, Myb-like, NAC, TCP, WRKY) and a novel motif. Our results indicate that a small number of input sequences and short promoter length are preferable to minimize the amount of uninformative signals in peach. The spatial distribution of TF binding sites revealed an unbalanced distribution where motifs tend to lie around the transcriptional start site region. The reliability of this approach was also benchmarked in Arabidopsis thaliana, where it recovered the expected motifs from promoters of genes containing ChIPseq peaks. Overall, this paper presents a glimpse of the peach regulatory components at genome scale and provides a general protocol that can be applied to many other species. Additionally, an RSAT Docker container was released to facilitate similar analyses on other species or to reproduce our results. One-sentence summary: Motif prediction depends on the promoter size. A proximal promoter region defined as an interval of -500 bp to +200 bp seems to be the adequate stretch to predict de novo regulatory motifs in peach.


Blood, 2012, Vol 120 (21), pp. 1277-1277
Author(s): Hongfang Wang, Chongzhi Zang, Len Taing, Hoifung Wong, Yumi Yashiro-Ohtani, ...

Abstract 1277. NOTCH1 regulates gene expression by forming transcription activation complexes with the DNA-binding factor RBPJ, and gain-of-function NOTCH1 mutations are common in human and murine T lymphoblastic leukemia/lymphoma (T-LL). Via ChIP-seq studies of T-LL cells with constitutive Notch activation, we previously showed that NOTCH1/RBPJ binding sites in T-LL genomes are highly enriched for motifs corresponding to Ets factors and Runx factors. In this study, we determined the relationship of NOTCH1, RBPJ, ETS1, GABPA and RUNX1 binding sites in human T-LL cells by performing ChIP-Seq for each of these factors, as well as the chromatin marks H3K4me1, H3K4me3, and H3K27me3, and aligning the resulting sequences to human genome reference hg19 using programs available through Cistrome. Peak calling was performed with MACS2, and motif analysis was performed using SeqPos, which relies on JASPAR, TRANSFAC, Protein Binding Microarray (PBM), Yeast-1-hybrid (y1h), and human protein-DNA interaction (hPDI) databases to find known motifs and can also perform de novo motif discovery. Our analysis showed even more pervasive overlap of NOTCH1/RBPJ binding with ETS1/GABPA and RUNX1 factor binding than was predicted by motif analysis, in part due to binding of Ets factors and RUNX1 to non-canonical sequences. Heat-map analysis with K-means clustering on NOTCH1 binding regions identified three major classes of RBPJ/NOTCH1: class 1, characterized by high NOTCH/RBPJ signals, binding of the cofactors ZNF143, ETS1 and GABPA, high H3K4me3 signals, localization to promoters, and binding motifs for ZNF143; class 2, characterized by low NOTCH/RBPJ signals, binding of the cofactors ETS1, GABPA and RUNX1, high H3K4me3 signals, and Ets factor and CREB binding motifs; and class 3, characterized by high NOTCH/RBPJ signals, binding of RUNX1 and ETS1 cofactors, high H3K4me1 signals, intergenic localization (consistent with enhancers), and motifs for RUNX factors, ETS factors, and RBPJ. Of note, the nearest binding sites to the most responsive NOTCH1 target genes (defined as >2 fold stimulation when NOTCH1 was activated following release of gamma-secretase inhibitor (GSI) blockade by drug washout) were preferentially associated with Class 3 sites. Furthermore, shRNA knockdown of Ets factors and RUNX1 in T-LL cell lines induced apoptosis and reduced cell proliferation, implicating these factors in maintenance of T-LL growth and survival. Combination of knockdown of either Ets factors or RUNX1 with GSI treatment resulted in a more severe phenotype in terms of apoptosis and cell growth compared to the knockdown or GSI treatment alone. In summary, our studies represent a step forward towards genome-wide understanding of how Notch works in concert with other transcription factors to regulate the transcriptome of T-LL cells. Disclosures: No relevant conflicts of interest to declare.


2017
Author(s): Wolfgang Kopp, Roman Schulte-Sasse

Abstract Transcription factors (TFs) are important contributors to gene regulation. They specifically bind to short DNA stretches known as transcription factor binding sites (TFBSs), which are contained in regulatory regions (e.g. promoters), and thereby influence a target gene’s expression level. Computational biology has contributed substantially to understanding regulatory regions by developing numerous tools, including tools for de novo motif discovery. While those tools primarily focus on determining and studying TFBSs, the surrounding sequence context is often given less attention. In this paper, we attempt to fill this gap by adopting a so-called convolutional restricted Boltzmann machine (cRBM) that captures redundant features from the DNA sequences. The model uses an unsupervised learning approach to derive a rich, yet interpretable, description of the entire sequence context. We evaluated the cRBM on a range of publicly available ChIP-seq peak regions and investigated its capability to summarize heterogeneous sets of regulatory sequences in comparison with MEME-ChIP, a popular motif discovery tool. In summary, our method yields a considerably more accurate description of the sequence composition than MEME-ChIP, providing both a summary of strong TF motifs and subtle low-complexity features.


2018
Author(s): Leslie A. Mitchell, Laura H. McCulloch, Sudarshan Pinglay, Henri Berger, Nazario Bosco, ...

Abstract Design and large-scale synthesis of DNA has been applied to the functional study of viral and microbial genomes. New and expanded technology development is required to unlock the transformative potential of such bottom-up approaches to the study of larger mammalian genomes. Two major challenges include assembling and delivering long DNA sequences. Here we describe a pipeline for de novo DNA assembly and delivery that enables functional evaluation of mammalian genes on the length scale of 100 kb. The DNA assembly step is supported by an integrated robotic workcell. We assembled the 101 kb human HPRT1 gene in yeast, delivered it to mouse embryonic stem cells, and showed expression of the human protein from its full-length gene. This pipeline provides a framework for producing systematic, designer variants of any mammalian gene locus for functional evaluation in cells. Significance Statement: Mammalian genomes consist of a tiny proportion of relatively well-characterized coding regions and vast swaths of poorly characterized “dark matter” containing critical but much less well-defined regulatory sequences. Given the dominant role of noncoding DNA in common human diseases and traits, the interconnectivity of regulatory elements, and the importance of genomic context, de novo design, assembly, and delivery can enable large-scale manipulation of these elements on a locus scale. Here we outline a pipeline for de novo assembly, delivery and expression of mammalian genes replete with native regulatory sequences. We expect this pipeline will be useful for dissecting the function of non-coding sequence variation in mammalian genomes.


2021, Vol 22 (1)
Author(s): Ruifeng Cui, Xiaoge Wang, Waqar Afzal Malik, Xuke Lu, Xiugui Chen, ...

Abstract Background: The Raffinose synthetase (RAFS) genes superfamily is critical for the synthesis of raffinose, which accumulates in plant leaves under abiotic stress. However, it remains unclear whether RAFS contributes to resistance to abiotic stress in plants, specifically in the Gossypium species. Results: In this study, we identified 74 RAFS genes from G. hirsutum, G. barbadense, G. arboreum and G. raimondii by using a series of bioinformatic methods. Phylogenetic analysis showed that the RAFS gene family in the four Gossypium species could be divided into four major clades; the relatively uniform distribution of the gene number in each species ranged from 12 to 25 based on species ploidy, most likely resulting from an ancient whole-genome polyploidization. Gene motif analysis showed that the RAFS gene structure was relatively conservative. Promoter analysis for cis-regulatory elements showed that some RAFS genes might be regulated by gibberellins and abscisic acid, which might influence their expression levels. Moreover, we further examined the functions of RAFS under cold, heat, salt and drought stress conditions, based on the expression profile and co-expression network of RAFS genes in Gossypium species. Transcriptome analysis suggested that RAFS genes in clade III are highly expressed in organs such as seed, root, cotyledon, ovule and fiber, and under abiotic stress in particular, indicating the involvement of genes belonging to clade III in resistance to abiotic stress. Gene co-expressed network analysis showed that GhRFS2A-GhRFS6A, GhRFS6D, GhRFS7D and GhRFS8A-GhRFS11A were key genes, with high expression levels under salt, drought, cold and heat stress. Conclusion: The findings may provide insights into the evolutionary relationships and expression patterns of RAFS genes in Gossypium species and a theoretical basis for the identification of stress resistance materials in cotton.


Development, 1989, Vol 107 (2), pp. 189-200
Author(s): U. Grossniklaus, H.J. Bellen, C. Wilson, W.J. Gehring

We have stained the ovaries of nearly 600 different Drosophila strains carrying single copies of a P-element enhancer detector. This transposon detects neighbouring genomic transcriptional regulatory sequences by means of a beta-galactosidase reporter gene. Numerous strains are stained in specific cells and at specific stages of oogenesis and provide useful ovarian markers for cell types that in some cases have not previously been recognized by morphological criteria. Since recent data have suggested that a substantial number of the regulatory elements detected by enhancer detection control neighbouring genes, we discuss the implications of our results concerning ovarian gene expression patterns in Drosophila. We have also identified a small number of insertion-linked recessive mutants that are sterile or lead to ovarian defects. We observe a strong correlation with specific germ line staining patterns in these strains, suggesting that certain patterns are more likely to be associated with female sterile genes than others. On the basis of our results, we suggest new strategies, which are not primarily based on the generation of mutants, to screen for and isolate female sterile genes.


2019, Vol 70 (15), pp. 3867-3879
Author(s): Anneke Frerichs, Julia Engelhorn, Janine Altmüller, Jose Gutierrez-Marcos, Wolfgang Werr

Abstract Fluorescence-activated cell sorting (FACS) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) were combined to analyse the chromatin state of lateral organ founder cells (LOFCs) in the peripheral zone of the Arabidopsis apetala1-1 cauliflower-1 double mutant inflorescence meristem. On a genome-wide level, we observed a striking correlation between transposase hypersensitive sites (THSs) detected by ATAC-seq and DNase I hypersensitive sites (DHSs). The mostly expanded DHSs were often substructured into several individual THSs, which correlated with phylogenetically conserved DNA sequences or enhancer elements. Comparing chromatin accessibility with available RNA-seq data, THS change configuration was reflected by gene activation or repression and chromatin regions acquired or lost transposase accessibility in direct correlation with gene expression levels in LOFCs. This was most pronounced immediately upstream of the transcription start, where genome-wide THSs were abundant in a complementary pattern to established H3K4me3 activation or H3K27me3 repression marks. At this resolution, the combined application of FACS/ATAC-seq is widely applicable to detect chromatin changes during cell-type specification and facilitates the detection of regulatory elements in plant promoters.


2006, Vol 26 (22), pp. 8623-8638
Author(s): Smitha P. Sripathy, Jessica Stevens, David C. Schultz

ABSTRACT KAP1/TIF1β is proposed to be a universal corepressor protein for the KRAB zinc finger protein (KRAB-zfp) superfamily of transcriptional repressors. To characterize the role of KAP1 and KAP1-interacting proteins in transcriptional repression, we investigated the regulation of stably integrated reporter transgenes by hormone-responsive KRAB and KAP1 repressor proteins. Here, we demonstrate that depletion of endogenous KAP1 levels by small interfering RNA (siRNA) significantly inhibited KRAB-mediated transcriptional repression of a chromatin template. Similarly, reduction in cellular levels of HP1α/β/γ and SETDB1 by siRNA attenuated KRAB-KAP1 repression. We also found that direct tethering of KAP1 to DNA was sufficient to repress transcription of an integrated transgene. This activity is absolutely dependent upon the interaction of KAP1 with HP1 and on an intact PHD finger and bromodomain of KAP1, suggesting that these domains function cooperatively in transcriptional corepression. The achievement of the repressed state by wild-type KAP1 involves decreased recruitment of RNA polymerase II, reduced levels of histone H3 K9 acetylation and H3K4 methylation, an increase in histone occupancy, enrichment of trimethyl histone H3K9, H3K36, and histone H4K20, and HP1 deposition at proximal regulatory sequences of the transgene. A KAP1 protein containing a mutation of the HP1 binding domain failed to induce any change in the histone modifications associated with DNA sequences of the transgene, implying that HP1-directed nuclear compartmentalization is required for transcriptional repression by the KRAB/KAP1 repression complex. The combination of these data suggests that KAP1 functions to coordinate activities that dynamically regulate changes in histone modifications and deposition of HP1 to establish a de novo microenvironment of heterochromatin, which is required for repression of gene transcription by KRAB-zfps.


2020, Vol 7 (1)
Author(s): Xiaomeng Zhao, Long Su, Weilin Xu, Sarah Schaack, Cheng Sun

Abstract Bumblebees (Hymenoptera: Apidae) are important pollinating insects that play pivotal roles in crop production and natural ecosystem services. Although protein-coding genes in bumblebees have been extensively annotated, regulatory sequences of the genome, such as promoters and enhancers, have been poorly annotated. To achieve a comprehensive profile of accessible chromatin regions and provide clues for all possible regulatory elements in the bumblebee genome, we performed ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) on Bombus terrestris samples derived from four developmental stages: egg, larva, pupa, and adult. The ATAC-seq reads were mapped to the B. terrestris reference genome, and its accessible chromatin regions were identified and characterized using bioinformatic methods. We identified 36,390 chromatin accessible regions in total, including both shared and stage-specific chromatin accessible signals. Our study will provide an important resource, not only for uncovering regulatory elements in the bumblebee genome, but also for expanding our understanding of bumblebee biology throughout development.

