XSTREME: Comprehensive motif analysis of biological sequence datasets

Author(s):  
Charles E. Grant ◽  
Timothy L. Bailey

Abstract: XSTREME is a web-based tool for performing comprehensive motif discovery and analysis in DNA, RNA or protein sequences, as well as in sequences in user-defined alphabets. It is designed for both very large and very small datasets. XSTREME is similar to the MEME-ChIP tool, but expands upon its capabilities in several ways. Like MEME-ChIP, XSTREME performs two types of de novo motif discovery, and also performs motif enrichment analysis of the input sequences using databases of known motifs. Unlike MEME-ChIP, which ranks motifs based on their enrichment in the centers of the input sequences, XSTREME uses enrichment anywhere in the sequences for this purpose. Consequently, XSTREME is more appropriate for motif-based analysis of sequences regardless of how the motifs are distributed within the sequences. XSTREME uses the MEME and STREME algorithms for motif discovery, and the recently developed SEA algorithm for motif enrichment analysis. The interactive HTML output produced by XSTREME includes highly accurate motif significance estimates, plots of the positional distribution of each motif, and histograms of the number of motif matches in each sequence. XSTREME is easy to use via its web server at https://meme-suite.org, and is fully integrated with the widely-used MEME Suite of sequence analysis tools, which can be freely downloaded at the same web site for non-commercial use.
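The difference between enrichment "in the centers" and enrichment "anywhere" can be illustrated with a minimal sketch. It does not reproduce the statistics used by XSTREME, SEA or MEME-ChIP; it only counts sequences. The function name `match_fractions`, the use of a regular expression in place of a position weight matrix, and the `window` parameter are all assumptions made for illustration.

```python
import re

def match_fractions(sequences, motif_regex, window):
    """Return (fraction of sequences with a match anywhere,
    fraction with a match whose midpoint lies in the central window)."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    pattern = re.compile(motif_regex)
    anywhere = 0
    central = 0
    for seq in sequences:
        centre = len(seq) / 2
        lo, hi = centre - window / 2, centre + window / 2
        # Midpoints of all non-overlapping matches in this sequence.
        midpoints = [(m.start() + m.end()) / 2 for m in pattern.finditer(seq)]
        if midpoints:
            anywhere += 1
        if any(lo <= p <= hi for p in midpoints):
            central += 1
    n = len(sequences)
    return anywhere / n, central / n

# Hypothetical toy input: four 20-residue sequences, motif "GATA".
toy = [
    "C" * 8 + "GATA" + "C" * 8,   # match in the centre
    "GATA" + "C" * 16,            # match at the left edge
    "C" * 20,                     # no match
    "C" * 16 + "GATA",            # match at the right edge
]
print(match_fractions(toy, "GATA", window=6))  # (0.75, 0.25)
```

In this toy input the motif occurs in three of four sequences but is central in only one, so a ranking based on central enrichment would score it lower than a ranking based on matches anywhere.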

2015 ◽  
Author(s):  
Bong-Hyun Kim ◽  
Jiali Zhuang ◽  
Jie Wang ◽  
Zhiping Weng

Summary: High-throughput sequencing technologies such as ChIP-seq have deepened our understanding of many biological processes. De novo motif search is one of the key downstream computational analyses following ChIP-seq experiments, and several algorithms have been proposed for this purpose. However, most web-based systems do not perform independent filtering or enrichment analyses to ensure the quality of the discovered motifs. Here, we developed a web server, Factorbook Motif Pipeline, based on an algorithm used in analyzing ENCODE consortium ChIP-seq datasets. It performs comprehensive analysis on the set of peaks detected from a ChIP-seq experiment: (i) de novo motif discovery; (ii) independent composition and bias analyses; and (iii) matching to the annotated motifs. The statistical tests employed in our pipeline provide a reliable measure of confidence in the significance of the motifs reported in the discovery step. Availability: Factorbook Motif Pipeline source code is available at https://github.com/joshuabhk/factorbook-motif-pipeline


Author(s):  
Marjan Trutschl ◽  
Phillip C. S. R. Kilgore ◽  
Rona S. Scott ◽  
Christine E. Birdwell ◽  
Urška Cvek

Biological sequence motifs are short nucleotide or amino acid sequences that are biologically significant and are attractive to scientists because they are usually highly conserved and have structural and regulatory implications. In this chapter, the authors show practical applications of these data, followed by a review of the algorithms, techniques, and tools. They address the nature of motifs and elucidate several methods for de novo motif discovery, covering the algorithms based on Gibbs sampling, expectation maximization, Bayesian inference, covariance models, and discriminative learning. The authors present the tools and their requirements to weigh their individual benefits and challenges. Since interpretation of a large set of results can pose significant challenges, they discuss several methods for handling data that span from visualization to integration into pipelines and curated databases. Additionally, the authors show practical applications of these data with examples.


2016 ◽  
Author(s):  
Morten Muhlig Nielsen ◽  
Paula Tataru ◽  
Tobias Madsen ◽  
Asger Hobolth ◽  
Jakob Skou Pedersen

Motif analysis has long been an important method to characterize biological functionality, and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate sequence lists ranked by some functional property. There is therefore a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present a motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in a ranked list of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact probabilities for motif observations in sequences. Motif enrichment is optionally evaluated using random walks, Brownian bridges, or modified rank-based statistics. These features make Regmex well suited for a range of biological sequence analysis problems related to motif discovery. We demonstrate different usage scenarios including rank correlation of microRNA binding sites co-occurring with a U-rich motif. The method is available as an R package.
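The idea of scoring a regular-expression motif against a ranked sequence list with a random walk can be sketched as follows. This is a simplified illustration, not Regmex's algorithm: it uses plain presence or absence of a match rather than the exact per-sequence probabilities from embedded Markov models, and it is written in Python rather than R. The function name `running_sum_max` and the toy sequences are assumptions.

```python
import re

def running_sum_max(ranked_sequences, motif_regex):
    """Maximum of a running sum over a ranked list.

    The walk steps up at each sequence containing the motif and down
    otherwise, with step sizes chosen so that it returns to zero at the
    end of the list. A large maximum means that matching sequences are
    concentrated near the top of the ranking.
    """
    pattern = re.compile(motif_regex)
    hits = [1 if pattern.search(s) else 0 for s in ranked_sequences]
    n, k = len(hits), sum(hits)
    if k == 0 or k == n:
        return 0.0  # no contrast between matching and non-matching sequences
    up, down = 1.0 / k, 1.0 / (n - k)
    walk = 0.0
    best = 0.0
    for h in hits:
        walk += up if h else -down
        best = max(best, walk)
    return best

# Hypothetical ranked list: U-rich sequences ranked first.
ranked = ["AUUUA", "GGUUUUC", "GGGCC", "ACGCG"]
print(running_sum_max(ranked, "U{3,}"))        # 1.0
print(running_sum_max(ranked[::-1], "U{3,}"))  # 0.0
```

A significance estimate would still require a null distribution, for example from permuting the ranking; that step is omitted here.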


Author(s):  
Najla Ksouri ◽  
Jaime A. Castro-Mondragón ◽  
Francesc Montardit-Tardà ◽  
Jacques van Helden ◽  
Bruno Contreras-Moreira ◽  
...  

Abstract: Identification of functional regulatory elements encoded in plant genomes is a fundamental need to understand gene regulation. While much attention has been given to model species such as Arabidopsis thaliana, little is known about regulatory motifs in other plant genera. Here, we describe an accurate bottom-up approach using the online workbench RSAT::Plants for a versatile ab-initio motif discovery taking Prunus persica as a model. These predictions rely on the construction of a co-expression network to generate modules with similar expression trends and assess the effect of increasing upstream region length on the sensitivity of motif discovery. Applying two discovery algorithms, 18 out of 45 modules were found to be enriched in motifs typical of well-known transcription factor families (bHLH, bZip, BZR, CAMTA, DOF, E2FE, AP2-ERF, Myb-like, NAC, TCP, WRKY) and a novel motif. Our results indicate that a small number of input sequences and a short promoter length are preferable to minimize the amount of uninformative signals in peach. The spatial distribution of TF binding sites revealed an unbalanced distribution where motifs tend to lie around the transcriptional start site region. The reliability of this approach was also benchmarked in Arabidopsis thaliana, where it recovered the expected motifs from promoters of genes containing ChIPseq peaks. Overall, this paper presents a glimpse of the peach regulatory components at genome scale and provides a general protocol that can be applied to many other species. Additionally, an RSAT Docker container was released to facilitate similar analyses on other species or to reproduce our results. One-sentence summary: Motif prediction depends on the promoter size. A proximal promoter region defined as an interval of -500 bp to +200 bp seems to be an adequate stretch for predicting de novo regulatory motifs in peach.
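The -500 bp to +200 bp proximal promoter interval can be turned into genomic coordinates with a short helper. This is a minimal sketch under stated assumptions: coordinates are 1-based and inclusive, offsets are simply added to the TSS coordinate (the convention that there is no position 0 is ignored), and the function name `proximal_promoter` is hypothetical rather than part of RSAT.

```python
def proximal_promoter(tss, strand, upstream=500, downstream=200):
    """Return (start, end) of the promoter window around a TSS.

    "Upstream" is relative to the direction of transcription, so the
    window is mirrored for genes on the minus strand. The start is
    clipped at position 1.
    """
    if strand == "+":
        return max(1, tss - upstream), tss + downstream
    if strand == "-":
        return max(1, tss - downstream), tss + upstream
    raise ValueError("strand must be '+' or '-'")

print(proximal_promoter(10000, "+"))  # (9500, 10200)
print(proximal_promoter(10000, "-"))  # (9800, 10500)
```

Varying `upstream` in such a helper is one way to reproduce the kind of promoter-length comparison the abstract describes.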


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 1277-1277
Author(s):  
Hongfang Wang ◽  
Chongzhi Zang ◽  
Len Taing ◽  
Hoifung Wong ◽  
Yumi Yashiro-Ohtani ◽  
...  

Abstract 1277. NOTCH1 regulates gene expression by forming transcription activation complexes with the DNA-binding factor RBPJ, and gain-of-function NOTCH1 mutations are common in human and murine T lymphoblastic leukemia/lymphoma (T-LL). Via ChIP-seq studies of T-LL cells with constitutive Notch activation, we previously showed that NOTCH1/RBPJ binding sites in T-LL genomes are highly enriched for motifs corresponding to Ets factors and Runx factors. In this study, we determined the relationship of NOTCH1, RBPJ, ETS1, GABPA and RUNX1 binding sites in human T-LL cells by performing ChIP-Seq for each of these factors, as well as the chromatin marks H3K4me1, H3K4me3, and H3K27me3, and aligning the resulting sequences to human genome reference hg19 using programs available through Cistrome. Peak calling was performed with MACS2, and motif analysis was performed using SeqPos, which relies on JASPAR, TRANSFAC, Protein Binding Microarray (PBM), Yeast-1-hybrid (y1h), and human protein-DNA interaction (hPDI) databases to find known motifs and can also perform de novo motif discovery. Our analysis showed even more pervasive overlap of NOTCH1/RBPJ binding with ETS1/GABPA and RUNX1 factor binding than was predicted by motif analysis, in part due to binding of Ets factors and RUNX1 to non-canonical sequences. Heat-map analysis with K-means clustering on NOTCH1 binding regions identified three major classes of RBPJ/NOTCH1 sites: class 1, characterized by high NOTCH/RBPJ signals, binding of the cofactors ZNF143, ETS1 and GABPA, high H3K4me3 signals, localization to promoters, and binding motifs for ZNF143; class 2, characterized by low NOTCH/RBPJ signals, binding of the cofactors ETS1, GABPA and RUNX1, high H3K4me3 signals, and Ets factor and CREB binding motifs; and class 3, characterized by high NOTCH/RBPJ signals, binding of RUNX1 and ETS1 cofactors, high H3K4me1 signals, intergenic localization (consistent with enhancers), and motifs for RUNX factors, ETS factors, and RBPJ. Of note, the nearest binding sites to the most responsive NOTCH1 target genes (defined as >2 fold stimulation when NOTCH1 was activated following release of gamma-secretase inhibitor (GSI) blockade by drug washout) were preferentially associated with Class 3 sites. Furthermore, shRNA knockdown of Ets factors and RUNX1 in T-LL cell lines induced apoptosis and reduced cell proliferation, implicating these factors in maintenance of T-LL growth and survival. Combination of knockdown of either Ets factors or RUNX1 with GSI treatment resulted in a more severe phenotype in terms of apoptosis and cell growth compared to the knockdown or GSI treatment alone. In summary, our studies represent a step forward towards genome-wide understanding of how Notch works in concert with other transcription factors to regulate the transcriptome of T-LL cells. Disclosures: No relevant conflicts of interest to declare.


2018 ◽  
Author(s):  
Niklas Bruse ◽  
Simon J. van Heeringen

Abstract. Background: Transcription factors (TFs) bind to specific DNA sequences, TF motifs, in cis-regulatory sequences and control the expression of the diverse transcriptional programs encoded in the genome. The concerted action of TFs within the chromatin context enables precise temporal and spatial expression patterns. To understand how TFs control gene expression it is essential to model TF binding. TF motif information can help to interpret the exact role of individual regulatory elements, for instance to predict the functional impact of non-coding variants. Findings: Here we present GimmeMotifs, a comprehensive computational framework for TF motif analysis. Compared to the previously published version, this release adds a whole range of new functionality and analysis methods. It now includes tools for de novo motif discovery, motif scanning and sequence analysis, motif clustering, calculation of performance metrics and visualization. Included with GimmeMotifs is a non-redundant database of clustered motifs. Compared to other motif databases, this collection of motifs shows competitive performance in discriminating bound from unbound sequences. Using our de novo motif discovery pipeline we find large differences in performance between de novo motif finders on ChIP-seq data. Using an ensemble method such as implemented in GimmeMotifs will generally result in improved motif identification compared to a single motif finder. Finally, we demonstrate maelstrom, a new ensemble method that enables comparative analysis of TF motifs between multiple high-throughput sequencing experiments, such as ChIP-seq or ATAC-seq. Using a collection of ~200 H3K27ac ChIP-seq data sets we identify TFs that play a role in hematopoietic differentiation and lineage commitment. Conclusion: GimmeMotifs is a fully-featured and flexible framework for TF motif analysis. It contains both command-line tools and a Python API and is freely available at: https://github.com/vanheeringen-lab/gimmemotifs.


2013 ◽  
Vol 42 (5) ◽  
pp. 2976-2987 ◽  
Author(s):  
Pouya Kheradpour ◽  
Manolis Kellis

Abstract: Recent advances in technology have led to a dramatic increase in the number of available transcription factor ChIP-seq and ChIP-chip data sets. Understanding the motif content of these data sets is an important step in understanding the underlying mechanisms of regulation. Here we provide a systematic motif analysis for 427 human ChIP-seq data sets using motifs curated from the literature and also discovered de novo using five established motif discovery tools. We use a systematic pipeline for calculating motif enrichment in each data set, providing a principled way for choosing between motif variants found in the literature and for flagging potentially problematic data sets. Our analysis confirms the known specificity of 41 of the 56 analyzed factor groups and reveals motifs of potential cofactors. We also use cell type-specific binding to find factors active in specific conditions. The resource we provide is accessible both for browsing a small number of factors and for performing large-scale systematic analyses. We provide motif matrices, instances and enrichments in each of the ENCODE data sets. The motifs discovered here have been used in parallel studies to validate the specificity of antibodies, understand cooperativity between data sets and measure the variation of motif binding across individuals and species.


2021 ◽  
Vol 118 (46) ◽  
pp. e2104297118
Author(s):  
Sameena Nikhat ◽  
Anurupa D. Yadavalli ◽  
Arpita Prusty ◽  
Priyanka K. Narayan ◽  
Dasaradhi Palakodeti ◽  
...  

The commitment of hematopoietic multipotent progenitors (MPPs) toward a particular lineage involves activation of cell type–specific genes and silencing of genes that promote alternate cell fates. Although the gene expression programs of early–B and early–T lymphocyte development are mutually exclusive, we show that these cell types exhibit significantly correlated microRNA (miRNA) profiles. However, their corresponding miRNA targetomes are distinct and predominated by transcripts associated with natural killer, dendritic cell, and myeloid lineages, suggesting that miRNAs function in a cell-autonomous manner. The combinatorial expression of miRNAs miR-186-5p, miR-128-3p, and miR-330-5p in MPPs significantly attenuates their myeloid differentiation potential due to repression of myeloid-associated transcripts. Depletion of these miRNAs caused a pronounced de-repression of myeloid lineage targets in differentiating early–B and early–T cells, resulting in a mixed-lineage gene expression pattern. De novo motif analysis combined with an assay of promoter activities indicates that B as well as T lineage determinants drive the expression of these miRNAs in lymphoid lineages. Collectively, we present a paradigm that miRNAs are conserved between developing B and T lymphocytes, yet they target distinct sets of promiscuously expressed lineage-inappropriate genes to suppress the alternate cell-fate options. Thus, our studies provide a comprehensive compendium of miRNAs with functional implications for B and T lymphocyte development.


2019 ◽  
Vol 35 (24) ◽  
pp. 5339-5340 ◽  
Author(s):  
Laura Puente-Santamaria ◽  
Wyeth W Wasserman ◽  
Luis del Peso

Abstract. Summary: The computational identification of the transcription factors (TFs) [more generally, transcription regulators (TRs)] responsible for the co-regulation of a specific set of genes is a common problem found in genomic analysis. Herein, we describe TFEA.ChIP, a tool that makes use of ChIP-seq datasets to estimate and visualize TR enrichment in gene lists representing transcriptional profiles. We validated TFEA.ChIP using a wide variety of gene sets representing signatures of genetic and chemical perturbations as input and found that the relevant TR was correctly identified in 126 of the 174 gene sets analyzed. Comparison with other TR enrichment tools demonstrates that TFEA.ChIP is a highly customizable package with an outstanding performance. Availability and implementation: TFEA.ChIP is implemented as an R package available at Bioconductor https://www.bioconductor.org/packages/devel/bioc/html/TFEA.ChIP.html and GitHub https://github.com/LauraPS1/TFEA.ChIP_downloads. A web-based GUI to the package is also available at https://www.iib.uam.es/TFEA.ChIP/. Supplementary information: Supplementary data are available at Bioinformatics online.
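One generic way to ask whether a TR's ChIP-seq targets are over-represented in a gene list is a hypergeometric (one-sided Fisher) test. The sketch below illustrates that general technique only; the abstract does not state which statistics TFEA.ChIP uses, so this should not be read as the package's method. The function name `hypergeom_upper_tail` and the toy numbers are assumptions, and the code is in Python rather than R.

```python
from math import comb

def hypergeom_upper_tail(universe, bound, listed, overlap):
    """P(X >= overlap) when `listed` genes are drawn without replacement
    from `universe` genes, of which `bound` are targets of the TR."""
    total = comb(universe, listed)
    top = min(bound, listed)
    favourable = sum(
        comb(bound, i) * comb(universe - bound, listed - i)
        for i in range(overlap, top + 1)
    )
    return favourable / total

# Hypothetical toy numbers: 10 genes in total, 4 bound by the TR,
# a gene list of 3 genes, all 3 of which are bound.
p = hypergeom_upper_tail(universe=10, bound=4, listed=3, overlap=3)
print(p)  # 4/120, about 0.033
```

With realistic gene counts the same function applies unchanged, since Python integers do not overflow; multiple-testing correction across many TRs is left out of the sketch.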

