Greedy de novo motif discovery to construct motif repositories for bacterial proteomes

A Clustering-Based Algorithm for De Novo Motif Discovery in DNA Sequences

2017 24th National and 2nd International Iranian Conference on Biomedical Engineering (ICBME) ◽

10.1109/icbme.2017.8430242 ◽

2017 ◽

Author(s):

Mohammad Haghir Ebrahim-Abadi ◽

Emad Fatemizadeh

Keyword(s):

DNA Sequences ◽

Motif Discovery ◽

De Novo ◽

De Novo Motif Discovery

De novo Motif Prediction using the Fireworks Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2015070102 ◽

2015 ◽

Vol 6 (3) ◽

pp. 24-40 ◽

Cited By ~ 6

Author(s):

Andrei Lihu ◽

Ștefan Holban

Keyword(s):

Motif Discovery ◽

De Novo ◽

Proof Of Concept ◽

Fireworks Algorithm ◽

Regulatory Processes ◽

Motif Prediction ◽

Kullback-Leibler Divergence ◽

De Novo Motif Discovery ◽

Low Performance ◽

Motif Finding Algorithm

De novo motif discovery is essential for understanding the cis-regulatory processes that play a role in gene expression. Finding unknown patterns of unknown lengths in massive amounts of data has long been a major challenge in computational biology. Because motif prediction algorithms have always suffered from low performance, there is a constant effort to find better techniques. Evolutionary methods, including swarm intelligence algorithms, have been applied to motif prediction with limited success. However, recently developed methods, such as the Fireworks Algorithm (FWA), which simulates the explosion process of fireworks, may show better prospects. This paper describes a motif finding algorithm based on FWA that maximizes the Kullback-Leibler divergence between candidate solutions and the background noise. Following the terminology of the FWA framework, the candidate motifs are fireworks that generate additional sparks (i.e., derived motifs) in their neighborhood. During the iterations, better sparks can replace the fireworks, as the Fireworks Motif Finder (FW-MF) assumes a one-occurrence-per-sequence mode. The results obtained on a standard benchmark for promoter analysis show that our proof of concept is promising.
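The FW-MF idea described above can be illustrated with a minimal Python sketch. It assumes DNA input over the uppercase alphabet ACGT, a fixed motif width, a mononucleotide background estimated from the input itself, and exactly one site per sequence (the one-occurrence-per-sequence mode). Fitness is the summed column-wise Kullback-Leibler divergence between the motif's position frequency matrix and that background. The spark generator (explode), the function names, and all parameters and defaults are hypothetical simplifications: the published FWA adapts spark counts and explosion amplitudes to fitness, which this sketch does not do, so it is not the authors' FW-MF implementation.

```python
import math
import random

ALPHABET = "ACGT"


def background_freqs(sequences, pseudocount=1.0):
    """Mononucleotide background frequencies estimated from all input sequences."""
    counts = {b: pseudocount for b in ALPHABET}
    for seq in sequences:
        for base in seq:
            counts[base] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}


def kl_score(sequences, starts, width, background, pseudocount=0.5):
    """Summed column-wise KL divergence of the motif PFM from the background.

    OOPS assumption: sequence i contributes exactly one site, at starts[i].
    """
    score = 0.0
    for col in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for seq, start in zip(sequences, starts):
            counts[seq[start + col]] += 1
        total = sum(counts.values())
        for b in ALPHABET:
            p = counts[b] / total
            score += p * math.log(p / background[b])
    return score


def explode(starts, sequences, width, amplitude, n_sparks, rng):
    """Hypothetical spark generator: shift one randomly chosen site by up to +/- amplitude."""
    sparks = []
    for _ in range(n_sparks):
        spark = list(starts)
        i = rng.randrange(len(spark))
        max_start = len(sequences[i]) - width
        spark[i] = min(max(spark[i] + rng.randint(-amplitude, amplitude), 0), max_start)
        sparks.append(spark)
    return sparks


def fireworks_search(sequences, width, n_fireworks=5, n_sparks=10,
                     amplitude=3, iterations=50, seed=0):
    """Simplified fireworks-style search; returns (best site starts, KL score)."""
    rng = random.Random(seed)
    background = background_freqs(sequences)

    def fitness(starts):
        return kl_score(sequences, starts, width, background)

    # Each firework is a candidate motif: one site start per sequence.
    fireworks = [[rng.randrange(len(seq) - width + 1) for seq in sequences]
                 for _ in range(n_fireworks)]
    for _ in range(iterations):
        # A firework is replaced only by a spark with strictly higher fitness
        # (max returns the first maximal element, i.e. the firework, on ties).
        fireworks = [
            max([fw] + explode(fw, sequences, width, amplitude, n_sparks, rng), key=fitness)
            for fw in fireworks
        ]
    best = max(fireworks, key=fitness)
    return best, fitness(best)
```

Because a firework is kept unless one of its sparks scores strictly higher, the best score in this sketch can only stay the same or improve from one iteration to the next.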

AptCompare: optimized de novo motif discovery of RNA aptamers via HTS-SELEX

Bioinformatics ◽

10.1093/bioinformatics/btaa054 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2905-2906 ◽

Cited By ~ 1

Author(s):

Kevin R Shieh ◽

Christina Kratschmer ◽

Keith E Maier ◽

John M Greally ◽

Matthew Levy ◽

...

Keyword(s):

Motif Discovery ◽

High Throughput Sequencing ◽

De Novo ◽

RNA Aptamers ◽

Supplementary Information ◽

Good Correspondence ◽

Detection Algorithms ◽

De Novo Motif Discovery ◽

Exponential Enrichment ◽

Analytical Approaches

Summary: High-throughput sequencing can enhance the analysis of aptamer libraries generated by Systematic Evolution of Ligands by EXponential enrichment (SELEX). Robust analysis of the resulting sequenced rounds is best achieved by determining a ranked consensus of reads after processing them with multiple aptamer detection algorithms. While several such approaches have been developed to this end, their installation and implementation are problematic. We developed AptCompare, a cross-platform program that combines six of the most widely used analytical approaches for identifying RNA aptamer motifs and uses a simple weighted ranking to order the candidate aptamers, all within a single GUI-enabled environment. We demonstrate AptCompare’s performance by identifying the top-ranked candidate aptamers from a selection experiment previously published by our laboratory; follow-up bench assays showed good correspondence between the sequences’ rankings and their binding affinities. Availability and implementation: The source code and pre-built virtual machine images are freely available at https://bitbucket.org/shiehk/aptcompare. Supplementary information: Supplementary data are available at Bioinformatics online.
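AptCompare's exact weighting scheme is not given in this summary, so the following Python sketch only illustrates the general idea of a weighted rank consensus. It assumes each detection algorithm returns a ranked list of candidate IDs, that per-algorithm weights are supplied by the user, and that a candidate missing from an algorithm's list receives the rank just past the end of that list. The function name, algorithm labels, and weights are hypothetical.

```python
def weighted_consensus(rankings, weights):
    """Order candidates by weighted mean rank across algorithms (lower is better).

    rankings: dict mapping algorithm name -> list of candidate IDs, best first.
    weights:  dict mapping algorithm name -> positive weight (hypothetical values).
    """
    candidates = {c for ranked in rankings.values() for c in ranked}
    total_weight = sum(weights[alg] for alg in rankings)
    scores = {}
    for c in candidates:
        weighted_sum = 0.0
        for alg, ranked in rankings.items():
            # A candidate an algorithm did not report gets the rank just past its list.
            rank = ranked.index(c) + 1 if c in ranked else len(ranked) + 1
            weighted_sum += weights[alg] * rank
        scores[c] = weighted_sum / total_weight
    # Break ties by candidate ID so the order is deterministic.
    return sorted(candidates, key=lambda c: (scores[c], c))
```

For example, with rankings algA = [apt1, apt2, apt3], algB = [apt2, apt1], algC = [apt2, apt3, apt1] and weights 2, 1, 1, the weighted mean ranks are 1.5 for apt2, 1.75 for apt1, and 2.75 for apt3, so the consensus order is apt2, apt1, apt3.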

Factorbook Motif Pipeline: A de novo motif discovery and filtering web server for ChIP-seq peaks

10.1101/033670 ◽

2015 ◽

Cited By ~ 1

Author(s):

Bong-Hyun Kim ◽

Jiali Zhuang ◽

Jie Wang ◽

Zhiping Weng

Keyword(s):

Motif Discovery ◽

High Throughput Sequencing ◽

De Novo ◽

Statistical Tests ◽

Web Server ◽

Biological Processes ◽

Web Based ◽

Sequencing Technologies ◽

De Novo Motif Discovery

Summary: High-throughput sequencing technologies such as ChIP-seq have deepened our understanding of many biological processes. De novo motif search is one of the key downstream computational analyses following ChIP-seq experiments, and several algorithms have been proposed for this purpose. However, most web-based systems do not perform independent filtering or enrichment analyses to ensure the quality of the discovered motifs. Here, we developed a web server, Factorbook Motif Pipeline, based on an algorithm used to analyze ENCODE consortium ChIP-seq datasets. It performs a comprehensive analysis of the set of peaks detected in a ChIP-seq experiment: (i) de novo motif discovery; (ii) independent composition and bias analyses; and (iii) matching to annotated motifs. The statistical tests employed in our pipeline provide a reliable measure of confidence in how significant the motifs reported in the discovery step are. Availability: Factorbook Motif Pipeline source code is accessible at https://github.com/joshuabhk/factorbook-motif-pipeline
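The summary does not say which statistical tests the Factorbook Motif Pipeline uses. As a generic illustration of an independent enrichment check, the Python sketch below applies a one-sided Fisher's exact (hypergeometric) test to ask whether sequences containing a motif are over-represented among peak sequences relative to a background set. The exact-match count_hits helper (which also searches the reverse complement), the function names, and the choice of test are assumptions, not the pipeline's actual method.

```python
from math import comb

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_hits(sequences, kmer):
    """Number of sequences containing kmer on either strand (exact match)."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return sum(1 for seq in sequences if kmer in seq or rc in seq)


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def motif_enrichment_pvalue(hits_fg, total_fg, hits_bg, total_bg):
    """One-sided Fisher's exact test for motif over-representation in foreground sequences."""
    M = total_fg + total_bg  # all sequences
    n = hits_fg + hits_bg    # sequences that contain the motif
    N = total_fg             # foreground (peak) sequences
    return hypergeom_sf(hits_fg, M, n, N)
```

In use, the four counts would come from applying count_hits and len to the peak set and to a background set; a small p-value suggests the motif occurs in more peaks than expected from the pooled rate.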

Detection and Employment of Biological Sequence Motifs

Big Data Analytics in Bioinformatics and Healthcare - Advances in Bioinformatics and Biomedical Engineering ◽

10.4018/978-1-4666-6611-5.ch005 ◽

2015 ◽

pp. 86-116

Author(s):

Marjan Trutschl ◽

Phillip C. S. R. Kilgore ◽

Rona S. Scott ◽

Christine E. Birdwell ◽

Urška Cvek

Keyword(s):

Motif Discovery ◽

De Novo ◽

Amino Acid Sequences ◽

Discriminative Learning ◽

Large Set ◽

Sequence Motifs ◽

Biological Sequence ◽

Practical Applications ◽

De Novo Motif Discovery ◽

Covariance Models

Biological sequence motifs are short nucleotide or amino acid sequences that are biologically significant; they attract scientists' attention because they are usually highly conserved and have structural and regulatory implications. In this chapter, the authors show practical applications of these data, followed by a review of the algorithms, techniques, and tools. They address the nature of motifs and describe several methods for de novo motif discovery, covering algorithms based on Gibbs sampling, expectation maximization, Bayesian inference, covariance models, and discriminative learning. The authors present the tools and their requirements to weigh their individual benefits and challenges. Because interpreting a large set of results can pose significant challenges, they discuss several methods for handling the data, ranging from visualization to integration into pipelines and curated databases. Additionally, the authors illustrate practical applications of these data with examples.
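Of the method families listed above, Gibbs sampling is the easiest to show compactly. The Python sketch below is a basic site sampler under a one-occurrence-per-sequence assumption: it repeatedly leaves out one sequence, builds a position frequency profile from the remaining sites (with pseudocounts), and resamples the left-out sequence's site in proportion to its likelihood under that profile. It assumes uppercase ACGT input and a uniform background, so the profile likelihood is proportional to the likelihood ratio against background. Function names and defaults are hypothetical, and the sketch omits common refinements such as phase shifts and multiple restarts.

```python
import random

ALPHABET = "ACGT"


def profile_excluding(sequences, starts, width, skip, pseudocount=1.0):
    """Position frequency profile from all current sites except sequence `skip`."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for idx, (seq, start) in enumerate(zip(sequences, starts)):
        if idx == skip:
            continue
        for j, base in enumerate(seq[start:start + width]):
            counts[j][base] += 1
    total = len(sequences) - 1 + len(ALPHABET) * pseudocount
    return [{b: col[b] / total for b in ALPHABET} for col in counts]


def gibbs_motif_sampler(sequences, width, iterations=500, seed=0):
    """Basic OOPS site sampler; returns one site start per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(seq) - width + 1) for seq in sequences]
    for _ in range(iterations):
        i = rng.randrange(len(sequences))
        profile = profile_excluding(sequences, starts, width, skip=i)
        seq = sequences[i]
        # Likelihood of each candidate window under the profile; with a uniform
        # background this is proportional to the profile/background likelihood ratio.
        weights = []
        for s in range(len(seq) - width + 1):
            likelihood = 1.0
            for j, base in enumerate(seq[s:s + width]):
                likelihood *= profile[j][base]
            weights.append(likelihood)
        starts[i] = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return starts


def consensus(sequences, starts, width):
    """Most frequent base in each column of the aligned sites."""
    sites = [seq[s:s + width] for seq, s in zip(sequences, starts)]
    return "".join(max(ALPHABET, key=col.count) for col in zip(*sites))
```

Because a Gibbs sampler can settle in a local optimum, one would typically run it with several seeds and keep the alignment with the best score.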

PairMotif+: A Fast and Effective Algorithm for De Novo Motif Discovery in DNA sequences

International Journal of Biological Sciences ◽

10.7150/ijbs.5786 ◽

2013 ◽

Vol 9 (4) ◽

pp. 412-424 ◽

Cited By ~ 6

Author(s):

Qiang Yu ◽

Hongwei Huo ◽

Yipu Zhang ◽

Hongzhi Guo ◽

Haitao Guo

Keyword(s):

DNA Sequences ◽

Motif Discovery ◽

De Novo ◽

Effective Algorithm ◽

De Novo Motif Discovery

SIOMICS: a novel approach for systematic identification of motifs in ChIP-seq data

Nucleic Acids Research ◽

10.1093/nar/gkt1288 ◽

2013 ◽

Vol 42 (5) ◽

pp. e35-e35 ◽

Cited By ~ 15

Author(s):

Jun Ding ◽

Haiyan Hu ◽

Xiaoman Li

Keyword(s):

Motif Discovery ◽

De Novo ◽

Data Sets ◽

Random Data ◽

Data Set ◽

Binding Motifs ◽

Gene Transcriptional Regulation ◽

Novel Approach ◽

De Novo Motif Discovery ◽

Systematic Identification

Abstract: The identification of transcription factor binding motifs is important for the study of gene transcriptional regulation. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) provides an unprecedented opportunity to discover binding motifs. Computational methods have been developed to identify motifs from ChIP-seq data, but they encounter several problems. For example, existing methods often do not scale to the large number of sequences obtained from ChIP-seq peak regions. Some methods rely heavily on well-annotated motifs, even though the number of known motifs is limited. To simplify the problem, de novo motif discovery methods often neglect underrepresented motifs in ChIP-seq peak regions. To address these issues, we developed a novel approach called SIOMICS to discover motifs de novo from ChIP-seq data. Tested on 13 ChIP-seq data sets, SIOMICS identified motifs of many known and new cofactors. Tested on 13 simulated random data sets, SIOMICS discovered no motif in any data set. Compared with two recently developed motif discovery methods, SIOMICS shows advantages in speed, in the number of known cofactor motifs predicted in experimental data sets, and in the number of false motifs predicted in random data sets. The SIOMICS software is freely available at http://eecs.ucf.edu/~xiaoman/SIOMICS/SIOMICS.html.
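SIOMICS's own algorithm is not described in this abstract. As a loosely related, generic illustration of two points the abstract raises (counting candidate motifs efficiently over many peak sequences, and checking results against randomized data), the Python sketch below counts canonical k-mers (a k-mer merged with its reverse complement) in a single pass and compares them with a composition-preserving shuffle of the same sequences. The shuffle preserves only mononucleotide composition, not dinucleotide structure. Thresholds, function names, and the shuffle-based control are assumptions for illustration only.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_counts(sequences, k):
    """Count canonical k-mers in a single pass over all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def shuffled(sequences, rng):
    """Composition-preserving control: shuffle the bases within each sequence."""
    out = []
    for seq in sequences:
        bases = list(seq)
        rng.shuffle(bases)
        out.append("".join(bases))
    return out


def enriched_kmers(sequences, k, min_fold=3.0, min_count=5, seed=0):
    """Return (kmer, count, fold) for k-mers over-represented relative to the shuffled control."""
    rng = random.Random(seed)
    foreground = kmer_counts(sequences, k)
    control = kmer_counts(shuffled(sequences, rng), k)
    hits = []
    for kmer, count in foreground.items():
        fold = (count + 1) / (control[kmer] + 1)  # +1 avoids division by zero
        if count >= min_count and fold >= min_fold:
            hits.append((kmer, count, fold))
    return sorted(hits, key=lambda h: -h[2])
```

Running the same function on purely random sequences is a cheap sanity check in the spirit of the abstract's random-data experiment: ideally it reports few or no k-mers.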
