Top-Down Motif Discovery in Biological Sequence Datasets by Genetic Algorithm

In this study, a new genetic algorithm was developed to discover the best motifs in a set of DNA sequences. The main steps were: finding the potential positions in each sequence using a few voters (1–5 sequences), constructing the chromosomes from the potential positions, evaluating the fitness of each gene (position) and of each chromosome, calculating a new random distribution, and using that distribution to generate the next generation. To verify the effectiveness of the proposed algorithm, several real and artificial datasets were used, and the results were compared with those of the standard genetic algorithm and the Gibbs, MEME, and consensus algorithms. Although all the algorithms show low correlation with the correct motifs, the new algorithm exhibits higher accuracy without sacrificing implementation time.
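The abstract gives only an outline of the method, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a chromosome is one motif start position per sequence, scores a chromosome by a simple column-consensus count, and re-estimates a per-sequence sampling distribution from the fittest chromosomes. The voter-based preselection and the per-gene fitness mentioned in the abstract are omitted, and the names `consensus_score` and `evolve` and all parameter values are hypothetical.

```python
import random
from collections import Counter


def consensus_score(seqs, starts, width):
    """Sum, over motif columns, of the count of the most common base."""
    score = 0
    for col in range(width):
        column = [s[p + col] for s, p in zip(seqs, starts)]
        score += Counter(column).most_common(1)[0][1]
    return score


def evolve(seqs, width, pop_size=60, generations=40, elite_frac=0.25, seed=0):
    """Search for one start position per sequence (illustrative only)."""
    rng = random.Random(seed)
    n_pos = [len(s) - width + 1 for s in seqs]
    # One sampling distribution per sequence, initially uniform.
    dists = [[1.0] * n for n in n_pos]
    best, best_score = None, -1
    for _ in range(generations):
        # Each chromosome draws one start position per sequence.
        population = [
            [rng.choices(range(n), weights=d)[0] for n, d in zip(n_pos, dists)]
            for _ in range(pop_size)
        ]
        population.sort(
            key=lambda c: consensus_score(seqs, c, width), reverse=True
        )
        top_score = consensus_score(seqs, population[0], width)
        if top_score > best_score:
            best, best_score = population[0], top_score
        elite = population[: max(1, int(pop_size * elite_frac))]
        # Re-estimate each distribution from the elite; the pseudocount
        # keeps every position reachable in later generations.
        for i, n in enumerate(n_pos):
            counts = Counter(c[i] for c in elite)
            dists[i] = [counts.get(p, 0) + 0.1 for p in range(n)]
    return best, best_score


if __name__ == "__main__":
    toy = ["TTACGTTT", "ACGTGGGG", "CCCCACGT"]
    print(evolve(toy, width=4))
```

The pseudocount of 0.1 and the elite fraction are arbitrary choices for the sketch; a real implementation would tune them and would likely use a position weight matrix score instead of a raw consensus count.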


Genetic algorithm for dimer-led and error-restricted spaced motif discovery

2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2013. DOI: 10.1109/cibcb.2013.6595409

Author(s): Tak-Ming Chan, Leung-Yau Lo, Man-Leung Wong, Yong Liang, Kwong-Sak Leung

Keyword(s): Genetic Algorithm, Motif Discovery


Genetic Algorithm Based Probabilistic Motif Discovery in Unaligned Biological Sequences

Journal of Computer Science, 2008, Vol 4 (8), pp. 625-630. DOI: 10.3844/jcssp.2008.625.630

Author(s): M. Hemalatha, K. Vivekanand

Keyword(s): Genetic Algorithm, Motif Discovery, Biological Sequences


XSTREME: Comprehensive motif analysis of biological sequence datasets

2021. DOI: 10.1101/2021.09.02.458722

Cited By ~ 1

Author(s): Charles E. Grant, Timothy L. Bailey

Keyword(s): Motif Discovery, De Novo, Positional Distribution, Enrichment Analysis, Biological Sequence, Motif Analysis, Web Based, Fully Integrated, Commercial Use, Motif Enrichment

XSTREME is a web-based tool for performing comprehensive motif discovery and analysis in DNA, RNA or protein sequences, as well as in sequences in user-defined alphabets. It is designed for both very large and very small datasets. XSTREME is similar to the MEME-ChIP tool, but expands upon its capabilities in several ways. Like MEME-ChIP, XSTREME performs two types of de novo motif discovery, and also performs motif enrichment analysis of the input sequences using databases of known motifs. Unlike MEME-ChIP, which ranks motifs based on their enrichment in the centers of the input sequences, XSTREME uses enrichment anywhere in the sequences for this purpose. Consequently, XSTREME is more appropriate for motif-based analysis of sequences regardless of how the motifs are distributed within the sequences. XSTREME uses the MEME and STREME algorithms for motif discovery, and the recently developed SEA algorithm for motif enrichment analysis. The interactive HTML output produced by XSTREME includes highly accurate motif significance estimates, plots of the positional distribution of each motif, and histograms of the number of motif matches in each sequence. XSTREME is easy to use via its web server at https://meme-suite.org, and is fully integrated with the widely used MEME Suite of sequence analysis tools, which can be freely downloaded at the same web site for non-commercial use.
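The difference between ranking by central enrichment and ranking by enrichment anywhere can be illustrated with a toy calculation. The sketch below is not the SEA algorithm or any part of the MEME Suite: it assumes exact string matching in place of motif scoring and simply reports the fraction of sequences with at least one match, either anywhere or within a central window. The function names and the `central_window` parameter are hypothetical.

```python
def match_positions(seq, motif):
    """Start indices of exact occurrences of motif in seq (overlaps allowed)."""
    return [
        i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)
    ]


def fraction_with_match(seqs, motif, central_window=None):
    """Fraction of sequences containing the motif.

    If central_window is given, only matches whose midpoint lies within
    central_window / 2 of the sequence midpoint are counted.
    """
    hits = 0
    for seq in seqs:
        positions = match_positions(seq, motif)
        if central_window is not None:
            mid = len(seq) / 2
            half = central_window / 2
            positions = [
                p for p in positions
                if abs(p + len(motif) / 2 - mid) <= half
            ]
        if positions:
            hits += 1
    return hits / len(seqs)


if __name__ == "__main__":
    seqs = ["ACGTTTTTTTTT", "TTTTACGTTTTT", "TTTTTTTTTTTT"]
    print(fraction_with_match(seqs, "ACGT"))                    # anywhere
    print(fraction_with_match(seqs, "ACGT", central_window=4))  # central only
```

In this toy input the motif at the start of the first sequence counts toward the anywhere fraction but not the central one, which is the kind of off-center signal a center-based ranking would miss.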


Detection and Employment of Biological Sequence Motifs

Big Data Analytics in Bioinformatics and Healthcare - Advances in Bioinformatics and Biomedical Engineering, 2015, pp. 86-116. DOI: 10.4018/978-1-4666-6611-5.ch005

Author(s): Marjan Trutschl, Phillip C. S. R. Kilgore, Rona S. Scott, Christine E. Birdwell, Urška Cvek

Keyword(s): Motif Discovery, De Novo, Amino Acid Sequences, Discriminative Learning, Large Set, Sequence Motifs, Biological Sequence, Practical Applications, De Novo Motif Discovery, Covariance Models

Biological sequence motifs are short nucleotide or amino acid sequences that are biologically significant and are attractive to scientists because they are usually highly conserved and have structural and regulatory implications. In this chapter, the authors show practical applications of these data, followed by a review of the algorithms, techniques, and tools. They address the nature of motifs and describe several methods for de novo motif discovery, covering algorithms based on Gibbs sampling, expectation maximization, Bayesian inference, covariance models, and discriminative learning. The authors present the tools and their requirements to weigh their individual benefits and challenges. Since interpreting a large set of results can pose significant challenges, they discuss several methods for handling data, ranging from visualization to integration into pipelines and curated databases. Additionally, the authors show practical applications of these data with examples.
