Biological Sequence Motif Discovery Using motif-x

Author(s): Michael F. Chou, Daniel Schwartz

Author(s): Jin S. Liu, Mayetri Gupta, Xiaole Liu, Linda Mayerhofere, Charles E. Lawrence

Author(s): Marwa A. Radad, Nawal A. El-Fishawy, Hossam M. Faheem

A motif is an over-represented pattern in a biological sequence. Motif discovery is a major challenge in bioinformatics, and mismatches between a motif and its occurrences make motif mining very difficult. Brute-force approaches to this problem take time exponential in the motif length. In this paper, the authors discuss a Recursive Brute-Force (RBF) algorithm whose average-case time complexity is exponential in the number of allowed mutations rather than in the motif length. The rise of modern multi-core architectures motivates parallelizing the algorithm, which the authors implement in two different ways. The first, a multi-threaded version (OMP-RBF) implemented with OpenMP, suffers serious performance degradation due to heap contention; the authors investigate several solutions to this heap contention problem. The second implementation, MPI-RBF, is based on MPI, and its efficient handling of data locality boosts its scalability. The authors demonstrate that the MPI approach outperforms OpenMP for this kind of computationally intensive, memory-intensive, communication-free problem.
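To make the complexity claim concrete, here is a minimal Python sketch of pattern-driven search for an (l, d) motif, where d is the number of allowed mismatches. It is not the authors' RBF algorithm, and the function names (`neighbors`, `find_motifs`) are hypothetical. The recursive `neighbors` enumeration generates only the d-mismatch neighborhood of each l-mer, whose size for DNA is the sum over i = 0..d of C(l, i)·3^i (polynomial in l, exponential in d), instead of all 4^l candidate strings.

```python
from typing import List, Set

ALPHABET = "ACGT"  # assumption: DNA alphabet


def neighbors(pattern: str, d: int) -> Set[str]:
    """Recursively enumerate every string within Hamming distance d of pattern."""
    if d == 0 or not pattern:
        return {pattern}
    first, rest = pattern[0], pattern[1:]
    result = set()
    # Keep the first symbol; the suffix may still use up to d mismatches.
    for suffix in neighbors(rest, d):
        result.add(first + suffix)
    # Spend one mismatch on the first symbol; the suffix gets d - 1.
    for suffix in neighbors(rest, d - 1):
        for base in ALPHABET:
            if base != first:
                result.add(base + suffix)
    return result


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq matches motif with at most d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def find_motifs(seqs: List[str], l: int, d: int) -> List[str]:
    """Return every l-mer that occurs with <= d mismatches in all sequences."""
    candidates: Set[str] = set()
    # Any valid motif lies within distance d of some l-mer of the first sequence.
    first = seqs[0]
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates if all(occurs_within(m, s, d) for s in seqs))
```

For example, `find_motifs(["AAACGTAA", "TTACGTTT", "GGACGAGG"], 4, 1)` includes `"ACGT"`, which occurs exactly in the first two sequences and with one mismatch (`ACGA`) in the third.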


Author(s): Charles E. Grant, Timothy L. Bailey

XSTREME is a web-based tool for comprehensive motif discovery and analysis in DNA, RNA, or protein sequences, as well as in sequences over user-defined alphabets. It is designed for both very large and very small datasets. XSTREME is similar to the MEME-ChIP tool but extends its capabilities in several ways. Like MEME-ChIP, XSTREME performs two types of de novo motif discovery and also performs motif enrichment analysis of the input sequences using databases of known motifs. Unlike MEME-ChIP, which ranks motifs by their enrichment in the centers of the input sequences, XSTREME ranks them by their enrichment anywhere in the sequences. Consequently, XSTREME is better suited to motif-based analysis of sequences regardless of how the motifs are distributed within them. XSTREME uses the MEME and STREME algorithms for motif discovery and the recently developed SEA algorithm for motif enrichment analysis. The interactive HTML output produced by XSTREME includes highly accurate motif significance estimates, plots of the positional distribution of each motif, and histograms of the number of motif matches in each sequence. XSTREME is easy to use via its web server at https://meme-suite.org and is fully integrated with the widely used MEME Suite of sequence analysis tools, which can be freely downloaded from the same website for non-commercial use.
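The positional-distribution plots and per-sequence match histograms described above summarize two simple quantities. The Python sketch below computes them for a motif given as a regular expression. It is a hypothetical helper (`motif_site_stats`), not MEME Suite code, and it assumes site positions are expressed as the offset of each site's center from the sequence center, the quantity that separates central enrichment from enrichment anywhere in the sequence.

```python
import re
from collections import Counter
from typing import List, Tuple


def motif_site_stats(motif_regex: str, seqs: List[str]) -> Tuple[Counter, List[float]]:
    """Histogram of matches per sequence, plus each site's offset from the sequence center."""
    # A zero-width lookahead lets overlapping sites (e.g. "AA" twice in "AAA") all be found.
    pattern = re.compile(f"(?=({motif_regex}))")
    matches_per_seq: Counter = Counter()
    offsets: List[float] = []
    for seq in seqs:
        sites = [(m.start(), len(m.group(1))) for m in pattern.finditer(seq)]
        matches_per_seq[len(sites)] += 1  # number of sequences having this many matches
        for start, width in sites:
            # Negative offset: the site center lies left of the sequence center.
            offsets.append(start + width / 2 - len(seq) / 2)
    return matches_per_seq, offsets
```

Sites clustered near offset 0 indicate central enrichment; a flat offset distribution is the case where ranking by enrichment anywhere in the sequence matters.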


Author(s): Marjan Trutschl, Phillip C. S. R. Kilgore, Rona S. Scott, Christine E. Birdwell, Urška Cvek

Biological sequence motifs are short nucleotide or amino acid sequences that are biologically significant. They attract scientists because they are usually highly conserved and have structural and regulatory implications. In this chapter, the authors present practical applications of these data, illustrated with examples, followed by a review of the algorithms, techniques, and tools. They address the nature of motifs and describe several methods for de novo motif discovery, covering algorithms based on Gibbs sampling, expectation maximization, Bayesian inference, covariance models, and discriminative learning. The authors present the tools and their requirements, weighing their individual benefits and challenges. Because interpreting a large set of results can pose significant challenges, they also discuss several methods for handling the data, ranging from visualization to integration into pipelines and curated databases.
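Gibbs sampling, the first family listed above, can be illustrated with a short Python sketch of a site sampler that assumes one motif occurrence per sequence. This is a simplified, hypothetical implementation (`gibbs_motif_search`, `build_profile`), not code from the chapter or from any tool it reviews. It assumes a uniform background, so sampling proportional to the profile probability is equivalent to sampling proportional to the profile-to-background likelihood ratio, and it uses pseudocounts of 1 so that no probability is zero.

```python
import random
from typing import Dict, List

ALPHABET = "ACGT"  # assumption: DNA input
Profile = List[Dict[str, float]]


def build_profile(kmers: List[str], pseudocount: float = 1.0) -> Profile:
    """Position-specific base frequencies with pseudocounts."""
    profile = []
    for j in range(len(kmers[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for kmer in kmers:
            counts[kmer[j]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def kmer_prob(kmer: str, profile: Profile) -> float:
    p = 1.0
    for j, base in enumerate(kmer):
        p *= profile[j][base]
    return p


def gibbs_motif_search(seqs: List[str], k: int, iterations: int = 500, seed: int = 0) -> List[int]:
    """Return one sampled motif start position per sequence."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    # Random initial site in every sequence.
    positions = [rng.randrange(len(s) - k + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        # Profile built from the current sites of all *other* sequences.
        others = [s[p:p + k] for idx, (s, p) in enumerate(zip(seqs, positions)) if idx != i]
        profile = build_profile(others)
        seq = seqs[i]
        weights = [kmer_prob(seq[j:j + k], profile) for j in range(len(seq) - k + 1)]
        # Sample the new site in proportion to its profile probability.
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Because the result depends on the random initialization, running several seeds and keeping the best-scoring alignment is a common way to reduce that sensitivity.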


2013, Vol 11 (01), pp. 1340008
Author(s): Sebastian Maurer-Stroh, He Gao, Hao Han, Lies Baeten, Joost Schymkowitz, ...

Data mining in protein databases, which are derived from more fundamental protein 3D-structure and sequence databases, has considerable untapped potential for discovering relationships between sequence motifs, structural motifs, and function, as exemplified by the finding of the U-shape (Huf-Zinc) motif, which began as a small student project. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, the so-called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues that are mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite occurring in entirely different overall folds. While this does not allow general prediction of all Zn-binding motifs, an HMM-based web server, Huf-Zinc, is available for predicting these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc web server is freely accessible at http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/.
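Huf-Zinc's predictions are HMM-based. As a generic illustration of how an HMM scores a sequence, the Python sketch below implements the forward algorithm in log space. It is not the Huf-Zinc model, whose states and parameters are not given here: the two-state model, its probabilities, and its reduced alphabet (`C`, `H`, and `x` for any other residue) are entirely hypothetical.

```python
import math
from typing import Sequence

# Hypothetical toy model: a background state and a cysteine/histidine-rich state.
STATES = ["bg", "zn"]
START = {"bg": 0.9, "zn": 0.1}
TRANS = {"bg": {"bg": 0.95, "zn": 0.05}, "zn": {"bg": 0.10, "zn": 0.90}}
EMIT = {
    "bg": {"C": 0.05, "H": 0.05, "x": 0.90},
    "zn": {"C": 0.30, "H": 0.30, "x": 0.40},
}


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_prob(seq: str, states=STATES, start=START, trans=TRANS, emit=EMIT) -> float:
    """log P(seq | model), summed over all state paths (forward algorithm).

    All probabilities must be > 0, because math.log(0) raises ValueError.
    """
    # alpha[s] = log P(x_1..x_t, state at time t = s)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        # The new dict is built entirely from the previous alpha before reassignment.
        alpha = {
            s: log_sum_exp([alpha[r] + math.log(trans[r][s]) for r in states])
            + math.log(emit[s][symbol])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))
```

In a real predictor, a window would typically be scored under both a motif model and a null model and classified by the log-odds ratio of the two forward scores.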


2016
Author(s): Morten Muhlig Nielsen, Paula Tataru, Tobias Madsen, Asger Hobolth, Jakob Skou Pedersen

Motif analysis has long been an important method for characterizing biological function, and the current growth of sequencing-based genomics experiments further extends its potential. These diverse experiments often generate lists of sequences ranked by some functional property, so there is a growing need for motif analysis methods that can exploit this coupled data structure and be tailored to specific biological questions. Here, we present a motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in a ranked list of sequences. Regmex uses regular expressions to define motifs or families of motifs, and embedded Markov models to calculate exact probabilities for motif observations in sequences. Motif enrichment is optionally evaluated using random walks, Brownian bridges, or modified rank-based statistics. These features make Regmex well suited to a range of biological sequence analysis problems related to motif discovery. We demonstrate different usage scenarios, including rank correlation of microRNA binding sites co-occurring with a U-rich motif. The method is available as an R package.
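The random-walk idea can be sketched in Python for a ranked list: walk down the list, stepping up when a sequence contains the motif and down otherwise, and take the largest excursion as the enrichment statistic. This is a simplified, hypothetical illustration, not Regmex itself. It assumes one hit probability p shared by all sequences (the observed hit fraction), whereas Regmex computes exact per-sequence probabilities with embedded Markov models, and it assesses significance by permutation instead of an analytic null.

```python
import random
import re
from typing import List, Tuple


def motif_hits(ranked_seqs: List[str], motif_regex: str) -> List[int]:
    """1 if the sequence contains the motif, else 0, in rank order."""
    return [1 if re.search(motif_regex, s) else 0 for s in ranked_seqs]


def motif_walk(hits: List[int]) -> List[float]:
    """Running sum of (hit - p); it ends at 0, so it behaves like a bridge."""
    p = sum(hits) / len(hits)  # assumption: one shared hit probability
    walk, total = [], 0.0
    for h in hits:
        total += h - p
        walk.append(total)
    return walk


def walk_statistic(walk: List[float]) -> float:
    """Largest absolute excursion of the walk."""
    return max(abs(v) for v in walk)


def permutation_pvalue(hits: List[int], n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed statistic and a permutation p-value from shuffled rankings."""
    rng = random.Random(seed)
    observed = walk_statistic(motif_walk(hits))
    shuffled = list(hits)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if walk_statistic(motif_walk(shuffled)) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

For example, with the ranked list `["UUUUA", "GUUUUC", "GGG", "CAG"]` and the U-rich pattern `U{4}`, both hits sit at the top, and the walk is `[0.5, 1.0, 0.5, 0.0]`.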


2020
Author(s): Koichi Mori, Haruka Ozaki, Tsukasa Fukunaga

Sequence motifs play essential roles in intermolecular interactions such as DNA-protein interactions, so discovering novel sequence motifs is crucial for revealing gene functions. Various bioinformatics tools have been developed for finding sequence motifs, but until now no software has been based on statistical hypothesis testing with statistically sound multiple testing correction, so existing software could not control the type I error rate. This is because, in the sequence motif discovery problem, conventional multiple testing correction methods have very low statistical power owing to overly strict correction. We developed MotiMul, which comprehensively finds significant sequence motifs using statistically sound multiple testing correction. Our key idea is the application of Tarone’s correction, which improves the statistical power of the hypothesis test by ignoring hypotheses that can never become statistically significant. For efficient enumeration of the significant sequence motifs, we integrated a variant of the PrefixSpan algorithm with Tarone’s correction. Simulation and empirical dataset analyses showed that MotiMul is a powerful method for finding biologically meaningful sequence motifs. The source code of MotiMul is freely available at https://github.com/ko-ichimo-ri/MotiMul.
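Tarone's correction works because, for a discrete test such as Fisher's exact test, each hypothesis has a minimum achievable p-value fixed by its margins; a hypothesis whose minimum exceeds the corrected threshold can never be significant and need not be counted. The Python sketch below shows the idea for a one-sided Fisher test of motif presence in positive versus negative sequences. It is a hypothetical illustration assuming this particular test setup (`min_fisher_pvalue` and `tarone_threshold` are illustrative names), not MotiMul's code, and it omits the PrefixSpan enumeration.

```python
from math import comb
from typing import List, Tuple


def min_fisher_pvalue(x: int, n_pos: int, n_total: int) -> float:
    """Smallest one-sided Fisher p-value possible for a motif present in x of
    n_total sequences, n_pos of which are positives (as many hits as possible
    fall in the positives, so the upper tail is a single hypergeometric term)."""
    a = min(x, n_pos)
    return comb(n_pos, a) * comb(n_total - n_pos, x - a) / comb(n_total, x)


def tarone_threshold(min_pvalues: List[float], alpha: float = 0.05) -> Tuple[float, int]:
    """Return (corrected threshold alpha / k, number of testable hypotheses)."""
    m = len(min_pvalues)
    for k in range(1, m + 1):
        testable = sum(1 for p in min_pvalues if p <= alpha / k)
        # Smallest k such that at most k hypotheses could ever reach alpha / k.
        if testable <= k:
            return alpha / k, testable
    return alpha, 0  # only reached when there are no hypotheses
```

With 5 positives among 10 sequences and five motifs present in 5, 1, 2, 1, and 1 sequences, only the first motif can ever reach p ≤ 0.05, so Tarone's threshold stays at 0.05, whereas Bonferroni over all five hypotheses would require 0.01.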

