Application and Algorithm: Maximal Motif Discovery for Biological Data in a Sliding Window

Author(s):  
Miznah H. Alshammary ◽  
Costas S. Iliopoulos ◽  
Manal Mohamed ◽  
Fatima Vayani
Author(s):  
Costas S. Iliopoulos ◽  
Manal Mohamed ◽  
Solon P. Pissis ◽  
Fatima Vayani

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Chunxiao Sun ◽  
Hongwei Huo ◽  
Qiang Yu ◽  
Haitao Guo ◽  
Zhigang Sun

The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics and plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Identifying weak motifs and reducing the effect of local optima remain important but challenging tasks for motif discovery. To address these tasks, we propose a new algorithm, APMotif, which first applies Affinity Propagation (AP) clustering to DNA sequences to produce informative candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidates. Experimental results on both simulated and real biological data sets show that APMotif usually outperforms four other widely used algorithms in prediction accuracy.
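For readers unfamiliar with the PMS problem named in this abstract, the following is a minimal sketch of its standard definition: find every string of length l that occurs, with at most d mismatches, in each input sequence. It is an exhaustive baseline for illustration only and is not the APMotif method (it uses neither AP clustering nor EM refinement). The function names and the toy sequences are assumptions introduced here.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """Enumerate all l-mers that occur (up to d mismatches) in every sequence.

    Exhaustive search over len(alphabet)**l candidates: only practical for
    small l, and shown purely to illustrate the problem statement.
    """
    candidates = ("".join(c) for c in product(alphabet, repeat=l))
    return [m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs)]


# Hypothetical toy input: each sequence holds a variant of "ACGT".
toy = ["ACGTACGT", "TTACGATT", "GGACCTGG"]
found = planted_motifs(toy, l=4, d=1)
```

The search space grows as 4^l for DNA, which is why practical tools rely on heuristics or pruning rather than enumeration.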


2019 ◽  
Vol 15 (1) ◽  
pp. 4-26
Author(s):  
Fatma A. Hashim ◽  
Mai S. Mabrouk ◽  
Walid A.L. Atabany

Background: Bioinformatics is an interdisciplinary field that combines biology and information technology to study how to handle biological data. DNA motif discovery is a central challenge of genome biology, and its importance grows with advances in sequencing technologies, which produce large amounts of data. A DNA motif is a repeated portion of DNA sequence of major biological interest, with important structural and functional features. Motif discovery plays a vital role in antibody-biomarker identification, which is useful for disease diagnosis, and in identifying Transcription Factor Binding Sites (TFBSs), which helps in learning the mechanisms that regulate gene expression. Recently, scientists discovered that the TFs have a mutation rate five times higher than the flanking sequences, so motif discovery also has a crucial role in cancer discovery. Methods: Over the past decades, many attempts have been made, using different algorithms, to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus and probabilistic approaches. Results: Many DNA motif discovery algorithms are time-consuming and easily trapped in a local optimum. Conclusion: Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome the problems of consensus and probabilistic approaches. This paper presents a general classification of motif discovery algorithms with new sub-categories. It also presents a summary comparison between them.


2014 ◽  
Vol 13 (1) ◽  
pp. 4082-4095 ◽  
Author(s):  
Nooruldeen Qader ◽  
Hussein Keitan Al-Khafaji

Bioinformatics analyses huge amounts of biological data that demand in-depth understanding. Data mining research, for its part, develops methods for discovering motifs in biosequences. Motif discovery involves both benefits and challenges. We show how to bridge the two fields, data mining and bioinformatics, for successful mining of biological data. We found that the motivating and justifying factors favor naturalistic-method research for bioinformatics, because the naturalistic method depends on real data. The method empowers bioinformatics techniques to handle the true properties of the data and to reduce assumptions about unmodeled or undiscovered biodata phenomena. This empowerment comes from recognizing and understanding biodata properties and processes.


Author(s):  
Jyoti Lakhani ◽  
Anupama Chowdhary ◽  
Dharmesh Harwani

A variety of technical tools now support and validate wet-lab experiments in science and biotechnology. To analyze biological sequences, it is necessary to group similar genes. Genes can be grouped using various techniques, such as pattern matching, classification, and clustering. In the present study, clustering is used as a tool for analyzing biological data. Clustering of biological sequences is an active area on which many researchers are working. However, simple clustering algorithms are not well suited to sequence analysis problems. Most biological sequence analysis problems are NP-hard, and strong optimization algorithms are required for them. The manuscript presented here is a survey of various clustering techniques useful for the analysis of biological sequences. The 3+ stage review process is adopted for the review of literature. To prepare this report, 98 papers published from 1997 to 2014 were reviewed, ordered by year of publication. The reviewed papers discuss various issues related to the analysis of biological sequences. The major issues identified in them were prediction, sequence alignment, motif discovery, and cluster boundary prediction, among others. The solution approaches used by researchers for biological sequence analysis include evolutionary clustering, neural networks, hierarchical clustering, k-means, Go technologies, feature selection, incremental approaches, bio-inspired methods, particle swarm optimization, fuzzy techniques, rough set theory, and bi-clustering. Researchers have applied these approaches to various types of datasets. In this communication we also discuss these datasets and the parameters used, along with the results reported in the papers.

