A Greedy Clustering Algorithm for Multiple Sequence Alignment

This paper presents a strategy for tackling the Multiple Sequence Alignment (MSA) problem, one of the most important tasks in biological sequence analysis. MSA aligns a set of protein or nucleotide sequences in their entirety in order to derive relationships and common characteristics among them. The MSA problem has been proven to be NP-hard. The proposed strategy incorporates a new idea based on the well-known divide-and-conquer paradigm: sequences are clustered as a preliminary step to improve the final alignment, and this decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments in the search space. The authors also propose aligning the clusters in a parallel and distributed way in order to benefit from parallel architectures. The strategy was tested on classical benchmarks such as BAliBASE, Sabre, Prefab4 and Oxm, and the experimental results show that it performs well compared with other aligners.
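The abstract does not specify the clustering criterion. The following is a minimal greedy sketch, not the authors' algorithm: it assumes k-mer Jaccard similarity and a hypothetical similarity `threshold`. Each sequence joins the cluster whose seed (first member) it most resembles, or seeds a new cluster; the resulting clusters could then be aligned independently, and in parallel, by any MSA aligner.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.3, k: int = 3) -> List[List[str]]:
    """Greedily assign each sequence, in input order, to the most similar cluster.

    A cluster is represented by its seed (first member). A sequence joins the
    cluster whose seed is most similar if that similarity reaches `threshold`
    (a hypothetical parameter); otherwise it seeds a new cluster.
    """
    profiles = {name: kmer_set(s, k) for name, s in seqs.items()}
    clusters: List[List[str]] = []
    for name in seqs:
        best_idx, best_sim = -1, -1.0
        for idx, members in enumerate(clusters):
            sim = jaccard(profiles[name], profiles[members[0]])
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            clusters[best_idx].append(name)
        else:
            clusters.append([name])
    return clusters
```

For example, `greedy_cluster({"a": "ACGTACGTAC", "b": "ACGTACGTTC", "c": "TTTTGGGGCC", "d": "TTTTGGGGCA"})` returns `[["a", "b"], ["c", "d"]]`. A single pass like this is cheap, but its result depends on input order, which is the usual trade-off of greedy clustering.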

Search Space Reduction Technique for Distributed Multiple Sequence Alignment

2009 Sixth IFIP International Conference on Network and Parallel Computing

10.1109/npc.2009.43

2009

Cited By ~ 1

Author(s): Manal Helal, Lenore Mullin, John Potter, Vitali Sintchenko

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Search Space, Reduction Technique, Multiple Sequence, Space Reduction, Search Space Reduction

Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations

Journal of Bioinformatics and Computational Biology

10.1142/s021972001550016x

2015

Vol 13 (04), pp. 1550016

Cited By ~ 3

Author(s): El-Amine Zemali, Abdelmadjid Boukra

Keyword(s): Sequence Alignment, DNA Sequences, Multiple Sequence Alignment, Search Space, New Method, Average Score, Solution Quality, Multiple Sequence, Multiple Populations, Alignment Problem

Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarities among a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on biogeography-based optimization (BBO), a recent metaheuristic inspired by the mathematics of biogeography. To improve the exploration ability of BBO, we introduce a new concept: manipulating multiple populations, each with its own parameters. These parameters are used to build progressive alignments, which increases diversity. At each iteration, the best solution found so far is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To evaluate the proposed approach, we used a set of datasets from BAliBASE 2.0 and compared it with several recent algorithms, such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than these methods.
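The abstract states that the six operators are chosen with a dynamic probability that tracks their efficiency, but it does not give the update rule. Below is a minimal sketch of one common way to do this, assuming a credit-based roulette wheel: each operator keeps an exponentially blended credit (hypothetical `decay`), and a hypothetical floor probability `p_min` keeps every operator selectable. These are illustrative choices, not BBOMP's published parameters, and the operators themselves are left abstract as integer indices.

```python
import random
from typing import List


class AdaptiveOperatorSelector:
    """Roulette-wheel operator selection with success-based probabilities.

    Probabilities are p_min plus a share of the remaining mass proportional to
    each operator's credit, so they always sum to 1 and never drop below p_min.
    """

    def __init__(self, n_ops: int, p_min: float = 0.05, decay: float = 0.8, seed: int = 0):
        if n_ops * p_min >= 1.0:
            raise ValueError("p_min too large for the number of operators")
        self.credits = [1.0] * n_ops  # equal credit, hence uniform selection, at start
        self.p_min = p_min
        self.decay = decay
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        n = len(self.credits)
        total = sum(self.credits)
        if total <= 0.0:
            return [1.0 / n] * n  # all credit has decayed away: fall back to uniform
        return [self.p_min + (1.0 - n * self.p_min) * c / total for c in self.credits]

    def select(self) -> int:
        """Draw an operator index according to the current probabilities."""
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def reward(self, op: int, improvement: float) -> None:
        """Blend the operator's previous credit with its latest non-negative gain."""
        gain = max(improvement, 0.0)
        self.credits[op] = self.decay * self.credits[op] + (1.0 - self.decay) * gain
```

In a BBOMP-like loop, one would call `select()`, apply the chosen operator to a solution, and pass the resulting change in alignment score to `reward()`. The floor `p_min` is a design choice that prevents an operator that performed poorly early on from never being tried again.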

The Performance Assessment Strategy in DC-BTA Multiple Sequence Alignment

Key Engineering Materials

10.4028/www.scientific.net/kem.439-440.35

2010

Vol 439-440, pp. 35-40

Author(s): Zhan Mao Cao, Wen Jun Xiao, Li Min Peng

Keyword(s): Performance Assessment, Sequence Alignment, Multiple Sequence Alignment, Assessment Model, Divide And Conquer, Alignment Method, Multiple Sequence, A Value, New Strategy, Effective Assessment

A new performance assessment model is proposed for multiple sequence alignment. The strategy is based on the beam construction of the DC-BTA algorithm, a divide-and-conquer alignment method with beams. Beams form blocks of nearly identical columns and contribute the largest similarity weight to the sequences. A formula that computes the areas of all beams covering a sequence assigns a value, or weight, to that sequence, and the total beam area is a portion of the whole alignment. A rate between 0 and 1 is then computed to assess performance. Because beam areas are conveniently collected in DC-BTA, this scheme provides a simple and effective assessment policy.
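The abstract describes the assessment only in words. Here is a minimal sketch under stated assumptions: each beam is modeled as a rectangle spanning a set of sequences and a contiguous range of alignment columns (the hypothetical `Beam` type), a sequence's weight is the total area of the beams covering it, and the rate is the fraction of alignment cells covered by at least one beam, which keeps it between 0 and 1 even if beams overlap. How DC-BTA actually constructs beams is not shown.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Beam:
    """Hypothetical beam: sequence indices `rows` over columns start..end-1."""
    rows: Tuple[int, ...]
    start: int
    end: int

    @property
    def area(self) -> int:
        return len(self.rows) * (self.end - self.start)


def sequence_weights(beams: List[Beam], n_seqs: int) -> List[int]:
    """Weight of each sequence = summed area of the beams that cover it."""
    weights = [0] * n_seqs
    for b in beams:
        for r in b.rows:
            weights[r] += b.area
    return weights


def beam_rate(beams: List[Beam], n_seqs: int, aln_len: int) -> float:
    """Fraction (0..1) of the n_seqs x aln_len alignment covered by beams."""
    covered: Set[Tuple[int, int]] = set()
    for b in beams:
        for r in b.rows:
            for c in range(b.start, b.end):
                covered.add((r, c))
    return len(covered) / (n_seqs * aln_len)
```

With beams `Beam((0, 1, 2), 0, 4)` and `Beam((0, 1), 6, 8)` on a 3 x 10 alignment, the weights are `[16, 16, 12]` and the rate is 16/30.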

Improving the divide-and-conquer approach to sum-of-pairs multiple sequence alignment

Applied Mathematics Letters

10.1016/s0893-9659(97)00013-x

1997

Vol 10 (2), pp. 67-73

Cited By ~ 13

Author(s): J. Stoye, S.W. Perrey, A.W.M. Dress

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Divide And Conquer, Multiple Sequence

A Divide-and-Conquer Method for Multiple Sequence Alignment on Multi-core Computers

Communications in Computer and Information Science - Parallel Computational Fluid Dynamics

10.1007/978-3-642-53962-6_41

2014

pp. 460-469

Author(s): Xiangyuan Zhu

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Divide And Conquer, Multiple Sequence

An efficient algorithm for multiple sequence alignment based on ant colony optimisation and divide‐and‐conquer method

New Zealand Journal of Agricultural Research

10.1080/00288230709510330

2007

Vol 50 (5), pp. 617-626

Cited By ~ 4

Author(s): Wei Liu, Ling Chen, Juan Chen

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Efficient Algorithm, Ant Colony, Divide And Conquer, Ant Colony Optimisation, Multiple Sequence

DCA: An efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment

Bioinformatics

10.1093/bioinformatics/13.6.625

1997

Vol 13 (6), pp. 625-626

Cited By ~ 19

Author(s): Jens Stoye, Vincent Moulton, Andreas W.M. Dress

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Efficient Implementation, Divide And Conquer, Multiple Sequence

An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

Journal of Artificial Intelligence Research

10.1613/jair.1534

2005

Vol 23, pp. 587-623

Cited By ~ 11

Author(s): S. Schroedl

Keyword(s): Dynamic Programming, Sequence Alignment, Multiple Sequence Alignment, Search Algorithm, Optimal Solution, Search Space, Priority Queue, Sequence Alignments, Multiple Sequence, Benchmark Database

Multiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although finding an optimal solution for an arbitrary number of sequences is NP-hard, the importance of the problem drives researchers to push the limits of exact algorithms further. Since MSA can be cast as a classical path-finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with real practical relevance. In this paper, we first review two previous, complementary lines of research. Based on Hirschberg's algorithm, Dynamic Programming needs O(kN^(k-1)) space to store both the search frontier and the nodes needed to reconstruct the solution path, for k sequences of length N. Best-first search, on the other hand, has the advantage of using a heuristic to bound the search space that has to be explored. However, it must maintain all explored nodes up to the final solution to prevent the search from re-expanding them at higher cost. Earlier approaches to reducing the Closed list are either incompatible with pruning methods for the Open list or must retain at least the boundary of the Closed list. In this article, we present an algorithm that attempts to combine the respective advantages: like A*, it uses a heuristic to prune the search space, but it reduces both the maximum Open and Closed sizes to O(kN^(k-1)), as in Dynamic Programming. The underlying idea is to conduct a series of searches with successively increasing upper bounds, using the DP ordering as the key for the Open priority queue. With a suitable choice of thresholds, a running time below four times that of A* can be expected in practice. Our experiments show that the algorithm outperforms one of the currently most successful algorithms for optimal multiple sequence alignment, Partial Expansion A*, in both time and memory.
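The core idea, successive searches with increasing upper bounds while ordering the Open queue by DP order, can be sketched as follows. This is a simplified illustration, not the paper's algorithm. It assumes unit sum-of-pairs costs (match 0, mismatch 1, residue against gap 1), uses the sum of optimal pairwise suffix costs as the heuristic, and raises the bound to the smallest pruned f-value (an IDA*-style rule assumed here in place of the paper's threshold schedule). It also keeps every accepted node in memory instead of reducing Closed, so it shows the pruning and ordering only, not the space savings, and it returns the optimal cost without reconstructing the alignment.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Node = Tuple[int, ...]


def suffix_edit_table(a: str, b: str) -> List[List[int]]:
    """D[i][j] = optimal unit-cost pairwise alignment cost of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n or j == m:
                D[i][j] = (n - i) + (m - j)  # only gaps remain
            else:
                D[i][j] = min(D[i + 1][j + 1] + int(a[i] != b[j]),
                              D[i + 1][j] + 1, D[i][j + 1] + 1)
    return D


def column_cost(seqs: Sequence[str], pos: Node, step: Node) -> int:
    """Sum-of-pairs cost of one column: match 0, mismatch 1, residue vs gap 1."""
    cost = 0
    for i, j in itertools.combinations(range(len(seqs)), 2):
        if step[i] and step[j]:
            cost += int(seqs[i][pos[i]] != seqs[j][pos[j]])
        elif step[i] or step[j]:
            cost += 1
    return cost


def bounded_pass(seqs: Sequence[str], h: Callable[[Node], int],
                 upper: int) -> Tuple[Optional[int], float]:
    """Expand lattice nodes in DP order, pruning any node with g + h > upper.

    Returns (optimal cost, or None if the goal was pruned; smallest pruned f).
    """
    goal = tuple(len(s) for s in seqs)
    steps = [s for s in itertools.product((0, 1), repeat=len(seqs)) if any(s)]
    start = (0,) * len(seqs)
    g: Dict[Node, int] = {start: 0}
    # Key = coordinate sum: every predecessor of a node has a smaller sum,
    # so g[node] is final when the node is popped (the DP ordering).
    heap = [(0, start)]
    next_bound = float("inf")
    while heap:
        _, v = heapq.heappop(heap)
        if v == goal:
            return g[v], next_bound
        for s in steps:
            w = tuple(p + d for p, d in zip(v, s))
            if any(x > limit for x, limit in zip(w, goal)):
                continue
            gw = g[v] + column_cost(seqs, v, s)
            f = gw + h(w)
            if f > upper:
                next_bound = min(next_bound, f)  # candidate for the next bound
                continue
            if w not in g:
                g[w] = gw
                heapq.heappush(heap, (sum(w), w))
            elif gw < g[w]:
                g[w] = gw
    return None, next_bound


def optimal_sp_cost(seqs: Sequence[str]) -> int:
    """Run bounded passes with increasing upper bounds until the goal is reached."""
    pairs = list(itertools.combinations(range(len(seqs)), 2))
    tables = {p: suffix_edit_table(seqs[p[0]], seqs[p[1]]) for p in pairs}

    def h(v: Node) -> int:
        # Sum of optimal pairwise suffix costs: an admissible lower bound.
        return sum(tables[i, j][v[i]][v[j]] for i, j in pairs)

    upper = h((0,) * len(seqs))
    while True:
        cost, next_bound = bounded_pass(seqs, h, upper)
        if cost is not None:
            return cost
        if next_bound == float("inf"):
            raise ValueError("no alignment path found")
        upper = int(next_bound)
```

Because every move increases the coordinate sum, keying the queue on it guarantees that a node's cost is final when it is popped; this is what allows the DP ordering to replace f-ordering, while the upper bound supplies the heuristic pruning. For example, `optimal_sp_cost(["A", "A", ""])` returns 2.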
Moreover, we apply a refined heuristic based on optimal alignments not only of pairs of sequences but also of larger subsets. This idea is not new; however, to make it practically relevant, we show that it is equally important to bound the heuristic computation appropriately, or the overhead can obliterate any possible gain. Furthermore, we discuss a number of improvements in time and space efficiency with regard to practical implementations. Used in conjunction with higher-dimensional heuristics, our algorithm is able, for the first time, to calculate the optimal alignment for almost all of the problems in Reference 1 of the BAliBASE benchmark database.
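A heuristic built from a larger subset can be sketched by computing exact three-way suffix costs for one chosen triple of sequences and letting that table replace the triple's three pairwise terms; the remaining pairs keep their pairwise suffix costs. This assumes unit sum-of-pairs costs (match 0, mismatch 1, residue against gap 1), and the choice of `triple` is a hypothetical parameter: the abstract does not say how subsets are selected or how their computation is bounded. The result is still a lower bound, because projecting any multiple alignment onto the triple yields a three-way alignment costing at least the table value.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def pair_suffix_costs(a: str, b: str) -> List[List[int]]:
    """P[i][j] = optimal unit-cost pairwise alignment cost of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    P = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n or j == m:
                P[i][j] = (n - i) + (m - j)
            else:
                P[i][j] = min(P[i + 1][j + 1] + int(a[i] != b[j]),
                              P[i + 1][j] + 1, P[i][j + 1] + 1)
    return P


def triple_suffix_costs(x: str, y: str, z: str) -> List[List[List[int]]]:
    """T[a][b][c] = optimal sum-of-pairs cost of aligning x[a:], y[b:], z[c:]."""
    seqs = (x, y, z)
    dims = (len(x), len(y), len(z))
    T = [[[0] * (dims[2] + 1) for _ in range(dims[1] + 1)] for _ in range(dims[0] + 1)]
    steps = [s for s in itertools.product((0, 1), repeat=3) if any(s)]
    for a in range(dims[0], -1, -1):
        for b in range(dims[1], -1, -1):
            for c in range(dims[2], -1, -1):
                pos = (a, b, c)
                if pos == dims:
                    continue  # cost is 0 once all three suffixes are empty
                best = float("inf")
                for s in steps:
                    nxt = tuple(p + d for p, d in zip(pos, s))
                    if any(q > limit for q, limit in zip(nxt, dims)):
                        continue
                    col = 0
                    for i, j in ((0, 1), (0, 2), (1, 2)):
                        if s[i] and s[j]:
                            col += int(seqs[i][pos[i]] != seqs[j][pos[j]])
                        elif s[i] or s[j]:
                            col += 1
                    best = min(best, col + T[nxt[0]][nxt[1]][nxt[2]])
                T[a][b][c] = int(best)
    return T


def make_heuristic(seqs: Sequence[str],
                   triple: Tuple[int, int, int]) -> Callable[[Tuple[int, ...]], int]:
    """Triple table for `triple` plus pairwise tables for all other pairs."""
    i0, j0, l0 = triple
    T = triple_suffix_costs(seqs[i0], seqs[j0], seqs[l0])
    covered = set(itertools.combinations(sorted(triple), 2))
    pair_tables = {p: pair_suffix_costs(seqs[p[0]], seqs[p[1]])
                   for p in itertools.combinations(range(len(seqs)), 2)
                   if p not in covered}

    def h(pos: Tuple[int, ...]) -> int:
        return T[pos[i0]][pos[j0]][pos[l0]] + sum(
            P[pos[i]][pos[j]] for (i, j), P in pair_tables.items())

    return h
```

The triple table has (N+1)^3 entries and each entry examines seven moves, so even a single subset is far more expensive than the pairwise tables; this is a small-scale view of why the heuristic computation has to be bounded.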
