MANGO: MULTIPLE ALIGNMENT WITH N GAPPED OLIGOS

Multiple sequence alignment is a classical and challenging task. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art works suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method, using both multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds have proved significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks, showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at and is free for academic usage.
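The contrast between spaced seeds and consecutive k-mers mentioned above can be illustrated with a minimal sketch. This is not MANGO's implementation: the function name `seed_hits` and the seed pattern `1101` are hypothetical choices for illustration, and the abstract does not list the optimized seeds MANGO actually uses. A `1` in the pattern marks a position that must match; a `0` marks a "don't care" position.

```python
def seed_hits(a, b, seed):
    """Return (i, j) pairs where a[i:] and b[j:] agree at every '1' position of seed.

    seed is a string over {'1', '0'}; '0' positions are ignored (hypothetical format).
    """
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)

    # Index every window of `a` by the characters at the "care" positions.
    index = {}
    for i in range(len(a) - span + 1):
        key = tuple(a[i + k] for k in care)
        index.setdefault(key, []).append(i)

    # Look up every window of `b` in that index.
    hits = []
    for j in range(len(b) - span + 1):
        key = tuple(b[j + k] for k in care)
        for i in index.get(key, []):
            hits.append((i, j))
    return hits


# Both seeds require three matching positions, but only the spaced one
# tolerates the single mismatch in the third column.
print(seed_hits("ACGT", "ACTT", "111"))   # consecutive 3-mer
print(seed_hits("ACGT", "ACTT", "1101"))  # spaced seed of the same weight
```

In this toy case the consecutive 3-mer finds no hit, while the spaced seed of equal weight finds the pair at offset (0, 0), which is the kind of sensitivity gain the abstract refers to.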

MULTIPLE SEQUENCE ALIGNMENT USING AN EXHAUSTIVE AND GREEDY ALGORITHM

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972000500103x ◽

2005 ◽

Vol 03 (02) ◽

pp. 243-255 ◽

Cited By ~ 1

Author(s):

YI WANG ◽

KUO-BIN LI

Keyword(s):

Greedy Algorithm ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Alignment ◽

Initial Alignment ◽

Progressive Alignment ◽

Multiple Sequence ◽

Java Programming ◽

Multiple Alignments ◽

Objective Score

We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the best improvement in the objective score is accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimal alignment, the tests indicate that EGMA consistently builds high-quality alignments compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
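The exhaustive-and-greedy loop described above can be sketched as follows. This is a simplified illustration, not the EGMA code: it implements only the gap "shuffle" operation (moving a gap by one column), omits gap insertions and deletions, and uses an assumed sum-of-pairs objective (+1 match, -1 mismatch or residue-gap pair, 0 for gap-gap). The names `sp_score`, `gap_shuffles` and `refine` are hypothetical.

```python
from itertools import combinations


def sp_score(aln):
    """Assumed objective: sum-of-pairs score over all columns."""
    score = 0
    for col in zip(*aln):
        for x, y in combinations(col, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs contribute nothing
            score += 1 if x == y else -1
    return score


def gap_shuffles(aln):
    """Yield every alignment reachable by swapping one gap with an adjacent residue."""
    for r, row in enumerate(aln):
        for c in range(len(row) - 1):
            if (row[c] == "-") != (row[c + 1] == "-"):
                moved = row[:c] + row[c + 1] + row[c] + row[c + 2:]
                yield aln[:r] + [moved] + aln[r + 1:]


def refine(aln):
    """Exhaustive: score every eligible move. Greedy: accept only the best improving one."""
    current, best = list(aln), sp_score(aln)
    while True:
        candidates = [(sp_score(c), c) for c in gap_shuffles(current)]
        if not candidates:
            return current, best
        top_score, top = max(candidates, key=lambda t: t[0])
        if top_score <= best:
            return current, best  # no move improves the objective
        current, best = top, top_score


start = ["ACGT", "AC-T", "A-CT"]
print(sp_score(start))
print(refine(start))
```

Because each accepted move strictly increases the score, the loop terminates, but only at a local optimum, which matches the abstract's caveat that a globally optimal alignment is not guaranteed.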

Parallelization of Pairwise Alignment and Neighbor-Joining Algorithm in Progressive Multiple Sequence Alignment

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v9.i1.pp234-242 ◽

2018 ◽

Vol 9 (1) ◽

pp. 234

Author(s):

Agung Widyo Utomo

Keyword(s):

Shared Memory ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Programming Model ◽

Heuristic Method ◽

Pairwise Alignment ◽

Neighbor Joining ◽

Progressive Alignment ◽

Multiple Sequence ◽

Progressive Multiple Sequence Alignment

ClustalW, a progressive multiple sequence alignment program, is a widely used heuristic method for computing multiple sequence alignments (MSA). It has three stages: distance matrix computation using pairwise alignment, guide tree reconstruction using neighbor-joining, and progressive alignment. To accelerate computation on large data, the progressive MSA algorithm needs to be parallelized. This research aims to identify, decompose and implement the pairwise alignment and neighbor-joining stages of progressive MSA using message-passing, shared-memory and hybrid programming models on a computer cluster. The experimental results show that the shared-memory programming model is the best implementation scenario, with a speedup of up to 12 times.
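The first stage lends itself to parallelization because each pairwise comparison is independent of the others. The sketch below shows that decomposition under several assumptions: it uses plain edit distance as a stand-in for ClustalW's pairwise alignment score, and a thread pool from Python's standard library as a stand-in for the shared-memory model. The names `edit_distance` and `distance_matrix` are hypothetical. In CPython, threads will not speed up this CPU-bound work; a process pool or a message-passing library would be needed for real gains.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]


def distance_matrix(seqs, workers=4):
    """Compute all pairwise distances; each pair is an independent task."""
    n = len(seqs)
    pairs = list(combinations(range(n), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dists = list(pool.map(
            lambda p: edit_distance(seqs[p[0]], seqs[p[1]]), pairs))
    matrix = [[0] * n for _ in range(n)]
    for (i, j), d in zip(pairs, dists):
        matrix[i][j] = matrix[j][i] = d  # the matrix is symmetric
    return matrix


for row in distance_matrix(["ACGT", "ACGA", "TTTT"]):
    print(row)
```

The resulting symmetric matrix is the input a neighbor-joining step would consume to build the guide tree.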

A Survey of the State-of-the-Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems

International Journal of Computer Applications ◽

10.5120/ijca2018917658 ◽

2018 ◽

Vol 182 (12) ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Sara Shehab ◽

Sameh Abdulah ◽

Arabi E.

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

The State ◽

Multicore Systems ◽

Multiple Sequence ◽

Alignment Algorithms

Progressive Alignment Method Using Genetic Algorithm for Multiple Sequence Alignment

IEEE Transactions on Evolutionary Computation ◽

10.1109/tevc.2011.2162849 ◽

2012 ◽

Vol 16 (5) ◽

pp. 615-631 ◽

Cited By ~ 36

Author(s):

Farhana Naznin ◽

Ruhul Sarker ◽

Daryl Essam

Keyword(s):

Genetic Algorithm ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Alignment Method ◽

Progressive Alignment ◽

Multiple Sequence

ProgSIO-MSA: Progressive-based single iterative optimization framework for multiple sequence alignment using an effective scoring system

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720020500055 ◽

2020 ◽

Vol 18 (02) ◽

pp. 2050005

Author(s):

Sanjay Bankapur ◽

Nagamma Patil

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Scoring System ◽

State Of The Art ◽

Biological Sequences ◽

Alignment Quality ◽

Multiple Sequence ◽

Iterative Optimization ◽

Optimization Framework ◽

Proposed Model

Aligning more than two biological sequences is termed multiple sequence alignment (MSA). MSA is one of the primary activities in analyzing biological sequences, with potential applications in phylogenetics, homology markers, protein structure prediction, gene regulation, and drug discovery. The MSA problem is considered NP-complete. Moreover, with the advancement of Next-Generation Sequencing techniques, gene and protein databases are continually loaded with vast amounts of raw sequence data that are neither analyzed nor annotated. To analyze these growing volumes of raw sequences, there is a strong need for computationally efficient (polynomial-time) models that produce accurate alignments. In this study, a progressive-based alignment model named ProgSIO-MSA is proposed, which consists of an effective scoring system and an optimization framework. The proposed scoring system aligns sequences effectively by combining two scoring strategies: Look Back Ahead, which scores a residue pair dynamically based on the status information of the previous position to improve the sum-of-pairs score, and Position-Residue-Specific Dynamic Gap Penalty, which dynamically penalizes a gap using a mutation matrix on the basis of the residue and its position information. The proposed single iterative optimization (SIO) framework identifies and optimizes local optima traps to improve the alignment quality. The proposed model is evaluated against progressive-based state-of-the-art models on two benchmark datasets, BAliBASE and SABmark. The alignment quality (biological accuracy) of the proposed model improves by 17.7% on the BAliBASE dataset. The proposed model’s efficiency is compared with state-of-the-art models using time complexity as well as runtime analysis. Wilcoxon signed-rank test results indicate that the proposed model significantly outperforms progressive-based state-of-the-art models in alignment quality.

MANGO: A NEW APPROACH TO MULTIPLE SEQUENCE ALIGNMENT

Computational Systems Bioinformatics ◽

10.1142/9781860948732_0026 ◽

2007 ◽

Cited By ~ 1

Author(s):

Zefeng Zhang ◽

Hao Lin ◽

Ming Li

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

New Approach

A New Quantum Cuckoo Search Algorithm for Multiple Sequence Alignment

Journal of Intelligent Systems ◽

10.1515/jisys-2013-0052 ◽

2014 ◽

Vol 23 (3) ◽

pp. 261-275 ◽

Cited By ~ 4

Author(s):

Widad Kartous ◽

Abdesslem Layeb ◽

Salim Chikhi

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Search Algorithm ◽

Cuckoo Search ◽

Cuckoo Search Algorithm ◽

Initial Population ◽

Biological Sequences ◽

Alignment Method ◽

Progressive Alignment ◽

Multiple Sequence

Multiple sequence alignment (MSA) is one of the major problems encountered in the bioinformatics field. MSA consists of aligning a set of biological sequences to extract the similarities between them. Unfortunately, this problem has been shown to be NP-hard. In this article, a new algorithm is proposed to deal with this problem; it is based on a quantum-inspired cuckoo search algorithm. The other feature of the proposed approach is the use of a randomized progressive alignment method based on a hybrid global/local pairwise algorithm to construct the initial population. The results obtained by this hybridization are very encouraging and show the feasibility and effectiveness of the proposed solution.

UPS: A new approach for multiple sequence alignment using morphing techniques

2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2017.8217686 ◽

2017 ◽

Author(s):

Quoc-Nam Tran ◽

Mike Wallinga

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

New Approach

Recursive MAGUS: Scalable and accurate multiple sequence alignment

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008950 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1008950

Author(s):

Vladimir Smirnov

Keyword(s):

Open Source ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

Sequence Data ◽

Large Datasets ◽

Alignment Accuracy ◽

Multiple Sequence ◽

Large Numbers ◽

Source Form

Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy. We recently introduced MAGUS, a new state-of-the-art method for aligning large numbers of sequences. In this paper, we present a comprehensive set of enhancements that allow MAGUS to align vastly larger datasets with greater speed. We compare MAGUS to other leading alignment methods on datasets of up to one million sequences. Our results demonstrate the advantages of MAGUS over other alignment software in both accuracy and speed. MAGUS is freely available in open-source form at https://github.com/vlasmirnov/MAGUS.

Iterative progressive alignment method (IPAM) for multiple sequence alignment

2009 International Conference on Computers & Industrial Engineering ◽

10.1109/iccie.2009.5223562 ◽

2009 ◽

Cited By ~ 1

Author(s):

Farhana Naznin ◽

Ruhul Sarker ◽

Daryl Essam

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Alignment Method ◽

Progressive Alignment ◽

Multiple Sequence
