ProgSIO-MSA: Progressive-based single iterative optimization framework for multiple sequence alignment using an effective scoring system

Aligning more than two biological sequences is termed multiple sequence alignment (MSA). MSA is one of the primary activities in biological sequence analysis, with potential applications in phylogenetics, homology markers, protein structure prediction, gene regulation, and drug discovery. The MSA problem is considered NP-complete. Moreover, with the advancement of Next-Generation Sequencing techniques, gene and protein databases are continually loaded with vast amounts of raw sequence data that are neither analyzed nor annotated. To analyze these growing volumes of raw sequences, computationally efficient (polynomial-time) models that produce accurate alignments are in high demand. In this study, a progressive-based alignment model named ProgSIO-MSA is proposed, which consists of an effective scoring system and an optimization framework. The proposed scoring system aligns sequences effectively by combining two scoring strategies: Look Back Ahead, which scores a residue pair dynamically based on the status information of the previous position to improve the sum-of-pairs score, and Position-Residue-Specific Dynamic Gap Penalty, which dynamically penalizes a gap using a mutation matrix on the basis of the residue and its position information. The proposed single iterative optimization (SIO) framework identifies and optimizes the local optima trap to improve alignment quality. The proposed model is evaluated against progressive-based state-of-the-art models on two benchmark datasets, BAliBASE and SABmark. The alignment quality (biological accuracy) of the proposed model increases by 17.7% on the BAliBASE dataset. The proposed model's efficiency is compared with state-of-the-art models using both time complexity and runtime analysis. Wilcoxon signed-rank test results indicate that the proposed model significantly outperforms progressive-based state-of-the-art models in alignment quality.
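The sum-of-pairs (SP) score mentioned above can be illustrated with a minimal sketch. It assumes a fixed linear gap penalty and toy match/mismatch values (the function name and default scores are hypothetical). It is not the Look Back Ahead or Position-Residue-Specific Dynamic Gap Penalty scheme, whose details are not given in this abstract; it only shows the baseline quantity that those strategies aim to improve.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an MSA given as equal-length gapped rows.

    Assumed toy scheme: every column is scored pair by pair; gap-gap pairs
    score 0, residue-gap pairs get a fixed linear gap penalty, and
    residue-residue pairs get a match or mismatch score.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs are conventionally ignored
            elif a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["AC-GT", "ACTGT", "A--GT"]))
```

With these toy values, the three-row example scores 2: the fully conserved columns contribute +3 each, while the two gapped columns contribute -3 and -4.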

Multiple sequence alignment quality comparison in T-Coffee, MUSCLE and M-Coffee based on different benchmarks

Cumhuriyet Science Journal ◽

10.17776/csj.842265 ◽

2021 ◽

Vol 42 (3) ◽

pp. 526-535

Author(s):

Tuğcan KORAK ◽

Fırat AŞIR ◽

Esin IŞIK ◽

Nur CENGİZ

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Alignment Quality ◽

Multiple Sequence

A Survey of the State-of-the-Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems

International Journal of Computer Applications ◽

10.5120/ijca2018917658 ◽

2018 ◽

Vol 182 (12) ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Sara Shehab ◽

Sameh Abdulah ◽

Arabi E.

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

The State ◽

Multicore Systems ◽

Multiple Sequence ◽

Alignment Algorithms

A New Quantum Cuckoo Search Algorithm for Multiple Sequence Alignment

Journal of Intelligent Systems ◽

10.1515/jisys-2013-0052 ◽

2014 ◽

Vol 23 (3) ◽

pp. 261-275 ◽

Cited By ~ 4

Author(s):

Widad Kartous ◽

Abdesslem Layeb ◽

Salim Chikhi

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Search Algorithm ◽

Cuckoo Search ◽

Cuckoo Search Algorithm ◽

Initial Population ◽

Biological Sequences ◽

Alignment Method ◽

Progressive Alignment ◽

Multiple Sequence

Multiple sequence alignment (MSA) is one of the major problems encountered in the bioinformatics field. MSA consists of aligning a set of biological sequences to extract the similarities between them. Unfortunately, this problem has been shown to be NP-hard. In this article, a new algorithm based on a quantum-inspired cuckoo search algorithm is proposed to deal with this problem. Another feature of the proposed approach is the use of a randomized progressive alignment method, based on a hybrid global/local pairwise algorithm, to construct the initial population. The results obtained by this hybridization are very encouraging and show the feasibility and effectiveness of the proposed solution.

A Multi-objective Optimization Framework for Multiple Sequence Alignment with Metaheuristics

Bioinformatics and Biomedical Engineering - Lecture Notes in Computer Science ◽

10.1007/978-3-319-56154-7_23 ◽

2017 ◽

pp. 245-256 ◽

Cited By ~ 4

Author(s):

Cristian Zambrano-Vega ◽

Antonio J. Nebro ◽

José García-Nieto ◽

José F. Aldana-Montes

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multi Objective Optimization ◽

Multiple Sequence ◽

Multi Objective ◽

Optimization Framework

Multiple sequence alignment using enhanced bird swarm align algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210055 ◽

2021 ◽

pp. 1-18

Author(s):

Hafiz Asadul Rehman ◽

Kashif Zafar ◽

Ayesha Khan ◽

Abdullah Imtiaz

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Research Area ◽

Evolutionary Information ◽

Optimal Alignment ◽

Biological Sequences ◽

Multiple Sequence ◽

Alignment Problem ◽

Computationally Expensive ◽

Bird Swarm Algorithm

Discovering structural, functional and evolutionary information in biological sequences has been considered a core research area in bioinformatics. Multiple Sequence Alignment (MSA) tries to align all sequences in a given query set, which eases the annotation of new sequences. Traditional methods for finding the optimal alignment are computationally expensive to run in real time. This research presents an enhanced version of the Bird Swarm Algorithm (BSA), a bio-inspired optimization method. The Enhanced Bird Swarm Align Algorithm (EBSAA) is proposed for the multiple sequence alignment problem to determine the optimal alignment among different sequences. Twenty-one different datasets have been used to compare the performance of EBSAA with the Genetic Algorithm (GA) and the Particle Swarm Align Algorithm (PSAA). The proposed technique produces better alignments than GA and PSAA in most cases.

Recursive MAGUS: Scalable and accurate multiple sequence alignment

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008950 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1008950

Author(s):

Vladimir Smirnov

Keyword(s):

Open Source ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

Sequence Data ◽

Large Datasets ◽

Alignment Accuracy ◽

Multiple Sequence ◽

Large Numbers ◽

Source Form

Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy. We recently introduced MAGUS, a new state-of-the-art method for aligning large numbers of sequences. In this paper, we present a comprehensive set of enhancements that allow MAGUS to align vastly larger datasets with greater speed. We compare MAGUS to other leading alignment methods on datasets of up to one million sequences. Our results demonstrate the advantages of MAGUS over other alignment software in both accuracy and speed. MAGUS is freely available in open-source form at https://github.com/vlasmirnov/MAGUS.

Recursive MAGUS: scalable and accurate multiple sequence alignment

10.1101/2021.04.09.439137 ◽

2021 ◽

Author(s):

Vladimir Smirnov

Keyword(s):

Open Source ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

Sequence Data ◽

Large Datasets ◽

Alignment Accuracy ◽

Multiple Sequence ◽

Large Numbers ◽

Source Form

CONSENT: Scalable long read self-correction and assembly polishing with multiple sequence alignment

10.1101/546630 ◽

2019 ◽

Cited By ~ 6

Author(s):

Pierre Morisse ◽

Camille Marchet ◽

Antoine Limasset ◽

Thierry Lecroq ◽

Arnaud Lefebvre

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

Error Rates ◽

Multiple Sequence ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Generation Sequencing

Motivation: Third-generation sequencing technologies from Pacific Biosciences and Oxford Nanopore allow the sequencing of long reads of tens of kbp, which are expected to solve various problems, such as contig and haplotype assembly, scaffolding, and structural variant calling. However, they also display high error rates that can reach 10 to 30% for basic ONT and non-CCS PacBio reads. As a result, error correction is often the first step of projects dealing with long reads. As the first long-read sequencing experiments produced reads with average error rates above 15%, most methods relied on the complementary use of short-read data to perform correction, in a hybrid approach. However, these sequencing technologies evolve fast, and the error rate of long reads is now around 10 to 12%. As a result, self-correction is now frequently used as the first step of third-generation sequencing data analysis projects. As of today, efficient tools for the self-correction of long reads are available, and recent observations suggest that avoiding the use of second-generation sequencing reads could bypass their inherent bias.

Results: We introduce CONSENT, a new method for the self-correction of long reads that combines different strategies from the state of the art. More precisely, we combine a multiple sequence alignment strategy with the use of local de Bruijn graphs. Moreover, the multiple sequence alignment benefits from an efficient segmentation strategy based on k-mer chaining, which allows a considerable speed improvement. Our experiments show that CONSENT compares well to the latest state-of-the-art self-correction methods, and even outperforms them on real Oxford Nanopore datasets. In particular, they show that CONSENT is the only method able to scale efficiently to the correction of Oxford Nanopore ultra-long reads, and that it can process a full human dataset, containing reads up to 1.5 Mbp long, in 15 days.

Additionally, CONSENT implements an assembly polishing feature, and is thus able to correct errors directly in raw long-read assemblies. Our experiments show that CONSENT outperforms state-of-the-art polishing tools in terms of resource consumption, while providing comparable results. Moreover, we show that, for a full human dataset, assembling the raw data and polishing the assembly afterwards is less time-consuming than assembling the corrected reads, while providing better quality results.

Availability and implementation: CONSENT is implemented in C++, supported on Linux platforms and freely available at https://github.com/morispi/
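CONSENT's actual segmentation code is not described in this abstract, so the following is only a hypothetical illustration of the general idea of k-mer chaining: collect k-mers that occur exactly once in every read of a window, then keep a greedy chain of them whose positions increase (without overlap) in all reads. The function name `shared_anchor_chain`, the uniqueness filter, and the greedy rather than optimal chaining are all assumptions.

```python
from collections import Counter


def shared_anchor_chain(seqs, k):
    """Hypothetical greedy chain of k-mers occurring exactly once in every sequence.

    Returns a list of tuples of start positions (one entry per sequence);
    positions strictly increase, and anchors do not overlap, in every sequence.
    """
    def unique_kmers(s):
        # Map each k-mer that appears exactly once in s to its start position.
        counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        return {s[i:i + k]: i for i in range(len(s) - k + 1) if counts[s[i:i + k]] == 1}

    maps = [unique_kmers(s) for s in seqs]
    shared = set(maps[0]).intersection(*maps[1:])
    # Sorting the position tuples orders candidates by position in the first sequence.
    candidates = sorted(tuple(m[kmer] for m in maps) for kmer in shared)
    chain = []
    for pos in candidates:
        # Greedily accept an anchor only if it starts after the previous one ends in every sequence.
        if not chain or all(p >= q + k for p, q in zip(pos, chain[-1])):
            chain.append(pos)
    return chain


print(shared_anchor_chain(["ACGTTTGCA", "ACGTAATTGCA"], 3))
```

In a segmentation scheme of this kind, the regions between consecutive anchors would be aligned separately, so each MSA subproblem covers only a short stretch of the reads.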

Exact Multiple Sequence Alignment by Synchronized Decision Diagrams

INFORMS Journal on Computing ◽

10.1287/ijoc.2019.0937 ◽

2020 ◽

Author(s):

Amin Hosseininasab ◽

Willem-Jan van Hoeve

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

Mixed Integer ◽

Mixed Integer Program ◽

Second Phase ◽

Polynomial Space ◽

Sequence Alignments ◽

Multiple Sequence ◽

First Time

This paper develops an exact solution algorithm for the multiple sequence alignment (MSA) problem. In the first step, we design a dynamic programming model and use it to construct a novel multivalued decision diagram (MDD) representation of all pairwise sequence alignments (PSA). PSA MDDs are then synchronized using side constraints to model the MSA problem as a mixed-integer program (MIP), for the first time, in polynomial space complexity. Two bound-based filtering procedures are developed to reduce the size of the MDDs, and the resulting MIP is solved using logic-based Benders decomposition. For a more effective algorithm, we develop a two-phase solution approach. In the first phase, we use optimistic filtering to quickly obtain a near-optimal bound, which we then use for exact filtering in the second phase to prove or obtain an optimal solution. Numerical results on benchmark instances show that our algorithm solves several instances to optimality for the first time, and, in case optimality cannot be proven, considerably improves upon a state-of-the-art heuristic MSA solver. Comparison with an existing state-of-the-art exact MSA algorithm shows that our approach is more time efficient and yields significantly smaller optimality gaps.
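The abstract's first step is a dynamic programming model over pairwise sequence alignments. As background only, the textbook Needleman-Wunsch recurrence for the optimal global score of a single pair is sketched here, assuming a linear gap penalty and toy match/mismatch scores (the function name and defaults are illustrative). It does not build the multivalued decision diagrams, the mixed-integer program, or the logic-based Benders decomposition that the paper describes.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global pairwise alignment score with an assumed linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

For "ACGT" versus "AGT", the best alignment is ACGT over A-GT: three matches and one gap, for a score of 1 under these toy values.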

MANGO: Multiple Alignment with N Gapped Oligos

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720008003527 ◽

2008 ◽

Vol 06 (03) ◽

pp. 521-541 ◽

Cited By ~ 2

Author(s):

Zefeng Zhang ◽

Hao Lin ◽

Ming Li

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

State Of The Art ◽

Multiple Alignment ◽

Progressive Alignment ◽

Multiple Sequence ◽

New Approach ◽

Repeat Elements ◽

Spaced Seeds ◽

16S Rna

Multiple sequence alignment is a classical and challenging task. The problem is NP-hard, and full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art methods suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method that uses multiple optimized spaced seeds together with new algorithms to handle these seeds efficiently. Our new algorithm processes the information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because optimized spaced seeds have proved significantly more sensitive than consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments on large 16S RNA benchmarks show that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at and is free for academic usage.
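To make the term "spaced seed" concrete, here is a minimal sketch of seed matching between two sequences. A seed is a pattern of '1' (must match) and '0' (don't care) positions, so a hit still fires when the sequences differ at a '0' position, where a contiguous k-mer of the same span would miss it. The seed pattern "1101011" and the function name are hypothetical; MANGO's optimized multiple seeds and its vertical alignment-building algorithms are not reproduced here.

```python
def spaced_seed_hits(s, t, seed="1101011"):
    """Positions (i, j) where windows of s and t agree on every '1' position of the seed.

    The default seed pattern is a hypothetical example, not one of MANGO's optimized seeds.
    """
    span = len(seed)
    care = [p for p, c in enumerate(seed) if c == "1"]

    def keys(seq):
        # Index every window of the sequence by the residues at the '1' positions.
        index = {}
        for i in range(len(seq) - span + 1):
            key = "".join(seq[i + p] for p in care)
            index.setdefault(key, []).append(i)
        return index

    index_t = keys(t)
    hits = []
    for key, positions in keys(s).items():
        for i in positions:
            for j in index_t.get(key, []):
                hits.append((i, j))
    return sorted(hits)


print(spaced_seed_hits("ACGTACG", "ACTTACG"))
```

The two example sequences differ only at index 2, which is a don't-care position of the seed, so the windows at (0, 0) still form a hit; a mismatch at a '1' position (for example, "AGGTACG" against "ACGTACG") produces no hit.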
