Constrained Multiple Sequence Alignment Tool Development and Its Application to RNase Family Alignment

In this paper, we design a heuristic algorithm for computing a constrained multiple sequence alignment (CMSA), which guarantees that the generated alignment satisfies user-specified constraints requiring particular residues to be aligned together. If the number of residues that must be aligned together is a constant α, the time complexity of our CMSA algorithm for aligning K sequences is O(αKn^4), where n is the maximum sequence length. In addition, we have built a CMSA software system and run several experiments on RNase sequences, whose main function is to catalyze the degradation of RNA molecules. The resulting alignments illustrate the practicability of our method.
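
To make the notion of a constraint concrete, here is a minimal pairwise sketch in Python. It is not the CMSA algorithm of the paper: it assumes only two sequences, a simple match/mismatch/linear-gap score, and constraints given as hypothetical `(i, j)` index pairs meaning "residue `a[i]` must share a column with residue `b[j]`". The sequences are cut at each constrained pair and the segments in between are aligned independently with a standard Needleman-Wunsch recurrence.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped rows."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def constrained_align(a, b, constraints, **kw):
    """Align a and b so that a[i] and b[j] share a column for every (i, j)."""
    parts_a, parts_b = [], []
    prev_i = prev_j = 0
    for i, j in sorted(constraints):
        if i < prev_i or j < prev_j:
            raise ValueError("constraints must increase in both sequences")
        # Align the unconstrained stretch before this pair, then pin the pair.
        seg_a, seg_b = needleman_wunsch(a[prev_i:i], b[prev_j:j], **kw)
        parts_a += [seg_a, a[i]]
        parts_b += [seg_b, b[j]]
        prev_i, prev_j = i + 1, j + 1
    seg_a, seg_b = needleman_wunsch(a[prev_i:], b[prev_j:], **kw)
    return "".join(parts_a + [seg_a]), "".join(parts_b + [seg_b])


def column_of(row, k):
    """Column index of the k-th (0-based) residue in a gapped row."""
    seen = -1
    for col, ch in enumerate(row):
        if ch != "-":
            seen += 1
            if seen == k:
                return col
    raise IndexError(k)


# Hypothetical example: force the H residues (a[3] and b[1]) into one column.
row_a, row_b = constrained_align("GATHCA", "THGCA", [(3, 1)])
print(row_a)
print(row_b)
```

Cutting at the constrained pairs keeps each subproblem an ordinary unconstrained alignment, which is one simple way to guarantee that the constraints hold by construction.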

Proceedings. IEEE Computer Society Bioinformatics Conference, 2003. DOI: 10.1109/csb.2002.1039336. Cited by ~10.

Author(s): Chuan Yi Tang, Chin Lung Lu, M.D.-T. Chang, Yin-Te Tsai, Yuh-Ju Sun, ...

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Tool Development, Multiple Sequence, Alignment Tool, Multiple Sequence Alignment Tool

ViralMSA: Massively scalable reference-guided multiple sequence alignment of viral genomes

2020. DOI: 10.1101/2020.04.20.052068. Cited by ~1.

Author(s): Niema Moshiri

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Genomic Sequence, Sequence Data, Software Project, Multiple Sequence, Viral Genomes, Alignment Tool, Multiple Sequence Alignment Tool, Algorithmic Techniques

Abstract. Motivation: In molecular epidemiology, the identification of clusters of transmissions typically requires the alignment of viral genomic sequence data. However, existing methods of multiple sequence alignment scale poorly with respect to the number of sequences. Results: ViralMSA is a user-friendly reference-guided multiple sequence alignment tool that leverages the algorithmic techniques of read mappers to enable the multiple sequence alignment of ultra-large viral genome datasets. It scales linearly with the number of sequences, and it is able to align tens of thousands of full viral genomes in seconds. Availability: ViralMSA is freely available at https://github.com/niemasd/ViralMSA as an open-source software project.
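
The linear scaling claimed above follows from the reference-guided idea: each sequence is aligned to the reference on its own, and the rows are then expressed in reference coordinates. The Python sketch below illustrates only that projection step. It assumes the pairwise alignments already exist as gapped strings, and that insertions relative to the reference are simply discarded; both are simplifying assumptions, and the function names are hypothetical rather than part of ViralMSA.

```python
def project_onto_reference(ref_aln, qry_aln):
    """Keep only the columns of a pairwise alignment where the reference has a residue."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned strings must have equal length")
    return "".join(q for r, q in zip(ref_aln, qry_aln) if r != "-")


def reference_guided_msa(pairwise):
    """Map each name to its row in reference coordinates.

    pairwise: dict of name -> (gapped reference, gapped query).
    """
    return {name: project_onto_reference(r, q) for name, (r, q) in pairwise.items()}


# Hypothetical input: two queries, each already aligned to the reference ACGTAC.
pairwise = {
    "seq1": ("ACGTAC", "AC-TAC"),    # deletion relative to the reference
    "seq2": ("ACG-TAC", "ACGGTAC"),  # insertion relative to the reference
}
msa = reference_guided_msa(pairwise)
for name, row in msa.items():
    print(name, row)
```

Because every row ends up with the length of the reference, the rows can be stacked directly, and the total work grows with the number of sequences rather than with the number of sequence pairs.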

TM-Aligner: Multiple sequence alignment tool for transmembrane proteins with reduced time and improved accuracy

Scientific Reports, 2017, Vol 7 (1). DOI: 10.1038/s41598-017-13083-y. Cited by ~7.

Author(s): Basharat Bhat, Nazir A. Ganai, Syed Mudasir Andrabi, Riaz A. Shah, Ashutosh Singh

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Transmembrane Proteins, Multiple Sequence, Alignment Tool, Multiple Sequence Alignment Tool, Improved Accuracy, Reduced Time

PnpProbs: a better multiple sequence alignment tool by better handling of guide trees

BMC Bioinformatics, 2016, Vol 17 (S8). DOI: 10.1186/s12859-016-1121-7.

Author(s): Yongtao Ye, Tak-Wah Lam, Hing-Fung Ting

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Multiple Sequence, Alignment Tool, Multiple Sequence Alignment Tool, Guide Trees

Match-Box_server: a multiple sequence alignment tool placing emphasis on reliability

Bioinformatics, 1997, Vol 13 (3), pp. 249-256. DOI: 10.1093/bioinformatics/13.3.249. Cited by ~9.

Author(s): Eric Depiereux, Guy Baudoux, Pascal Briffeuil, Isabelle Reginster, Xavier De Bolle, ...

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Multiple Sequence, Alignment Tool, Multiple Sequence Alignment Tool

CUDA-Parttree: A Multiple Sequence Alignment Parallel Strategy in GPU

2019. DOI: 10.5753/wscad.2019.8662.

Author(s): Caina Razzolini, Alba Melo

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Execution Time, Distance Matrix, Data Conversion, Multiple Sequence, Alignment Tool, Multiple Sequence Alignment Tool, Matrix Calculation, Parallel Strategy

In this paper, we propose and evaluate CUDA-Parttree, a parallel strategy that executes the first phase of the MAFFT Parttree multiple sequence alignment tool (distance matrix calculation with 6-mers) on the GPU. Compared to Parttree, CUDA-Parttree obtained a speedup of 6.10x on the distance matrix calculation for the Cyclodex gly tran set (50,280 sequences), reducing the execution time from 33.94s to 5.57s. Including data conversion and movement to/from the GPU, the speedup was 2.59x. With the sequence set Syn 100000 (100,000 sequences), a speedup of 4.46x was attained, reducing execution time from 209.54s to 47.00s.
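
As a rough picture of the phase being accelerated, the Python sketch below builds a pairwise distance matrix from shared 6-mers. The distance used here (one minus the fraction of shared k-mers, normalised by the shorter sequence) is an assumption chosen for illustration and may differ from the measure MAFFT Parttree actually uses; the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=6):
    """1 - (shared k-mers / k-mers in the shorter sequence), in [0, 1]."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # multiset intersection
    shortest = min(sum(ca.values()), sum(cb.values()))
    if shortest == 0:
        return 1.0  # a sequence shorter than k shares nothing
    return 1.0 - shared / shortest


def distance_matrix(seqs, k=6):
    """Symmetric matrix of pairwise k-mer distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each entry depends only on sequences i and j.
            d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d


seqs = ["ACGTACGTAC", "ACGTACGTAC", "TTTTTTTTTT"]
d = distance_matrix(seqs)
for row in d:
    print(row)
```

Every matrix entry is computed independently of the others, which is the property that makes this phase a natural candidate for GPU parallelisation.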

ProPIP: a tool for progressive multiple sequence alignment with Poisson Indel Process

BMC Bioinformatics, 2021, Vol 22 (1). DOI: 10.1186/s12859-021-04442-8.

Author(s): Massimo Maiolo, Lorenzo Gatti, Diego Frei, Tiziano Leidi, Manuel Gil, ...

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Source Code, Evolutionary Model, Multiple Sequence, Insertions And Deletions, Alignment Tool, Multiple Sequence Alignment Tool, Biological Interpretation, Progressive Multiple Sequence Alignment

Abstract. Background: Current alignment tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments (i.e., over-alignment) due to inconsistencies between the indel history and the phylogeny relating the input sequences. Results: We present a new progressive multiple sequence alignment tool, ProPIP. The process of insertions and deletions is described using an explicit evolutionary model, the Poisson Indel Process (PIP). The method is based on dynamic programming and is implemented in a frequentist framework. The algorithm is implemented in C++ as a standalone program, and the source code can be compiled on Linux, macOS and Microsoft Windows platforms. The source code is freely available on GitHub at https://github.com/acg-team/ProPIP and is distributed under the terms of the GNU GPL v3 license. Conclusions: The use of an explicit indel evolution model makes it possible to avoid over-alignment, to infer gaps in a phylogenetically consistent way, and to make inferences about the rates of insertions and deletions. Instead of arbitrary gap penalties, the parameters used by ProPIP are the insertion and deletion rates, which have a biological interpretation and are contextualized in a probabilistic environment. As a result, indel rate settings may be optimised in order to infer phylogenetically meaningful gap patterns.

A Parallel Multiobjective Metaheuristic for Multiple Sequence Alignment

2017. DOI: 10.1101/103101.

Author(s): Álvaro Rubio-Largo, Leonardo Vanneschi, Mauro Castelli, Miguel A. Vega-Rodríguez

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Time Complexity, Optimization Problem, Optimal Alignment, Multiple Sequence, Parallel Metaheuristics, Parallel Performance, Parallel Version, The Comparative Study

Abstract. The alignment of three or more nucleotide or amino-acid sequences at the same time is known as Multiple Sequence Alignment (MSA), an NP-hard optimization problem. The time complexity of finding an optimal alignment rises exponentially as the number of sequences to align increases. In this work, we deal with a multiobjective version of the MSA problem where the goal is to simultaneously optimize the accuracy and conservation of the alignment. A parallel version of the Hybrid Multiobjective Memetic Metaheuristics for Multiple Sequence Alignment is proposed. In order to evaluate the parallel performance of our proposal, we selected a pool of datasets with different numbers of sequences (up to 1000 sequences) and studied its parallel performance against other well-known parallel metaheuristics published in the literature, such as MSAProbs, T-Coffee, Clustal Ω, and MAFFT. The comparative study reveals that our parallel aligner is around 25 times faster than the sequential version with 32 cores, obtaining a parallel efficiency of around 80%.
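
The two figures quoted at the end of the abstract are linked by the usual definition of parallel efficiency, speedup divided by the number of cores. The short Python check below assumes that definition and uses the rounded values from the abstract, so it only confirms that the numbers are mutually consistent.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


eff = parallel_efficiency(25, 32)
print(f"{eff:.1%}")  # 78.1%, in line with the reported "around 80%"
```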

A multiple sequence alignment method with sequence vectorization

Engineering Computations, 2014, Vol 31 (2), pp. 283-296. DOI: 10.1108/ec-01-2013-0026.

Author(s): Guoli Ji, Yong Zeng, Zijiang Yang, Congting Ye, Jingci Yao

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Time Complexity, Large Scale, Distance Matrix, Traditional Methods, Multiple Sequence, Guide Tree, Content Type, Matrix Calculation

Purpose – The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly, and traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequence sets, while keeping an accuracy level similar to that of the traditional methods. Design/methodology/approach – LemK_MSA converts multiple sequence alignment into a corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group. A complete guide tree for the multiple sequence alignment is constructed by merging the guide trees of all groups. Moreover, for large-scale sequence sets, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation. Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. Originality/value – This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new way for multiple sequence alignment research.
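
The abstract does not spell out the ten copy modes, so they are not reproduced here. As a loosely related illustration of how Lempel-Ziv parsing turns a sequence into numbers without aligning it, the Python sketch below performs a plain LZ78-style parse and reports the phrase count. This is a swapped-in, simpler technique and an assumption for illustration only, not the LemK_MSA feature vector.

```python
def lz78_phrases(seq):
    """Split seq into LZ78 phrases: each phrase is the shortest prefix not seen before."""
    seen = set()
    phrases = []
    current = ""
    for ch in seq:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover text that matches an earlier phrase still ends the parse.
        phrases.append(current)
    return phrases


repetitive = lz78_phrases("AAAAAAAA")
varied = lz78_phrases("ACGTACGT")
print(len(repetitive), repetitive)
print(len(varied), varied)
```

A repetitive sequence parses into fewer phrases than a varied one of the same length, so the phrase count behaves like a cheap complexity feature that can be computed per sequence in a single pass.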

Progressive Multiple Sequence Alignment with the Poisson Indel Process

2017. DOI: 10.1101/123513.

Author(s): Massimo Maiolo, Xiaolei Zhang, Manuel Gil, Maria Anisimova

Keyword(s): Phylogenetic Tree, Sequence Alignment, Polynomial Time, Multiple Sequence Alignment, Time Complexity, Marginal Likelihood, Mathematical Formulation, Dynamic Programming Algorithm, Sequence Alignments, Multiple Sequence

Abstract. Sequence alignment lies at the heart of many evolutionary and comparative genomics studies. However, the optimal alignment of multiple sequences is NP-hard, so exact algorithms become impractical for more than a few sequences. Thus, state-of-the-art alignment methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogenetic tree. Changes between homologous characters are typically modelled by a continuous-time Markov substitution model. In contrast, the dynamics of insertions and deletions (indels) are not modelled explicitly, because the computation of the marginal likelihood under such models has exponential time complexity in the number of taxa. Recently, Bouchard-Côté and Jordan [PNAS (2012) 110(4):1160–1166] introduced a modification to a classical indel model, describing indel evolution on a phylogenetic tree as a Poisson process. The model, termed PIP, makes it possible to compute the joint marginal probability of a multiple sequence alignment and a tree in linear time. Here, we present a new dynamic programming algorithm to align two multiple sequence alignments by maximum likelihood in polynomial time under PIP, and apply it in a progressive algorithm. To our knowledge, this is the first progressive alignment method using a rigorous mathematical formulation of an evolutionary indel process and with polynomial time complexity.
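
The core step of any progressive method is aligning two existing alignments column against column. The Python sketch below shows that step with a sum-of-pairs score and fixed match, mismatch and gap values. These scores are an assumption used to keep the example short; the paper replaces them with a maximum-likelihood criterion under PIP, which is not implemented here.

```python
def column_score(col_a, col_b, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score between two alignment columns (tuples of characters)."""
    total = 0
    for x in col_a:
        for y in col_b:
            if x == "-" and y == "-":
                continue  # gap against gap costs nothing
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


def align_alignments(msa_a, msa_b, **kw):
    """Merge two MSAs (lists of equal-length rows) by dynamic programming over columns."""
    cols_a, cols_b = list(zip(*msa_a)), list(zip(*msa_b))
    gap_a, gap_b = ("-",) * len(msa_a), ("-",) * len(msa_b)
    n, m = len(cols_a), len(cols_b)
    # score[i][j] = best score for the first i columns of A and first j of B
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + column_score(cols_a[i - 1], gap_b, **kw)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + column_score(gap_a, cols_b[j - 1], **kw)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1], **kw),
                score[i - 1][j] + column_score(cols_a[i - 1], gap_b, **kw),
                score[i][j - 1] + column_score(gap_a, cols_b[j - 1], **kw),
            )
    # Trace back, emitting merged columns in reverse order.
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + column_score(
                cols_a[i - 1], cols_b[j - 1], **kw):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + column_score(
                cols_a[i - 1], gap_b, **kw):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


# Hypothetical example: merge a two-row alignment with a single sequence.
result = align_alignments(["ACGT", "AC-T"], ["AGT"])
for row in result:
    print(row)
```

A progressive aligner would call a routine like this at each internal node of the guide tree, working from the leaves to the root, so the overall cost is a polynomial number of pairwise merges rather than an exponential search.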
