ProPIP: a tool for progressive multiple sequence alignment with Poisson Indel Process

Abstract. Background: Current alignment tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments (i.e., over-alignment) due to inconsistencies between the indel history and the phylogeny relating the input sequences. Results: We present a new progressive multiple sequence alignment tool, ProPIP. The process of insertions and deletions is described using an explicit evolutionary model, the Poisson Indel Process (PIP). The method is based on dynamic programming and is implemented in a frequentist framework. The algorithm is implemented in C++ as a standalone program, and the source code can be compiled on Linux, macOS and Microsoft Windows platforms. The source code is freely available on GitHub at https://github.com/acg-team/ProPIP and is distributed under the terms of the GNU GPL v3 license. Conclusions: The use of an explicit indel evolution model makes it possible to avoid over-alignment, to infer gaps in a phylogenetically consistent way, and to make inferences about the rates of insertions and deletions. Instead of arbitrary gap penalties, the parameters used by ProPIP are the insertion and deletion rates, which have a biological interpretation and are defined within a probabilistic framework. As a result, indel rate settings may be optimised in order to infer phylogenetically meaningful gap patterns.
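
To make the role of the two rates concrete, here is a minimal sketch that tracks only the sequence length along a single branch. It assumes, following the usual description of PIP rather than anything stated in the abstract above, that insertions arrive at a total rate `lam` for the whole sequence while each existing character is deleted independently at rate `mu`. The function name and its parameters are illustrative and are not part of ProPIP.

```python
import random


def simulate_length(n0, lam, mu, t, rng):
    """Simulate the sequence length after time t on one branch.

    Assumption (not from the abstract): insertions occur at total rate `lam`,
    and each of the n current characters is deleted at rate `mu`.
    """
    n, clock = n0, 0.0
    while True:
        total_rate = lam + mu * n
        if total_rate == 0:
            return n  # no event can ever happen
        # Waiting time to the next event is exponential in the total rate.
        clock += rng.expovariate(total_rate)
        if clock > t:
            return n
        # Choose the event type in proportion to its rate.
        if rng.random() < lam / total_rate:
            n += 1  # insertion
        else:
            n -= 1  # deletion of one existing character


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = [simulate_length(100, 2.0, 0.02, 50.0, rng) for _ in range(5)]
    print(lengths)
```

Under these assumptions the length drifts toward `lam / mu` on long branches, which is one way to see why the two rates, unlike gap penalties, can be read as biological quantities.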

Constrained Multiple Sequence Alignment Tool Development and Its Application to RNase Family Alignment

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720003000095 ◽

2003 ◽

Vol 01 (02) ◽

pp. 267-287 ◽

Cited By ~ 33

Author(s):

Chuan Yi Tang ◽

Chin Lung Lu ◽

Margaret Dah-Tsyr Chang ◽

Yin-Te Tsai ◽

Yuh-Ju Sun ◽

...

Keyword(s):

Sequence Alignment ◽

Heuristic Algorithm ◽

Multiple Sequence Alignment ◽

Time Complexity ◽

Software System ◽

Tool Development ◽

Multiple Sequence ◽

Rna Molecules ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool

In this paper, we design a heuristic algorithm for computing a constrained multiple sequence alignment (CMSA for short), which guarantees that the generated alignment satisfies user-specified constraints requiring particular residues to be aligned together. If the number of residues that must be aligned together is a constant α, then the time complexity of our CMSA algorithm for aligning K sequences is O(αKn^4), where n is the maximum sequence length. In addition, we have built a CMSA software system and carried out several experiments on RNase sequences, whose main function is to catalyze the degradation of RNA molecules. The resulting alignments illustrate the practicability of our method.
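
The notion of a constraint being satisfied can be illustrated with a small checker. This is not the authors' CMSA algorithm; it is a minimal sketch that only verifies a finished alignment. It assumes a hypothetical representation in which each constraint lists one 0-based residue index per sequence and the alignment is a list of equal-length gapped strings.

```python
def residue_columns(row, gap="-"):
    """Map each residue index of a gapped row to its alignment column."""
    return [col for col, ch in enumerate(row) if ch != gap]


def satisfies_constraints(alignment, constraints, gap="-"):
    """Return True if every constraint places its residues in one column.

    alignment   -- list of gapped strings, all of the same length
    constraints -- list of tuples; constraint[k] is the 0-based index of the
                   constrained residue in the ungapped k-th sequence
    """
    columns = [residue_columns(row, gap) for row in alignment]
    for constraint in constraints:
        used = {columns[k][pos] for k, pos in enumerate(constraint)}
        if len(used) != 1:
            return False
    return True


if __name__ == "__main__":
    alignment = ["AC-HG", "A-CHG", "ACCH-"]
    # The H residues (indices 2, 2 and 3) all sit in alignment column 3.
    print(satisfies_constraints(alignment, [(2, 2, 3)]))
    # The C residues at index 1 fall in different columns.
    print(satisfies_constraints(alignment, [(1, 1, 1)]))
```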

ViralMSA: Massively scalable reference-guided multiple sequence alignment of viral genomes

10.1101/2020.04.20.052068 ◽

2020 ◽

Cited By ~ 1

Author(s):

Niema Moshiri

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Genomic Sequence ◽

Sequence Data ◽

Software Project ◽

Multiple Sequence ◽

Viral Genomes ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool ◽

Algorithmic Techniques

Abstract. Motivation: In molecular epidemiology, the identification of clusters of transmissions typically requires the alignment of viral genomic sequence data. However, existing methods of multiple sequence alignment scale poorly with respect to the number of sequences. Results: ViralMSA is a user-friendly reference-guided multiple sequence alignment tool that leverages the algorithmic techniques of read mappers to enable the multiple sequence alignment of ultra-large viral genome datasets. It scales linearly with the number of sequences, and it is able to align tens of thousands of full viral genomes in seconds. Availability: ViralMSA is freely available at https://github.com/niemasd/ViralMSA as an open-source software project.
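
One way a reference-guided alignment can scale linearly is to align every genome to the reference independently and then write each genome in reference coordinates. The sketch below illustrates that idea with CIGAR strings of the kind read mappers emit. It is an assumption about the general approach, not a description of ViralMSA's code, and `project_to_reference` is a hypothetical helper.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def project_to_reference(query, cigar, ref_start, ref_len, gap="-"):
    """Write `query` as a row of length `ref_len` in reference coordinates.

    ref_start is 0-based here (SAM's POS field is 1-based).
    Insertions relative to the reference are dropped in this sketch.
    """
    row = [gap] * ref_len
    q, r = 0, ref_start
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # aligned bases: consume query and reference
            row[r:r + n] = query[q:q + n]
            q += n
            r += n
        elif op in "IS":     # insertion or soft clip: consume query only
            q += n
        elif op in "DN":     # deletion or skip: consume reference only
            r += n
        # H and P consume neither sequence
    return "".join(row)


if __name__ == "__main__":
    print(project_to_reference("ACGTTACG", "4M1I3M", 0, 8))
    print(project_to_reference("ACGT", "2M2D2M", 1, 8))
```

Because every row has the reference length and is computed without looking at any other genome, the rows can be stacked directly into a multiple sequence alignment, and the total work grows in proportion to the number of sequences.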

TM-Aligner: Multiple sequence alignment tool for transmembrane proteins with reduced time and improved accuracy

Scientific Reports ◽

10.1038/s41598-017-13083-y ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 7

Author(s):

Basharat Bhat ◽

Nazir A. Ganai ◽

Syed Mudasir Andrabi ◽

Riaz A. Shah ◽

Ashutosh Singh

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Transmembrane Proteins ◽

Multiple Sequence ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool ◽

Improved Accuracy ◽

Reduced Time

PnpProbs: a better multiple sequence alignment tool by better handling of guide trees

BMC Bioinformatics ◽

10.1186/s12859-016-1121-7 ◽

2016 ◽

Vol 17 (S8) ◽

Author(s):

Yongtao Ye ◽

Tak-Wah Lam ◽

Hing-Fung Ting

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool ◽

Guide Trees

Match-Box_server: a multiple sequence alignment tool placing emphasis on reliability

Bioinformatics ◽

10.1093/bioinformatics/13.3.249 ◽

1997 ◽

Vol 13 (3) ◽

pp. 249-256 ◽

Cited By ~ 9

Author(s):

Eric Depiereux ◽

Guy Baudoux ◽

Pascal Briffeuil ◽

Isabelle Reginster ◽

Xavier De Bolle ◽

...

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool

CUDA-Parttree: A Multiple Sequence Alignment Parallel Strategy in GPU

10.5753/wscad.2019.8662 ◽

2019 ◽

Author(s):

Caina Razzolini ◽

Alba Melo

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Execution Time ◽

Distance Matrix ◽

Data Conversion ◽

Multiple Sequence ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool ◽

Matrix Calculation ◽

Parallel Strategy

In this paper, we propose and evaluate CUDA-Parttree, a parallel strategy that executes the first phase of the MAFFT Parttree multiple sequence alignment tool (distance matrix calculation with 6mers) on the GPU. Compared to Parttree, CUDA-Parttree obtained a speedup of 6.10x on the distance matrix calculation for the Cyclodex gly tran set (50,280 sequences), reducing the execution time from 33.94s to 5.57s. Including data conversion and movement to and from the GPU, the speedup was 2.59x. With the sequence set Syn 100000 (100,000 sequences), a speedup of 4.46x was attained, reducing the execution time from 209.54s to 47.00s.
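
The distance matrix phase can be pictured with a simple CPU sketch based on shared 6mers. The distance used here (one minus the fraction of shared k-mers, relative to the shorter sequence) is an assumption chosen for illustration and may differ from the exact formula in MAFFT Parttree or CUDA-Parttree. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(s, t, k=6):
    """Illustrative distance: 1 - shared k-mers / k-mers in the shorter input."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    denom = min(sum(cs.values()), sum(ct.values()))
    if denom == 0:
        return 1.0  # at least one sequence is shorter than k
    shared = sum((cs & ct).values())  # multiset intersection
    return 1.0 - shared / denom


def distance_matrix(seqs, k=6):
    """Symmetric matrix of pairwise k-mer distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "ACGTACGTAC", "CCCCCCCCCC"]
    for row in distance_matrix(seqs):
        print(row)
```

Each entry depends on only two sequences, so the entries are independent of one another, which is the property that makes this phase a natural candidate for GPU execution.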

Constrained multiple sequence alignment tool development and its application to RNase family alignment

Proceedings. IEEE Computer Society Bioinformatics Conference ◽

10.1109/csb.2002.1039336 ◽

2003 ◽

Cited By ~ 10

Author(s):

Chuan Yi Tang ◽

Chin Lung Lu ◽

M.D.-T. Chang ◽

Yin-Te Tsai ◽

Yuh-Ju Sun ◽

...

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Tool Development ◽

Multiple Sequence ◽

Alignment Tool ◽

Multiple Sequence Alignment Tool

Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model

BMC Bioinformatics ◽

10.1186/1471-2105-11-464 ◽

2010 ◽

Vol 11 (1) ◽

Cited By ~ 2

Author(s):

Gayathri Jayaraman ◽

Rahul Siddharthan

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Evolutionary Model ◽

Multiple Sequence

Progressive multiple sequence alignment with indel evolution

BMC Bioinformatics ◽

10.1186/s12859-018-2357-1 ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 1

Author(s):

Massimo Maiolo ◽

Xiaolei Zhang ◽

Manuel Gil ◽

Maria Anisimova

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Progressive Multiple Sequence Alignment

Parallelization of Pairwise Alignment and Neighbor-Joining Algorithm in Progressive Multiple Sequence Alignment

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v9.i1.pp234-242 ◽

2018 ◽

Vol 9 (1) ◽

pp. 234

Author(s):

Agung Widyo Utomo

Keyword(s):

Shared Memory ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Programming Model ◽

Heuristic Method ◽

Pairwise Alignment ◽

Neighbor Joining ◽

Progressive Alignment ◽

Multiple Sequence ◽

Progressive Multiple Sequence Alignment

ClustalW, a progressive multiple sequence alignment method, is a widely used heuristic for computing multiple sequence alignments (MSA). It has three stages: distance matrix computation using pairwise alignment, guide tree reconstruction using neighbor-joining, and progressive alignment. To accelerate computation on large data sets, the progressive MSA algorithm needs to be parallelized. This research aims to identify, decompose and implement the pairwise alignment and neighbor-joining stages of progressive MSA using message-passing, shared-memory and hybrid programming models on a computer cluster. The experimental results show that the shared-memory programming model is the best implementation scenario, with a speedup of up to 12 times.
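
The first stage decomposes naturally because every pair of sequences can be aligned independently. The sketch below shows that decomposition with a Needleman-Wunsch score and a thread pool. The scoring values, the score-to-distance conversion, and the function names are assumptions made for illustration, not ClustalW's or the paper's implementation. In CPython, threads show the task structure rather than a real speedup for this kind of CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch), keeping one DP row."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def pair_distance(pair):
    """Toy distance in [0, 1] under the default scores: 0 for identical inputs."""
    a, b = pair
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return (longest - nw_score(a, b)) / (2 * longest)


def distance_matrix(seqs, workers=4):
    """Compute all pairwise distances, one independent task per pair."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(pair_distance,
                               [(seqs[i], seqs[j]) for i, j in pairs]))
    d = [[0.0] * len(seqs) for _ in seqs]
    for (i, j), v in zip(pairs, values):
        d[i][j] = d[j][i] = v
    return d


if __name__ == "__main__":
    for row in distance_matrix(["GATTACA", "GATTACA", "GCATGCT"]):
        print(row)
```

The resulting matrix is the input that the second stage, neighbor-joining, would use to build the guide tree.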
