LemK_MSA: A Multiple Sequence Alignment Method with Sequence Vectorization Based on Lempel-Ziv

2013 ◽  
Vol 284-287 ◽  
pp. 3203-3207 ◽  
Author(s):  
Guo Li Ji ◽  
Jing Ci Yao ◽  
Zi Jiang Yang ◽  
Cong Ting Ye

In this paper, we propose LemK_MSA, a method for multiple sequence alignment that integrates Lempel-Ziv based sequence vectorization with k-means clustering analysis. LemK_MSA converts the multiple sequence alignment problem into a corresponding alignment of 10-dimensional vectors, derived from 10 types of copy modes. It then uses the k-means algorithm and the neighbor-joining (NJ) algorithm to divide the sequences into several groups and to compute a guide tree for each group from the sequence vectors. A complete guide tree for the multiple sequence alignment is constructed by merging the guide trees of all groups. This improves the time efficiency of multiple sequence alignment, especially for large-scale sequence sets. High-throughput mouse antibody sequences are used to validate the proposed method. Compared with ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. LemK_MSA also provides an effective way to analyze the evolutionary relationships and structural features among high-throughput sequences.
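The abstract does not spell out the ten copy modes, so the following is only a minimal sketch of the general idea of turning a Lempel-Ziv parse into a fixed-length numeric vector. It uses a plain LZ78-style phrase decomposition. The function names `lz78_phrases` and `lz_features` and the three-component feature vector are assumptions made for illustration, not the authors' vectorization.

```python
def lz78_phrases(seq):
    """Split seq into LZ78-style phrases: each phrase is the shortest
    prefix of the remaining input that has not yet been seen as a phrase."""
    seen = set()
    phrases = []
    current = ""
    for symbol in seq:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # trailing fragment that repeats an earlier phrase
        phrases.append(current)
    return phrases


def lz_features(seq):
    """Hypothetical 3-dimensional summary of the parse (NOT the paper's
    ten copy modes): phrase count, longest phrase, mean phrase length.
    Assumes seq is non-empty."""
    phrases = lz78_phrases(seq)
    lengths = [len(p) for p in phrases]
    return (len(phrases), max(lengths), sum(lengths) / len(phrases))


if __name__ == "__main__":
    for s in ("AACGAACG", "ACGTACGTACGT"):
        print(s, lz78_phrases(s), lz_features(s))
```

Sequences of different lengths all map to vectors of the same dimension, which is the property that lets a clustering algorithm such as k-means compare them without first aligning them.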

2014 ◽  
Vol 31 (2) ◽  
pp. 283-296
Author(s):  
Guoli Ji ◽  
Yong Zeng ◽  
Zijiang Yang ◽  
Congting Ye ◽  
Jingci Yao

Purpose – The time complexity of most multiple sequence alignment algorithms is O(N²) or O(N³), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly, and traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequence sets, while keeping an accuracy level similar to that of the traditional methods.

Design/methodology/approach – LemK_MSA converts the multiple sequence alignment problem into a corresponding alignment of 10D vectors, derived from ten types of copy modes based on Lempel-Ziv. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and to compute a guide tree for each group. A complete guide tree for the multiple sequence alignment is constructed by merging the guide trees of all groups. Moreover, for large-scale sequence sets, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation.

Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared with ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy.

Originality/value – This paper proposes a novel method for multiple sequence alignment with sequence vectorization based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new way forward for multiple sequence alignment research.
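The saving from grouping can be illustrated with a small sketch: a distance matrix over all N sequences needs N(N-1)/2 pairwise distances, whereas clustering first and computing distances only within each group needs far fewer. The code below is an illustrative sketch, not the authors' implementation. The 2D vectors stand in for the 10D vectors, the k-means initialization (first k vectors) is chosen for determinism, and the NJ step and the merging of guide trees are omitted. All function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (an
    assumption made so that the sketch is deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def grouped_distances(vectors, labels):
    """Pairwise distances computed only within each cluster; each group's
    matrix is what a tree-building step such as NJ would consume."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    return {lab: {(i, j): euclidean(vectors[i], vectors[j])
                  for i, j in combinations(members, 2)}
            for lab, members in groups.items()}


def pair_count(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    vecs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = kmeans(vecs, k=2)
    per_group = grouped_distances(vecs, labels)
    grouped_pairs = sum(len(m) for m in per_group.values())
    print("labels:", labels)
    print("all pairs:", pair_count(len(vecs)), "grouped pairs:", grouped_pairs)
```

Each entry of a distance matrix depends only on its own pair of vectors, so the entries can be computed independently. That independence is what makes the calculation a natural fit for the GPU-based parallel approach the abstract mentions.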


2016 ◽  
Vol 7 (3) ◽  
pp. 36-55 ◽  
Author(s):  
El-amine Zemali ◽  
Abdelmadjid Boukra

One of the most challenging tasks in bioinformatics is solving the Multiple Sequence Alignment (MSA) problem, which consists of comparing a set of protein or DNA sequences in order to predict their structure and function. This paper introduces a new bio-inspired approach to this problem. The approach, named BA-MSA, is based on the Bat Algorithm (BA), a recent evolutionary algorithm inspired by the behavior of bats seeking their prey. The proposed approach includes a new mechanism for generating the initial population: a guide tree is generated for each solution with a progressive approach by varying some parameters, and the generated guide tree is then enhanced by a hill-climbing algorithm. In addition, to deal with the premature convergence of BA, a new restart technique is proposed that introduces more diversification when premature convergence is detected. BAliBASE 2.0 datasets are used for the experiments. Comparison with well-known methods such as MSA-GA, MSA-GA (w/prealign), ClustalW and SAGA, and with a recent method (BBOMP), shows the effectiveness of the proposed approach.


PLoS Currents ◽  
2011 ◽  
Vol 2 ◽  
pp. RRN1198 ◽  
Author(s):  
Kevin Liu ◽  
C. Randal Linder ◽  
Tandy Warnow

2021 ◽  
pp. 560-575
Author(s):  
Rodrigo A. de O. Siqueira ◽  
Marco A. Stefanes ◽  
Luiz C. S. Rozante ◽  
David C. Martins-Jr ◽  
Jorge E. S. de Souza ◽  
...  
