Using a Bio-Inspired Algorithm to Resolve the Multiple Sequence Alignment Problem

2016 ◽  
Vol 7 (3) ◽  
pp. 36-55 ◽  
Author(s):  
El-amine Zemali ◽  
Abdelmadjid Boukra

One of the most challenging tasks in bioinformatics is the resolution of the Multiple Sequence Alignment (MSA) problem. It consists of comparing a set of protein or DNA sequences with the aim of predicting their structure and function. This paper introduces a new bio-inspired approach to this problem. The approach, named BA-MSA, is based on the Bat Algorithm (BA), a recent evolutionary algorithm inspired by the behavior of bats seeking their prey. The proposed approach includes a new mechanism for generating the initial population: a guide tree is generated for each solution with a progressive approach by varying some parameters, and the generated guide tree is then enhanced by a Hill-Climbing algorithm. In addition, to deal with the premature convergence of BA, a new restart technique is proposed that introduces more diversification when premature convergence is detected. Balibase 2.0 datasets are used for the experiments. Comparison with well-known methods such as MSA-GA, MSA-GA (w\prealign), ClustalW, and SAGA, and with a recent method (BBOMP), shows the effectiveness of the proposed approach.

2015 ◽  
Vol 13 (04) ◽  
pp. 1550016 ◽  
Author(s):  
El-Amine Zemali ◽  
Abdelmadjid Boukra

Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity between a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography, named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we introduce a new concept that allows better exploration of the search space: manipulating multiple populations, each with its own parameters. These parameters are used to build progressive alignments, which allows more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To test the performance of the proposed approach, we considered a set of datasets from Balibase 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
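
The idea of selecting operators with a probability that follows their efficiency can be illustrated with a minimal sketch. The update rule used here (selection probability proportional to a smoothed success rate) and every name in the code are assumptions made for illustration; the abstract does not specify the rule BBOMP uses or what its six operators are.

```python
import random


class AdaptiveOperatorSelector:
    """Pick operators with probability proportional to their success rate.

    Hypothetical illustration only; not the rule used by BBOMP.
    """

    def __init__(self, operator_names, rng=None):
        self.names = list(operator_names)
        # Start each operator with one pseudo-trial and one pseudo-success
        # so that no selection probability is ever zero.
        self.trials = {name: 1 for name in self.names}
        self.successes = {name: 1 for name in self.names}
        self.rng = rng or random.Random()

    def probabilities(self):
        """Return the current selection probability of each operator."""
        rates = {n: self.successes[n] / self.trials[n] for n in self.names}
        total = sum(rates.values())
        return {n: rates[n] / total for n in self.names}

    def select(self):
        """Draw one operator name according to the current probabilities."""
        probs = self.probabilities()
        weights = [probs[n] for n in self.names]
        return self.rng.choices(self.names, weights=weights, k=1)[0]

    def report(self, name, improved):
        """Record whether applying the operator improved the solution."""
        self.trials[name] += 1
        if improved:
            self.successes[name] += 1
```

In a search loop, one would call `select()`, apply the chosen operator to a candidate alignment, and then call `report()` with whether the score improved, so that operators that keep helping are chosen more often.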


2012 ◽  
Vol 38 ◽  
pp. 1783-1787 ◽  
Author(s):  
Sudipta Mulia ◽  
Debahuti Mishra ◽  
Tanushree Jena

2014 ◽  
Vol 23 (3) ◽  
pp. 261-275 ◽  
Author(s):  
Widad Kartous ◽  
Abdesslem Layeb ◽  
Salim Chikhi

Multiple sequence alignment (MSA) is one of the major problems encountered in the bioinformatics field. MSA consists of aligning a set of biological sequences to extract the similarities between them. Unfortunately, this problem has been shown to be NP-hard. In this article, a new algorithm is proposed to deal with this problem; it is based on a quantum-inspired cuckoo search algorithm. The other feature of the proposed approach is the use of a randomized progressive alignment method, based on a hybrid global/local pairwise algorithm, to construct the initial population. The results obtained by this hybridization are very encouraging and show the feasibility and effectiveness of the proposed solution.


2019 ◽  
Vol 12 (1) ◽  
pp. 30-39
Author(s):  
Siti Amiroch ◽  
M. Syaiful Pradana ◽  
M. Isa Irawan ◽  
Imam Mukhlash

Background: Multiple sequence alignment is a method for obtaining genomic relationships among three or more sequences. In multiple alignments, there are three mutation network analyses, namely the topological network system, the mutation region network, and the network system of mutation mode. In general, the three analyses show stable and unstable regions that map mutation regions. These mutation regions are described further in a phylogenetic tree, which simultaneously illustrates the path of the spread of an epidemic, here the Severe Acute Respiratory Syndrome (SARS) epidemic. The spread of the SARS virus is described as the process of phylogenetic tree formation and, as the novelty of this research, the multiple alignments in the process are analyzed in detail and then optimized with genetic algorithms. Methods: The data used to form the phylogenetic tree for the spread of the SARS epidemic are 14 DNA sequences, which are then optimized using genetic algorithms. The phylogenetic tree is constructed using the neighbor-joining algorithm with a distance matrix whose entries are the genetic distances obtained from sequence alignment with the Needleman-Wunsch algorithm. Results & Conclusion: The analysis yielded 3649 stable areas and 19 unstable areas. The phylogenetic tree from the network system analysis indicated that the SARS epidemic spread from Guangzhou 16/12/02 to Zhongshan 27/12/02, then simultaneously to Guangzhou 18/02/03 and Guangzhou hospital. After that, the virus reached Metropole, Zhongshan, Hongkong, Singapore, Taiwan, Hong kong, and Hanoi, and then continued to Guangzhou 01/01/03 and Toronto at once. The results of the mutation region network system demonstrate decomposition of orthogonal mutations in the 1st order arc.
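
The Needleman-Wunsch algorithm named in the Methods can be sketched as a short dynamic program. This is a textbook version with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration, since the abstract does not state the parameters the authors used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i - 1] with b[j - 1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

A pairwise distance for a neighbor-joining matrix could then be derived from each aligned pair, for example from the fraction of differing columns; which distance the authors derived is not stated here.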


2021 ◽  
Author(s):  
Liang Hong ◽  
Siqi Sun ◽  
Liangzhen Zheng ◽  
Qingxiong Tan ◽  
Yu Li

Evolutionarily related sequences provide information about protein structure and function. Multiple sequence alignment, which includes homolog searching in large databases and sequence alignment, is an effective way to extract this information and to assist protein structure and function prediction, as demonstrated by AlphaFold. Despite the existing tools for multiple sequence alignment, searching for homologs in the entire UniProt is still time-consuming. Considering the success of AlphaFold, large-scale multiple sequence alignment against massive databases will foreseeably be a trend in the field, so accelerating this step is very desirable. Here, we propose a novel method, fastMSA, that improves the speed significantly. Our idea is orthogonal to all previous acceleration methods. Taking advantage of a protein language model based on BERT, we propose a novel dual encoder architecture that embeds protein sequences into a low-dimensional space and efficiently filters out unrelated sequences before running BLAST. Extensive experimental results suggest that we can recall most of the homologs with a 34-fold speed-up. Moreover, our method is compatible with downstream tasks, such as structure prediction using AlphaFold. Using multiple sequence alignments generated by our method, we see little loss in protein structure prediction performance with much less running time. fastMSA will effectively assist protein sequence, structure, and function analysis based on homologs and multiple sequence alignment.
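
The filter-then-align idea can be sketched as follows: embed the query and every database sequence as a vector, keep only the candidates closest to the query, and run the expensive alignment tool on that shortlist alone. Everything in this sketch is an assumption for illustration. In particular, `embed` is a toy k-mer count vector standing in for the learned dual encoder, the names `cosine` and `prefilter` are hypothetical, and no BLAST call is shown.

```python
import math
from collections import Counter


def embed(sequence, k=2):
    """Toy stand-in for a learned encoder: a sparse k-mer count vector."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v.get(kmer, 0) for kmer, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def prefilter(query, database, top_k):
    """Return the top_k database sequences closest to the query embedding."""
    q = embed(query)
    scored = [(cosine(q, embed(seq)), seq) for seq in database]
    # Sort by similarity only, highest first; ties keep database order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seq for _, seq in scored[:top_k]]
```

In a realistic setting the database embeddings would be computed once and stored, so that only the query has to be embedded at search time.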


2016 ◽  
Vol 61 (1) ◽  
Author(s):  
Mohd Saeed ◽  
Mohd Adnan ◽  
Saif Khan ◽  
Eyad Al-Shammari ◽  
Huma Mustafa

Lymphatic filariasis (LF) is a chronic disease caused by the parasites Wuchereria bancrofti (W. bancrofti), Brugia malayi (B. malayi) and Brugia timori (B. timori). In the present study, Setaria cervi (S. cervi), a bovine filarial parasite, has been used. It has previously been reported that S. cervi shares some common proteins and antigenic determinants with the human filarial parasite. The larval stages of filarial species usually cannot be identified by classical morphology. Hence, molecular characterization allows the identification of the parasites throughout all their developmental stages. Genomic DNA of adult S. cervi was isolated and its DNA content was quantified spectrophotometrically. DNA sequences from filarial DNA in GenBank and Expressed Sequence Tags (ESTs) were screened for homologous sequences, and multiple sequence alignment was then performed. The conserved sequences from the multiple sequence alignment were used for in silico primer design. The successfully designed primers were then used in PCR amplifications. Thus, in the search for a promising diagnostic tool, a few genes were identified as conserved in human and bovine filariasis, and the novel primers designed may help to develop a diagnostic tool for the identification of lymphatic filariasis.


2014 ◽  
Vol 31 (2) ◽  
pp. 283-296
Author(s):  
Guoli Ji ◽  
Yong Zeng ◽  
Zijiang Yang ◽  
Congting Ye ◽  
Jingci Yao

Purpose – The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly. Traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequence sets, while keeping an accuracy level similar to that of the traditional methods. Design/methodology/approach – LemK_MSA converts multiple sequence alignment into a corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and to calculate a guide tree for each group. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Moreover, for large-scale multiple sequence sets, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation. Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. Originality/value – This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new way for multiple sequence alignment research.
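
A small worked calculation shows why dividing the sequences into groups reduces the distance-matrix work. The sketch below only counts pairwise comparisons. It assumes equal-sized groups and ignores the cost of clustering and of merging the group guide trees, so it is not the cost model of LemK_MSA, and the function names are hypothetical.

```python
def pairwise_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def grouped_pairwise_count(group_sizes):
    """Pairs compared when distances are computed only within each group."""
    return sum(pairwise_count(size) for size in group_sizes)


full = pairwise_count(1000)                    # 499500 pairs
grouped = grouped_pairwise_count([100] * 10)   # 10 * 4950 = 49500 pairs
print(full, grouped)
```

With 1000 sequences split into ten groups of 100, the number of within-group comparisons drops from 499500 to 49500, roughly a tenfold reduction before any merging cost is added back.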

