Using a Bio-Inspired Algorithm to Resolve the Multiple Sequence Alignment Problem

El-amine Zemali; Abdelmadjid Boukra

doi:10.4018/ijamc.2016070103


International Journal of Applied Metaheuristic Computing ◽

10.4018/ijamc.2016070103 ◽

2016 ◽

Vol 7 (3) ◽

pp. 36-55 ◽

Cited By ~ 2

Author(s):

El-amine Zemali ◽

Abdelmadjid Boukra

Keyword(s):

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Bat Algorithm ◽

Premature Convergence ◽

Hill Climbing ◽

Initial Population ◽

Multiple Sequence ◽

Guide Tree ◽

And Function

One of the most challenging tasks in bioinformatics is solving the Multiple Sequence Alignment (MSA) problem, which consists of comparing a set of protein or DNA sequences in order to predict their structure and function. This paper introduces a new bio-inspired approach to this problem, named BA-MSA, based on the Bat Algorithm (BA), a recent evolutionary algorithm inspired by the behavior of bats seeking their prey. The proposed approach includes a new mechanism for generating the initial population: a guide tree is generated for each solution using a progressive approach with varied parameters, and each generated guide tree is then enhanced by a hill-climbing algorithm. In addition, to deal with the premature convergence of BA, a new restart technique is proposed that introduces more diversification when premature convergence is detected. BAliBASE 2.0 datasets are used for the experiments. Comparison with well-known methods such as MSA-GA, MSA-GA (w/prealign), ClustalW, and SAGA, and with a recent method (BBOMP), shows the effectiveness of the proposed approach.
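The restart idea can be illustrated with a minimal Python sketch. It is not the BA-MSA implementation: the abstract does not say how premature convergence is detected, so the stagnation test used here (no improvement of the best score over the last `patience` iterations), the function name `should_restart`, and the default values are assumptions made purely for illustration. Scores are assumed to be maximized.

```python
def should_restart(best_scores, patience=5, tol=1e-9):
    """Return True when the best score has stagnated.

    Hypothetical criterion: the best score seen in the last `patience`
    iterations is not more than `tol` above the score recorded just
    before that window.
    """
    if len(best_scores) <= patience:
        return False  # not enough history to judge stagnation
    reference = best_scores[-patience - 1]
    recent_best = max(best_scores[-patience:])
    return recent_best - reference <= tol


# A run that plateaus triggers a restart; a run that keeps improving does not.
plateau = [1, 2, 3, 3, 3, 3, 3, 3]
improving = [1, 2, 3, 4, 5, 6, 7, 8]
```

In a population-based search, a True result would typically be followed by regenerating part of the population while keeping the best solution; that policy is likewise an assumption, not a detail from the paper.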


A Comprehensive Comparison of Guide Tree Construction for Multiple Sequence Alignment

International Journal of Advancements in Computing Technology ◽

10.4156/ijact.vol5.issue9.42 ◽

2013 ◽

Vol 5 (9) ◽

pp. 350-358

Author(s):

Liangliang Chen ◽

Yong Zeng ◽

Mingcheng Wu ◽

Guoli Ji

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Guide Tree ◽

Tree Construction ◽

Comprehensive Comparison


Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001550016x ◽

2015 ◽

Vol 13 (04) ◽

pp. 1550016 ◽

Cited By ~ 3

Author(s):

El-Amine Zemali ◽

Abdelmadjid Boukra

Keyword(s):

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Search Space ◽

New Method ◽

Average Score ◽

Solution Quality ◽

Multiple Sequence ◽

Multiple Populations ◽

Alignment Problem

Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity among a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on biogeography-based optimization (BBO), a recent metaheuristic inspired by the mathematics of biogeography. To improve the exploration ability of BBO, we introduce a new concept that allows better exploration of the search space: manipulating multiple populations, each with its own parameters. These parameters are used to build progressive alignments, allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To test the performance of the proposed approach, we considered a set of datasets from BAliBASE 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
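Selecting operators with a probability that tracks their efficiency can be sketched as follows. This is an illustrative Python sketch, not the BBOMP update rule, which the abstract does not give: the decay-and-reward scheme, the weight floor, and the names `update_weights`, `probabilities`, and `pick_operator` are all assumptions.

```python
import random


def update_weights(weights, op_index, improved, reward=1.0, decay=0.9):
    """Decay every weight, then reward the operator that improved the solution.

    The floor of 0.05 (an arbitrary assumption) keeps every operator selectable.
    """
    new = [max(w * decay, 0.05) for w in weights]
    if improved:
        new[op_index] += reward
    return new


def probabilities(weights):
    """Normalize weights into selection probabilities."""
    total = sum(weights)
    return [w / total for w in weights]


def pick_operator(weights, rng=random):
    """Roulette-wheel selection proportional to the current weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


weights = [1.0] * 6  # six operators, as stated in the abstract
weights = update_weights(weights, op_index=2, improved=True)
probs = probabilities(weights)
```

After one successful application of operator 2, its selection probability rises above that of the other five, while the decay lets operators that stop helping fade over time.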


Genetic Algorithm Using Guide Tree in Mutation Operator for Solving Multiple Sequence Alignment

Advances in Intelligent Systems and Computing - Advanced Computing and Systems for Security ◽

10.1007/978-81-322-2650-5_10 ◽

2015 ◽

pp. 145-157 ◽

Cited By ~ 1

Author(s):

Rohit Kumar Yadav ◽

Haider Banka

Keyword(s):

Genetic Algorithm ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Mutation Operator ◽

Multiple Sequence ◽

Guide Tree


Profile HMM based Multiple Sequence Alignment for DNA Sequences

Procedia Engineering ◽

10.1016/j.proeng.2012.06.218 ◽

2012 ◽

Vol 38 ◽

pp. 1783-1787 ◽

Cited By ~ 1

Author(s):

Sudipta Mulia ◽

Debahuti Mishra ◽

Tanushree Jena

Keyword(s):

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Profile HMM


A New Quantum Cuckoo Search Algorithm for Multiple Sequence Alignment

Journal of Intelligent Systems ◽

10.1515/jisys-2013-0052 ◽

2014 ◽

Vol 23 (3) ◽

pp. 261-275 ◽

Cited By ~ 4

Author(s):

Widad Kartous ◽

Abdesslem Layeb ◽

Salim Chikhi

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Search Algorithm ◽

Cuckoo Search ◽

Cuckoo Search Algorithm ◽

Initial Population ◽

Biological Sequences ◽

Alignment Method ◽

Progressive Alignment ◽

Multiple Sequence

Multiple sequence alignment (MSA) is one of the major problems encountered in the field of bioinformatics. MSA consists of aligning a set of biological sequences to extract the similarities between them. Unfortunately, this problem has been shown to be NP-hard. In this article, a new algorithm is proposed to deal with this problem; it is based on a quantum-inspired cuckoo search algorithm. The other feature of the proposed approach is the use of a randomized progressive alignment method, based on a hybrid global/local pairwise algorithm, to construct the initial population. The results obtained by this hybridization are very encouraging and show the feasibility and effectiveness of the proposed solution.


A Simple Genetic Algorithm for Optimizing Multiple Sequence Alignment on the Spread of the SARS Epidemic

The Open Bioinformatics Journal ◽

10.2174/1875036201912010030 ◽

2019 ◽

Vol 12 (1) ◽

pp. 30-39

Author(s):

Siti Amiroch ◽

M. Syaiful Pradana ◽

M. Isa Irawan ◽

Imam Mukhlash

Keyword(s):

Genetic Algorithms ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Network System ◽

Multiple Sequence ◽

Network Analyses ◽

Multiple Alignments ◽

Mutation Region

Background: Multiple sequence alignment is a method of obtaining genomic relationships among three or more sequences. In multiple alignments, there are three mutation network analyses, namely the topological network system, the mutation region network, and the network system of mutation mode. In general, the three analyses show stable and unstable regions that map mutation regions. These mutation regions are described further in a phylogenetic tree, which simultaneously illustrates the path of the spread of an epidemic, here the Severe Acute Respiratory Syndrome (SARS) epidemic. The process of spreading the SARS viruses is, in this case, described as the process of phylogenetic tree formation, and as a novelty of this research, the multiple alignments in the process are analyzed in detail and then optimized with genetic algorithms. Methods: The data used to form the phylogenetic tree for the spread of the SARS epidemic are 14 DNA sequences, which are then optimized using genetic algorithms. The phylogenetic tree is constructed using the neighbor-joining algorithm with a distance matrix whose entries are the genetic distances obtained from sequence alignment using the Needleman-Wunsch algorithm. Results & Conclusion: The analysis yielded 3649 stable areas and 19 unstable areas. The phylogenetic tree from the network system analysis indicated that the SARS epidemic spread from Guangzhou 16/12/02 to Zhongshan 27/12/02, then spread simultaneously to Guangzhou 18/02/03 and Guangzhou hospital. After that, the virus reached Metropole, Zhongshan, Hong Kong, Singapore, Taiwan, Hong Kong, and Hanoi, and then continued to Guangzhou 01/01/03 and Toronto at once. The results of the mutation region network system demonstrate decomposition of orthogonal mutations in the 1st order arc.
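The distance-matrix step described under Methods can be sketched in Python. The Needleman-Wunsch recurrence below is the standard one with a linear gap penalty, but the scoring values (match +1, mismatch -1, gap -1) and the conversion from score to distance (length of the longer sequence minus the score) are assumptions for illustration; the abstract does not state which were used.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Symmetric matrix of hypothetical distances: max(len) - alignment score."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = needleman_wunsch_score(seqs[i], seqs[j])
            d = max(len(seqs[i]), len(seqs[j])) - score
            dist[i][j] = dist[j][i] = d
    return dist


matrix = distance_matrix(["ACGT", "AGT"])
```

With these settings, identical sequences are at distance 0. A matrix of this kind is what a neighbor-joining implementation takes as input.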


fastMSA: Accelerating Multiple Sequence Alignment with Dense Retrieval on Protein Language

10.1101/2021.12.20.473431 ◽

2021 ◽

Author(s):

Liang Hong ◽

Siqi Sun ◽

Liangzhen Zheng ◽

Qingxiong Tan ◽

Yu Li

Keyword(s):

Protein Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Structure And Function ◽

Sequence Alignments ◽

Protein Structure And Function ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

And Function

Evolutionarily related sequences provide information about protein structure and function. Multiple sequence alignment, which includes homolog searching in large databases and sequence alignment, is an efficient way to extract this information and to assist protein structure and function prediction; its effectiveness has been demonstrated by AlphaFold. Despite the existing tools for multiple sequence alignment, searching for homologs across the entire UniProt is still time-consuming. Considering the success of AlphaFold, large-scale multiple sequence alignments against massive databases will foreseeably become a trend in the field, so accelerating this step is very desirable. Here, we propose a novel method, fastMSA, to improve the speed significantly. Our idea is orthogonal to all previous acceleration methods. Taking advantage of a protein language model based on BERT, we propose a novel dual-encoder architecture that embeds protein sequences into a low-dimensional space and efficiently filters out unrelated sequences before running BLAST. Extensive experimental results suggest that we can recall most of the homologs with a 34-fold speed-up. Moreover, our method is compatible with downstream tasks such as structure prediction using AlphaFold. Using multiple sequence alignments generated by our method, we observe little performance compromise in protein structure prediction with much less running time. fastMSA will effectively assist protein sequence, structure, and function analysis based on homologs and multiple sequence alignment.
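The filter-then-align idea can be illustrated with a small Python sketch. It is not fastMSA: the learned BERT-based dual encoder is replaced here by a deliberately crude stand-in (a normalized amino-acid composition vector), and the names `embed` and `top_k` are hypothetical. The point is only the shape of the pipeline: embed the query and the database, keep the `k` nearest candidates, and pass only those to the expensive alignment step.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq):
    """Stand-in encoder (assumption): unit-length amino-acid composition vector."""
    counts = Counter(seq)
    vec = [counts.get(ch, 0) for ch in AMINO_ACIDS]
    norm = sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def top_k(query, database, k):
    """Indices of the k database sequences with the highest cosine similarity."""
    q = embed(query)
    scores = [sum(x * y for x, y in zip(q, embed(s))) for s in database]
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


database = ["MKVLAA", "GGGGGG", "MKVLAG", "WWWWWW"]
candidates = top_k("MKVLAA", database, k=2)
```

Only the sequences indexed by `candidates` would then be handed to a slower tool such as BLAST. In a real system the database embeddings would be computed once and stored rather than recomputed per query.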


MAUSA: Using Simulated Annealing for Guide Tree Construction in Multiple Sequence Alignment

AI 2007: Advances in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-540-76928-6_61 ◽

2007 ◽

pp. 599-608

Author(s):

P. J. Uren ◽

R. M. Cameron-Jones ◽

A. H. J. Sale

Keyword(s):

Simulated Annealing ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Guide Tree ◽

Tree Construction


In search of a potential diagnostic tool for molecular characterization of lymphatic filariasis

Acta Parasitologica ◽

10.1515/ap-2016-0015 ◽

2016 ◽

Vol 61 (1) ◽

Author(s):

Mohd Saeed ◽

Mohd Adnan ◽

Saif Khan ◽

Eyad Al-Shammari ◽

Huma Mustafa

Keyword(s):

Lymphatic Filariasis ◽

Molecular Characterization ◽

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Diagnostic Tool ◽

Developmental Stages ◽

Filarial Parasite ◽

Antigenic Determinants ◽

Multiple Sequence

Lymphatic filariasis (LF) is a chronic disease caused by the parasites Wuchereria bancrofti (W. bancrofti), Brugia malayi (B. malayi) and Brugia timori (B. timori). In the present study, Setaria cervi (S. cervi), a bovine filarial parasite, has been used. It has previously been reported that S. cervi shares some common proteins and antigenic determinants with human filarial parasites. The larval stages of filarial species usually cannot be identified by classical morphology. Hence, molecular characterization allows the identification of the parasites throughout all their developmental stages. The genomic DNA of adult S. cervi was isolated and its DNA content was estimated spectrophotometrically. DNA sequences from filarial GenBank entries and Expressed Sequence Tags (ESTs) were screened for homologous sequences, and multiple sequence alignment was then performed. The conserved sequences from the multiple sequence alignment were used for in silico primer design. The successfully designed primers were then used in PCR amplifications. Thus, in the search for a promising diagnostic tool, a few genes were identified as conserved in human and bovine filariasis, and the novel primers designed may help to develop a promising diagnostic tool for the identification of lymphatic filariasis.


A multiple sequence alignment method with sequence vectorization

Engineering Computations ◽

10.1108/ec-01-2013-0026 ◽

2014 ◽

Vol 31 (2) ◽

pp. 283-296

Author(s):

Guoli Ji ◽

Yong Zeng ◽

Zijiang Yang ◽

Congting Ye ◽

Jingci Yao

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Time Complexity ◽

Large Scale ◽

Distance Matrix ◽

Traditional Methods ◽

Multiple Sequence ◽

Guide Tree ◽

Content Type ◽

Matrix Calculation

Purpose – The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly. Traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequence sets, while keeping a similar accuracy level to the traditional methods. Design/methodology/approach – LemK_MSA converts multiple sequence alignment into the corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. Then, it uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate the guide tree of each group. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Moreover, for large-scale multiple sequence alignment, LemK_MSA proposes a GPU-based parallel method for distance matrix calculation. Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. Originality/value – This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new way for multiple sequence alignment research.
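Why grouping helps with the quadratic cost can be seen from a short worked count in Python. This is only arithmetic on the number of pairwise distance computations, not a model of LemK_MSA itself: the group sizes are invented for illustration, and the cost of vectorization, clustering, and merging the per-group guide trees is not counted.

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def grouped_pair_count(group_sizes):
    """Pairwise comparisons when distances are computed only within groups."""
    return sum(pair_count(size) for size in group_sizes)


full = pair_count(1000)                   # one distance matrix over all 1000 sequences
grouped = grouped_pair_count([100] * 10)  # ten hypothetical groups of 100 sequences
```

Here `full` is 499500 and `grouped` is 49500, a tenfold reduction in within-group distance computations for this particular split.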
