Profile HMM based Multiple Sequence Alignment for DNA Sequences

One of the most challenging tasks in bioinformatics is solving the Multiple Sequence Alignment (MSA) problem. It consists of comparing a set of protein or DNA sequences with the aim of predicting their structure and function. This paper introduces a new bio-inspired approach to this problem. The approach, named BA-MSA, is based on the Bat Algorithm (BA), a recent evolutionary algorithm inspired by the behavior of bats seeking their prey. The proposed approach includes a new mechanism for generating the initial population: a guide tree is generated for each solution with a progressive approach by varying some parameters, and the generated guide tree is then enhanced by a Hill-Climbing algorithm. In addition, to deal with the premature convergence of BA, a new restart technique is proposed that introduces more diversification when premature convergence is detected. Balibase 2.0 datasets are used for the experiments. The comparison with well-known methods such as MSA-GA, MSA-GA (w\prealign), ClustalW, and SAGA, and with a recent method (BBOMP), shows the effectiveness of the proposed approach.


Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001550016x ◽

2015 ◽

Vol 13 (04) ◽

pp. 1550016 ◽

Cited By ~ 3

Author(s):

El-Amine Zemali ◽

Abdelmadjid Boukra

Keyword(s):

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Search Space ◽

New Method ◽

Average Score ◽

Solution Quality ◽

Multiple Sequence ◽

Multiple Populations ◽

Alignment Problem

Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity among a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography, named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we introduce a new concept that allows better exploration of the search space. It consists of manipulating multiple populations, each with its own parameters. These parameters are used to build progressive alignments, allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To test the performance of the proposed approach, we considered a set of datasets from Balibase 2.0 and compared it with several recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
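The abstract does not say how the dynamic operator probabilities are computed. As a rough illustration only, the sketch below assumes each operator's selection probability is proportional to its smoothed success rate; the function names, the smoothing, and the `floor` parameter are hypothetical and are not taken from the BBOMP paper.

```python
import random

def update_probabilities(successes, attempts, floor=0.05):
    """Turn per-operator success counts into selection probabilities.

    Assumption: probability is proportional to a smoothed success rate.
    """
    # Laplace smoothing keeps untried operators from getting a zero rate.
    rates = [(s + 1) / (a + 2) for s, a in zip(successes, attempts)]
    total = sum(rates)
    probs = [r / total for r in rates]
    # Raise very small probabilities to the floor, then renormalize.
    probs = [max(p, floor) for p in probs]
    total = sum(probs)
    return [p / total for p in probs]

def pick_operator(probs, rng=random):
    """Roulette-wheel selection of an operator index."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a loop, one would call `pick_operator`, apply the chosen operator, record whether the solution improved, and periodically refresh the probabilities with `update_probabilities`.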


A Simple Genetic Algorithm for Optimizing Multiple Sequence Alignment on the Spread of the SARS Epidemic

The Open Bioinformatics Journal ◽

10.2174/1875036201912010030 ◽

2019 ◽

Vol 12 (1) ◽

pp. 30-39

Author(s):

Siti Amiroch ◽

M. Syaiful Pradana ◽

M. Isa Irawan ◽

Imam Mukhlash

Keyword(s):

Genetic Algorithms ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Network System ◽

Multiple Sequence ◽

Network Analyses ◽

Multiple Alignments ◽

Mutation Region

Background: Multiple sequence alignment is a method of obtaining genomic relationships among three or more sequences. In multiple alignments, there are three mutation network analyses, namely the topological network system, the mutation region network, and the network system of mutation mode. In general, the three analyses show stable and unstable regions that map mutation regions. These mutation regions are described further in a phylogenetic tree, which simultaneously illustrates the path of the spread of an epidemic, here the Severe Acute Respiratory Syndrome (SARS) epidemic. The spread of the SARS virus is thus described as the process of phylogenetic tree formation and, as the novelty of this research, the multiple alignments in the process are analyzed in detail and then optimized with genetic algorithms. Methods: The data used to form the phylogenetic tree for the spread of the SARS epidemic are 14 DNA sequences, which are then optimized using genetic algorithms. The phylogenetic tree is constructed using the neighbor-joining algorithm with a distance matrix in which each distance is the genetic distance obtained from sequence alignment using the Needleman-Wunsch algorithm. Results & Conclusion: The analysis identified 3649 stable areas and 19 unstable areas. The phylogenetic tree from the network system analysis indicated that the SARS epidemic spread from Guangzhou 16/12/02 to Zhongshan 27/12/02, then simultaneously to Guangzhou 18/02/03 and Guangzhou hospital. After that, the virus reached Metropole, Zhongshan, Hongkong, Singapore, Taiwan, Hong kong, and Hanoi, and then continued to Guangzhou 01/01/03 and Toronto at once. The results of the mutation region network system demonstrate decomposition of orthogonal mutations in the 1st order arc.
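For readers unfamiliar with the Needleman-Wunsch step mentioned in the Methods, here is a minimal sketch of global pairwise alignment followed by a simple distance. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the `p_distance` helper (fraction of differing alignment columns) are assumptions for illustration; the abstract does not state which scoring scheme or genetic distance the authors used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

def p_distance(aligned_a, aligned_b):
    """Fraction of alignment columns that differ (gaps count as differences).

    Assumes two non-empty aligned strings of equal length.
    """
    diffs = sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)
    return diffs / len(aligned_a)
```

Applying `needleman_wunsch` to every pair of sequences and collecting `p_distance` values would give a distance matrix of the kind a neighbor-joining implementation takes as input.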


In search of a potential diagnostic tool for molecular characterization of lymphatic filariasis

Acta Parasitologica ◽

10.1515/ap-2016-0015 ◽

2016 ◽

Vol 61 (1) ◽

Author(s):

Mohd Saeed ◽

Mohd Adnan ◽

Saif Khan ◽

Eyad Al-Shammari ◽

Huma Mustafa

Keyword(s):

Lymphatic Filariasis ◽

Molecular Characterization ◽

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Diagnostic Tool ◽

Developmental Stages ◽

Filarial Parasite ◽

Antigenic Determinants ◽

Multiple Sequence

Lymphatic filariasis (LF) is a chronic disease caused by the parasites Wuchereria bancrofti (W. bancrofti), Brugia malayi (B. malayi) and Brugia timori (B. timori). In the present study, Setaria cervi (S. cervi), a bovine filarial parasite, has been used. It has previously been reported that S. cervi shares some common proteins and antigenic determinants with human filarial parasites. The larval stages of filarial species usually cannot be identified by classical morphology. Hence, molecular characterization allows the identification of the parasites throughout all their developmental stages. The genomic DNA of adult S. cervi was isolated and quantified spectrophotometrically. DNA sequences from filarial GenBank entries and Expressed Sequence Tags (ESTs) were screened for homologous sequences, and multiple sequence alignment was then performed. The conserved sequences from the multiple sequence alignment were used for in silico primer design. The successfully designed primers were then used in PCR amplifications. In this search for a promising diagnostic tool, a few genes were identified as conserved between human and bovine filariasis, and the novel primers designed may help to develop a diagnostic tool for the identification of lymphatic filariasis.


Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families

10.1101/2021.08.17.456740 ◽

2021 ◽

Author(s):

Robert M. Hubley ◽

Travis J. Wheeler ◽

Arian F.A. Smit

Keyword(s):

Transposable Element ◽

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Structural Features ◽

Sequence Evolution ◽

Sequence Alignments ◽

Multiple Sequence ◽

Consensus Sequences

The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Such alignments play an important role in understanding and representing TE family history. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly-neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low to medium divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning high-divergent and fragmented instances of a family. As a result, consensus sequences derived from Refiner-based MSAs are more similar to the true consensus.
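To make the final step concrete, the sketch below derives a consensus sequence from an alignment by simple majority rule per column. This is only a generic illustration under stated assumptions (equal-length rows, `-` as the gap symbol, columns whose most common symbol is a gap are dropped); it is not the consensus procedure used by Refiner or RepeatModeler, which the abstract does not describe.

```python
from collections import Counter

def consensus(alignment, gap="-"):
    """Majority-rule consensus of equal-length aligned rows.

    Columns whose most common symbol is the gap are omitted.
    Ties go to the symbol seen first in the column.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    out = []
    for column in zip(*alignment):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            out.append(symbol)
    return "".join(out)
```

For example, `consensus(["ACG-T", "ACGAT", "AC--T"])` keeps the `G` column (two of three rows) and drops the fourth column, where gaps are in the majority.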


Aligning Multiple Sequences Using an Improved Tabu Search Algorithm

Journal of Circuits System and Computers ◽

10.1142/s0218126617500669 ◽

2016 ◽

Vol 26 (04) ◽

pp. 1750066 ◽

Cited By ~ 1

Author(s):

Lamiche Chaabane ◽

Moussaoui Abdelouahab

Keyword(s):

Tabu Search ◽

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Large Scale ◽

Search Algorithm ◽

Protein Structures ◽

Biological Sequence ◽

Multiple Sequence ◽

Alignment Problem

One of the most essential operations in biological sequence analysis is multiple sequence alignment (MSA), which is used to construct evolutionary trees for DNA sequences and to analyze protein structures to help design new proteins. In this study, a new method for solving the sequence alignment problem is proposed, named improved tabu search (ITS). The algorithm is based on the classical tabu search (TS) optimization technique and is implemented to produce multiple sequence alignments. Several variants of neighborhood generation and intensification/diversification strategies for the proposed ITS are investigated. Simulation results on a large number of datasets show the efficacy of the developed approach and its capacity to achieve good-quality solutions, in terms of scores, compared with those given by other existing methods.
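The classical tabu search that ITS builds on can be sketched generically as follows. This is a minimal skeleton, not the ITS algorithm: the `neighbors` and `score` callables, the fixed `tenure`, and the choice to store whole solutions in the tabu list are all assumptions for illustration. For MSA, a solution would be an alignment and a neighbor might differ by a shifted or inserted gap.

```python
from collections import deque

def tabu_search(start, neighbors, score, iterations=100, tenure=5):
    """Maximize score(x) with a short-term memory of recent solutions.

    neighbors(x) returns candidate solutions; solutions must support ==.
    """
    current = best = start
    # The deque forgets the oldest entry once `tenure` entries are stored.
    tabu = deque([start], maxlen=tenure)
    for _ in range(iterations):
        candidates = [x for x in neighbors(current) if x not in tabu]
        if not candidates:
            break
        # Move to the best admissible neighbor, even if it is worse than
        # the current solution; this is what lets the search leave optima.
        current = max(candidates, key=score)
        tabu.append(current)
        if score(current) > score(best):
            best = current
    return best
```

A toy usage: `tabu_search(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)` walks along the integers and returns the maximizer of the score.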


Inferring an Original Sequence from Erroneous Copies: Two Approaches

Asia-Pacific Biotech News ◽

10.1142/s0219030303000284 ◽

2003 ◽

Vol 07 (03) ◽

pp. 107-114 ◽

Cited By ~ 3

Author(s):

Jonathan M. Keith ◽

Peter Adams ◽

Darryn Bryant ◽

Keith R. Mitchelson ◽

Duncan A. E. Cochran ◽

...

Keyword(s):

Sequence Alignment ◽

DNA Sequences ◽

Multiple Sequence Alignment ◽

Sequence Alignments ◽

Multiple Sequence ◽

Original Sequence ◽

Multiple Sequence Alignments ◽

Sequencing Errors ◽

The Cost ◽

New Algorithms

This paper considers the problem of inferring an original sequence from a number of erroneous copies. The problem arises in DNA sequencing, particularly in the context of emerging technologies that provide high throughput or other advantages at the cost of an increased number of errors. We describe and compare two approaches that have recently been developed by the authors. The first approach searches for a sequence known as a Steiner string; the second searches for the most probable original sequence with respect to a simple Bayesian model of sequencing errors. We present the results of extensive tests in which erroneous copies of real DNA sequences were simulated and the algorithms were used to infer the original sequences. The results are used to compare the two approaches to each other and to a third, more conventional, approach based on multiple sequence alignment. We find that the Bayesian approach is superior to the Steiner approach, which in turn is superior to the alignment approach. The two new algorithms can also be used to construct multiple sequence alignments. We show that the two methods produce alignments of approximately equal quality, and conclude that the Steiner approach is better for this purpose because it is faster. Both methods produce better alignments than a well-known multiple sequence alignment package, for the cases tested.
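As a rough illustration of the objective behind the first approach, the sketch below assumes the usual definition of a Steiner string: a string that minimizes the sum of edit distances to the given copies. Finding a true Steiner string is hard in general, so this example only evaluates the objective and picks the best string among the copies themselves, which is a simplification and not the authors' algorithm. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]

def total_distance(candidate, copies):
    """Sum of edit distances from candidate to every copy."""
    return sum(edit_distance(candidate, c) for c in copies)

def best_among_copies(copies):
    """The copy with the smallest total distance to all copies.

    A true Steiner string need not be one of the copies.
    """
    return min(copies, key=lambda c: total_distance(c, copies))
```

With copies such as `["ACGT", "ACGA", "ACGT", "TCGT"]`, the isolated errors in the second and fourth copies are outvoted and `best_among_copies` returns `"ACGT"`.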


Multiple Sequence Alignment Based on Profile Hidden Markov Model and Quantum-Behaved Particle Swarm Optimization with Selection Method

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.282-283.7 ◽

2011 ◽

Vol 282-283 ◽

pp. 7-12 ◽

Cited By ~ 2

Author(s):

Hai Xia Long ◽

Li Hua Wu ◽

Yu Zhang

Keyword(s):

Particle Swarm Optimization ◽

Markov Model ◽

Hidden Markov Model ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Hidden Markov ◽

Particle Swarm ◽

Multiple Sequence ◽

Swarm Optimization ◽

Profile HMM

Multiple sequence alignment (MSA) is an important, NP-complete problem in bioinformatics. Currently, the profile hidden Markov model (HMM) is widely used for multiple sequence alignment. In this paper, Quantum-behaved Particle Swarm Optimization with selection operation (SQPSO) is presented and used to train the profile HMM. Furthermore, an integrated algorithm based on the profile HMM and SQPSO is constructed for MSA. The approach is evaluated on multiple nucleotide and protein sequences and compared with other algorithms. The comparisons show that HMMs trained with SQPSO and QPSO yield better alignments than those trained with other commonly used HMM training methods such as Baum–Welch and PSO.
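To show what the parameters of a profile HMM look like, the sketch below estimates match-state emission probabilities from an existing alignment by counting residues per column. This is a simple count-based estimate, not the SQPSO training described in the abstract. The half-occupancy rule for choosing match columns, the `pseudocount` value, and the DNA alphabet are assumptions for illustration.

```python
from collections import Counter

def match_emissions(alignment, alphabet="ACGT", pseudocount=1.0, gap="-"):
    """Per-column emission probabilities for the match states of a profile.

    Assumption: a column becomes a match state when at least half of its
    rows hold a residue from the alphabet; other columns are skipped.
    """
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap and c in alphabet]
        if 2 * len(residues) < len(column):
            continue  # mostly gaps: would be modeled by insert states
        counts = Counter(residues)
        # Pseudocounts avoid zero probabilities for unseen symbols.
        total = len(residues) + pseudocount * len(alphabet)
        profile.append({s: (counts[s] + pseudocount) / total for s in alphabet})
    return profile
```

A training method such as Baum–Welch or a swarm optimizer would instead adjust these emission values, together with the transition probabilities, starting from unaligned sequences.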
