Component-Based Design and Assembly of Heuristic Multiple Sequence Alignment Algorithms

Sequence alignment is a key algorithm in bioinformatics, widely used in sequence similarity analysis and genome sequence database search. Existing research focuses mainly on specific steps of the algorithm or on specific problems, and lacks a high-level, abstract algorithm framework for the domain. Multiple sequence alignment algorithms are more complex, redundant, and difficult to understand than pairwise alignment algorithms, so users cannot easily select an appropriate algorithm, and computing errors may occur. Building on our pairwise sequence alignment algorithm component library and the PAR software platform, we develop a few extension components for the multiple sequence alignment domain. With these components, a specific multiple sequence alignment algorithm can be designed and its corresponding C++, Java, or Python program generated efficiently, which improves both the development efficiency of complex algorithms and the accuracy of sequence alignment calculations. A star alignment algorithm is designed and generated to demonstrate the development process.
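To make the star alignment idea concrete, here is a minimal Python sketch of the textbook center-star heuristic. It is not the program generated by the PAR platform or the paper's component library: the function names, the scoring constants (match +1, mismatch -1, linear gap -2), and the choice of Needleman-Wunsch as the pairwise step are all assumptions made for illustration.

```python
from itertools import combinations

# Assumed scoring scheme (illustrative only, not taken from the paper).
MATCH, MISMATCH, GAP = 1, -1, -2


def needleman_wunsch(a, b):
    """Global pairwise alignment; returns (score, gapped_a, gapped_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def star_align(seqs):
    """Center-star MSA: returns gapped rows in the same order as `seqs`."""
    k = len(seqs)
    # 1. Pick as center the sequence with the highest total pairwise score.
    totals = [0] * k
    for i, j in combinations(range(k), 2):
        s = needleman_wunsch(seqs[i], seqs[j])[0]
        totals[i] += s
        totals[j] += s
    c = max(range(k), key=lambda i: totals[i])
    order = [c] + [i for i in range(k) if i != c]

    # 2. Align every other sequence to the center and merge the result,
    #    keeping any gap already placed in the center ("once a gap, always a gap").
    rows = [seqs[c]]  # rows[0] is always the (gapped) center
    for idx in order[1:]:
        _, pair_center, pair_other = needleman_wunsch(seqs[c], seqs[idx])
        merged = [[] for _ in rows]
        new_row = []
        p = q = 0  # p walks the current MSA, q walks the new pairwise alignment
        while p < len(rows[0]) or q < len(pair_center):
            msa_gap = p < len(rows[0]) and rows[0][p] == "-"
            pair_gap = q < len(pair_center) and pair_center[q] == "-"
            if pair_gap and not msa_gap:
                # New gap in the center: open a gap column in every existing row.
                for col in merged:
                    col.append("-")
                new_row.append(pair_other[q]); q += 1
            elif msa_gap and not pair_gap:
                # Existing gap column the new sequence does not use.
                for r, col in zip(rows, merged):
                    col.append(r[p])
                new_row.append("-"); p += 1
            else:
                # Same center residue (or matching gaps): copy the column.
                for r, col in zip(rows, merged):
                    col.append(r[p])
                new_row.append(pair_other[q]); p += 1; q += 1
        rows = ["".join(col) for col in merged] + ["".join(new_row)]

    # 3. Restore the input order.
    result = [None] * k
    for row, idx in zip(rows, order):
        result[idx] = row
    return result


if __name__ == "__main__":
    for row in star_align(["ACGT", "ACT", "AGT"]):
        print(row)
```

The sketch keeps the pairwise aligner and the merge step as separate functions, which loosely mirrors the idea of reusing a pairwise component inside a multiple alignment algorithm; how the paper actually partitions its components is not shown here.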


Instability in progressive multiple sequence alignment algorithms

Algorithms for Molecular Biology ◽ 10.1186/s13015-015-0057-1 ◽ 2015 ◽ Vol 10 (1) ◽ Cited By ~ 13

Author(s): Kieran Boyce ◽ Fabian Sievers ◽ Desmond G. Higgins

Keyword(s): Sequence Alignment ◽ Multiple Sequence Alignment ◽ Multiple Sequence ◽ Alignment Algorithms ◽ Progressive Multiple Sequence Alignment


Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction

Systematic Biology ◽ 10.1093/sysbio/syy036 ◽ 2018 ◽ Vol 68 (1) ◽ pp. 117-130 ◽ Cited By ~ 9

Author(s): Haim Ashkenazy ◽ Itamar Sela ◽ Eli Levy Karin ◽ Giddy Landan ◽ Tal Pupko

Keyword(s): Sequence Alignment ◽ Multiple Sequence Alignment ◽ Sequence Data ◽ Phylogenetic Signal ◽ Large Set ◽ Multiple Sequence ◽ Extra Effort ◽ Alignment Algorithms ◽ Tree Inference ◽ Alignment Errors

Abstract: The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet inferred MSAs were shown to be inaccurate, and alignment errors reduce tree inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work we explore an approach in which, instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees compared to 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether or not the extra effort of computing a SuperMSA and inferring a tree from it is beneficial. Based on these assessments, we expect our methodology to be useful for many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
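The concatenation step described in the abstract can be sketched in a few lines of Python. This is only an illustration of joining alternative alignments column-wise: the function name `concatenate_msas` and the dict-of-rows representation are assumptions, and how the alternative MSAs are generated (the part done by the GUIDANCE server) is outside the sketch.

```python
def concatenate_msas(msas):
    """Concatenate alternative MSAs of the same sequences into one SuperMSA.

    Each MSA is a dict mapping a sequence name to its gapped row.
    All MSAs must cover the same names; rows within one MSA must have equal length.
    """
    if not msas:
        raise ValueError("need at least one alignment")
    names = list(msas[0])
    for msa in msas:
        if set(msa) != set(names):
            raise ValueError("all alignments must contain the same sequence names")
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows within an alignment must have equal length")
    # Append the columns of each alternative alignment, one block after another.
    return {name: "".join(msa[name] for msa in msas) for name in names}


if __name__ == "__main__":
    # Two hypothetical alternative alignments of the same two sequences.
    alt1 = {"s1": "AC-GT", "s2": "ACGGT"}
    alt2 = {"s1": "ACG-T", "s2": "ACGGT"}
    for name, row in concatenate_msas([alt1, alt2]).items():
        print(name, row)
```

Because each alternative alignment contributes its own block of columns, a column that appears in only some of the alternatives still reaches the tree-inference step, which is the effect the abstract describes.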
