Kalign 3: multiple sequence alignment of large datasets

Author(s):  
Timo Lassmann

Motivation: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences exceed Kalign's original design specifications. Here we present a completely rewritten and updated version to meet current and future alignment challenges. Results: Kalign now uses a SIMD (single instruction, multiple data) accelerated version of the bit-parallel Gene Myers algorithm to estimate pairwise distances, and adopts a sequence embedding strategy and the bisecting K-means algorithm to rapidly construct guide trees for thousands of sequences. The new version maintains high alignment accuracy on both protein and nucleotide alignments and scales better than other MSA tools. Availability and implementation: The source code of Kalign and code to reproduce the results are available at https://github.com/timolassmann/kalign. Contact: [email protected]
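To make the bit-parallel idea concrete, the following is a minimal sketch of Myers' bit-vector edit distance in the global-distance formulation usually attributed to Hyyrö. It is not Kalign's code: it uses plain Python integers instead of SIMD registers, and the function name `myers_edit_distance` is an illustrative assumption.

```python
def myers_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b using bit-vector column updates.

    Each column of the dynamic programming matrix is encoded as two bit
    vectors (pv, mv) that mark where the vertical difference is +1 or -1.
    """
    if not a:
        return len(b)
    m = len(a)
    mask = (1 << m) - 1      # keeps every vector at m bits
    high = 1 << (m - 1)      # bit for the last row of the column
    # peq[ch] has bit i set when a[i] == ch
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m
    for ch in b:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # horizontal difference +1
        mh = pv & xh                    # horizontal difference -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in a 1: the top boundary row grows by one per column
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return score
```

Because Python integers have arbitrary precision, the sketch needs no blocking for long sequences; a SIMD implementation would instead process fixed-width words.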

2015
Vol 16 (Suppl 5)
pp. S4
Author(s):
Qing Zhan
Yongtao Ye
Tak-Wah Lam
Siu-Ming Yiu
Yadong Wang
...

2010
Vol 5 (1)
pp. 21
Author(s):
Gordon Blackshields
Fabian Sievers
Weifeng Shi
Andreas Wilm
Desmond G Higgins

2019
Author(s):
Ivo Baar
Lukas Hübner
Peter Oettig
Adrian Zapletal
Sebastian Schlag
...

The so-called site repeats (SR) technique can be used to accelerate the widely used phylogenetic likelihood function (PLF) by identifying identical patterns among multiple sequence alignment (MSA) sites, thereby omitting redundant calculations and saving memory. However, this complicates the optimal data distribution of MSA sites in parallel likelihood calculations, as the cost of computing the likelihood for individual sites strongly depends on the sites-to-cores assignment. We show that finding a 'good' sites-to-cores assignment can be modeled as a hypergraph partitioning problem, more specifically, a specific instance of the so-called judicious hypergraph partitioning problem. We initially develop, parallelize, and make available HyperPhylo, an efficient open-source implementation for this flavor of judicious partitioning where all vertices have the same degree. Using empirical MSA data, we then show that sites-to-cores assignments computed via HyperPhylo are substantially better than those obtained via a previous naïve approach for phylogenetic data distribution under SRs.
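The dependence of per-core cost on the sites-to-cores assignment can be illustrated with a small sketch. It is a simplification under stated assumptions: repeats are detected for a single fixed set of taxa rather than per inner node of a tree, the cost of a core is taken to be the number of distinct site patterns it must evaluate, and the names `unique_patterns` and `assignment_cost` are hypothetical, not part of HyperPhylo.

```python
def unique_patterns(msa, taxa, sites):
    """Count distinct column patterns over `taxa` among the given sites.

    msa maps a taxon name to its aligned sequence; sites is a list of
    column indices. Sites with identical patterns need only one computation.
    """
    return len({tuple(msa[t][s] for t in taxa) for s in sites})


def assignment_cost(msa, taxa, assignment):
    """Work per core, assuming one likelihood computation per distinct pattern.

    assignment is a list with one list of site indices per core.
    """
    return [unique_patterns(msa, taxa, sites) for sites in assignment]
```

For a toy alignment with sequences `"AACC"` and `"GGTT"`, placing sites 0 and 1 on one core and sites 2 and 3 on the other leaves each core with a single distinct pattern, whereas splitting them as 0, 2 and 1, 3 doubles the work on each core.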


2021
Vol 17 (10)
pp. e1008950
Author(s):  
Vladimir Smirnov

Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy. We recently introduced MAGUS, a new state-of-the-art method for aligning large numbers of sequences. In this paper, we present a comprehensive set of enhancements that allow MAGUS to align vastly larger datasets with greater speed. We compare MAGUS to other leading alignment methods on datasets of up to one million sequences. Our results demonstrate the advantages of MAGUS over other alignment software in both accuracy and speed. MAGUS is freely available in open-source form at https://github.com/vlasmirnov/MAGUS.


Author(s):  
Jacek Błażewicz
Piotr Formanowicz
Paweł Wojciechowski

Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark

BAliBASE is one of the most widely used benchmarks for multiple sequence alignment programs. The accuracy of alignment methods is measured by bali_score, an application provided together with the database. The standard accuracy measures are the Sum of Pairs (SP) and the Total Column (TC). We have found that, for non-core block columns, results calculated by bali_score are different from those obtained on the basis of the formal definitions of the measures. We do not claim that one of these measures is better than the other, but they are definitely different. Such a situation can be the source of confusion when alignments obtained using various methods are compared. Therefore, we propose a new nomenclature for the measures of the quality of multiple sequence alignments to distinguish which one was actually calculated. Moreover, we have found that the occurrence of a gap in some column in the first sequence of the reference alignment causes column discarding.
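As a point of reference, the two measures can be sketched from their commonly stated formal definitions: SP as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC as the fraction of reference columns reproduced exactly. The sketch scores every reference column, so it deliberately does not reproduce bali_score's handling of core blocks or gaps; all function names are illustrative assumptions.

```python
from itertools import combinations


def columns_as_residues(alignment):
    """Describe each column as a tuple of residue indices (None for a gap).

    alignment is a list of equal-length strings with '-' as the gap symbol.
    """
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for s, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Set of residue pairs ((seq, residue), (seq, residue)) sharing a column."""
    pairs = set()
    for col in columns:
        present = [(s, r) for s, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(columns_as_residues(reference))
    test_pairs = aligned_pairs(columns_as_residues(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(test, reference):
    """Fraction of reference columns that appear unchanged in the test alignment."""
    ref_cols = columns_as_residues(reference)
    test_cols = set(columns_as_residues(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)
```

Both functions assume the test and reference alignments contain the same sequences in the same order and that the reference has at least one aligned residue pair.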


Author(s):  
Yarong Li

Multiple sequence alignment methods are algorithmic solutions for aligning evolutionarily related sequences while accounting for evolutionary events such as mutations, insertions, deletions, and rearrangements under certain conditions. In this article, we propose a sequence alignment method that uses Q-learning based on the Actor-Critic model. We transform the sequence alignment problem into an agent's autonomous learning process, in which the reward of each possible next action is calculated along with the cumulative reward of the entire process. The results show that the proposed method performs better than the genetic algorithm and the dynamic programming method.
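The framing of alignment as a sequence of rewarded actions can be sketched for the pairwise case. This is only the environment side, with no learner, and it is not the authors' model: the action encoding, the score values (match 1, mismatch -1, gap -2), and the function names are all assumptions chosen for illustration.

```python
# Hypothetical action encoding for a pairwise alignment agent
ALIGN, GAP_A, GAP_B = 0, 1, 2


def step(a, b, state, action, match=1, mismatch=-1, gap=-2):
    """Apply one action at state (i, j) and return (next_state, reward).

    ALIGN pairs a[i] with b[j]; GAP_A inserts a gap in a (consuming b[j]);
    GAP_B inserts a gap in b (consuming a[i]).
    """
    i, j = state
    if action == ALIGN:
        if i >= len(a) or j >= len(b):
            raise ValueError("ALIGN is not available at this state")
        reward = match if a[i] == b[j] else mismatch
        return (i + 1, j + 1), reward
    if action == GAP_A:
        if j >= len(b):
            raise ValueError("GAP_A is not available at this state")
        return (i, j + 1), gap
    if action == GAP_B:
        if i >= len(a):
            raise ValueError("GAP_B is not available at this state")
        return (i + 1, j), gap
    raise ValueError("unknown action")


def episode_return(a, b, actions, **scores):
    """Cumulative reward of a complete alignment given as a list of actions."""
    state, total = (0, 0), 0
    for action in actions:
        state, reward = step(a, b, state, action, **scores)
        total += reward
    if state != (len(a), len(b)):
        raise ValueError("the actions do not consume both sequences")
    return total
```

A learning agent would choose actions to maximize this cumulative reward; with these scores the quantity equals the alignment score that dynamic programming optimizes directly.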

