scholarly journals Block aligner: fast and flexible pairwise sequence alignment with SIMD-accelerated adaptive blocks

2021 ◽  
Author(s):  
Daniel Liu ◽  
Martin Steinegger

Background: The Smith-Waterman-Gotoh alignment algorithm is the most popular method for comparing biological sequences. Recently, Single Instruction Multiple Data methods have been used to speed up alignment. However, these algorithms have limitations like being optimized for specific scoring schemes, cannot handle large gaps, or require quadratic time computation. Results: We propose a new algorithm called block aligner for aligning nucleotide and protein sequences. It greedily shifts and grows a block of computed scores to span large gaps within the aligned sequences. This greedy approach is able to only compute a fraction of the DP matrix. In exchange for these features, there is no guarantee that the computed scores are accurate compared to full DP. However, in our experiments, we show that block aligner performs accurately on various realistic datasets, and it is up to 9 times faster than the popular Farrar's algorithm for protein global alignments. Conclusions: Our algorithm has applications in computing global alignments and X-drop alignments on proteins and long reads. It is available as a Rust library at https://github.com/Daniel-Liu-c0deb0t/block-aligner.

2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Khaled Benkrid ◽  
Ali Akoglu ◽  
Cheng Ling ◽  
Yang Song ◽  
Ying Liu ◽  
...  

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on performance per watt criterion and perform better than all other platforms on performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.


2020 ◽  
Author(s):  
Yan Gao ◽  
Yongzhuang Liu ◽  
Yanmei Ma ◽  
Bo Liu ◽  
Yadong Wang ◽  
...  

AbstractSummaryPartial order alignment, which aligns a sequence to a directed acyclic graph, is now frequently used as a key component in long-read error correction and assembly. We present abPOA (adaptive banded Partial Order Alignment), a Single Instruction Multiple Data (SIMD) based C library for fast partial order alignment using adaptive banded dynamic programming. It can work as a stand-alone multiple sequence alignment and consensus calling tool or be easily integrated into any long-read error correction and assembly workflow. Compared to a state-of-the-art tool (SPOA), abPOA is up to 15 times faster with a comparable alignment accuracy.Availability and implementationabPOA is implemented in C. A stand-alone tool and a C/Python software interface are freely available at https://github.com/yangao07/[email protected] or [email protected]


2020 ◽  
pp. 565-579 ◽  
Author(s):  
Mohamed Issa ◽  
Aboul Ella Hassanien

Sequence alignment is a vital process in many biological applications such as Phylogenetic trees construction, DNA fragment assembly and structure/function prediction. Two kinds of alignment are pairwise alignment which align two sequences and Multiple Sequence alignment (MSA) that align sequences more than two. The accurate method of alignment is based on Dynamic Programming (DP) approach which suffering from increasing time exponentially with increasing the length and the number of the aligned sequences. Stochastic or meta-heuristics techniques speed up alignment algorithm but with near optimal alignment accuracy not as that of DP. Hence, This chapter aims to review the recent development of MSA using meta-heuristics algorithms. In addition, two recent techniques are focused in more deep: the first is Fragmented protein sequence alignment using two-layer particle swarm optimization (FTLPSO). The second is Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm (MO-BFO).


2011 ◽  
Vol 4 (3) ◽  
pp. 835-844 ◽  
Author(s):  
P. Hanappe ◽  
A. Beurivé ◽  
F. Laguzet ◽  
L. Steels ◽  
N. Bellouin ◽  
...  

Abstract. We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL's Synergistic Processing Element than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60 % of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.


2011 ◽  
pp. 1819-1819
Author(s):  
Jack Dongarra ◽  
Piotr Luszczek ◽  
Felix Wolf ◽  
Jesper Larsson Träff ◽  
Patrice Quinton ◽  
...  

2001 ◽  
Vol 105 (1050) ◽  
pp. 419-426 ◽  
Author(s):  
G. Barakos ◽  
M. Vahdati ◽  
A.I. Sayma ◽  
C. Bréard ◽  
M. Imregun

Abstract This paper presents the development and validation of a parallel unsteady flow and aeroelasticity code for large-scale numerical models used in turbo machinery applications. The work is based on an existing unstructured Navier-Stokes solver developed over the past ten years by the Aeroelasticity Research Group at Imperial College Vibration University Technology Centre. The single-process multiple-data paradigm was adopted for the parallelisation of the solver and several validation cases were considered. The computational mesh was divided into several sub-sections using a domain decomposition technique. The performance and numerical accuracy of the parallel solver was validated across several computer platforms for various problem sizes. In cases where the solution could be obtained on a single CPU, the serial and parallel versions of the code were found to produce identical results. Studies on up to 32 CPUs showed varying levels of parallelisation efficiency, an almost linear speed-up being obtained in some cases. Finally, an industrial configuration, a 17 blade row turbine with a 47 million point mesh, was discussed to illustrate the potential of the proposed large-scale modelling methodology.


2018 ◽  
Vol 232 ◽  
pp. 01046
Author(s):  
Wan Qiao ◽  
Dake Liu

In this paper, we propose a flexible scalable BP Polar decoding application-specific instruction set processor (PASIP) that supports multiple code lengths (64 to 4096) and any code rates. High throughputs and sufficient programmability are achieved by the single-instruction-multiple-data (SIMD) based architecture and specially designed Polar decoding acceleration instructions. The synthesis result using 65 nm CMOS technology shows that the total area of PASIP is 2.71 mm2. PASIP provides the maximum throughput of 1563 Mbps (for N = 1024) at the work frequency of 400MHz. The comparison with state-of-art Polar decoders reveals PASIP’s high area efficiency.


VLSI Design ◽  
2007 ◽  
Vol 2007 ◽  
pp. 1-7 ◽  
Author(s):  
Zheng Shen ◽  
Hu He ◽  
Yanjun Zhang ◽  
Yihe Sun

This paper describes a novel video specific instruction set architecture for ASIP design. With single instruction multiple data (SIMD) instructions, two destination modes, and video specific instructions, an instruction set architecture is introduced to enhance the performance for video applications. Furthermore, we quantify the improvement on H.263 encoding. In this paper, we evaluate and compare the performance of VS-ISA, other DSPs (digital signal processors), and conventional SIMD media extensions in the context of video coding. Our evaluation results show that VS-ISA improves the processor's performance by approximately 5x on H.263 encoding, and VS-ISA outperforms other architectures by 1.6x to 8.57x in computing IDCT.


Sign in / Sign up

Export Citation Format

Share Document