Block aligner: fast and flexible pairwise sequence alignment with SIMD-accelerated adaptive blocks

Mapping Intimacies

10.1101/2021.11.08.467651

2021

Author(s): Daniel Liu, Martin Steinegger

Keyword(s): Protein Sequences, Single Instruction Multiple Data, Alignment Algorithm, Biological Sequences, Pairwise Sequence Alignment, Popular Method, Multiple Data, Long Reads, Speed Up, Scoring Schemes

Background: The Smith-Waterman-Gotoh alignment algorithm is the most popular method for comparing biological sequences. Recently, Single Instruction Multiple Data (SIMD) methods have been used to speed up alignment. However, these algorithms have limitations, such as being optimized for specific scoring schemes, being unable to handle large gaps, or requiring quadratic-time computation. Results: We propose a new algorithm called block aligner for aligning nucleotide and protein sequences. It greedily shifts and grows a block of computed scores to span large gaps within the aligned sequences. This greedy approach computes only a fraction of the DP matrix. In exchange for these features, there is no guarantee that the computed scores match those of full DP. However, our experiments show that block aligner performs accurately on various realistic datasets, and it is up to 9 times faster than Farrar's popular algorithm for protein global alignments. Conclusions: Our algorithm has applications in computing global alignments and X-drop alignments on proteins and long reads. It is available as a Rust library at https://github.com/Daniel-Liu-c0deb0t/block-aligner.
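For reference, the full-DP baseline that block aligner approximates can be sketched as Gotoh's three-matrix recurrence for global alignment with affine gaps. This minimal Python sketch fills every cell of the quadratic DP matrix, which is exactly the cost block aligner avoids by computing only a shifting, growing block. The function name, the scores, and the gap convention (a gap of length k scores `gap_open + (k - 1) * gap_extend`) are illustrative assumptions, not block aligner's API.

```python
NEG_INF = float("-inf")


def gotoh_global(a: str, b: str, match: int = 2, mismatch: int = -1,
                 gap_open: int = -3, gap_extend: int = -1) -> float:
    """Global alignment score with affine gaps (full Gotoh DP, O(len(a) * len(b))).

    A gap of length k scores gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # H: best score of a[:i] vs b[:j].
    # E: best score of alignments ending with a gap in `a` (horizontal move).
    # F: best score of alignments ending with a gap in `b` (vertical move).
    H = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    for j in range(1, m + 1):  # first row: b[:j] aligned to one gap
        E[0][j] = H[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):  # first column: a[:i] aligned to one gap
        F[i][0] = H[i][0] = gap_open + (i - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend the running gap.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j])
    return H[n][m]
```

For example, `gotoh_global("AAAA", "AA")` prefers a single gap of length 2 (score -4) over two separate gaps (score -6), so it returns 0. This preference for long gaps is why affine scoring matters for the large gaps block aligner targets.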


High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

International Journal of Reconfigurable Computing

10.1155/2012/752910

2012

Vol 2012

pp. 1-15

Cited By ~ 23

Author(s): Khaled Benkrid, Ali Akoglu, Cheng Ling, Yang Song, Ying Liu, ...

Keyword(s): Sequence Alignment, Reconfigurable Computing, High Performance, General Purpose, Alignment Algorithm, Pairwise Sequence Alignment, Performance Per Watt, Order Of Magnitude, Speed Up, General Purpose Processors

This paper explores the pros and cons of reconfigurable computing, in the form of FPGAs, for high-performance, efficient computing. In particular, the paper presents the results of a comparative study of three acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as the base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion, and also perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve a speed-up of at least two orders of magnitude over general-purpose processors and at least one order of magnitude over domain-specific technologies such as GPUs.
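Hardware Smith-Waterman accelerators, such as FPGA systolic arrays, commonly exploit the fact that all cells on one anti-diagonal of the DP matrix depend only on the two previous anti-diagonals. As a result, those cells can be computed in parallel. The minimal Python sketch below evaluates the local-alignment recurrence in that wavefront order. It uses a linear gap penalty and illustrative scores, and the function name is hypothetical. It is not any of the implementations compared in the paper.

```python
from typing import List


def sw_antidiagonal(a: str, b: str, match: int = 2, mismatch: int = -1,
                    gap: int = -2) -> int:
    """Smith-Waterman local alignment score, filled anti-diagonal by anti-diagonal."""
    n, m = len(a), len(b)
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Anti-diagonal d holds the cells (i, j) with i + j == d.
    for d in range(2, n + m + 1):
        # Every cell in this inner loop reads only diagonals d-1 and d-2, so the
        # loop body could run on one processing element per cell simultaneously.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                       # start a new local alignment
                          H[i - 1][j - 1] + s,     # (mis)match
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            best = max(best, H[i][j])
    return best
```

The result is identical to the usual row-by-row fill. Only the evaluation order changes, and that order is what lets an array of processing elements work on a whole anti-diagonal per clock step.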


abPOA: an SIMD-based C library for fast partial order alignment using adaptive band

10.1101/2020.05.07.083196

2020

Author(s): Yan Gao, Yongzhuang Liu, Yanmei Ma, Bo Liu, Yadong Wang, ...

Keyword(s): Error Correction, Partial Order, Directed Acyclic Graph, State Of The Art, Single Instruction Multiple Data, Multiple Sequence, Software Interface, Multiple Data, Long Read, Read Error Correction

Summary: Partial order alignment, which aligns a sequence to a directed acyclic graph, is now frequently used as a key component in long-read error correction and assembly. We present abPOA (adaptive banded Partial Order Alignment), a Single Instruction Multiple Data (SIMD)-based C library for fast partial order alignment using adaptive banded dynamic programming. It can work as a stand-alone multiple sequence alignment and consensus calling tool, or it can be easily integrated into any long-read error correction and assembly workflow. Compared to a state-of-the-art tool (SPOA), abPOA is up to 15 times faster with comparable alignment accuracy. Availability and implementation: abPOA is implemented in C. A stand-alone tool and a C/Python software interface are freely available at https://github.com/yangao07/.
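Partial order alignment generalizes pairwise DP by giving each graph node its own DP row. A node's row is computed from the rows of all of its predecessors, so a branch point simply takes the best score over the incoming paths. The minimal Python sketch below computes the global alignment score with a linear gap penalty. It omits abPOA's adaptive band, SIMD vectorization, and traceback. The function name and the graph encoding (topologically ordered nodes with predecessor lists) are hypothetical choices for illustration.

```python
from typing import List


def poa_global_score(labels: List[str], preds: List[List[int]], seq: str,
                     match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global score for aligning `seq` to a DAG (linear gap penalty).

    Nodes are numbered in topological order: labels[v] is the base stored at
    node v and preds[v] lists its predecessors (all smaller than v). Nodes with
    no predecessors start from a virtual source; nodes with no successors are
    the allowed end points. Requires at least one node.
    """
    m = len(seq)
    # DP row of the virtual source: seq[:j] aligned entirely to gaps.
    source = [j * gap for j in range(m + 1)]
    rows: List[List[int]] = []
    for v, base in enumerate(labels):
        prev = [rows[u] for u in preds[v]] or [source]
        row = [max(r[0] for r in prev) + gap]  # node v aligned to a gap
        for j in range(1, m + 1):
            s = match if base == seq[j - 1] else mismatch
            # Best over all predecessors: (mis)match, or skip node v (deletion).
            from_preds = max(max(r[j - 1] + s, r[j] + gap) for r in prev)
            row.append(max(from_preds, row[j - 1] + gap))  # seq[j-1] inserted
        rows.append(row)
    has_succ = {u for p in preds for u in p}
    return max(rows[v][m] for v in range(len(labels)) if v not in has_succ)
```

As an example, a graph that merges the reads "ACGT" and "ATGT" (nodes A, C, T, G, T, with C and T both between A and G) aligns either read with a perfect score of 4. On a linear chain, the function reduces to ordinary Needleman-Wunsch.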


MTRAP: Pairwise sequence alignment algorithm by a new measure based on transition probability between two consecutive pairs of residues

BMC Bioinformatics

10.1186/1471-2105-11-235

2010

Vol 11 (1)

Cited By ~ 6

Author(s): Toshihide Hara, Keiko Sato, Masanori Ohya

Keyword(s): Sequence Alignment, Transition Probability, Alignment Algorithm, Pairwise Sequence Alignment, Sequence Alignment Algorithm


Multiple Sequence Alignment Optimization Using Meta-Heuristic Techniques

Data Analytics in Medicine

10.4018/978-1-7998-1204-3.ch031

2020

pp. 565-579

Cited By ~ 1

Author(s): Mohamed Issa, Aboul Ella Hassanien

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Phylogenetic Trees, Pairwise Alignment, Accurate Method, Alignment Algorithm, Bacterial Foraging Optimization, Multiple Sequence, Speed Up, DNA Fragment Assembly

Sequence alignment is a vital process in many biological applications, such as phylogenetic tree construction, DNA fragment assembly, and structure/function prediction. There are two kinds of alignment: pairwise alignment, which aligns two sequences, and multiple sequence alignment (MSA), which aligns more than two sequences. The most accurate alignment method is based on dynamic programming (DP), whose running time grows with the length of the sequences and exponentially with the number of aligned sequences. Stochastic or meta-heuristic techniques speed up alignment, but their accuracy is only near-optimal and does not match that of DP. This chapter therefore reviews recent developments in MSA using meta-heuristic algorithms. In addition, two recent techniques are examined in more depth. The first is fragmented protein sequence alignment using two-layer particle swarm optimization (FTLPSO). The second is multiple sequence alignment using a multi-objective bacterial foraging optimization algorithm (MO-BFO).
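The growth of exact DP can be made concrete with a standard cell count. This is a back-of-the-envelope sketch that assumes the DP runs over the full k-dimensional lattice for k sequences of lengths L_1, ..., L_k, and that it ignores the extra cost of scoring each column. The chapter's own notation may differ. Each cell has one predecessor for every nonempty subset of sequences that advances by one residue, so there are 2^k - 1 predecessors per cell:

```latex
\#\text{cells} = \prod_{i=1}^{k} (L_i + 1), \qquad
T_{\mathrm{DP}} = O\!\left( (2^{k} - 1) \prod_{i=1}^{k} L_i \right)
               = O\!\left( 2^{k} L^{k} \right) \quad \text{when } L_1 = \dots = L_k = L .
```

For pairwise alignment (k = 2), this reduces to the familiar O(L^2). Each additional sequence multiplies the work by roughly 2L, which is what motivates meta-heuristic alternatives.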


FAMOUS, faster: using parallel computing techniques to accelerate the FAMOUS/HadCM3 climate model with a focus on the radiative transfer algorithm

Geoscientific Model Development

10.5194/gmd-4-835-2011

2011

Vol 4 (3)

pp. 835-844

Cited By ~ 10

Author(s): P. Hanappe, A. Beurivé, F. Laguzet, L. Steels, N. Bellouin, ...

Keyword(s): Climate Model, Processing Element, Atmospheric Radiation, Multiple Data, Fortran Code, Thread Pool, Speed Up, Single Data, Hardware Platforms, Air Column

Abstract. We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data (SIMD) operations. The modified algorithm runs more than 50 times faster on the Cell's Synergistic Processing Element than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
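Amdahl's law links the radiation-code speed-up to the whole-model speed-up. If a fraction p of the original run time is accelerated by a factor s and the rest is unchanged, the overall speed-up is 1 / ((1 - p) + p / s). The minimal Python sketch below encodes that formula. The function name is ours, chosen for illustration.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Whole-program speed-up when a fraction p of the original run time
    becomes s times faster and the remaining (1 - p) is unchanged."""
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)
```

Suppose p were exactly 0.6. In that case, the model could never run more than 1 / 0.4 = 2.5 times faster, no matter how fast the radiation code became (`amdahl_speedup(0.6, float("inf"))`). The radiation code alone would also need to run more than 6 times faster for the whole model to more than double its speed. The abstract gives p only as a lower bound ("more than 60%"), so these numbers are illustrative.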


SIMD (Single Instruction, Multiple Data) Machines

Encyclopedia of Parallel Computing

10.1007/978-0-387-09766-4_2440

2011

p. 1819

Author(s): Jack Dongarra, Piotr Luszczek, Felix Wolf, Jesper Larsson Träff, Patrice Quinton, ...

Keyword(s): Single Instruction Multiple Data, Multiple Data


A fully distributed unstructured Navier-Stokes solver for large-scale aeroelasticity computations

The Aeronautical Journal

10.1017/s0001924000012392

2001

Vol 105 (1050)

pp. 419-426

Cited By ~ 10

Author(s): G. Barakos, M. Vahdati, A.I. Sayma, C. Bréard, M. Imregun

Keyword(s): Large Scale, Numerical Models, Navier Stokes, Multiple Data, Computational Mesh, Blade Row, Scale Modelling, Modelling Methodology, Speed Up, Development And Validation

Abstract: This paper presents the development and validation of a parallel unsteady flow and aeroelasticity code for large-scale numerical models used in turbomachinery applications. The work is based on an existing unstructured Navier-Stokes solver developed over the past ten years by the Aeroelasticity Research Group at the Imperial College Vibration University Technology Centre. The single-program multiple-data (SPMD) paradigm was adopted to parallelise the solver, and several validation cases were considered. The computational mesh was divided into several sub-sections using a domain decomposition technique. The performance and numerical accuracy of the parallel solver were validated across several computer platforms for various problem sizes. In cases where the solution could be obtained on a single CPU, the serial and parallel versions of the code were found to produce identical results. Studies on up to 32 CPUs showed varying levels of parallelisation efficiency, with an almost linear speed-up obtained in some cases. Finally, an industrial configuration, a 17-blade-row turbine with a 47-million-point mesh, was discussed to illustrate the potential of the proposed large-scale modelling methodology.
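Domain decomposition helps explain why serial and parallel runs can agree exactly. Each subdomain updates only the cells it owns and reads neighbour values from a one-cell halo (ghost) layer, so every cell is computed from the same inputs with the same operations. The Python sketch below illustrates this with a 1-D Jacobi sweep rather than a Navier-Stokes solver. The function names are hypothetical, and the subdomains run one after another in a single process instead of on separate CPUs.

```python
from typing import List, Optional


def serial_step(u: List[float]) -> List[float]:
    """One Jacobi sweep on a 1-D grid; the two boundary values stay fixed."""
    n = len(u)
    return [u[g] if g in (0, n - 1) else (u[g - 1] + u[g + 1]) / 2
            for g in range(n)]


def decomposed_step(u: List[float], parts: int) -> List[float]:
    """The same sweep split into `parts` subdomains, SPMD style.

    Each subdomain sees only its owned cells plus one halo cell copied from
    each neighbour, then writes back the cells it owns.
    """
    n = len(u)
    bounds = [k * n // parts for k in range(parts + 1)]
    result: List[float] = []
    for k in range(parts):
        lo, hi = bounds[k], bounds[k + 1]          # owned cells: lo .. hi-1
        first = max(lo - 1, 0)                     # global index of local[0]
        local = u[first:min(hi + 1, n)]            # owned cells + halo cells
        for g in range(lo, hi):
            i = g - first                          # local index of cell g
            if g in (0, n - 1):
                result.append(local[i])            # fixed boundary value
            else:
                result.append((local[i - 1] + local[i + 1]) / 2)
    return result


def run(u: List[float], steps: int, parts: Optional[int] = None) -> List[float]:
    """Apply `steps` sweeps, serially (parts=None) or decomposed."""
    for _ in range(steps):
        u = serial_step(u) if parts is None else decomposed_step(u, parts)
    return u
```

For instance, `run(u, 100) == run(u, 100, parts=4)` holds exactly, not just approximately. Real solvers can lose this bit-for-bit agreement when they add global reductions whose summation order depends on the decomposition.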


A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures

IEEE International Conference on Acoustics Speech and Signal Processing

10.1109/icassp.2002.1005373

2002

Cited By ~ 9

Author(s): Rodriguez

Keyword(s): Single Instruction Multiple Data, Multiple Data


A scalable ASIP for BP Polar decoding with multiple code lengths

MATEC Web of Conferences

10.1051/matecconf/201823201046

2018

Vol 232

pp. 01046

Author(s): Wan Qiao, Dake Liu

Keyword(s): CMOS Technology, Single Instruction Multiple Data, Instruction Set, Maximum Throughput, Specific Instruction, Area Efficiency, Multiple Data, High Area, Multiple Code, Application Specific

In this paper, we propose a flexible, scalable belief propagation (BP) Polar decoding application-specific instruction set processor (PASIP) that supports multiple code lengths (64 to 4096) and any code rate. High throughput and sufficient programmability are achieved by a single-instruction-multiple-data (SIMD)-based architecture and specially designed Polar decoding acceleration instructions. Synthesis in 65 nm CMOS technology shows that the total area of PASIP is 2.71 mm². PASIP provides a maximum throughput of 1563 Mbps (for N = 1024) at an operating frequency of 400 MHz. Comparison with state-of-the-art Polar decoders shows PASIP's high area efficiency.
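BP polar decoders pass messages back and forth over a factor graph with log2(N) butterfly stages. This graph is the same structure that defines the polar transform x = u F^{⊗n} over GF(2), with F = [[1, 0], [1, 1]]. The minimal Python sketch below shows that butterfly structure on the encoder side. It omits the bit-reversal permutation that some definitions include, and it is not PASIP's decoder. The function name is hypothetical.

```python
from typing import List


def polar_transform(u: List[int]) -> List[int]:
    """Compute x = u * F^{(x)n} over GF(2) with F = [[1, 0], [1, 1]].

    Uses log2(N) butterfly stages and no bit-reversal permutation.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(u)
    half = 1
    while half < n:  # one pass per butterfly stage
        for start in range(0, n, 2 * half):
            # Within each block of size 2*half, the first half absorbs (XORs)
            # the corresponding bit of the second half.
            for i in range(start, start + half):
                x[i] ^= x[i + half]
        half *= 2
    return x
```

For N = 1024 there are 10 such stages, each with N/2 independent butterflies. That independence is the kind of regular, lane-parallel work a SIMD datapath can exploit. Because F² = I over GF(2), applying the transform twice returns the original vector.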


A Video Specific Instruction Set Architecture for ASIP design

VLSI Design

10.1155/2007/58431

2007

Vol 2007

pp. 1-7

Cited By ~ 5

Author(s): Zheng Shen, Hu He, Yanjun Zhang, Yihe Sun

Keyword(s): Video Coding, Digital Signal, Digital Signal Processors, Single Instruction Multiple Data, Instruction Set, Instruction Set Architecture, Specific Instruction, Multiple Data, Signal Processors

This paper describes a novel video-specific instruction set architecture (VS-ISA) for application-specific instruction set processor (ASIP) design. The instruction set architecture combines single instruction multiple data (SIMD) instructions, two destination modes, and video-specific instructions to enhance performance for video applications. Furthermore, we quantify the improvement on H.263 encoding. We evaluate and compare the performance of VS-ISA, other DSPs (digital signal processors), and conventional SIMD media extensions in the context of video coding. Our evaluation results show that VS-ISA improves the processor's performance by approximately 5x on H.263 encoding, and that VS-ISA outperforms other architectures by 1.6x to 8.57x in computing the IDCT.
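The core idea behind SIMD media instructions is to apply one operation to several packed pixels at once. This can be imitated in software with "SIMD within a register" (SWAR). The minimal Python sketch below packs four 8-bit pixels into a 32-bit word and adds two such words lane by lane with wrap-around, without letting carries leak between lanes. This generic SWAR technique is a stand-in for illustration. It does not reproduce VS-ISA's actual instructions, and the function names are hypothetical.

```python
from typing import List


def pack_u8x4(lanes: List[int]) -> int:
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for k, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * k)
    return word


def unpack_u8x4(word: int) -> List[int]:
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * k)) & 0xFF for k in range(4)]


def add_u8x4(a: int, b: int) -> int:
    """Lane-wise modulo-256 addition of two packed words.

    Summing only the low 7 bits of each lane can never carry into the next
    lane; the top bit of each lane is then fixed up with XOR.
    """
    low7 = 0x7F7F7F7F
    high = 0x80808080
    return (((a & low7) + (b & low7)) ^ ((a ^ b) & high)) & 0xFFFFFFFF
```

For example, `unpack_u8x4(add_u8x4(pack_u8x4([250, 1, 128, 255]), pack_u8x4([10, 2, 128, 1])))` gives `[4, 3, 0, 0]`, so four pixel additions are done in one word-wide operation. Dedicated SIMD hardware does the same without the masking overhead, and often with saturating rather than wrap-around arithmetic.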
