Graphics card processing: accelerating profile-profile alignment

2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Muhammad Hanif ◽  
Karl-Heinz Zimmermann

Alignment is the fundamental operation in molecular biology for comparing biomolecular sequences. The most widely used method for aligning groups of already-aligned sequences is based on the alignment of the profiles corresponding to the groups. We show that profile-profile alignment can be significantly sped up by general-purpose computing on a modern commodity graphics card. Wavefront and matrix-matrix product approaches for implementing profile-profile alignment on a graphics processor are analyzed. The average speed-up obtained is one order of magnitude even when overheads are considered. Thus the computational power of graphics cards can be exploited to develop improved solutions for multiple sequence alignment.
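The wavefront approach mentioned above exploits the fact that all cells on an anti-diagonal of the dynamic programming matrix are independent and can therefore be updated in parallel. Below is a minimal sketch of that idea, assuming a linear gap penalty and a simple dot-product column score; the function name and parameters are illustrative and do not reproduce the authors' GPU kernels or scoring scheme.

```python
# Minimal sketch (not the authors' implementation): profile-profile alignment
# scored cell-by-cell along anti-diagonals ("wavefronts"), the property that
# lets a GPU update all cells of a diagonal in parallel. Linear gap penalty
# and a dot-product column score are illustrative assumptions.
import numpy as np

def profile_align_wavefront(P, Q, gap=-2.0):
    """P, Q: (length, alphabet) column-frequency profiles; returns the DP score matrix."""
    n, m = P.shape[0], Q.shape[0]
    H = np.zeros((n + 1, m + 1))
    H[:, 0] = gap * np.arange(n + 1)
    H[0, :] = gap * np.arange(m + 1)
    S = P @ Q.T                              # column-vs-column substitution scores
    for d in range(2, n + m + 1):            # anti-diagonal index i + j = d
        i = np.arange(max(1, d - m), min(n, d - 1) + 1)
        j = d - i
        H[i, j] = np.maximum.reduce([
            H[i - 1, j - 1] + S[i - 1, j - 1],   # align column i with column j
            H[i - 1, j] + gap,                   # gap in Q
            H[i, j - 1] + gap,                   # gap in P
        ])
    return H

# toy usage: two random 4-letter profiles
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=6)
Q = rng.dirichlet(np.ones(4), size=5)
print(profile_align_wavefront(P, Q)[-1, -1])
```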

Universe ◽  
2021 ◽  
Vol 7 (7) ◽  
pp. 218
Author(s):  
Iuri La Rosa ◽  
Pia Astone ◽  
Sabrina D’Antonio ◽  
Sergio Frasca ◽  
Paola Leaci ◽  
...  

We present a new approach to searching for continuous gravitational waves (CWs) emitted by isolated rotating neutron stars, exploiting the high parallel computing efficiency and computational power of modern Graphics Processing Units (GPUs). Specifically, this paper describes the porting of one of the algorithms used to search for CW signals, the so-called FrequencyHough transform, to the TensorFlow framework. The new code has been fully tested and its performance on GPUs has been compared to that on a multicore CPU system of the same class, showing a factor-of-10 speed-up. This demonstrates that GPU programming with the general-purpose libraries of a high-level framework such as TensorFlow can significantly improve the performance of data analysis, opening new perspectives on wide-parameter searches for CWs.
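As a rough illustration of why a Hough-style transform maps well onto general-purpose GPU libraries, the toy sketch below accumulates time-frequency peaks into a (spindown, frequency) Hough map using vectorized TensorFlow scatter-adds. It is an assumption, not the pipeline's actual FrequencyHough code: the grid parameters, binning, and the function hough_map are invented for illustration.

```python
# Toy sketch (assumption, not the actual FrequencyHough implementation):
# each time-frequency peak (t_i, f_i) votes for all (spindown, intrinsic
# frequency) pairs consistent with f_i = f0 + fdot * (t_i - t_ref); the whole
# map is built by one vectorized scatter-add, which runs efficiently on GPU.
import tensorflow as tf

def hough_map(t, f, fdot_grid, f0_min, df0, n_f0, t_ref=0.0):
    """t, f: 1-D tensors of peak times (s) and frequencies (Hz);
    fdot_grid: 1-D tensor of trial spindowns (Hz/s); returns an (n_fdot, n_f0) count map."""
    n_fdot = int(fdot_grid.shape[0])
    n_peaks = int(t.shape[0])
    # candidate intrinsic frequency for every (spindown, peak) pair
    f0 = f[tf.newaxis, :] - fdot_grid[:, tf.newaxis] * (t[tf.newaxis, :] - t_ref)
    cols = tf.clip_by_value(
        tf.cast(tf.round((f0 - f0_min) / df0), tf.int32), 0, n_f0 - 1)
    rows = tf.broadcast_to(tf.range(n_fdot)[:, tf.newaxis], [n_fdot, n_peaks])
    idx = tf.stack([tf.reshape(rows, [-1]), tf.reshape(cols, [-1])], axis=1)
    counts = tf.ones([n_fdot * n_peaks], tf.float32)
    return tf.tensor_scatter_nd_add(tf.zeros([n_fdot, n_f0]), idx, counts)

# toy usage: peaks drifting as f = 100 Hz - 1e-7 Hz/s * t stack up in a single
# bin only along the row of the correct trial spindown
t = tf.constant([0.0, 1e4, 2e4, 3e4])
f = 100.0 - 1e-7 * t
hmap = hough_map(t, f, tf.constant([-2e-7, -1e-7, 0.0]), 99.9, 1e-3, 201)
print(tf.reduce_max(hmap))
```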


1993 ◽  
Vol 04 (04) ◽  
pp. 333-336 ◽  
Author(s):  
U. RAMACHER ◽  
W. RAAB ◽  
J. ANLAUF ◽  
U. HACHMANN ◽  
J. BEICHTER ◽  
...  

A general-purpose neurocomputer, SYNAPSE-1, which exhibits a multiprocessor and memory architecture, is presented. It offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude, including learning. The computational power is provided by a 2-dimensional systolic array of neural signal processors (NSPs). Since the weights are stored outside these NSPs, memory size and processing power can be adapted individually to the needs of the application. A neural algorithm programming language, embedded in C++, has been defined so that users can program the neurocomputer. In a benchmark test, the prototype of SYNAPSE-1 was 8000 times as fast as a standard workstation.
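For readers unfamiliar with systolic processing, the cycle-level toy below shows the dataflow idea behind a 2-D systolic array: each processing element multiplies the operands streaming past it and accumulates its own output. This is a generic output-stationary matrix-multiply sketch, not SYNAPSE-1's actual NSP dataflow.

```python
# Minimal sketch (assumption, not SYNAPSE-1's dataflow): a cycle-level
# simulation of an output-stationary 2-D systolic array computing C = A @ B.
# PE (i, j) sees operand pair (A[i, s], B[s, j]) at cycle t = i + j + s,
# multiplies and accumulates, and passes the operands onward.
import numpy as np

def systolic_matmul(A, B):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for t in range(n + m + k - 2):           # total pipeline latency in cycles
        for i in range(n):
            for j in range(m):
                s = t - i - j                # which operand pair reaches PE (i, j) now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```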


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Khaled Benkrid ◽  
Ali Akoglu ◽  
Cheng Ling ◽  
Yang Song ◽  
Ying Liu ◽  
...  

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study of three acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least a two-orders-of-magnitude speed-up over general-purpose processors and a one-order-of-magnitude speed-up over domain-specific technologies such as GPUs.
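For reference, the core Smith-Waterman recurrence that all of the compared platforms implement is sketched below. It uses a linear gap penalty for brevity (hardware implementations typically use an affine gap model) and stands in only for the general-purpose-processor baseline, not for any of the accelerated designs.

```python
# Reference-style sketch of the Smith-Waterman local alignment recurrence
# (linear gap penalty for brevity). Returns the best local alignment score.
import numpy as np

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(0,                      # local alignment may restart anywhere
                          H[i - 1, j - 1] + s,    # substitution
                          H[i - 1, j] + gap,      # deletion
                          H[i, j - 1] + gap)      # insertion
    return H.max()

print(smith_waterman("GGTTGACTA", "TGTTACGG"))
```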


Author(s):  
Zhuliang Yao ◽  
Shijie Cao ◽  
Wencong Xiao ◽  
Chen Zhang ◽  
Lanshun Nie

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation, but this often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, to achieve high model accuracy and efficient inference on commercial hardware. Our approach adapts to the high-parallelism properties of GPUs, showing strong potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to a 3.1x practical speedup for model inference on GPUs, while retaining the same high model accuracy as fine-grained sparsity.
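The sketch below shows one plausible reading of the balanced-sparsity idea as described in the abstract (not the authors' released code): each weight-matrix row is split into equal-sized blocks and the same number of weights is kept in every block, so GPU threads assigned to different blocks do identical amounts of work.

```python
# Illustrative sketch of balanced fine-grained pruning, based on the abstract:
# enforce the same sparsity inside every fixed-size block of a row, which
# balances the workload across parallel GPU threads.
import numpy as np

def balanced_prune(W, block_size=4, keep_per_block=1):
    out = W.copy()
    rows, cols = W.shape
    assert cols % block_size == 0
    for r in range(rows):
        for c0 in range(0, cols, block_size):
            block = out[r, c0:c0 + block_size]          # view into the row
            # zero everything except the largest-magnitude `keep_per_block` weights
            drop = np.argsort(np.abs(block))[:block_size - keep_per_block]
            block[drop] = 0.0
    return out

W = np.random.randn(2, 8)
print(balanced_prune(W, block_size=4, keep_per_block=1))
```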


2017 ◽  
Vol 43 (3) ◽  
pp. 29-42 ◽  
Author(s):  
Stanisław Gaca ◽  
Sylwia Pogodzińska

The article presents the issue of implementing speed management measures on regional roads, whose character requires solutions different from those used on national roads. The authors briefly describe speed management measures, the conditions for their implementation, and their effectiveness with reference to environmental conditions and road safety. The further part of the paper presents selected results of the authors' research into speeds on various road segments equipped with different speed management measures. The impact of local speed limits and traffic calming measures on drivers' behaviour in free-flow conditions was estimated. This research found that the introduction of local speed limits reduces the average speed and the 85th percentile speed by up to 11.9 km/h (14.4%) and 16.3 km/h (16.8%), respectively. These values are averages over the tested samples. Speed reduction depends strongly on the value of the limit and on local circumstances. Despite the speed reduction, the share of drivers who did not comply with speed limits was still high, ranging from 43% in the case of a 70 km/h limit up to 89% for a 40 km/h limit. As far as comprehensive traffic calming measures are concerned, the results show decreases in average speed and 85th percentile speed of up to 18.1 km/h and 20.8 km/h, respectively. For some road segments, however, the average speed and 85th percentile speed increased. This confirms that the effectiveness of speed management measures is strongly determined by local circumstances.


Geophysics ◽  
2021 ◽  
pp. 1-64
Author(s):  
Claudia Haindl ◽  
Kuangdai Leng ◽  
Tarje Nissen-Meyer

We present an adaptive approach to seismic modeling by which the computational cost of a 3D simulation can be reduced while retaining resolution and accuracy. This Azimuthal Complexity Adaptation (ACA) approach relies upon the inherent smoothness of wavefields around the azimuth of a source-centered cylindrical coordinate system. Azimuthal oversampling is thereby detected and eliminated. The ACA method has recently been introduced as part of AxiSEM3D, an open-source solver for global seismology. We employ a generalization of this solver which can handle local-scale Cartesian models, and which features a combination of an absorbing boundary condition and a sponge boundary with automated parameter tuning. The ACA method is benchmarked against an established 3D method using a model featuring bathymetry and a salt body. We obtain a close fit where the models are implemented equally in both solvers and an expectedly poor fit otherwise, with the ACA method running an order of magnitude faster than the classic 3D method. Further, we present maps of maximum azimuthal wavenumbers that are created to facilitate azimuthal complexity adaptation. We show how these maps can be interpreted in terms of the 3D complexity of the wavefield and in terms of seismic resolution. The expected performance limits of the ACA method for complex 3D structures are tested on the SEG/EAGE salt model. In this case, ACA still reduces the overall degrees of freedom by 92% compared to a complexity-blind AxiSEM3D simulation. In comparison with the reference 3D method, we again find a close fit and a speed-up of a factor of 7. We explore how the performance of ACA is affected by model smoothness by subjecting the SEG/EAGE salt model to Gaussian smoothing. This results in a doubling of the speed-up. ACA thus represents a convergent, versatile and efficient method for a variety of complex settings and scales.
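The azimuthal-complexity idea can be illustrated with a short, self-contained sketch (conceptual only, not AxiSEM3D code): sample a field around the azimuth, expand it in a Fourier series, and keep only as many azimuthal wavenumbers as are needed to represent the field to a given tolerance. The function required_wavenumber and its tolerance convention are illustrative assumptions.

```python
# Sketch of the concept behind azimuthal complexity adaptation: a field that
# is smooth around the azimuth needs only a few Fourier terms, so azimuthal
# oversampling can be detected and eliminated.
import numpy as np

def required_wavenumber(samples_phi, tol=1e-3):
    """samples_phi: field values on an equispaced azimuth grid; returns the
    smallest maximum azimuthal wavenumber that captures the field to tolerance tol."""
    coeffs = np.fft.rfft(samples_phi)
    energy = np.cumsum(np.abs(coeffs) ** 2)
    return int(np.searchsorted(energy, (1.0 - tol**2) * energy[-1]))

phi = np.linspace(0, 2 * np.pi, 256, endpoint=False)
smooth_field = np.cos(phi) + 0.1 * np.sin(3 * phi)   # azimuthally smooth wavefield sample
print(required_wavenumber(smooth_field))              # small -> few azimuthal terms needed
```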


2020 ◽  
pp. 565-579 ◽  
Author(s):  
Mohamed Issa ◽  
Aboul Ella Hassanien

Sequence alignment is a vital process in many biological applications, such as phylogenetic tree construction, DNA fragment assembly, and structure/function prediction. There are two kinds of alignment: pairwise alignment, which aligns two sequences, and multiple sequence alignment (MSA), which aligns more than two. The exact method of alignment is based on the dynamic programming (DP) approach, whose running time increases exponentially with the length and the number of the aligned sequences. Stochastic or meta-heuristic techniques speed up the alignment but achieve only near-optimal accuracy rather than the optimal accuracy of DP. Hence, this chapter reviews recent developments in MSA using meta-heuristic algorithms. In addition, two recent techniques are examined in more depth: the first is fragmented protein sequence alignment using two-layer particle swarm optimization (FTLPSO); the second is multiple sequence alignment using a multi-objective bacterial foraging optimization algorithm (MO-BFO).


2020 ◽  
Vol 221 (3) ◽  
pp. 1580-1590 ◽  
Author(s):  
M van Driel ◽  
C Boehm ◽  
L Krischer ◽  
M Afanasiev

An order of magnitude speed-up in finite-element modelling of wave propagation can be achieved by adapting the mesh to the anticipated space-dependent complexity and smoothness of the waves. This can be achieved by designing the mesh not only to respect the local wavelengths, but also the propagation direction of the waves depending on the source location, hence by anisotropic adaptive mesh refinement. Discrete gradients with respect to material properties as needed in full waveform inversion can still be computed exactly, but at greatly reduced computational cost. In order to do this, we explicitly distinguish the discretization of the model space from the discretization of the wavefield and derive the necessary expressions to map the discrete gradient into the model space. While the idea is applicable to any wave propagation problem that retains predictable smoothness in the solution, we highlight the idea of this approach with instructive 2-D examples of forward as well as inverse elastic wave propagation. Furthermore, we apply the method to 3-D global seismic wave simulations and demonstrate how meshes can be constructed that take advantage of high-order mappings from the reference coordinates of the finite elements to physical coordinates. Error level and speed-ups are estimated based on convergence tests with 1-D and 3-D models.
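The key bookkeeping step, mapping a gradient computed on the wavefield mesh back into the separately discretized model space, amounts to applying the transpose of the operator that interpolates model parameters onto the mesh. A minimal 1-D sketch follows, with a toy piecewise-linear operator P that is purely illustrative and not the paper's actual discretization.

```python
# Minimal sketch of the chain-rule step: if m_mesh = P @ m_model, then a
# gradient g_mesh computed on the wavefield mesh maps to the model grid as
# g_model = P.T @ g_mesh, which remains exact for any linear operator P.
import numpy as np

def interp_matrix(x_model, x_mesh):
    """Rows: mesh points; columns: model points; piecewise-linear interpolation weights."""
    P = np.zeros((len(x_mesh), len(x_model)))
    for i, x in enumerate(x_mesh):
        j = np.clip(np.searchsorted(x_model, x) - 1, 0, len(x_model) - 2)
        w = (x - x_model[j]) / (x_model[j + 1] - x_model[j])
        P[i, j], P[i, j + 1] = 1.0 - w, w
    return P

x_model = np.linspace(0.0, 1.0, 5)        # coarse model grid
x_mesh = np.linspace(0.0, 1.0, 17)        # finer wavefield mesh
P = interp_matrix(x_model, x_mesh)
g_mesh = np.sin(2 * np.pi * x_mesh)       # stand-in for a gradient on the wavefield mesh
g_model = P.T @ g_mesh                    # discrete gradient mapped into model space
print(g_model)
```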


2020 ◽  
Vol 86 (2) ◽  
Author(s):  
Christopher G. Albert ◽  
Sergei V. Kasilov ◽  
Winfried Kernbichler

Accelerated statistical computation of collisionless fusion alpha particle losses in stellarator configurations is presented based on direct guiding-centre orbit tracing. The approach relies on the combination of recently developed symplectic integrators in canonicalized magnetic flux coordinates and early classification into regular and chaotic orbit types. Only chaotic orbits have to be traced up to the end, as their behaviour is unpredictable. An implementation of this technique is provided in the code SIMPLE (symplectic integration methods for particle loss estimation, Albert et al., 2020b, doi:10.5281/zenodo.3666820). Reliable results were obtained for an ensemble of 1000 orbits in a quasi-isodynamic, a quasi-helical and a quasi-axisymmetric configuration. Overall, a computational speed-up of approximately one order of magnitude is achieved compared to direct integration via adaptive Runge-Kutta methods. This reduces run times to the range of typical magnetic equilibrium computations and makes direct alpha particle loss computation adequate for use within a stellarator optimization loop.
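As background to the integrator choice, the sketch below shows a kick-drift-kick leapfrog, the simplest member of the symplectic family, applied to a toy separable Hamiltonian (a pendulum) rather than to the guiding-centre equations solved by SIMPLE. The point is that the energy error stays bounded over very long traces, which is what makes fixed-step symplectic tracing competitive with adaptive Runge-Kutta integration.

```python
# Minimal sketch of a symplectic integrator (kick-drift-kick leapfrog) for a
# toy Hamiltonian H = p^2/2 + V(q); not SIMPLE's guiding-centre scheme.
import numpy as np

def leapfrog(q, p, dVdq, dt, n_steps):
    for _ in range(n_steps):
        p -= 0.5 * dt * dVdq(q)   # half kick
        q += dt * p               # drift
        p -= 0.5 * dt * dVdq(q)   # half kick
    return q, p

# pendulum: V(q) = 1 - cos(q); the energy error stays small and bounded
# even after many thousands of oscillation periods
q, p = np.float64(1.0), np.float64(0.0)
E0 = 0.5 * p**2 + (1 - np.cos(q))
q, p = leapfrog(q, p, np.sin, 0.05, 200_000)
print(abs(0.5 * p**2 + (1 - np.cos(q)) - E0))
```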


2020 ◽  
Vol 495 (4) ◽  
pp. 4306-4313 ◽  
Author(s):  
Michael Y Grudić ◽  
Philip F Hopkins

We describe a new adaptive time-step criterion for integrating gravitational motion, which uses the tidal tensor to estimate the local dynamical time-scale and scales the time-step proportionally. This provides a better candidate for a truly general-purpose gravitational time-step criterion than the usual prescription derived from the gravitational acceleration, which does not respect the equivalence principle, breaks down when $\boldsymbol {a}=0$, and does not obey the same dimensional scaling as the true time-scale of orbital motion. We implement the tidal time-step criterion in the simulation code gizmo, and examine controlled tests of collisionless galaxy and star cluster models, as well as galaxy merger simulations. The tidal criterion estimates the dynamical time faithfully, and generally provides a more efficient time-stepping scheme compared to an acceleration criterion. Specifically, the tidal criterion achieves order-of-magnitude smaller energy errors for the same number of force evaluations in potentials with inner profiles shallower than $\rho \propto r^{-1}$ (i.e. where $\boldsymbol {a}\rightarrow 0$), such as star clusters and cored galaxies. For a given problem these advantages must be weighed against the additional overhead of computing the tidal tensor on-the-fly, but in many cases this overhead is small.
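The criterion can be summarized in a few lines: the tidal tensor sets a local dynamical time of order the inverse square root of its norm, and the time-step is taken proportional to it through an accuracy parameter. The normalization in the sketch below is chosen for illustration only; see the paper for the exact constants used in gizmo.

```python
# Sketch of a tidal time-step criterion: the tidal tensor
# T_ij = -d^2(Phi)/dx_i dx_j sets a local dynamical time ~ ||T||^(-1/2),
# and the time-step scales with it via an accuracy parameter eta.
import numpy as np

def tidal_timestep(T, eta=0.01):
    """T: 3x3 tidal tensor at the particle position."""
    t_dyn = 1.0 / np.sqrt(np.linalg.norm(T))   # Frobenius norm sets the time-scale
    return np.sqrt(eta) * t_dyn

# point mass M at the origin, particle at radius r: the tidal tensor has
# eigenvalues (2GM/r^3, -GM/r^3, -GM/r^3), so dt scales as r^(3/2), i.e.
# like the orbital period, independent of the local acceleration magnitude
G, M, r = 1.0, 1.0, 2.0
T = G * M / r**3 * np.diag([2.0, -1.0, -1.0])
print(tidal_timestep(T))
```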

