HARNESSING THE POWER OF IDLE GPUS FOR ACCELERATION OF BIOLOGICAL SEQUENCE ALIGNMENT

2009 ◽  
Vol 19 (04) ◽  
pp. 513-533 ◽  
Author(s):  
FUMIHIKO INO ◽  
YUKI KOTANI ◽  
YUMA MUNEKAWA ◽  
KENICHI HAGIHARA

This paper presents a parallel system capable of accelerating biological sequence alignment on the graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in the office and home. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also present experimental results comparing our implementation with three alternative implementations running on a single GPU, a single CPU, or multiple CPUs. We find that a single non-dedicated GPU can provide almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications. In a dedicated environment, the GPU-accelerated code achieves five times the throughput of the CPU-based code. Furthermore, a linear speedup of 30.7X is observed on a 32-node cluster of dedicated GPUs. We also implement a compute unified device architecture (CUDA)-based algorithm to demonstrate further acceleration.
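The paper does not include source code; the following is a minimal sketch of the anti-diagonal parallelization commonly used for CUDA-based Smith-Waterman alignment, the scheme a CUDA port of this kind typically adopts. All names, the scoring constants, and the linear gap penalty are illustrative assumptions, not taken from the paper.

```cuda
// Minimal anti-diagonal Smith-Waterman sketch (linear gap penalty).
// All names and scoring parameters are illustrative, not from the paper.
#include <cuda_runtime.h>

#define MATCH     2
#define MISMATCH -1
#define GAP      -1

__device__ int max4(int a, int b, int c, int d) {
    return max(max(a, b), max(c, d));
}

// Scores one anti-diagonal d of the (m+1) x (n+1) matrix H. Cells on the
// same anti-diagonal have no data dependence, so each thread scores one
// cell; the host launches this kernel once per diagonal.
__global__ void swDiagonal(const char *query, int m,
                           const char *subject, int n,
                           int *H, int d)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // query index
    int j = d - i + 1;                                  // subject index
    if (i > m || j < 1 || j > n) return;

    int sub  = (query[i - 1] == subject[j - 1]) ? MATCH : MISMATCH;
    int diag = H[(i - 1) * (n + 1) + (j - 1)] + sub;
    int up   = H[(i - 1) * (n + 1) + j] + GAP;
    int left = H[i * (n + 1) + (j - 1)] + GAP;
    H[i * (n + 1) + j] = max4(0, diag, up, left);
}
```

The host loops over d = 1 … m + n − 1, launching the kernel once per anti-diagonal; the grid master only needs to ship sequences to workers and collect the resulting scores.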

2017 ◽  
Vol 14 (1) ◽  
pp. 789-795
Author(s):  
V Saveetha ◽  
S Sophia

Parallel data clustering uses algorithms and methods to extract knowledge from large databases in reasonable time on high-performance architectures. The computational challenge that growing data volumes pose to cluster analysis can be overcome by exploiting the power of these architectures. Recent advances in the parallel power of the Graphics Processing Unit enable low-cost, high-performance solutions for general-purpose applications. The Compute Unified Device Architecture programming model provides application programming interface methods to handle data efficiently on the Graphics Processing Unit for iterative clustering algorithms such as K-Means. Existing Graphics Processing Unit based K-Means algorithms focus largely on improving speedup and fail to address the high cost of transferring data between the Central Processing Unit and the Graphics Processing Unit. This paper proposes an efficient K-Means algorithm that reduces transfer time by introducing a novel approach to checking the convergence of the algorithm and by using pinned memory for direct access. The algorithm outperforms its counterparts by maximizing parallelism and exploiting the memory features. Its relative speedups and validity measure are higher than those of K-Means on the Graphics Processing Unit and K-Means using a flag on the Graphics Processing Unit. The proposed approach thus demonstrates that communication overhead in K-Means clustering can be reduced.
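The abstract describes, but does not list, the transfer-saving mechanism. Below is a minimal sketch of that idea, assuming a zero-copy convergence flag held in pinned (page-locked) host memory that the assignment kernel sets directly, so the host never copies the full label array back just to test for convergence. Every identifier here is illustrative, not taken from the paper.

```cuda
// Sketch: the per-iteration convergence check writes a single flag into
// mapped pinned host memory instead of triggering a bulk device-to-host
// copy of the assignments. Names are illustrative.
#include <cuda_runtime.h>

__global__ void assignPoints(const float *points, const float *centroids,
                             int *labels, int n, int k, int dim,
                             int *changed)   // device view of the pinned flag
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    int best = 0;
    float bestDist = 3.4e38f;
    for (int c = 0; c < k; ++c) {
        float dist = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = points[idx * dim + d] - centroids[c * dim + d];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = c; }
    }
    if (labels[idx] != best) {
        labels[idx] = best;
        *changed = 1;   // written straight through to host-visible memory
    }
}

// Host-side fragment:
//   int *hChanged, *dChanged;
//   cudaHostAlloc(&hChanged, sizeof(int), cudaHostAllocMapped);
//   cudaHostGetDevicePointer(&dChanged, hChanged, 0);
//   do {
//       *hChanged = 0;
//       assignPoints<<<grid, block>>>(dPts, dCent, dLbl, n, k, dim, dChanged);
//       cudaDeviceSynchronize();   // flag is valid after this point
//       /* update centroids on the GPU */
//   } while (*hChanged);
```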


Author(s):  
Franz Pichler ◽  
Gundolf Haase

A finite element code is developed in which all computationally expensive steps are performed on a graphics processing unit via the THRUST and PARALUTION libraries. The code focuses on the simulation of transient problems, where the repeated computations per time step dominate the computational cost. It is used to solve partial and ordinary differential equations as they arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by utilizing the graphics processing unit for every critical step is compared against the single-core and multi-threading solutions that the chosen libraries also support. In this way, a high total speed-up on the graphics processing unit is achieved without programming a single classical Compute Unified Device Architecture kernel.
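As an illustration of this kernel-free style, here is a minimal sketch of a per-time-step vector update expressed entirely through Thrust algorithms; the explicit Euler update rule and all names are assumptions for illustration, not code from the paper.

```cuda
// Sketch: the hot per-time-step operation runs on the GPU through
// thrust::transform, with no hand-written CUDA kernel anywhere.
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Explicit Euler step for an ODE system dT/dt = f(T): T <- T + dt * f.
struct axpy {
    const float dt;
    axpy(float dt_) : dt(dt_) {}
    __host__ __device__ float operator()(float t, float f) const {
        return t + dt * f;
    }
};

void timeStep(thrust::device_vector<float> &T,
              const thrust::device_vector<float> &f, float dt)
{
    // Element-wise fused update across the whole state vector on the GPU.
    thrust::transform(T.begin(), T.end(), f.begin(), T.begin(), axpy(dt));
}
```

Thrust's device backend can also be switched at compile time (e.g., to OpenMP), which mirrors the single-core and multi-threaded comparisons reported above.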


2011 ◽  
Vol 21 (01) ◽  
pp. 31-47 ◽  
Author(s):  
NOEL LOPES ◽  
BERNARDETE RIBEIRO

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at a relatively low cost. This makes it an ideal candidate for accelerating a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become increasingly demanding, parallel implementations of learning algorithms are crucial for useful applications. In particular, implementing Neural Networks (NNs) on GPUs can significantly reduce the long training times of the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms and describe the GPU kernels needed for this task. Results obtained on well-known benchmarks show faster training times and improved performance compared with the implementation on traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims at automatically finding high-quality NN-based solutions for a given problem.
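The paper describes its GPU kernels without listing them; below is a minimal sketch of one representative back-propagation kernel, the per-weight gradient-descent update of a fully connected layer under online (single-pattern) training. The names and the plain SGD rule (no momentum term, though MBP variants typically add one) are illustrative assumptions.

```cuda
// Sketch: one thread per weight applies the back-propagation update
// w_ij += eta * delta_j * x_i for a fully connected layer.
#include <cuda_runtime.h>

__global__ void updateWeights(float *weights, const float *inputs,
                              const float *deltas, int numIn, int numOut,
                              float learningRate)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // output neuron
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // input neuron
    if (i >= numIn || j >= numOut) return;

    // Row-major weight matrix of shape numIn x numOut.
    weights[i * numOut + j] += learningRate * deltas[j] * inputs[i];
}
```

A typical launch covers the weight matrix with a 2D grid, e.g. dim3 block(16, 16) and a grid of ceil(numOut/16) by ceil(numIn/16) blocks, so all weights of a layer update in parallel.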


2014 ◽  
Vol 03 (03n04) ◽  
pp. 1450010 ◽  
Author(s):  
Izumi Mizuno ◽  
Seiji Kameno ◽  
Amane Kano ◽  
Makoto Kuroo ◽  
Fumitaka Nakamura ◽  
...  

We have developed a software-based polarization spectrometer, PolariS, to acquire full-Stokes spectra with a very high spectral resolution of 61 Hz. The primary aim of PolariS is to measure the magnetic fields in dense star-forming cores by detecting the Zeeman splitting of molecular emission lines. The spectrometer consists of a commercially available digital sampler and a Linux computer. The computer is equipped with a graphics processing unit (GPU) that performs the FFT and cross-correlation using the Compute Unified Device Architecture (CUDA) library developed by NVIDIA. Thanks to the high quantization precision of the analog-to-digital converter and the arithmetic precision of the GPU, PolariS offers excellent performance in linearity, dynamic range, sensitivity, bandpass flatness and stability. The software has been released under the MIT License and is available to the public. In this paper, we report the design of PolariS and its performance as verified through engineering tests and commissioning observations.
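The released PolariS code is not quoted here; the following is only a sketch of the CUDA processing pattern the abstract describes: cuFFT transforms the two polarization streams, then a small kernel accumulates the auto- and cross-power spectra from which full-Stokes spectra can be formed. Sizes and names are illustrative assumptions.

```cuda
// Sketch of the FFT + cross-correlation core of a software spectrometer.
#include <cufft.h>

__global__ void accumulateSpectra(const cufftComplex *X, const cufftComplex *Y,
                                  float *autoX, float *autoY,
                                  cufftComplex *crossXY, int nch)
{
    int ch = blockIdx.x * blockDim.x + threadIdx.x;
    if (ch >= nch) return;

    autoX[ch] += X[ch].x * X[ch].x + X[ch].y * X[ch].y;     // <|X|^2>
    autoY[ch] += Y[ch].x * Y[ch].x + Y[ch].y * Y[ch].y;     // <|Y|^2>
    crossXY[ch].x += X[ch].x * Y[ch].x + X[ch].y * Y[ch].y; // Re <X Y*>
    crossXY[ch].y += X[ch].y * Y[ch].x - X[ch].x * Y[ch].y; // Im <X Y*>
}

// Host side (fragment): one real-to-complex plan per polarization,
// executed for every frame of sampled data.
//   cufftHandle plan;
//   cufftPlan1d(&plan, 2 * nch, CUFFT_R2C, batch);
//   cufftExecR2C(plan, dSamplesX, dSpectX);
//   cufftExecR2C(plan, dSamplesY, dSpectY);
//   accumulateSpectra<<<grid, block>>>(dSpectX, dSpectY,
//                                      dAutoX, dAutoY, dCross, nch);
```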


2009 ◽  
Vol 79-82 ◽  
pp. 1309-1312
Author(s):  
Kuan Yu ◽  
Bo Zhu

Molecular simulation can provide mechanistic insights into how material behaviour relates to molecular properties and to the microscopic arrangement of many molecules. With the development of the Graphics Processing Unit (GPU), scientists have realized general-purpose molecular simulations on the GPU in the Compute Unified Device Architecture (CUDA) environment. In this paper, we provide a brief overview of molecular simulation and CUDA and introduce recent achievements in GPU-based molecular simulation in materials science, focusing mainly on the Monte Carlo method and Molecular Dynamics. Recent research has shown that GPUs can provide unprecedented computational power for scientific applications; with optimized algorithms and program code, a single GPU can deliver performance equivalent to that of a distributed computer cluster. The study of GPU-based molecular simulation will therefore accelerate the development of materials science in the future.
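As a concrete illustration of the Molecular Dynamics pattern this overview discusses, here is a minimal brute-force Lennard-Jones force kernel with one thread per particle; production GPU codes add neighbour lists, shared-memory tiling, and periodic boundaries. All parameters and names are illustrative assumptions.

```cuda
// Sketch: each thread accumulates the Lennard-Jones forces on one
// particle in a brute-force O(N^2) loop over all other particles.
#include <cuda_runtime.h>

__global__ void ljForces(const float4 *pos, float4 *force, int n,
                         float epsilon, float sigma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float3 f  = make_float3(0.0f, 0.0f, 0.0f);
    float  s6 = powf(sigma, 6.0f);

    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pi.x - pos[j].x;
        float dy = pi.y - pos[j].y;
        float dz = pi.z - pos[j].z;
        float r2   = dx * dx + dy * dy + dz * dz;
        float inv2 = 1.0f / r2;
        float inv6 = inv2 * inv2 * inv2;
        // Scalar force magnitude / r for U = 4*eps*((s/r)^12 - (s/r)^6).
        float fscal = 24.0f * epsilon * s6 * inv6
                    * (2.0f * s6 * inv6 - 1.0f) * inv2;
        f.x += fscal * dx;
        f.y += fscal * dy;
        f.z += fscal * dz;
    }
    force[i] = make_float4(f.x, f.y, f.z, 0.0f);
}
```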

