HARNESSING THE POWER OF IDLE GPUS FOR ACCELERATION OF BIOLOGICAL SEQUENCE ALIGNMENT

2009 ◽  
Vol 19 (04) ◽  
pp. 513-533 ◽  
Author(s):  
FUMIHIKO INO ◽  
YUKI KOTANI ◽  
YUMA MUNEKAWA ◽  
KENICHI HAGIHARA

This paper presents a parallel system capable of accelerating biological sequence alignment on the graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in the office and home. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also present experimental results comparing our implementation with three alternative implementations running on a single GPU, a single CPU, or multiple CPUs. We find that a single non-dedicated GPU can provide almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications. In a dedicated environment, the GPU-accelerated code achieves five times the throughput of the CPU-based code. Furthermore, a linear speedup of 30.7X is observed on a 32-node cluster of dedicated GPUs. We also implement a compute unified device architecture (CUDA)-based algorithm to demonstrate further acceleration.
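The paper does not include source code; the following is a minimal sketch of the anti-diagonal parallelization commonly used for CUDA-based Smith-Waterman alignment, the scheme a CUDA port of this kind typically adopts. All names, the scoring constants, and the linear gap penalty are illustrative assumptions, not taken from the paper.

```cuda
// Minimal anti-diagonal Smith-Waterman sketch (linear gap penalty).
// All names and scoring parameters are illustrative, not from the paper.
#include <cuda_runtime.h>

#define MATCH     2
#define MISMATCH -1
#define GAP      -1

__device__ int max4(int a, int b, int c, int d) {
    return max(max(a, b), max(c, d));
}

// Scores one anti-diagonal d of the (m+1) x (n+1) matrix H. Cells on the
// same anti-diagonal have no data dependence, so each thread scores one
// cell; the host launches this kernel once per diagonal.
__global__ void swDiagonal(const char *query, int m,
                           const char *subject, int n,
                           int *H, int d)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // query index
    int j = d - i + 1;                                  // subject index
    if (i > m || j < 1 || j > n) return;

    int sub  = (query[i - 1] == subject[j - 1]) ? MATCH : MISMATCH;
    int diag = H[(i - 1) * (n + 1) + (j - 1)] + sub;
    int up   = H[(i - 1) * (n + 1) + j] + GAP;
    int left = H[i * (n + 1) + (j - 1)] + GAP;
    H[i * (n + 1) + j] = max4(0, diag, up, left);
}
```

The host loops over d = 1 … m + n − 1, launching the kernel once per anti-diagonal; the grid master only needs to ship sequences to workers and collect the resulting scores.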

2017 ◽  
Vol 14 (1) ◽  
pp. 789-795
Author(s):  
V Saveetha ◽  
S Sophia

Parallel data clustering uses algorithms and methods to extract knowledge from large databases in reasonable time on high-performance architectures. The computational challenge that growing data volumes pose to cluster analysis can be overcome by exploiting the power of these architectures. Recent advances in the parallel power of the Graphics Processing Unit enable low-cost, high-performance solutions for general-purpose applications. The Compute Unified Device Architecture programming model provides application programming interface methods to handle data efficiently on the Graphics Processing Unit for iterative clustering algorithms such as K-Means. Existing Graphics Processing Unit based K-Means algorithms focus largely on improving speedup and fail to address the high cost of transferring data between the Central Processing Unit and the Graphics Processing Unit. This paper proposes an efficient K-Means algorithm that reduces transfer time by introducing a novel approach to checking the convergence of the algorithm and by using pinned memory for direct access. The algorithm outperforms its counterparts by maximizing parallelism and exploiting the memory features. Its relative speedups and validity measure are higher than those of K-Means on the Graphics Processing Unit and K-Means using a flag on the Graphics Processing Unit. The proposed approach thus demonstrates that communication overhead in K-Means clustering can be reduced.
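The abstract describes, but does not list, the transfer-saving mechanism. Below is a minimal sketch of that idea, assuming a zero-copy convergence flag held in pinned (page-locked) host memory that the assignment kernel sets directly, so the host never copies the full label array back just to test for convergence. Every identifier here is illustrative, not taken from the paper.

```cuda
// Sketch: the per-iteration convergence check writes a single flag into
// mapped pinned host memory instead of triggering a bulk device-to-host
// copy of the assignments. Names are illustrative.
#include <cuda_runtime.h>

__global__ void assignPoints(const float *points, const float *centroids,
                             int *labels, int n, int k, int dim,
                             int *changed)   // device view of the pinned flag
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    int best = 0;
    float bestDist = 3.4e38f;
    for (int c = 0; c < k; ++c) {
        float dist = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = points[idx * dim + d] - centroids[c * dim + d];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = c; }
    }
    if (labels[idx] != best) {
        labels[idx] = best;
        *changed = 1;   // written straight through to host-visible memory
    }
}

// Host-side fragment:
//   int *hChanged, *dChanged;
//   cudaHostAlloc(&hChanged, sizeof(int), cudaHostAllocMapped);
//   cudaHostGetDevicePointer(&dChanged, hChanged, 0);
//   do {
//       *hChanged = 0;
//       assignPoints<<<grid, block>>>(dPts, dCent, dLbl, n, k, dim, dChanged);
//       cudaDeviceSynchronize();   // flag is valid after this point
//       /* update centroids on the GPU */
//   } while (*hChanged);
```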


Author(s):  
Franz Pichler ◽  
Gundolf Haase

A finite element code is developed in which all computationally expensive steps are performed on a graphics processing unit via the THRUST and PARALUTION libraries. The code focuses on the simulation of transient problems, where the repeated computations per time step dominate the computational cost. It is used to solve partial and ordinary differential equations as they arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by utilizing the graphics processing unit for every critical step is compared against the single-core and multi-threading solutions that the chosen libraries also support. In this way, a high total speed-up on the graphics processing unit is achieved without programming a single classical Compute Unified Device Architecture kernel.
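As an illustration of this kernel-free style, here is a minimal sketch of a per-time-step vector update expressed entirely through Thrust algorithms; the explicit Euler update rule and all names are assumptions for illustration, not code from the paper.

```cuda
// Sketch: the hot per-time-step operation runs on the GPU through
// thrust::transform, with no hand-written CUDA kernel anywhere.
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Explicit Euler step for an ODE system dT/dt = f(T): T <- T + dt * f.
struct axpy {
    const float dt;
    axpy(float dt_) : dt(dt_) {}
    __host__ __device__ float operator()(float t, float f) const {
        return t + dt * f;
    }
};

void timeStep(thrust::device_vector<float> &T,
              const thrust::device_vector<float> &f, float dt)
{
    // Element-wise fused update across the whole state vector on the GPU.
    thrust::transform(T.begin(), T.end(), f.begin(), T.begin(), axpy(dt));
}
```

Thrust's device backend can also be switched at compile time (e.g., to OpenMP), which mirrors the single-core and multi-threaded comparisons reported above.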


2011 ◽  
Vol 21 (01) ◽  
pp. 31-47 ◽  
Author(s):  
NOEL LOPES ◽  
BERNARDETE RIBEIRO

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at a relatively low cost. This makes it an ideal candidate for accelerating a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become increasingly demanding, parallel implementations of learning algorithms are crucial for useful applications. In particular, implementing Neural Networks (NNs) on GPUs can significantly reduce the long training times of the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms and describe the GPU kernels needed for this task. Results obtained on well-known benchmarks show faster training times and improved performance compared with the implementation on traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims at automatically finding high-quality NN-based solutions for a given problem.
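The paper describes its GPU kernels without listing them; below is a minimal sketch of one representative back-propagation kernel, the per-weight gradient-descent update of a fully connected layer under online (single-pattern) training. The names and the plain SGD rule (no momentum term, though MBP variants typically add one) are illustrative assumptions.

```cuda
// Sketch: one thread per weight applies the back-propagation update
// w_ij += eta * delta_j * x_i for a fully connected layer.
#include <cuda_runtime.h>

__global__ void updateWeights(float *weights, const float *inputs,
                              const float *deltas, int numIn, int numOut,
                              float learningRate)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // output neuron
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // input neuron
    if (i >= numIn || j >= numOut) return;

    // Row-major weight matrix of shape numIn x numOut.
    weights[i * numOut + j] += learningRate * deltas[j] * inputs[i];
}
```

A typical launch covers the weight matrix with a 2D grid, e.g. dim3 block(16, 16) and a grid of ceil(numOut/16) by ceil(numIn/16) blocks, so all weights of a layer update in parallel.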


2014 ◽  
Vol 03 (03n04) ◽  
pp. 1450010 ◽  
Author(s):  
Izumi Mizuno ◽  
Seiji Kameno ◽  
Amane Kano ◽  
Makoto Kuroo ◽  
Fumitaka Nakamura ◽  
...  

We have developed a software-based polarization spectrometer, PolariS, to acquire full-Stokes spectra with a very high spectral resolution of 61 Hz. The primary aim of PolariS is to measure the magnetic fields in dense star-forming cores by detecting the Zeeman splitting of molecular emission lines. The spectrometer consists of a commercially available digital sampler and a Linux computer. The computer is equipped with a graphics processing unit (GPU) that performs the FFT and cross-correlation using the Compute Unified Device Architecture (CUDA) library developed by NVIDIA. Thanks to the high quantization precision of the analog-to-digital converter and the arithmetic precision of the GPU, PolariS offers excellent performance in linearity, dynamic range, sensitivity, bandpass flatness and stability. The software has been released under the MIT License and is available to the public. In this paper, we report the design of PolariS and its performance as verified through engineering tests and commissioning observations.
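The released PolariS code is not quoted here; the following is only a sketch of the CUDA processing pattern the abstract describes: cuFFT transforms the two polarization streams, then a small kernel accumulates the auto- and cross-power spectra from which full-Stokes spectra can be formed. Sizes and names are illustrative assumptions.

```cuda
// Sketch of the FFT + cross-correlation core of a software spectrometer.
#include <cufft.h>

__global__ void accumulateSpectra(const cufftComplex *X, const cufftComplex *Y,
                                  float *autoX, float *autoY,
                                  cufftComplex *crossXY, int nch)
{
    int ch = blockIdx.x * blockDim.x + threadIdx.x;
    if (ch >= nch) return;

    autoX[ch] += X[ch].x * X[ch].x + X[ch].y * X[ch].y;     // <|X|^2>
    autoY[ch] += Y[ch].x * Y[ch].x + Y[ch].y * Y[ch].y;     // <|Y|^2>
    crossXY[ch].x += X[ch].x * Y[ch].x + X[ch].y * Y[ch].y; // Re <X Y*>
    crossXY[ch].y += X[ch].y * Y[ch].x - X[ch].x * Y[ch].y; // Im <X Y*>
}

// Host side (fragment): one real-to-complex plan per polarization,
// executed for every frame of sampled data.
//   cufftHandle plan;
//   cufftPlan1d(&plan, 2 * nch, CUFFT_R2C, batch);
//   cufftExecR2C(plan, dSamplesX, dSpectX);
//   cufftExecR2C(plan, dSamplesY, dSpectY);
//   accumulateSpectra<<<grid, block>>>(dSpectX, dSpectY,
//                                      dAutoX, dAutoY, dCross, nch);
```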


2009 ◽  
Vol 79-82 ◽  
pp. 1309-1312
Author(s):  
Kuan Yu ◽  
Bo Zhu

Molecular simulation can provide mechanistic insights into how material behaviour relates to molecular properties and to the microscopic arrangement of many molecules. With the development of the Graphics Processing Unit (GPU), scientists have realized general-purpose molecular simulations on the GPU in the Compute Unified Device Architecture (CUDA) environment. In this paper, we provide a brief overview of molecular simulation and CUDA and introduce recent achievements in GPU-based molecular simulation in materials science, focusing mainly on the Monte Carlo method and Molecular Dynamics. Recent research has shown that GPUs can provide unprecedented computational power for scientific applications; with optimized algorithms and program code, a single GPU can deliver performance equivalent to that of a distributed computer cluster. The study of GPU-based molecular simulation will therefore accelerate the development of materials science in the future.
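As a concrete illustration of the Molecular Dynamics pattern this overview discusses, here is a minimal brute-force Lennard-Jones force kernel with one thread per particle; production GPU codes add neighbour lists, shared-memory tiling, and periodic boundaries. All parameters and names are illustrative assumptions.

```cuda
// Sketch: each thread accumulates the Lennard-Jones forces on one
// particle in a brute-force O(N^2) loop over all other particles.
#include <cuda_runtime.h>

__global__ void ljForces(const float4 *pos, float4 *force, int n,
                         float epsilon, float sigma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float3 f  = make_float3(0.0f, 0.0f, 0.0f);
    float  s6 = powf(sigma, 6.0f);

    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pi.x - pos[j].x;
        float dy = pi.y - pos[j].y;
        float dz = pi.z - pos[j].z;
        float r2   = dx * dx + dy * dy + dz * dz;
        float inv2 = 1.0f / r2;
        float inv6 = inv2 * inv2 * inv2;
        // Scalar force magnitude / r for U = 4*eps*((s/r)^12 - (s/r)^6).
        float fscal = 24.0f * epsilon * s6 * inv6
                    * (2.0f * s6 * inv6 - 1.0f) * inv2;
        f.x += fscal * dx;
        f.y += fscal * dy;
        f.z += fscal * dz;
    }
    force[i] = make_float4(f.x, f.y, f.z, 0.0f);
}
```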

