Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit

2017 ◽  
Vol 18 (7) ◽  
pp. 915-927 ◽  
Author(s):  
Ke-shi Ge ◽  
Hua-you Su ◽  
Dong-sheng Li ◽  
Xi-cheng Lu


2011 ◽  
Vol 21 (01) ◽  
pp. 31-47 ◽  
Author(s):  
Noel Lopes ◽  
Bernardete Ribeiro

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at relatively low cost. It is thus an ideal candidate for accelerating a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become increasingly demanding, parallel implementations of learning algorithms are crucial for practical applications. In particular, implementing Neural Networks (NNs) on GPUs can significantly reduce the long training times of the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms, and describe the GPU kernels needed for this task. The results obtained on well-known benchmarks show faster training times and improved performance compared to the implementation on traditional hardware, owing to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims to automatically find high-quality NN-based solutions for a given problem.
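The abstract does not reproduce the kernels themselves, but the core of a GPU back-propagation implementation is a set of per-neuron data-parallel kernels. The following is a minimal CUDA sketch of the kind of fully connected forward-pass kernel such an implementation needs; the function name, memory layout, and logistic activation are illustrative assumptions, not the authors' code.

```cuda
// Hypothetical forward-pass kernel: one thread computes one output
// neuron of a fully connected layer. Weights are stored row-major,
// with each neuron's bias appended as the last element of its row.
__global__ void forwardLayer(const float *inputs, const float *weights,
                             float *outputs, int numInputs, int numNeurons) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= numNeurons) return;

    const float *w = weights + n * (numInputs + 1);
    float sum = w[numInputs];                 // bias term
    for (int i = 0; i < numInputs; ++i)
        sum += w[i] * inputs[i];
    outputs[n] = 1.0f / (1.0f + expf(-sum));  // logistic activation
}
```

The backward pass follows the same pattern: one kernel computes the output-layer deltas, another back-propagates them, and a third accumulates the weight updates, which is what keeps the whole training loop on the device.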


2017 ◽  
Vol 107 ◽  
pp. 442-447 ◽  
Author(s):  
Rui Liu ◽  
Xiaoge Li ◽  
Liping Du ◽  
Shuting Zhi ◽  
Mian Wei

Geophysics ◽  
2019 ◽  
Vol 84 (5) ◽  
pp. S425-S436
Author(s):  
Martin Sarajaervi ◽  
Henk Keers

In seismic data processing, the amplitude loss caused by attenuation should be taken into account. The basis for this is provided by a 3D attenuation model described by the quality factor Q, which is used in viscoelastic modeling and imaging. We have implemented viscoelastic modeling and imaging using ray theory and the ray-Born approximation. This makes it possible to take Q into account using complex-valued and frequency-dependent traveltimes. We have developed a unified parallel implementation for modeling and imaging in the frequency domain and carried out the numerical integration on a graphics processing unit. A central part of the implementation is an efficient technique for computing large integrals. We applied the integration method to the 3D SEG/EAGE overthrust model to generate synthetic seismograms and imaging results. The attenuation effects are accurately modeled in the seismograms and compensated for in the imaging algorithm. The results indicate a significant improvement in computational efficiency compared to a parallel central processing unit baseline.
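The key numerical idea, attenuation entering through a complex-valued traveltime tau = tauR + i*tauI so that exp(i*omega*tau) contributes both a phase and an exponential decay, maps naturally onto a GPU summation kernel. Below is a minimal CUDA sketch of one frequency-domain trace accumulation under that assumption; the kernel name, array layout, and single-trace scope are illustrative, not the paper's implementation.

```cuda
#include <cuComplex.h>

// Hypothetical kernel: one thread per frequency sample of one trace;
// the loop over scattering points is the large inner integral. The
// complex traveltime tau = tauR + i*tauI encodes Q-dependent loss.
__global__ void rayBornTrace(const float *tauR, const float *tauI,
                             const float *amp, const float *omega,
                             cuFloatComplex *trace,
                             int numFreqs, int numScatterers) {
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f >= numFreqs) return;

    float w = omega[f];
    cuFloatComplex sum = make_cuFloatComplex(0.0f, 0.0f);
    for (int s = 0; s < numScatterers; ++s) {
        // exp(i*w*tau) = exp(-w*tauI) * (cos(w*tauR) + i*sin(w*tauR));
        // the real exponential decay is where Q enters the integrand.
        float decay = amp[s] * expf(-w * tauI[s]);
        float phase = w * tauR[s];
        sum = cuCaddf(sum, make_cuFloatComplex(decay * cosf(phase),
                                               decay * sinf(phase)));
    }
    trace[f] = sum;
}
```

The paper reports a more elaborate technique for computing these large integrals efficiently; the sketch only shows where the complex traveltime, and hence Q, enters the integrand.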


2009 ◽  
Vol 19 (04) ◽  
pp. 513-533 ◽  
Author(s):  
Fumihiko Ino ◽  
Yuki Kotani ◽  
Yuma Munekawa ◽  
Kenichi Hagihara

This paper presents a parallel system capable of accelerating biological sequence alignment on a graphics processing unit (GPU) grid. The GPU grid in this paper is a desktop grid system that utilizes idle GPUs and CPUs in offices and homes. Our parallel implementation employs a master-worker paradigm to accelerate an OpenGL-based algorithm that runs on a single GPU. We integrate this implementation into a screensaver-based grid system that detects idle resources on which the alignment code can run. We also present experimental results comparing our implementation with three alternatives running on a single GPU, a single CPU, or multiple CPUs. We find that a single non-dedicated GPU provides almost the same throughput as two dedicated CPUs in our laboratory environment, where GPU-equipped machines are ordinarily used to develop GPU applications. In a dedicated environment, the GPU-accelerated code achieves five times higher throughput than the CPU-based code. Furthermore, a nearly linear speedup of 30.7X is observed on a 32-node cluster of dedicated GPUs. We also implement a Compute Unified Device Architecture (CUDA) based algorithm to demonstrate further acceleration.
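GPU sequence alignment of this era typically parallelizes the Smith-Waterman dynamic-programming matrix along anti-diagonals, since all cells on one anti-diagonal are mutually independent. The paper's OpenGL and CUDA kernels are not reproduced here; the following CUDA sketch of one anti-diagonal step illustrates the common pattern (linear gap penalty, hypothetical names, not the authors' code).

```cuda
// Hypothetical anti-diagonal step for Smith-Waterman-style local
// alignment. One thread scores one cell (i, j) on diagonal d = i + j;
// prev1 holds scores of diagonal d-1, prev2 those of diagonal d-2,
// both indexed by column j. The host launches once per diagonal.
__global__ void swDiagonal(const char *seqA, const char *seqB,
                           const int *prev2, const int *prev1, int *curr,
                           int diag, int lenA, int lenB,
                           int match, int mismatch, int gap) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // row (1-based)
    int j = diag - i;                                   // column
    if (i > lenA || j < 1 || j > lenB) return;

    int sub = (seqA[i - 1] == seqB[j - 1]) ? match : mismatch;
    int score = prev2[j - 1] + sub;          // diagonal predecessor
    score = max(score, prev1[j] + gap);      // gap in seqB
    score = max(score, prev1[j - 1] + gap);  // gap in seqA
    curr[j] = max(score, 0);                 // local-alignment floor
}
```

The host rotates the three row buffers between launches; an affine-gap variant adds two more score matrices but follows the same dependence structure, which is what makes the workload fit both dedicated clusters and idle desktop GPUs.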

