Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Graphics Processing Unit Computing

Summary: In this work, the scalability of two key multiscale solvers for the pressure equation arising from incompressible flow in heterogeneous porous media, namely the multiscale finite volume (MSFV) solver and the restriction-smoothed basis multiscale (MsRSB) solver, is investigated on the massively parallel graphics processing unit (GPU) architecture. The robustness and scalability of both solvers are compared against their corresponding carefully optimized implementations on the shared-memory multicore architecture in a structured problem setting. Although several components of the MSFV and MsRSB algorithms are directly parallelizable, their scalability on the GPU architecture depends heavily on the underlying algorithmic details and data-structure design of every step: one needs to ensure favorable control and data flow on the GPU while extracting enough parallel work for a massively parallel environment. In addition, the type of algorithm chosen for each step greatly influences the overall robustness of the solver. Thus, we extend the work on the parallel multiscale methods of Manea et al. (2016) to map the MSFV and MsRSB special kernels to the massively parallel GPU architecture. The scalability of our optimized parallel MSFV and MsRSB GPU implementations is demonstrated using highly heterogeneous structured 3D problems derived from the SPE10 Benchmark (Christie and Blunt 2001). Those problems range in size from millions to tens of millions of cells. For both solvers, the multicore implementations are benchmarked on a shared-memory multicore architecture consisting of two Intel® Cascade Lake Xeon Gold 6246 central processing unit (CPU) packages, whereas the GPU implementations are benchmarked on a massively parallel architecture consisting of NVIDIA Volta V100 GPUs. We compare the multicore implementations to the GPU implementations for both the setup and solution stages. Finally, we compare the parallel MsRSB scalability to the scalability of MSFV on the multicore (Manea et al. 2016) and GPU architectures. To the best of our knowledge, this is the first parallel implementation and demonstration of these versatile multiscale solvers on the GPU architecture. NOTE: This paper is published as part of the 2021 SPE Reservoir Simulation Conference Special Issue.

Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit

2009 IEEE Workshop on Signal Processing Systems, 2009
DOI: 10.1109/sips.2009.5336268
Cited By ~ 10

Author(s): Hyunwoo Ji, Junho Cho, Wonyong Sung

Keyword(s): Graphics Processing Unit, Parallel Implementation, LDPC Codes, General Purpose, Massively Parallel, Processing Unit, Graphics Processing

Massively parallel hybrid algorithm on embedded graphics processing unit for unmanned aerial vehicle path planning

International Journal of Digital Signals and Smart Systems, 2018, Vol 2 (1), pp. 68
DOI: 10.1504/ijdsss.2018.10011960

Author(s): Mohammed Tarbouchi, Vincent Roberge

Keyword(s): Path Planning, Unmanned Aerial Vehicle, Hybrid Algorithm, Graphics Processing Unit, Massively Parallel, Processing Unit, Parallel Hybrid, Aerial Vehicle, Vehicle Path, Graphics Processing

A GPU/CPU Programming Model for CFD Simulation

Advanced Materials Research, 2013, Vol 712-715, pp. 2538-2541
DOI: 10.4028/www.scientific.net/amr.712-715.2538

Author(s): Cao Wei, Zheng Hua Wang, Chuan Fu Xu

Keyword(s): High Performance, CFD Simulation, Programming Model, Graphics Processing Unit, Processing Unit, Computational Capability, Parallel Programming Model, Parallel Graphics, Performance Results, Graphics Processing

In recent years, the highly parallel graphics processing unit (GPU) has rapidly matured into a powerful engine for high-performance computing. More and more researchers are porting computational fluid dynamics (CFD) simulations to heterogeneous computers. However, most focus on exploiting the computational capability of the GPU while ignoring that of the CPU. To utilize the computational capability of both the CPU and the GPU, we propose a hybrid CUDA/OpenMP parallel programming model, together with an adaptive load-balancing scheme that distributes the workload among CPUs and GPUs. With this programming model, we implement a high-order CFD program on the “Tianhe-1A” supercomputer system. The performance results validate the workload distribution scheme.
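The abstract does not spell out how the adaptive load-balancing scheme works. As a rough illustration only, one common approach is to re-split the work after each step in proportion to the throughput each device achieved on the previous step. The sketch below assumes that approach; the function names `rebalance` and `split_cells` and the timing inputs are hypothetical and are not taken from the paper.

```python
def rebalance(gpu_share, cpu_time, gpu_time):
    """Return an updated GPU share of the work from the last step's timings.

    gpu_share is the fraction of work the GPU handled in the last step;
    cpu_time and gpu_time are the wall-clock times each device needed.
    Assumption: work is re-split in proportion to measured throughput.
    """
    if cpu_time <= 0 or gpu_time <= 0:
        raise ValueError("timings must be positive")
    if not 0.0 < gpu_share < 1.0:
        raise ValueError("gpu_share must lie strictly between 0 and 1")
    gpu_rate = gpu_share / gpu_time          # work fraction per second on GPU
    cpu_rate = (1.0 - gpu_share) / cpu_time  # work fraction per second on CPU
    return gpu_rate / (gpu_rate + cpu_rate)


def split_cells(n_cells, gpu_share):
    """Split n_cells into (cpu_count, gpu_count) according to gpu_share."""
    gpu_count = round(n_cells * gpu_share)
    return n_cells - gpu_count, gpu_count
```

For example, starting from an even split, if the CPU half took 4 s and the GPU half took 1 s, `rebalance(0.5, 4.0, 1.0)` moves the GPU share to 0.8. Once both devices finish in the same time, the share stops changing.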
