Performances of Navier-Stokes Solver on a Hybrid CPU/GPU Computing System

The graphics processing unit (GPU) has evolved from a configurable graphics processor into a powerful engine for high-performance computing. In this paper, we describe the GPU graphics pipeline and introduce the history and evolution of GPU architecture. We also summarize the software environments used on GPUs, from graphics APIs to non-graphics APIs. Finally, we present GPU computing in computational fluid dynamics applications, including GPGPU computing for Navier-Stokes equation methods and for the Lattice Boltzmann method.

Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System

2013 42nd International Conference on Parallel Processing ◽

10.1109/icpp.2013.88 ◽

2013 ◽

Cited By ~ 3

Author(s):

Dongjin Kim ◽

Kyu-Ho Park

Keyword(s):

GPU Computing ◽

Computing System ◽

QR Decomposition

Defence Mechanism of Distributed Reflective Denial of Service (DRDOS) Attack by using Hybrid (CPU-GPU) Computing System

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v27p106 ◽

2015 ◽

Vol 27 (1) ◽

pp. 31-39

Author(s):

Gagan deep ◽

Er. Meena kshi

Keyword(s):

GPU Computing ◽

Denial Of Service ◽

Computing System ◽

Defence Mechanism

CARAT-GxG: CUDA-Accelerated Regression Analysis Toolkit for Large-Scale Gene–Gene Interaction with GPU Computing System

Cancer Informatics ◽

10.4137/cin.s16349 ◽

2014 ◽

Vol 13s7 ◽

pp. CIN.S16349 ◽

Cited By ~ 2

Author(s):

Sungyoung Lee ◽

Min-Seok Kwon ◽

Taesung Park

Keyword(s):

Regression Analysis ◽

Large Scale ◽

GPU Computing ◽

Association Studies ◽

Gene Interaction ◽

Computing System ◽

Optimization Techniques ◽

GWAS Data ◽

Genome Wide Association Studies ◽

Execution Speed

In genome-wide association studies (GWAS), regression analysis has been the most commonly used method for establishing an association between a phenotype and genetic variants, such as single nucleotide polymorphisms (SNPs). However, most applications of regression analysis have been restricted to the investigation of a single marker because of the large computational burden. As a result, regression analysis has seen limited application to multiple SNPs, including gene–gene interaction (GGI), in large-scale GWAS data. To overcome this limitation, we propose CARAT-GxG, a GPU computing system-oriented toolkit for performing regression analysis with GGI using CUDA (compute unified device architecture). Compared to other methods, CARAT-GxG achieved an almost 700-fold increase in execution speed and delivered highly reliable results through our GPU-specific optimization techniques. In addition, almost linear speedup was achieved on a GPU computing system implemented with the TORQUE Resource Manager. We expect that CARAT-GxG will enable large-scale regression analysis with GGI for GWAS data.

Direct Numerical Simulation of Turbulent Channel Flow on High-Performance GPU Computing System

Computation ◽

10.3390/computation4010013 ◽

2016 ◽

Vol 4 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Giancarlo Alfonsi ◽

Stefania Ciliberti ◽

Marco Mancini ◽

Leonardo Primavera

Keyword(s):

Numerical Simulation ◽

Direct Numerical Simulation ◽

Channel Flow ◽

High Performance ◽

GPU Computing ◽

Turbulent Channel Flow ◽

Computing System

CPU/GPU Computing for an Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

International Journal of Modern Physics Conference Series ◽

10.1142/s2010194516601630 ◽

2016 ◽

Vol 42 ◽

pp. 1660163

Author(s):

Liang Deng ◽

Hanli Bai ◽

Fang Wang ◽

Qingxin Xu

Keyword(s):

Stokes Equations ◽

GPU Computing ◽

Three Dimensional ◽

Navier Stokes ◽

Double Precision ◽

Single Node ◽

Navier Stokes Equations ◽

Heterogeneous Platform ◽

Redundant Data ◽

Fine Grain

CPU/GPU computing allows scientists to accelerate their numerical codes tremendously. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on a heterogeneous platform. First, we implement a full GPU version of the ADI solver to eliminate many redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, because memory on a single node becomes inadequate as the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap computation with communication using the advanced features of CUDA and MPI programming. We obtain a speedup of 6.0 for the ADI solver on one Tesla M2050 GPU compared with two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on a heterogeneous platform.
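
As a rough illustration of the "one-thread-one-line" idea, the sketch below solves one tridiagonal system per grid line with the Thomas algorithm, which is the usual building block of an ADI sweep. It is a plain Python sketch, not the authors' CUDA code: the function names `solve_tridiagonal` and `sweep_lines` are hypothetical, and the loop over lines stands in for the GPU threads that would each take one line.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1];
    lower[0] and upper[-1] are ignored. Assumes no zero pivots
    (for example, a diagonally dominant system).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination along the line.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def sweep_lines(lines):
    """Solve every grid line independently.

    Each element of `lines` is a (lower, diag, upper, rhs) tuple.
    On a GPU, each iteration of this loop could map to one thread.
    """
    return [solve_tridiagonal(*line) for line in lines]
```

Because the lines in one sweep direction do not depend on each other, the outer loop is the part that parallelizes; the recurrence inside `solve_tridiagonal` stays sequential within a line.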

A Performance Study of Moving Particle Semi-Implicit Method for Incompressible Fluid Flow on GPU

International Journal of Distributed Systems and Technologies ◽

10.4018/ijdst.2020010107 ◽

2020 ◽

Vol 11 (1) ◽

pp. 83-94

Author(s):

Kirankumar V Kataraki ◽

Satyadhyan Chickerur

Keyword(s):

Graphics Processing Units ◽

Computing System ◽

Navier Stokes ◽

Performance Study ◽

Governing Equations ◽

Navier Stokes Equation ◽

GPU Processing ◽

Moving Particle ◽

Particle Search ◽

Graphics Processing

The moving particle semi-implicit (MPS) method simulates incompressible free-surface fluid flow. MPS implementations are time-consuming and therefore need a very powerful computing system. Instead of using a parallel computing system, the performance of the MPS model can be improved by using graphics processing units (GPUs). The goal is a computing system that performs at a high level, thereby speeding up the numerical computations required in MPS. The primary aim of the study is to build a GPU-accelerated MPS model using CUDA that reduces the time taken to search for neighboring particles. To increase GPU processing speed, specific consideration is given to optimizing the neighboring-particle search. The numerical model of MPS is built on the governing equations, notably the Navier-Stokes equation. The simulations indicate that GPU-based MPS performs better than the traditional CPU-based arrangement.
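
The neighboring-particle search mentioned in this abstract is commonly accelerated with a uniform cell grid, so each particle is compared only against particles in adjacent cells. The sketch below shows that general technique in 2D in plain Python; it is an assumption about the kind of optimization involved, not the authors' CUDA implementation, and the names `build_cell_list` and `find_neighbors` are hypothetical.

```python
from collections import defaultdict


def build_cell_list(positions, radius):
    """Bin particle indices into square cells of side `radius`."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        cells[(int(x // radius), int(y // radius))].append(idx)
    return cells


def find_neighbors(positions, radius):
    """Return, for each particle, the indices of particles within `radius`.

    Only the 3x3 block of cells around a particle is scanned, which
    avoids comparing every particle against every other particle.
    """
    cells = build_cell_list(positions, radius)
    r2 = radius * radius
    neighbors = [[] for _ in positions]
    for i, (x, y) in enumerate(positions):
        cx, cy = int(x // radius), int(y // radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    if j == i:
                        continue
                    px, py = positions[j]
                    if (px - x) ** 2 + (py - y) ** 2 <= r2:
                        neighbors[i].append(j)
    return neighbors
```

Choosing the cell side equal to the interaction radius guarantees that every neighbor lies in the 3x3 block of cells around the particle, so no pair within range is missed.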

BODCA: Heterogeneous CPU-GPU computing system with Bandwidth-Optimized DRAM cache design

2020 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia) ◽

10.1109/icce-asia49877.2020.9276874 ◽

2020 ◽

Author(s):

Sungji Choi ◽

Won Woo Ro

Keyword(s):

GPU Computing ◽

Computing System ◽

Cache Design
