Use of GPU Computing for Uncertainty Quantification in Computational Mechanics: A Case Study

2011 ◽  
Vol 19 (4) ◽  
pp. 199-212 ◽  
Author(s):  
Gaurav ◽  
Steven F. Wojtkiewicz

Graphics processing units (GPUs) are rapidly emerging as a more economical and highly competitive alternative to CPU-based parallel computing. As the degree of software control over GPUs has increased, many researchers have explored their use in non-gaming applications. Recent studies have shown that GPUs consistently outperform their best CPU-based parallel computing counterparts on single-instruction, multiple-data (SIMD) workloads. This study explores the use of GPUs for uncertainty quantification in computational mechanics. Five types of analysis procedures frequently utilized for the uncertainty quantification of mechanical and dynamical systems are considered, and their GPU implementations have been developed. The numerical examples presented in this study show that considerable gains in computational efficiency can be obtained for these procedures. It is expected that the GPU implementations presented here will serve as initial bases for further developments in the use of GPUs in the field of uncertainty quantification and will (i) aid the understanding of the performance constraints on the relevant GPU kernels and (ii) provide some guidance regarding the computational and data structures to be utilized in these novel GPU implementations.
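For illustration, the SIMD-style sampling the abstract alludes to can be pictured with a minimal CUDA sketch in which one thread evaluates one random sample. The response() model, the stiffness distribution, and all names here are hypothetical stand-ins, not the paper's actual procedures:

```cuda
// Minimal sketch of SIMD-style Monte Carlo uncertainty propagation:
// one thread per random sample. response() is a hypothetical stand-in
// for any of the paper's analysis procedures.
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__device__ float response(float k) {
    // Hypothetical model: static displacement of a spring, u = F / k,
    // with a fixed load F and an uncertain stiffness k.
    const float F = 10.0f;
    return F / k;
}

__global__ void monteCarlo(float *u, int n, unsigned long long seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    curandState s;
    curand_init(seed, i, 0, &s);                  // independent RNG stream per thread
    float k = 100.0f + 10.0f * curand_normal(&s); // k ~ N(100, 10^2), assumed distribution
    u[i] = response(k);                           // every thread runs identical code: SIMD
}

int main() {
    const int n = 1 << 20;
    float *d_u;
    cudaMalloc(&d_u, n * sizeof(float));
    monteCarlo<<<(n + 255) / 256, 256>>>(d_u, n, 1234ULL);
    cudaDeviceSynchronize();
    // ... copy d_u back and compute sample statistics on the host ...
    cudaFree(d_u);
    return 0;
}
```

Because every thread executes the same instructions on its own sample, a kernel of this shape maps directly onto the SIMD execution model that the abstract credits for the GPU's advantage.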

2013 ◽  
Vol 2013 ◽  
pp. 1-15 ◽  
Author(s):  
Carlos Couder-Castañeda ◽  
Carlos Ortiz-Alemán ◽  
Mauricio Gabriel Orozco-del-Castillo ◽  
Mauricio Nava-Flores

An implementation using CUDA technology on a single and on multiple graphics processing units (GPUs) is presented for calculating the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unit prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing that has led to the development of applications in a wide variety of fields. Nevertheless, in some applications the decomposition of the tasks is not trivial, as this paper shows. Rather than a trivial domain decomposition, we propose decomposing the problem by sets of prisms and using a separate memory space per CUDA processing core, avoiding the performance decay caused by the repeated kernel calls that a parallelization over observation points would require. The design and implementation are the main contributions of this work, because the parallelization scheme implemented is nontrivial. The performance results obtained are comparable to those of a small processing cluster.
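The prism-set decomposition can be pictured with a sketch like the one below, assuming a point-mass approximation gz_prism() in place of the exact prism formula. Each thread sweeps its own chunk of prisms with a private register accumulator (its own "memory space"), so the whole survey is produced by a single kernel launch rather than one launch per observation point:

```cuda
// Simplified sketch of the prism-set decomposition. Struct layout,
// names, and the point-mass formula are assumptions for illustration.
#include <cuda_runtime.h>

struct Prism { float x, y, z, mass; };  // density * volume folded into mass

__device__ float gz_prism(const Prism &p, float ox, float oy, float oz) {
    const float G = 6.674e-11f;
    float dx = p.x - ox, dy = p.y - oy, dz = p.z - oz;
    float r2 = dx * dx + dy * dy + dz * dz;
    return G * p.mass * dz / (r2 * sqrtf(r2));  // vertical field component
}

__global__ void forwardModel(const Prism *prisms, int nPrisms,
                             const float3 *obs, int nObs,
                             int prismsPerThread, float *gz) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int first = t * prismsPerThread;
    if (first >= nPrisms) return;
    int last = min(first + prismsPerThread, nPrisms);
    for (int o = 0; o < nObs; ++o) {
        float acc = 0.0f;                 // per-thread private accumulator
        for (int p = first; p < last; ++p)
            acc += gz_prism(prisms[p], obs[o].x, obs[o].y, obs[o].z);
        atomicAdd(&gz[o], acc);           // combine partial sums once per chunk
    }
}
```

The single atomicAdd per (thread, station) pair keeps contention low while still letting all prism chunks contribute to every observation point in one launch.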


Author(s):  
Jianhua Li ◽  
Jingyuan Chen ◽  
Yan Wang ◽  
Jianhua Huang

The parallelization of silicon anisotropic etching simulation with the cellular automata (CA) model on graphics processing units (GPUs) is challenging, because the number of computational tasks changes dynamically during the etching simulation and the existing parallel CA mechanisms do not map well onto GPU computation. In this paper, an improved CA model, called the clustered cell model, is proposed for GPU-based etching simulation. The model consists of clustered cells, each of which manages a scalable number of atoms. In this model, only the etching and state updates of the atoms on the etching surface and their unexposed neighbors are performed at each CA time step, whereas the clustered cells are reclassified on a longer time step. With this model, a crystal cell parallelization method is given in which clustered cells are allocated to GPU threads during the simulation. With optimizations in both the spatial and temporal aspects, as well as a proper granularity, this method provides a faster process simulation. The proposed simulation method is implemented with the Compute Unified Device Architecture (CUDA) application programming interface. Several computational experiments are conducted to analyze the efficiency of the method.
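A minimal sketch of the clustered-cell idea might look as follows. The Atom layout, the etchRate() table, and the activity flags are assumptions for illustration; the less frequent reclassification kernel the paper describes is omitted:

```cuda
// Sketch of the clustered-cell CA step: one thread per clustered cell,
// each holding a small group of atoms. Only surface atoms are touched
// per CA step; cell classification is refreshed by a separate, less
// frequent kernel (not shown).
#include <cuda_runtime.h>

const int ATOMS_PER_CELL = 64;  // scalable in the paper's model; fixed here

struct Atom {
    unsigned char onSurface;    // 1 if exposed to the etchant
    unsigned char orientation;  // crystal-plane class, selects the etch rate
    float remaining;            // material left before the atom is removed
};

__device__ float etchRate(unsigned char orientation) {
    // Hypothetical anisotropic rates per crystal-plane class.
    const float rates[3] = {1.0f, 0.5f, 0.05f};
    return rates[orientation % 3];
}

__global__ void etchStep(Atom *atoms, const int *cellActive,
                         int nCells, float dt) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells || !cellActive[c]) return;  // skip cells away from the surface
    Atom *cell = atoms + c * ATOMS_PER_CELL;
    for (int a = 0; a < ATOMS_PER_CELL; ++a) {
        if (!cell[a].onSurface) continue;       // only surface atoms etch this step
        cell[a].remaining -= etchRate(cell[a].orientation) * dt;
    }
}
```

Skipping inactive cells at the top of the kernel is one way to cope with the dynamically changing task count the abstract highlights: threads assigned to bulk cells exit immediately instead of doing wasted work.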


2016 ◽  
Author(s):  
Pedro D. Bello-Maldonado ◽  
Ricardo López ◽  
Colleen Rogers ◽  
Yuanwei Jin ◽  
Enyue Lu

IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 21152-21163 ◽  
Author(s):  
Rafael Cisneros-Magana ◽  
Aurelio Medina ◽  
Venkata Dinavahi ◽  
Antonio Ramos-Paz

2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Jianqi Lai ◽  
Hang Yu ◽  
Zhengyu Tian ◽  
Hua Li

Graphics processing units (GPUs) offer strong floating-point throughput and high memory bandwidth for data-parallel workloads and have been widely used in high-performance computing (HPC). The compute unified device architecture (CUDA) is used as the parallel computing platform and programming model for the GPU to reduce the complexity of programming. Programmable GPUs are becoming popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm combining the message passing interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+-up upwind scheme and the three-step Runge–Kutta method are used for spatial discretization and time discretization, respectively. Turbulence is modeled with the k–ω SST two-equation model. The CPU only manages GPU execution and communication, while the GPU is responsible for data processing. Parallel execution and memory access optimizations are used to optimize the GPU-based CFD codes. We propose a nonblocking communication method that fully overlaps GPU computing, CPU–CPU communication, and CPU–GPU data transfer by creating two CUDA streams. Furthermore, a one-dimensional domain decomposition method is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm on the compressible turbulent flow over a flat plate. The performance of a single-GPU implementation and the scalability of multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization can achieve a speedup of more than 36 times with respect to CPU-based parallel computing, and that the parallel algorithm has good scalability.
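The two-stream overlap the authors describe can be sketched as below, assuming placeholder kernels updateHalo()/updateInterior(), a pinned host buffer h_halo, and a 1D neighbor exchange; the paper's actual code is not reproduced here:

```cuda
// Sketch of the two-stream overlap pattern: boundary (halo) slices are
// computed and copied on one stream while the interior runs on the
// other, so the MPI halo exchange hides behind interior work.
#include <mpi.h>
#include <cuda_runtime.h>

extern __global__ void updateHalo(float *q, int n);      // boundary cells only
extern __global__ void updateInterior(float *q, int n);  // everything else

void timeStep(float *d_q, float *h_halo, int nHalo, int nInterior,
              int left, int right, cudaStream_t sHalo, cudaStream_t sBulk) {
    // 1. Boundary work first, on its own stream. Buffer offsets are
    //    elided; h_halo is assumed pinned (cudaMallocHost) so the
    //    async copy truly overlaps.
    updateHalo<<<(nHalo + 255) / 256, 256, 0, sHalo>>>(d_q, nHalo);
    cudaMemcpyAsync(h_halo, d_q, nHalo * sizeof(float),
                    cudaMemcpyDeviceToHost, sHalo);

    // 2. Interior work launches immediately on the second stream and
    //    runs concurrently with the halo copy and the MPI traffic.
    updateInterior<<<(nInterior + 255) / 256, 256, 0, sBulk>>>(d_q, nInterior);

    // 3. Exchange halos with neighbor ranks once the device-to-host
    //    copy has finished.
    cudaStreamSynchronize(sHalo);
    MPI_Sendrecv_replace(h_halo, nHalo, MPI_FLOAT, left, 0, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // 4. Push the received halo back to the device and rejoin streams.
    cudaMemcpyAsync(d_q, h_halo, nHalo * sizeof(float),
                    cudaMemcpyHostToDevice, sHalo);
    cudaDeviceSynchronize();
}
```

The key design choice, as in the abstract, is that the interior kernel on sBulk never waits on sHalo, so communication cost is paid only when the halo work is slower than the interior work.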


Author(s):  
Juan Gómez-Luna ◽  
José María González-Linares ◽  
José Ignacio Benavides ◽  
Emilio L. Zapata ◽  
Nicolás Guil

2020 ◽  
Author(s):  
Ryan N Gutenkunst

Extracting insight from population genetic data often demands computationally intensive modeling. dadi is a popular program for fitting models of demographic history and natural selection to such data. Here, I show that running dadi on a Graphics Processing Unit (GPU) can speed computation by orders of magnitude compared to the CPU implementation, with minimal user burden. This speed increase enables the analysis of more complex models, which motivated the extension of dadi to four- and five-population models. Remarkably, dadi performs almost as well on inexpensive consumer-grade GPUs as on expensive server-grade GPUs. GPU computing thus offers large and accessible benefits to the community of dadi users. This functionality is available in dadi version 2.1.0.

