Use of GPU Computing for Uncertainty Quantification in Computational Mechanics: A Case Study

2011 ◽  
Vol 19 (4) ◽  
pp. 199-212 ◽  
Author(s):  
Gaurav ◽  
Steven F. Wojtkiewicz

Graphics processing units (GPUs) are rapidly emerging as a more economical and highly competitive alternative to CPU-based parallel computing. As the degree of software control over GPUs has increased, many researchers have explored their use in non-gaming applications. Recent studies have shown that GPUs consistently outperform their best CPU-based parallel computing counterparts on single-instruction, multiple-data (SIMD) workloads. This study explores the use of GPUs for uncertainty quantification in computational mechanics. Five types of analysis procedures frequently utilized for the uncertainty quantification of mechanical and dynamical systems are considered, and their GPU implementations have been developed. The numerical examples presented in this study show that considerable gains in computational efficiency can be obtained for these procedures. It is expected that the GPU implementations presented here will serve as initial bases for further developments in the use of GPUs in the field of uncertainty quantification and will (i) aid the understanding of the performance constraints on the relevant GPU kernels and (ii) provide some guidance regarding the computational and data structures to be utilized in these novel GPU implementations.
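For illustration, the SIMD-style sampling the abstract alludes to can be pictured with a minimal CUDA sketch in which one thread evaluates one random sample. The response() model, the stiffness distribution, and all names here are hypothetical stand-ins, not the paper's actual procedures:

```cuda
// Minimal sketch of SIMD-style Monte Carlo uncertainty propagation:
// one thread per random sample. response() is a hypothetical stand-in
// for any of the paper's analysis procedures.
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__device__ float response(float k) {
    // Hypothetical model: static displacement of a spring, u = F / k,
    // with a fixed load F and an uncertain stiffness k.
    const float F = 10.0f;
    return F / k;
}

__global__ void monteCarlo(float *u, int n, unsigned long long seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    curandState s;
    curand_init(seed, i, 0, &s);                  // independent RNG stream per thread
    float k = 100.0f + 10.0f * curand_normal(&s); // k ~ N(100, 10^2), assumed distribution
    u[i] = response(k);                           // every thread runs identical code: SIMD
}

int main() {
    const int n = 1 << 20;
    float *d_u;
    cudaMalloc(&d_u, n * sizeof(float));
    monteCarlo<<<(n + 255) / 256, 256>>>(d_u, n, 1234ULL);
    cudaDeviceSynchronize();
    // ... copy d_u back and compute sample statistics on the host ...
    cudaFree(d_u);
    return 0;
}
```

Because every thread executes the same instructions on its own sample, a kernel of this shape maps directly onto the SIMD execution model that the abstract credits for the GPU's advantage.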

2013 ◽  
Vol 2013 ◽  
pp. 1-15 ◽  
Author(s):  
Carlos Couder-Castañeda ◽  
Carlos Ortiz-Alemán ◽  
Mauricio Gabriel Orozco-del-Castillo ◽  
Mauricio Nava-Flores

An implementation using CUDA technology on a single and on multiple graphics processing units (GPUs) is presented for calculating the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unit prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing that has led to the development of applications in a wide variety of fields. Nevertheless, in some applications the decomposition of the tasks is not trivial, as this paper shows. Rather than a trivial domain decomposition, we propose decomposing the problem by sets of prisms and using a separate memory space per CUDA processing core, avoiding the performance decay caused by the repeated kernel calls that a parallelization over observation points would require. The design and implementation are the main contributions of this work, because the parallelization scheme implemented is nontrivial. The performance results obtained are comparable to those of a small processing cluster.
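The prism-set decomposition can be pictured with a sketch like the one below, assuming a point-mass approximation gz_prism() in place of the exact prism formula. Each thread sweeps its own chunk of prisms with a private register accumulator (its own "memory space"), so the whole survey is produced by a single kernel launch rather than one launch per observation point:

```cuda
// Simplified sketch of the prism-set decomposition. Struct layout,
// names, and the point-mass formula are assumptions for illustration.
#include <cuda_runtime.h>

struct Prism { float x, y, z, mass; };  // density * volume folded into mass

__device__ float gz_prism(const Prism &p, float ox, float oy, float oz) {
    const float G = 6.674e-11f;
    float dx = p.x - ox, dy = p.y - oy, dz = p.z - oz;
    float r2 = dx * dx + dy * dy + dz * dz;
    return G * p.mass * dz / (r2 * sqrtf(r2));  // vertical field component
}

__global__ void forwardModel(const Prism *prisms, int nPrisms,
                             const float3 *obs, int nObs,
                             int prismsPerThread, float *gz) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int first = t * prismsPerThread;
    if (first >= nPrisms) return;
    int last = min(first + prismsPerThread, nPrisms);
    for (int o = 0; o < nObs; ++o) {
        float acc = 0.0f;                 // per-thread private accumulator
        for (int p = first; p < last; ++p)
            acc += gz_prism(prisms[p], obs[o].x, obs[o].y, obs[o].z);
        atomicAdd(&gz[o], acc);           // combine partial sums once per chunk
    }
}
```

The single atomicAdd per (thread, station) pair keeps contention low while still letting all prism chunks contribute to every observation point in one launch.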


Author(s):  
Jianhua Li ◽  
Jingyuan Chen ◽  
Yan Wang ◽  
Jianhua Huang

The parallelization of silicon anisotropic etching simulation with the cellular automata (CA) model on graphics processing units (GPUs) is challenging, because the number of computational tasks changes dynamically during the etching simulation and the existing parallel CA mechanisms do not map well onto GPU computation. In this paper, an improved CA model, called the clustered cell model, is proposed for GPU-based etching simulation. The model consists of clustered cells, each of which manages a scalable number of atoms. In this model, only the etching and state updates of the atoms on the etching surface and their unexposed neighbors are performed at each CA time step, whereas the clustered cells are reclassified on a longer time step. With this model, a crystal cell parallelization method is given in which clustered cells are allocated to GPU threads during the simulation. With optimizations in both the spatial and temporal aspects, as well as a proper granularity, this method provides a faster process simulation. The proposed simulation method is implemented with the Compute Unified Device Architecture (CUDA) application programming interface. Several computational experiments are conducted to analyze the efficiency of the method.
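A minimal sketch of the clustered-cell idea might look as follows. The Atom layout, the etchRate() table, and the activity flags are assumptions for illustration; the less frequent reclassification kernel the paper describes is omitted:

```cuda
// Sketch of the clustered-cell CA step: one thread per clustered cell,
// each holding a small group of atoms. Only surface atoms are touched
// per CA step; cell classification is refreshed by a separate, less
// frequent kernel (not shown).
#include <cuda_runtime.h>

const int ATOMS_PER_CELL = 64;  // scalable in the paper's model; fixed here

struct Atom {
    unsigned char onSurface;    // 1 if exposed to the etchant
    unsigned char orientation;  // crystal-plane class, selects the etch rate
    float remaining;            // material left before the atom is removed
};

__device__ float etchRate(unsigned char orientation) {
    // Hypothetical anisotropic rates per crystal-plane class.
    const float rates[3] = {1.0f, 0.5f, 0.05f};
    return rates[orientation % 3];
}

__global__ void etchStep(Atom *atoms, const int *cellActive,
                         int nCells, float dt) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells || !cellActive[c]) return;  // skip cells away from the surface
    Atom *cell = atoms + c * ATOMS_PER_CELL;
    for (int a = 0; a < ATOMS_PER_CELL; ++a) {
        if (!cell[a].onSurface) continue;       // only surface atoms etch this step
        cell[a].remaining -= etchRate(cell[a].orientation) * dt;
    }
}
```

Skipping inactive cells at the top of the kernel is one way to cope with the dynamically changing task count the abstract highlights: threads assigned to bulk cells exit immediately instead of doing wasted work.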


2016 ◽  
Author(s):  
Pedro D. Bello-Maldonado ◽  
Ricardo López ◽  
Colleen Rogers ◽  
Yuanwei Jin ◽  
Enyue Lu

IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 21152-21163 ◽  
Author(s):  
Rafael Cisneros-Magana ◽  
Aurelio Medina ◽  
Venkata Dinavahi ◽  
Antonio Ramos-Paz

2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Jianqi Lai ◽  
Hang Yu ◽  
Zhengyu Tian ◽  
Hua Li

Graphics processing units (GPUs) offer strong floating-point throughput and high memory bandwidth for data-parallel workloads and have been widely used in high-performance computing (HPC). The compute unified device architecture (CUDA) is used as the parallel computing platform and programming model for the GPU to reduce the complexity of programming. Programmable GPUs are becoming popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm combining the message passing interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+-up upwind scheme and the three-step Runge–Kutta method are used for spatial discretization and time discretization, respectively. Turbulence is modeled with the k–ω SST two-equation model. The CPU only manages GPU execution and communication, while the GPU is responsible for data processing. Parallel execution and memory access optimizations are used to optimize the GPU-based CFD codes. We propose a nonblocking communication method that fully overlaps GPU computing, CPU–CPU communication, and CPU–GPU data transfer by creating two CUDA streams. Furthermore, a one-dimensional domain decomposition method is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm on the compressible turbulent flow over a flat plate. The performance of a single-GPU implementation and the scalability of multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization can achieve a speedup of more than 36 times with respect to CPU-based parallel computing, and that the parallel algorithm has good scalability.
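The two-stream overlap the authors describe can be sketched as below, assuming placeholder kernels updateHalo()/updateInterior(), a pinned host buffer h_halo, and a 1D neighbor exchange; the paper's actual code is not reproduced here:

```cuda
// Sketch of the two-stream overlap pattern: boundary (halo) slices are
// computed and copied on one stream while the interior runs on the
// other, so the MPI halo exchange hides behind interior work.
#include <mpi.h>
#include <cuda_runtime.h>

extern __global__ void updateHalo(float *q, int n);      // boundary cells only
extern __global__ void updateInterior(float *q, int n);  // everything else

void timeStep(float *d_q, float *h_halo, int nHalo, int nInterior,
              int left, int right, cudaStream_t sHalo, cudaStream_t sBulk) {
    // 1. Boundary work first, on its own stream. Buffer offsets are
    //    elided; h_halo is assumed pinned (cudaMallocHost) so the
    //    async copy truly overlaps.
    updateHalo<<<(nHalo + 255) / 256, 256, 0, sHalo>>>(d_q, nHalo);
    cudaMemcpyAsync(h_halo, d_q, nHalo * sizeof(float),
                    cudaMemcpyDeviceToHost, sHalo);

    // 2. Interior work launches immediately on the second stream and
    //    runs concurrently with the halo copy and the MPI traffic.
    updateInterior<<<(nInterior + 255) / 256, 256, 0, sBulk>>>(d_q, nInterior);

    // 3. Exchange halos with neighbor ranks once the device-to-host
    //    copy has finished.
    cudaStreamSynchronize(sHalo);
    MPI_Sendrecv_replace(h_halo, nHalo, MPI_FLOAT, left, 0, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // 4. Push the received halo back to the device and rejoin streams.
    cudaMemcpyAsync(d_q, h_halo, nHalo * sizeof(float),
                    cudaMemcpyHostToDevice, sHalo);
    cudaDeviceSynchronize();
}
```

The key design choice, as in the abstract, is that the interior kernel on sBulk never waits on sHalo, so communication cost is paid only when the halo work is slower than the interior work.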


Author(s):  
Juan Gómez-Luna ◽  
José María González-Linares ◽  
José Ignacio Benavides ◽  
Emilio L. Zapata ◽  
Nicolás Guil

2020 ◽  
Author(s):  
Ryan N Gutenkunst

Extracting insight from population genetic data often demands computationally intensive modeling. dadi is a popular program for fitting models of demographic history and natural selection to such data. Here, I show that running dadi on a Graphics Processing Unit (GPU) can speed computation by orders of magnitude compared to the CPU implementation, with minimal user burden. This speed increase enables the analysis of more complex models, which motivated the extension of dadi to four- and five-population models. Remarkably, dadi performs almost as well on inexpensive consumer-grade GPUs as on expensive server-grade GPUs. GPU computing thus offers large and accessible benefits to the community of dadi users. This functionality is available in dadi version 2.1.0.

