CityFFD – City fast fluid dynamics for urban microclimate simulations on graphics processing units

Review Paper ◽

Fluid Flows ◽

New Paradigm ◽

The Past ◽

Challenges And Opportunities ◽

A new paradigm for computing fluid flows is the use of graphics processing units (GPUs), which have recently become very powerful and convenient to use. In the past three years, we have implemented five different fluid flow algorithms on GPUs and have obtained significant speed-ups, typically a factor of 50–100, over a single CPU. In this review paper, we describe our experience with the various algorithms developed and the speed-ups achieved.

2012 Freeman Scholar Lecture: Computational Fluid Dynamics on Graphics Processing Units

Journal of Fluids Engineering ◽

10.1115/1.4023858 ◽

2013 ◽

Vol 135 (6) ◽

Cited By ~ 18

Author(s):

S. P. Vanka

Keyword(s):

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Stokes Equations ◽

Multicore Processors ◽

Fluid Flows ◽

Data Access ◽

Central Processing ◽

Data Parallel ◽

This paper discusses the various issues of using graphics processing units (GPUs) for computing fluid flows. GPUs, used primarily for processing graphics functions in a computer, are massively parallel multicore processors, which can also perform scientific computations in a data-parallel mode. In the past ten years, GPUs have become quite powerful and have challenged central processing units (CPUs) in their price and performance characteristics. However, to benefit fully from GPU performance, the numerical algorithms must be made data parallel and must converge rapidly. In addition, the hardware features of GPUs require that memory access be managed carefully to avoid the penalty of high memory latency. Fully explicit algorithms for the Euler and Navier–Stokes equations and the lattice Boltzmann method for mesoscopic flows have been widely implemented on GPUs, with significant speed-up over a scalar algorithm. However, more complex algorithms with implicit formulations and unstructured grids require innovative thinking in data access and management. This article reviews the literature on linear solvers and computational fluid dynamics (CFD) algorithms on GPUs, including the author's own research on simulations of fluid flows using GPUs.
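The data-parallel property of fully explicit schemes mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a 1D diffusion equation with fixed end values, and the function name and the parameter `alpha` are hypothetical. Every interior update reads only the old array, so each loop iteration could in principle be mapped to one GPU thread.

```python
def explicit_diffusion_step(u, alpha):
    """One forward-Euler step of 1D diffusion with fixed end values.

    alpha = nu * dt / dx**2 (hypothetical parameter name); this scheme is
    stable for alpha <= 0.5.
    """
    new_u = list(u)  # end values are copied unchanged
    # Each interior update depends only on the old array `u`, never on
    # `new_u`, so the iterations are independent of one another.
    for i in range(1, len(u) - 1):
        new_u[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new_u


u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
u1 = explicit_diffusion_step(u0, 0.25)
print(u1)  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

An implicit scheme would couple all unknowns at the new time level through a linear system, which is one reason the abstract singles out implicit formulations as harder to map onto GPUs.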

Massively parallel simulations of relativistic fluid dynamics on graphics processing units with CUDA

Computer Physics Communications ◽

10.1016/j.cpc.2017.01.015 ◽

2018 ◽

Vol 225 ◽

pp. 92-113 ◽

Cited By ~ 14

Author(s):

Dennis Bazow ◽

Ulrich Heinz ◽

Michael Strickland

Keyword(s):

Fluid Dynamics ◽

Massively Parallel ◽

Relativistic Fluid Dynamics ◽

Parallel Simulations ◽

Relativistic Fluid ◽

CU++: an object oriented framework for computational fluid dynamics applications using graphics processing units

The Journal of Supercomputing ◽

10.1007/s11227-013-0985-9 ◽

2013 ◽

Vol 67 (1) ◽

pp. 47-68 ◽

Cited By ~ 6

Author(s):

Dominic D. J. Chandar ◽

Jayanarayanan Sitaraman ◽

Dimitri Mavriplis

Keyword(s):

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Object Oriented ◽

High performance computing on graphics processing units

Pollack Periodica ◽

10.1556/pollack.3.2008.2.3 ◽

2008 ◽

Vol 3 (2) ◽

pp. 27-34 ◽

Cited By ~ 2

Author(s):

Balázs Tukora ◽

Tibor Szalay

Keyword(s):

High Performance Computing ◽

High Performance ◽

Graphics Processing ◽

Performance Computing

Parallel Option Pricing with Fourier Space Time-Stepping Method on Graphics Processing Units

SSRN Electronic Journal ◽

10.2139/ssrn.1020207 ◽

2007 ◽

Cited By ~ 1

Author(s):

Vladimir Surkov

Keyword(s):

Option Pricing ◽

Space Time ◽

Fourier Space ◽

Time Stepping ◽

Improving the Efficiency and the Accuracy of 2D Gel Electrophoresis Spot Detection Using Graphics Processing Units

Current Bioinformatics ◽

10.2174/1574893612666170725141905 ◽

2018 ◽

Vol 13 (2) ◽

pp. 193-206 ◽

Cited By ~ 1

Author(s):

Marwa K. Elteir ◽

Shaheera A. Rashwan ◽

Ashraf A. Khalil

Keyword(s):

Gel Electrophoresis ◽

2D Gel Electrophoresis ◽

Spot Detection ◽

2D Gel ◽

Using graphics processing units on the cloud to accelerate and reduce processing cost of parameters estimation of seismic processing algorithm

10.22564/16cisbgf2019.221 ◽

2019 ◽

Author(s):

Nicholas Okita ◽

Tiago Coimbra ◽

José Ribeiro ◽

Martin Tygel

Keyword(s):

Parameters Estimation ◽

Processing Algorithm ◽

Seismic Processing ◽

Processing Cost ◽

Review of smoothed particle hydrodynamics: towards converged Lagrangian flow modelling

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rspa.2019.0801 ◽

2020 ◽

Vol 476 (2241) ◽

pp. 20190801

Author(s):

Steven J. Lind ◽

Benedict D. Rogers ◽

Peter K. Stansby

Keyword(s):

Smoothed Particle Hydrodynamics ◽

Wave Structure ◽

Free Form ◽

Mesh Free ◽

Weakly Compressible ◽

Particle Hydrodynamics ◽

Massively Parallel Computing ◽

Smoothed Particle ◽

This paper presents a review of the progress of smoothed particle hydrodynamics (SPH) towards high-order converged simulations. As a mesh-free Lagrangian method suitable for complex flows with interfaces and multiple phases, SPH has developed considerably in the past decade. While original applications were in astrophysics, early engineering applications showed the versatility and robustness of the method without emphasis on accuracy and convergence. The early method was of weakly compressible form, resulting in noisy pressures due to spurious pressure waves. This noise was effectively removed in the incompressible (divergence-free) form which followed; since then the weakly compressible form has been advanced, reducing pressure noise. Numerical convergence studies are now standard. While the method is computationally demanding on conventional processors, it is well suited to parallel processing on massively parallel computers and graphics processing units. Applications are diverse and encompass wave–structure interaction, geophysical flows due to landslides, nuclear sludge flows, welding, gearbox flows and many others. In the state of the art, convergence is typically between the first- and second-order theoretical limits. Recent advances are improving convergence to fourth order (and higher) and these will also be outlined. Such higher-order convergence can be necessary to resolve multi-scale aspects of turbulent flow.
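The convergence orders quoted in this abstract are usually measured by comparing errors at two resolutions. The sketch below shows that standard calculation. It assumes the error scales as e ≈ C·h^p for a resolution parameter h (for SPH this might be the initial particle spacing, which is an assumption here). The function name and the input numbers are hypothetical and purely illustrative, not results from the paper.

```python
import math


def observed_order(err_coarse, err_fine, h_coarse, h_fine):
    """Estimate p in e ~ C * h**p from errors at two resolutions."""
    return math.log(err_coarse / err_fine) / math.log(h_coarse / h_fine)


# Illustrative, made-up numbers: halving h reduces the error fourfold,
# which corresponds to second-order convergence (p close to 2).
p = observed_order(4.0e-2, 1.0e-2, 0.2, 0.1)
print(round(p, 6))
```

With the same halving of h, an error ratio of 2 would indicate first order and a ratio of 16 fourth order.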

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

ACM Transactions on Mathematical Software ◽

10.1145/3441850 ◽

2021 ◽

Vol 47 (2) ◽

pp. 1-28

Author(s):

Goran Flegar ◽

Hartwig Anzt ◽

Terry Cojean ◽

Enrique S. Quintana-Ortí

Keyword(s):

Linear Algebra ◽

High Performance ◽

Numerical Algorithms ◽

Mixed Precision ◽

Before And After ◽

Memory Accesses ◽

Specialized Hardware ◽

The Individual ◽

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner, which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
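The idea of choosing a storage precision per preconditioner block can be sketched as follows. This is not Ginkgo's API or the authors' selection rule. It assumes 2×2 blocks, only the three IEEE formats available through Python's `struct` module, and a hypothetical criterion: use the cheapest format whose unit roundoff, amplified by the block's 1-norm condition number, stays below a tolerance `tol`.

```python
import struct

# (struct format code, unit roundoff) for IEEE half, single, and double.
FORMATS = [("e", 2.0 ** -11), ("f", 2.0 ** -24), ("d", 2.0 ** -53)]


def cond_1norm_2x2(block):
    """1-norm condition number of a 2x2 block [[a, b], [c, d]]."""
    (a, b), (c, d) = block
    det = a * d - b * c
    if det == 0.0:
        return float("inf")
    norm = max(abs(a) + abs(c), abs(b) + abs(d))
    # The inverse is [[d, -b], [-c, a]] / det; take its max column sum.
    inv_norm = max(abs(d) + abs(c), abs(b) + abs(a)) / abs(det)
    return norm * inv_norm


def choose_format(block, tol=1e-2):
    """Pick the cheapest format meeting the (hypothetical) criterion."""
    kappa = cond_1norm_2x2(block)
    for code, unit_roundoff in FORMATS:
        if kappa * unit_roundoff <= tol:
            return code
    return "d"  # fall back to double precision


def store(block, code):
    """Round-trip the block through `code` to mimic reduced storage."""
    flat = [x for row in block for x in row]
    a, b, c, d = struct.unpack("<4" + code, struct.pack("<4" + code, *flat))
    return [[a, b], [c, d]]


well = [[4.0, 1.0], [1.0, 3.0]]     # condition number about 2.3
ill = [[1.0, 1.0], [1.0, 1.0001]]   # condition number about 4e4
print(choose_format(well), choose_format(ill))  # e f
```

Well-conditioned blocks tolerate aggressive compression, while nearly singular blocks keep more bits. That per-block decision is what makes the scheme adaptive.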