Accelerated hydrologic modeling: ParFlow GPU implementation

Author(s): Jaro Hokkanen, Jiri Kraus, Andreas Herten, Dirk Pleiter, Stefan Kollet

ParFlow is a numerical model that simulates the hydrologic cycle from the bedrock to the top of the plant canopy. The original codebase provides an embedded domain-specific language (eDSL) for generic numerical implementations with support for supercomputer environments (distributed-memory parallelism), on top of which the hydrologic numerical core has been built.

In ParFlow, the newly developed optional GPU acceleration is built directly into the eDSL headers, so that, ideally, parallelizing all loops in a single source file requires only a new header file. This is possible because the eDSL API is used for looping, allocating memory, and accessing data structures. Embedding GPU acceleration directly into the eDSL layer resulted in a highly productive and minimally invasive implementation.

The eDSL implementation uses C as the host language, and the GPU acceleration is based on CUDA C++. CUDA C++ has developed rapidly in recent years, and features such as Unified Memory and host-device lambdas were leveraged extensively in the ParFlow implementation to maximize productivity. Efficient intra- and inter-node data transfer between GPUs rests on a CUDA-aware MPI library and application-side GPU-based data-packing routines.

The current, moderately optimized ParFlow GPU version runs a representative model up to 20 times faster on a node with two Intel Skylake processors and four NVIDIA V100 GPUs than the original CPU-only version of ParFlow. The eDSL approach and the ParFlow GPU implementation may serve as a blueprint for tackling the challenges of heterogeneous HPC hardware architectures on the path to exascale.
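The abstract does not show the eDSL itself, so the following is only a minimal sketch of the general technique it describes, assuming a hypothetical BoxLoop-style macro; the names EDSL_LOOP and forall_kernel are illustrative, not ParFlow's API. A GPU header redefines the loop macro to launch a CUDA kernel over a host-device lambda, while Unified Memory keeps the same pointers valid on host and device.

```cuda
// Minimal sketch, not ParFlow's actual headers: EDSL_LOOP and forall_kernel
// are illustrative names. Compile with: nvcc --extended-lambda edsl.cu
#include <cstdio>
#include <cuda_runtime.h>

// Generic kernel that applies a callable to every index in [0, n).
template <typename F>
__global__ void forall_kernel(int n, F f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) f(i);
}

// GPU variant of the loop macro: launch a kernel over a host-device lambda.
// A CPU-only header could define the same macro as a plain for-loop,
// leaving the numerical source files untouched.
#define EDSL_LOOP(i, n, body)                                            \
    do {                                                                 \
        int edsl_n_ = (n);                                               \
        forall_kernel<<<(edsl_n_ + 255) / 256, 256>>>(                   \
            edsl_n_, [=] __host__ __device__(int i) body);               \
        cudaDeviceSynchronize();                                         \
    } while (0)

int main()
{
    const int n = 1 << 20;
    double *x = nullptr;
    // Unified Memory: one pointer valid on both host and device, so the
    // eDSL can hide where a loop actually executes.
    cudaMallocManaged(&x, n * sizeof(double));

    EDSL_LOOP(i, n, { x[i] = 2.0 * i; });  // runs as a CUDA kernel here

    printf("x[42] = %f\n", x[42]);  // safe after the synchronize above
    cudaFree(x);
    return 0;
}
```

Under this reading, swapping in a CPU-only header that defines EDSL_LOOP as a plain for loop over the same body is what lets a single source file be parallelized, or not, without touching the numerical code.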

2017
Author(s): Oliver Fuhrer, Tarun Chadha, Torsten Hoefler, Grzegorz Kwasniewski, Xavier Lapillonne, ...

Abstract. The best hope for reducing long-standing global climate model biases is to increase the resolution to the kilometer scale. Here we present results from an ultra-high-resolution non-hydrostatic climate model for a near-global setup running on the full Piz Daint supercomputer on 4888 GPUs. The dynamical core of the model has been completely rewritten using a domain-specific language (DSL) for performance portability across different hardware architectures. Physical parameterizations and diagnostics have been ported using compiler directives. To our knowledge, this represents the first complete atmospheric model run entirely on accelerators at this scale. At a grid spacing of 930 m (1.9 km), we achieve a simulation throughput of 0.043 (0.23) simulated years per day and an energy consumption of 596 MWh per simulated year. Furthermore, we propose a new memory usage efficiency metric that considers how efficiently the memory bandwidth, the dominant bottleneck of climate codes, is being used.
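The abstract only names the metric, so the following is a hedged sketch of how such a memory usage efficiency (MUE) metric can be factored; the symbols are assumptions for illustration, not necessarily the paper's notation. The idea is to separate how much data the code moves from how fast it moves it:

```latex
% Illustrative factorization of a memory usage efficiency (MUE) metric;
% symbols below are assumptions, not necessarily the paper's notation.
\[
  \mathrm{MUE}
    = \underbrace{\frac{\hat{Q}}{Q}}_{\text{data-movement efficiency}}
      \times
      \underbrace{\frac{B}{\hat{B}}}_{\text{bandwidth efficiency}}
\]
% \hat{Q}: minimum data volume the algorithm must move through memory
% Q: data volume actually moved
% B: achieved memory bandwidth; \hat{B}: peak memory bandwidth
```

Read this way, MUE = 1 means the code moves no more data than algorithmically necessary and does so at full memory speed; a bandwidth-bound climate code must optimize both factors.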


Author(s): Jessica Ray, Ajay Brahmakshatriya, Richard Wang, Shoaib Kamil, Albert Reuther, ...

2021
Vol. 205, pp. 102610
Author(s): Davide Ancona, Luca Franceschini, Angelo Ferrando, Viviana Mascardi

2021
pp. 102642
Author(s): Xiomarah Guzmán-Guzmán, Edward Rolando Núñez-Valdez, Raysa Vásquez-Reynoso, Angel Asencio, Vicente García-Díaz
