Effects of mesh loop modes on performance of unstructured finite volume GPU simulations

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Yue Weng ◽  
Xi Zhang ◽  
Xiaohu Guo ◽  
Xianwei Zhang ◽  
Yutong Lu ◽  
...  

Abstract In the unstructured finite volume method, loops over different mesh components such as cells, faces, and nodes are widely used to traverse data. A mesh loop results in direct or indirect data access, which significantly affects data locality. When looping over the mesh, many threads accessing the same data leads to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under different loops over cells, faces, and nodes is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that mesh loop modes affect data locality and data dependence differently. Specifically, the face loop gives the best data locality whenever kernels access face data. The cell loop incurs the smallest overhead from non-coalesced data access when both cell and node data, but no face data, are used in the computation. The cell loop performs best when kernels make only indirect accesses to cell data. Atomic operations substantially reduce kernel performance on the K80, an effect that is not obvious on the V100. With a suitable mesh loop mode in all kernels, the overall performance of GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a maximum speedup of 21.7 and an average speedup of 14.1 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.
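The difference between a face loop and a cell loop can be sketched on a toy mesh. The example below is only an illustration in plain Python, not the authors' GPU code: the four-cell mesh, the flux values, and all function names are hypothetical. It accumulates the same per-cell residual in two ways: a face loop that scatters each face flux into its two adjacent cells (the pattern that would need atomic operations when each face is handled by a separate GPU thread), and a cell loop that gathers fluxes from the faces of each cell (no write conflicts, but indirect reads of face data).

```python
# Hypothetical toy mesh: 4 cells in a row joined by 3 interior faces.
# face_cells[f] = (owner, neighbour) is the indirect addressing used by a face loop.
face_cells = [(0, 1), (1, 2), (2, 3)]
face_flux = [1.0, 2.0, 3.0]  # made-up flux through each face, owner -> neighbour
n_cells = 4


def residual_face_loop(face_cells, face_flux, n_cells):
    """Face loop: one iteration per face, scattering into two cells."""
    res = [0.0] * n_cells
    for f, (owner, neigh) in enumerate(face_cells):
        # With one GPU thread per face, several threads could update the
        # same cell, so these two writes would have to be atomic.
        res[owner] -= face_flux[f]
        res[neigh] += face_flux[f]
    return res


def build_cell_faces(face_cells, n_cells):
    """Invert the face -> cell map into a cell -> (face, sign) map."""
    cell_faces = [[] for _ in range(n_cells)]
    for f, (owner, neigh) in enumerate(face_cells):
        cell_faces[owner].append((f, -1.0))  # flux leaves the owner
        cell_faces[neigh].append((f, +1.0))  # flux enters the neighbour
    return cell_faces


def residual_cell_loop(cell_faces, face_flux):
    """Cell loop: one iteration per cell, gathering from its faces."""
    res = []
    for faces in cell_faces:
        total = 0.0
        for f, sign in faces:
            # Indirect read of face data; each cell writes only its own entry.
            total += sign * face_flux[f]
        res.append(total)
    return res


by_face = residual_face_loop(face_cells, face_flux, n_cells)
by_cell = residual_cell_loop(build_cell_faces(face_cells, n_cells), face_flux)
print(by_face)
print(by_cell)
```

Both loops produce the same residual; they differ only in which array is read directly, which is read through an index map, and whether concurrent writes to one cell can occur.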

2021 ◽  
Author(s):  
Yue Weng ◽  
Xi Zhang ◽  
Xiaohu Guo ◽  
Xianwei Zhang ◽  
Yutong Lu ◽  
...  

Abstract In the unstructured finite volume method, loops over different mesh components such as cells, faces, and nodes are widely used to traverse data. A mesh loop results in direct or indirect data access, which significantly affects data locality. When looping over the mesh, many threads accessing the same data leads to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under different loops over cells, faces, and nodes is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that mesh loop modes affect data locality and data dependence differently. Specifically, the face loop gives the best data locality whenever kernels access face data. The cell loop incurs the smallest overhead from non-coalesced data access when both cell and node data, but no face data, are used in the computation. The cell loop performs best when kernels make only indirect accesses to cell data. Atomic operations substantially reduce kernel performance on the K80, an effect that is not obvious on the V100. With a suitable mesh loop mode in all kernels, the overall performance of GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a speedup of 4.8 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.


2015 ◽  
Vol 31 (3) ◽  
pp. 937-950 ◽  
Author(s):  
Perry L. Johnson ◽  
Jared M. Pent ◽  
Hrvoje Jasak ◽  
J. Enrique Portillo

Author(s):  
Dong Jin Kang ◽  
Sang Soo Bae ◽  
Jae Won Kim

A Navier-Stokes simulation of the MIT flapping foil experiment is presented. The MIT experiment was designed to provide a good-quality database for unsteady boundary layer flows. The unsteady boundary layer around a hydrofoil was generated by flapping two airfoils upstream of the hydrofoil. The present Navier-Stokes simulation is carried out on the entire experimental domain, including the flapping airfoils as well as the downstream fixed hydrofoil. The present Navier-Stokes code uses an unstructured finite volume method based on the SIMPLE algorithm. It uses the QUICK scheme for the convective terms and second-order backward Euler differencing for time derivatives to maintain second-order accuracy in space and time. All other spatial derivatives are approximated using a central difference scheme. All comparisons of the present time-averaged and unsteady solutions with the corresponding experimental data are satisfactory; all unsteady solutions are compared in terms of the time mean and first harmonic. The first harmonic of the velocity shows a peak inside the boundary layer along the surfaces of the hydrofoil and has a local minimum near the edge of the boundary layer. The local minimum becomes manifest as the boundary layer grows. The unsteadiness in the free stream is transferred into the boundary layer when an unsteady vortex impinges on the surface. The entrained unsteadiness travels at a local velocity slower than that in the free stream. This causes a phase lag of the first harmonic between the free stream and the boundary layer and a local minimum of the first harmonic near the edge of the boundary layer.
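For reference, the two discretizations named in the abstract take the following textbook forms. This is a sketch under stated assumptions, not the paper's own formulation: the symbols are chosen here for illustration, the time step \Delta t is assumed constant, and the QUICK expression is the standard one for a uniform one-dimensional grid (the unstructured-mesh variant used in the paper may differ). Here \phi^{n} is the solution at time level n, \phi_C and \phi_D are the cell values immediately upstream and downstream of face f, and \phi_U is the value one cell further upstream.

```latex
% Second-order backward (three-level) time differencing, constant \Delta t
\left.\frac{\partial \phi}{\partial t}\right|^{n+1}
  \approx \frac{3\phi^{n+1} - 4\phi^{n} + \phi^{n-1}}{2\,\Delta t}

% QUICK face interpolation on a uniform grid
\phi_f = \tfrac{6}{8}\,\phi_C + \tfrac{3}{8}\,\phi_D - \tfrac{1}{8}\,\phi_U
```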


2000 ◽  
Author(s):  
J. Y. Murthy ◽  
S. R. Mathur

Abstract In this paper, calculations of mixed-mode heat transfer in beds of randomly packed cylinders are presented. An unstructured finite volume method is employed. Random packing is addressed by meshing a periodic module and creating the bed by stacking and random lateral translation of modules. The ability of the finite volume scheme to employ arbitrary polyhedra is exploited in addressing the resulting non-conformal interfaces. Conduction and radiation are considered, but convection is ignored. Results are presented for conducting and semi-transparent cylinders for a range of fluid and solid conductivities and solid refractive indices, and they establish the viability and versatility of the method.

