unstructured finite volume method
Recently Published Documents


TOTAL DOCUMENTS

55
(FIVE YEARS 7)

H-INDEX

11
(FIVE YEARS 1)

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Yue Weng ◽  
Xi Zhang ◽  
Xiaohu Guo ◽  
Xianwei Zhang ◽  
Yutong Lu ◽  
...  

Abstract: In the unstructured finite volume method, loops over different mesh components such as cells, faces, and nodes are widely used to traverse data. A mesh loop results in direct or indirect data access, which significantly affects data locality. When looping over the mesh, many threads may access the same data, which leads to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under different loops over cells, faces, and nodes is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that the mesh loop modes affect data locality and data dependence differently. Specifically, the face loop gives the best data locality as long as the kernels access face data. The cell loop incurs the smallest overhead from non-coalesced data access when both cell and node data are used in the computation without face data. The cell loop performs best when the kernels contain only indirect access to cell data. Atomic operations reduce kernel performance substantially on the K80, an effect that is not obvious on the V100. With a suitable mesh loop mode in all kernels, the overall performance of GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a maximum speedup of 21.7 and an average speedup of 14.1 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.
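The difference between a face loop and a cell loop can be illustrated with a minimal serial sketch. It is not taken from the paper: the four-cell mesh, the flux values, and the function names `residual_face_loop`, `build_cell_faces`, and `residual_cell_loop` are all hypothetical. The face loop scatters each face flux to its two adjacent cells, so on a GPU with one thread per face, two faces sharing a cell would update the same entry, which is the kind of data dependence that typically calls for atomic operations. The cell loop gathers fluxes through an indirect cell-to-face list, so each output entry has a single writer.

```python
# Hypothetical mesh: 4 cells in a row, 3 interior faces.
# face_cells[f] = (owner cell, neighbour cell); flux is positive from owner to neighbour.
face_cells = [(0, 1), (1, 2), (2, 3)]
face_flux = [1.0, -2.0, 0.5]
n_cells = 4


def residual_face_loop(face_cells, face_flux, n_cells):
    """Face loop: direct access to face data, scattered writes to cell data."""
    res = [0.0] * n_cells
    for f, (owner, neigh) in enumerate(face_cells):
        # With one GPU thread per face, these two updates would race with
        # other faces of the same cell and would need atomic adds.
        res[owner] -= face_flux[f]
        res[neigh] += face_flux[f]
    return res


def build_cell_faces(face_cells, n_cells):
    """Build the cell-to-face connectivity (face index and flux sign) used by the cell loop."""
    cell_faces = [[] for _ in range(n_cells)]
    for f, (owner, neigh) in enumerate(face_cells):
        cell_faces[owner].append((f, -1.0))
        cell_faces[neigh].append((f, 1.0))
    return cell_faces


def residual_cell_loop(cell_faces, face_flux):
    """Cell loop: indirect reads of face data, one writer per cell (no atomics)."""
    res = []
    for faces in cell_faces:
        acc = 0.0
        for f, sign in faces:
            acc += sign * face_flux[f]
        res.append(acc)
    return res


res_face = residual_face_loop(face_cells, face_flux, n_cells)
res_cell = residual_cell_loop(build_cell_faces(face_cells, n_cells), face_flux)
```

Both loops produce the same residual; what changes is the memory access pattern (direct versus indirect reads) and whether concurrent writes to the same cell can occur.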


2021 ◽  
Author(s):  
Yue Weng ◽  
Xi Zhang ◽  
Xiaohu Guo ◽  
Xianwei Zhang ◽  
Yutong Lu ◽  
...  

Abstract: In the unstructured finite volume method, loops over different mesh components such as cells, faces, and nodes are widely used to traverse data. A mesh loop results in direct or indirect data access, which significantly affects data locality. When looping over the mesh, many threads may access the same data, which leads to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under different loops over cells, faces, and nodes is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that the mesh loop modes affect data locality and data dependence differently. Specifically, the face loop gives the best data locality as long as the kernels access face data. The cell loop incurs the smallest overhead from non-coalesced data access when both cell and node data are used in the computation without face data. The cell loop performs best when the kernels contain only indirect access to cell data. Atomic operations reduce kernel performance substantially on the K80, an effect that is not obvious on the V100. With a suitable mesh loop mode in all kernels, the overall performance of GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a speedup of 4.8 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.


2016 ◽  
Vol 13 (05) ◽  
pp. 1650022
Author(s):  
Lei Fu ◽  
Shuai Zhang ◽  
Yao Zheng

A compact high-order scheme is proposed and verified in this paper. In this scheme, the traditional gradient reconstruction is replaced with a compact scheme. There is no need to modify the process and algorithms of the unstructured FVM, including boundary conditions, flux technique, limiter functions and so on. Neither the memory load nor the computational load of the new scheme is higher than that of the traditional one. Additionally, we modified the Venkatakrishnan limiter to suppress numerical oscillations. The proposed compact scheme and the modified Venkatakrishnan limiter have been verified with numerical experiments on benchmark problems. The numerical results show good agreement with those obtained by other methods.
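For orientation, the standard (unmodified) Venkatakrishnan limiter can be sketched as below. The abstract does not describe the authors' modification, so this is only the commonly cited baseline form, and the function name `venkatakrishnan_phi`, the default constant `k`, and the length scale `dx` are assumptions made for illustration. Here `u_face` is the unlimited reconstructed value at a face, and `u_min` and `u_max` are the extrema over the cell and its neighbours.

```python
def venkatakrishnan_phi(u_cell, u_face, u_min, u_max, k=0.3, dx=1.0):
    """Standard Venkatakrishnan limiter value for one face of one cell.

    k and dx are illustrative defaults (assumptions), giving eps^2 = (k * dx)^3.
    """
    d_minus = u_face - u_cell          # unlimited reconstruction increment
    if d_minus == 0.0:
        return 1.0                     # nothing to limit
    # Largest admissible increment in the direction of the reconstruction.
    d_plus = (u_max - u_cell) if d_minus > 0.0 else (u_min - u_cell)
    eps2 = (k * dx) ** 3               # smoothing term that relaxes limiting in flat regions
    num = (d_plus ** 2 + eps2) * d_minus + 2.0 * d_minus ** 2 * d_plus
    den = d_minus * (d_plus ** 2 + 2.0 * d_minus ** 2 + d_plus * d_minus + eps2)
    return num / den


# Usage: the value applied to a cell is usually the minimum over its faces.
# The face values below are hypothetical.
phi_cell = min(
    venkatakrishnan_phi(1.0, u_f, u_min=0.0, u_max=2.0, k=0.0)
    for u_f in (2.0, 1.5, 0.5)
)
```

In contrast to min/max-type limiters, this function is smooth, which is commonly cited as the reason it helps convergence to steady state.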

