stencil computation
Recently Published Documents


TOTAL DOCUMENTS: 66 (FIVE YEARS 17)

H-INDEX: 10 (FIVE YEARS 4)

Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 38
Author(s):  
Huayou Su ◽  
Kaifang Zhang ◽  
Songzhu Mei

Optimizations for stencil computation have been studied extensively, and various approaches have been proposed. Loop transformation is a vital class of optimization in modern production compilers and has been employed successfully within them. In this paper, we combine the two aspects to study the potential benefits that some common transformation recipes may have for stencils. The recipes consist of loop unrolling, loop fusion, address precalculation, redundancy elimination, instruction reordering, load balancing, and a forward and backward update algorithm named semi-stencil. Experimental evaluations of diverse stencil kernels, including 1D, 2D, and 3D computation patterns, on two typical ARM and Intel platforms demonstrate the respective effects of the transformation recipes. For the single transformation recipes we analyze, the average speedup is 1.65× and the best is 1.88×. The compound recipes reach a maximum speedup of 1.92×.
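As a rough illustration of one recipe named in the abstract, loop unrolling, the sketch below applies a 3-point 1D stencil twice: once with a plain loop and once with the loop body unrolled by a factor of 2. It is not taken from the paper. The function names, the weights, and the choice to copy boundary cells unchanged are all assumptions made for this example, and Python is used only to show that the transformation preserves the result, not to measure any speedup.

```python
def stencil_1d(a, w=(0.25, 0.5, 0.25)):
    """Baseline 3-point stencil; boundary cells are copied unchanged (assumption)."""
    n = len(a)
    out = list(a)
    for i in range(1, n - 1):
        out[i] = w[0] * a[i - 1] + w[1] * a[i] + w[2] * a[i + 1]
    return out


def stencil_1d_unrolled(a, w=(0.25, 0.5, 0.25)):
    """Same stencil with the loop body unrolled by a factor of 2."""
    n = len(a)
    out = list(a)
    w0, w1, w2 = w
    i = 1
    # Main unrolled loop: two interior points per iteration, sharing loads.
    while i + 1 < n - 1:
        left, mid, right, right2 = a[i - 1], a[i], a[i + 1], a[i + 2]
        out[i] = w0 * left + w1 * mid + w2 * right
        out[i + 1] = w0 * mid + w1 * right + w2 * right2
        i += 2
    # Remainder loop: handles the last point when the interior count is odd.
    while i < n - 1:
        out[i] = w0 * a[i - 1] + w1 * a[i] + w2 * a[i + 1]
        i += 1
    return out
```

Both functions evaluate each output point with the same operations in the same order, so their results match exactly. In a compiled language, the unrolled form would additionally let the compiler reuse the loaded values `mid` and `right` across the two updates.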


2021 ◽  
Author(s):  
Qingxiao Sun ◽  
Yi Liu ◽  
Hailong Yang ◽  
Zhonghui Jiang ◽  
Xiaoyan Liu ◽  
...  

2021 ◽  
Author(s):  
Mingzhen Li ◽  
Yi Liu ◽  
Hailong Yang ◽  
Yongmin Hu ◽  
Qingxiao Sun ◽  
...  

2020 ◽  
Vol E103.D (12) ◽  
pp. 2421-2434
Author(s):  
Jingcheng SHEN ◽  
Fumihiko INO ◽  
Albert FARRÉS ◽  
Mauricio HANZICH

2020 ◽  
Vol 46 (1) ◽  
pp. 1-28
Author(s):  
Fabio Luporini ◽  
Mathias Louboutin ◽  
Michael Lange ◽  
Navjot Kukreja ◽  
Philipp Witte ◽  
...  

Author(s):  
Atsushi Hori ◽  
Kazumi Yoshinaga ◽  
Thomas Herault ◽  
Aurélien Bouteiller ◽  
George Bosilca ◽  
...  

With the increasing fault rate on high-end supercomputers, the topic of fault tolerance has been gathering attention. To cope with this situation, various fault-tolerance techniques are under investigation; these include user-level, algorithm-based fault-tolerance techniques and parallel execution environments that enable jobs to continue following node failure. Even with these techniques, some programs with static load balancing, such as stencil computation, may underperform after a failure recovery. Even when spare nodes are present, they are not always substituted for failed nodes in an effective way. This article considers three questions: how spare nodes should be allocated, how to substitute them for faulty nodes, and how much the communication performance is affected by such a substitution. The third question arises because node substitutions modify the rank mapping, which can incur additional message collisions. In a stencil computation, rank mapping is done in a straightforward way on a Cartesian network without incurring any message collisions. However, once a substitution has occurred, the optimal node-rank mapping may be destroyed. Therefore, these questions must be answered in a way that minimizes the degradation of communication performance. In this article, several spare node allocation and failed node substitution methods are proposed, analyzed, and compared in terms of communication performance following the substitution. The proposed substitution methods are named sliding methods. The sliding methods are analyzed by using a simulation program we developed and evaluated by using the K computer, Blue Gene/Q (BG/Q), and TSUBAME 2.5. It is shown that when failures occur, the stencil communication performance on the K and BG/Q can be slowed by a factor of around 10, depending on the number of node failures. The barrier performance on the K can be cut in half. On BG/Q, barrier performance can be slowed by a factor of 10. Further, it is also shown that almost no such communication performance degradation can be seen on TSUBAME 2.5. This is because TSUBAME 2.5 has an InfiniBand network connected with a FatTree topology, while the K computer and BG/Q have dedicated Cartesian networks. Thus, the communication performance degradation depends on network characteristics.
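To make the rank-mapping concern concrete, the sketch below models a small 2D mesh in which each stencil rank initially sits on the node with the same coordinates, so every neighbour exchange is one hop. It then replaces one failed node with a spare placed in an extra column and measures the hop counts again. This is not the article's sliding method or its simulator: the mesh without wraparound links, the spare-column layout, the direct one-for-one substitution, and all function names are assumptions chosen only to show how a substitution can lengthen neighbour paths.

```python
def identity_mapping(rows, cols):
    """Map stencil rank (r, c) to the node at the same mesh coordinates."""
    return {(r, c): (r, c) for r in range(rows) for c in range(cols)}


def substitute(mapping, failed_rank, spare_node):
    """Return a new mapping where the failed rank runs on the spare node."""
    new_mapping = dict(mapping)
    new_mapping[failed_rank] = spare_node
    return new_mapping


def neighbour_hops(mapping, rows, cols):
    """Manhattan hop count for every pair of adjacent stencil ranks."""
    hops = []
    for r in range(rows):
        for c in range(cols):
            # Visit each pair once via the right and lower neighbours.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    x1, y1 = mapping[(r, c)]
                    x2, y2 = mapping[(nr, nc)]
                    hops.append(abs(x1 - x2) + abs(y1 - y2))
    return hops


rows, cols = 4, 4
before = identity_mapping(rows, cols)
# Hypothetical layout: spares occupy an extra column at index `cols`.
after = substitute(before, failed_rank=(1, 1), spare_node=(1, cols))

print(max(neighbour_hops(before, rows, cols)))  # 1
print(max(neighbour_hops(after, rows, cols)))   # 4
```

Longer paths share links with other neighbour exchanges, which is one way the message collisions mentioned in the abstract can arise on a Cartesian network.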

