Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift

2020 ◽  
Vol 16 (4) ◽  
pp. 1-25
Author(s):  
Larisa Stoltzfus ◽  
Bastian Hagedorn ◽  
Michel Steuwer ◽  
Sergei Gorlatch ◽  
Christophe Dubach
2016 ◽  
Vol 51 (6) ◽  
pp. 711-726 ◽  
Author(s):  
Shoaib Kamil ◽  
Alvin Cheung ◽  
Shachar Itzhaky ◽  
Armando Solar-Lezama

SIAM Review ◽  
2009 ◽  
Vol 51 (1) ◽  
pp. 129-159 ◽  
Author(s):  
Kaushik Datta ◽  
Shoaib Kamil ◽  
Samuel Williams ◽  
Leonid Oliker ◽  
John Shalf ◽  
...  

2015 ◽  
Vol 2 (1) ◽  
pp. 1-33 ◽  
Author(s):  
Adam Hammouda ◽  
Andrew R. Siegel ◽  
Stephen F. Siegel

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1275
Author(s):  
Changdao Du ◽  
Yoshiki Yamaguchi

Driven by performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computing. Meanwhile, high-level synthesis (HLS) compilers allow FPGAs to be programmed in common languages such as C, C++, or OpenCL, which improves design efficiency and portability. Stencil computations are important kernels in many scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, whose bandwidth limits design space exploration in the spatial domain of stencil kernels. Many previous studies therefore relied mainly on exploiting parallelism in the temporal domain to work around the bandwidth limitations. In our approach, we scale up design performance by considering the spatial and temporal parallelism of the stencil kernel equally. We also discuss the portability of the design across different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with existing studies. With our method, developers can choose from a broad range of parallelization strategies, based on the resources of a specific FPGA, to improve performance.
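For readers unfamiliar with the terms, the two kinds of parallelism mentioned in this abstract can be illustrated with a minimal sketch. It is not the architecture from the paper: the 3-point averaging stencil, the fixed boundary cells, and the function names `stencil_sweep` and `run_stencil` are all assumptions chosen for illustration.

```python
def stencil_sweep(grid):
    """One sweep of a 1D 3-point averaging stencil (hypothetical example).

    Every interior cell reads only the previous grid, so all updates within
    a sweep are independent of each other: this is the spatial parallelism.
    Boundary cells are kept unchanged.
    """
    out = list(grid)
    for i in range(1, len(grid) - 1):
        out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return out


def run_stencil(grid, time_steps):
    """Chain several sweeps.

    Each sweep consumes the output of the previous one; hardware designs can
    pipeline these stages, which is the temporal parallelism.
    """
    for _ in range(time_steps):
        grid = stencil_sweep(grid)
    return grid


one_step = run_stencil([0.0, 0.0, 3.0, 0.0, 0.0], 1)
two_steps = run_stencil([0.0, 0.0, 3.0, 0.0, 0.0], 2)
```

In this sketch, widening the loop body to update several cells at once corresponds to spatial parallelism, while instantiating one processing stage per time step corresponds to temporal parallelism.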


2007 ◽  
Vol 4 (2) ◽  
pp. 2-26
Author(s):  
Gernot Gebhard ◽  
Philipp Lucas

Retargeting a compiler's back end to a new architecture is a time-consuming process. This becomes an evident problem in the area of programmable graphics hardware (graphics processing units, GPUs) or embedded processors, where architectures change faster than elsewhere. We propose the object-oriented rewrite system OORS to overcome this problem. Using the OORS language, a compiler developer can express the code generation and optimization phases in terms of cost-annotated rewrite rules that support complex non-linear matching and replacement patterns. Retargetability is achieved by organizing rules into profiles, one for each supported target architecture. With its rule and profile inheritance mechanism, OORS makes it possible to reuse existing specifications, which is an improvement over traditional approaches. Altogether, OORS increases the maintainability of the compiler's back end and thus reduces both the complexity and the effort of the retargeting process. To show the potential of this approach, we have used the OORS language to implement a code generation and a code optimization pattern matcher supporting different target architectures, and have integrated them into a compiler for a programming language for CPUs and GPUs.
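The ideas named in this abstract (cost-annotated rules, non-linear patterns, and profiles that inherit from one another) can be sketched in a few lines. This is not the OORS language or its API: the classes `Rule` and `Profile`, the rule names, the cost values, and the tuple-based expression encoding are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Rule:
    name: str
    cost: int
    # Returns the rewritten expression, or None when the pattern does not match.
    rewrite: Callable[[tuple], Optional[tuple]]


@dataclass
class Profile:
    name: str
    parent: Optional["Profile"] = None
    rules: Dict[str, Rule] = field(default_factory=dict)

    def add(self, rule: Rule) -> None:
        self.rules[rule.name] = rule

    def all_rules(self) -> Dict[str, Rule]:
        # Inherited rules first, so a child profile can override by name.
        inherited = self.parent.all_rules() if self.parent else {}
        return {**inherited, **self.rules}

    def best_rewrite(self, expr: tuple):
        # Collect every matching rule and pick the cheapest one.
        matches = []
        for rule in self.all_rules().values():
            result = rule.rewrite(expr)
            if result is not None:
                matches.append((rule.cost, rule.name, result))
        return min(matches, key=lambda m: (m[0], m[1])) if matches else None


def is_add_self(expr: tuple) -> bool:
    # Non-linear pattern: the same operand must appear twice, as in x + x.
    return len(expr) == 3 and expr[0] == "add" and expr[1] == expr[2]


generic = Profile("generic")
generic.add(Rule("add_self_to_mul", 3,
                 lambda e: ("mul", 2, e[1]) if is_add_self(e) else None))

# Hypothetical target profile that reuses the generic rules and adds a cheaper one.
gpu = Profile("gpu", parent=generic)
gpu.add(Rule("add_self_to_shift", 1,
             lambda e: ("shl", e[1], 1) if is_add_self(e) else None))

generic_choice = generic.best_rewrite(("add", "a", "a"))
gpu_choice = gpu.best_rewrite(("add", "a", "a"))
no_match = gpu.best_rewrite(("add", "a", "b"))
```

Here the `gpu` profile inherits the generic rule and selects its own cheaper alternative when both match, which mirrors the reuse-through-inheritance idea in spirit only.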

