Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on state-of-the-art FPGAs with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, whose bandwidth limits design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies relied mainly on exploiting parallelism in the temporal domain to work around the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can choose among a broad range of parallelization strategies based on specific FPGA resources to improve performance.
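
As a rough illustration of the spatial and temporal dimensions the abstract refers to, the sketch below applies a 3-point averaging stencil in Python. It is not the paper's FPGA architecture. The function names (`step`, `run_reference`, `run_tiled`) and the choice of a 1D averaging kernel are assumptions made for this example. Each tile is independent of the others (the spatial dimension), and each tile fuses several time steps by loading a halo of `steps` extra cells per side (the temporal dimension).

```python
def step(cells):
    """One 3-point averaging sweep; the two end cells are held fixed."""
    out = list(cells)
    for i in range(1, len(cells) - 1):
        out[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return out


def run_reference(cells, steps):
    """Apply `steps` sweeps over the whole grid, one sweep at a time."""
    for _ in range(steps):
        cells = step(cells)
    return cells


def run_tiled(cells, steps, tile):
    """Overlapped tiling (hypothetical helper for this sketch).

    Each tile reads a halo of `steps` cells on each side, runs all time
    steps locally, and writes back only its own interior. Stale values
    creep inward from a tile's artificial edge by one cell per step, so a
    halo of width `steps` keeps the interior correct.
    """
    n = len(cells)
    result = list(cells)
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        lo = max(start - steps, 0)   # clamp the halo at the grid boundary
        hi = min(stop + steps, n)
        local = cells[lo:hi]
        for _ in range(steps):
            local = step(local)
        result[start:stop] = local[start - lo:stop - lo]
    return result


if __name__ == "__main__":
    grid = [float((i * i) % 7) for i in range(20)]
    print(run_tiled(grid, steps=3, tile=5))
```

The tiles recompute halo cells redundantly, which trades extra arithmetic for fewer passes over external memory; whether that trade pays off depends on the available bandwidth and compute resources.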


MiniGhost: a miniapp for exploring boundary exchange strategies using stencil computations in scientific parallel computing.

10.2172/1039405 ◽ 2012 ◽ Cited By ~ 13

Author(s): Richard Frederick Barrett ◽ Michael Allen Heroux ◽ Courtenay Thomas Vaughan

Keyword(s): Parallel Computing ◽ Stencil Computations


OORS: An object-oriented rewrite system

Computer Science and Information Systems ◽ 10.2298/csis0702002g ◽ 2007 ◽ Vol 4 (2) ◽ pp. 2-26

Author(s): Gernot Gebhard ◽ Philipp Lucas

Keyword(s): Code Generation ◽ Graphics Processing Units ◽ Object Oriented ◽ Graphics Hardware ◽ Code Optimization ◽ Target Architecture ◽ Rewrite Rules ◽ Graphics Processing ◽ Traditional Approaches ◽ Rewrite System

Retargeting a compiler's back end to a new architecture is a time-consuming process. This becomes an evident problem in the area of programmable graphics hardware (graphics processing units, GPUs) or embedded processors, where architectural changes are faster than elsewhere. We propose the object-oriented rewrite system OORS to overcome this problem. Using the OORS language, a compiler developer can express the code generation and optimization phase in terms of cost-annotated rewrite rules supporting complex non-linear matching and replacing patterns. Retargetability is achieved by organizing rules into profiles, one for each supported target architecture. Featuring a rule and profile inheritance mechanism, OORS makes the reuse of existing specifications possible. This is an improvement over traditional approaches. Altogether, OORS increases the maintainability of the compiler's back end and thus both decreases the complexity and reduces the effort of the retargeting process. To show the potential of this approach, we have implemented a code generation and a code optimization pattern matcher supporting different target architectures using the OORS language and integrated them into a compiler of a programming language for CPUs and GPUs.
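
The ideas named in this abstract (cost-annotated rules, non-linear patterns, per-target profiles with inheritance) can be sketched in a few lines of Python. This is not OORS syntax and not the authors' implementation. Every name here (`Rule`, `Profile`, `apply_cheapest`, the `mad` instruction, the tuple encoding of expressions) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Expressions are nested tuples such as ("add", "x", "x"); leaves are names or ints.


@dataclass
class Rule:
    name: str
    cost: int
    rewrite: Callable  # returns the replacement, or None when the pattern does not match


@dataclass
class Profile:
    """A named rule set for one target; a child profile inherits its parent's rules."""
    name: str
    parent: Optional["Profile"] = None
    rules: dict = field(default_factory=dict)

    def add(self, rule):
        self.rules[rule.name] = rule

    def all_rules(self):
        inherited = self.parent.all_rules() if self.parent else {}
        return {**inherited, **self.rules}  # a child rule overrides a parent rule of the same name

    def apply_cheapest(self, expr):
        """Try every visible rule at the root of `expr`; keep the lowest-cost match."""
        matches = []
        for rule in self.all_rules().values():
            result = rule.rewrite(expr)
            if result is not None:
                matches.append((rule.cost, rule.name, result))
        if not matches:
            return expr, None
        _, name, result = min(matches, key=lambda m: (m[0], m[1]))
        return result, name


def double(expr):
    # Non-linear pattern add(a, a): the same subterm must occur twice.
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "add" and expr[1] == expr[2]:
        return ("mul", expr[1], 2)
    return None


def fuse_mad(expr):
    # add(mul(a, b), c) -> mad(a, b, c), a multiply-add assumed to exist on the GPU target.
    if (isinstance(expr, tuple) and len(expr) == 3 and expr[0] == "add"
            and isinstance(expr[1], tuple) and len(expr[1]) == 3 and expr[1][0] == "mul"):
        return ("mad", expr[1][1], expr[1][2], expr[2])
    return None


generic = Profile("generic")
generic.add(Rule("double", cost=3, rewrite=double))

gpu = Profile("gpu", parent=generic)  # inherits "double", adds a target-specific rule
gpu.add(Rule("fuse_mad", cost=1, rewrite=fuse_mad))

if __name__ == "__main__":
    print(gpu.apply_cheapest(("add", ("mul", "a", "b"), "c")))
    print(gpu.apply_cheapest(("add", "x", "x")))
```

Keeping target-specific rules in a child profile means a new target only states what differs from its parent, which is the reuse benefit the abstract attributes to rule and profile inheritance.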
