High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory

Changdao Du; Yoshiki Yamaguchi

doi:10.3390/electronics9081275


Electronics ◽

10.3390/electronics9081275 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1275

Author(s):

Changdao Du ◽

Yoshiki Yamaguchi

Keyword(s):

Programming Languages ◽

High Performance ◽

Design Space Exploration ◽

Scale Up ◽

High Level Synthesis ◽

Stencil Computations ◽

Temporal Domain ◽

High Bandwidth ◽

Promising Solution ◽

High Level

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, whose bandwidth limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to work around the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can apply a broad range of parallelization strategies, chosen to suit the resources of a specific FPGA, to improve performance.
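To make the distinction between spatial and temporal parallelism concrete, here is a minimal software sketch. It is not the architecture from the paper: the 3-point averaging stencil, the fixed boundary cells, and all function names (`stencil_step`, `spatial_cycles`, `temporal_cascade`, `memory_passes`) are assumptions chosen for illustration. Spatial parallelism is modelled as updating several cells per cycle; temporal parallelism as chaining several time steps in one pass over external memory.

```python
import math


def stencil_step(grid):
    """Advance a 1D grid by one time step with a 3-point averaging stencil.

    Boundary cells are held fixed (an assumption made for this sketch).
    """
    out = list(grid)
    for i in range(1, len(grid) - 1):
        out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return out


def spatial_cycles(num_cells, lanes):
    """Cycles for one time step when `lanes` interior cells are updated per cycle."""
    interior = max(num_cells - 2, 0)
    return math.ceil(interior / lanes)


def temporal_cascade(grid, stages):
    """Apply `stages` chained time steps, as a cascade of processing
    elements would during a single pass over external memory."""
    for _ in range(stages):
        grid = stencil_step(grid)
    return grid


def memory_passes(total_steps, stages):
    """Passes over external memory needed when each pass advances `stages` steps."""
    return math.ceil(total_steps / stages)
```

In this simplified model, more lanes reduce the cycles per time step but raise the memory bandwidth needed per cycle, while more cascade stages reduce the number of memory passes without raising bandwidth demand. That trade-off is why bandwidth-limited designs tend to favour the temporal domain.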


Performance Modeling for FPGAs: Extending the Roofline Model with High-Level Synthesis Tools

International Journal of Reconfigurable Computing ◽

10.1155/2013/428078 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 18

Author(s):

Bruno da Silva ◽

An Braeken ◽

Erik H. D’Hollander ◽

Abdellah Touhafi

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Performance Model ◽

High Level Synthesis ◽

Performance Expectations ◽

Proposed Model ◽

Roofline Model ◽

High Level ◽

Performance Computing ◽

Selection Of

The potential of FPGAs as accelerators for high-performance computing applications is very large, but many factors are involved in their performance. Designing for FPGAs and selecting the proper optimizations when mapping computations to FPGAs lead to prohibitively long development times. Two alternatives exist: high-level synthesis (HLS) tools, which promise fast design space exploration by raising the level of design abstraction, and analytical performance models, which provide realistic performance expectations, identify potential impediments to performance, and offer optimization guidelines. In this paper we propose the combination of both, in order to construct a performance model for FPGAs which is able to visually condense all the helpful information for the designer. Our proposed model extends the roofline model, by considering the resource consumption and the parameters used in the HLS tools, to maximize the performance and the resource utilization within the area of the FPGA. The proposed model is applied to optimize the design exploration of a class of window-based image processing applications using two different HLS tools. The results show the accuracy of the model as well as its flexibility to be combined with any HLS tool.
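As background for the roofline model mentioned above, the classic formulation bounds attainable performance by the smaller of peak compute throughput and memory bandwidth multiplied by operational intensity. The sketch below shows only that classic bound, plus one hypothetical way a resource limit could set the compute ceiling. The function names, parameters, and the area-based ceiling are assumptions for illustration and are not taken from the paper's extended model.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Classic roofline bound: min(compute ceiling, bandwidth * operational intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)


def resource_limited_peak(unit_gflops, unit_area, total_area):
    """Hypothetical compute ceiling: replicate a processing unit until the
    available area is exhausted (assumes area is the only limiting resource)."""
    replicas = total_area // unit_area
    return replicas * unit_gflops
```

For example, a kernel with an operational intensity of 2 FLOP/byte on a device with 10 GB/s of bandwidth is memory-bound at 20 GFLOP/s regardless of how many compute units fit on the device.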


Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Journal of Signal Processing Systems ◽

10.1007/s11265-020-01633-z ◽

2021 ◽

Author(s):

Umar Ibrahim Minhas ◽

Roger Woods ◽

Georgios Karakonstantis

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Design Space ◽

System Throughput ◽

Design Parameters ◽

Temporal Constraints ◽

Shared Resources ◽

Task Processing ◽

High Level ◽

Performance Computing

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card and shows that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide higher than 3× system speedup over previous schemes.


Implementation and Design Space Exploration of a Turbo Decoder in High-Level Synthesis

2019 International Conference on ReConFigurable Computing and FPGAs (ReConFig) ◽

10.1109/reconfig48160.2019.8994787 ◽

2019 ◽

Author(s):

Wesley Stirk ◽

Jeff Goeders

Keyword(s):

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

Turbo Decoder ◽

High Level


Distributed design-space exploration for high-level synthesis systems

[1992] Proceedings 29th ACM/IEEE Design Automation Conference ◽

10.1109/dac.1992.227806 ◽

2003 ◽

Cited By ~ 24

Author(s):

R. Dutta ◽

J. Roy ◽

R. Vemuri

Keyword(s):

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

Distributed Design ◽

High Level


Divide and conquer high-level synthesis design space exploration

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/2209291.2209302 ◽

2012 ◽

Vol 17 (3) ◽

pp. 1-19 ◽

Cited By ~ 24

Author(s):

Benjamin Carrion Schafer ◽

Kazutoshi Wakabayashi

Keyword(s):

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Divide And Conquer ◽

High Level Synthesis ◽

Synthesis Design ◽

High Level


Buffer Placement and Sizing for High-Performance Dataflow Circuits

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3477053 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-32

Author(s):

Lana Josipović ◽

Shabnam Sheikhha ◽

Andrea Guerrieri ◽

Paolo Ienne ◽

Jordi Cortadella

Keyword(s):

Performance Optimization ◽

Optimization Model ◽

High Performance ◽

Control Flow ◽

High Level Synthesis ◽

Software Applications ◽

Marked Graphs ◽

Variable Latency ◽

High Level ◽

Strong Contrast

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
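The link between buffer placement and throughput can be illustrated with a standard result from marked-graph theory: the steady-state throughput of a loop is bounded by the number of tokens circulating in it divided by its total latency, and the slowest loop bounds the whole circuit. The sketch below is not the paper's optimization model. The names `loop_throughput` and `circuit_throughput`, the representation of loops as `(tokens, latency)` pairs, and the cap of one token per cycle are assumptions made for illustration.

```python
from fractions import Fraction


def loop_throughput(tokens, latency):
    """Throughput bound of one loop: tokens in flight divided by loop latency in cycles."""
    return Fraction(tokens, latency)


def circuit_throughput(loops):
    """Bound for the whole circuit: the slowest loop dominates.

    `loops` is a list of (tokens, latency) pairs. The result is capped at
    one token per cycle, which is an assumption of this sketch.
    """
    bounds = [loop_throughput(tokens, latency) for tokens, latency in loops]
    return min([Fraction(1)] + bounds)
```

Under the further assumption that a register-like buffer adds one cycle of latency to the loop it sits on, a loop with one token and two cycles of latency drops from a bound of 1/2 to 1/3 when a buffer is added. This shows how a placement that helps timing can cost throughput.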


Higher-level geophysical modelling

10.5194/egusphere-egu21-2127 ◽

2021 ◽

Author(s):

Roman Nuterman ◽

Dion Häfner ◽

Markus Jochum

Keyword(s):

Machine Learning ◽

Programming Languages ◽

High Performance ◽

Ocean Model ◽

User Friendliness ◽

Model Code ◽

Building Models ◽

Fortran Implementation ◽

High Level ◽

New Generation

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model. Therefore, we propose a new generation of geophysical models: One that combines high-level abstractions and user friendliness on one hand, and that leverages modern developments in high-performance computing and machine learning research on the other hand. We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.


Chimera: A Hybrid Machine Learning-Driven Multi-Objective Design Space Exploration Tool for FPGA High-Level Synthesis

10.1007/978-3-030-91608-4_52 ◽

2021 ◽

pp. 524-536

Author(s):

Mang Yu ◽

Sitao Huang ◽

Deming Chen

Keyword(s):

Machine Learning ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

Multi Objective ◽

Exploration Tool ◽

Hybrid Machine ◽

High Level


High-Level Synthesis Design Space Exploration: Past, Present, and Future

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ◽

10.1109/tcad.2019.2943570 ◽

2020 ◽

Vol 39 (10) ◽

pp. 2628-2639 ◽

Cited By ~ 3

Author(s):

Benjamin Carrion Schafer ◽

Zi Wang

Keyword(s):

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

Synthesis Design ◽

High Level


Leveraging Prior Knowledge for Effective Design-Space Exploration in High-Level Synthesis

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ◽

10.1109/tcad.2020.3012750 ◽

2020 ◽

Vol 39 (11) ◽

pp. 3736-3747

Author(s):

Lorenzo Ferretti ◽

Jihye Kwon ◽

Giovanni Ansaloni ◽

Giuseppe Di Guglielmo ◽

Luca P. Carloni ◽

...

Keyword(s):

Prior Knowledge ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

High Level Synthesis ◽

High Level ◽

Effective Design
