Buffer Placement and Sizing for High-Performance Dataflow Circuits

Lana Josipović; Shabnam Sheikhha; Andrea Guerrieri; Paolo Ienne; Jordi Cortadella

doi:10.1145/3477053

Buffer Placement and Sizing for High-Performance Dataflow Circuits

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3477053 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-32

Author(s):

Lana Josipović ◽

Shabnam Sheikhha ◽

Andrea Guerrieri ◽

Paolo Ienne ◽

Jordi Cortadella

Keyword(s):

Performance Optimization ◽

Optimization Model ◽

High Performance ◽

Control Flow ◽

High Level Synthesis ◽

Software Applications ◽

Marked Graphs ◽

Variable Latency ◽

High Level ◽

Strong Contrast

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
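
As a rough illustration of the marked-graph reasoning this abstract mentions, the sketch below computes the classic throughput bound of a marked graph: the minimum, over all cycles, of the tokens on the cycle divided by the cycle's total latency. It is a textbook bound evaluated on a toy graph, not the paper's optimization model; the edge format, the unit names, and the functions `simple_cycles` and `throughput_bound` are assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical toy circuit: each edge is (source, destination, tokens, latency).
# "tokens" is the number of data items initially held on the channel and
# "latency" is the number of clock cycles needed to traverse it.
EDGES = [
    ("merge", "mul", 0, 1),
    ("mul", "branch", 0, 2),
    ("branch", "merge", 1, 0),
    ("mul", "merge", 1, 1),
]

def simple_cycles(edges):
    """Enumerate the simple cycles of a small directed graph as lists of edge indices."""
    adjacency = {}
    for index, (src, _dst, _tokens, _latency) in enumerate(edges):
        adjacency.setdefault(src, []).append(index)
    nodes = sorted({n for src, dst, _t, _l in edges for n in (src, dst)})
    cycles = []

    def dfs(start, node, path, visited):
        for index in adjacency.get(node, []):
            nxt = edges[index][1]
            if nxt == start:
                cycles.append(path + [index])
            # Only walk through nodes that sort after the start node, so each
            # cycle is reported once, rooted at its smallest node.
            elif nxt not in visited and nxt > start:
                dfs(start, nxt, path + [index], visited | {nxt})

    for start in nodes:
        dfs(start, start, [], {start})
    return cycles

def throughput_bound(edges):
    """Return min over cycles of tokens / latency, or None if the graph is acyclic."""
    best = None
    for cycle in simple_cycles(edges):
        tokens = sum(edges[i][2] for i in cycle)
        latency = sum(edges[i][3] for i in cycle)
        if latency == 0:
            raise ValueError("cycle with zero latency (combinational loop)")
        ratio = Fraction(tokens, latency)
        best = ratio if best is None else min(best, ratio)
    return best

print(throughput_bound(EDGES))  # prints 1/3
```

In this toy graph the bound is set by the loop through `merge`, `mul`, and `branch`, which holds one token and takes three cycles to traverse. The exhaustive cycle enumeration is only practical for very small graphs.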

Dependency Graph-based High-level Synthesis for Maximum Instruction Parallelism

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3468875 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1-15

Author(s):

Zhenghua Gu ◽

Wenqing Wan ◽

Jundong Xie ◽

Chang Wu

Keyword(s):

Performance Optimization ◽

Directed Acyclic Graph ◽

Scheduling Algorithm ◽

Dependency Graph ◽

High Level Synthesis ◽

Limiting Factor ◽

Circuit Performance ◽

State Transition Graph ◽

High Level ◽

Basic Blocks

Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on the Control and Data Flow Graph (CDFG) and schedule basic blocks in sequential order. Our study shows that the sequential scheduling order of basic blocks is a major limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, DG is a directed acyclic graph. Thus, no loop-breaking heuristic is needed for scheduling. Second, DG can be used to identify the exact instruction parallelism. Our experiment shows that DG can lead to a 76% increase in instruction parallelism over CDFG. Based on DG, we propose a bottom-up scheduling algorithm to achieve much higher instruction parallelism than existing algorithms. A hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high-parallelism scheduling. Our experimental results show that our DG-based HLS algorithm can outperform the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.
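
To make concrete why an acyclic dependency graph needs no loop-breaking step, the sketch below runs a plain as-soon-as-possible (ASAP) schedule over a small DAG in topological order. ASAP is a standard textbook technique used here as a stand-in; it is not the bottom-up algorithm the article proposes, and the operation names, latencies, and the function `asap_schedule` are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each operation maps to the operations it waits for.
DEPS = {
    "mul": {"ld_a", "ld_b"},
    "add": {"mul", "ld_c"},
    "st": {"add"},
}

# Hypothetical latencies in clock cycles.
LATENCY = {"ld_a": 2, "ld_b": 2, "ld_c": 2, "mul": 3, "add": 1, "st": 1}

def asap_schedule(deps, latency):
    """Start each operation as soon as all of its predecessors have finished."""
    start = {}
    # static_order() yields every node after all of its predecessors, which is
    # only possible because the graph is acyclic (a cycle raises CycleError).
    for op in TopologicalSorter(deps).static_order():
        start[op] = max(
            (start[pred] + latency[pred] for pred in deps.get(op, ())),
            default=0,
        )
    return start

schedule = asap_schedule(DEPS, LATENCY)
total_cycles = max(schedule[op] + LATENCY[op] for op in schedule)
print(schedule["st"], total_cycles)  # prints 6 7
```

The three loads have no predecessors, so they all start in cycle 0 and run in parallel; the multiply starts at cycle 2, the add at cycle 5, and the store at cycle 6.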

High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory

Electronics ◽

10.3390/electronics9081275 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1275

Author(s):

Changdao Du ◽

Yoshiki Yamaguchi

Keyword(s):

Programming Languages ◽

High Performance ◽

Design Space Exploration ◽

Scale Up ◽

High Level Synthesis ◽

Stencil Computations ◽

Temporal Domain ◽

High Bandwidth ◽

Promising Solution ◽

High Level

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on state-of-the-art FPGAs with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can apply a broad range of parallelization strategies based on specific FPGA resources to improve performance.

High-level synthesis of low-power control-flow intensive circuits

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ◽

10.1109/43.811321 ◽

1999 ◽

Vol 18 (12) ◽

pp. 1715-1729 ◽

Cited By ~ 19

Author(s):

K.S. Khouri ◽

G. Lakshminarayana ◽

N.K. Jha

Keyword(s):

Low Power ◽

Power Control ◽

Control Flow ◽

High Level Synthesis ◽

High Level

A methodology for high level synthesis of high performance DSP structures targetting FPGAs

10.1109/icasic.1996.562758 ◽

2002 ◽

Author(s):

S. Shehata ◽

B. Haroun ◽

A. Al-Khalili

Keyword(s):

High Performance ◽

High Level Synthesis ◽

High Level

On the Design of High Performance HW Accelerator through High-level Synthesis Scheduling Approximations

2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) ◽

10.23919/date48585.2020.9116358 ◽

2020 ◽

Author(s):

Siyuan Xu ◽

Benjamin Carrion Schafer

Keyword(s):

High Performance ◽

High Level Synthesis ◽

High Level

Profiling-Based Control-Flow Reduction in High-Level Synthesis

10.1109/icfpt52863.2021.9609816 ◽

2021 ◽

Author(s):

Austin Liolli ◽

Omar Ragheb ◽

Jason Anderson

Keyword(s):

Control Flow ◽

High Level Synthesis ◽

Flow Reduction ◽

High Level

Register binding based power management for high-level synthesis of control-flow intensive behaviors

Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors ◽

10.1109/iccd.2002.1106800 ◽

2003 ◽

Cited By ~ 2

Author(s):

Lin Zhong ◽

Jiong Luo ◽

Yunsi Fei ◽

N.K. Jha

Keyword(s):

Power Management ◽

Control Flow ◽

High Level Synthesis ◽

High Level

Coordinated transformations for high-level synthesis of high performance microprocessor blocks

Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324) ◽

10.1145/513918.514140 ◽

2002 ◽

Cited By ~ 4

Author(s):

Sumit Gupta ◽

Nick Savoiu ◽

Nikil Dutt ◽

Rajesh Gupta ◽

Alex Nicolau ◽

...

Keyword(s):

High Performance ◽

High Level Synthesis ◽

High Level

Combined control flow dominated and data flow dominated high-level synthesis

33rd Design Automation Conference Proceedings, 1996 ◽

10.1109/dac.1996.545641 ◽

2005 ◽

Cited By ~ 2

Author(s):

E. Berrebi ◽

P. Kission ◽

S. Vernalde ◽

S. De Troch ◽

J.C. Berluison ◽

...

Keyword(s):

Data Flow ◽

Control Flow ◽

High Level Synthesis ◽

Combined Control ◽

High Level

Efficient FPGA Implementation of OpenCL High-Performance Computing Applications via High-Level Synthesis

IEEE Access ◽

10.1109/access.2017.2671881 ◽

2017 ◽

Vol 5 ◽

pp. 2747-2762 ◽

Cited By ~ 29

Author(s):

Fahad Bin Muslim ◽

Liang Ma ◽

Mehdi Roozmeh ◽

Luciano Lavagno

Keyword(s):

High Performance Computing ◽

High Performance ◽

Fpga Implementation ◽

High Level Synthesis ◽

High Level ◽

Performance Computing
