A Highly Configurable High-Level Synthesis Functional Pattern Library

FPGAs have recently played an increasingly important role in heterogeneous computing, but Register Transfer Level (RTL) design flows are inefficient and require designers to be familiar with the circuit architecture. High-level synthesis (HLS) allows developers to design FPGA circuits more efficiently with a more familiar programming language, a higher level of abstraction, and automatic adaptation of timing constraints. HLS tools such as Xilinx Vivado HLS nevertheless require specific design patterns and techniques to produce high-performance circuits. Moreover, designing efficient concurrency and data flow structures requires a deep understanding of the hardware, which raises the learning cost for programmers. In this paper, we propose a functional pattern library based on the MapReduce model and implemented with C++ templates, which can quickly build high-performance parallel pipelined computing models on FPGAs from a few simple parameters. The pattern library allows flexible adaptation of the parallel and data flow structures in an algorithm, which greatly improves coding efficiency. The contributions of this paper are as follows. (1) Four standard functional operators suitable for hardware parallel computing are defined. (2) Functional concurrent programming patterns are described based on C++ templates and Xilinx HLS. (3) The efficiency of this programming paradigm is verified with two algorithms of different complexity.
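As a rough illustration of the idea, a MapReduce-style operator parameterized by C++ template arguments might look like the sketch below. The name `map_reduce`, its signature, and the lane parameter `P` are assumptions made for this example and are not taken from the paper's library; the comments only indicate where HLS directives would typically be placed. The sketch also assumes that `init` is the identity element of the reduction.

```cpp
#include <array>
#include <cstddef>

// Hypothetical operator, not the paper's API.
// N: number of input elements; P: number of parallel lanes.
// Assumes `init` is the identity of `reduce_fn` (e.g. 0 for addition),
// because it seeds every lane and the final accumulator.
template <std::size_t N, std::size_t P, typename T,
          typename MapFn, typename ReduceFn>
T map_reduce(const std::array<T, N>& in, T init,
             MapFn map_fn, ReduceFn reduce_fn) {
    static_assert(P > 0 && N % P == 0,
                  "N must be a multiple of the lane count P");

    std::array<T, P> partial;
    partial.fill(init);

    // Outer loop: a candidate for pipelining in an HLS flow.
    for (std::size_t i = 0; i < N; i += P) {
        // Inner loop: a candidate for full unrolling, giving P parallel lanes.
        for (std::size_t lane = 0; lane < P; ++lane) {
            partial[lane] = reduce_fn(partial[lane], map_fn(in[i + lane]));
        }
    }

    // Combine the per-lane partial results.
    T acc = init;
    for (std::size_t lane = 0; lane < P; ++lane) {
        acc = reduce_fn(acc, partial[lane]);
    }
    return acc;
}
```

Calling `map_reduce<8, 4>(data, 0, square, add)` on an eight-element array would then compute a sum of squares over four lanes; changing `P` changes the degree of parallelism without touching the algorithm body.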

High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory

Electronics ◽ 10.3390/electronics9081275 ◽ 2020 ◽ Vol 9 (8) ◽ pp. 1275

Author(s): Changdao Du ◽ Yoshiki Yamaguchi

Keyword(s): Programming Languages ◽ High Performance ◽ Design Space Exploration ◽ Scale Up ◽ High Level Synthesis ◽ Stencil Computations ◽ Temporal Domain ◽ High Bandwidth ◽ Promising Solution ◽ High Level

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can adopt a broad range of parallelization strategies based on specific FPGA resources to improve performance.
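To make the two kinds of parallelism concrete, here is a minimal software sketch of a 1D three-point stencil. It is not the architecture from the paper: the function names, the weights (0.25, 0.5, 0.25), and the fixed boundary cells are assumptions chosen for illustration. Spatial parallelism corresponds to computing several cells of one time step at once; temporal parallelism corresponds to chaining several time steps as pipeline stages.

```cpp
#include <cstddef>
#include <vector>

// One time step of a hypothetical 1D three-point stencil.
// Boundary cells are kept fixed (an assumption for this sketch).
std::vector<float> stencil_step(const std::vector<float>& in) {
    std::vector<float> out(in);
    // Spatial domain: iterations are independent, so hardware could
    // compute several cells per cycle if memory bandwidth allows.
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
    }
    return out;
}

// Temporal domain: each step consumes the previous step's output, so
// hardware could chain `steps` stencil stages without returning to
// external memory between them.
std::vector<float> stencil_run(std::vector<float> grid, int steps) {
    for (int t = 0; t < steps; ++t) {
        grid = stencil_step(grid);
    }
    return grid;
}
```

In this picture, limited external bandwidth restricts how many cells can be fetched per cycle (spatial), which is why earlier designs leaned on chaining time steps (temporal) instead.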

Buffer Placement and Sizing for High-Performance Dataflow Circuits

ACM Transactions on Reconfigurable Technology and Systems ◽ 10.1145/3477053 ◽ 2022 ◽ Vol 15 (1) ◽ pp. 1-32

Author(s): Lana Josipović ◽ Shabnam Sheikhha ◽ Andrea Guerrieri ◽ Paolo Ienne ◽ Jordi Cortadella

Keyword(s): Performance Optimization ◽ Optimization Model ◽ High Performance ◽ Control Flow ◽ High Level Synthesis ◽ Software Applications ◽ Marked Graphs ◽ Variable Latency ◽ High Level ◽ Strong Contrast

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
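The link between buffers and throughput can be illustrated with a standard result from marked graph theory: a loop holding a given number of tokens and having a given total latency can sustain at most tokens divided by latency results per cycle, and the slowest loop bounds the whole circuit. The sketch below applies that bound to a list of loops. It is not the optimization model from the paper; the `Loop` struct, the function name, and the cap of one token per cycle are assumptions for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one choice-free loop in a dataflow circuit.
struct Loop {
    int tokens;   // tokens circulating in the loop
    int latency;  // total latency around the loop, in clock cycles
};

// Upper bound on throughput (tokens per cycle) from the marked graph
// cycle-ratio rule: each loop sustains at most tokens / latency.
// Assumes no unit accepts more than one token per cycle, hence the 1.0 cap.
double throughput_bound(const std::vector<Loop>& loops) {
    double bound = 1.0;
    for (const Loop& l : loops) {
        if (l.latency <= 0) {
            continue;  // ignore malformed entries in this sketch
        }
        bound = std::min(bound, static_cast<double>(l.tokens) / l.latency);
    }
    return bound;
}
```

For example, a loop with one token and a latency of four cycles limits the circuit to 0.25 results per cycle, even if every other loop could run faster. Adding a register to such a loop raises its latency, which is why placement matters for throughput as well as timing.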

A methodology for high level synthesis of high performance DSP structures targetting FPGAs

10.1109/icasic.1996.562758 ◽ 2002

Author(s): S. Shehata ◽ B. Haroun ◽ A. Al-Khalili

Keyword(s): High Performance ◽ High Level Synthesis ◽ High Level

On the Design of High Performance HW Accelerator through High-level Synthesis Scheduling Approximations

2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) ◽ 10.23919/date48585.2020.9116358 ◽ 2020

Author(s): Siyuan Xu ◽ Benjamin Carrion Schafer

Keyword(s): High Performance ◽ High Level Synthesis ◽ High Level

Coordinated transformations for high-level synthesis of high performance microprocessor blocks

Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324) ◽ 10.1145/513918.514140 ◽ 2002 ◽ Cited By ~ 4

Author(s): Sumit Gupta ◽ Nick Savoiu ◽ Nikil Dutt ◽ Rajesh Gupta ◽ Alex Nicolau ◽ ...

Keyword(s): High Performance ◽ High Level Synthesis ◽ High Level

Efficient FPGA Implementation of OpenCL High-Performance Computing Applications via High-Level Synthesis

IEEE Access ◽ 10.1109/access.2017.2671881 ◽ 2017 ◽ Vol 5 ◽ pp. 2747-2762 ◽ Cited By ~ 29

Author(s): Fahad Bin Muslim ◽ Liang Ma ◽ Mehdi Roozmeh ◽ Luciano Lavagno

Keyword(s): High Performance Computing ◽ High Performance ◽ Fpga Implementation ◽ High Level Synthesis ◽ High Level ◽ Performance Computing

Architecture Exploration of High-Performance Floating-Point Fused Multiply-Add Units and their Automatic Use in High-Level Synthesis

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum ◽ 10.1109/ipdpsw.2013.106 ◽ 2013 ◽ Cited By ~ 2

Author(s): Bjorn Liebig ◽ Jens Huthmann ◽ Andreas Koch

Keyword(s): High Performance ◽ High Level Synthesis ◽ Floating Point ◽ Architecture Exploration ◽ High Level

Transformations of High-Level Synthesis Codes for High-Performance Computing

IEEE Transactions on Parallel and Distributed Systems ◽ 10.1109/tpds.2020.3039409 ◽ 2021 ◽ Vol 32 (5) ◽ pp. 1014-1029

Author(s): Johannes de Fine Licht ◽ Maciej Besta ◽ Simon Meierhans ◽ Torsten Hoefler

Keyword(s): High Performance Computing ◽ High Performance ◽ High Level Synthesis ◽ High Level ◽ Performance Computing

Heterogeneous Computing Meets Near-Memory Acceleration and High-Level Synthesis in the Post-Moore Era

IEEE Micro ◽ 10.1109/mm.2017.3211105 ◽ 2017 ◽ Vol 37 (4) ◽ pp. 10-18 ◽ Cited By ~ 12

Author(s): Nam Sung Kim ◽ Deming Chen ◽ Jinjun Xiong ◽ Wen-mei W. Hwu

Keyword(s): Heterogeneous Computing ◽ High Level Synthesis ◽ High Level

Computational Simulation of Gas Turbines: Part 2—Extensible Domain Framework

Journal of Engineering for Gas Turbines and Power ◽ 10.1115/1.1287489 ◽ 2000 ◽ Vol 122 (3) ◽ pp. 377-386 ◽ Cited By ~ 5

Author(s): John A. Reed ◽ Abdollah A. Afjeh

Keyword(s): Gas Turbine ◽ Gas Turbines ◽ Design Patterns ◽ Heterogeneous Computing ◽ Computational Simulation ◽ Object Oriented ◽ Turbine Engine ◽ Engine Model ◽ Computationally Intensive ◽ High Level

This paper describes the design concepts and object-oriented architecture of Onyx, an extensible domain framework for computational simulation of gas turbine engines. Onyx provides a flexible environment for defining, modifying, and simulating the component-based gas turbine models described in Part 1 of this paper. Using advanced object-oriented technologies such as design patterns and frameworks, Onyx enables users to customize and extend the framework to add new functionality or adapt simulation behavior as required. A customizable visual interface provides high-level symbolic control of propulsion system construction and execution. For computationally intensive analysis, components may be distributed across heterogeneous computing architectures and operating systems. A distributed gas turbine engine model is developed and simulated to illustrate the use of the framework. [S0742-4795(00)02403-0]
