scholarly journals A Highly Configurable High-Level Synthesis Functional Pattern Library

Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 532
Author(s):  
Lan Huang ◽  
Teng Gao ◽  
Dalin Li ◽  
Zihao Wang ◽  
Kangping Wang

FPGA has recently played an increasingly important role in heterogeneous computing, but Register Transfer Level design flows are not only inefficient in design, but also require designers to be familiar with the circuit architecture. High-level synthesis (HLS) allows developers to design FPGA circuits more efficiently with a more familiar programming language, a higher level of abstraction, and automatic adaptation of timing constraints. When using HLS tools, such as Xilinx Vivado HLS, specific design patterns and techniques are required in order to create high-performance circuits. Moreover, designing efficient concurrency and data flow structures requires a deep understanding of the hardware, imposing more learning costs on programmers. In this paper, we propose a set of functional patterns libraries based on the MapReduce model, implemented by C++ templates, which can quickly implement high-performance parallel pipelined computing models on FPGA with specified simple parameters. The usage of this pattern library allows flexible adaptation of parallel and flow structures in algorithms, which greatly improves the coding efficiency. The contributions of this paper are as follows. (1) Four standard functional operators suitable for hardware parallel computing are defined. (2) Functional concurrent programming patterns are described based on C++ templates and Xilinx HLS. (3) The efficiency of this programming paradigm is verified with two algorithms with different complexity.

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1275
Author(s):  
Changdao Du ◽  
Yoshiki Yamaguchi

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGA can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale-up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can take broad parallelization strategies based on specific FPGA resources to improve performance.


2022 ◽  
Vol 15 (1) ◽  
pp. 1-32
Author(s):  
Lana Josipović ◽  
Shabnam Sheikhha ◽  
Andrea Guerrieri ◽  
Paolo Ienne ◽  
Jordi Cortadella

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can handle efficiently variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.


2021 ◽  
Vol 32 (5) ◽  
pp. 1014-1029
Author(s):  
Johannes de Fine Licht ◽  
Maciej Besta ◽  
Simon Meierhans ◽  
Torsten Hoefler

IEEE Micro ◽  
2017 ◽  
Vol 37 (4) ◽  
pp. 10-18 ◽  
Author(s):  
Nam Sung Kim ◽  
Deming Chen ◽  
Jinjun Xiong ◽  
Wen-mei W. Hwu

2000 ◽  
Vol 122 (3) ◽  
pp. 377-386 ◽  
Author(s):  
John A. Reed ◽  
Abdollah A. Afjeh

This paper describes the design concepts and object-oriented architecture of Onyx, an extensible domain framework for computational simulation of gas turbine engines. Onyx provides a flexible environment for defining, modifying, and simulating the component-based gas turbine models described in Part 1 of this paper. Using advanced object-oriented technologies such as design patterns and frameworks, Onyx enables users to customize and extend the framework to add new functionality or adapt simulation behavior as required. A customizable visual interface provides high-level symbolic control of propulsion system construction and execution. For computationally-intensive analysis, components may be distributed across heterogeneous computing architectures and operating systems. A distributed gas turbine engine model is developed and simulated to illustrate the use of the framework. [S0742-4795(00)02403-0]


Sign in / Sign up

Export Citation Format

Share Document