Dependency Graph-based High-level Synthesis for Maximum Instruction Parallelism

2021 ◽  
Vol 14 (4) ◽  
pp. 1-15
Author(s):  
Zhenghua Gu ◽  
Wenqing Wan ◽  
Jundong Xie ◽  
Chang Wu

Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on Control and Data Flow Graph (CDFG) and will schedule basic blocks in sequential order. Our study shows that the sequential scheduling order of basic blocks is a big limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, DG is a directed acyclic graph. Thus, no loop breaking heuristic is needed for scheduling. Second, DG can be used to identify the exact instruction parallelism. Our experiment shows that DG can lead to 76% instruction parallelism increase over CDFG. Based on DG, we propose a bottom-up scheduling algorithm to achieve much higher instruction parallelism than existing algorithms. Hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high parallelism scheduling. Our experimental results show that our DG-based HLS algorithm can outperform the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.

2022 ◽  
Vol 15 (1) ◽  
pp. 1-32
Author(s):  
Lana Josipović ◽  
Shabnam Sheikhha ◽  
Andrea Guerrieri ◽  
Paolo Ienne ◽  
Jordi Cortadella

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can handle efficiently variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.


1996 ◽  
Vol 8 (6) ◽  
pp. 516-523
Author(s):  
Michitaka Kameyama ◽  
◽  
Masayuki Sasaki

In intelligent integrated systems such as robotics for autonomous work, it is essential to respond to the change of the environment very quickly. Therefore, the development of special-purpose VLSI processors with minimum delay time becomes a very important subject. A suitable combination of spatially parallel and temporally parallel processing is very important to realize the minimum delay time. In this article, we present a scheduling algorithm for high-level synthesis, where the input to the scheduler is a behavioral description viewed as a data flow graph. The scheduler minimizes the delay time under the constraint of a silicon area and I/O pins.


2014 ◽  
Vol 45 (11) ◽  
pp. 1470-1479 ◽  
Author(s):  
Alberto A. Del Barrio ◽  
Román Hermida ◽  
Seda Ogrenci Memik ◽  
José M. Mendías ◽  
María C. Molina

Author(s):  
M.R. Corazao ◽  
M.A. Khalaf ◽  
L.M. Guerra ◽  
M. Potkonjak ◽  
J.M. Rabaey

Sign in / Sign up

Export Citation Format

Share Document