Backward-annotation of post-layout delay information into high-level synthesis process for performance optimization

Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on the Control and Data Flow Graph (CDFG) and schedule basic blocks in sequential order. Our study shows that this sequential scheduling order of basic blocks is a major limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, the DG is a directed acyclic graph, so no loop-breaking heuristic is needed for scheduling. Second, the DG can be used to identify the exact instruction parallelism. Our experiment shows that the DG can lead to a 76% increase in instruction parallelism over the CDFG. Based on the DG, we propose a bottom-up scheduling algorithm that achieves much higher instruction parallelism than existing algorithms. A hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high-parallelism scheduling. Our experimental results show that our DG-based HLS algorithm outperforms the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.
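To illustrate why an acyclic dependency graph needs no loop-breaking step, here is a minimal sketch of as-soon-as-possible (ASAP) scheduling over a DAG. It is not the bottom-up algorithm proposed in the article; the input format, the function names `asap_schedule` and `average_parallelism`, and the example instructions are assumptions made for illustration, and every instruction is assumed to take one control step.

```python
from collections import defaultdict, deque

def asap_schedule(deps):
    """Return the earliest control step of each instruction.

    deps maps an instruction name to the names it depends on (hypothetical
    input format); every dependency must also appear as a key.
    """
    pending = {node: len(preds) for node, preds in deps.items()}
    users = defaultdict(list)
    for node, preds in deps.items():
        for pred in preds:
            users[pred].append(node)

    step = {node: 0 for node in deps}
    ready = deque(node for node, count in pending.items() if count == 0)
    scheduled = 0
    while ready:
        node = ready.popleft()
        scheduled += 1
        for user in users[node]:
            # A user can start one step after its latest predecessor.
            step[user] = max(step[user], step[node] + 1)
            pending[user] -= 1
            if pending[user] == 0:
                ready.append(user)
    if scheduled != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return step

def average_parallelism(step):
    """Instructions per control step, assuming unit-latency instructions."""
    return len(step) / (max(step.values()) + 1)

# Hypothetical five-instruction example: c needs a and b, d needs c, e needs a.
example_deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"]}
example_steps = asap_schedule(example_deps)
```

In this example the five instructions fit into three control steps, so the average parallelism is 5/3. A cyclic input is rejected rather than scheduled, which is the case a loop-breaking heuristic would otherwise have to handle.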


Buffer Placement and Sizing for High-Performance Dataflow Circuits

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3477053 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-32

Author(s):

Lana Josipović ◽

Shabnam Sheikhha ◽

Andrea Guerrieri ◽

Paolo Ienne ◽

Jordi Cortadella

Keyword(s):

Performance Optimization ◽

Optimization Model ◽

High Performance ◽

Control Flow ◽

High Level Synthesis ◽

Software Applications ◽

Marked Graphs ◽

Variable Latency ◽

High Level ◽

Strong Contrast

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
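The role of marked-graph theory can be illustrated with its standard throughput bound: in a marked graph, steady-state throughput is limited by the loop with the smallest ratio of tokens to latency. The sketch below only evaluates that bound for a hand-written list of loops; it is not the optimization model from the article, and the function name `throughput_bound` and the loop values are assumptions made for illustration.

```python
from fractions import Fraction

def throughput_bound(loops):
    """Upper bound on throughput (tokens per clock cycle) of a marked graph.

    loops is a list of (tokens, latency) pairs, one per choice-free loop,
    where latency is the total number of cycles around the loop and must
    be positive (hypothetical input format).
    """
    return min(Fraction(tokens, latency) for tokens, latency in loops)

# Hypothetical loops: (tokens circulating, cycles of latency around the loop).
example_loops = [(1, 4), (2, 5), (1, 3)]
example_bound = throughput_bound(example_loops)
```

Here the first loop is critical, with one token per four cycles. Adding a register to a loop raises its latency and can lower this bound, which is why placement matters for throughput even though it leaves the circuit's function unchanged.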


Performance optimization using template mapping for datapath-intensive high-level synthesis

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ◽

10.1109/43.511568 ◽

1996 ◽

Vol 15 (8) ◽

pp. 877-888 ◽

Cited By ~ 64

Author(s):

M.R. Corazao ◽

M.A. Khalaf ◽

L.M. Guerra ◽

M. Potkonjak ◽

J.M. Rabaey

Keyword(s):

Performance Optimization ◽

High Level Synthesis ◽

High Level


Feedback Driven High Level Synthesis for Performance Optimization

2005 6th International Conference on ASIC ◽

10.1109/icasic.2005.1611468 ◽

2006 ◽

Author(s):

Hao Li ◽

S. Katkoori ◽

Zhipeng Liu

Keyword(s):

Performance Optimization ◽

High Level Synthesis ◽

High Level


A Survey on Performance Optimization of High-Level Synthesis Tools

Journal of Computer Science and Technology ◽

10.1007/s11390-020-9414-8 ◽

2020 ◽

Vol 35 (3) ◽

pp. 697-720

Author(s):

Lan Huang ◽

Da-Lin Li ◽

Kang-Ping Wang ◽

Teng Gao ◽

Adriano Tavares

Keyword(s):

Performance Optimization ◽

High Level Synthesis ◽

High Level


A Novel Framework for Applying Multiobjective GA and PSO Based Approaches for Simultaneous Area, Delay, and Power Optimization in High Level Synthesis of Datapaths

VLSI Design ◽

10.1155/2012/273276 ◽

2012 ◽

Vol 2012 ◽

pp. 1-12 ◽

Cited By ~ 15

Author(s):

D. S. Harish Ram ◽

M. C. Bhuvaneswari ◽

Shanthi S. Prabhu

Keyword(s):

High Level Synthesis ◽

Synthesis Process ◽

Weighted Sum ◽

Early Assessment ◽

Nsga Ii ◽

Multi Objective ◽

Trade Offs ◽

Evolutionary Technique ◽

High Level ◽

The Impact

High-Level Synthesis deals with the translation of algorithmic descriptions into an RTL implementation. It is highly multi-objective in nature, necessitating trade-offs between mutually conflicting objectives such as area, power and delay. Thus, design space exploration is integral to the High Level Synthesis process for early assessment of the impact of these trade-offs. We propose a methodology for multi-objective optimization of Area, Power and Delay during High Level Synthesis of data paths from Data Flow Graphs (DFGs). The technique performs scheduling and allocation of functional units and registers concurrently. A novel metric-based technique is incorporated into the algorithm to estimate the likelihood that a schedule yields low-power solutions. A true multi-objective evolutionary technique, “Nondominated Sorting Genetic Algorithm II” (NSGA II), is used in this work. Results on standard DFG benchmarks indicate that the NSGA II based approach is much faster than a weighted sum GA (WSGA) approach. It also yields superior solutions in terms of diversity and closeness to the true Pareto front. In addition, a framework for applying another evolutionary technique, Weighted Sum Particle Swarm Optimization (WSPSO), is also reported. It is observed that, compared to WSGA, WSPSO shows considerable improvement in execution time with comparable solution quality.
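The notion of a Pareto front over area, power and delay can be made concrete with a small non-dominated filter. This is only the dominance test that non-dominated sorting builds on, not NSGA II itself (no ranking of later fronts, crowding distance, or genetic operators); the function names and the design points are assumptions made for illustration.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in one.

    All objectives are minimized; p and q are (area, power, delay) tuples.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_front(points):
    """Return the points not dominated by any other point, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (area, power, delay) values for five candidate schedules.
example_designs = [(10, 5, 8), (12, 4, 9), (11, 6, 9), (9, 7, 7), (10, 5, 9)]
example_front = pareto_front(example_designs)
```

Three of the five hypothetical designs survive: each is better than the others in at least one objective. A weighted-sum approach would instead collapse the three objectives into one number and return a single design per choice of weights.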


Extended Compatibility Path Based Hardware Binding: An Adaptive Algorithm for High Level Synthesis of Area-Time Efficient Designs

Journal of Circuits, Systems and Computers ◽

10.1142/s021812661450131x ◽

2014 ◽

Vol 23 (09) ◽

pp. 1450131

Author(s):

Sharad Sinha ◽

Udit Dhawan ◽

Thambipillai Srikanthan

Keyword(s):

High Level Synthesis ◽

Synthesis Process ◽

Bipartite Matching ◽

Area Reduction ◽

Time Requirements ◽

Application Aware ◽

High Level ◽

It Application ◽

Multi Mode

Hardware binding is an important step in high level synthesis (HLS). The quality of hardware binding affects the area-time efficiency of a design. The goal of a synthesis process is to produce a design which meets the area-time requirements. In this paper, we present a new hardware binding algorithm with a focus on area reduction. It is called extended compatibility path-based (ECPB) hardware binding. It extends the compatibility path-based (CPB) hardware binding method by exploiting inter-operation flow dependencies and non-overlapping lifetimes of variables, and by modifying the weight relation to make it application-aware and thus adaptive in nature. The presented methodology also takes into account the bit width of functional units (FUs) and multi-mode FUs. It performs simultaneous FU and register binding. Implemented within a C to register transfer level (RTL) framework, it produces binding results which are better than those produced by the weighted bipartite matching (WBM) and CPB algorithms. The ECPB algorithm yields an average reduction of 34% and 17.44% in area-time product over the WBM and CPB methods, respectively.
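To show how non-overlapping variable lifetimes allow registers to be shared, the sketch below uses the classic left-edge algorithm for register binding. It is a swapped-in textbook technique, not the ECPB algorithm from the paper; the function name, the half-open lifetime format and the example variables are assumptions made for illustration.

```python
def left_edge_binding(lifetimes):
    """Bind variables to registers so overlapping lifetimes never share one.

    lifetimes maps a variable name to a half-open interval (start, end) in
    control steps (hypothetical input format). Returns {variable: register}.
    """
    free_at = []   # free_at[r] is the step at which register r becomes free
    binding = {}
    # Visit variables by increasing start step (the "left edge").
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        for reg, free_step in enumerate(free_at):
            if free_step <= start:          # register is idle, so reuse it
                free_at[reg] = end
                binding[name] = reg
                break
        else:                               # no idle register, allocate one
            free_at.append(end)
            binding[name] = len(free_at) - 1
    return binding

# Hypothetical lifetimes: a and c do not overlap, nor do b and d.
example_lifetimes = {"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5)}
example_binding = left_edge_binding(example_lifetimes)
```

In this example four variables fit into two registers, because `c` starts exactly when `a` ends and `d` starts when `b` ends.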


Design Procedure Based on VHDL Language Transformations

VLSI Design ◽

10.1080/10655140290011159 ◽

2002 ◽

Vol 14 (4) ◽

pp. 349-354

Author(s):

László Varga ◽

Gábor Hosszú ◽

Ferenc Kovács

Keyword(s):

Design Procedure ◽

Register Transfer Level ◽

Design Flow ◽

Graph Representation ◽

High Level Synthesis ◽

Synthesis Process ◽

Functional Verification ◽

Behavioral Synthesis ◽

Synthesis Procedure ◽

High Level

One of the major problems in VHDL-based behavioral synthesis is starting the design at a higher abstraction level than the register transfer level (RTL). VHDL semantics were designed strictly for simulation, so VHDL was not considered a high-level synthesis language. A novel synthesis procedure was developed, which uses the methodology of high level synthesis. It starts from an abstract VHDL model and produces an RTL VHDL description through successive language transformations while preserving the VHDL standard simulation semantics. The steps of the synthesis do not use a graph representation or other meta-language, but apply standard VHDL only. This VHDL representation is simulatable and accessible: functional verification can be performed by simulation at any time, and the simulation results can be used to guide the synthesis process. The output VHDL format is suitable for continuing the design flow with RTL-based synthesis tools.
