The Impact of Loop Unrolling on Controller Delay in High Level Synthesis

The Advanced Encryption Standard (AES) is a fundamental building block of many network security protocols, ensuring data confidentiality in applications ranging from data servers to low-power embedded hardware systems. To optimize such hardware implementations, High-Level Synthesis (HLS) offers design flexibility and rapid optimization of dedicated hardware to meet design constraints. In this paper, we present the implementation of an AES encryption processor on an FPGA using Xilinx Vivado HLS. The AES architecture was analyzed and designed using loop unrolling together with inner-round and outer-round pipelining, achieving a maximum throughput of up to 1290 Mbps (megabits per second) while using only 3.24% of the FPGA's slices, or about 3 Mbps per slice.
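
As a rough illustration of the loop-unrolling transformation mentioned in this abstract, the following Python sketch shows a byte-wise XOR loop and the same loop unrolled by a factor of 4. It is not the paper's AES design and not Vivado HLS code; the function names, the XOR workload, and the unroll factor are all assumptions chosen for illustration. In an HLS flow, the four independent statements in the unrolled body are the kind of work a tool could schedule in parallel hardware.

```python
def xor_bytes_rolled(state, key):
    """Reference loop: processes one byte per iteration."""
    out = [0] * len(state)
    for i in range(len(state)):
        out[i] = state[i] ^ key[i]
    return out


def xor_bytes_unrolled4(state, key):
    """Same computation, unrolled by a (hypothetical) factor of 4."""
    n = len(state)
    out = [0] * n
    main = n - n % 4  # largest multiple of 4 that fits in n
    for i in range(0, main, 4):
        # Four independent operations per iteration: no statement
        # depends on the result of another, so they could run in parallel.
        out[i] = state[i] ^ key[i]
        out[i + 1] = state[i + 1] ^ key[i + 1]
        out[i + 2] = state[i + 2] ^ key[i + 2]
        out[i + 3] = state[i + 3] ^ key[i + 3]
    for i in range(main, n):
        # Remainder loop for lengths that are not a multiple of 4.
        out[i] = state[i] ^ key[i]
    return out
```

Both functions return the same result for any pair of equal-length inputs; the unrolled version simply runs fewer loop iterations, each doing more work.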

High Level Synthesis Methodology for Exploring Loop Unrolling Factor and Functional Datapath

2018 International Conference on Advanced Computation and Telecommunication (ICACAT) ◽

10.1109/icacat.2018.8933661 ◽

2018 ◽

Author(s):

Pallabi Sarkar ◽

Mrinal Kanti Naskar ◽

Anirban Sengupta

Keyword(s):

High Level Synthesis ◽

Loop Unrolling ◽

Synthesis Methodology ◽

High Level

Bus Optimization for Low Power in High-Level Synthesis

Journal of Circuits System and Computers ◽

10.1142/s0218126603000829 ◽

2003 ◽

Vol 12 (01) ◽

pp. 1-17

Author(s):

Sungpack Hong ◽

Taewhan Kim

Keyword(s):

Optimal Solution ◽

Minimum Cost ◽

Maximum Flow ◽

High Level Synthesis ◽

Benchmark Problems ◽

Timing Constraints ◽

Power Efficient ◽

High Level ◽

The Impact ◽

Operation Scheduling

Sub-micron feature sizes have caused a considerable portion of power to be dissipated on the buses, drawing increased attention to power savings at the behavioral and RT levels of design. This paper addresses the problem of minimizing the power dissipated by bus switching in the high-level synthesis of data-dominated behavioral descriptions. Unlike previous approaches, in which bus power minimization is not considered until operation scheduling is complete, our approach integrates the bus binding problem into scheduling, exploiting the impact of scheduling on bus power reduction more fully and effectively. We accomplish this by formulating the problem as a network flow problem and devising an efficient algorithm that iteratively finds maximum-flow, minimum-cost solutions in the network. Experimental results on a number of benchmark problems show that, given resource and global timing constraints, our designs are 19.8% more power-efficient than the designs produced by a random-move based solution, and 15.5% more power-efficient than the designs produced by a clock-step based optimal solution.
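
For readers unfamiliar with minimum-cost maximum-flow, here is a minimal generic sketch using successive shortest augmenting paths. It is not the authors' algorithm or their network construction for bus binding; the function name, the edge-list format, and the example network are assumptions, and edge costs are assumed to be non-negative.

```python
from collections import deque


def min_cost_max_flow(n, edges, source, sink):
    """Generic min-cost max-flow on nodes 0..n-1.

    edges: list of (u, v, capacity, cost) with u != v and cost >= 0.
    Returns (max_flow, min_cost).
    """
    # Residual graph; each entry is [to, residual_cap, cost, reverse_index].
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    flow = 0
    total_cost = 0
    while True:
        # Queue-based Bellman-Ford: cheapest residual path from source.
        dist = [float("inf")] * n
        prev = [None] * n  # (previous node, edge index) on the cheapest path
        in_queue = [False] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, i)
                    if not in_queue[v]:
                        queue.append(v)
                        in_queue[v] = True
        if prev[sink] is None:
            break  # no augmenting path left: the flow is maximum

        # Find the bottleneck capacity along the path.
        push = float("inf")
        v = sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        # Apply the augmentation to forward and reverse edges.
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]
    return flow, total_cost


# Hypothetical 4-node network: node 0 is the source, node 3 the sink.
example_edges = [(0, 1, 2, 1), (0, 2, 1, 2), (1, 3, 1, 3), (1, 2, 1, 1), (2, 3, 2, 1)]
print(min_cost_max_flow(4, example_edges, 0, 3))  # prints (3, 10)
```

In a bus-binding setting, the edge costs would presumably encode switching activity, but that mapping is specific to the paper and is not reproduced here.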

GA driven integrated exploration of loop unrolling factor and datapath for optimal scheduling of CDFGs during high level synthesis

2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE) ◽

10.1109/ccece.2015.7129163 ◽

2015 ◽

Cited By ~ 2

Author(s):

Pallabi Sarkar ◽

Anirban Sengupta ◽

Mrinal Kanti Naskar

Keyword(s):

Optimal Scheduling ◽

High Level Synthesis ◽

Loop Unrolling ◽

High Level

A Novel Framework for Applying Multiobjective GA and PSO Based Approaches for Simultaneous Area, Delay, and Power Optimization in High Level Synthesis of Datapaths

VLSI Design ◽

10.1155/2012/273276 ◽

2012 ◽

Vol 2012 ◽

pp. 1-12 ◽

Cited By ~ 15

Author(s):

D. S. Harish Ram ◽

M. C. Bhuvaneswari ◽

Shanthi S. Prabhu

Keyword(s):

High Level Synthesis ◽

Synthesis Process ◽

Weighted Sum ◽

Early Assessment ◽

Nsga Ii ◽

Multi Objective ◽

Trade Offs ◽

Evolutionary Technique ◽

High Level ◽

The Impact

High-Level Synthesis deals with the translation of algorithmic descriptions into an RTL implementation. It is highly multi-objective in nature, necessitating trade-offs between mutually conflicting objectives such as area, power, and delay. Design space exploration is therefore integral to the High-Level Synthesis process for early assessment of the impact of these trade-offs. We propose a methodology for multi-objective optimization of area, power, and delay during High-Level Synthesis of datapaths from Data Flow Graphs (DFGs). The technique performs scheduling and allocation of functional units and registers concurrently. A novel metric-based technique is incorporated into the algorithm to estimate the likelihood that a schedule will yield low-power solutions. A true multi-objective evolutionary technique, the Nondominated Sorting Genetic Algorithm II (NSGA II), is used in this work. Results on standard DFG benchmarks indicate that the NSGA II based approach is much faster than a weighted sum GA (WSGA) approach. It also yields superior solutions in terms of diversity and closeness to the true Pareto front. In addition, a framework for applying another evolutionary technique, Weighted Sum Particle Swarm Optimization (WSPSO), is reported. Compared to WSGA, WSPSO shows considerable improvement in execution time with comparable solution quality.
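
The nondominated sorting idea behind NSGA II can be sketched in a few lines of Python. This is a naive version for illustration only, not the fast nondominated sort or the crowding-distance step of the actual algorithm, and not the authors' implementation. The function names and the (area, power, delay) values are hypothetical; all objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective
    and strictly better in at least one (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated_sort(points):
    """Group points into successive Pareto fronts (naive peeling)."""
    remaining = list(points)
    fronts = []
    while remaining:
        # A point is in the current front if nothing left dominates it.
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


# Hypothetical (area, power, delay) estimates for four candidate designs.
designs = [(10, 5, 8), (12, 4, 9), (11, 6, 9), (13, 7, 10)]
for rank, front in enumerate(nondominated_sort(designs), start=1):
    print(rank, front)
# prints:
# 1 [(10, 5, 8), (12, 4, 9)]
# 2 [(11, 6, 9)]
# 3 [(13, 7, 10)]
```

The first front contains the trade-off designs that no other candidate beats on all three objectives at once, which is what a weighted-sum approach collapses into a single score.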

High Level Synthesis Optimizations of Road Lane Detection Development on Zynq-7000

Pertanika Journal of Science and Technology ◽

10.47836/pjst.29.2.01 ◽

2021 ◽

Vol 29 (2) ◽

Author(s):

Panadda Solod ◽

Nattha Jindapetch ◽

Kiattisak Sengchuai ◽

Apidet Booranawong ◽

Pakpoom Hoyingcharoen ◽

...

Keyword(s):

Low Cost ◽

Optimization Techniques ◽

Lane Detection ◽

High Level Synthesis ◽

Resource Usage ◽

Clock Frequency ◽

Loop Analysis ◽

Loop Unrolling ◽

Loop Pipelining ◽

High Level

In this work, we propose High-Level Synthesis (HLS) optimization processes to improve the speed and resource usage of complex algorithms, especially those with nested loops. The proposed HLS optimization processes are divided into four steps: array sizing is performed to decrease resource usage on the Programmable Logic (PL) part, loop analysis is performed to determine which loops should be unrolled or pipelined, array partitioning is performed to resolve the bottleneck of loop unrolling and loop pipelining, and HLS interface selection is performed to choose the best block-level and port-level interfaces for the array arguments of the RTL design. A road lane detection case study was analyzed and implemented with suitable optimization techniques on the Xilinx Zynq-7000 family (Zybo ZC7010-1), a low-cost FPGA. Experimental results show that the proposed method is 6.66 times faster than the primitive method at a clock frequency of 100 MHz, reaching about 6 FPS. Although the proposed methods cannot reach standard real-time performance (25 FPS), they can guide HLS developers in increasing speed and reducing resource usage on an FPGA.
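
To show why array partitioning can relieve the bottleneck of an unrolled loop, the Python sketch below models a cyclic partition: element i is placed in bank i % factor, so any `factor` consecutive elements sit in different banks and could be read in the same cycle. This is a software model under stated assumptions, not the paper's code and not an HLS directive; the function names and the factor of 4 are hypothetical.

```python
def cyclic_partition(data, factor):
    """Split data into `factor` banks; element i goes to bank i % factor."""
    return [data[bank::factor] for bank in range(factor)]


def read_element(banks, i):
    """Read logical element i back from the partitioned banks."""
    factor = len(banks)
    return banks[i % factor][i // factor]


def banks_touched(start, factor):
    """Banks accessed by one unrolled iteration covering
    indices start .. start + factor - 1."""
    return [i % factor for i in range(start, start + factor)]


banks = cyclic_partition(list(range(8)), 4)
print(banks)                # prints [[0, 4], [1, 5], [2, 6], [3, 7]]
print(banks_touched(4, 4))  # prints [0, 1, 2, 3]
```

Because each unrolled iteration touches every bank exactly once, no single memory has to serve more than one access per iteration, which is the conflict a single unpartitioned array would cause.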
