High-Level Synthesis of In-Circuit Assertions for Verification, Debugging, and Timing Analysis

Despite significant performance and power advantages compared to microprocessors, widespread usage of FPGAs has been limited by increased design complexity. High-level synthesis (HLS) tools have reduced design complexity but provide limited support for verification, debugging, and timing analysis. Such tools generally rely on inaccurate software simulation or lengthy register-transfer-level simulations, which are unattractive to software developers. In this paper, we introduce HLS techniques that allow application designers to efficiently synthesize commonly used ANSI-C assertions into FPGA circuits, enabling verification and debugging of circuits generated from HLS tools, while executing in the actual FPGA environment. To verify that HLS-generated circuits meet execution timing constraints, we extend the in-circuit assertion support for testing of elapsed time for arbitrary regions of code. Furthermore, we generalize timing assertions to transparently provide hang detection that back annotates hang occurrences to source code. The presented techniques enable software developers to rapidly verify, debug, and analyze timing for FPGA applications, while reducing frequency by less than 3% and increasing FPGA resource utilization by 0.7% or less for several application case studies on the Altera Stratix-II EP2S180 and Stratix-III EP3SE260 using Impulse-C. The presented techniques reduced area overhead by as much as 3x and improved assertion performance by as much as 100% compared to unoptimized in-circuit assertions.
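
As a rough illustration of the kind of ANSI-C assertions the abstract refers to, the sketch below places ordinary `assert` calls and one elapsed-time check inside a small kernel. It is a software-only sketch, not the paper's synthesis technique: the function `sum_samples`, the `cycles` counter, and the `cycle_budget` parameter are hypothetical names, and the counter merely stands in for a hardware timer by charging one "cycle" per loop iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cycle counter standing in for a hardware timer.
   In this software sketch each loop iteration costs one "cycle". */
static uint32_t cycles = 0;

/* Hypothetical kernel: sums n samples, guarded by input assertions
   and a timing assertion on the loop region. */
int32_t sum_samples(const int16_t *samples, int n, uint32_t cycle_budget)
{
    assert(samples != NULL);        /* input pointer must be valid */
    assert(n > 0 && n <= 1024);     /* size bound assumed for this sketch */

    uint32_t start = cycles;        /* start of the timed region */
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += samples[i];
        cycles++;
    }

    /* Timing assertion: the region must finish within the budget. */
    assert(cycles - start <= cycle_budget);
    return acc;
}
```

Calling `sum_samples` on four samples with a budget of four cycles passes every check; a smaller budget would trip the final assertion, which is the behavior a timing assertion is meant to expose.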


Bus Optimization for Low Power in High-Level Synthesis

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126603000829 ◽

2003 ◽

Vol 12 (01) ◽

pp. 1-17

Author(s):

Sungpack Hong ◽

Taewhan Kim

Keyword(s):

Optimal Solution ◽

Minimum Cost ◽

Maximum Flow ◽

High Level Synthesis ◽

Benchmark Problems ◽

Timing Constraints ◽

Power Efficient ◽

High Level ◽

The Impact ◽

Operation Scheduling

Sub-micron feature sizes have caused a considerable portion of power to be dissipated on the buses, drawing increased attention to power savings at the behavioral level and the RT level of design. This paper addresses the problem of minimizing the power dissipated in bus switching in the high-level synthesis of data-dominated behavioral descriptions. Unlike previous approaches, which do not consider bus power minimization until operation scheduling is complete, our approach integrates the bus binding problem into scheduling to exploit more fully and effectively the impact of scheduling on the reduction of bus power. We accomplish this by formulating the problem as a network flow problem and devising an efficient algorithm that iteratively finds maximum-flow, minimum-cost solutions in the network. Experimental results on a number of benchmark problems show that, given resource and global timing constraints, our designs are 19.8% more power-efficient than the designs produced by a random-move based solution and 15.5% more power-efficient than the designs produced by a clock-step based optimal solution.
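
To make the quantity being minimized more concrete, the sketch below counts bit toggles on a bus as the Hamming distance between consecutive values. This is only an illustrative proxy for switching power, not the flow-based algorithm from the paper; the function names `popcount32` and `bus_toggles` and the 32-bit bus width are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in a 32-bit word. */
static unsigned popcount32(uint32_t x)
{
    unsigned count = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Total number of bit toggles when values[0..n-1] are driven onto
   a 32-bit bus in the given order. */
unsigned bus_toggles(const uint32_t *values, size_t n)
{
    assert(values != NULL);
    unsigned toggles = 0;
    for (size_t i = 1; i < n; i++)
        toggles += popcount32(values[i - 1] ^ values[i]);
    return toggles;
}
```

Driving the values `0x0, 0xF, 0x0, 0xF` in that order toggles 12 bits, while the reordered sequence `0x0, 0x0, 0xF, 0xF` toggles only 4. That difference is why the order in which scheduling places transfers on a bus affects switching power.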


IP-Enabled C/C++ Based High Level Synthesis: A Step towards Better Designer Productivity and Design Performance

International Journal of Reconfigurable Computing ◽

10.1155/2014/418750 ◽

2014 ◽

Vol 2014 ◽

pp. 1-17

Author(s):

Sharad Sinha ◽

Thambipillai Srikanthan

Keyword(s):

Design Methodology ◽

Arithmetic Functions ◽

High Level Synthesis ◽

Mathematical Functions ◽

Chip Design ◽

Design Complexity ◽

Application Characteristics ◽

IP Cores ◽

Program Recognition ◽

High Level

Intellectual property (IP) core based design is an emerging design methodology to deal with increasing chip design complexity. C/C++ based high level synthesis (HLS) is also gaining traction as a way to deal with this complexity. Here, we present a design methodology that combines these two individual methodologies and is therefore more powerful. We discuss the proposed methodology in the context of supporting efficient hardware synthesis of a class of mathematical functions without altering the original C/C++ source code. We also propose methods to integrate legacy IP cores into existing HLS flows. Relying on concepts from the domains of program recognition and optimized low-level implementations of such arithmetic functions, the described design methodology is a step towards intelligent synthesis, where application characteristics are matched with specific architectural resources and relevant IP cores in a transparent manner for improved area-delay results. The combined methodology is more aware of the target hardware architecture than the conventional HLS flow. Implementation results for certain compute kernels from the commercial tool Vivado-HLS and from the proposed flow are compared, showing that the proposed flow gives better results.


FPGA Memory Optimization in High-Level Synthesis

Advances in Systems Analysis, Software Engineering, and High Performance Computing - FPGA Algorithms and Applications for the Internet of Things ◽

10.4018/978-1-5225-9806-0.ch003 ◽

2020 ◽

pp. 51-81

Author(s):

Mingjie Lin ◽

Juan Escobedo

Keyword(s):

Data Reuse ◽

High Level Synthesis ◽

Reconfigurable Architectures ◽

Relative Distance ◽

Memory Optimization ◽

Performance Improvements ◽

Memory Subsystem ◽

Significant Performance ◽

Memory Accesses ◽

High Level

High-level synthesis (HLS) for FPGAs can achieve significant performance improvements through effective memory partitioning and meticulous data reuse. In this chapter, the authors first explore techniques that have been adopted directly from systems with a fixed memory subsystem, such as CPUs and GPUs (Section 2). Section 3 focuses on techniques developed specifically for reconfigurable architectures, which generate custom memory subsystems to take advantage of the peculiarities of a family of affine code called stencil code. The authors focus on techniques that exploit memory banking to allow parallel, conflict-free memory accesses in Section 3.1 and on techniques that generate an optimal memory micro-architecture for data reuse in Section 3.2. Finally, Section 4 will explore the technique handling code still belonging to the affine family but the relative distance between the addresses.
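
The idea of memory banking for conflict-free parallel access can be sketched with a simple cyclic partitioning check. This is a minimal illustration under stated assumptions, not the chapter's method: the mapping `index % num_banks`, the helper names `bank_of` and `conflict_free`, and the `MAX_BANKS` limit are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BANKS 32

/* Cyclic partitioning: map an element index to a memory bank. */
static int bank_of(int index, int num_banks)
{
    return index % num_banks;
}

/* Returns true if, for every stencil center in [lo, hi), the accesses
   at center + offsets[k] all land in distinct banks. */
bool conflict_free(const int *offsets, int num_offsets,
                   int num_banks, int lo, int hi)
{
    assert(num_banks > 0 && num_banks <= MAX_BANKS);
    for (int c = lo; c < hi; c++) {
        bool used[MAX_BANKS] = { false };   /* banks hit in this iteration */
        for (int k = 0; k < num_offsets; k++) {
            int idx = c + offsets[k];
            assert(idx >= 0);               /* sketch assumes in-range indices */
            int b = bank_of(idx, num_banks);
            if (used[b])
                return false;               /* two accesses share a bank */
            used[b] = true;
        }
    }
    return true;
}
```

For a three-point stencil with offsets `-1, 0, 1`, three cyclic banks give conflict-free access for every center, whereas two banks do not, because indices `c-1` and `c+1` then fall in the same bank.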
