AN EVOLUTIONARY ALGORITHM FOR THE ALLOCATION PROBLEM IN HIGH-LEVEL SYNTHESIS

2005 ◽  
Vol 14 (02) ◽  
pp. 347-366 ◽  
Author(s):  
HAIDAR M. HARMANANI ◽  
RONY SALIBA

This paper presents an evolutionary algorithm to solve the datapath allocation problem in high-level synthesis. The method performs allocation of functional units, registers, and multiplexers in addition to controller synthesis with the objective of minimizing the cost of hardware resources. The system handles multicycle functional units as well as structural pipelining. The proposed method was implemented using C++ on a Linux workstation. We tested our method on a set of high-level synthesis benchmarks, all yielding good solutions in a short time. An integration path to Field Programmable Gate Arrays (FPGAs) is provided through VHDL.




2017 ◽  
Vol 2017 ◽  
pp. 1-17 ◽  
Author(s):  
David Wilson ◽  
Aniruddha Shastri ◽  
Greg Stitt

Computing systems with field-programmable gate arrays (FPGAs) often achieve fault tolerance in high-energy radiation environments via triple-modular redundancy (TMR) and configuration scrubbing. Although effective, TMR suffers from a 3x area overhead, which can be prohibitive for many embedded usage scenarios. Furthermore, this overhead is often worsened because TMR often has to be applied to existing register-transfer-level (RTL) code that designers created without considering the triplicated resource requirements. Although a designer could redesign the RTL code to reduce resources, modifying RTL schedules and resource allocations is a time-consuming and error-prone process. In this paper, we present a more transparent high-level synthesis approach that uses scheduling and binding to provide attractive tradeoffs between area, performance, and redundancy, while focusing on FPGA implementation considerations, such as resource realization costs, to produce more efficient architectures. Compared to TMR applied to existing RTL, our approach shows resource savings up to 80% with average resource savings of 34% and an average clock degradation of 6%. Compared to the previous approach, our approach shows resource savings up to 74% with average resource savings of 19% and an average heuristic execution time improvement of 96x.



2016 ◽  
Vol 25 (04) ◽  
pp. 1650029 ◽  
Author(s):  
Adam Ziebinski ◽  
Stanwlaw Swierc

Currently embedded system designs aim to improve areas such as speed, energy efficiency and the cost of an application. Application-specific instruction set extensions on reconfigurable hardware provide such opportunities. The article presents a new approach for generating soft core processors that are optimized for specific tasks. In this work, we describe an automatic method for selecting custom instructions for generating software core processors that are based on the machine code of the application program. As the result, a soft core processor will contain the logic that is absolutely necessary. This solution requires fewer gates to be synthesized in the field programmable gate arrays (FPGA) and has a potential to increase the speed of the information processing that is performed by the system in the target FPGA. Experiments have confirmed the correct operation of the method that was used. After the reduction mechanism was enabled, the total number of slices blocks that were occupied decreased to 47% of its initial value in the best case for the Xilinx Spartan3 (xc3s200) and the maximum frequency increased approximately 44% in the best case for Xilinx Spartan6 (xc6slx4).



2021 ◽  
Author(s):  
Rishit Dagli ◽  
Süleyman Eken

Abstract Recent increases in computational power and the development of specialized architecture led to the possibility to perform machine learning, especially inference, on the edge. OpenVINO is a toolkit based on Convolutional Neural Networks that facilitates fast-track development of computer vision algorithms and deep learning neural networks into vision applications, and enables their easy heterogeneous execution across hardware platforms. A smart queue management can be the key to the success of any sector.} In this paper, we focus on edge deployments to make the Smart Queuing System (SQS) accessible by all also providing ability to run it on cheap devices. This gives it the ability to run the queuing system deep learning algorithms on pre-existing computers which a retail store, public transportation facility or a factory may already possess thus considerably reducing the cost of deployment of such a system. SQS demonstrates how to create a video AI solution on the edge. We validate our results by testing it on multiple edge devices namely CPU, Integrated Edge Graphic Processing Unit (iGPU), Vision Processing Unit (VPU) and Field Programmable Gate Arrays (FPGAs). Experimental results show that deploying a SQS on edge is very promising.



2008 ◽  
Vol 2008 ◽  
pp. 1-14
Author(s):  
Johan Ditmar ◽  
Steve McKeever ◽  
Alex Wilson

This paper discusses a pair of synthesis algorithms that optimise a SystemC design to minimise area when targeting FPGAs. Each can significantly improve the synthesis of a high-level language construct, thus allowing a designer to concentrate more on an algorithm description and less on hardware-specific implementation details. The first algorithm is a source-level transformation implementing function exlining—where a separate block of hardware implements a function and is shared between multiple calls to the function. The second is a novel algorithm for mapping arrays to memories which involves assigning array accesses to memory ports such that no port is ever accessed more than once in a clock cycle. This algorithm assigns accesses to read/write only ports and read-write ports concurrently, solving the assignment problem more efficiently for a wider range of memories compared to existing methods. Both optimisations operate on a high-level program representation and have been implemented in a commercial SystemC compiler. Experiments show that in suitable circumstances these techniques result in significant reductions in logic utilisation for FPGAs.



Author(s):  
B. Naresh Kumar Reddy ◽  
N. Suresh ◽  
J.V.N. Ramesh

<p>Programming of Field Programmable Gate Arrays (FPGAs) have long been the domain of engineers with VHDL or Verilog expertise. FPGA’s have caught the attention of algorithm developers and communication researchers, who want to use FPGAs to instantiate systems or implement DSP algorithms. These efforts however, are often stifled by the complexities of programming FPGAs. RTL programming in either VHDL or Verilog is generally not a high level of abstraction needed to represent the world of signal flow graphs and complex signal processing algorithms. This paper describes the FPGA Programs using Graphical Language rather than Verilog, VHDL with the help of LabVIEW and features of the LabVIEW FPGA environment.</p>



2019 ◽  
Vol 08 (03) ◽  
pp. 1950008 ◽  
Author(s):  
Haomiao Wang ◽  
Prabu Thiagaraj ◽  
Oliver Sinnen

Field-Programmable Gate Arrays (FPGAs) are widely used in the central signal processing design of the Square Kilometer Array (SKA) as hardware accelerators. The frequency domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for a yet to be finalized hardware, for cross-discipline interoperability and to achieve fast prototyping, OpenCL as a high-level FPGA synthesis approaches employed to create the sub-modules of FDAS. The FT convolution and the harmonic-summing plus some other minor sub-modules are elements in the FDAS module that have been well-optimized separately before. In this paper, we explore the design space of combining well-optimized designs, dealing with the ensuing need to trade-off and compromise. Pipeline computing is employed to handle multiple input arrays at high speed. The hardware target is to employ multiple high-end FPGAs to process the combined FDAS module. The results show interesting consequences, where the best individual solutions are not necessarily the best solutions for the speed of a pipeline where FPGA resources and memory bandwidth need to be shared. By proposing multiple buffering techniques to the pipeline, the combined FDAS module can achieve up to 2[Formula: see text] speedup over implementations without pipeline computing. We perform an extensive experimental evaluation on multiple high-end FPGA cards hosted in a workstation and compare to a technology comparable mid-range GPU.



Author(s):  
Naresh Kumar Reddy ◽  
N. Suresh

Programming of Field Programmable Gate Arrays (FPGAs) have long been the domain of engineers with VHDL or Verilog expertise.FPGA’s have caught the attention of algorithm developers and communication researchers, who want to use FPGAs to instantiate systems or implement DSP algorithms. These efforts however, are often stifled by the complexities of programming FPGAs. RTL programming in either VHDL or Verilog is generally not a high level of abstraction needed to represent the world of signal flow graphs and complex signal processing algorithms. This paper describes the FPGA Programs using Graphical Language rather than Verilog, VHDL with the help of LabVIEW and features of the LabVIEW FPGA environment.



1997 ◽  
Vol 07 (06) ◽  
pp. 517-535 ◽  
Author(s):  
J. H. Satyanarayana ◽  
B. Nowrouzian

This paper is concerned with the exploitation of genetic algorithms and their application to the development of a new optimization technique for the high-level synthesis of digit-serial digital filter data-paths. In the resulting optimization technique, the cost associated with the final digital filter data-path is minimized subject to user-specified constraints on the number of physical arithmetic functional units employed. The proposed technique is capable of obtaining global area-optimal, time-optimal, or combined area-cum-time-optimal data-paths, where the optimality takes into account not only the cost associated with the required arithmetic functional units but also that associated with the required support cells (multiplexors and registers). This optimization is made computationally effective by encoding the digital filter data flow-graph into chromosomes which preserve the data-dependency relationships in the original digital filter signal flow-graph under the operations of crossover and mutation by the underlying genetic algorithm. The usefulness of the proposed technique is demonstrated by applying to the constrained optimization of a benchmark elliptic wave digital filter for full bit-serial, full bit-parallel, as well as general digit-serial high-level synthesis. The results thus obtained are compared to those of the existing techniques (whenever appropriate) to confirm the validity of the technique.



Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 926
Author(s):  
Elyas Zamiri ◽  
Alberto Sanchez ◽  
Marina Yushkova ◽  
Maria Sofia Martínez-García ◽  
Angel de Castro

This paper aims to compare different design alternatives of hardware-in-the-loop (HIL) for emulating power converters in Field Programmable Gate Arrays (FPGAs). It proposes various numerical formats (fixed and floating-point) and different approaches (pure VHSIC Hardware Description Language (VHDL), Intellectual Properties (IPs), automated MATLAB HDL code, and High-Level Synthesis (HLS)) to design power converters. Although the proposed models are simple power electronics HIL systems, the idea can be extended to any HIL system. This study compares the design effort of different coding methods and numerical formats considering possible synthesis tools (Precision and Vivado), and it comprises an analytical discussion in terms of area and speed. The different models are synthesized as ad-hoc modules in general-purpose FPGAs, but also using the NI myRIO device as an example of a commercial tool capable of implementing HIL models. The comparison confirms that the optimum design alternative must be chosen based on the application (complexity, frequency, etc.) and designers’ constraints, such as available area, coding expertise, and design effort.



Sign in / Sign up

Export Citation Format

Share Document