Implementation of FFT on General-Purpose Architectures for FPGA

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of a FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution but the MP-SoC provides an easier programming interface that is completely based on C language. The authors’ approach proves that implementing a programmable architecture on FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it on FPGA.

Download Full-text

Implementation of FFT on General-Purpose Architectures for FPGA

Innovations in Embedded and Real-Time Systems Engineering for Communication ◽

10.4018/978-1-4666-0912-9.ch009 ◽

2012 ◽

pp. 156-175

Author(s):

Fabio Garzia ◽

Roberto Airoldi ◽

Jari Nurmi

Keyword(s):

General Purpose ◽

Reference Architecture ◽

Processor Core ◽

General Purpose Processor ◽

Programmable Architecture ◽

Reconfigurable Array ◽

Field Programmable ◽

Speed Up ◽

On Chip ◽

High Level

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of a FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution but the MP-SoC provides an easier programming interface that is completely based on C language. The authors’ approach proves that implementing a programmable architecture on FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it on FPGA.

Download Full-text

HLS Based Approach to Develop an Implementable HDR Algorithm

Electronics ◽

10.3390/electronics7110332 ◽

2018 ◽

Vol 7 (11) ◽

pp. 332 ◽

Cited By ~ 1

Author(s):

Rappy Saha ◽

Partha Banik ◽

Ki-Doo Kim

Keyword(s):

Dynamic Range ◽

Signal To Noise Ratio ◽

Simple Algorithm ◽

Structural Similarity ◽

High Dynamic Range ◽

Field Programmable ◽

Hardware Description ◽

On Chip ◽

High Level ◽

Removal Technique

Hardware suitability of an algorithm can only be verified when the algorithm is actually implemented in the hardware. By hardware, we indicate system on chip (SoC) where both processor and field-programmable gate array (FPGA) are available. Our goal is to develop a simple algorithm that can be implemented on hardware where high-level synthesis (HLS) will reduce the tiresome work of manual hardware description language (HDL) optimization. We propose an algorithm to achieve high dynamic range (HDR) image from a single low dynamic range (LDR) image. We use highlight removal technique for this purpose. Our target is to develop parameter free simple algorithm that can be easily implemented on hardware. For this purpose, we use statistical information of the image. While software development is verified with state of the art, the HLS approach confirms that the proposed algorithm is implementable to hardware. The performance of the algorithm is measured using four no-reference metrics. According to the measurement of the structural similarity (SSIM) index metric and peak signal-to-noise ratio (PSNR), hardware simulated output is at least 98.87 percent and 39.90 dB similar to the software simulated output. Our approach is novel and effective in the development of hardware implementable HDR algorithm from a single LDR image using the HLS tool.

Download Full-text

FPGA–Based Efficient Hardware/Software Co–Design for Industrial Systems with Consideration of Output Selection

Journal of Electrical Engineering ◽

10.1515/jee-2016-0022 ◽

2016 ◽

Vol 67 (3) ◽

pp. 150-159 ◽

Cited By ~ 1

Author(s):

Kyriakos M. Deliparaschos ◽

Konstantinos Michail ◽

Argyrios C. Zolotas ◽

Spyros G. Tzafestas

Keyword(s):

System Modeling ◽

Robustness Analysis ◽

Sensor Selection ◽

Linear Quadratic ◽

Industrial Systems ◽

Field Programmable ◽

Speed Up ◽

Hardware Description ◽

Selection Framework ◽

High Level

Abstract This work presents a field programmable gate array (FPGA)-based embedded software platform coupled with a software-based plant, forming a hardware-in-the-loop (HIL) that is used to validate a systematic sensor selection framework. The systematic sensor selection framework combines multi-objective optimization, linear-quadratic-Gaussian (LQG)-type control, and the nonlinear model of a maglev suspension. A robustness analysis of the closed-loop is followed (prior to implementation) supporting the appropriateness of the solution under parametric variation. The analysis also shows that quantization is robust under different controller gains. While the LQG controller is implemented on an FPGA, the physical process is realized in a high-level system modeling environment. FPGA technology enables rapid evaluation of the algorithms and test designs under realistic scenarios avoiding heavy time penalty associated with hardware description language (HDL) simulators. The HIL technique facilitates significant speed-up in the required execution time when compared to its software-based counterpart model.

Download Full-text

Comparison of Different Design Alternatives for Hardware-in-the-Loop of Power Converters

Electronics ◽

10.3390/electronics10080926 ◽

2021 ◽

Vol 10 (8) ◽

pp. 926

Author(s):

Elyas Zamiri ◽

Alberto Sanchez ◽

Marina Yushkova ◽

Maria Sofia Martínez-García ◽

Angel de Castro

Keyword(s):

Ad Hoc ◽

Power Converters ◽

General Purpose ◽

Hardware In The Loop ◽

Gate Arrays ◽

Design Alternatives ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Hardware Description ◽

High Level

This paper aims to compare different design alternatives of hardware-in-the-loop (HIL) for emulating power converters in Field Programmable Gate Arrays (FPGAs). It proposes various numerical formats (fixed and floating-point) and different approaches (pure VHSIC Hardware Description Language (VHDL), Intellectual Properties (IPs), automated MATLAB HDL code, and High-Level Synthesis (HLS)) to design power converters. Although the proposed models are simple power electronics HIL systems, the idea can be extended to any HIL system. This study compares the design effort of different coding methods and numerical formats considering possible synthesis tools (Precision and Vivado), and it comprises an analytical discussion in terms of area and speed. The different models are synthesized as ad-hoc modules in general-purpose FPGAs, but also using the NI myRIO device as an example of a commercial tool capable of implementing HIL models. The comparison confirms that the optimum design alternative must be chosen based on the application (complexity, frequency, etc.) and designers’ constraints, such as available area, coding expertise, and design effort.

Download Full-text

SoC-FPGA systems for the acquisition and processing of electroencephalographic signals

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v10.i3.pp237-248 ◽

2021 ◽

Vol 10 (3) ◽

pp. 237

Author(s):

Matias Javier Oliva ◽

Pablo Andrés García ◽

Enrique Mario Spinelli ◽

Alejandro Luis Veiga

Keyword(s):

Embedded System ◽

Real Time ◽

General Purpose ◽

System Response ◽

Single Chip ◽

Real Time Processing ◽

General Purpose Processor ◽

Time Operation ◽

Electroencephalographic Signals ◽

High Level

<span lang="EN-US">Real-time acquisition and processing of electroencephalographic signals have promising applications in the implementation of brain-computer interfaces. These devices allow the user to control a device without performing motor actions, and are usually made up of a biopotential acquisition stage and a personal computer (PC). This structure is very flexible and appropriate for research, but for final users it is necessary to migrate to an embedded system, eliminating the PC from the scheme. The strict real-time processing requirements of such systems justify the choice of a system on a chip field-programmable gate arrays (SoC-FPGA) for its implementation. This article proposes a platform for the acquisition and processing of electroencephalographic signals using this type of device, which combines the parallelism and speed capabilities of an FPGA with the simplicity of a general-purpose processor on a single chip. In this scheme, the FPGA is in charge of the real-time operation, acquiring and processing the signals, while the processor solves the high-level tasks, with the interconnection between processing elements solved by buses integrated into the chip. The proposed scheme was used to implement a brain-computer interface based on steady-state visual evoked potentials, which was used to command a speller. The first tests of the system show that a selection time of 5 seconds per command can be achieved. The time delay between the user’s selection and the system response has been estimated at 343 µs.</span>

Download Full-text

An Empirical Investigation on System and Statement Level Parallelism Strategies for Accelerating Scatter Search Using Handel-C and Impulse-C

VLSI Design ◽

10.1155/2012/793196 ◽

2012 ◽

Vol 2012 ◽

pp. 1-11

Author(s):

M. Walton ◽

O. Ahmed ◽

G. Grewal ◽

S. Areibi

Keyword(s):

Optimization Problems ◽

Scatter Search ◽

Population Based ◽

Field Programmable ◽

Speed Up ◽

Time Required ◽

High Level ◽

Level Parallelism ◽

Code Optimizations ◽

Established Population

Scatter Search is an effective and established population-based metaheuristic that has been used to solve a variety of hard optimization problems. However, the time required to find high-quality solutions can become prohibitive as problem sizes grow. In this paper, we present a hardware implementation of Scatter Search on a field-programmable gate array (FPGA). Our objective is to improve the run time of Scatter Search by exploiting the potentially massive performance benefits that are available through the native parallelism in hardware. When implementing Scatter Search we employ two different high-level languages (HLLs): Handel-C and Impulse-C. Our empirical results show that by effectively exploiting source-code optimizations, data parallelism, and pipelining, a 28x speed up over software can be achieved.

Download Full-text

An Auto-Programming Approach to Vulkan

10.20948/graphicon-2021-3027-150-165 ◽

2021 ◽

Author(s):

Vladimir Alexandrovich Frolov ◽

Vadim Sanzharov ◽

Vladimir Alexandrovich Galaktionov ◽

Alexandr Scherbakov

Keyword(s):

Performance Studies ◽

General Purpose ◽

Software Implementation ◽

Programming Approach ◽

Fine Grained ◽

Speed Up ◽

Cross Platform ◽

Increase Productivity ◽

And Performance ◽

High Level

We propose a novel high-level approach for software development on GPU using Vulkan API. Our goal is to speed-up development and performance studies for complex algorithms on GPU, which is quite difficult and laborious for Vulkan due to large number of HW features low level details. The proposed approach uses auto programming to translate ordinary C++ to optimized Vulkan implementation with automatic shaders generation, resource binding and fine-grained barriers placement. Our model is not general-purpose programming, but is extendible and customer-focused. For a single C++ input our tool can generate multiple different implementations of algorithm in Vulkan for different cases or types of hardware. For example, we automatically detect reduction in C++ source code and then generate several variants of parallel reduction on GPU: with optimization for different warp size, with or without atomics, using or not subgroup operations. Another example is GPU ray tracing applications for which we can generate different variants: pure software implementation in compute shader, using hardware accelerated ray queries, using full RTX pipeline. The goal of our work is to increase productivity of developers who are forced to use Vulkan due to various required hardware features in their software but still do care about cross-platform ability of the developed software and want to debug their algorithm logic on the CPU. Therefore, we assume that the user will take generated code and integrate it with hand-written Vulkan code.

Download Full-text

Ευφυή ενσωματωμένα συστήματα

10.12681/eadd/17758 ◽

2009 ◽

Author(s):

Αλέξανδρος Δημόπουλος

Keyword(s):

Real Time ◽

General Purpose ◽

Hardware Description Languages ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Hardware Description ◽

Description Languages ◽

Hard Real Time ◽

Dedicated Hardware

Τα τελευταία χρόνια, λόγω της αυξημένης ζήτησης σε υπολογιστική ισχύ σε συνδυασμό με την απαίτηση για μεταφερσιμότητα (portability) παρατηρείται μια μεγάλη αύξηση του πεδίου που βρίσκουν εφαρμογή τα Ενσωματωμένα Συστήματα. Ως Ενσωματωμένο Σύστημα (ΕΣ) μπορεί να οριστεί ένα σύστημα - περιορισμένου φυσικού μεγέθους - ειδικού σκοπού το οποίο επιτελεί μία συγκεκριμένη και προκαθορισμένη εργασία. Μάλιστα, αναλόγως με τους χρονικούς περιορισμούς της εφαρμογής χωρίζονται σε δύο μεγάλες κατηγορίες, σε αυτά που πρέπει να αντεπεξέλθουν αυστηρά σε εφαρμογές πραγματικού χρόνου (hard real time embedded) και σε εκείνα τα οποία είναι πιο χαλαρά σε σχέση με τους χρονικούς περιορισμούς (soft real time embedded). Για να εκτελέσει την εργασία του, το ΕΣ είναι συνήθως εφοδιασμένο με έναν μικροελεγκτή ή ακόμη και μικροεπεξεργαστή και έχει όσο το δυνατόν μικρότερες απαιτήσεις ενέργειας και σχετικά χαμηλό κόστος. Συνεπώς, αντί για την επιλογή ενός γενικής χρήσης συστήματος, προτιμάται για συγκεκριμένες εφαρμογές, η χρήση ενός συστήματος που μπορεί να εκτελέσει αποκλειστικά και μόνο μια συγκεκριμένη εργασία αλλά είναι μικρό σε μέγεθος, έχει χαμηλή κατανάλωση ισχύος, αυξημένες χρονικές απαιτήσεις και το κόστος του είναι χαμηλό. Με την ολοένα αυξανόμενη χρήση των ΕΣ, υπάρχει μια μετατόπιση των εφαρμογών από το πεδίο του καθαρού λογισμικού που εκτελείται σε υλικό γενικής χρήσης (pure software on general purpose hardware) σε αυτό της συνύπαρξης εξειδικευμένου υλικού/λογισμικού (dedicated hardware). Στη μετατόπιση αυτή έχει βοηθήσει σημαντικά και η εξέλιξη του προγραμματιζομένου υλικού και πιο συγκεκριμένα των FPGA (Field Programmable Gate Arrays), που αποτελούν μονάδες υλικού που επιτρέπουν τον εσωτερικό προγραμματισμό του υλικού τους, ώστε να επιτελείται μια συγκεκριμένη εργασία. Έτσι, ο προγραμματισμός περνάει σε ένα νέο επίπεδο, στο οποίο ο προγραμματιστής πια δεν καλείται να γράψει κώδικα που θα εκτελεστεί σε ένα συγκεκριμένο επεξεργαστή αλλά ο κώδικας περιγράφει το ίδιο το εξειδικευμένο υλικό. Η περιγραφή αυτή γίνεται με ειδικές γλώσσες περιγραφής υλικού (Hardware Description Languages - HDL), με κυριότερες τις Verilog και VHDL. Η συγκεκριμένη διατριβή έχει ως αντικείμενο την υλοποίηση σε υλικό (FPGA) κατάλληλων αλγορίθμων που προσθέτουν "ευφυΐα" σε ένα ΕΣ, μέσω της αναγνώρισης προτάσεων που ανήκουν σε γραμματικές χωρίς συμφραζόμενα καθώς και την αναγνώριση και υπολογισμό των αντίστοιχων κατηγορημάτων για προτάσεις που ανήκουν σε κατηγορικές γραμματικές. Ως γραμματική ορίζεται ένα σύστημα από κανόνες παραγωγής συμβολοσειρών. Οι κατηγορίες των γραμματικών, όπως έχουν οριστεί από τον Chomsky, ποικίλλουν αλλά στη συγκεκριμένη εργασία θα χρησιμοποιηθούν οι γραμματικές χωρίς συμφραζόμενα και μια επέκταση αυτών, οι κατηγορικές γραμματικές. Για τη συντακτική αναγνώριση μιας πρότασης, υπάρχουν κατάλληλοι αλγόριθμοι που ελέγχουν εάν η πρόταση ανήκει σε μια δοσμένη γραμματική ή όχι. Οι αλγόριθμοι αυτοί, που απλώς απαντούν δυαδικά με ένα ναι ή ένα όχι για κάθε πρόταση, ονομάζονται αναγνωριστές (recognizer). Στην περίπτωση που κατά τη διάρκεια της αναγνώρισης, παράγεται και το συντακτικό δέντρο αναγνώρισης (parse tree) για τη συγκεκριμένη πρόταση, τότε ο αλγόριθμος ονομάζεται συντακτικός αναλυτής (parser). Επιπλέον, προτείνεται ένα σύστημα για την αυτόματη παραγωγή ευφυών ΕΣ. Αντί να περιγράφεται το ΕΣ με τις κλασικές γλώσσες Verilog και VHDL περιγράφεται σε υψηλότερο δηλωτικό επίπεδο με τη χρήση του συμβολισμού των κατηγορικών γραμματικών. Η υλοποίηση των κατηγορικών γραμματικών βασίζεται στην παράλληλη υλοποίηση του αλγορίθμου του Earley και αποτελεί επέκταση του τελευταίου, επιτρέποντας την υλοποίησή του με μικρότερες απαιτήσεις χώρου αλλά και χρονικές απαιτήσεις. Επιπλέον, έχει επεκταθεί κατάλληλα ο αλγόριθμος προκειμένου να μπορεί να χειρισθεί και κατηγορικές γραμματικές και να υπολογίζει τα αντίστοιχα κατηγορήματα, που καθορίζουν τη σημασιολογία της γραμματικής. Για τον υπολογισμό των κατηγορημάτων, είναι απαραίτητη η ύπαρξη κάποιας μονάδας που να εκτελεί τις αναγκαίες πράξεις. Κατά τη διάρκεια της συγκεκριμένης εργασίας, έγινε χρήση διαφορετικών αρχιτεκτονικών για τη συντακτική ανάλυση και τον υπολογισμό των κατηγορημάτων. Συγκεκριμένα, για τον υπολογισμό των κατηγορημάτων - και ανάλογα με τις απαιτήσεις σε υπολογιστική ισχύ και μέγεθος - επιλέγεται ενίοτε η χρήση μικροελεγκτή, η χρήση εξωτερικού μικροεπεξεργαστή γενικής χρήσης, η χρήση εσωτερικού μικροεπεξεργαστή γενικής χρήσης και τελικά η χρήση επεξεργαστή εξειδικευμένης χρήσης ειδικά σχεδιασμένου για τις ανάγκες κάθε εφαρμογής.

Download Full-text

PERFORMANCE ANALYSIS OF A JPEG ENCODER MAPPED ONTO A VIRTUAL MPSoC-NoC ARCHITECTURE USING TLM 2.0.1

Journal of Circuits System and Computers ◽

10.1142/s0218126613500369 ◽

2013 ◽

Vol 22 (05) ◽

pp. 1350036

Author(s):

F. A. ESCOBAR-JUZGA ◽

F. E. SEGURA-QUIJANO

Keyword(s):

Message Passing ◽

Programming Model ◽

Task Graph ◽

Modeling Tools ◽

Networks On Chip ◽

Hardware Description ◽

On Chip ◽

Network Speed ◽

High Level ◽

The Impact

Networks on Chip (NoCs) are commonly used to integrate complex embedded systems and multiprocessor platforms due to their scalability and versatility. Modeling tools used at the functional level use SystemC to perform hardware–software co-design and error correction concurrently, thus, reducing time to market. This work analyzes a JPEG encoding algorithm mapped onto a configurable M × N, mesh/torus, NoC platform described in SystemC with the transaction level modeling (TLM) standard; timing constraints for both, the router and network interface controller, are assigned according to a hardware description language (HDL) model written for this purpose. Processing nodes are also described as SystemC threads and their computation delays are assigned depending on the amount and cost of the operations they perform. The programming model employed is message passing. We start by describing and profiling the JPEG algorithm as a task graph; then, four partitioning proposals are mapped onto three NoCs of different size. Our analysis comprises changes in topology, virtual channel depth, routing algorithms, network speed and task-node assignments. Through several high-level simulations we evaluate the impact of each parameter and we show that, for the proposed model, most improvements come from the algorithm partitioning.

Download Full-text