A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems

Convolutional neural networks (CNNs) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGAs) offer an alternative for performing inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PEs). Implemented on an XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes an image of 224×224 pixels in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.
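
The latency and power figures quoted above imply a throughput and an energy cost per image. The short Python sketch below works out both. It assumes that images are processed one at a time with no overlap between frames and that the 7.35 W figure holds for the whole inference; neither assumption is stated in the abstract, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
latency_s = 0.220   # time to process one 224x224 image
power_w = 7.35      # reported power consumption

# Derived quantities (assumptions: no frame overlap, constant power draw).
frames_per_second = 1.0 / latency_s
energy_per_image_j = power_w * latency_s

print(f"throughput: {frames_per_second:.2f} images/s")
print(f"energy per image: {energy_per_image_j:.2f} J")
```

Under these assumptions the accelerator handles roughly 4.5 images per second at about 1.6 J per image.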

An Efficient FPGA-Based Convolutional Neural Network for Classification: Ad-MobileNet

Electronics ◽

10.3390/electronics10182272 ◽

2021 ◽

Vol 10 (18) ◽

pp. 2272

Author(s):

Safa Bouguezzi ◽

Hana Ben Fredj ◽

Tarek Belabed ◽

Carlos Valderrama ◽

Hassene Faiedh ◽

...

Keyword(s):

Recognition Rate ◽

Hardware Acceleration ◽

Implementation Model ◽

Gate Arrays ◽

Proposed Model ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computer Vision Applications ◽

On Chip ◽

Segmentation Image

Convolutional Neural Networks (CNNs) continue to dominate research in the area of hardware acceleration using Field Programmable Gate Arrays (FPGAs), proving their effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic sign recognition, among others. However, there are numerous constraints on deploying CNNs on FPGAs, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an advanced CNN model inspired by the baseline MobileNet model. The proposed model uses an Ad-depth engine, which is an improved version of the depth-wise separable convolution unit. Moreover, we propose an FPGA-based implementation model that supports the Mish, TanhExp, and ReLU activation functions. The experimental results using the CIFAR-10 dataset show that our Ad-MobileNet has a classification accuracy of 88.76% while requiring few computational hardware resources. Compared to state-of-the-art methods, our proposed method has a fairly high recognition rate while using fewer computational hardware resources. Indeed, the proposed model reduces hardware resources by more than 41% compared to the baseline model.
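
For reference, the three activation functions named above can be written down from their standard published definitions: ReLU(x) = max(0, x), Mish(x) = x·tanh(ln(1 + e^x)), and TanhExp(x) = x·tanh(e^x). The Python sketch below is a floating-point illustration only. The abstract does not say how these functions are realized in hardware, and the helper `softplus` and the saturation threshold of 20 are choices made here for numerical safety.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    return x * math.tanh(softplus(x))

def tanhexp(x: float) -> float:
    # exp() overflows for very large x; tanh has saturated at 1 well before that.
    return x if x > 20.0 else x * math.tanh(math.exp(x))

for value in (-2.0, 0.0, 1.0):
    print(value, relu(value), mish(value), tanhexp(value))
```

All three functions are zero at the origin and approach the identity for large positive inputs; Mish and TanhExp differ from ReLU mainly in letting small negative values through.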

Comparative analysis of soft and hard on-chip interconnects for field-programmable gate arrays

IET Computers & Digital Techniques ◽

10.1049/iet-cdt.2011.0169 ◽

2012 ◽

Vol 6 (6) ◽

pp. 396-405 ◽

Cited By ~ 2

Author(s):

J.Y. Hur ◽

M.A. Wahlah ◽

L. Mhamdi ◽

K. Goossens

Keyword(s):

Comparative Analysis ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip

Low-Complexity Nonlinear Self-Inverse Permutation for Creating Physically Clone-Resistant Identities

Cryptography ◽

10.3390/cryptography4010006 ◽

2020 ◽

Vol 4 (1) ◽

pp. 6 ◽

Cited By ~ 1

Author(s):

Saleh Mulhem ◽

Ayoub Mars ◽

Wael Adi

Keyword(s):

Field Programmable Gate Arrays ◽

Low Complexity ◽

System On Chip ◽

Large Classes ◽

Physical Unclonable Functions ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Security Levels ◽

On Chip

New large classes of permutations over $\mathbb{Z}_{2^n}$ based on T-Functions as Self-Inverting Permutation Functions (SIPFs) are presented. The presented classes exhibit negligible or low complexity when implemented in emerging FPGA technologies. The target use of such functions is in creating the so-called Secret Unknown Ciphers (SUCs) to serve as resilient clone-resistant structures in smart non-volatile Field Programmable Gate Array (FPGA) devices. The SUC concept was proposed a decade ago as a digital, consistent alternative to the conventional analog, inconsistent Physical Unclonable Functions (PUFs). The proposed permutation classes are designed and optimized particularly to use non-consumed Mathblock cores in programmable System-on-Chip (SoC) FPGA devices. Hardware and software complexities for realizing such structures are optimized and evaluated for a sample expected target FPGA technology. The attained security levels of the resulting SUCs are evaluated and shown to be scalable and usable even for post-quantum crypto systems.
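
To make "self-inverting permutation" and "T-function" concrete, the Python sketch below builds one simple example on 8-bit words. It is not one of the paper's classes: the construction (keep the low half, XOR a function of the low half into the high half) and the mixing function `h` are assumptions chosen for illustration. Because each output bit depends only on input bits at the same or lower positions, the map is a T-function, and applying it twice cancels the XOR.

```python
N = 8                        # word width in bits (illustrative)
HALF = N // 2
LOW_MASK = (1 << HALF) - 1

def h(low: int) -> int:
    """Hypothetical nonlinear mixing function of the low half (uses a multiply)."""
    return (low * low + 0x5) & LOW_MASK

def sipf(x: int) -> int:
    """Self-inverse map: keep the low half, XOR h(low) into the high half."""
    low = x & LOW_MASK
    high = (x >> HALF) & LOW_MASK
    return ((high ^ h(low)) << HALF) | low

domain = range(1 << N)
assert all(sipf(sipf(x)) == x for x in domain)           # involution
assert sorted(sipf(x) for x in domain) == list(domain)   # permutation
```

Any choice of `h` keeps the map self-inverse, which is why a large family of such functions can be generated by varying `h` alone.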

Efficient Use of On-Chip Memories and Scheduling Techniques to Eliminate the Reconfiguration Overheads in Reconfigurable Systems

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126619502463 ◽

2019 ◽

Vol 28 (14) ◽

pp. 1950246

Author(s):

I. Hariharan ◽

M. Kannan

Keyword(s):

System Performance ◽

Replacement Policy ◽

Reconfigurable Systems ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Configuration Data ◽

On Chip ◽

Time And Energy ◽

Static Systems

Modern embedded systems are packed with dedicated Field Programmable Gate Arrays (FPGAs) to accelerate overall system performance. However, FPGAs are susceptible to reconfiguration overheads. These overheads arise mainly because configuration data is fetched from off-chip memory at run time and because tasks are improperly managed during execution. To reduce them, our proposed methodology focuses on a prefetch heuristic, a reuse technique, and the available memory hierarchy to provide an efficient mapping of tasks onto the available memories. Our paper includes a new replacement policy that reduces the overall time and energy reconfiguration overheads for static systems in their subsequent iterations. The results show that most of the reconfiguration overheads are eliminated when the applications are managed and executed based on our methodology.
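
The effect of reuse on reconfiguration overhead can be illustrated with a toy model that counts how often a task's configuration has to be fetched from off-chip memory. The Python sketch below uses a plain least-recently-used (LRU) policy as a stand-in, since the abstract does not describe the paper's own replacement policy; the function name, the slot count, and the task schedule are all hypothetical.

```python
from collections import OrderedDict

def count_offchip_fetches(task_sequence, onchip_slots):
    """Count configuration loads from off-chip memory under LRU reuse."""
    if onchip_slots < 1:
        raise ValueError("need at least one on-chip slot")
    cache = OrderedDict()    # task id -> None, ordered least to most recently used
    fetches = 0
    for task in task_sequence:
        if task in cache:
            cache.move_to_end(task)         # reuse: configuration already on chip
        else:
            fetches += 1                    # miss: fetch from off-chip memory
            if len(cache) >= onchip_slots:
                cache.popitem(last=False)   # evict the least recently used entry
            cache[task] = None
    return fetches

# Hypothetical static task graph executed for three iterations.
schedule = ["A", "B", "C"] * 3
print(count_offchip_fetches(schedule, onchip_slots=1))
print(count_offchip_fetches(schedule, onchip_slots=3))
```

With enough on-chip slots to hold every configuration, only the first iteration pays for off-chip fetches, which matches the abstract's point about subsequent iterations of static systems. A cyclic schedule that exceeds the available slots defeats plain LRU entirely, one reason a purpose-built replacement policy can do better.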

New Mathblocks-Based Feistel-Like Ciphers for Creating Clone-Resistant FPGA Devices

Cryptography ◽

10.3390/cryptography3040028 ◽

2019 ◽

Vol 3 (4) ◽

pp. 28 ◽

Cited By ~ 2

Author(s):

Saleh Mulhem ◽

Wael Adi

Keyword(s):

Hard Core ◽

Major Elements ◽

Physical Unclonable Functions ◽

Mapping Functions ◽

The Public ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip ◽

Near Future

The Secret Unknown Cipher (SUC) concept was introduced a decade ago as a promising technique for creating pure digital clone-resistant electronic units as alternatives to the traditional non-consistent Physical Unclonable Functions (PUFs). In this work, a very special unconventional cipher design is presented. The design uses the hard-core Mathblocks available in modern system-on-chip (SoC) Field Programmable Gate Arrays (FPGAs). Such Mathblocks are often not completely used in many FPGA applications; therefore, it seems wise to make use of such dead (unused) modules to fabricate usable physical security functions for free. Standard cipher designs usually avoid deploying multipliers in the cipher mapping functions due to their high complexity. The main target of this work is to design large cipher classes (e.g., cipher class size > $2^{600}$) by mainly deploying the FPGA-specific mathematical cores. The proposed cipher designs are novel, hardware-oriented, and new to the public literature, using entirely new and unusual mapping functions. If a random unknown selection of one cipher out of $2^{600}$ ciphers is self-configured in a device, then a Secret Unknown Cipher module is created within the device, making it physically hard to clone. We consider the cipher module to be free (zero cost) if its major elements make use of unused, reanimated Mathblocks. Such ciphers are usable in many future mass products for protecting vehicular units against cloning and modeling attacks. The self-reconfigurable devices required for this concept are not available now; however, they are expected to emerge in the near future.
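
The appeal of a Feistel-like structure here is that the round function may be arbitrary, including a multiplier-based one, and the cipher remains invertible. The Python sketch below shows that property on a 32-bit block. It is a generic textbook Feistel network, not the paper's design: `round_fn`, the block size, and the subkeys are hypothetical, and the code makes no security claim.

```python
HALF_BITS = 16
MASK = (1 << HALF_BITS) - 1

def round_fn(half, subkey):
    """Hypothetical round function built around a multiply (not the paper's)."""
    product = (half | 1) * (subkey | 1)
    return (product ^ (product >> HALF_BITS)) & MASK

def encrypt(block, subkeys):
    left, right = (block >> HALF_BITS) & MASK, block & MASK
    for k in subkeys:
        left, right = right, left ^ round_fn(right, k)
    return (left << HALF_BITS) | right

def decrypt(block, subkeys):
    left, right = (block >> HALF_BITS) & MASK, block & MASK
    for k in reversed(subkeys):
        # Undo one round: the old right half is the current left half.
        left, right = right ^ round_fn(left, k), left
    return (left << HALF_BITS) | right

subkeys = [0x1F3D, 0xA5A5, 0x0C17, 0x7E21]   # illustrative values only
plaintext = 0xDEADBEEF
assert decrypt(encrypt(plaintext, subkeys), subkeys) == plaintext
```

Decryption never needs to invert `round_fn`, so a multiplier that would be awkward to invert directly can still be used as the mixing element.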

BPR-TCAM—Block and Partial Reconfiguration based TCAM on Xilinx FPGAs

Electronics ◽

10.3390/electronics9020353 ◽

2020 ◽

Vol 9 (2) ◽

pp. 353 ◽

Cited By ~ 1

Author(s):

Anees Ullah ◽

Ali Zahir ◽

Noaman A. Khan ◽

Waleed Ahmad ◽

Alexis Ramos ◽

...

Keyword(s):

Resource Utilization ◽

High Speed ◽

State Of The Art ◽

Field Programmable Gate Arrays ◽

Partial Reconfiguration ◽

Gate Arrays ◽

Content Addressable Memories ◽

Field Programmable ◽

Programmable Gate Arrays

Field Programmable Gate Array (FPGA)-based Ternary Content Addressable Memories (TCAMs) are widely used in high-speed networking applications. However, TCAMs are not present on state-of-the-art FPGAs and need to be emulated on SRAM-based memories (i.e., LUTRAMs and Block RAMs), which requires a large amount of FPGA resources. In this paper, we present an efficient methodology to implement FPGA-based TCAMs with significant resource savings compared to existing schemes. The proposed methodology exploits the fracturable nature of Look Up Tables (LUTs) and the built-in slice carry-chains for simultaneous mapping of two rules and their matching logic to a single FPGA slice. Multiple slices can be stacked together to build deeper and wider TCAMs in a modular way. The combination of all these techniques results in significant savings in resource utilization compared to existing approaches.
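
As background on what emulating a TCAM on SRAM-based memories involves, the Python sketch below models the generic partitioned-RAM scheme. The search key is cut into small chunks, each chunk addresses a small RAM that stores one match bit per rule, and the per-chunk bit vectors are ANDed together. This is a software illustration of the general idea, not the paper's LUT and carry-chain mapping; the function names, the 4-bit rules, and the 2-bit chunk size are assumptions.

```python
def build_tables(rules, chunk_bits):
    """rules: equal-width ternary strings over '0', '1', 'x' (x = don't care)."""
    width = len(rules[0])
    assert width % chunk_bits == 0
    tables = []
    for start in range(0, width, chunk_bits):
        table = [0] * (1 << chunk_bits)      # one small RAM: address -> rule bitmap
        for addr in range(1 << chunk_bits):
            addr_bits = format(addr, f"0{chunk_bits}b")
            for rule_id, rule in enumerate(rules):
                chunk = rule[start:start + chunk_bits]
                if all(r in ("x", a) for r, a in zip(chunk, addr_bits)):
                    table[addr] |= 1 << rule_id
        tables.append(table)
    return tables

def lookup(tables, key, chunk_bits):
    """Return a bitmap with bit i set when rule i matches the binary key string."""
    match = -1                               # all ones before the first AND
    for i, table in enumerate(tables):
        chunk = key[i * chunk_bits:(i + 1) * chunk_bits]
        match &= table[int(chunk, 2)]
    return match

rules = ["10x1", "0xx0", "1111"]
tables = build_tables(rules, chunk_bits=2)
print(bin(lookup(tables, "1011", 2)))        # prints 0b1: only rule 0 matches
```

Each table has 2^chunk_bits entries per rule, so memory grows quickly with chunk width. This is the resource cost that motivates denser mappings such as the one the abstract describes.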

The Case for Embedded Networks on Chip on Field-Programmable Gate Arrays

IEEE Micro ◽

10.1109/mm.2013.131 ◽

2014 ◽

Vol 34 (1) ◽

pp. 80-89 ◽

Cited By ~ 20

Author(s):

Mohamed S. Abdelfattah ◽

Vaughn Betz

Keyword(s):

Field Programmable Gate Arrays ◽

Embedded Networks ◽

Networks On Chip ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip

Nonvolatile Nanoelectromechanical Memory Switches for Low-Power and High-Speed Field-Programmable Gate Arrays

IEEE Transactions on Electron Devices ◽

10.1109/ted.2014.2380992 ◽

2015 ◽

Vol 62 (2) ◽

pp. 673-679 ◽

Cited By ~ 14

Author(s):

Yong Jun Kim ◽

Woo Young Choi

Keyword(s):

Low Power ◽

High Speed ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Real-time audio signal processing using system-on-chip field programmable gate arrays

The Journal of the Acoustical Society of America ◽

10.1121/1.5136987 ◽

2019 ◽

Vol 146 (4) ◽

pp. 2879-2879

Author(s):

Ross K. Snider ◽

Trevor Vannoy ◽

James Eaton ◽

Matthew Blunt ◽

E. Bailey Galacci ◽

...

Keyword(s):

Signal Processing ◽

Real Time ◽

Audio Signal ◽

Field Programmable Gate Arrays ◽

System On Chip ◽

Audio Signal Processing ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

On Chip

A Comparison of Filtering Approaches Using Low-Speed DACs for Hardware-in-the-Loop Implemented in FPGAs

Electronics ◽

10.3390/electronics8101116 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1116 ◽

Cited By ~ 4

Author(s):

Yushkova ◽

Sanchez ◽

de Castro ◽

Martínez-García

Keyword(s):

High Speed ◽

Hardware In The Loop ◽

Low Speed ◽

Digital To Analog Converters ◽

Gate Arrays ◽

Simulation Techniques ◽

Input Signals ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Input Waveform

The use of Hardware-in-the-Loop (HIL) systems implemented in Field Programmable Gate Arrays (FPGAs) is constantly increasing because of their advantages compared to traditional simulation techniques. This increase in usage has brought new challenges related to improving their performance and features, such as the number of output channels, while the price of HIL systems is diminishing. At present, the use of low-speed Digital-to-Analog Converters (DACs) is becoming a commercial possibility for two reasons. One is their lower price and the other is their lower pin count, which determines the number and price of the FPGAs that are necessary to handle those DACs. This paper compares four filtering approaches for providing suitable data to low-speed DACs. These approaches filter high-speed input signals, removing the need for expensive high-speed DACs and therefore decreasing the total cost of HIL implementations. Results show that the selection of the appropriate filter should be based on the type of the input waveform and the relative importance of the dynamics versus the area.
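
Why filtering matters before a low-speed DAC can be seen in a small numerical experiment. The Python sketch below compares simply dropping samples with averaging each block of samples before output. These two approaches are chosen for illustration and are not necessarily among the four compared in the paper; the test signal, the decimation factor of 10, and the function names are assumptions.

```python
import math

def decimate_naive(samples, factor):
    """Keep every `factor`-th sample, with no filtering."""
    return samples[::factor]

def decimate_average(samples, factor):
    """Average each block of `factor` samples before handing it to the DAC."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]

# Hypothetical high-rate signal: a slow sine plus a ripple that flips every sample.
high_rate = [math.sin(2 * math.pi * n / 200) + 0.3 * (-1) ** n for n in range(400)]

slow_naive = decimate_naive(high_rate, 10)
slow_avg = decimate_average(high_rate, 10)
print(len(slow_naive), len(slow_avg))
print(slow_naive[0], slow_avg[0])
```

In this example the naive output carries the full 0.3 ripple amplitude as a constant offset, because every retained sample lands on the same ripple phase, while the block average cancels the ripple. The averaging filter costs an accumulator per channel, a small instance of the dynamics-versus-area trade-off mentioned in the abstract.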
