An Efficient FPGA-Based Convolutional Neural Network for Classification: Ad-MobileNet

Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2272
Author(s):  
Safa Bouguezzi ◽  
Hana Ben Fredj ◽  
Tarek Belabed ◽  
Carlos Valderrama ◽  
Hassene Faiedh ◽  
...  

Convolutional Neural Networks (CNN) continue to dominate research in the area of hardware acceleration using Field Programmable Gate Arrays (FPGA), proving their effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic sign recognition, among others. However, there are numerous constraints on deploying CNNs on FPGAs, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an advanced CNN model inspired by the baseline MobileNet model. The proposed model uses an Ad-depth engine, an improved version of the depth-wise separable convolution unit. Moreover, we propose an FPGA-based implementation model that supports the Mish, TanhExp, and ReLU activation functions. Experimental results on the CIFAR-10 dataset show that Ad-MobileNet achieves a classification accuracy of 88.76% while requiring few computational hardware resources. Compared to state-of-the-art methods, the proposed method attains a fairly high recognition rate while using fewer computational hardware resources; indeed, it reduces hardware resources by more than 41% compared to the baseline model.
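
The abstract names the Mish, TanhExp, and ReLU activation functions and describes the Ad-depth engine as an improved depth-wise separable convolution unit. The Ad-depth engine itself is not detailed here, so the sketch below only gives the standard definitions of the three activations and a plain (unoptimized) depth-wise separable convolution in NumPy as a functional reference, not the hardware design.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mish(x):
    # Mish(x) = x * tanh(softplus(x)); logaddexp(0, x) is a stable softplus
    return x * np.tanh(np.logaddexp(0.0, x))

def tanh_exp(x):
    # TanhExp(x) = x * tanh(exp(x))
    return x * np.tanh(np.exp(x))

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Reference depth-wise separable convolution (valid padding, stride 1).

    x          : (H, W, C_in) input feature map
    dw_kernels : (k, k, C_in) one spatial filter per input channel
    pw_kernels : (C_in, C_out) 1x1 point-wise filters
    """
    H, W, C_in = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    dw_out = np.zeros((Ho, Wo, C_in))
    for c in range(C_in):                      # depth-wise stage: per-channel conv
        for i in range(Ho):
            for j in range(Wo):
                dw_out[i, j, c] = np.sum(x[i:i+k, j:j+k, c] * dw_kernels[:, :, c])
    return dw_out @ pw_kernels                 # point-wise stage: 1x1 conv mixes channels
```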

Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2637
Author(s):  
Ignacio Pérez ◽  
Miguel Figueroa

Convolutional neural networks (CNN) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGAs) become a solution for performing inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PE). Implemented on an XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes a 224×224-pixel image in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.
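
The abstract describes a scalable number of PEs coordinated by an embedded processor, with network data held in external memory. The actual scheduling scheme is not given here; the sketch below only illustrates one plausible host-side step, a static partition of a layer's output channels across PEs, purely as an assumption about how work might be split.

```python
def partition_output_channels(num_channels, num_pes):
    """Illustrative static partition of a layer's output channels across PEs.

    Each PE receives a contiguous block of channels; the embedded processor
    would then stream the corresponding weights from external memory to each PE.
    """
    base, extra = divmod(num_channels, num_pes)
    blocks, start = [], 0
    for pe in range(num_pes):
        count = base + (1 if pe < extra else 0)
        blocks.append((pe, start, start + count))   # (PE id, first channel, last channel + 1)
        start += count
    return blocks

# Example: a layer with 96 output channels distributed over 4 PEs
print(partition_output_channels(96, 4))   # [(0, 0, 24), (1, 24, 48), (2, 48, 72), (3, 72, 96)]
```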


Cryptography ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. 6 ◽  
Author(s):  
Saleh Mulhem ◽  
Ayoub Mars ◽  
Wael Adi

New large classes of permutations over ℤ_(2^n) based on T-Functions as Self-Inverting Permutation Functions (SIPFs) are presented. The presented classes exhibit negligible or low complexity when implemented in emerging FPGA technologies. The target use of such functions is in creating so-called Secret Unknown Ciphers (SUCs) to serve as resilient clone-resistant structures in smart non-volatile Field Programmable Gate Array (FPGA) devices. The SUC concept was proposed a decade ago as a consistent digital alternative to the conventional, inconsistent analog Physical Unclonable Functions (PUFs). The proposed permutation classes are designed and optimized particularly to use non-consumed Mathblock cores in programmable System-on-Chip (SoC) FPGA devices. Hardware and software complexities for realizing such structures are optimized and evaluated for a sample expected target FPGA technology. The attained security levels of the resulting SUCs are evaluated and shown to be scalable and usable even for post-quantum cryptosystems.
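
The defining property of an SIPF is that applying it twice over ℤ_(2^n) returns the input, i.e., f(f(x)) ≡ x (mod 2^n). The paper's classes are not reproduced here; the sketch below only checks that property for a toy T-function involution (subtraction from a constant modulo 2^n), where bit i of the output depends only on input bits 0..i.

```python
def make_sipf(c, n):
    """Toy self-inverting permutation over Z_2^n: f(x) = (c - x) mod 2^n.

    Subtraction from a constant is a T-function and is its own inverse.
    The classes in the paper are considerably richer; this only illustrates
    the defining self-inverse property.
    """
    mask = (1 << n) - 1
    return lambda x: (c - x) & mask

def is_self_inverting(f, n):
    # exhaustive check of f(f(x)) == x over the whole domain (small n only)
    return all(f(f(x)) == x for x in range(1 << n))

f = make_sipf(c=0xB5, n=8)
assert is_self_inverting(f, 8)      # f(f(x)) = x for every x in Z_2^8
```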


2019 ◽  
Vol 28 (14) ◽  
pp. 1950246
Author(s):  
I. Hariharan ◽  
M. Kannan

Modern embedded systems are packed with dedicated Field Programmable Gate Arrays (FPGAs) to accelerate overall system performance. However, FPGAs are susceptible to reconfiguration overheads, which arise mainly because configuration data are fetched from off-chip memory at run-time and because tasks are managed improperly during execution. To reduce these overheads, our proposed methodology focuses on a prefetch heuristic, a reuse technique, and the available memory hierarchy to provide an efficient mapping of tasks onto the available memories. Our paper also includes a new replacement policy that reduces the overall reconfiguration time and energy overheads for static systems in their subsequent iterations. The results show that most of the reconfiguration overheads are eliminated when applications are managed and executed according to our methodology.
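
The abstract does not specify the proposed replacement policy, so the sketch below is only a toy model of the reuse idea: a configuration cache that skips the off-chip fetch when a task's bitstream is already resident, which is what removes the overhead in later iterations of a static task schedule. LRU is used here purely as a stand-in policy, and all names and costs are illustrative.

```python
from collections import OrderedDict

class ConfigCache:
    """Toy model of on-chip configuration memory with a stand-in LRU policy."""

    def __init__(self, slots, fetch_cost_ms):
        self.slots = slots
        self.fetch_cost_ms = fetch_cost_ms
        self.resident = OrderedDict()          # task -> None, ordered by recency

    def load(self, task):
        if task in self.resident:              # reuse: bitstream already on chip
            self.resident.move_to_end(task)
            return 0.0
        if len(self.resident) >= self.slots:   # evict the least recently used bitstream
            self.resident.popitem(last=False)
        self.resident[task] = None
        return self.fetch_cost_ms              # off-chip fetch = reconfiguration overhead

cache = ConfigCache(slots=3, fetch_cost_ms=4.2)
schedule = ["A", "B", "C", "A", "B", "C"]      # two iterations of a static task graph
print(sum(cache.load(t) for t in schedule))    # 12.6 ms: the second iteration is free
```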


Cryptography ◽  
2019 ◽  
Vol 3 (4) ◽  
pp. 28 ◽  
Author(s):  
Saleh Mulhem ◽  
Wael Adi

The Secret Unknown Cipher (SUC) concept was introduced a decade ago as a promising technique for creating pure digital clone-resistant electronic units as alternatives to the traditional non-consistent Physical Unclonable Functions (PUFs). In this work, a very special unconventional cipher design is presented. The design uses the hard-core Mathblocks available in modern system-on-chip (SoC) FPGAs (Field Programmable Gate Arrays). Such Mathblocks are often not completely used in many FPGA applications; therefore, it seems wise to make use of such dead (unused) modules to fabricate usable physical security functions for free. Standard cipher designs usually avoid deploying multipliers in the cipher mapping functions due to their high complexity. The main target of this work is to design large cipher classes (e.g., a cipher class size > 2^600) by mainly deploying the FPGA-specific mathematical cores. The proposed cipher designs are novel, hardware-oriented, and new in the public literature, using entirely new and unusual mapping functions. If a random unknown selection of one cipher out of 2^600 ciphers is self-configured in a device, then a Secret Unknown Cipher module is created within that device, making it physically hard to clone. We consider the cipher module to be free (of zero cost) if its major elements make use of otherwise unused, reanimated Mathblocks. Such ciphers are usable in many future mass products for protecting vehicular units against cloning and modeling attacks. The required self-reconfigurable devices for this concept are not available yet; however, they are expected to emerge in the near future.
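
The central step described here is that a device self-configures one cipher chosen at random out of a class of more than 2^600, and the selection never leaves the device. The sketch below only mirrors that selection step in software; os.urandom stands in for the internal true random number generator, and the idea that the drawn bits parameterize the Mathblock-based round mappings is an assumption for illustration.

```python
import os

CLASS_EXPONENT = 600          # cipher class size > 2**600, as stated in the abstract

def self_configure_suc():
    """Illustrative selection of one cipher instance out of a class of 2**600.

    In the real concept, an internal TRNG picks the instance during a single
    self-configuration step and the selection bits never leave the device;
    os.urandom is only a software stand-in here.
    """
    selection_bits = int.from_bytes(os.urandom(CLASS_EXPONENT // 8), "big")
    # The selection bits would parameterize the Mathblock-based round mappings;
    # here we only return the (secret) instance index.
    return selection_bits

instance = self_configure_suc()
print(instance.bit_length() <= 600)   # True: a 600-bit secret instance index
```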


2019 ◽  
Vol 146 (4) ◽  
pp. 2879-2879
Author(s):  
Ross K. Snider ◽  
Trevor Vannoy ◽  
James Eaton ◽  
Matthew Blunt ◽  
E. Bailey Galacci ◽  
...  

Electronics ◽  
2019 ◽  
Vol 8 (7) ◽  
pp. 803 ◽  
Author(s):  
Deguang Wang ◽  
Junzhong Shen ◽  
Mei Wen ◽  
Chunyuan Zhang

Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous works have focused only on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs has not been studied in depth, as they have higher computational complexity and sparsity than 2D DCNNs. In this paper, we focus on accelerating both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping them onto a uniform architecture. Firstly, a pruning method is used to remove unimportant network connections and increase the sparsity of the weights. After pruning, the number of DCNN parameters is reduced significantly without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9×. Results show that our method on the accelerator outperforms our prior work by 2.5× to 3.6× in latency.
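
The prune-then-encode pipeline can be illustrated in a few lines of NumPy: small-magnitude weights are zeroed and the survivors are stored as coordinate/value pairs (COO). The threshold, tensor shape, and assumed index width below are illustrative only; the paper's pruning procedure additionally preserves accuracy, which a bare threshold does not.

```python
import numpy as np

def prune_and_encode_coo(weights, threshold):
    """Prune small weights and encode the survivors in COO format.

    Returns (coords, values): one index tuple per non-zero weight plus the
    weight itself. Magnitude pruning is used here purely for illustration.
    """
    pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)
    coords = np.argwhere(pruned != 0.0)          # (nnz, ndim) index list
    values = pruned[pruned != 0.0]               # (nnz,) non-zero weights
    return coords, values

# Example: a random 3D deconvolution kernel (C_out, C_in, kD, kH, kW)
w = np.random.randn(8, 4, 3, 3, 3) * 0.05
coords, values = prune_and_encode_coo(w, threshold=0.1)
dense_bytes = w.size * 4                          # fp32 dense storage
coo_bytes = values.size * 4 + coords.size * 2     # fp32 values + assumed int16 indices
print(f"compression ratio: {dense_bytes / coo_bytes:.1f}x")
```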


Cryptography ◽  
2018 ◽  
Vol 2 (3) ◽  
pp. 15 ◽  
Author(s):  
Don Owen Jr. ◽  
Derek Heeger ◽  
Calvin Chan ◽  
Wenjie Che ◽  
Fareena Saqib ◽  
...  

Secure booting within a field-programmable gate array (FPGA) environment is traditionally implemented using hardwired embedded cryptographic primitives and non-volatile memory (NVM)-based keys, whereby an encrypted bitstream is decrypted as it is loaded from an external storage medium, e.g., Flash memory. A novel technique is proposed in this paper that self-authenticates an unencrypted FPGA configuration bitstream loaded into the FPGA during start-up. The internal configuration access port (ICAP) interface is accessed to read out configuration information of the unencrypted bitstream, which is then used as input to the SHA-3 secure hash function to generate a digest. In contrast to conventional authentication, where the digest is computed and compared with a second pre-computed value, we use the digest as a challenge to a hardware-embedded delay physical unclonable function (PUF) called HELP. The delays of the paths sensitized by the challenges are used to generate a decryption key using the HELP algorithm. The decryption key is used in the second stage of the boot process to decrypt the operating system (OS) and applications. It follows that any type of malicious tampering with the unencrypted bitstream changes the challenges and the corresponding decryption key, resulting in key regeneration failure. A ring oscillator is used as a clock to make the process autonomous (and unstoppable), and a novel on-chip time-to-digital converter is used to measure path delays, making the proposed boot process completely self-contained, i.e., implemented entirely within the re-configurable fabric and without utilizing any vendor-specific FPGA features.
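
The data flow described in the abstract is: ICAP readout → SHA-3 digest → challenge to the HELP PUF → regenerated decryption key. The sketch below only traces that flow; the HELP response depends on measured path delays in the fabric and cannot be computed in software, so it appears as a placeholder, and the 256-bit SHA-3 variant is an assumption (the abstract does not state the output length).

```python
import hashlib

def digest_bitstream(config_words: bytes) -> bytes:
    """SHA-3 digest over configuration data read back through ICAP.

    The 256-bit output length is assumed here for illustration.
    """
    return hashlib.sha3_256(config_words).digest()

def help_puf_response(challenge: bytes) -> bytes:
    """Placeholder for the HELP PUF: the real response is derived from path
    delays measured in the FPGA fabric and cannot be reproduced in software.
    Any tampering with the bitstream changes the challenge and therefore the
    regenerated key, so decryption of the OS in stage two fails."""
    raise NotImplementedError("hardware-only: measured path delays required")

def derive_boot_key(config_words: bytes) -> bytes:
    challenge = digest_bitstream(config_words)   # stage 1: self-authentication digest
    return help_puf_response(challenge)          # stage 2: key regeneration via HELP
```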


2009 ◽  
Vol 2009 ◽  
pp. 1-11 ◽  
Author(s):  
Christian Schuck ◽  
Bastian Haetzer ◽  
Jürgen Becker

Partial and dynamic online reconfiguration of Field Programmable Gate Arrays (FPGAs) is a promising approach for designing highly adaptive systems with lower power consumption, higher task-specific performance, and even built-in fault tolerance. Different techniques and tool flows have been successfully developed. One of them, two-dimensional partial reconfiguration based on the Readback-Modify-Writeback method implemented on Xilinx Virtex devices, makes these devices ideally suited as a hardware platform for future organic computing systems, where highly adaptive hardware is necessary. In turn, decentralisation, the key property of an organic computing system, is in contradiction with the central nature of the FPGA's configuration port. Therefore, this paper presents an approach that connects the single ICAP port to a network on chip (NoC) to provide access for all clients of the network. In this way, a virtual decentralisation of the ICAP is achieved. Furthermore, true two-dimensional partial reconfiguration is raised to a higher level of abstraction through a lightweight Readback-Modify-Writeback hardware module with different configuration and addressing modes. Results show that configuration data as well as reconfiguration times could be significantly reduced.
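
The Readback-Modify-Writeback cycle itself is straightforward to state, even though ICAP access and frame addressing are device specific. The sketch below is purely conceptual: the `icap` object and its read_frame/write_frame methods are hypothetical stand-ins for the hardware module described in the paper, and only the three named steps are mirrored.

```python
def readback_modify_writeback(icap, frame_addresses, modify):
    """Conceptual Readback-Modify-Writeback cycle for 2D partial reconfiguration.

    `icap` is assumed to expose read_frame/write_frame for the frames covering
    the target rectangle; both the interface and the frame addressing are
    device specific, so this only mirrors the three steps of the method.
    """
    for addr in frame_addresses:
        frame = icap.read_frame(addr)      # 1. read back the current configuration frame
        frame = modify(addr, frame)        # 2. patch only the bits of the target region
        icap.write_frame(addr, frame)      # 3. write the modified frame back
```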

