Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs

Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous works have focused only on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs has not been studied in depth, as they have higher computational complexity and sparsity than 2D DCNNs. In this paper, we focus on the acceleration of both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping 2D and 3D sparse DCNNs onto a uniform architecture. Firstly, a pruning method is used to prune unimportant network connections and increase the sparsity of weights. After pruning, the number of parameters of the DCNNs is reduced significantly without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9×. Results show that the performance of our method on the accelerator outperforms that of our prior work by 2.5× to 3.6× in latency.
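As a rough illustration of the first two steps described in this abstract, the sketch below prunes a small weight matrix and stores the surviving non-zero weights in COO form. The magnitude threshold, the helper names (`prune`, `to_coo`, `from_coo`) and the example values are assumptions made for illustration; the abstract does not state the pruning criterion or the on-chip storage layout.

```python
def prune(weights, threshold):
    """Zero out weights whose magnitude is below `threshold`.

    Magnitude thresholding is an assumed criterion, not the paper's method.
    """
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_coo(weights):
    """Return (rows, cols, values) listing only the non-zero entries."""
    rows, cols, vals = [], [], []
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            if w != 0.0:
                rows.append(r)
                cols.append(c)
                vals.append(w)
    return rows, cols, vals


def from_coo(rows, cols, vals, shape):
    """Rebuild the dense matrix from its COO triplets."""
    dense = [[0.0] * shape[1] for _ in range(shape[0])]
    for r, c, v in zip(rows, cols, vals):
        dense[r][c] = v
    return dense


# Hypothetical 3x3 kernel; the values are made up for the example.
kernel = [[0.02, -0.90, 0.01],
          [0.75, 0.00, -0.03],
          [0.04, 0.60, -0.05]]

pruned = prune(kernel, threshold=0.1)
rows, cols, vals = to_coo(pruned)
restored = from_coo(rows, cols, vals, shape=(3, 3))
```

Because every stored weight also carries its coordinates, COO saves memory only when the weights are sparse enough for the index overhead to be outweighed by the zeros that are dropped.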


Memory Requirement Reduction of Deep Neural Networks for Field Programmable Gate Arrays Using Low-Bit Quantization of Parameters

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287739 ◽

2021 ◽

Author(s):

Niccolo Nicodemo ◽

Gaurav Naithani ◽

Konstantinos Drossos ◽

Tuomas Virtanen ◽

Roberto Saletti

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Field Programmable Gate Arrays ◽

Memory Requirement ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays


Wiring requirement and three-dimensional integration technology for field programmable gate arrays

IEEE Transactions on Very Large Scale Integration (VLSI) Systems ◽

10.1109/tvlsi.2003.810003 ◽

2003 ◽

Vol 11 (1) ◽

pp. 44-54 ◽

Cited By ~ 58

Author(s):

A. Rahman ◽

S. Das ◽

A.P. Chandrakasan ◽

R. Reif

Keyword(s):

Three Dimensional ◽

Field Programmable Gate Arrays ◽

Integration Technology ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays


An Efficient FPGA-Based Convolutional Neural Network for Classification: Ad-MobileNet

Electronics ◽

10.3390/electronics10182272 ◽

2021 ◽

Vol 10 (18) ◽

pp. 2272

Author(s):

Safa Bouguezzi ◽

Hana Ben Fredj ◽

Tarek Belabed ◽

Carlos Valderrama ◽

Hassene Faiedh ◽

...

Keyword(s):

Recognition Rate ◽

Hardware Acceleration ◽

Implementation Model ◽

Gate Arrays ◽

Proposed Model ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Computer Vision Applications ◽

On Chip ◽

Segmentation Image

Convolutional Neural Networks (CNNs) continue to dominate research in the area of hardware acceleration using Field Programmable Gate Arrays (FPGAs), proving their effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic sign recognition, among others. However, there are numerous constraints on deploying CNNs on FPGAs, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an advanced CNN model inspired by the baseline MobileNet model. The proposed model uses an Ad-depth engine, which is an improved version of the depth-wise separable convolution unit. Moreover, we propose an FPGA-based implementation model that supports the Mish, TanhExp, and ReLU activation functions. The experimental results using the CIFAR-10 dataset show that our Ad-MobileNet has a classification accuracy of 88.76% while requiring few computational hardware resources. Compared to state-of-the-art methods, our proposed method has a fairly high recognition rate while using fewer computational hardware resources. Indeed, the proposed model helps to reduce hardware resources by more than 41% compared to the baseline model.
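For reference, the three activation functions named in this abstract can be written as follows. This is a floating-point sketch using the standard definitions, Mish(x) = x·tanh(ln(1 + eˣ)) and TanhExp(x) = x·tanh(eˣ); how the paper realizes them in FPGA logic is not described here, and the saturation cutoff of 20 in `tanhexp` is an assumption chosen only to avoid overflow in `math.exp`.

```python
import math


def relu(x):
    """ReLU: pass positive inputs through, clamp negatives to zero."""
    return max(0.0, x)


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def tanhexp(x):
    """TanhExp: x * tanh(e^x).

    For large x, tanh(e^x) is 1.0 to double precision, so x is returned
    directly; the cutoff of 20 is an assumed, conservative choice.
    """
    if x > 20.0:
        return x
    return x * math.tanh(math.exp(x))


samples = [-2.0, 0.0, 1.0]
table = [(x, relu(x), mish(x), tanhexp(x)) for x in samples]
```

Unlike ReLU, both Mish and TanhExp are smooth and slightly negative for negative inputs, which is why they cost more to evaluate than a simple comparison.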


Hardware synthesis of artificial neural networks using field programmable gate arrays and fixed-point numbers

2006 IEEE Region 5 Conference ◽

10.1109/tpsd.2006.5507410 ◽

2006 ◽

Cited By ~ 2

Author(s):

Mychal Hoffman ◽

Paul Bauer ◽

Brian Hemmelman ◽

Abul Hasan

Keyword(s):

Neural Networks ◽

Fixed Point ◽

Artificial Neural Networks ◽

Field Programmable Gate Arrays ◽

Hardware Synthesis ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Artificial Neural


Design and Implementation of Perceptron Neuron in Machine Learning for Handwritten Character Recognition

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.h6680.0891020 ◽

2020 ◽

Vol 9 (10) ◽

pp. 357-363

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Feature Extraction ◽

Field Programmable Gate Arrays ◽

Fpga Implementation ◽

Floating Point ◽

Neuron Network ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Due to the exponential increase in electronic devices connected to the Internet, the amount of data they produce has grown to the same extent. To process these data, automatic learning algorithms, also known as machine learning, have become widespread. The most popular are neural networks. These algorithms need a great deal of resources to compute all their operations, and for that reason they have traditionally been implemented in application-specific integrated circuits. Recently, however, there has been a boom in implementations on field programmable gate arrays (FPGAs), which allow greater parallelism in the implementation of the algorithms. This paper proposes an FPGA-based feature extraction method for offline handwritten digit recognition. Classification is performed by a simple two-layer multilayer perceptron (MLP). The feature extraction approach is well suited to FPGA execution because it relies on addition and subtraction operations. The features are extracted from handwritten digit images of a standard database, normalized to 40×40 pixels. Experimental results show that the proposed system achieves 85% accuracy. Overall, compared to other systems, it is less complex, more accurate, and simpler. The project further describes an FPGA implementation of an IEEE-754 single-precision floating-point MAC unit, which is used to feed the weighted inputs to the neurons in artificial neural networks. Floating-point numbers extend the range of representable values, from very small to very large, which is highly recommended for artificial neural networks. The code is developed in HDL, and simulation and synthesis results are obtained using Xilinx synthesis tools. To validate the computational accuracy of the FFT, a MATLAB validation script is used to verify the HDL output against a standard reference model.
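To make the IEEE-754 single-precision format mentioned above concrete, the sketch below splits a value into its sign, exponent, and fraction fields and models one multiply-accumulate step in software. The function names are hypothetical, and `mac` rounds only once after the combined multiply-add, which is an assumption; a hardware MAC unit may round after the multiply and again after the add.

```python
import struct


def float32_fields(x):
    """Return (sign, biased_exponent, fraction) of x as an IEEE-754 float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, bias 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


def to_float32(x):
    """Round a Python float (double precision) to the nearest float32."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def mac(acc, a, b):
    """One multiply-accumulate step, rounded once to float32 (assumed)."""
    return to_float32(acc + a * b)


# Weighted sum for a single neuron; inputs and weights are made up.
inputs = [1.5, -2.0, 0.25]
weights = [2.0, 0.5, 4.0]
acc = 0.0
for x, w in zip(inputs, weights):
    acc = mac(acc, x, w)
```

For example, -2.5 is stored with sign 1, biased exponent 128 (that is, 2¹), and a fraction field encoding 0.25, since -2.5 = -1.25 × 2¹.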


Deploying a Smart Queuing System on Edge with Intel OpenVINO Toolkit

10.21203/rs.3.rs-509460/v1 ◽

2021 ◽

Author(s):

Rishit Dagli ◽

Süleyman Eken

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Public Transportation ◽

Queuing System ◽

Processing Unit ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays ◽

The Cost ◽

Hardware Platforms

Recent increases in computational power and the development of specialized architectures have made it possible to perform machine learning, especially inference, on the edge. OpenVINO is a toolkit based on Convolutional Neural Networks that facilitates fast-track development of computer vision algorithms and deep learning neural networks into vision applications, and enables their easy heterogeneous execution across hardware platforms. Smart queue management can be the key to the success of any sector. In this paper, we focus on edge deployments to make the Smart Queuing System (SQS) accessible to all, while also making it possible to run it on cheap devices. This allows the deep learning algorithms of the queuing system to run on pre-existing computers that a retail store, public transportation facility, or factory may already possess, considerably reducing the cost of deploying such a system. SQS demonstrates how to create a video AI solution on the edge. We validate our results by testing it on multiple edge devices, namely CPU, Integrated Edge Graphic Processing Unit (iGPU), Vision Processing Unit (VPU), and Field Programmable Gate Arrays (FPGAs). Experimental results show that deploying an SQS on the edge is very promising.


mNet2FPGA: A Design Flow for Mapping a Fixed-Point CNN to Zynq SoC FPGA

Electronics ◽

10.3390/electronics9111823 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1823

Author(s):

Tomyslav Sledevič ◽

Artūras Serackis

Keyword(s):

Neural Networks ◽

Fixed Point ◽

Processing System ◽

Design Flow ◽

Feature Maps ◽

Gate Arrays ◽

The Core ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Fully Connected

Convolutional neural networks (CNNs) are a computation- and memory-demanding class of deep neural networks. Field-programmable gate arrays (FPGAs) are often used to accelerate networks deployed on embedded platforms because of the high computational complexity of CNNs. In most cases, CNNs are trained with existing deep learning frameworks and then mapped to FPGAs with specialized toolflows. In this paper, we propose a CNN core architecture called mNet2FPGA that places a trained CNN on a SoC FPGA. The processing system (PS) is responsible for configuring the convolution and fully connected cores according to a list of prescheduled instructions. The programmable logic holds the cores of the convolution and fully connected layers. The hardware architecture is based on advanced extensible interface (AXI) stream processing with simultaneous bidirectional transfers between RAM and the CNN core. The core was tested on a cost-optimized Z-7020 FPGA with 16-bit fixed-point VGG networks. Kernel binarization and merging with the batch normalization layer were applied to reduce the number of DSPs in the multi-channel convolutional core. The convolutional core processes eight input feature maps at once and generates eight output channels of the same size and composition at 50 MHz. The core of the fully connected (FC) layer works at 100 MHz with up to 4096 neurons per layer. In the current version of the CNN core, the size of the convolutional kernel is fixed to 3×3. The estimated average performance is 8.6 GOPS for VGG13 and close to 8.4 GOPS for VGG16/19 networks.
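Merging a batch normalization layer into the preceding convolution, as this abstract mentions, is commonly done by rescaling the weights and bias so that no separate normalization step is needed at inference time. The sketch below shows that standard folding for a single output channel in floating point; the function names and numeric values are illustrative assumptions, and the paper's own combination with kernel binarization and 16-bit fixed-point arithmetic is not reproduced here.

```python
import math


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of one pre-activation value."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the weights and bias of one channel."""
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


def dot(weights, inputs, bias):
    """Weighted sum standing in for one convolution output position."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Made-up parameters for a single output channel.
weights, bias = [0.5, -1.0, 0.25], 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
inputs = [1.0, 2.0, 3.0]

# Convolution followed by batch norm ...
reference = batchnorm(dot(weights, inputs, bias), gamma, beta, mean, var)
# ... matches a single convolution with the folded parameters.
folded_w, folded_b = fold_batchnorm(weights, bias, gamma, beta, mean, var)
fused = dot(folded_w, inputs, folded_b)
```

Because the folding is done once offline, the hardware evaluates only one multiply-accumulate pass per output, which is consistent with the stated goal of reducing DSP usage.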
