Energy-Efficient Architecture for CNNs Inference on Heterogeneous FPGA

Due to the heavy computational and memory requirements of Convolutional Neural Networks (CNNs), implementing energy-efficient, high-performance CNNs on embedded systems remains a major challenge for hardware designers. This paper presents the complete design of a heterogeneous embedded system built on a Field-Programmable Gate Array (FPGA) System-on-Chip (SoC) and suited to accelerating CNN inference in power-constrained environments, such as IoT applications. The proposed architecture is validated by running large-scale CNNs on low-cost devices. The prototype, realized on a Zynq XC7Z045 device, achieves a power efficiency of up to 135 Gops/W. When inferring the VGG-16 model, it reaches a frame rate of up to 11.8 fps.
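The efficiency and frame-rate figures above are related through simple arithmetic, sketched below in Python. The per-frame workload of VGG-16 used here (about 30.9 GOP, i.e. roughly 15.5 G multiply-accumulates for a 224×224 input, counting each as two operations) is a commonly quoted assumption that does not appear in the abstract. The abstract also does not say whether the 135 Gops/W and 11.8 fps figures were measured at the same operating point, so the sketch does not combine them.

```python
# Illustrative arithmetic only. VGG16_GOP_PER_FRAME is an assumed, commonly
# quoted figure (~15.5 G multiply-accumulates at 224x224 input, counting each
# MAC as 2 operations); it is not taken from the abstract.
VGG16_GOP_PER_FRAME = 30.9


def throughput_gops(fps: float, gop_per_frame: float) -> float:
    """Sustained throughput in GOP/s implied by a frame rate and per-frame workload."""
    return fps * gop_per_frame


def efficiency_gops_per_watt(gops: float, power_w: float) -> float:
    """Energy efficiency in GOP/s per watt (written Gops/W in the abstract)."""
    return gops / power_w


if __name__ == "__main__":
    gops = throughput_gops(11.8, VGG16_GOP_PER_FRAME)
    print(f"Implied VGG-16 throughput at 11.8 fps: {gops:.1f} GOP/s")
```

Under this assumed workload, 11.8 fps corresponds to roughly 365 GOP/s of sustained throughput; dividing a measured throughput by the measured board power gives the Gops/W metric the abstract reports.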

A high performance FPGA-based accelerator for large-scale convolutional neural networks

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

10.1109/fpl.2016.7577308

2016

Cited By ~ 9

Author(s): Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, ...

Keyword(s): Neural Networks, Convolutional Neural Networks, High Performance, Large Scale

A Case Study of Quantizing Convolutional Neural Networks for Fast Disease Diagnosis on Portable Medical Devices

Sensors

10.3390/s22010219

2021

Vol 22 (1), pp. 219

Author(s): Mukhammed Garifulla, Juncheol Shin, Chanho Kim, Won Hwa Kim, Hye Jung Kim, ...

Keyword(s): Neural Networks, Convolutional Neural Networks, Medical Devices, Energy Efficient, High Performance, Medical Image Analysis, Disease Diagnosis, Mobile System, Performance Computing

Recently, attention to convolutional neural networks (CNNs) in medical image analysis has increased rapidly, since they can analyze and classify images faster and more accurately than humans. As a result, CNNs are becoming more popular and serve as supplementary assistants for healthcare professionals. Running CNNs on portable medical devices could enable convenient and accurate disease diagnosis. However, CNNs require high-performance computing resources because they involve a significant amount of computation to process large volumes of data, which limits their use on portable medical devices with constrained computing resources. This paper discusses network quantization techniques that reduce the size of CNN models and enable fast inference with the energy-efficient CNN accelerators integrated into recent mobile processors. Through extensive experiments, we show that quantization reduces inference time by 97% on a mobile system that integrates a CNN acceleration engine.
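The abstract does not specify which quantization scheme was used. As a generic illustration only, the Python sketch below applies symmetric per-tensor post-training quantization of float32 weights to int8, one common way of shrinking a CNN model by a factor of four. The function names are hypothetical, and this is not presented as the authors' method.

```python
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization of float weights to int8.

    The largest absolute weight is mapped to 127; every weight is divided by
    the resulting scale, rounded to the nearest integer, and clipped.
    """
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 values and the scale."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.05, size=(64, 3, 3, 3)).astype(np.float32)
    q, scale = quantize_int8(weights)
    err = np.max(np.abs(dequantize(q, scale) - weights))
    print(f"float32: {weights.nbytes} bytes, int8: {q.nbytes} bytes, max error: {err:.2e}")
```

With int8 storage each weight occupies one byte instead of four, and the round-trip error per weight is at most half of `scale`. Mobile CNN accelerators typically also quantize activations and may use per-channel scales, which this sketch omits.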

A High-Performance Accelerator for Large-Scale Convolutional Neural Networks

2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)

10.1109/ispa/iucc.2017.00099

2017

Cited By ~ 4

Author(s): Fan Sun, Chao Wang, Lei Gong, Chongchong Xu, Yiwei Zhang, ...

Keyword(s): Neural Networks, Convolutional Neural Networks, High Performance, Large Scale

Large-Scale E-Commerce Image Retrieval with Top-Weighted Convolutional Neural Networks

Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval - ICMR '16

10.1145/2911996.2912052

2016

Cited By ~ 2

Author(s): Shichao Zhao, Youjiang Xu, Yahong Han

Keyword(s): Neural Networks, Image Retrieval, Convolutional Neural Networks, Large Scale

Computational Complexity Reduction of Neural Networks of Brain Tumor Image Segmentation by Introducing Fermi–Dirac Correction Functions

Entropy

10.3390/e23020223

2021

Vol 23 (2), pp. 223

Author(s): Yen-Ling Tai, Shin-Jhe Huang, Chien-Chang Chen, Henry Horng-Shing Lu

Keyword(s): Neural Networks, Deep Learning, Computational Complexity, High Performance, Low Cost, Structural Complexity, Correction Function, Computational Time, Learning Methods, Band Theory

Nowadays, deep learning methods with high structural complexity and flexibility inevitably depend on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory can support neural networks with large numbers of layers and kernels. However, naively pursuing high-cost hardware would likely hold back the technical development of deep learning methods. In this article, we therefore establish a new preprocessing method to reduce the computational complexity of neural networks. Inspired by the band theory of solids in physics, we map the image space isomorphically onto a non-interacting physical system and treat image voxels as particle-like clusters. We then reconstruct the Fermi–Dirac distribution as a correction function, both to normalize voxel intensity and to filter out insignificant cluster components. The filtered clusters can then delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimensional fusion U-net for algorithmic validation, and the proposed Fermi–Dirac correction function performed comparably to the other preprocessing methods evaluated. Compared with the conventional z-score normalization function and the Gamma correction function, the proposed algorithm saves at least 38% of computation time on a low-cost hardware architecture. Although global histogram equalization has the lowest computation time among the correction functions evaluated, the proposed Fermi–Dirac correction function performs better at image augmentation and segmentation.
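The abstract does not give the exact form of the correction function. The Python sketch below only illustrates the general idea of using the Fermi–Dirac distribution f(E) = 1 / (exp((E − μ)/kT) + 1) as a soft intensity-normalization curve. Treating voxel intensity as E, taking μ as the mean foreground intensity, defaulting kT to the foreground standard deviation, and treating zero-valued voxels as background are all assumptions made for illustration; the cluster-filtering step is not reproduced.

```python
import numpy as np


def fermi_dirac(E, mu, kT):
    """Fermi-Dirac occupancy f(E) = 1 / (exp((E - mu) / kT) + 1).

    Written with tanh, via the identity 1/(e^x + 1) = (1 - tanh(x/2)) / 2,
    so that large |E - mu| does not overflow np.exp.
    """
    return 0.5 * (1.0 - np.tanh((E - mu) / (2.0 * kT)))


def fd_normalize(volume, kT=None):
    """Map voxel intensities into [0, 1] with a Fermi-Dirac-shaped curve.

    Hypothetical choices: voxels equal to 0 are background, mu is the mean
    foreground intensity, and kT defaults to the foreground standard deviation.
    1 - f(E) is used so that brighter voxels map closer to 1.
    """
    v = np.asarray(volume, dtype=np.float64)
    foreground = v[v > 0]
    mu = foreground.mean()
    if kT is None:
        kT = foreground.std()
    if kT <= 0:
        raise ValueError("kT must be positive (foreground intensities are constant)")
    out = 1.0 - fermi_dirac(v, mu, kT)
    out[v == 0] = 0.0  # keep background voxels at zero
    return out
```

Unlike z-score normalization, whose output is unbounded, this mapping saturates smoothly toward 0 and 1, so outlying intensities are compressed rather than amplified.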

AxR-NN: Approximate Computation Reuse for Energy-Efficient Convolutional Neural Networks

Proceedings of the 2020 on Great Lakes Symposium on VLSI

10.1145/3386263.3407595

2020

Author(s): Dongning Ma, Xunzhao Yin, Michael Niemier, X. Sharon Hu, Xun Jiao

Keyword(s): Neural Networks, Convolutional Neural Networks, Energy Efficient, Approximate Computation

An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks

Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing

10.1145/3431379.3460644

2020

Author(s): Albert Njoroge Kahira, Truong Thao Nguyen, Leonardo Bautista Gomez, Ryousei Takano, Rosa M. Badia, ...

Keyword(s): Neural Networks, Convolutional Neural Networks, Large Scale, Scale Model, Large Scale Model

COSY: An Energy-Efficient Hardware Architecture for Deep Convolutional Neural Networks Based on Systolic Array

2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)

10.1109/icpads.2017.00034

2017

Cited By ~ 2

Author(s): Chen Xin, Qiang Chen, Miren Tian, Mohan Ji, Chenglong Zou, ...

Keyword(s): Neural Networks, Convolutional Neural Networks, Systolic Array, Energy Efficient, Hardware Architecture, Deep Convolutional Neural Networks

High Performance Low Cost Implementation of FPGA-Based Fractional-Order Operators

Volume 6: 5th International Conference on Multibody Systems, Nonlinear Dynamics, and Control, Parts A, B, and C

10.1115/detc2005-84796

2005

Cited By ~ 3

Author(s): Cindy X. Jiang, Tom T. Hartley, Joan E. Carletta

Keyword(s): Fractional Order, Word Length, High Performance, Low Cost, Careful Consideration, Order System, System Quality, Gate Arrays, Field Programmable, Programmable Gate Arrays

Hardware implementation of fractional-order differentiators and integrators requires careful consideration of system quality, hardware cost, and speed. This paper proposes using field-programmable gate arrays (FPGAs) to implement fractional-order systems and demonstrates the advantages that FPGAs provide. As an illustration, the fundamental operator raised to a real power is approximated via the binomial expansion of the backward difference. The resulting high-order FIR filter is implemented in a pipelined, multiplierless architecture on a low-cost Spartan-3 FPGA. Unlike common digital implementations, in which all filter coefficients have the same word length, this approach uses a variable word length for each coefficient. Our system requires twenty percent less hardware than a system of comparable quality generated by Xilinx’s System Generator at its most area-efficient multiplierless setting. The work shows an effective way to implement a high-quality, high-throughput approximation of a fractional-order system at lower cost than traditional FPGA-based designs.
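The binomial expansion of the backward difference is the Grünwald–Letnikov approximation, D^α x(t) ≈ h^(−α) Σ_k w_k x(t − kh), with w_k = (−1)^k C(α, k) the coefficients of (1 − z^(−1))^α. The Python sketch below computes these coefficients with the standard recursion, applies them as a truncated FIR filter, and rounds each coefficient to its own fixed-point precision to mimic the variable-word-length idea. The per-coefficient bit widths are supplied by the caller (how the authors chose them is not stated), the function names are hypothetical, and the multiplierless shift-and-add hardware mapping is not modeled.

```python
import numpy as np


def gl_coefficients(alpha: float, n_taps: int) -> np.ndarray:
    """Coefficients w_k = (-1)^k * binom(alpha, k) of (1 - z^-1)^alpha.

    Uses the recursion w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k).
    """
    w = np.empty(n_taps)
    w[0] = 1.0
    for k in range(1, n_taps):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w


def fractional_fir(x: np.ndarray, alpha: float, h: float, n_taps: int) -> np.ndarray:
    """Approximate D^alpha of a sampled signal with a truncated FIR filter."""
    w = gl_coefficients(alpha, n_taps) / h**alpha
    # Causal convolution y[n] = sum_k w[k] * x[n - k], with x[n] = 0 for n < 0
    return np.convolve(x, w)[: len(x)]


def quantize_per_coefficient(w: np.ndarray, frac_bits: list[int]) -> np.ndarray:
    """Round each coefficient to its own number of fractional bits (fixed point)."""
    return np.array([np.round(c * 2**b) / 2**b for c, b in zip(w, frac_bits)])
```

For example, α = 1 reduces to the first difference x[n] − x[n−1], α = −1 yields a running-sum integrator, and α = 0.5 gives a half-order differentiator whose coefficients 1, −0.5, −0.125, … decay slowly, which is why a high-order filter is needed. Small trailing coefficients can tolerate fewer bits than the leading ones, which is the intuition behind assigning word lengths per coefficient.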

Real-time classification of hand movements as a basis for intuitive control of grasp neuroprostheses

Current Directions in Biomedical Engineering

10.1515/cdbme-2020-2011

2020

Vol 6 (2)

Author(s): Dmitry Amelin, Ivan Potapov, Josep Cardona Audí, Andreas Kogut, Rüdiger Rupp, ...

Keyword(s): Neural Networks, Standard Deviation, Real Time, Convolutional Neural Networks, Recurrent Neural Networks, Healthy Subjects, Hand Movements, Cord Injury, Field Programmable

This paper reports on the evaluation of recurrent and convolutional neural networks as real-time grasp phase classifiers for the future control of neuroprostheses for people with high spinal cord injury. A field-programmable gate array was chosen as the implementation platform because of its form factor and its ability to perform the parallel computations characteristic of the selected neural networks. Three different phases of two grasp patterns, plus an additional open-hand pattern, were predicted from surface electromyography (EMG) signals (i.e., seven classes in total). Across seven healthy subjects, the CNN (Convolutional Neural Network) achieved a mean accuracy of 85.23% (standard deviation 4.77%) at 112 µs per prediction, and the RNN (Recurrent Neural Network) achieved 83.30% (standard deviation 4.36%) at 40 µs per prediction.
