scholarly journals Energy-Efficient Architecture for CNNs Inference on Heterogeneous FPGA

2019 ◽  
Vol 10 (1) ◽  
pp. 1 ◽  
Author(s):  
Fanny Spagnolo ◽  
Stefania Perri ◽  
Fabio Frustaci ◽  
Pasquale Corsonello

Due to the huge requirements in terms of both computational and memory capabilities, implementing energy-efficient and high-performance Convolutional Neural Networks (CNNs) by exploiting embedded systems still represents a major challenge for hardware designers. This paper presents the complete design of a heterogeneous embedded system realized by using a Field-Programmable Gate Array Systems-on-Chip (SoC) and suitable to accelerate the inference of Convolutional Neural Networks in power-constrained environments, such as those related to IoT applications. The proposed architecture is validated through its exploitation in large-scale CNNs on low-cost devices. The prototype realized on a Zynq XC7Z045 device achieves a power efficiency up to 135 Gops/W. When the VGG-16 model is inferred, a frame rate up to 11.8 fps is reached.

Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 219
Author(s):  
Mukhammed Garifulla ◽  
Juncheol Shin ◽  
Chanho Kim ◽  
Won Hwa Kim ◽  
Hye Jung Kim ◽  
...  

Recently, the amount of attention paid towards convolutional neural networks (CNN) in medical image analysis has rapidly increased since they can analyze and classify images faster and more accurately than human abilities. As a result, CNNs are becoming more popular and play a role as a supplementary assistant for healthcare professionals. Using the CNN on portable medical devices can enable a handy and accurate disease diagnosis. Unfortunately, however, the CNNs require high-performance computing resources as they involve a significant amount of computation to process big data. Thus, they are limited to being used on portable medical devices with limited computing resources. This paper discusses the network quantization techniques that reduce the size of CNN models and enable fast CNN inference with an energy-efficient CNN accelerator integrated into recent mobile processors. With extensive experiments, we show that the quantization technique reduces inference time by 97% on the mobile system integrating a CNN acceleration engine.


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 223
Author(s):  
Yen-Ling Tai ◽  
Shin-Jhe Huang ◽  
Chien-Chang Chen ◽  
Henry Horng-Shing Lu

Nowadays, deep learning methods with high structural complexity and flexibility inevitably lean on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory could support neural networks having large numbers of layers and kernels. However, naively pursuing high-cost hardware would probably drag the technical development of deep learning methods. In the article, we thus establish a new preprocessing method to reduce the computational complexity of the neural networks. Inspired by the band theory of solids in physics, we map the image space into a noninteraction physical system isomorphically and then treat image voxels as particle-like clusters. Then, we reconstruct the Fermi–Dirac distribution to be a correction function for the normalization of the voxel intensity and as a filter of insignificant cluster components. The filtered clusters at the circumstance can delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimensional fusion U-net for the algorithmic validation, and the proposed Fermi–Dirac correction function exhibited comparable performance to other employed preprocessing methods. By comparing to the conventional z-score normalization function and the Gamma correction function, the proposed algorithm can save at least 38% of computational time cost under a low-cost hardware architecture. Even though the correction function of global histogram equalization has the lowest computational time among the employed correction functions, the proposed Fermi–Dirac correction function exhibits better capabilities of image augmentation and segmentation.


Author(s):  
Cindy X. Jiang ◽  
Tom T. Hartley ◽  
Joan E. Carletta

Hardware implementation of fractional-order differentiators and integrators requires careful consideration of issues of system quality, hardware cost, and speed. This paper proposes using field programmable gate arrays (FPGAs) to implement fractional-order systems, and demonstrates the advantages that FPGAs provide. As an illustration, the fundamental operators to a real power is approximated via the binomial expansion of the backward difference. The resulting high-order FIR filter is implemented in a pipelined multiplierless architecture on a low-cost Spartan-3 FPGA. Unlike common digital implementations in which all filter coefficients have the same word length, this approach exploits variable word length for each coefficient. Our system requires twenty percent less hardware than a system of comparable quality generated by Xilinx’s System Generator on its most area-efficient multiplierless setting. The work shows an effective way to implement a high quality, high throughput approximation to a fractional-order system, while maintaining less cost than traditional FPGA-based designs.


2020 ◽  
Vol 6 (2) ◽  
Author(s):  
Dmitry Amelin ◽  
Ivan Potapov ◽  
Josep Cardona Audí ◽  
Andreas Kogut ◽  
Rüdiger Rupp ◽  
...  

AbstractThis paper reports on the evaluation of recurrent and convolutional neural networks as real-time grasp phase classifiers for future control of neuroprostheses for people with high spinal cord injury. A field-programmable gate array has been chosen as an implementation platform due to its form factor and ability to perform parallel computations, which are specific for the selected neural networks. Three different phases of two grasp patterns and the additional open hand pattern were predicted by means of surface Electromyography (EMG) signals (i.e. Seven classes in total). Across seven healthy subjects, CNN (Convolutional Neural Networks) and RNN (Recurrent Neural Networks) had a mean accuracy of 85.23% with a standard deviation of 4.77% and 112 µs per prediction and 83.30% with a standard deviation of 4.36% and 40 µs per prediction, respectively.


Sign in / Sign up

Export Citation Format

Share Document