A Case Study of Quantizing Convolutional Neural Networks for Fast Disease Diagnosis on Portable Medical Devices

Sensors, 2021, Vol 22 (1), pp. 219
Author(s): Mukhammed Garifulla, Juncheol Shin, Chanho Kim, Won Hwa Kim, Hye Jung Kim, ...

Recently, attention to convolutional neural networks (CNNs) in medical image analysis has increased rapidly, since they can analyze and classify images faster and more accurately than humans. As a result, CNNs are becoming more popular and serve as supplementary assistants for healthcare professionals. Running CNNs on portable medical devices could enable convenient and accurate disease diagnosis. However, CNNs require high-performance computing resources because they involve a significant amount of computation to process large volumes of data, which limits their use on portable medical devices with constrained computing resources. This paper discusses network quantization techniques that reduce the size of CNN models and enable fast inference on the energy-efficient CNN accelerators integrated into recent mobile processors. Through extensive experiments, we show that quantization reduces inference time by 97% on a mobile system integrating a CNN acceleration engine.
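As a rough illustration of the kind of post-training quantization the abstract refers to, the sketch below maps a float32 convolution kernel to symmetric per-tensor INT8 values in plain NumPy. It is a minimal sketch only, not the authors' pipeline or any specific mobile accelerator API; the function names and the random kernel are illustrative placeholders.

import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map float32 weights to int8
    # using a single scale derived from the maximum absolute value.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float32 weights to estimate quantization error.
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # stand-in conv kernel
    q, s = quantize_int8(w)
    print("int8 bytes:", q.nbytes, "vs float32 bytes:", w.nbytes)  # 4x smaller
    print("max reconstruction error:", float(np.abs(w - dequantize(q, s)).max()))

The 4x reduction in weight storage, together with integer arithmetic, is what lets a mobile CNN acceleration engine cut inference time so sharply.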

2019, Vol 10 (1), pp. 1
Author(s): Fanny Spagnolo, Stefania Perri, Fabio Frustaci, Pasquale Corsonello

Due to their huge computational and memory requirements, implementing energy-efficient, high-performance Convolutional Neural Networks (CNNs) on embedded systems remains a major challenge for hardware designers. This paper presents the complete design of a heterogeneous embedded system built around a Field-Programmable Gate Array System-on-Chip (SoC) and suited to accelerating CNN inference in power-constrained environments, such as IoT applications. The proposed architecture is validated by running large-scale CNNs on low-cost devices. The prototype, realized on a Zynq XC7Z045 device, achieves a power efficiency of up to 135 Gops/W and reaches a frame rate of up to 11.8 fps when inferring the VGG-16 model.
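To put the reported figures in perspective, the short arithmetic sketch below estimates the sustained throughput and the implied power draw, assuming VGG-16 needs roughly 30.9 GOPs per inference (a commonly cited estimate; the exact operation count is not taken from this paper).

# Back-of-the-envelope check of the reported FPGA figures.
GOPS_PER_FRAME = 30.9   # assumed VGG-16 workload per image (not from the paper)
FPS = 11.8              # reported frame rate
GOPS_PER_WATT = 135.0   # reported power efficiency

throughput = GOPS_PER_FRAME * FPS          # sustained Gops/s
implied_power = throughput / GOPS_PER_WATT # implied accelerator power in W
print(f"Sustained throughput: {throughput:.0f} Gops/s")
print(f"Implied power draw: {implied_power:.1f} W")

Under this assumption the accelerator would sustain roughly 365 Gops/s at under 3 W, which is consistent with the power-constrained IoT setting the paper targets.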


2021, Vol 2062 (1), pp. 012016
Author(s): Sunil Pandey, Naresh Kumar Nagwani, Shrish Verma

Abstract: Training deep learning convolutional neural networks is extremely compute-intensive and takes a long time to complete on all but small datasets. This is a major limitation inhibiting the widespread adoption of convolutional neural networks in real-world applications, despite their superior image classification performance compared with other techniques. Multidirectional research and development efforts are therefore being pursued to boost the computational performance of convolutional neural networks. Against this background, developing parallel and scalable convolutional neural network implementations for multi-system high-performance computing architectures is important. Prior computational experiments indicate that a combination of pipeline and task parallelism yields significant convolutional neural network performance gains of up to 18 times. This paper discusses the aspects important for implementing parallel and scalable convolutional neural networks on CPU-based multi-system high-performance computing architectures, including computational pipelines, convolutional neural networks, convolutional neural network pipelines, multi-system high-performance computing architectures, and parallel programming models.
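The pipeline-parallel execution model described above can be sketched on a multicore CPU with standard Python multiprocessing queues. The two stage functions below are placeholders standing in for groups of CNN layers; this is an illustrative sketch, not the authors' implementation.

import multiprocessing as mp

def stage_one(in_q, out_q):
    # First pipeline stage, e.g. the early convolutional layers.
    while True:
        item = in_q.get()
        if item is None:          # poison pill: propagate shutdown
            out_q.put(None)
            return
        out_q.put(item * 2)       # placeholder for real layer computation

def stage_two(in_q, out_q):
    # Second pipeline stage, e.g. later layers plus the classifier.
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            return
        out_q.put(item + 1)

if __name__ == "__main__":
    q_in, q_mid, q_out = mp.Queue(), mp.Queue(), mp.Queue()
    stages = [mp.Process(target=stage_one, args=(q_in, q_mid)),
              mp.Process(target=stage_two, args=(q_mid, q_out))]
    for p in stages:
        p.start()
    for image_id in range(8):     # feed a small batch of "images"
        q_in.put(image_id)
    q_in.put(None)
    results = []
    while (r := q_out.get()) is not None:
        results.append(r)         # stages overlap: stage_one processes the
    for p in stages:              # next item while stage_two finishes this one
        p.join()
    print(results)

Task parallelism would add further worker processes per stage; because the queues already decouple the stages, that extension is straightforward.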


2021, Vol 32 (8), pp. 2035-2048
Author(s): Mochamad Asri, Dhairya Malhotra, Jiajun Wang, George Biros, Lizy K. John, ...
