Processor Pipelining Method for Efficient Deep Neural Network Inference on Embedded Devices

With the development of mobile edge computing (MEC), more and more intelligent services and applications based on deep neural networks are being deployed on mobile devices to meet users' diverse and personalized needs. Unfortunately, deploying deep learning models and running inference with them on resource-constrained devices is challenging. The traditional cloud-based approach runs the deep learning model on a cloud server, but because large amounts of input data must be transmitted to the server over a WAN, it incurs high service latency. This is unacceptable for many of today's latency-sensitive, computation-intensive applications. In this paper, we propose Cogent, an execution framework that accelerates deep neural network inference through device-edge synergy. Cogent operates in two stages: an automatic pruning and partitioning stage and a containerized deployment stage. Cogent uses reinforcement learning (RL) to automatically predict pruning and partitioning strategies from feedback on the hardware configuration and system conditions, so that the pruned and partitioned model better adapts to the system environment and the user's hardware configuration. The resulting model parts are then deployed in containers on the device and the edge server to accelerate inference. Experiments show that this learning-based, hardware-aware automatic pruning and partitioning scheme significantly reduces service latency and accelerates the overall inference process while maintaining accuracy: it achieves a speedup of up to 8.89× with an accuracy loss of no more than 7%.
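
Cogent learns its partition strategy with RL, and the abstract does not give its latency model. As a rough illustration of the trade-off any device-edge partition decision has to weigh, the following Python sketch uses plain exhaustive search instead of RL over a hypothetical per-layer latency model: layers before the split run on the device, the activation at the split is uploaded, and the remaining layers run on the edge server. The `Layer` fields, the bandwidth figure, and all numbers are assumptions for illustration, and pruning is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    device_ms: float  # hypothetical latency of this layer on the mobile device
    edge_ms: float    # hypothetical latency of this layer on the edge server
    out_bytes: int    # size of this layer's output activation


def end_to_end_latency(layers: List[Layer], split: int, input_bytes: int,
                       bandwidth_bytes_per_ms: float) -> float:
    """Latency when layers[:split] run on the device and layers[split:] on the edge.

    Assumes the small final result sent back from the edge costs nothing.
    """
    device = sum(layer.device_ms for layer in layers[:split])
    edge = sum(layer.edge_ms for layer in layers[split:])
    if split == len(layers):
        transfer = 0.0  # fully local execution: nothing is uploaded
    else:
        # Upload the raw input (full offload) or the activation at the split point.
        sent = input_bytes if split == 0 else layers[split - 1].out_bytes
        transfer = sent / bandwidth_bytes_per_ms
    return device + transfer + edge


def best_split(layers: List[Layer], input_bytes: int,
               bandwidth_bytes_per_ms: float) -> Tuple[int, float]:
    """Exhaustively evaluate every split point 0..n and return the fastest one."""
    candidates = [
        (k, end_to_end_latency(layers, k, input_bytes, bandwidth_bytes_per_ms))
        for k in range(len(layers) + 1)
    ]
    return min(candidates, key=lambda c: c[1])


# Made-up three-layer model, 600 kB input, roughly 80 Mbit/s uplink (10,000 bytes/ms).
example = [
    Layer(device_ms=20.0, edge_ms=2.0, out_bytes=400_000),
    Layer(device_ms=30.0, edge_ms=3.0, out_bytes=50_000),
    Layer(device_ms=40.0, edge_ms=4.0, out_bytes=10_000),
]
print(best_split(example, input_bytes=600_000, bandwidth_bytes_per_ms=10_000.0))
```

With these made-up numbers, splitting after the second layer (59 ms) beats full offloading (69 ms) and fully local execution (90 ms), because that layer's output is much smaller than the raw input. A learned policy such as Cogent's must also choose pruning and react to changing system conditions, which a one-shot search like this does not capture.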


Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing

IEEE Transactions on Wireless Communications, Vol 19 (1), pp. 447-457, 2020. DOI: 10.1109/twc.2019.2946140

Cited By ~ 25

Author(s): En Li, Liekang Zeng, Zhi Zhou, Xu Chen

Keyword(s): Neural Network, Deep Neural Network, Network Inference, Edge Computing, On Demand


SNAP: A 1.67–21.55 TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS

2019 Symposium on VLSI Circuits, 2019. DOI: 10.23919/vlsic.2019.8778193

Cited By ~ 4

Author(s): Jie-Fang Zhang, Ching-En Lee, Chester Liu, Yakun Sophia Shao, Stephen W. Keckler, ...

Keyword(s): Neural Network, Deep Neural Network, Network Inference


Accurate deep neural network inference using computational phase-change memory

Nature Communications, Vol 11 (1), 2020. DOI: 10.1038/s41467-020-16108-9

Cited By ~ 9

Author(s): Vinay Joshi, Manuel Le Gallo, Simon Haefeli, Irem Boybat, S. R. Nandakumar, ...

Keyword(s): Neural Network, Phase Change, Deep Neural Network, Network Inference, Phase Change Memory, Change Memory


Fast Sparse Deep Neural Network Inference with Flexible SpMM Optimization Space Exploration

2021. DOI: 10.1109/hpec49654.2021.9622791

Author(s): Jie Xin, Xianqi Ye, Long Zheng, Qinggang Wang, Yu Huang, ...

Keyword(s): Neural Network, Deep Neural Network, Network Inference, Space Exploration


Binocular intelligent following robot based on YOLO-LITE

MATEC Web of Conferences, Vol 336, pp. 03002, 2021. DOI: 10.1051/matecconf/202133603002

Author(s): Yuanyuan Zheng, Jun Ge

Keyword(s): Neural Network, Real Time, Deep Neural Network, Frame Rate, Raspberry Pi, Detection Accuracy, Location Error, Embedded Devices, Robot System, Time Performance

Deep neural network models are large and slow to compute, which severely limits their real-time performance on embedded devices. To address this problem, this work studies an intelligent following robot system based on the YOLO-LITE algorithm running on a Raspberry Pi 3B+. The system mainly consists of camera processing, target detection, and other modules. The intrinsic and extrinsic parameters of the camera are obtained through calibration and used to correct the images from the binocular camera. The target is recognized and located in each image frame, the distance from the camera to the target and the center location error are calculated, and the car is driven accordingly. Experimental results show that the following car has excellent real-time performance: the average detection frame rate reaches 20 FPS, and the average detection accuracy exceeds 80%.
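
The abstract does not state how distance and center location error are computed. For a calibrated and rectified binocular camera, the standard pinhole relation is Z = f · B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the horizontal disparity of the target between the left and right images. The Python sketch below applies that relation and computes one assumed definition of center location error: the pixel offset of the detection box center from the image center. The function names, focal length, baseline, and box coordinates are hypothetical, not values from the paper.

```python
from typing import Tuple


def stereo_depth_m(x_left_px: float, x_right_px: float,
                   focal_px: float, baseline_m: float) -> float:
    """Depth of a point in a rectified stereo pair: Z = f * B / d."""
    disparity = x_left_px - x_right_px  # horizontal shift between left and right images
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity


def center_offset_px(box: Tuple[float, float, float, float],
                     image_size: Tuple[int, int]) -> Tuple[float, float]:
    """Offset of a box (x_min, y_min, x_max, y_max) center from the image center.

    This is an assumed definition of "center location error".
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    return cx - width / 2, cy - height / 2


# Hypothetical calibration: 700 px focal length, 6 cm baseline.
print(stereo_depth_m(x_left_px=420, x_right_px=385, focal_px=700, baseline_m=0.06))
print(center_offset_px((300, 200, 380, 360), (640, 480)))
```

A positive horizontal offset means the target lies to the right of the image center. Because depth is inversely proportional to disparity, a one-pixel disparity error causes a larger depth error for distant targets than for near ones.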


Early Diagnosis of Stroke and Internal Hemorrhage via Deep Neural Network Inference of Microwave Signals

2019 IEEE International Conference on Microwaves, Antennas, Communications and Electronic Systems (COMCAS), 2019. DOI: 10.1109/comcas44984.2019.8958136

Author(s): Ofir Tal, Shye Shapira

Keyword(s): Neural Network, Early Diagnosis, Deep Neural Network, Network Inference, Microwave Signals


EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol 9 (4), pp. 723-734, 2019. DOI: 10.1109/jetcas.2019.2950093

Cited By ~ 3

Author(s): Lukas Cavigelli, Georg Rutishauser, Luca Benini

Keyword(s): Neural Network, Deep Neural Network, Network Inference, Plane Compression, Bit Plane, And Training


EmotionNet Nano: An Efficient Deep Convolutional Neural Network Design for Real-Time Facial Expression Recognition

Frontiers in Artificial Intelligence, Vol 3, 2021. DOI: 10.3389/frai.2020.609673

Author(s): James Ren Lee, Linda Wang, Alexander Wong

Keyword(s): Neural Network, Facial Expression, Convolutional Neural Network, Network Design, Real Time, Facial Expression Recognition, Deep Neural Network, Low Cost, Expression Recognition, Embedded Devices

While recent advances in deep learning have led to significant improvements in facial expression classification (FEC), a major challenge that remains a bottleneck for the widespread deployment of such systems is their high architectural and computational complexity. This is especially challenging given the operational requirements of various FEC applications, such as safety, marketing, learning, and assistive living, where real-time performance on low-cost embedded devices is desired. Motivated by this need for a compact, low-latency, yet accurate system capable of performing FEC in real time on low-cost embedded devices, this study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy, in which human experience is combined with machine meticulousness and speed to craft a deep neural network design tailored to real-time embedded usage. To the best of the authors' knowledge, this is the first deep neural network architecture for facial expression recognition to leverage machine-driven design exploration in its design process, and it exhibits unique architectural characteristics, such as high architectural heterogeneity and selective long-range connectivity, not seen in previous FEC network architectures. Two variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy. Experimental results on the CK+ facial expression benchmark dataset demonstrate that the proposed EmotionNet Nano networks achieved accuracy comparable to state-of-the-art FEC networks while requiring significantly fewer parameters.
Furthermore, we demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g., >25 FPS and >70 FPS at 15 and 30 W, respectively) and high energy efficiency (e.g., >1.7 images/sec/watt at 15 W) on an ARM embedded processor, thus further illustrating the efficacy of EmotionNet Nano for deployment on embedded devices.
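
The energy-efficiency figure is throughput divided by power draw. A minimal Python sketch of that arithmetic, applied to the lower bounds quoted in the abstract (the helper name `images_per_sec_per_watt` is illustrative, not from the paper):

```python
def images_per_sec_per_watt(fps: float, watts: float) -> float:
    """Energy efficiency as inference throughput (images/sec) divided by power (W)."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return fps / watts


# Lower bounds from the abstract: the paper reports only ">" values.
for fps, watts in [(25, 15), (70, 30)]:
    efficiency = images_per_sec_per_watt(fps, watts)
    print(f"{fps} FPS at {watts} W -> {efficiency:.2f} images/sec/watt")
```

At exactly 25 FPS and 15 W the ratio is about 1.67; the reported >1.7 images/sec/watt at 15 W corresponds to more than 1.7 × 15 = 25.5 FPS, which is consistent with the >25 FPS figure.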
