Processor Pipelining Method for Efficient Deep Neural Network Inference on Embedded Devices

Author(s):  
Akshay Parashar ◽  
Arun Abraham ◽  
Deepak Chaudhary ◽  
Vikram Nelvoy Rajendiran
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Nanliang Shan ◽  
Zecong Ye ◽  
Xiaolong Cui

With the development of mobile edge computing (MEC), more and more intelligent services and applications based on deep neural networks are being deployed on mobile devices to meet users' diverse and personalized needs. Unfortunately, deploying deep learning models and running inference on resource-constrained devices is challenging. The traditional cloud-based approach runs the deep learning model on a cloud server. Because a large amount of input data must be transmitted to the server over the WAN, this approach incurs high service latency, which is unacceptable for many of today's latency-sensitive and computation-intensive applications. In this paper, we propose Cogent, an execution framework that accelerates deep neural network inference through device-edge synergy. Cogent operates in two stages: an automatic pruning and partition stage and a containerized deployment stage. Cogent uses reinforcement learning (RL) to automatically predict pruning and partition strategies based on feedback from the hardware configuration and system conditions, so that the pruned and partitioned model better adapts to the system environment and the user's hardware configuration. The resulting model is then deployed in containers on the device and the edge server to accelerate inference. Experiments show that the learning-based, hardware-aware automatic pruning and partition scheme significantly reduces service latency and accelerates the overall inference process while maintaining accuracy. The method achieves a speedup of up to 8.89× with an accuracy loss of no more than 7%.
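The abstract states that Cogent uses RL to choose pruning and partition strategies; the sketch below does not reproduce that method. It only illustrates, under simple assumptions, the latency trade-off that device-edge partitioning exploits: layers before a split point run on the device, the intermediate activation is uploaded, and the remaining layers run on the edge server. The per-layer profiles, the fixed bandwidth, and the names `LayerProfile`, `split_latency_ms`, and `best_split` are all hypothetical, and the search is a plain exhaustive scan over split points rather than a learned policy. Pruning, result download, and containerization are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerProfile:
    device_ms: float  # hypothetical measured latency of this layer on the device
    edge_ms: float    # hypothetical measured latency of this layer on the edge server
    out_bytes: int    # size of this layer's output activation in bytes


def split_latency_ms(layers: List[LayerProfile], k: int,
                     input_bytes: int, bandwidth_bps: float) -> float:
    """Estimated latency when layers[:k] run on the device and layers[k:] on the edge."""
    device = sum(layer.device_ms for layer in layers[:k])
    edge = sum(layer.edge_ms for layer in layers[k:])
    if k == len(layers):
        transfer_bytes = 0                        # fully local: nothing is uploaded
    elif k == 0:
        transfer_bytes = input_bytes              # fully offloaded: raw input is uploaded
    else:
        transfer_bytes = layers[k - 1].out_bytes  # intermediate activation is uploaded
    transfer_ms = transfer_bytes * 8 / bandwidth_bps * 1000
    return device + transfer_ms + edge


def best_split(layers: List[LayerProfile], input_bytes: int,
               bandwidth_bps: float) -> Tuple[int, float]:
    """Evaluate every split point 0..n and return (k, latency_ms) with the lowest latency."""
    candidates = [(k, split_latency_ms(layers, k, input_bytes, bandwidth_bps))
                  for k in range(len(layers) + 1)]
    return min(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    # Made-up profiles: early layers produce large activations, later layers are costly on-device.
    layers = [LayerProfile(10, 1, 400_000),
              LayerProfile(20, 2, 50_000),
              LayerProfile(60, 3, 4_000)]
    print(best_split(layers, input_bytes=600_000, bandwidth_bps=10_000_000))
```

With these illustrative numbers and a 10 Mbit/s link, the best split is after the second layer (about 73 ms), beating both full offloading (486 ms, dominated by uploading the raw input) and fully local execution (90 ms). An adaptive scheme such as the one the abstract describes would re-make this choice as bandwidth and hardware conditions change.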


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Vinay Joshi ◽  
Manuel Le Gallo ◽  
Simon Haefeli ◽  
Irem Boybat ◽  
S. R. Nandakumar ◽  
...  

2021 ◽  
Author(s):  
Jie Xin ◽  
Xianqi Ye ◽  
Long Zheng ◽  
Qinggang Wang ◽  
Yu Huang ◽  
...  

2021 ◽  
Vol 336 ◽  
pp. 03002
Author(s):  
Yuanyuan Zheng ◽  
Jun Ge

To address the problems that deep neural network models are large, their computation takes too long, and their real-time performance is severely limited on embedded devices, this study developed an intelligent follower robot system based on the YOLO-LITE algorithm running on a Raspberry Pi 3B+. The system mainly consists of camera processing, target detection, and other modules. The camera's intrinsic and extrinsic parameters were obtained through calibration and used to correct the binocular camera. The target was recognized and located in each frame, the distance from the camera to the target and the center-location error were calculated, and the car was driven accordingly. Experimental results show that the follower car has excellent real-time performance: the average detection frame rate reaches 20 FPS, and the average detection accuracy exceeds 80%.
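The abstract does not give the formulas used for distance and center-location error. A common choice, assumed here, is the rectified pinhole stereo model, where depth is Z = f·B/d (focal length f in pixels, baseline B, and horizontal disparity d between matching points in the left and right images), with f and B taken from calibration. The center-location error is assumed to be the horizontal offset of the detected bounding-box center from the image center. The function names and the (x_min, y_min, x_max, y_max) box format are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def bbox_center(box: Box) -> Tuple[float, float]:
    """Center (cx, cy) of a bounding box in pixels."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def stereo_distance(x_left: float, x_right: float,
                    focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair, with d the horizontal disparity."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


def center_error(box: Box, image_width: int) -> float:
    """Horizontal offset of the target center from the image center (positive = right)."""
    cx, _ = bbox_center(box)
    return cx - image_width / 2


if __name__ == "__main__":
    left_box, right_box = (300, 200, 380, 360), (270, 200, 350, 360)
    z = stereo_distance(bbox_center(left_box)[0], bbox_center(right_box)[0],
                        focal_px=600.0, baseline_m=0.06)
    print(f"distance ~ {z:.2f} m, center error = {center_error(left_box, 640)} px")
```

With these illustrative values (600 px focal length, 6 cm baseline, 30 px disparity), the target is about 1.2 m away and 20 px to the right of center. A follower controller would then map distance and center error to drive and steering commands.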


2021 ◽  
Vol 3 ◽  
Author(s):  
James Ren Lee ◽  
Linda Wang ◽  
Alexander Wong

While recent advances in deep learning have led to significant improvements in facial expression classification (FEC), a major challenge that remains a bottleneck for the widespread deployment of such systems is their high architectural and computational complexity. This is especially challenging given the operational requirements of various FEC applications, such as safety, marketing, learning, and assistive living, where real-time performance on low-cost embedded devices is required. Motivated by this need for a compact, low-latency, yet accurate system capable of performing FEC in real time on low-cost embedded devices, this study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy, where human experience is combined with machine meticulousness and speed in order to craft a deep neural network design catered toward real-time embedded usage. To the best of the authors’ knowledge, this is the very first deep neural network architecture for facial expression recognition that leverages machine-driven design exploration in its design process, and it exhibits unique architectural characteristics, such as high architectural heterogeneity and selective long-range connectivity, not seen in previous FEC network architectures. Two different variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy. Experimental results using the CK+ facial expression benchmark dataset demonstrate that the proposed EmotionNet Nano networks achieve accuracy comparable to state-of-the-art FEC networks while requiring significantly fewer parameters.
Furthermore, we demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g., >25 FPS and >70 FPS at 15 and 30 W, respectively) and high energy efficiency (e.g., >1.7 images/sec/watt at 15 W) on an ARM embedded processor, thus further illustrating the efficacy of EmotionNet Nano for deployment on embedded devices.
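Assuming the standard definition of energy efficiency as throughput divided by power draw (the abstract does not state it explicitly), the two figures reported at 15 W can be related as follows:

```latex
\eta = \frac{\text{throughput}}{P}, \qquad
\eta > 1.7\ \tfrac{\text{images/s}}{\text{W}} \text{ at } P = 15\ \text{W}
\;\Longrightarrow\;
\text{throughput} > 1.7 \times 15 = 25.5\ \text{images/s}
```

This is consistent with the >25 FPS reported at 15 W.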

