Knowledge Distillation and Gradient Estimation for Active Error Compensation in Approximate Neural Networks

Explaining the predictions of deep neural networks makes them more understandable and trustworthy, which supports their use in various mission-critical tasks. Recent gains in the learning capability of networks have come largely from an enormous number of model parameters, which makes their operation hard to interpret, in contrast to classical white-box models. Generating saliency maps is a popular way to address this: a saliency map identifies the input features that matter most for the model's prediction. Existing explanation methods typically use only the output of the model's last convolutional layer to generate a saliency map and ignore the information contained in intermediate layers. The resulting explanations are therefore coarse and of limited accuracy. Although accuracy can be improved by refining a saliency map iteratively, doing so is too time-consuming to be practical. To address these problems, we propose a novel approach that explains the model prediction by training an attentive surrogate network with knowledge distillation. The surrogate network generates a fine-grained saliency map for the model prediction using meaningful regional information drawn from all network layers. Experiments demonstrate that the saliency maps result from spatially attentive features learned through distillation, which makes them useful for fine-grained classification tasks. Moreover, the proposed method runs at 24.3 frames per second, orders of magnitude faster than existing methods.
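The abstract does not state the distillation objective used to train the surrogate network. For reference only, the sketch below shows the widely used soft-target distillation loss of Hinton et al. (2015) in NumPy. The temperature, the weighting `alpha`, and the function names are illustrative assumptions, and the attentive saliency branch of the surrogate network is not modeled.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Generic soft-target KD loss (assumed, not taken from the paper):

    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student).
    """
    eps = 1e-12
    p_t = softmax(teacher_logits, temperature)  # softened teacher distribution
    p_s = softmax(student_logits, temperature)  # softened student distribution
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1).mean()
    # Ordinary cross-entropy against the hard labels at temperature 1.
    p_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps).mean()
    # T^2 keeps the soft-target gradient scale comparable across temperatures.
    return alpha * temperature**2 * kl + (1 - alpha) * ce
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the hard-label term (weighted by `1 - alpha`) remains.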

Communication Failure Resilient Distributed Neural Network for Edge Devices

Electronics ◽

10.3390/electronics10141614 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1614

Author(s):

Jonghun Jeong ◽

Jong Sung Park ◽

Hoeseok Yang

Keyword(s):

Neural Network ◽

Neural Networks ◽

High Performance ◽

State Of The Art ◽

Wearable Devices ◽

Communication Failure ◽

Canadian Institute ◽

Multiple Devices ◽

Knowledge Distillation ◽

Partitioning Technique

Recently, there has been a growing need to run high-performance neural networks (NNs) even on resource-constrained embedded systems such as wearable devices. However, because of the high computational and memory requirements of NN applications, it is typically infeasible to execute them on a single device. Instead, it has been proposed to run a single NN application cooperatively on multiple devices, a so-called distributed neural network, in which the workload of one large NN application is spread over multiple tiny devices. While this approach can effectively alleviate the computation overhead, existing distributed NN techniques such as MoDNN still suffer from heavy traffic between devices and vulnerability to communication failures. To eliminate this communication overhead, a knowledge-distillation-based distributed NN called Network of Neural Networks (NoNN) was proposed; it partitions the filters in the final convolutional layer of the original NN into multiple independent subsets and derives a smaller NN from each subset. However, NoNN also has limitations: the partitioning result may be unbalanced, and it considerably compromises the correlation between filters in the original NN, which may cause unacceptable accuracy degradation in case of communication failure. To overcome these issues, we enhance the partitioning strategy of NoNN in two ways. First, we increase the redundancy of the filters used to derive the smaller NNs by means of averaging, making the distributed NN more resilient to communication failure. Second, we propose a novel partitioning technique, modified from eigenvector-based partitioning, that preserves the correlation between filters as much as possible while assigning a consistent number of filters to each device.
Through extensive experiments with the CIFAR-100 (Canadian Institute For Advanced Research-100) dataset, we observed that the proposed approach maintains high inference accuracy on average (over 70%, a 1.53× improvement over the state-of-the-art approach), even when half of the eight devices in a distributed NN fail to deliver their partial inference results.
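The abstract does not specify how the eigenvector-based partitioning was modified. A minimal sketch, assuming the filters are grouped by the Fiedler vector of a filter-correlation graph and that the "consistent number of filters" constraint is enforced by cutting the sorted Fiedler ordering into equal-size chunks. The input format, the use of absolute Pearson correlation as the affinity, and the function name `balanced_spectral_partition` are hypothetical, and the averaging-based redundancy step is not shown.

```python
import numpy as np


def balanced_spectral_partition(activations: np.ndarray, num_devices: int) -> list:
    """Hypothetical balanced eigenvector-based partitioning of final-layer filters.

    activations: (num_samples, num_filters) array, e.g. spatially pooled outputs
    of the final convolutional layer (assumed input format).
    Returns num_devices lists of filter indices, all of equal size.
    """
    num_filters = activations.shape[1]
    if num_filters % num_devices != 0:
        raise ValueError("num_filters must be divisible by num_devices")

    # Assumed affinity: absolute Pearson correlation between filter activations
    # (NaNs from constant filters are mapped to zero affinity).
    affinity = np.abs(np.nan_to_num(np.corrcoef(activations, rowvar=False)))
    np.fill_diagonal(affinity, 0.0)

    # Unnormalized graph Laplacian L = D - A.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity

    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]

    # Balance constraint (assumption): sort filters along the Fiedler vector and
    # cut the ordering into equal-size contiguous chunks, one per device.
    order = np.argsort(fiedler, kind="stable")
    size = num_filters // num_devices
    return [sorted(order[i * size:(i + 1) * size].tolist()) for i in range(num_devices)]
```

With two devices this reduces to classic Fiedler bisection with an equal-size cut; with more devices, slicing a one-dimensional ordering into contiguous chunks is only a heuristic and may differ from the authors' method.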

Time-varying Position Error Compensation of Machine Tools Based on Dynamic Fuzzy Neural Networks

Journal of Mechanical Engineering ◽

10.3901/jme.2011.13.175 ◽

2011 ◽

Vol 47 (13) ◽

pp. 175 ◽

Cited By ~ 4

Author(s):

Fuji WANG

Keyword(s):

Neural Networks ◽

Error Compensation ◽

Machine Tools ◽

Fuzzy Neural Networks ◽

Position Error ◽

Time Varying ◽

Fuzzy Neural

Research on Thermal Error Compensation Technology of Grinding Machine Based on Neural Networks

Advances in Grinding and Abrasive Technology XIV - Key Engineering Materials ◽

10.4028/0-87849-459-6.569 ◽

2007 ◽

pp. 569-573

Author(s):

Qian Jian Guo ◽

Jian Guo Yang ◽

Xiao Ni Qi

Keyword(s):

Neural Networks ◽

Error Compensation ◽

Thermal Error ◽

Grinding Machine ◽

Thermal Error Compensation

Improving the Interpretability of Deep Neural Networks with Knowledge Distillation

2018 IEEE International Conference on Data Mining Workshops (ICDMW) ◽

10.1109/icdmw.2018.00132 ◽

2018 ◽

Cited By ~ 1

Author(s):

Xuan Liu ◽

Xiaoguang Wang ◽

Stan Matwin

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Knowledge Distillation

Stochastic Gradient Estimation for Artificial Neural Networks

SSRN Electronic Journal ◽

10.2139/ssrn.3318847 ◽

2019 ◽

Author(s):

Yijie Peng ◽

Li Xiao ◽

Bernd Heidergott ◽

L. Jeff Hong ◽

Henry Lam

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Stochastic Gradient ◽

Gradient Estimation ◽

Artificial Neural

Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation

2020 International Conference on Machine Vision and Image Processing (MVIP) ◽

10.1109/mvip49855.2020.9116923 ◽

2020 ◽

Author(s):

Sajjad Abbasi ◽

Mohsen Hajabdollahi ◽

Nader Karimi ◽

Shadrokh Samavi

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Teacher Student ◽

Knowledge Distillation

Knowledge Distillation for Optimization of Quantized Deep Neural Networks

2020 IEEE Workshop on Signal Processing Systems (SiPS) ◽

10.1109/sips50750.2020.9195219 ◽

2020 ◽

Author(s):

Sungho Shin ◽

Yoonho Boo ◽

Wonyong Sung

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Knowledge Distillation

Deep Collaborative Learning for Randomly Wired Neural Networks

Electronics ◽

10.3390/electronics10141669 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1669

Author(s):

Ehab Essa ◽

Xianghua Xie

Keyword(s):

Neural Networks ◽

Collaborative Learning ◽

Transfer Learning ◽

Graph Algorithm ◽

Ensemble Model ◽

Effective Learning ◽

Teacher Networks ◽

Knowledge Distillation ◽

Small Model ◽

A Chain

A deep collaborative learning approach is introduced in which a chain of randomly wired neural networks is trained simultaneously to improve overall generalization and form a strong ensemble model. The proposed method takes advantage of function-preserving transfer learning and knowledge distillation to produce an ensemble model. Knowledge distillation is an effective learning scheme for improving the performance of small neural networks by using the knowledge learned by teacher networks. Most previous methods learn from one or more teachers, but not in a collaborative way. In this paper, we created a chain of randomly wired neural networks based on a random graph algorithm and trained the models collaboratively using function-preserving transfer learning, so that the small network in the chain could learn from the largest one simultaneously. The training method applies knowledge distillation between randomly wired models, where each model is treated as a teacher for the next model in the chain. The decisions of multiple chains of models can be combined to produce a robust ensemble model. The proposed method is evaluated on CIFAR-10, CIFAR-100, and TinyImageNet. The experimental results show that collaborative training significantly improved the generalization of each model, yielding a small model that can mimic the performance of a large model and a more robust ensemble.
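The chain-wise teacher/student relation and the chain ensemble described in this abstract can be sketched in NumPy as follows. This is a minimal sketch, assuming a temperature-scaled KL loss between consecutive models and plain probability averaging to combine chains; neither the loss nor the combination rule is given in the abstract, and the function names are hypothetical. Random wiring, function-preserving transfer, and the training loop itself are omitted.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Batch-averaged KL(p || q) between rows of two probability arrays."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean())


def chain_distillation_losses(chain_logits, temperature=3.0):
    """Hypothetical chain KD: for models [m_0, ..., m_{K-1}], m_k teaches m_{k+1}.

    Returns one soft-target KL term per consecutive (teacher, student) pair.
    In a real framework, gradients would normally be stopped through the teacher.
    """
    losses = []
    for teacher, student in zip(chain_logits[:-1], chain_logits[1:]):
        p_t = softmax(teacher, temperature)
        p_s = softmax(student, temperature)
        losses.append(temperature**2 * kl_divergence(p_t, p_s))
    return losses


def ensemble_predict(chains_logits):
    """Assumed combination rule: average class probabilities of every model in
    every chain, then take the argmax per sample."""
    probs = [softmax(logits) for chain in chains_logits for logits in chain]
    return np.mean(probs, axis=0).argmax(axis=-1)
```

Averaging probabilities is only one plausible way to combine the chains' decisions; majority voting or averaging only the final model of each chain would fit the abstract's description equally well.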

A Review of Knowledge Distillation in Deep Neural Networks

Computer Science and Application ◽

10.12677/csa.2020.109171 ◽

2020 ◽

Vol 10 (09) ◽

pp. 1625-1630

Author(s):

宇韩

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Knowledge Distillation
