Self-boosting for Feature Distillation

Author(s):  
Yulong Pei ◽  
Yanyun Qu ◽  
Junping Zhang

Knowledge distillation is a simple but effective method for model compression, which obtains a better-performing small network (Student) by learning from a well-trained large network (Teacher). However, when the difference in model size between Student and Teacher is large, the capacity gap leads to poor Student performance. Existing methods focus on seeking simplified or more effective knowledge from Teacher to narrow the Teacher-Student gap, whereas we address this problem through Student's self-boosting. Specifically, we propose a novel distillation method named Self-boosting Feature Distillation (SFD), which narrows the Teacher-Student gap through feature integration and self-distillation of Student. Three different modules are designed for feature integration to enhance the discriminability of Student's features, which improves the theoretical order of convergence. Moreover, an easy-to-operate self-distillation strategy is put forward to stabilize the training process and improve Student's performance without additional forward propagation or memory consumption. Extensive experiments on multiple benchmarks and networks show that our method is significantly superior to existing methods.
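
To make the two ingredients concrete, below is a minimal PyTorch sketch of a feature-distillation term and a generic self-distillation term. The 1x1 adapter, the MSE/KL loss forms, and the use of a frozen (e.g. EMA) copy of the student as the self-distillation target are illustrative assumptions; the abstract does not specify SFD's exact formulation.

```python
# Minimal sketch of feature distillation plus a student-side self-distillation
# term. The adapter, loss forms, and the EMA-copy target are assumptions,
# not the exact SFD design described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv projects student features into the teacher's channel space
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student, f_teacher):
        f_s = self.adapter(f_student)
        # Match spatial size before comparing feature maps
        if f_s.shape[-2:] != f_teacher.shape[-2:]:
            f_s = F.interpolate(f_s, size=f_teacher.shape[-2:], mode="bilinear",
                                align_corners=False)
        return F.mse_loss(f_s, f_teacher.detach())

def self_distillation_loss(student_logits, frozen_logits, T=4.0):
    # KL divergence between the current student and a frozen (e.g. EMA) copy,
    # used purely as an illustration of a self-distillation target.
    p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(frozen_logits.detach() / T, dim=1)
    return F.kl_div(p, q, reduction="batchmean") * (T * T)
```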

2021 ◽  
Vol 2083 (4) ◽  
pp. 042028
Author(s):  
Zhihao Liang

Abstract As a common model compression method, knowledge distillation transfers knowledge from a complex large model with strong learning ability to a small student model with weaker learning ability during training, improving the accuracy and performance of the small model. Many knowledge distillation methods have been designed specifically for object detection and have achieved good results. However, almost all of them fail to address the performance degradation caused by the high noise in current detection frameworks. In this study, we propose an EMD-based automatic feature-weight learning method to address these two problems (negative transfer and noise). The EMD method is used to process the spatial feature vectors to reduce the impact of negative transfer and noise as much as possible, and the weights are allocated adaptively so that the student model learns less from poorly performing teacher features and is more inclined to learn from good ones. The loss (EMD Loss) is redesigned, and the detection head is adapted to fit our approach. We carried out comprehensive performance tests on multiple datasets, including PASCAL, KITTI, ILSVRC, and MS-COCO, and obtained encouraging results: the method applies to both one-stage and two-stage detectors and can also be combined with other methods.
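
As a rough illustration of the idea, the sketch below uses SciPy's 1-D Wasserstein distance as the EMD between flattened teacher and student feature maps and turns the negated distances into adaptive weights via a softmax. The per-feature-map granularity, the temperature, and the MSE feature loss are assumptions for illustration, not the paper's exact EMD Loss.

```python
# Sketch of EMD-guided adaptive weighting over several teacher feature maps.
# The 1-D Wasserstein distance on flattened activations and the softmax-style
# weighting are illustrative choices; the paper's formulation may differ.
import numpy as np
import torch.nn.functional as F
from scipy.stats import wasserstein_distance

def emd_weights(student_feat, teacher_feats, temperature=1.0):
    """One weight per teacher feature map: closer (lower EMD) features get more weight."""
    s = student_feat.detach().flatten().cpu().numpy()
    dists = np.array([
        wasserstein_distance(s, t.detach().flatten().cpu().numpy())
        for t in teacher_feats
    ])
    w = np.exp(-dists / temperature)
    return (w / w.sum()).tolist()

def weighted_feature_loss(student_feat, teacher_feats):
    # Assumes teacher and student feature maps share a shape (a sketch-level
    # simplification); poorly matching teacher features get down-weighted.
    w = emd_weights(student_feat, teacher_feats)
    return sum(w_i * F.mse_loss(student_feat, t.detach())
               for w_i, t in zip(w, teacher_feats))
```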


Author(s):  
Yuzhao Chen ◽  
Yatao Bian ◽  
Xi Xiao ◽  
Yu Rong ◽  
Tingyang Xu ◽  
...  

Recently, the teacher-student knowledge distillation framework has demonstrated its potential in training Graph Neural Networks (GNNs). However, due to the difficulty of training over-parameterized GNN models, one may not easily obtain a satisfactory teacher model for distillation. Furthermore, the inefficient training process of teacher-student knowledge distillation also impedes its application to GNN models. In this paper, we propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD), which serves as a drop-in replacement for the standard training process. The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way. Based on this metric, we propose the adaptive discrepancy retaining (ADR) regularizer to empower the transferability of knowledge that maintains high neighborhood discrepancy across GNN layers. We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies. Experiments further demonstrate the effectiveness and generalization of our approach, as it brings 1) state-of-the-art GNN distillation performance with lower training cost and 2) consistent and considerable performance gains for various popular backbones.
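
A minimal sketch of the kind of quantity NDR captures is given below: for each node, the discrepancy between its embedding and the mean of its neighbours' embeddings, with a simple regularizer that asks deeper layers to retain the discrepancy of shallower ones. The cosine-based definition and the MSE retaining loss are assumptions; the paper's exact NDR and ADR formulations may differ.

```python
# Sketch of a neighborhood discrepancy measure for GNN self-distillation.
# Discrepancy here is one minus the cosine similarity between a node's
# embedding and the mean of its neighbours' embeddings.
import torch
import torch.nn.functional as F

def neighborhood_discrepancy(h, edge_index):
    """h: [N, d] node embeddings; edge_index: [2, E] (source, target) pairs."""
    src, dst = edge_index
    n, _ = h.shape
    # Mean-aggregate neighbour embeddings for every target node
    agg = torch.zeros_like(h).index_add_(0, dst, h[src])
    deg = torch.zeros(n, device=h.device).index_add_(
        0, dst, torch.ones(src.size(0), device=h.device)).clamp(min=1)
    neigh_mean = agg / deg.unsqueeze(1)
    return 1.0 - F.cosine_similarity(h, neigh_mean, dim=1)  # [N]

def adr_regularizer(ndr_shallow, ndr_deep):
    # Encourage a deeper layer to retain the neighborhood discrepancy
    # observed at a shallower layer (an illustrative reading of ADR).
    return F.mse_loss(ndr_deep, ndr_shallow.detach())
```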


Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 264
Author(s):  
Jinghan Wang ◽  
Guangyue Li ◽  
Wenzhao Zhang

The powerful performance of deep learning is evident to all. As research deepens, neural networks have become more complex and cannot easily be deployed on resource-constrained devices. The emergence of a series of model compression algorithms makes artificial intelligence on the edge possible. Among them, structured model pruning is widely used because of its versatility: it prunes the neural network itself, discarding relatively unimportant structures to compress the model's size. However, previous pruning work leaves open problems such as network evaluation errors, empirical determination of the pruning rate, and low retraining efficiency. We therefore propose an accurate, objective, and efficient pruning algorithm, Combine-Net, which introduces Adaptive BN to eliminate evaluation errors, the Kneedle algorithm to determine the pruning rate objectively, and knowledge distillation to improve retraining efficiency. Results show that, without precision loss, Combine-Net achieves 95% parameter compression and 83% computation compression for VGG16 on CIFAR10, and 71% parameter compression and 41% computation compression for ResNet50 on CIFAR100. Experiments on different datasets and models show that Combine-Net can efficiently compress a neural network's parameters and computation.
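
One of the three ingredients, Adaptive BN, is easy to sketch: after a candidate sub-network is pruned, the batch-norm running statistics are reset and re-estimated on a small number of training batches before the sub-network is evaluated. The snippet below is a generic PyTorch sketch under that reading; the number of batches and other details are assumptions rather than the authors' exact procedure.

```python
# Sketch of Adaptive-BN style recalibration for evaluating pruned sub-networks:
# reset running mean/var and re-estimate them on a few training batches so that
# stale statistics from the unpruned model do not distort the evaluation.
import torch
import torch.nn as nn

@torch.no_grad()
def recalibrate_bn(model, loader, num_batches=50):
    device = next(model.parameters()).device
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # cumulative moving average over the calibration batches
    model.train()  # BN updates its statistics only in train mode
    for i, (images, _) in enumerate(loader):
        if i >= num_batches:
            break
        model(images.to(device))
    model.eval()
    return model
```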


2021 ◽  
Author(s):  
Zhiwei Hao ◽  
Yong Luo ◽  
Han Hu ◽  
Jianping An ◽  
Yonggang Wen

2021 ◽  
Vol 43 (13) ◽  
pp. 2888-2898
Author(s):  
Tianze Gao ◽  
Yunfeng Gao ◽  
Yu Li ◽  
Peiyuan Qin

An essential element of intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the ever-increasing advance of artificial neural networks (ANN), researchers have proposed numerous ANN-based visual object detection methods that have proven to be effective. However, networks with cumbersome structures do not suit the real-time scenarios in M&RS, necessitating model compression techniques. In this paper, a novel approach to training light-weight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification and are not directly compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based visual detection method. Two strategies, positive sample retaining and early distribution softening, are employed to yield a natural adaptation. The mutual consistency between the teacher model and the student model is further promoted through hint-based distillation. Extensive controlled experiments show that the proposed method effectively enhances the light-weight network's performance by a large margin.
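
The hint-based component and the softening idea can be sketched roughly as follows: an adapter regresses the student's intermediate features onto the teacher's (a FitNets-style hint loss), and the keypoint heatmaps are softened with a temperature before being matched. The temperature-sigmoid softening is an illustrative interpretation of "early distribution softening", not the paper's exact formulation.

```python
# Sketch of hint-based feature distillation between a teacher and a light-weight
# student detector, with a softened heatmap matching term for a keypoint-based
# (CenterNet-like) head. All specifics are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    def __init__(self, s_channels, t_channels):
        super().__init__()
        # Adapter regresses student features onto the teacher's feature space
        self.regressor = nn.Conv2d(s_channels, t_channels, kernel_size=1)

    def forward(self, f_student, f_teacher):
        return F.mse_loss(self.regressor(f_student), f_teacher.detach())

def soft_heatmap_loss(student_heatmap, teacher_heatmap, T=2.0):
    # Soften both centre-point heatmaps before matching so the student also
    # learns the teacher's distribution around positive locations
    p = torch.sigmoid(student_heatmap / T)
    q = torch.sigmoid(teacher_heatmap.detach() / T)
    return F.binary_cross_entropy(p, q)
```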


Author(s):  
Štefan Balkó ◽  
Zbigniev Borysiuk ◽  
Jaromír Šimonek

DOI: http://dx.doi.org/10.5007/1980-0037.2016v18n4p391
In many sport disciplines, reaction time plays a key role in performance; ball games and combat sports (fencing, karate, etc.) are obvious examples. This research focuses on detecting differences in simple and choice reaction time to visual stimulation among elite fencers, sub-elite fencers, and beginners. A Fitrosword device and the SWORD software were used for the measurements. An additional stimulus was added during measurement to increase the overall number of stimuli without forcing the fencer into any reaction. The results of the presented study can be compared with Hick's law. The study also aimed to identify differences in reaction time between two movement tasks with different movement-complexity requirements. The research was built on the hypothesis that the results would differ among performance groups of fencers. The difference, however, was evident between beginners and elite fencers (p = 0.0088, d = 0.5) in reaction time across the different movement tasks (direct hit vs. lunge). The results of this research could be useful to trainers for organising the training process and increasing the effectiveness of muscle coordination during several movements in fencing.


2021 ◽  
Author(s):  
Yingruo Fan ◽  
Jacqueline CK Lam ◽  
Victor On Kwok Li

Facial emotions are expressed through a combination of facial muscle movements, namely, the Facial Action Units (FAUs). FAU intensity estimation aims to estimate the intensity of a set of structurally dependent FAUs. Contrary to existing works that focus on improving FAU intensity estimation, this study investigates how knowledge distillation (KD) incorporated into a training model can improve FAU intensity estimation efficiency while achieving the same level of performance. Given the intrinsic structural characteristics of FAUs, it is desirable to distill deep structural relationships, namely, DSR-FAU, using heatmap regression. Our methodology is as follows: First, a feature map-level distillation loss was applied to ensure that the student network and the teacher network share similar feature distributions. Second, region-wise and channel-wise relationship distillation loss functions were introduced to penalize the difference in structural relationships. Specifically, the region-wise relationship can be represented by the structural correlations across the facial features, whereas the channel-wise relationship is represented by the implicit FAU co-occurrence dependencies. Third, we compared the performance of DSR-FAU with state-of-the-art models on two benchmarking datasets. Our proposed model achieves performance comparable to other baseline models while requiring fewer model parameters and lower computational complexity.
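
A rough PyTorch sketch of what region-wise and channel-wise relationship distillation can look like is given below: cosine-affinity matrices over channels and over spatial positions are computed for teacher and student feature maps and matched with an MSE loss. The normalisation, the equal weighting of the two terms, and the assumption that the two feature maps share a shape are illustrative choices, not the paper's exact losses.

```python
# Sketch of relationship-level distillation: channel-wise and region-wise
# affinity matrices of the feature maps are matched between teacher and student.
import torch
import torch.nn.functional as F

def channel_relation(feat):
    # feat: [B, C, H, W] -> [B, C, C] channel-affinity (cosine) matrix
    b, c, h, w = feat.shape
    x = F.normalize(feat.view(b, c, h * w), dim=2)
    return torch.bmm(x, x.transpose(1, 2))

def region_relation(feat):
    # feat: [B, C, H, W] -> [B, HW, HW] spatial-affinity matrix
    b, c, h, w = feat.shape
    x = F.normalize(feat.view(b, c, h * w), dim=1)
    return torch.bmm(x.transpose(1, 2), x)

def relation_distillation_loss(f_student, f_teacher):
    # Assumes the two feature maps have the same shape (sketch-level simplification)
    loss_c = F.mse_loss(channel_relation(f_student), channel_relation(f_teacher).detach())
    loss_r = F.mse_loss(region_relation(f_student), region_relation(f_teacher).detach())
    return loss_c + loss_r
```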


2020 ◽  
Author(s):  
Andrey De Aguiar Salvi ◽  
Rodrigo Coelho Barros

Recent research on Convolutional Neural Networks focuses on creating models with fewer parameters and a smaller storage footprint while preserving the model's ability to perform its task, allowing the best CNNs to automate tasks on limited devices with constraints on processing power, memory, or energy consumption. There are many different approaches in the literature: removing parameters, reducing floating-point precision, creating smaller models that mimic larger models, neural architecture search (NAS), etc. With all those possibilities, it is challenging to say which approach provides the better trade-off between model reduction and performance, given the differences between the approaches, their respective models, the benchmark datasets, and variations in training details. This article therefore contributes to the literature by comparing three state-of-the-art model compression approaches for reducing a well-known convolutional object detector, YOLOv3. Our experimental analysis shows that, by pruning parameters, it is possible to create a reduced version of YOLOv3 with 90% fewer parameters that still outperforms the original model. We also create models that require only 0.43% of the original model's inference effort.
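
For readers unfamiliar with parameter pruning, the snippet below is a generic sketch of global magnitude-based pruning of convolutional weights using torch.nn.utils.prune; the 90% sparsity target and the specific pruning utility are illustrative assumptions, not the exact procedure the authors used.

```python
# Sketch of global magnitude-based weight pruning for a convolutional detector:
# the globally smallest-magnitude conv weights are zeroed out, then the masks
# are folded back into the weights.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_weights(model, amount=0.9):
    # Collect every convolution and prune the globally smallest-magnitude weights
    params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Conv2d)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    # Make the pruning permanent so the masks are folded into the weights
    for module, name in params:
        prune.remove(module, name)
    return model
```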

