On Self-Distilling Graph Neural Network

Author(s): Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, ...

Recently, the teacher-student knowledge distillation framework has demonstrated its potential in training Graph Neural Networks (GNNs). However, due to the difficulty of training over-parameterized GNN models, one may not easily obtain a satisfactory teacher model for distillation. Furthermore, the inefficient training process of teacher-student knowledge distillation also impedes its application to GNN models. In this paper, we propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD), which serves as a drop-in replacement for the standard training process. The method is built upon the proposed neighborhood discrepancy rate (NDR), which efficiently quantifies the non-smoothness of the embedded graph. Based on this metric, we propose the adaptive discrepancy retaining (ADR) regularizer to empower the transfer of knowledge that maintains high neighborhood discrepancy across GNN layers. We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies. Experiments further demonstrate the effectiveness and generalization of our approach, as it brings: 1) state-of-the-art GNN distillation performance at lower training cost, and 2) consistent and considerable performance gains for various popular backbones.
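The abstract does not give the exact NDR formula; the sketch below illustrates one plausible reading in PyTorch, measuring per-node discrepancy as one minus the cosine similarity between a node's embedding and the mean of its neighbors' embeddings, plus a simple retaining-style penalty across layers. The function names and the dense adjacency input are illustrative assumptions, not the paper's definitions.

```python
# Hypothetical sketch of a neighborhood-discrepancy-style metric (not the
# paper's exact NDR): one minus the cosine similarity between each node's
# embedding and the mean embedding of its neighbors, averaged over the graph.
import torch
import torch.nn.functional as F

def neighborhood_discrepancy(h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """h: [N, d] node embeddings at one GNN layer; adj: [N, N] dense 0/1 adjacency."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)   # avoid division by zero
    neigh_mean = adj @ h / deg                           # mean neighbor embedding
    cos = F.cosine_similarity(h, neigh_mean, dim=1)      # per-node smoothness
    return (1.0 - cos).mean()                            # higher = less smooth

# A retaining-style regularizer could then penalize drops in discrepancy
# between consecutive layers (again, only a hedged illustration):
def retain_loss(h_prev: torch.Tensor, h_curr: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    return F.relu(neighborhood_discrepancy(h_prev, adj)
                  - neighborhood_discrepancy(h_curr, adj))
```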

Author(s): Yulong Pei, Yanyun Qu, Junping Zhang

Knowledge distillation is a simple but effective method for model compression, which obtains a better-performing small network (Student) by learning from a well-trained large network (Teacher). However, when the difference in model size between Student and Teacher is large, the capacity gap leads to poor Student performance. Existing methods focus on seeking simplified or more effective knowledge from Teacher to narrow the Teacher-Student gap, whereas we address this problem through Student's self-boosting. Specifically, we propose a novel distillation method named Self-boosting Feature Distillation (SFD), which eases the Teacher-Student gap through feature integration and self-distillation within Student. Three different modules are designed for feature integration to enhance the discriminability of Student's features, which theoretically improves the order of convergence. Moreover, an easy-to-operate self-distillation strategy is put forward to stabilize the training process and improve Student's performance without additional forward propagation or memory consumption. Extensive experiments on multiple benchmarks and networks show that our method is significantly superior to existing methods.
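The SFD losses are not spelled out in this abstract; the sketch below only illustrates the general idea of in-network feature self-distillation, under the assumption that an integrated (fused) feature, detached from the graph, supervises the plain student feature within the same forward pass. The MSE objective, tensor shapes, and weighting are illustrative assumptions.

```python
# Hedged illustration of in-network feature self-distillation (not the paper's
# exact SFD formulation): the integrated feature is detached and used as a
# target for the plain student feature in the same forward pass, so no extra
# forward propagation is required.
import torch
import torch.nn.functional as F

def self_distill_loss(student_feat: torch.Tensor,
                      integrated_feat: torch.Tensor) -> torch.Tensor:
    """Both tensors: [batch, channels, H, W]; the integration module is external."""
    target = integrated_feat.detach()          # stop gradients through the target
    return F.mse_loss(student_feat, target)    # pull the plain feature toward it

# total_loss = task_loss + lambda_sd * self_distill_loss(f_plain, f_integrated)
```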


Author(s): Himel Das Gupta, Kun Zhang, Victor S. Sheng

Deep neural networks (DNNs) have shown significant improvement in learning and generalizing across different machine learning tasks over the years, but this comes at the expense of heavy computational power and memory requirements. Machine learning applications now even run on portable devices such as mobile phones and embedded systems, which generally have limited computational power and memory and can therefore only run small models. However, smaller networks usually do not perform very well. In this paper, we implement a simple ensemble-learning-based knowledge distillation network to improve the accuracy of such small models. Our experimental results show that the performance of smaller models can be enhanced by distilling knowledge from a combination of small models rather than from a single cumbersome model. Moreover, the ensemble knowledge distillation network is simpler, more time-efficient, and easy to implement.
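The abstract does not specify the distillation objective; a common ensemble-distillation setup, sketched here under the assumption of temperature-scaled soft targets averaged over several small teachers and mixed with the usual cross-entropy loss, might look as follows. The parameter names and weighting are assumptions for illustration.

```python
# Hedged sketch of ensemble knowledge distillation: the temperature-softened
# probabilities of several small teachers are averaged, and the student matches
# them with a KL divergence combined with cross-entropy on the hard labels.
import torch
import torch.nn.functional as F

def ensemble_kd_loss(student_logits, teacher_logits_list, labels,
                     T: float = 4.0, alpha: float = 0.5):
    # Average the teachers' temperature-softened probabilities.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_student, teacher_probs, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```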


2016, Vol. 12 (22), pp. 150
Author(s): Djelle Opely Patrice Aime

This article examines the teacher-learner relationship and its outcomes for student knowledge. The aim is to determine the impact of the teacher-learner relationship on creating a good atmosphere and learning environment in the learning process. The assumption is that a teacher's positive attitude toward learners always favours a good learning atmosphere. Our methodology is based on semi-directive interviews, questionnaires and inquiries on the atmosphere at the start of class, classroom chronicles, and evaluations addressed to teachers and students in some middle schools and high schools in Côte d’Ivoire. The results show that emotions in the teacher-student relationship play various roles in the teaching-learning process, for « they are at the heart of human beings' mental life ».


Author(s): Xiang Deng, Zhongfei Zhang

Knowledge distillation (KD) transfers knowledge from a teacher network to a student by forcing the student to mimic the outputs of the pretrained teacher on training data. However, data samples are not always accessible in many cases due to large data sizes, privacy, or confidentiality. Many efforts have been made to address this problem for convolutional neural networks (CNNs), whose inputs lie on a grid in a continuous space, such as images and videos, but they largely overlook graph neural networks (GNNs), which handle non-grid data with different topology structures in a discrete space. The inherent differences between their inputs make these CNN-based approaches inapplicable to GNNs. In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a GNN without graph data. The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with a multinomial distribution. We then introduce a gradient estimator to optimize this framework. Essentially, the gradients with respect to graph structures are obtained using only GNN forward propagation without back-propagation, which means that GFKD is compatible with modern GNN libraries such as DGL and Geometric. Moreover, we provide strategies for handling different types of prior knowledge in the graph data or the GNNs. Extensive experiments demonstrate that GFKD achieves state-of-the-art performance for distilling knowledge from GNNs without training data.
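The exact GFKD estimator is not given in this abstract; the sketch below only illustrates the general forward-only, score-function idea it hints at, with a per-edge Bernoulli model standing in for the paper's multinomial parameterization. `structure_logits` and `teacher_forward_score` are hypothetical names; the latter is assumed to wrap the teacher GNN's forward pass and return a scalar reward.

```python
# Hedged sketch of learning graph structures with forward passes only: edge
# logits parameterize a distribution over graphs, sampled graphs are scored by
# the teacher GNN's forward pass, and the logits are updated with a
# score-function (REINFORCE-style) estimator. Not the paper's exact method.
import torch

def score_function_step(structure_logits, teacher_forward_score,
                        n_samples=8, lr=0.1):
    """structure_logits: [N, N] plain tensor of edge logits (no autograd needed).
    teacher_forward_score: callable mapping a sampled 0/1 adjacency matrix to a
    scalar reward computed from the teacher GNN's forward pass only."""
    probs = torch.sigmoid(structure_logits)
    grad_est = torch.zeros_like(structure_logits)
    for _ in range(n_samples):
        adj = torch.bernoulli(probs)              # sample a candidate graph
        reward = teacher_forward_score(adj)       # forward pass only, no back-prop
        # grad of log Bernoulli(adj; sigmoid(logits)) w.r.t. logits = adj - probs
        grad_est += reward * (adj - probs)
    return structure_logits + lr * grad_est / n_samples   # gradient-ascent update
```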


2019, Vol. 2019, pp. 1-10
Author(s): Diehao Kong, Xuefeng Yan

Autoencoders are used for fault diagnosis in chemical engineering. To improve their performance, researchers have paid close attention to regularization strategies and the design of new and effective cost functions. However, existing methods modify only a single model. This study provides a new perspective for strengthening the fault diagnosis model: it extracts useful information from one model (the teacher) and applies it to a new model (the student). It pretrains the teacher model by fitting ground-truth labels and then uses a sample-wise strategy to transfer knowledge from the teacher model. Finally, the knowledge and the ground-truth labels are used to train the student model, which is identical to the teacher model in structure. The current student model is then used as the teacher of the next student model. After step-by-step teacher-student reconfiguration and training, the optimal model is selected for fault diagnosis. Knowledge distillation is applied throughout the training procedure. The proposed method is applied to several benchmark problems to demonstrate its effectiveness.
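As a rough illustration of the step-by-step teacher-student reconfiguration described above (not the paper's exact procedure), the loop below rebuilds the same architecture each generation, mixes ground-truth labels with the previous generation's soft outputs, and keeps the best model. `build_model`, `train`, and `evaluate` are hypothetical helpers supplied by the caller, and the mixing weight `alpha` is an assumption.

```python
# Hedged outline of iterative teacher-student self-distillation with an
# identical architecture per generation; helpers are hypothetical callables.
import torch

def iterative_self_distillation(build_model, train, evaluate,
                                data, labels, n_generations=3, alpha=0.5):
    teacher = build_model()
    train(teacher, data, labels)                       # fit ground-truth labels
    best, best_score = teacher, evaluate(teacher, data, labels)
    for _ in range(n_generations):
        student = build_model()                        # same structure as the teacher
        with torch.no_grad():
            soft_targets = teacher(data)               # knowledge from the teacher
        train(student, data, labels,                   # hard labels + soft targets
              soft_targets=soft_targets, alpha=alpha)
        score = evaluate(student, data, labels)
        if score > best_score:                         # keep the best generation
            best, best_score = student, score
        teacher = student                              # student becomes next teacher
    return best
```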


2020, Vol. 19 (S2), pp. 88-95
Author(s): N. Suslov, V. Mishustin, N. Sentiabrev

Aim. The purpose of the article is to identify training conditions that form a stable component of the motor coordination typical of competitive technique during the growth spurt. Materials and methods. Body length and body weight were measured annually in two groups of young males. The first group consisted of non-athletes (n = 18); the second group consisted of young weightlifters (n = 18). The examination covered the following age ranges: from 10 to 13 years and from 14 to 16 years. Body composition was assessed with the Matejko method. Special fitness was measured by barbell velocity during the lift, recorded with a photoelectronic device consisting of a transmitter (an optical quantum generator), a photodetector, and a recording device. Using the correlation between the actual and model values of the minimum fixation speed, the efficiency of motor coordination was assessed in the range of 60–100% of barbell weight. These correlations were also used to establish the limiting character of technique and strength. Results. The analysis of body weight during the annual cycle showed a change in weight categories, reflecting athletes' weight gain while preserving their qualification. An earlier increase in general growth was established in young weightlifters, which confirms the advisability of early specialization. It was found that, when performing a jerk, the minimum fixation speed of the barbell increases with increasing body length. The insufficiency of this indicator compared to qualified athletes is compensated by a greater reliance on the muscle factor, which reduces the reliability of motor actions. A training technique based on pedagogical instruction and feedback on barbell speed is proposed, aimed at a more effective implementation of motor skills and a decrease in barbell speed. This resulted in a statistically significant (t = 2.89; p < 0.05) increase in the number of successful barbell lifts. Conclusion. In weightlifting, it is necessary to stimulate general growth and the growth of muscle mass, with training loads being a leading factor in sports results. The training process combined with pedagogical instruction forms a stable motor skill and a sound transition to the stage of performance enhancement.

