On Self-Distilling Graph Neural Network

Author(s): Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, ...

Recently, the teacher-student knowledge distillation framework has demonstrated its potential in training Graph Neural Networks (GNNs). However, due to the difficulty of training over-parameterized GNN models, one may not easily obtain a satisfactory teacher model for distillation. Furthermore, the inefficient training process of teacher-student knowledge distillation also impedes its application to GNN models. In this paper, we propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD), which serves as a drop-in replacement for the standard training process. The method is built upon the proposed neighborhood discrepancy rate (NDR), which efficiently quantifies the non-smoothness of the embedded graph. Based on this metric, we propose the adaptive discrepancy retaining (ADR) regularizer to empower the transfer of knowledge that maintains high neighborhood discrepancy across GNN layers. We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies. Experiments further demonstrate the effectiveness and generalization of our approach, as it brings: 1) state-of-the-art GNN distillation performance at lower training cost, and 2) consistent and considerable performance gains for various popular backbones.
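The abstract does not give the exact NDR formula; the sketch below illustrates one plausible reading in PyTorch, measuring per-node discrepancy as one minus the cosine similarity between a node's embedding and the mean of its neighbors' embeddings, plus a simple retaining-style penalty across layers. The function names and the dense adjacency input are illustrative assumptions, not the paper's definitions.

```python
# Hypothetical sketch of a neighborhood-discrepancy-style metric (not the
# paper's exact NDR): one minus the cosine similarity between each node's
# embedding and the mean embedding of its neighbors, averaged over the graph.
import torch
import torch.nn.functional as F

def neighborhood_discrepancy(h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """h: [N, d] node embeddings at one GNN layer; adj: [N, N] dense 0/1 adjacency."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)   # avoid division by zero
    neigh_mean = adj @ h / deg                           # mean neighbor embedding
    cos = F.cosine_similarity(h, neigh_mean, dim=1)      # per-node smoothness
    return (1.0 - cos).mean()                            # higher = less smooth

# A retaining-style regularizer could then penalize drops in discrepancy
# between consecutive layers (again, only a hedged illustration):
def retain_loss(h_prev: torch.Tensor, h_curr: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    return F.relu(neighborhood_discrepancy(h_prev, adj)
                  - neighborhood_discrepancy(h_curr, adj))
```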

Author(s): Yulong Pei, Yanyun Qu, Junping Zhang

Knowledge distillation is a simple but effective method for model compression, which obtains a better-performing small network (Student) by learning from a well-trained large network (Teacher). However, when the difference in model size between Student and Teacher is large, the capacity gap leads to poor Student performance. Existing methods focus on seeking simplified or more effective knowledge from Teacher to narrow the Teacher-Student gap, whereas we address this problem through Student's self-boosting. Specifically, we propose a novel distillation method named Self-boosting Feature Distillation (SFD), which eases the Teacher-Student gap through feature integration and self-distillation within Student. Three different modules are designed for feature integration to enhance the discriminability of Student's features, which theoretically improves the order of convergence. Moreover, an easy-to-operate self-distillation strategy is put forward to stabilize the training process and improve Student's performance without additional forward propagation or memory consumption. Extensive experiments on multiple benchmarks and networks show that our method is significantly superior to existing methods.
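The SFD losses are not spelled out in this abstract; the sketch below only illustrates the general idea of in-network feature self-distillation, under the assumption that an integrated (fused) feature, detached from the graph, supervises the plain student feature within the same forward pass. The MSE objective, tensor shapes, and weighting are illustrative assumptions.

```python
# Hedged illustration of in-network feature self-distillation (not the paper's
# exact SFD formulation): the integrated feature is detached and used as a
# target for the plain student feature in the same forward pass, so no extra
# forward propagation is required.
import torch
import torch.nn.functional as F

def self_distill_loss(student_feat: torch.Tensor,
                      integrated_feat: torch.Tensor) -> torch.Tensor:
    """Both tensors: [batch, channels, H, W]; the integration module is external."""
    target = integrated_feat.detach()          # stop gradients through the target
    return F.mse_loss(student_feat, target)    # pull the plain feature toward it

# total_loss = task_loss + lambda_sd * self_distill_loss(f_plain, f_integrated)
```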


Author(s): Himel Das Gupta, Kun Zhang, Victor S. Sheng

Deep neural networks (DNNs) have shown significant improvement in learning and generalizing across different machine learning tasks over the years, but this comes at the expense of heavy computational power and memory requirements. Machine learning applications now even run on portable devices such as mobile phones and embedded systems, which generally have limited computational power and memory and can therefore only run small models. However, smaller networks usually do not perform very well. In this paper, we implement a simple ensemble-learning-based knowledge distillation network to improve the accuracy of such small models. Our experimental results show that the performance of smaller models can be enhanced by distilling knowledge from a combination of small models rather than from a single cumbersome model. Moreover, the ensemble knowledge distillation network is simpler, more time-efficient, and easy to implement.
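The abstract does not specify the distillation objective; a common ensemble-distillation setup, sketched here under the assumption of temperature-scaled soft targets averaged over several small teachers and mixed with the usual cross-entropy loss, might look as follows. The parameter names and weighting are assumptions for illustration.

```python
# Hedged sketch of ensemble knowledge distillation: the temperature-softened
# probabilities of several small teachers are averaged, and the student matches
# them with a KL divergence combined with cross-entropy on the hard labels.
import torch
import torch.nn.functional as F

def ensemble_kd_loss(student_logits, teacher_logits_list, labels,
                     T: float = 4.0, alpha: float = 0.5):
    # Average the teachers' temperature-softened probabilities.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_student, teacher_probs, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```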


2016, Vol. 12 (22), pp. 150
Author(s): Djelle Opely Patrice Aime

This article examines the teacher-learner relationship and its outcomes for student knowledge. The aim is to determine the impact of the teacher-learner relationship on creating a good atmosphere and learning environment in the learning process. The assumption is that a teacher's positive attitude toward learners always favours a good learning atmosphere. Our methodology is based on semi-directive interviews, questionnaires and inquiries on the atmosphere at the start of class, classroom chronicles, and evaluations addressed to teachers and students in some middle schools and high schools in Côte d’Ivoire. The results show that emotions in the teacher-student relationship play various roles in the teaching-learning process, for « they are at the heart of human beings' mental life ».


Author(s): Xiang Deng, Zhongfei Zhang

Knowledge distillation (KD) transfers knowledge from a teacher network to a student by forcing the student to mimic the outputs of the pretrained teacher on training data. However, data samples are not always accessible in many cases due to large data sizes, privacy, or confidentiality. Many efforts have been made to address this problem for convolutional neural networks (CNNs), whose inputs lie on a grid in a continuous space, such as images and videos, but they largely overlook graph neural networks (GNNs), which handle non-grid data with different topology structures in a discrete space. The inherent differences between their inputs make these CNN-based approaches inapplicable to GNNs. In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a GNN without graph data. The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with a multinomial distribution. We then introduce a gradient estimator to optimize this framework. Essentially, the gradients with respect to graph structures are obtained using only GNN forward propagation without back-propagation, which means that GFKD is compatible with modern GNN libraries such as DGL and Geometric. Moreover, we provide strategies for handling different types of prior knowledge in the graph data or the GNNs. Extensive experiments demonstrate that GFKD achieves state-of-the-art performance for distilling knowledge from GNNs without training data.
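The exact GFKD estimator is not given in this abstract; the sketch below only illustrates the general forward-only, score-function idea it hints at, with a per-edge Bernoulli model standing in for the paper's multinomial parameterization. `structure_logits` and `teacher_forward_score` are hypothetical names; the latter is assumed to wrap the teacher GNN's forward pass and return a scalar reward.

```python
# Hedged sketch of learning graph structures with forward passes only: edge
# logits parameterize a distribution over graphs, sampled graphs are scored by
# the teacher GNN's forward pass, and the logits are updated with a
# score-function (REINFORCE-style) estimator. Not the paper's exact method.
import torch

def score_function_step(structure_logits, teacher_forward_score,
                        n_samples=8, lr=0.1):
    """structure_logits: [N, N] plain tensor of edge logits (no autograd needed).
    teacher_forward_score: callable mapping a sampled 0/1 adjacency matrix to a
    scalar reward computed from the teacher GNN's forward pass only."""
    probs = torch.sigmoid(structure_logits)
    grad_est = torch.zeros_like(structure_logits)
    for _ in range(n_samples):
        adj = torch.bernoulli(probs)              # sample a candidate graph
        reward = teacher_forward_score(adj)       # forward pass only, no back-prop
        # grad of log Bernoulli(adj; sigmoid(logits)) w.r.t. logits = adj - probs
        grad_est += reward * (adj - probs)
    return structure_logits + lr * grad_est / n_samples   # gradient-ascent update
```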


2019, Vol. 2019, pp. 1-10
Author(s): Diehao Kong, Xuefeng Yan

Autoencoders are used for fault diagnosis in chemical engineering. To improve their performance, researchers have paid close attention to regularization strategies and the design of new and effective cost functions. However, existing methods modify only a single model. This study provides a new perspective for strengthening the fault diagnosis model: it extracts useful information from one model (the teacher) and applies it to a new model (the student). It pretrains the teacher model by fitting ground-truth labels and then uses a sample-wise strategy to transfer knowledge from the teacher model. Finally, the knowledge and the ground-truth labels are used to train the student model, which is identical to the teacher model in structure. The current student model is then used as the teacher of the next student model. After step-by-step teacher-student reconfiguration and training, the optimal model is selected for fault diagnosis. Knowledge distillation is applied throughout the training procedure. The proposed method is applied to several benchmark problems to demonstrate its effectiveness.
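As a rough illustration of the step-by-step teacher-student reconfiguration described above (not the paper's exact procedure), the loop below rebuilds the same architecture each generation, mixes ground-truth labels with the previous generation's soft outputs, and keeps the best model. `build_model`, `train`, and `evaluate` are hypothetical helpers supplied by the caller, and the mixing weight `alpha` is an assumption.

```python
# Hedged outline of iterative teacher-student self-distillation with an
# identical architecture per generation; helpers are hypothetical callables.
import torch

def iterative_self_distillation(build_model, train, evaluate,
                                data, labels, n_generations=3, alpha=0.5):
    teacher = build_model()
    train(teacher, data, labels)                       # fit ground-truth labels
    best, best_score = teacher, evaluate(teacher, data, labels)
    for _ in range(n_generations):
        student = build_model()                        # same structure as the teacher
        with torch.no_grad():
            soft_targets = teacher(data)               # knowledge from the teacher
        train(student, data, labels,                   # hard labels + soft targets
              soft_targets=soft_targets, alpha=alpha)
        score = evaluate(student, data, labels)
        if score > best_score:                         # keep the best generation
            best, best_score = student, score
        teacher = student                              # student becomes next teacher
    return best
```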


2020, Vol. 19 (S2), pp. 88-95
Author(s): N. Suslov, V. Mishustin, N. Sentiabrev

Aim. The purpose of the article is to identify training conditions that form a stable component of the motor coordination typical of competitive technique during the growth spurt. Materials and methods. Body length and body weight were measured annually in two groups of young males. The first group consisted of non-athletes (n = 18); the second group consisted of young weightlifters (n = 18). The examination covered the following age ranges: from 10 to 13 years and from 14 to 16 years. Body composition was assessed with the Matejko method. Special fitness was measured by barbell velocity during the lift, recorded with a photoelectronic device consisting of a transmitter (an optical quantum generator), a photodetector, and a recording device. Using the correlation between the actual and model values of the minimum fixation speed, the efficiency of motor coordination was assessed in the range of 60–100% of barbell weight. These correlations were also used to establish the limiting character of technique and strength. Results. The analysis of body weight during the annual cycle showed a change in weight categories, reflecting athletes' weight gain while preserving their qualification. An earlier increase in general growth was established in young weightlifters, which confirms the advisability of early specialization. It was found that, when performing a jerk, the minimum fixation speed of the barbell increases with increasing body length. The insufficiency of this indicator compared to qualified athletes is compensated by a greater reliance on the muscle factor, which reduces the reliability of motor actions. A training technique based on pedagogical instruction and feedback on barbell speed is proposed, aimed at a more effective implementation of motor skills and a decrease in barbell speed. This resulted in a statistically significant (t = 2.89; p < 0.05) increase in the number of successful barbell lifts. Conclusion. In weightlifting, it is necessary to stimulate general growth and the growth of muscle mass, with training loads being a leading factor in sports results. The training process combined with pedagogical instruction forms a stable motor skill and a sound transition to the stage of performance enhancement.

