Experimental Study of Some Properties of Knowledge Distillation

2020 ◽  
Vol 65 (2) ◽  
pp. 5
Author(s):  
A. Szijártó ◽  
P. Lehotay-Kéry ◽  
A. Kiss

For more complex classification problems, it is inevitable that we use increasingly complex and cumbersome classification models. However, we often lack the space or processing power to deploy these models. Knowledge distillation is an effective way to improve the accuracy of an otherwise smaller, simpler model using a more complex teacher network or an ensemble of networks. This way we can obtain a classifier whose accuracy is comparable to that of the teacher while remaining small enough to deploy. In this paper, we evaluate certain features of this distillation method while trying to improve its results. These experiments and examinations, and the discovered properties, may also help to further develop this technique.
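The abstract does not spell out the distillation objective, but the standard formulation it refers to combines a temperature-softened cross-entropy against the teacher with the usual hard-label loss. A minimal NumPy sketch (the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T produces softer distributions
    # that expose the teacher's "dark knowledge" about similar classes.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    # Soft-target term: cross-entropy between teacher and student at temperature T,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    p_teacher = softmax(teacher_logits, T)
    p_student_T = softmax(student_logits, T)
    soft_loss = -np.sum(p_teacher * np.log(p_student_T + 1e-12)) * (T ** 2)
    # Hard-label term: ordinary cross-entropy at T = 1.
    p_student = softmax(student_logits)
    hard_loss = -np.log(p_student[true_label] + 1e-12)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

A student whose logits match the teacher's (and agree with the label) incurs a lower loss than one that disagrees, which is the signal that transfers the teacher's knowledge.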

2021 ◽  
Vol 11 (3) ◽  
pp. 1093
Author(s):  
Jeonghyun Lee ◽  
Sangkyun Lee

Convolutional neural networks (CNNs) have achieved tremendous success in solving complex classification problems. Motivated by this success, various compression methods have been proposed for downsizing CNNs to deploy them on resource-constrained embedded systems. However, a new type of vulnerability of compressed CNNs, known as adversarial examples, has recently been discovered; this is critical for security-sensitive systems because adversarial examples can cause CNNs to malfunction and, in many cases, can be crafted easily. In this paper, we propose a compression framework to produce compressed CNNs that are robust against such adversarial examples. To achieve this goal, our framework uses both pruning and knowledge distillation with adversarial training. We formulate our framework as an optimization problem and provide a solution algorithm based on the proximal gradient method, which is more memory-efficient than the popular ADMM-based compression approaches. In experiments, we show that our framework improves the trade-off between adversarial robustness and compression rate compared to the existing state-of-the-art adversarial pruning approach.
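The paper's concrete algorithm is not reproduced in the abstract; as background, the proximal gradient method it builds on alternates a gradient step on the smooth loss with the proximal operator of a sparsity-inducing regularizer. For an L1 penalty that operator is soft-thresholding, which zeroes small weights and thereby prunes. A generic sketch (the L1 choice and the step sizes are assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def soft_threshold(w, lam):
    # Proximal operator of lam * ||w||_1: shrinks every weight toward zero
    # and zeroes out weights smaller than lam (the pruning effect).
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def proximal_gradient_step(w, grad, lr=0.1, lam=0.05):
    # One proximal gradient iteration: gradient descent on the smooth
    # (e.g., adversarial training) loss, then the prox of the regularizer.
    return soft_threshold(w - lr * grad, lr * lam)
```

Unlike ADMM, this needs no auxiliary copy of the weights or dual variables, which is the source of the memory efficiency the abstract mentions.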


2020 ◽  
Author(s):  
Hongwei Liu ◽  
Shuai Luo ◽  
Shuaibing Guo

BACKGROUND Otitis media (OM) is a common ear disease that can induce hearing loss and can even be life-threatening. However, due to poor classification performance, insufficient data, and high computational costs, OM cannot currently be diagnosed accurately. OBJECTIVE An optimized multi-teacher knowledge distillation method is proposed to realize the early diagnosis of otitis media with insufficient data at a lower computational cost. METHODS Based on ensemble learning and the conventional knowledge distillation method, an optimized multi-teacher knowledge distillation method is proposed. The framework consists of a teacher network and a student network. The teacher network is responsible for learning from raw data and exporting prior knowledge, and the student network is responsible for the diagnosis task. The teacher network is composed of three components: VGG, ResNet, and Inception. Each component can be regarded as a teacher that learns knowledge. The student network consists of three identical lightweight CNNs (convolutional neural networks). Each CNN can be viewed as a student that obtains the knowledge from the teachers and executes the diagnosis task. First, the three teachers learn from raw data separately to obtain prior knowledge. Then, each student is trained based on the knowledge learned from a teacher. This knowledge transfer process compresses the teacher network and reduces the computational costs. Next, to improve the diagnosis accuracy, the predicted results of the three well-trained students are fused using two contrasting methods: the voting-based knowledge fusion method and the average-based knowledge fusion method. Finally, the well-trained model is formed and can be used for the diagnosis task. The validity of the proposed method is verified on a tympanic membrane data set. RESULTS The well-trained model achieves good performance in the early diagnosis of OM at a lower computational cost. The training diagnosis accuracy of the average-based model reaches 99.02%, and the testing diagnosis accuracy reaches 97.38%, which exceeds that of any single teacher. Compared with using the teacher network directly for the diagnosis task, the training time of the proposed well-trained model is reduced by 64.37%, which greatly shortens the calculation time. Three deep and large teachers are compressed into a lightweight well-trained model, which greatly reduces the computational costs. CONCLUSIONS The optimized multi-teacher knowledge distillation method is suitable for the early diagnosis of OM with insufficient data. In addition, the method realizes model compression and reduces the computational costs.
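The two fusion rules contrasted above can be sketched in a few lines: averaging combines the students' class-probability vectors before taking the argmax, while voting lets each student cast a hard vote for its top class. A minimal NumPy illustration (function names are our own; the paper's implementation details are not given in the abstract):

```python
import numpy as np

def average_fusion(prob_list):
    # Average-based fusion: mean of the students' class-probability
    # vectors, then the class with the highest mean probability wins.
    return int(np.argmax(np.mean(prob_list, axis=0)))

def voting_fusion(prob_list):
    # Voting-based fusion: each student casts a hard vote for its
    # argmax class; the majority class wins.
    votes = [int(np.argmax(p)) for p in prob_list]
    return max(set(votes), key=votes.count)
```

The two rules can disagree: one highly confident student can swing the average even when it is outvoted, which is why the paper compares them.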


Vestnik IGEU ◽  
2020 ◽  
pp. 56-63
Author(s):  
A.Ye. Barochkin ◽  
A.N. Belyakov ◽  
H. Otwinowski ◽  
T. Wylecial ◽  
E.V. Barochkin

The classification of particles by size is traditionally considered in relation to homogeneous materials, which must be divided into coarse and fine products. However, the material often contains impurities that differ in their physical properties from the base component. When classifying such mixtures, the difference in physical properties can be used to isolate, purify, or enrich the main component. The choice of technology for processing such dissimilar components is possible only on the basis of simple and adequate models. The formulation and solution of classification problems for mixtures of dissimilar components on the basis of adequate models is therefore a relevant issue for the power industry and related industries. Fundamental laws of dispersed-system dynamics are used to simulate the classification process, and mathematical programming methods are used to identify the models and improve the separation technology. An experimental study of the separation of a mixture of dissimilar components in a two-stage classifying system has been carried out. Using the obtained experimental data, the model was identified and its adequacy was shown. The presented experimental results and computational model can be used to formulate and solve optimization problems of fractionation of dispersed materials and to increase the efficiency of the process in classifying systems. The results can be applied in the energy, chemical, and other industries to improve the efficiency of resource- and energy-saving technologies for obtaining dispersed products with an acceptable content of impurities.


2020 ◽  
Author(s):  
Yi Xie ◽  
Fei Shen ◽  
Jianqing Zhu ◽  
Huanqiang Zeng

Vehicle re-identification is a challenging task that matches vehicle images captured by different cameras. Recent vehicle re-identification approaches exploit complex deep networks to learn viewpoint-robust features for accurate re-identification, which incurs large computations in their testing phases and restricts the vehicle re-identification speed. In this paper, we propose a viewpoint-robust knowledge distillation (VRKD) method for accelerating vehicle re-identification. The VRKD method consists of a complex teacher network and a simple student network. Specifically, the teacher network uses quadruple directional deep networks to learn viewpoint-robust features. The student network contains only a shallow backbone sub-network and a global average pooling layer. The student network distills viewpoint-robust knowledge from the teacher network by minimizing the Kullback-Leibler divergence between the posterior probability distributions produced by the student and teacher networks. As a result, the vehicle re-identification speed is significantly accelerated, since only the student network, with its small testing computations, is required. Experiments on the VeRi776 and VehicleID datasets show that the proposed VRKD method outperforms many state-of-the-art vehicle re-identification approaches in both accuracy and speed.
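The Kullback-Leibler divergence that VRKD minimizes between teacher and student posteriors is a standard quantity; a minimal NumPy sketch of computing it (the epsilon smoothing is our own numerical safeguard, not from the paper):

```python
import numpy as np

def softmax(z):
    # Convert logits to a posterior probability distribution.
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()

def kl_divergence(p_teacher, p_student):
    # KL(teacher || student): zero when the student's posterior matches
    # the teacher's, positive otherwise; this is what the student minimizes.
    p_teacher = np.asarray(p_teacher, dtype=float)
    p_student = np.asarray(p_student, dtype=float)
    return float(np.sum(p_teacher * np.log((p_teacher + 1e-12) / (p_student + 1e-12))))
```

Because the divergence depends only on the two output distributions, the shallow student can be trained against the quadruple-directional teacher without copying any of its architecture.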


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Ángel Morera ◽  
Ángel Sánchez ◽  
José Francisco Vélez ◽  
Ana Belén Moreno

Demographic handwriting-based classification problems, such as gender and handedness categorization, have interesting applications in disciplines like Forensic Biometrics. This work describes an experimental study on the suitability of deep neural networks for three automatic demographic problems: gender, handedness, and combined gender-and-handedness classification. Our research was carried out on two public handwriting databases: the IAM dataset, containing English texts, and the KHATT dataset, with Arabic texts. The considered problems present a high intrinsic difficulty when extracting specific relevant features for discriminating the involved subclasses. Our solution is based on convolutional neural networks, since these models have proven better at extracting good features than hand-crafted approaches. Our work also describes the first approach to combined gender-and-handedness prediction, which has not been addressed before by other researchers. Moreover, the proposed solutions were designed using a single network configuration for all three demographic problems, which has the advantage of simplifying the design and debugging of these deep architectures when handling related handwriting problems. Finally, comparison of the achieved results with those of related works revealed the best average accuracy in the gender classification problem for the considered datasets.


Author(s):  
Hui Wang ◽  
Hanbin Zhao ◽  
Xi Li ◽  
Xu Tan

As an important and challenging problem in machine learning and computer vision, neural network acceleration essentially aims to enhance computational efficiency without sacrificing model accuracy too much. In this paper, we propose a progressive blockwise learning scheme for teacher-student model distillation at the subnetwork block level. The proposed scheme distills the knowledge of the entire teacher network by locally extracting the knowledge of each block through progressive blockwise function approximation. Furthermore, we propose a structure design criterion for the student subnetwork block that effectively preserves the original receptive field of the teacher network. Experimental results demonstrate the effectiveness of the proposed scheme in comparison with state-of-the-art approaches.
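The blockwise function approximation described above amounts to matching each student block's output features to the corresponding teacher block's, rather than matching only the final logits. A minimal sketch of such a per-block loss (mean-squared error is an assumed choice for illustration; the paper's exact objective is not given in the abstract):

```python
import numpy as np

def block_distillation_loss(student_feats, teacher_feats):
    # Sum of mean-squared errors between the intermediate feature maps of
    # corresponding student and teacher blocks. Minimizing each term makes
    # the student block locally approximate its teacher block's function.
    return sum(float(np.mean((s - t) ** 2))
               for s, t in zip(student_feats, teacher_feats))
```

Training blocks progressively, one at a time, keeps each optimization local and small, which is what makes the scheme tractable for accelerating large teachers.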


Author(s):  
Norio Baba ◽  
Norihiko Ichise ◽  
Syunya Watanabe

The tilted-beam illumination method is used to improve resolution compared with the axial illumination mode. Exploiting this advantage, Saxton proposed, and experimentally examined, a restoration method using several tilted-beam images covering the full azimuthal range. To make this technique more reliable, some practical problems still remain to be solved. In this report the restoration was attempted and these problems were considered. In our study, four problems were identified for the restoration experiment. (1) Accurate beam-tilt adjustment to fit the incident beam to the coma-free axis for symmetrical beam tilting over the full azimuthal range. (2) Accurate measurement of the optical parameters needed to design the restoration filter; even if the spherical aberration coefficient Cs is known accurately and the axial astigmatism is sufficiently compensated, at least the defocus value must be measured. (3) Accurate alignment of the tilt-azimuth series images.

