teacher network
Recently Published Documents


TOTAL DOCUMENTS: 138 (FIVE YEARS: 14)
H-INDEX: 5 (FIVE YEARS: 2)

2021 ◽  
Vol 12 (1) ◽  
pp. 76
Author(s):  
Ju-Ho Kim ◽  
Hye-Jin Shim ◽  
Jee-Weon Jung ◽  
Ha-Jin Yu

The majority of recent speaker verification tasks are studied under open-set evaluation scenarios that reflect real-world conditions. The characteristics of these tasks imply that generalization towards unseen speakers is a critical capability. Thus, this study aims to improve the generalization of the system in order to enhance speaker verification performance. To achieve this goal, we propose a novel supervised-learning-based speaker verification system using the mean teacher framework. The mean teacher network refers to the temporal averaging of deep neural network parameters; it can produce more accurate and stable representations than the fixed weights obtained at the end of training and is conventionally used for semi-supervised learning. Leveraging the success of the mean teacher framework in many studies, the proposed supervised learning method exploits the mean teacher network as an auxiliary model for better training of the main model, the student network. By learning the reliable intermediate representations derived from the mean teacher network as well as the one-hot speaker labels, the student network is encouraged to explore a more discriminative embedding space. The experimental results demonstrate that the proposed method reduces the equal error rate by a relative 11.61% compared to the baseline system.
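A minimal sketch of the mean teacher idea described above, in PyTorch-style code; the decay rate, the consistency weighting, and the assumption that the model returns (logits, embedding) are illustrative placeholders, not the authors' settings:

```python
import copy
import torch
import torch.nn.functional as F

def build_mean_teacher(student: torch.nn.Module) -> torch.nn.Module:
    """The teacher starts as a copy of the student and is never updated by gradients."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def update_mean_teacher(teacher, student, decay=0.999):
    """Temporal (exponential moving) average of the student's weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

def training_step(student, teacher, x, speaker_labels, optimizer, consistency_weight=1.0):
    student_logits, student_emb = student(x)        # assumed: model returns (logits, embedding)
    with torch.no_grad():
        _, teacher_emb = teacher(x)
    # Supervised loss on one-hot speaker labels plus a consistency loss that pulls
    # the student's embedding toward the mean teacher's intermediate representation.
    loss = F.cross_entropy(student_logits, speaker_labels) \
           + consistency_weight * F.mse_loss(student_emb, teacher_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_mean_teacher(teacher, student)
    return loss.item()
```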


2020 ◽  
Author(s):  
Yi Xie ◽  
Fei Shen ◽  
Jianqing Zhu ◽  
Huanqiang Zeng

Vehicle re-identification is a challenging task that matches vehicle images captured by different cameras. Recent vehicle re-identification approaches exploit complex deep networks to learn viewpoint-robust features for accurate re-identification, which incurs large computational costs in the testing phase and restricts re-identification speed. In this paper, we propose a viewpoint robust knowledge distillation (VRKD) method for accelerating vehicle re-identification. The VRKD method consists of a complex teacher network and a simple student network. Specifically, the teacher network uses quadruple directional deep networks to learn viewpoint-robust features. The student network contains only a shallow backbone sub-network and a global average pooling layer. The student network distills viewpoint-robust knowledge from the teacher network by minimizing the Kullback-Leibler divergence between the posterior probability distributions produced by the student and teacher networks. As a result, vehicle re-identification is significantly accelerated, since only the student network, with its small testing cost, is required. Experiments on the VeRi776 and VehicleID datasets show that the proposed VRKD method outperforms many state-of-the-art vehicle re-identification approaches in both accuracy and speed.
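A hedged sketch of the distillation objective and student design described above; the interface, feature dimension, and backbone are placeholders for illustration, not details from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def vrkd_distillation_loss(student_logits, teacher_logits):
    """Kullback-Leibler divergence between the student's and teacher's posteriors.
    F.kl_div expects log-probabilities as input and probabilities as target."""
    log_p_student = F.log_softmax(student_logits, dim=1)
    p_teacher = F.softmax(teacher_logits, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

class StudentNet(nn.Module):
    """Illustrative student: a shallow backbone followed by global average pooling."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_ids: int):
        super().__init__()
        self.backbone = backbone                 # assumed shallow CNN returning a feature map
        self.pool = nn.AdaptiveAvgPool2d(1)      # global average pooling layer
        self.classifier = nn.Linear(feat_dim, num_ids)

    def forward(self, x):
        feat = self.pool(self.backbone(x)).flatten(1)
        return self.classifier(feat)
```

At test time only StudentNet is run, which is why the distilled system is much faster than the quadruple-directional teacher.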


2020 ◽  
Vol 65 (2) ◽  
pp. 5
Author(s):  
A. Szijártó ◽  
P. Lehotay-Kéry ◽  
A. Kiss

For more complex classification problems, it is inevitable that we use increasingly complex and cumbersome classification models. However, we often do not have the space or processing power to deploy these models. Knowledge distillation is an effective way to improve the accuracy of an otherwise smaller, simpler model using a more complex teacher network or an ensemble of networks. This way we can obtain a classifier whose accuracy is comparable to that of the teacher while being small enough to deploy. In this paper we evaluate certain features of this distillation method while trying to improve its results; these experiments and the discovered properties may also help to develop the method further.
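Teacher-to-student transfer of this kind is commonly implemented by mixing the hard-label loss with a soft-target term; a minimal sketch, with the weighting alpha and temperature T as illustrative placeholders rather than the authors' settings:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.7, T=4.0):
    """Weighted sum of cross-entropy on hard labels and a soft-target term
    matching the teacher's temperature-softened output distribution."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients comparable across temperatures
    return alpha * soft + (1.0 - alpha) * hard
```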


2020 ◽  
Author(s):  
Hongwei Liu ◽  
Shuai Luo ◽  
Shuaibing Guo

BACKGROUND Otitis media (OM) is a common ear disease that can induce hearing loss and can even be life-threatening. However, due to poor classification performance, insufficient data, and high computational costs, OM cannot be diagnosed accurately. OBJECTIVE An optimized multi-teacher knowledge distillation method is proposed to realize the early diagnosis of otitis media with insufficient data at a lower computational cost. METHODS Based on ensemble learning and the conventional knowledge distillation method, an optimized multi-teacher knowledge distillation method is proposed. The framework consists of a teacher network and a student network. The teacher network is responsible for learning from raw data and exporting prior knowledge, and the student network is responsible for the diagnosis task. The teacher network is composed of three components: VGG, ResNet, and Inception. Each component can be regarded as a teacher that learns knowledge. The student network consists of three identical lightweight CNNs (convolutional neural networks). Each CNN can be viewed as a student that obtains knowledge from the teachers and performs the diagnosis task. First, the three teachers learn from raw data separately to obtain prior knowledge. Then, each student is trained on the knowledge learned by a teacher; this knowledge transfer compresses the teacher network and reduces the computational costs. Next, to improve the diagnosis accuracy, the predictions of the three well-trained students are fused using two contrasting methods: a voting-based knowledge fusion method and an average-based knowledge fusion method. Finally, the well-trained model is formed and can be used for the diagnosis task. The validity of the proposed method is verified on a tympanic membrane data set. RESULTS The well-trained model achieves good performance in the early diagnosis of OM at a lower computational cost. The training diagnosis accuracy of the average-based model reaches 99.02%, and the testing diagnosis accuracy reaches 97.38%, which exceeds that of any teacher. Compared with using the teacher network for the diagnosis task directly, the training time of the proposed model is reduced by 64.37%, which greatly shortens the calculation time. Three deep and large teachers are compressed into a lightweight well-trained model, which greatly reduces the computational costs. CONCLUSIONS The optimized multi-teacher knowledge distillation method is suitable for the early diagnosis of OM with insufficient data. In addition, the method realizes model compression and reduces the computational costs.
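A small sketch of the two fusion strategies named above, average-based and voting-based, applied to the three students' predictions; the function names and tensor shapes are assumptions for illustration:

```python
import torch

def fuse_average(student_probs):
    """Average-based knowledge fusion: mean of the students' class probabilities."""
    return torch.stack(student_probs, dim=0).mean(dim=0).argmax(dim=1)

def fuse_voting(student_probs):
    """Voting-based knowledge fusion: majority vote over the students' hard predictions."""
    votes = torch.stack([p.argmax(dim=1) for p in student_probs], dim=0)  # (n_students, batch)
    return votes.mode(dim=0).values
```

Each element of student_probs is assumed to be a (batch, n_classes) probability tensor produced by one well-trained student.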


2020 ◽  
Vol 34 (07) ◽  
pp. 11865-11873 ◽  
Author(s):  
Yongri Piao ◽  
Zhengkun Rong ◽  
Miao Zhang ◽  
Huchuan Lu

Light field saliency detection has attracted increasing interest in recent years due to the significant improvements it brings in challenging scenes by exploiting abundant light field cues. However, the high dimensionality of light field data poses computation-intensive and memory-intensive challenges, and light field data are far less ubiquitous than RGB data. These issues may severely impede practical applications of light field saliency detection. In this paper, we introduce an asymmetrical two-stream architecture inspired by knowledge distillation to confront these challenges. First, we design a teacher network that learns to exploit focal slices under the higher resource budget of desktop computers and meanwhile transfers comprehensive focusness knowledge to the student network. Our teacher network relies on two tailor-made modules, namely the multi-focusness recruiting module (MFRM) and the multi-focusness screening module (MFSM). Second, we propose two distillation schemes to train a student network towards memory and computation efficiency while maintaining performance. The proposed distillation schemes ensure better absorption of focusness knowledge and enable the student to replace the focal slices with a single RGB image in a user-friendly way. We conduct experiments on three benchmark datasets and demonstrate that our teacher network achieves state-of-the-art performance and that the student network (ResNet18) ranks Top-1 on the HFUT-LFSD dataset and Top-4 on DUT-LFSD, while reducing the model size by 56% and boosting the frames per second (FPS) by 159% compared with the best performing method.
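The abstract does not spell out the two distillation schemes, so the following is only a plausible sketch of how focusness knowledge could be transferred from the focal-slice teacher to the single-RGB student, combining a feature-matching term with a saliency-map term; all names and the loss composition are assumptions:

```python
import torch
import torch.nn.functional as F

def focusness_distillation_loss(student_rgb_feats, teacher_focal_feats,
                                student_saliency_logits, teacher_saliency_logits):
    """Hypothetical two-part objective: match intermediate features (focusness
    knowledge) and the teacher's final saliency prediction."""
    feat_term = F.mse_loss(student_rgb_feats, teacher_focal_feats)
    map_term = F.binary_cross_entropy_with_logits(
        student_saliency_logits, torch.sigmoid(teacher_saliency_logits))
    return feat_term + map_term
```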


Author(s):  
Chenglin Yang ◽  
Lingxi Xie ◽  
Siyuan Qiao ◽  
Alan L. Yuille

We focus on the problem of training a deep neural network in generations. The procedure is that, in order to optimize the target network (the student), another network (the teacher) with the same architecture is first trained and used to provide part of the supervision signal in the next stage. While this strategy leads to higher accuracy, many aspects (e.g., why teacher-student optimization helps) still need further exploration. This paper studies the problem from the perspective of controlling the strictness with which the teacher network is trained. Existing approaches mostly use a hard distribution (e.g., one-hot vectors) in training, leading to a strict teacher that itself has high accuracy, but we argue that the teacher needs to be more tolerant, although this often implies a lower accuracy. The implementation is very simple: merely an extra loss term is added to the teacher network, allowing a few secondary classes to emerge and complement the primary class. Consequently, the teacher provides a milder supervision signal (a less peaked distribution), which makes it possible for the student to learn from inter-class similarity and potentially lowers the risk of over-fitting. Experiments are performed on standard image classification tasks (CIFAR100 and ILSVRC2012). Although the teacher network is less powerful, the students show persistent ability growth and eventually achieve higher classification accuracies than other competitors. Model ensembling and transferred feature extraction also verify the effectiveness of our approach.
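One plausible form of the extra "tolerance" term described above, written as a sketch: cross-entropy minus a reward on the probability mass of the top-k secondary classes, so that the teacher's output distribution becomes less peaked. The choice of k, the weight lam, and this exact formulation are assumptions, not the paper's definition:

```python
import torch
import torch.nn.functional as F

def tolerant_teacher_loss(logits, labels, k=5, lam=0.1):
    """Cross-entropy plus an illustrative tolerance term that rewards probability
    mass on the k largest non-ground-truth classes."""
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)
    # Zero out the ground-truth class, then take the k largest remaining probabilities.
    secondary = probs.scatter(1, labels.unsqueeze(1), 0.0)
    topk_mass = secondary.topk(k, dim=1).values.sum(dim=1)
    # Subtracting the secondary mass lowers the teacher's peak confidence.
    return ce - lam * topk_mass.mean()
```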


Author(s):  
Joonsang Yu ◽  
Sungbum Kang ◽  
Kiyoung Choi

This paper proposes network recasting as a general method for network architecture transformation. The primary goal of the method is to accelerate inference through the transformation, but there can be many other practical applications. The method is based on block-wise recasting: it recasts each source block in a pre-trained teacher network to a target block in a student network. For the recasting, a target block is trained such that its output activation approximates that of the source block. Such block-by-block recasting in a sequential manner transforms the network architecture while preserving accuracy. The method can be used to transform an arbitrary teacher network type into an arbitrary student network type. It can even generate a mixed-architecture network that consists of two or more types of block. Network recasting can produce a network with fewer parameters and/or activations, which reduces the inference time significantly. Naturally, it can also be used for network compression by recasting a trained network into a smaller network of the same type. Our experiments show that it outperforms previous compression approaches in terms of actual speedup on a GPU.
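A hedged sketch of the block-wise recasting step described above: a target (student) block is fitted to reproduce the output activations of a source (teacher) block. The MSE criterion and a loader that yields the source block's input activations are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def recast_block(teacher_block, student_block, activation_loader,
                 epochs=1, lr=1e-3, device="cpu"):
    """Train the target block so its output approximates the source block's output."""
    teacher_block.eval().to(device)
    student_block.train().to(device)
    opt = torch.optim.Adam(student_block.parameters(), lr=lr)
    for _ in range(epochs):
        for block_input in activation_loader:    # activations feeding the source block
            block_input = block_input.to(device)
            with torch.no_grad():
                target = teacher_block(block_input)
            loss = F.mse_loss(student_block(block_input), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student_block
```

Repeating this step block by block, in sequence, yields the transformed (or compressed) student network while preserving accuracy.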

