BACKGROUND
Otitis media (OM) is a common ear disease that can cause hearing loss and can even be life-threatening. However, because of poor classification performance, insufficient data, and high computational costs, accurate diagnosis of OM remains difficult.
OBJECTIVE
An optimized multi-teacher knowledge distillation method is proposed to enable early diagnosis of otitis media from insufficient data at a lower computational cost.
METHODS
Building on ensemble learning and the conventional knowledge distillation method, an optimized multi-teacher knowledge distillation method is proposed. The framework consists of a teacher network and a student network. The teacher network learns from the raw data and exports prior knowledge, and the student network performs the diagnosis task. The teacher network is composed of three components, VGG, ResNet, and Inception, each of which can be regarded as a teacher that learns knowledge. The student network consists of three identical lightweight convolutional neural networks (CNNs), each of which can be viewed as a student that obtains knowledge from the teachers and executes the diagnosis task. First, the three teachers learn from the raw data separately to obtain prior knowledge. Then, each student is trained on the knowledge learned by one teacher; this knowledge transfer process compresses the teacher network and reduces the computational costs. Next, to improve the diagnosis accuracy, the predictions of the three well-trained students are fused by two contrasting methods: voting-based knowledge fusion and average-based knowledge fusion. Finally, the well-trained model is obtained and can be used for the diagnosis task. The validity of the proposed method is verified on a tympanic membrane data set.
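The distillation and fusion steps described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes standard temperature-scaled soft-target distillation (Hinton-style) for the teacher-to-student transfer, and the hyperparameters `T` and `alpha` as well as all function names are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; a higher T produces softer teacher targets."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Common KD objective (an assumption here, not the paper's exact loss):
    alpha * T^2 * KL(teacher_soft || student_soft)
    + (1 - alpha) * cross-entropy with the hard labels."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    hard = softmax(student_logits)
    ce = -np.log(hard[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)

def vote_fusion(student_probs):
    """Voting-based fusion: majority vote over each student's argmax class.
    student_probs has shape (n_students, n_samples, n_classes)."""
    votes = np.argmax(student_probs, axis=-1)          # (n_students, n_samples)
    n_classes = student_probs.shape[-1]
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes), 0, votes
    )                                                  # (n_classes, n_samples)
    return np.argmax(counts, axis=0)

def average_fusion(student_probs):
    """Average-based fusion: mean of the students' probability vectors,
    then argmax over classes."""
    return np.argmax(student_probs.mean(axis=0), axis=-1)
```

For example, with three students and two classes, a sample predicted class 0 by two students and class 1 by one is assigned class 0 under voting, while averaging weighs each student's full probability vector before the final argmax.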
RESULTS
The well-trained model achieves good performance in the early diagnosis of OM at a lower computational cost. The training accuracy of the average-based model reaches 99.02% and the testing accuracy reaches 97.38%, exceeding that of any single teacher. Compared with using the teacher network directly for the diagnosis task, the training time of the proposed well-trained model is reduced by 64.37%, which greatly shortens the computation time. The three deep, large teachers are compressed into one lightweight well-trained model, which greatly reduces the computational costs.
CONCLUSIONS
The optimized multi-teacher knowledge distillation method is suitable for the early diagnosis of OM with insufficient data. In addition, the method achieves model compression and reduces the computational costs.