teacher model
Recently Published Documents

TOTAL DOCUMENTS: 79 (FIVE YEARS: 37)
H-INDEX: 9 (FIVE YEARS: 2)

2021, Vol 12 (1), pp. 44
Author(s): Seokjin Lee, Minhan Kim, Seunghyeon Shin, Seungjae Baek, Sooyoung Park, ...

In recent acoustic scene classification (ASC) models, various auxiliary methods have been applied to enhance performance, e.g., subsystem ensembles and data augmentation. In particular, ensembles of several submodels can be effective for ASC, but they increase the overall model size because several submodels must be kept, which makes them hard to use in model-complexity-limited ASC tasks. In this paper, we seek a performance-enhancement method that retains the benefit of the model-ensemble technique without increasing the model size. Our method is based on the mean-teacher model, which was developed for consistency learning in semi-supervised learning. Because our problem is supervised learning, which differs from the setting the conventional mean-teacher model was designed for, we modify its detailed strategies to maximize the consistency-learning performance. To evaluate the effectiveness of our method, experiments were performed with an ASC database from the Detection and Classification of Acoustic Scenes and Events 2021 Task 1A. With our proposed method, the small-sized ASC model improved the log loss to 1.009 and the F1-score to 67.12%, whereas the vanilla ASC model showed a log loss of 1.052 and an F1-score of 65.79%.
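As a rough illustration of the mean-teacher mechanism described above (an exponential-moving-average teacher providing consistency targets for a student trained on labeled data), the following PyTorch sketch shows the core update; the augment() perturbation, the EMA decay, and the consistency weight are illustrative placeholders, not the authors' actual configuration.

```python
import torch
import torch.nn.functional as F

def augment(x):
    # Placeholder perturbation; stands in for the real spectrogram augmentation.
    return x + 0.01 * torch.randn_like(x)

def update_teacher(student, teacher, ema_decay=0.999):
    # The teacher's weights are an exponential moving average of the student's.
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(ema_decay).add_(s_param, alpha=1.0 - ema_decay)

def training_step(student, teacher, x, y, optimizer, consistency_weight=1.0):
    # Two independently perturbed views of the same labeled batch.
    student_logits = student(augment(x))
    with torch.no_grad():
        teacher_logits = teacher(augment(x))

    # Supervised loss on the hard labels plus a consistency loss pulling the
    # student's predictions toward the averaged teacher's predictions.
    ce_loss = F.cross_entropy(student_logits, y)
    consistency_loss = F.mse_loss(student_logits.softmax(dim=-1),
                                  teacher_logits.softmax(dim=-1))
    loss = ce_loss + consistency_weight * consistency_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_teacher(student, teacher)
    return loss.item()
```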


2021, Vol 98 (3), pp. 412-435
Author(s): Joseph E. Blado

Abstract: Recently, social epistemologists have sought to establish what the governing epistemic relationship should be between novices and experts. In this article, the author argues for, and expands upon, Helen De Cruz’s expert-as-teacher model. Although this model is vulnerable to significant challenges, the author proposes that a suitably extended version can overcome them (call this the “extended-expert-as-teacher” model, or the “EEAT” model). First, the author shows the respective weaknesses of three influential models in the literature. Then, he argues that the expert-as-teacher model can overcome its weaknesses through the addition of what he calls the “Authority Clause”, “Advisor Clause”, and “Ex Post Facto Clause” of the EEAT model. After developing a robust account of these clauses, the author entertains three major objections. First, he responds to the charge that the EEAT model is little better than the expert-as-authority model. Second, he responds to a double-counting objection. Lastly, he responds to a pragmatic objection from complexity.


2021, Vol 50 (2), pp. 102-115
Author(s): Alexius Chia, Stefanie Chye, Bee-Leng Chua, ...

This concept paper describes the changes made to Singapore’s initial teacher preparation (ITP) programmes, with a specific focus on its thinking teacher model (NIE, 2009) – a model of teacher agency and an approach to ITP that requires self-reflection on roles and practice, understanding theories and research, and adapting to changing learner needs (Tan & Liu, 2015). An important component of this model is a ‘meta’ course that all pre-service teachers are required to undergo. This ‘meta’ course, the Professional Practice and Inquiry (PPI) initiative, which was introduced to develop reflective professionals, runs across the entire ITP programme, providing pre-service teachers with both a framework and a platform to curate their understandings across all their courses, reflect deeply on teaching and learning, and highlight their best work. Drawing on vignettes from their reflective pieces, this paper demonstrates how the goals and components made possible by the PPI initiative provided the impetus for English pre-service teachers to develop into autonomous thinking teachers.


Author(s): Taehyeon Kim, Jaehoon Oh, Nak Yil Kim, Sangwook Cho, Se-Young Yun

Knowledge distillation (KD), transferring knowledge from a cumbersome teacher model to a lightweight student model, has been investigated as a way to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model, with a temperature-scaling hyperparameter τ. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching as τ increases and on label matching as τ goes to 0, and we empirically show that logit matching is, in general, positively correlated with performance improvement. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which we explain by the difference in penultimate-layer representations produced by the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly with the KL divergence loss and small τ, mitigates label noise. The code to reproduce the experiments is publicly available at https://github.com/jhoon-oh/kd_data/.
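For reference, the two loss formulations compared in the abstract, the temperature-scaled KL divergence and the MSE between logit vectors, can be sketched in PyTorch as follows; the values of τ and α and the combined objective below are generic choices, not the paper's exact settings.

```python
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, tau=4.0):
    # Conventional KD loss: KL divergence between temperature-softened distributions,
    # scaled by tau**2 to keep gradient magnitudes comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2

def kd_mse_loss(student_logits, teacher_logits):
    # Direct logit matching: mean squared error between the raw logit vectors.
    return F.mse_loss(student_logits, teacher_logits)

def kd_objective(student_logits, teacher_logits, labels, alpha=0.5, use_mse=True):
    # Generic combined objective: hard-label cross-entropy plus a distillation term.
    ce = F.cross_entropy(student_logits, labels)
    kd = kd_mse_loss(student_logits, teacher_logits) if use_mse \
        else kd_kl_loss(student_logits, teacher_logits)
    return (1 - alpha) * ce + alpha * kd
```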


Author(s): Wanyun Cui, Sen Yan

Knowledge distillation uses both real hard labels and soft labels predicted by the teacher model as supervision. Intuitively, we expect the soft-label probabilities and hard-label probabilities to be concordant. However, in real knowledge distillation we found critical rank violations between hard labels and soft labels for augmented samples. For example, for an augmented sample x = 0.7 * cat + 0.3 * panda, a meaningful soft-label distribution should follow the same rank: P(cat|x)>P(panda|x)>P(other|x). But real teacher models often violate this rank, e.g., P(tiger|x)>P(panda|x)>P(cat|x). We attribute the rank violations to the increased difficulty the teacher model has in understanding augmented samples. Empirically, we found that these violations harm knowledge transfer. In this paper, we refer to eliminating rank violations in data augmentation for knowledge distillation as isotonic data augmentation (IDA). We use isotonic regression (IR) -- a classic statistical algorithm -- to eliminate the rank violations. We show that IDA can be modeled as a tree-structured IR problem and give an O(c*log(c)) optimal algorithm, where c is the number of labels. To further reduce the time complexity, we also propose a GPU-friendly approximation algorithm with linear time complexity. We verify on various datasets and data augmentation baselines that (1) rank violation is a general phenomenon for data augmentation in knowledge distillation, and (2) our proposed IDA algorithms effectively increase the accuracy of knowledge distillation by resolving the rank violations.
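To illustrate the idea (not the paper's tree-structured O(c*log(c)) algorithm), the following sketch uses scikit-learn's generic isotonic regression to project a teacher's soft labels onto the rank order implied by the mixing weights of an augmented sample; the ordering heuristic for the unmixed classes is an assumption made for this example.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotonic_soft_labels(teacher_probs, mix_weights):
    # Project the teacher's soft labels onto the rank order implied by the mixing
    # weights: mixed classes first (by weight), remaining classes ordered by their
    # teacher probability (an assumed heuristic standing in for the tree structure).
    order = np.lexsort((-teacher_probs, -mix_weights))
    projected = IsotonicRegression(increasing=False).fit_transform(
        np.arange(len(order)), teacher_probs[order])
    fixed = np.empty_like(teacher_probs)
    fixed[order] = projected
    return fixed / fixed.sum()  # renormalize to a probability distribution

# Example: x = 0.7*cat + 0.3*panda, but the teacher ranks cat below panda.
probs = np.array([0.20, 0.50, 0.30])         # [cat, panda, other]
weights = np.array([0.7, 0.3, 0.0])
print(isotonic_soft_labels(probs, weights))  # [0.35, 0.35, 0.30]: violation removed
```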


Author(s): Chao Ye, Huaidong Zhang, Xuemiao Xu, Weiwei Cai, Jing Qin, ...

Deep neural networks have been shown to be very powerful tools for object detection in various scenes. Their remarkable performance, however, heavily depends on the availability of large amounts of high-quality labeled data, which are time-consuming and costly to acquire for scenes with densely packed objects. We present a novel semi-supervised approach to this problem, built on a common teacher-student model and integrated with a novel intersection-over-union (IoU) aware consistency loss and a new proposal consistency loss. The IoU-aware consistency loss evaluates the IoU over prediction pairs from the teacher model and the student model, encouraging the student model's predictions to closely match those of the teacher model; it also reweights the importance of different prediction pairs to suppress low-confidence pairs. The proposal consistency loss enforces proposal consistency between the two models, making it possible to involve the region proposal network in training with unlabeled data. We also construct a new dataset, RebarDSC, containing 2,125 rebar images annotated with 350,348 bounding boxes in total (an average of 164.9 annotations per image), to evaluate the proposed method. Extensive experiments are conducted on both the RebarDSC dataset and the large public SKU-110K dataset. Experimental results corroborate that the proposed method improves object detection performance in densely packed scenes, consistently outperforming state-of-the-art approaches. The dataset is available at https://github.com/Armin1337/RebarDSC.
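A simplified sketch of what an IoU-aware consistency term between matched teacher and student predictions might look like is given below; the one-to-one matching, the confidence weighting, and the threshold are assumptions for illustration, and the paper's proposal consistency loss is omitted.

```python
import torch

def box_iou(a, b):
    # a, b: (N, 4) boxes in (x1, y1, x2, y2) format, already matched one-to-one.
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-6)

def iou_aware_consistency(student_boxes, teacher_boxes,
                          student_scores, teacher_scores, iou_thresh=0.5):
    # The IoU between each teacher/student prediction pair both drives the regression
    # term (1 - IoU) and, via the threshold and the teacher confidence, reweights the
    # pair so that low-confidence or poorly overlapping pairs contribute less.
    iou = box_iou(student_boxes, teacher_boxes)
    weight = teacher_scores.detach() * (iou > iou_thresh).float()
    reg_term = 1.0 - iou                                        # pull boxes together
    cls_term = (student_scores - teacher_scores.detach()) ** 2  # match confidences
    return (weight * (reg_term + cls_term)).sum() / weight.sum().clamp(min=1.0)
```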


2021
Author(s): Thrivikram GL, Vidya Ganesh, T V Sethuraman, Satheesh K. Perepu

2021, Vol 2021, pp. 1-11
Author(s): Mingyong Li, Qiqi Li, Lirong Tang, Shuang Peng, Yan Ma, ...

Cross-modal hashing encodes heterogeneous multimedia data into compact binary codes to achieve fast and flexible retrieval across different modalities. Due to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires extensive manual annotation of the data. In contrast, unsupervised deep hashing struggles to achieve satisfactory performance due to the lack of reliable supervisory information. To address this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which reconstructs a similarity matrix from the hidden correlation information of a pretrained unsupervised teacher model and uses the reconstructed matrix to guide a supervised student model. Specifically, the teacher model first adopts an unsupervised semantic-alignment hashing method to construct a modality-fusion similarity matrix. Then, supervised by the teacher model's distilled information, the student model generates more discriminative hash codes. Experimental results on two extensive benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that, compared to several representative unsupervised cross-modal hashing methods, the mean average precision (MAP) of the proposed method achieves a significant improvement, reflecting its effectiveness in large-scale cross-modal data retrieval.
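The distillation idea, fusing similarity matrices computed from the teacher's image and text representations and using the fused matrix to supervise the student's relaxed hash codes, can be sketched as follows; the fusion weight, the tanh relaxation, and the MSE objective are illustrative assumptions rather than the exact SAKDH formulation.

```python
import torch
import torch.nn.functional as F

def fused_teacher_similarity(img_feats, txt_feats, alpha=0.5):
    # Cosine-similarity matrices from the pretrained teacher's image and text
    # representations, fused into a single cross-modal supervision matrix.
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    return alpha * (img @ img.t()) + (1 - alpha) * (txt @ txt.t())

def distillation_loss(student_img_out, student_txt_out, teacher_sim):
    # The student's relaxed hash codes (tanh outputs) should reproduce the teacher's
    # fused similarity structure across modalities.
    b_img = torch.tanh(student_img_out)
    b_txt = torch.tanh(student_txt_out)
    s_student = (b_img @ b_txt.t()) / b_img.size(1)  # scaled inner product in [-1, 1]
    return F.mse_loss(s_student, teacher_sim)
```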


2021, pp. 102-110
Author(s): Dominic Scott, R. Edward Freeman

For Plato, the model of the leader as shepherd suggests protectiveness and care. But he also realizes that it could be used to make the opposite point: the leader only appears to protect the flock; his ulterior motive is to exploit them for profit. As a result, the model divides into two: the good shepherd, who cares only for the flock, and the bad one, who seeks to exploit them, like a tyrant. Another problem is that the model elevates the leader to the level of a higher species, who merely shouts commands, quite unlike the teacher model described in the previous two chapters. Consequently, Plato later abandons the shepherd in the Statesman. The second part considers an example of the ‘bad’ shepherd, Travis Kalanick, the former CEO of Uber, whose attitude to his drivers shows similarities with the shepherd who professes to care for his flock but merely seeks to exploit them.

