Knowledge distillation in deep learning and its applications

2021 ◽  
Vol 7 ◽  
pp. e474
Author(s):  
Abdolmaged Alkhulaifi ◽  
Fahad Alsahli ◽  
Irfan Ahmad

Deep learning-based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation, whereby a smaller model (the student model) is trained using information from a larger model (the teacher model). In this paper, we present an overview of knowledge distillation techniques applied to deep learning models. To compare the performance of different techniques, we propose a new metric, called the distillation metric, which compares knowledge distillation solutions based on their model sizes and accuracy scores. Based on the survey, some interesting conclusions are drawn and presented in this paper, including current challenges and possible research directions.
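The abstract describes the distillation metric only as a function of model sizes and accuracy scores. A minimal sketch of one plausible form, where `alpha` is an assumed trade-off weight and lower scores are better (this is an illustration, not necessarily the authors' exact formula):

```python
def distillation_score(student_size, teacher_size, student_acc, teacher_acc, alpha=0.5):
    """Illustrative distillation metric: lower is better.

    Balances relative model size against relative accuracy loss;
    `alpha` (an assumption here) weights size reduction vs. accuracy retention.
    """
    size_ratio = student_size / teacher_size   # smaller student -> lower score
    acc_ratio = student_acc / teacher_acc      # fraction of teacher accuracy retained
    return alpha * size_ratio + (1 - alpha) * (1 - acc_ratio)
```

Under this form, a student one-tenth the teacher's size that retains most of its accuracy scores far better than an undistilled copy of the teacher.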

Mathematics ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. 1652
Author(s):  
Jaeyong Kang ◽  
Jeonghwan Gwak

In recent years, deep learning models have been used successfully in almost every field, in both industry and academia, especially for computer vision tasks. However, these models are huge, with millions (and even billions) of parameters, and thus cannot be deployed on systems and devices with limited resources (e.g., embedded systems and mobile phones). To tackle this, several techniques for model compression and acceleration have been proposed. As a representative example, knowledge distillation offers a way to effectively learn a small student model from large teacher model(s), and it has attracted increasing attention since it showed promising performance. In this work, we propose an ensemble model that combines feature-based, response-based, and relation-based lightweight knowledge distillation models for simple image classification tasks. In our knowledge distillation framework, we use ResNet-20 as the student network and ResNet-110 as the teacher network. Experimental results demonstrate that our proposed ensemble model outperforms other knowledge distillation models, as well as the large teacher model, on image classification tasks, while requiring less computational power than the teacher model.
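Of the three distillation signals combined in the ensemble, relation-based distillation is the least self-explanatory: instead of matching outputs or features directly, it matches the pairwise structure among samples. A toy, simplified sketch (the function names and the squared-difference penalty are assumptions, not the paper's exact formulation):

```python
import math

def pairwise_distances(embeddings):
    # Euclidean distance between every pair of sample embeddings.
    n = len(embeddings)
    return [[math.dist(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]

def relation_kd_loss(teacher_embs, student_embs):
    # Relation-based distillation: penalize differences between the
    # teacher's and student's pairwise-distance structures.
    T = pairwise_distances(teacher_embs)
    S = pairwise_distances(student_embs)
    n = len(T)
    return sum((T[i][j] - S[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

The loss is zero only when the student reproduces the teacher's relational geometry, even if the absolute embeddings differ.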


2019 ◽  
Vol 155 ◽  
pp. 177-184 ◽  
Author(s):  
Hasan Can Volaka ◽  
Gulfem Alptekin ◽  
Okan Engin Basar ◽  
Mustafa Isbilen ◽  
Ozlem Durmaz Incel

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Diehao Kong ◽  
Xuefeng Yan

Autoencoders are used for fault diagnosis in chemical engineering. To improve their performance, experts have paid close attention to regularization strategies and the creation of new, effective cost functions. However, existing methods modify only a single model. This study provides a new perspective for strengthening the fault diagnosis model: it attempts to gain useful information from one model (the teacher model) and apply it to a new model (the student model). It pretrains the teacher model by fitting ground-truth labels and then uses a sample-wise strategy to transfer knowledge from the teacher model. Finally, the knowledge and the ground-truth labels are used to train the student model, which is identical to the teacher model in structure. The current student model then serves as the teacher of the next student model. After step-by-step teacher-student reconfiguration and training, the optimal model is selected for fault diagnosis. In addition, knowledge distillation is applied throughout the training procedure. The proposed method is applied to several benchmark problems to prove its effectiveness.
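The step-by-step teacher-student reconfiguration described above can be sketched as follows. This is a hypothetical outline: `train`, `evaluate`, and `clone_architecture` are placeholder callables standing in for the paper's actual training procedure.

```python
def sequential_distillation(initial_model, data, labels, rounds=3,
                            train=None, evaluate=None, clone_architecture=None):
    # Pretrain the first teacher on ground-truth labels alone.
    teacher = train(initial_model, data, labels, soft_targets=None)
    best, best_score = teacher, evaluate(teacher, data, labels)
    for _ in range(rounds):
        soft = teacher.predict(data)            # sample-wise knowledge from the teacher
        student = clone_architecture(teacher)   # same structure as the teacher
        # Train on ground-truth labels plus the teacher's transferred knowledge.
        student = train(student, data, labels, soft_targets=soft)
        score = evaluate(student, data, labels)
        if score > best_score:                  # keep the best model seen so far
            best, best_score = student, score
        teacher = student                       # the student becomes the next teacher
    return best
```

Selecting the best-scoring model across rounds mirrors the paper's final step of picking the optimal model for fault diagnosis.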


2021 ◽  
Author(s):  
Roberto Bentivoglio ◽  
Elvin Isufi ◽  
Sebastian Nicolaas Jonkman ◽  
Riccardo Taormina

Abstract. Deep learning techniques have been increasingly used in flood risk management to overcome the limitations of accurate, yet slow, numerical models and to improve the results of traditional methods for flood mapping. In this paper, we review 45 recent publications to outline the state of the art of the field, identify knowledge gaps, and propose future research directions. The review focuses on the type of deep learning models used for various flood mapping applications, the flood types considered, the spatial scale of the studied events, and the data used for model development. The results show that models based on convolutional layers are usually more accurate, as they leverage inductive biases to better process the spatial characteristics of flooding events. Traditional models based on fully-connected layers, instead, provide accurate results when coupled with other statistical models. Deep learning models showed increased accuracy compared to traditional approaches and increased speed compared to numerical methods. While several applications exist in flood susceptibility, inundation, and hazard mapping, more work is needed to understand how deep learning can assist real-time flood warning during an emergency and how it can be employed to estimate flood risk. A major challenge lies in developing deep learning models that can generalize to unseen case studies and sites. Furthermore, all reviewed models and their outputs are deterministic, with limited consideration of uncertainties in outcomes and probabilistic predictions. The authors argue that these identified gaps can be addressed by exploiting recent fundamental advancements in deep learning or by taking inspiration from developments in other applied areas.
Models based on graph neural networks and neural operators can work with arbitrarily structured data and thus should be capable of generalizing across different case studies and could account for complex interactions with the natural and built environment. Neural operators can also speed up numerical models while preserving the underlying physical equations and could thus be used for reliable real-time warning. Similarly, probabilistic models can be built by resorting to Deep Gaussian Processes.


2021 ◽  
Author(s):  
Daniel Padilla ◽  
Hatem A. Rashwan ◽  
Domènec Savi Puig

Deep learning (DL) networks have proven crucial in commercial solutions to computer vision challenges due to their ability to extract high-level abstractions from image data and to be easily adapted to many applications. As a result, DL methodologies have become a de facto standard for computer vision problems, yielding many new kinds of research, approaches and applications. Recently, the commercial sector has also been driving the use of embedded systems able to execute DL models, which has caused an important change in the DL panorama and in embedded systems themselves. Consequently, in this paper, we study the state of the art of embedded systems capable of running DL techniques, such as GPUs, FPGAs and mobile SoCs, to bring stakeholders up to date with the new systems available on the market. In addition, we aim to help them determine which of these systems can be beneficial and suitable for their applications in terms of upgradeability, price, deployment and performance.


2021 ◽  
Author(s):  
Atiq Rehman ◽  
Samir Brahim Belhaouari

Video classification has seen significant success in recent years. Specifically, the topic has gained more attention after the emergence of deep learning models as a successful tool for automatically classifying videos. In recognition of the importance of the video classification task, and to summarize the success of deep learning models on this task, this paper presents a comprehensive and concise review of the topic. A number of reviews and survey papers related to video classification exist in the scientific literature; however, they are either outdated, and therefore do not include recent state-of-the-art works, or have other limitations. To provide an updated and concise review, this paper highlights the key findings of existing deep learning models and discusses them in a way that suggests future research directions. The review focuses mainly on the type of network architecture used, the evaluation criteria used to measure success, and the datasets used. To make the review self-contained, the emergence of deep learning methods for automatic video classification and the state-of-the-art deep learning methods are explained and summarized. Moreover, a clear comparison of the newly developed deep learning architectures with traditional approaches is provided, and the critical challenges posed by the benchmarks for evaluating the technical progress of these methods are highlighted. The paper also summarizes the benchmark datasets and performance evaluation metrics for video classification. Based on this compact, complete, and concise review, the paper proposes new research directions to solve the challenging video classification problem.


2021 ◽  
Vol 68 (1) ◽  
Author(s):  
Dina Tantawy ◽  
Mohamed Zahran ◽  
Amr Wassal

Abstract. Since their invention, generative adversarial networks (GANs) have shown outstanding results in many applications. GANs are powerful, yet resource-hungry, deep learning models. The main differences between GANs and ordinary deep learning models are the nature of their output and their training instability: for example, a GAN's output can be a whole image, whereas other models detect objects in or classify images. Thus, the architecture and numeric precision of the network affect both the quality and the speed of the solution, and accelerating GANs is pivotal. Data transfer is considered the main source of energy consumption, which is why memory compression is a very efficient technique for accelerating and optimizing GANs. Two main types of memory compression exist: lossless and lossy. Lossless compression techniques are general across all models; we therefore focus in this paper on lossy techniques, which are further classified into (a) pruning, (b) knowledge distillation, (c) low-rank factorization, (d) lowering numeric precision, and (e) encoding. In this paper, we survey lossy compression techniques for CNN-based GANs. Our findings show the superiority of knowledge distillation over pruning alone, and identify gaps in the research field that need to be explored, such as encoding and different combinations of compression techniques.
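Of the lossy techniques surveyed, unstructured magnitude pruning is the simplest to illustrate. A toy sketch on a flat weight list (real pruning operates per-layer on tensors, often with fine-tuning afterwards):

```python
def magnitude_prune(weights, sparsity=0.5):
    # Unstructured magnitude pruning: zero out the smallest-magnitude
    # fraction `sparsity` of weights, keeping the rest unchanged.
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

Zeroed weights can then be stored sparsely or skipped at inference time, which is where the memory and energy savings come from.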


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Mingyong Li ◽  
Qiqi Li ◽  
Lirong Tang ◽  
Shuang Peng ◽  
Yan Ma ◽  
...  

Cross-modal hashing encodes heterogeneous multimedia data into compact binary codes to achieve fast and flexible retrieval across different modalities. Due to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires a great deal of manual data annotation. In contrast, unsupervised deep hashing struggles to achieve satisfactory performance due to the lack of reliable supervisory information. To solve this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which reconstructs the similarity matrix using the hidden correlation information of a pretrained unsupervised teacher model; the reconstructed similarity matrix can then be used to guide the supervised student model. Specifically, the teacher model first adopts an unsupervised semantic-alignment hashing method to construct a modal-fusion similarity matrix. Then, under the supervision of the teacher model's distilled information, the student model can generate more discriminative hash codes. Experimental results on two extensive benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that, compared to several representative unsupervised cross-modal hashing methods, the mean average precision (MAP) of our proposed method achieves a significant improvement, fully reflecting its effectiveness in large-scale cross-modal data retrieval.
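The modal-fusion similarity matrix can be pictured as combining intra-modal similarities from the two modalities. A hypothetical sketch (the equal-weight cosine fusion here is an assumption for illustration, not SAKDH's actual construction):

```python
import math

def cosine_sim_matrix(feats):
    # Pairwise cosine similarity between feature vectors of one modality.
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return num / (na * nb)
    return [[cos(a, b) for b in feats] for a in feats]

def fused_similarity(img_feats, txt_feats, weight=0.5):
    # Hypothetical modal fusion: weighted sum of the image-modality and
    # text-modality similarity matrices over the same set of samples.
    S_img = cosine_sim_matrix(img_feats)
    S_txt = cosine_sim_matrix(txt_feats)
    n = len(img_feats)
    return [[weight * S_img[i][j] + (1 - weight) * S_txt[i][j]
             for j in range(n)] for i in range(n)]
```

A matrix of this shape is what the teacher hands to the student as a stand-in for missing supervisory labels.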


Author(s):  
Taehyeon Kim ◽  
Jaehoon Oh ◽  
Nak Yil Kim ◽  
Sangwook Cho ◽  
Se-Young Yun

Knowledge distillation (KD), transferring knowledge from a cumbersome teacher model to a lightweight student model, has been investigated as a way to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model, with the temperature-scaling hyperparameter τ. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching as τ increases and on label matching as τ goes to 0, and empirically show that logit matching is, in general, positively correlated with performance improvement. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which we explain by the difference in penultimate-layer representations under the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with small τ, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
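The two losses contrasted in the abstract can be written down directly. A minimal sketch of the standard temperature-scaled KL objective (with the conventional τ² scaling) alongside the logit-MSE alternative:

```python
import math

def softened(logits, tau):
    # Temperature-scaled softmax: larger tau flattens the distribution.
    exps = [math.exp(z / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl(teacher_logits, student_logits, tau=4.0):
    # KL divergence between softened teacher and student outputs,
    # scaled by tau^2 to keep gradient magnitudes comparable across tau.
    p = softened(teacher_logits, tau)
    q = softened(student_logits, tau)
    return tau ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kd_mse(teacher_logits, student_logits):
    # Direct logit matching: mean squared error between raw logit vectors.
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n
```

As τ grows, the gradient of `kd_kl` approaches that of logit matching, which is exactly what `kd_mse` optimizes; as τ goes to 0, the softened targets collapse toward hard labels.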

