Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Author(s):  
Hsing-Hung Chou ◽  
Ching-Te Chiu ◽  
Yi-Ping Liao

Deep neural networks (DNNs) have solved many tasks, including image classification, object detection, and semantic segmentation. However, a DNN model with a huge number of parameters and a high computational load is difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that can be split into three parts. First, we propose a cross-layer matrix to extract more features from the teacher model. Second, we adopt the Kullback-Leibler (KL) divergence in an offline environment to help the student model find a wider, more robust minimum. Finally, we propose an offline ensemble of pre-trained teachers to teach the student model. To handle the dimension mismatch between the teacher and student models, we adopt a $1\times 1$ convolution and two-stage knowledge distillation to relax this constraint. We conducted experiments with VGG and ResNet models on the CIFAR-100 dataset. With VGG-11 as the teacher model and VGG-6 as the student model, Top-1 accuracy increased by 3.57% with a $2.08\times$ compression rate and a $3.5\times$ computation rate. With ResNet-32 as the teacher model and ResNet-8 as the student model, Top-1 accuracy increased by 4.38% with a $6.11\times$ compression rate and a $5.27\times$ computation rate. In addition, we conducted experiments on the ImageNet $64\times 64$ dataset. With MobileNet-16 as the teacher model and MobileNet-9 as the student model, Top-1 accuracy increased by 3.98% with a $1.59\times$ compression rate and a $2.05\times$ computation rate.
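As a rough illustration of two of the ingredients above, the PyTorch sketch below pairs a $1\times 1$ convolution that projects student feature maps to the teacher's channel dimension with a temperature-softened KL divergence on the logits. The layer sizes, temperature, and loss weighting are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch only: 1x1-conv feature adapter plus softened-KL logit distillation.
# Channel counts, temperature, and loss weighting are assumed example values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Project student features to the teacher's channel count with a 1x1 conv."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)

def kd_kl_loss(student_logits, teacher_logits, tau: float = 4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau

# Example usage with random tensors standing in for real activations.
adapter = FeatureAdapter(student_channels=64, teacher_channels=128)
s_feat = torch.randn(8, 64, 16, 16)   # student feature map
t_feat = torch.randn(8, 128, 16, 16)  # teacher feature map
feat_loss = F.mse_loss(adapter(s_feat), t_feat)
logit_loss = kd_kl_loss(torch.randn(8, 100), torch.randn(8, 100))
total = feat_loss + logit_loss
```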

Author(s):  
Taehyeon Kim ◽  
Jaehoon Oh ◽  
Nak Yil Kim ◽  
Sangwook Cho ◽  
Se-Young Yun

Knowledge distillation (KD), which transfers knowledge from a cumbersome teacher model to a lightweight student model, has been investigated as a way to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model, controlled by the temperature-scaling hyperparameter τ. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching as τ increases and on label matching as τ goes to 0, and empirically show that logit matching is generally positively correlated with performance improvement. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which can be explained by the difference in the penultimate-layer representations induced by the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with small τ, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
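The contrast between the two objectives can be made concrete with a short PyTorch sketch; the function names and the temperature value are illustrative, and the authors' reference implementation is available at the repository linked above.

```python
# Sketch contrasting the temperature-scaled KL divergence with the plain MSE
# between logit vectors. Batch size, class count, and tau are example values.
import torch
import torch.nn.functional as F

def kd_kl(student_logits, teacher_logits, tau=4.0):
    # Softening with tau and scaling by tau^2 keeps gradient magnitudes comparable.
    return F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau ** 2

def kd_mse(student_logits, teacher_logits):
    # Direct logit matching: the student regresses the teacher's logit vector.
    return F.mse_loss(student_logits, teacher_logits)

s = torch.randn(32, 10, requires_grad=True)
t = torch.randn(32, 10)
print(kd_kl(s, t, tau=20.0).item(), kd_mse(s, t).item())
```

As the abstract notes, with large τ the KL objective behaves more and more like logit matching, which the MSE loss performs directly.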


2020 ◽  
Vol 34 (04) ◽  
pp. 3430-3437
Author(s):  
Defang Chen ◽  
Jian-Ping Mei ◽  
Can Wang ◽  
Yan Feng ◽  
Chun Chen

Distillation is an effective knowledge-transfer technique that uses the predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained, high-capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets provide a good recipe for teacher-free distillation, group members are quickly homogenized by simple aggregation functions, leading to prematurely saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights, generated with an attention-based mechanism, to derive its own targets from the predictions of the other auxiliary peers. Learning from distinct target distributions helps boost peer diversity and thus the effectiveness of group-based distillation. The second-level distillation then transfers the knowledge of the ensemble of auxiliary peers to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently outperforms state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.
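The first-level, attention-weighted aggregation of peer predictions can be sketched as follows. The projection dimensions, tensor shapes, and the use of separate feature and logit inputs are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of attention-based aggregation: each peer derives its own soft targets
# as a weighted mixture of the peers' predicted distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeerAggregator(nn.Module):
    def __init__(self, feat_dim: int, att_dim: int = 64):
        super().__init__()
        self.query = nn.Linear(feat_dim, att_dim)
        self.key = nn.Linear(feat_dim, att_dim)

    def forward(self, peer_feats: torch.Tensor, peer_logits: torch.Tensor):
        # peer_feats: (P, B, feat_dim), peer_logits: (P, B, classes)
        q = self.query(peer_feats)                   # (P, B, A)
        k = self.key(peer_feats)                     # (P, B, A)
        scores = torch.einsum("pba,qba->bpq", q, k)  # per-sample peer-to-peer scores
        weights = F.softmax(scores, dim=-1)          # each peer's weights over peers
        probs = F.softmax(peer_logits, dim=-1)       # (P, B, C)
        targets = torch.einsum("bpq,qbc->bpc", weights, probs)
        return targets.permute(1, 0, 2)              # (P, B, C): one target set per peer

agg = PeerAggregator(feat_dim=128)
feats = torch.randn(3, 16, 128)   # 3 auxiliary peers, batch of 16
logits = torch.randn(3, 16, 10)
targets = agg(feats, logits)      # distinct soft targets for each auxiliary peer
```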


2021 ◽  
Vol 11 (9) ◽  
pp. 3921
Author(s):  
Paloma Carrasco ◽  
Francisco Cuesta ◽  
Rafael Caballero ◽  
Francisco J. Perez-Grau ◽  
Antidio Viguria

The use of unmanned aerial robots has increased exponentially in recent years, and industrial applications in environments with degraded satellite signals are becoming increasingly relevant. This article presents a solution for the 3D localization of aerial robots in such environments. To truly exploit these versatile platforms for added-value use cases in these scenarios, a high level of reliability is required. Hence, the proposed solution is based on a probabilistic approach that uses a 3D laser scanner, radio sensors, a previously built map of the environment, and input odometry to obtain pose estimates computed onboard the aerial platform. Experimental results show the feasibility of the approach in terms of accuracy, robustness and computational efficiency.
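For readers unfamiliar with this family of methods, the sketch below shows a generic Monte Carlo (particle filter) localization loop that fuses odometry with laser and radio likelihoods against a prior map. All motion and sensor models here are simplified placeholders and do not reflect the authors' implementation.

```python
# Generic particle-filter localization skeleton: predict with odometry,
# weight by sensor likelihoods, resample. Models are toy placeholders.
import numpy as np

def predict(particles, odometry_delta, motion_noise=0.05):
    # Propagate each particle (x, y, z, yaw) with the odometry increment plus noise.
    return particles + odometry_delta + np.random.normal(0, motion_noise, particles.shape)

def update(particles, weights, laser_likelihood, radio_likelihood):
    # Combine independent sensor likelihoods and renormalize.
    weights = weights * laser_likelihood(particles) * radio_likelihood(particles)
    weights += 1e-300  # avoid numerical collapse when all likelihoods are tiny
    return weights / weights.sum()

def resample(particles, weights):
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy run: 500 particles in (x, y, z, yaw) with dummy likelihood models.
particles = np.random.uniform(-1, 1, (500, 4))
weights = np.full(500, 1.0 / 500)
laser = lambda p: np.exp(-np.linalg.norm(p[:, :3], axis=1))
radio = lambda p: np.exp(-np.abs(p[:, 2]))
particles = predict(particles, np.array([0.1, 0.0, 0.05, 0.01]))
weights = update(particles, weights, laser, radio)
particles, weights = resample(particles, weights)
```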


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2595
Author(s):  
Balakrishnan Ramalingam ◽  
Abdullah Aamir Hayat ◽  
Mohan Rajesh Elara ◽  
Braulio Félix Gómez ◽  
Lim Yi ◽  
...  

The pavement inspection task, which mainly includes crack and garbage detection, is essential and carried out frequently. Instead of relying on humans or dedicated inspection systems, the inspection can be conveniently carried out by integrating it with pavement-sweeping machines. This work proposes a deep learning-based pavement inspection framework for the self-reconfigurable pavement-sweeping robot Panthera. The semantic segmentation framework SegNet is adopted to separate the pavement region from other objects. Deep Convolutional Neural Network (DCNN) based object detection is used to detect and localize pavement defects and garbage. Furthermore, a Mobile Mapping System (MMS) is used to geotag the detected defects. The proposed system was implemented and tested on the Panthera robot equipped with NVIDIA GPU hardware. Experimental results on crack and garbage detection show that the proposed technique identifies pavement defects and garbage with high accuracy and is suitable for real-time deployment in garbage detection and, eventually, in sweeping or cleaning tasks.
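The overall flow of such a two-stage pipeline can be sketched as follows; `inspect_frame`, the dummy segmentation and detection stand-ins, and the pavement class id are hypothetical placeholders, not the Panthera software stack.

```python
# Sketch of a segment-then-detect-then-geotag inspection pipeline.
import numpy as np

PAVEMENT_CLASS = 1  # assumed class id for the pavement label

def inspect_frame(frame, segment_fn, detect_fn, gps_reading):
    """Segment the pavement, detect defects/garbage on it, and geotag the results."""
    mask = segment_fn(frame)                               # (H, W) array of class ids
    pavement_only = np.where(mask[..., None] == PAVEMENT_CLASS, frame, 0)
    detections = detect_fn(pavement_only)                  # list of (label, box, score)
    return [(label, box, score, gps_reading) for label, box, score in detections]

# Toy stand-ins for the segmentation and detection models.
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
dummy_segment = lambda img: np.ones(img.shape[:2], dtype=np.int64)
dummy_detect = lambda img: [("crack", (10, 20, 60, 80), 0.9)]
print(inspect_frame(frame, dummy_segment, dummy_detect, (1.3521, 103.8198)))
```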


Author(s):  
Julie Roux ◽  
Katell Morin-Allory ◽  
Vincent Beroulle ◽  
Regis Leveugle ◽  
Lilian Bossuet ◽  
...  

Author(s):  
F. Politz ◽  
M. Sester

Over the past years, the algorithms for dense image matching (DIM) used to obtain point clouds from aerial images have improved significantly. Consequently, DIM point clouds are now a good alternative to the established Airborne Laser Scanning (ALS) point clouds for remote sensing applications. In order to derive high-level products such as digital terrain models or city models, each point within a point cloud must be assigned a class label. Usually, ALS and DIM point clouds are labelled with different classifiers due to their differing characteristics. In this work, we explore both point cloud types in a fully convolutional encoder-decoder network that learns to classify ALS as well as DIM point clouds. As input, we project the point clouds onto a 2D image raster plane and calculate the minimal, average and maximal height values for each raster cell. The network then differentiates between the classes ground, non-ground, building and no data. We test our network in six training setups using only one point cloud type, both point clouds, as well as several transfer-learning approaches. We quantitatively and qualitatively compare all results and discuss the advantages and disadvantages of all setups. The best network achieves an overall accuracy of 96% on an ALS and 83% on a DIM test set.
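The input encoding can be illustrated with a small NumPy sketch that bins points into a raster and stores the minimal, average, and maximal height per cell; the grid size, cell resolution, and no-data value are arbitrary example choices.

```python
# Sketch: project a point cloud onto a 2D raster of min/avg/max heights.
import numpy as np

def rasterize(points, cell_size=1.0, grid=(64, 64), no_data=-1.0):
    """points: (N, 3) array of x, y, z. Returns an (H, W, 3) min/avg/max height raster."""
    h, w = grid
    ix = np.clip((points[:, 0] / cell_size).astype(int), 0, w - 1)
    iy = np.clip((points[:, 1] / cell_size).astype(int), 0, h - 1)
    z = points[:, 2]

    counts = np.zeros((h, w))
    sums = np.zeros((h, w))
    mins = np.full((h, w), np.inf)
    maxs = np.full((h, w), -np.inf)
    np.add.at(counts, (iy, ix), 1)
    np.add.at(sums, (iy, ix), z)
    np.minimum.at(mins, (iy, ix), z)
    np.maximum.at(maxs, (iy, ix), z)

    raster = np.full((h, w, 3), no_data, dtype=np.float64)
    occupied = counts > 0                       # empty cells keep the no-data value
    raster[occupied, 0] = mins[occupied]
    raster[occupied, 1] = sums[occupied] / counts[occupied]
    raster[occupied, 2] = maxs[occupied]
    return raster

cloud = np.random.rand(10000, 3) * [64, 64, 30]   # synthetic ALS/DIM-like points
channels = rasterize(cloud)
print(channels.shape)   # (64, 64, 3): min, average, max height per cell
```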


Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, so exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While these hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration into reinforcement learning. Our curiosity signal is driven by a fast reward that handles local exploration and a slow reward that incentivizes long-horizon exploration strategies. We formulate curiosity as the error in an agent's ability to reconstruct observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
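A minimal sketch of such a reconstruction-based curiosity bonus is given below. Realizing the fast/slow split as two separately updated reconstructors, the network sizes, and the mixing weight are assumptions for illustration, not the authors' exact formulation.

```python
# Sketch: intrinsic reward from the error of reconstructing an observation
# from its context, mixing a fast and a slow reconstruction model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextReconstructor(nn.Module):
    def __init__(self, obs_dim: int, ctx_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(), nn.Linear(hidden, obs_dim)
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        return self.net(context)

def intrinsic_reward(obs, context, fast_model, slow_model, beta=0.5):
    # Reconstruction errors act as curiosity: the frequently updated fast model
    # drives local exploration, the slowly updated model rewards long-horizon novelty.
    with torch.no_grad():
        r_fast = F.mse_loss(fast_model(context), obs, reduction="none").mean(dim=-1)
        r_slow = F.mse_loss(slow_model(context), obs, reduction="none").mean(dim=-1)
    return beta * r_fast + (1.0 - beta) * r_slow

fast = ContextReconstructor(obs_dim=32, ctx_dim=64)
slow = ContextReconstructor(obs_dim=32, ctx_dim=64)
obs, ctx = torch.randn(16, 32), torch.randn(16, 64)
bonus = intrinsic_reward(obs, ctx, fast, slow)   # added to the extrinsic reward
```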


2014 ◽  
Vol 608 ◽  
pp. 253-258 ◽  
Author(s):  
Priawthida Jantharat ◽  
Ryan C. McCuiston ◽  
Chaiwut Gamonpilas ◽  
Sujarinee Kochawattana

The ballistic performance of transparent armor has been continuously improved for security applications. Generally, the ballistic performance of laminated glass increases with its thickness and weight, whereas users prefer a high level of ballistic protection in a thin, lightweight body. In this study, the fabrication of lightweight glass-PVB transparent armor with Level 3 protection according to the National Institute of Justice (NIJ) standard was attempted. The ballistic performance of various configurations of glass-PVB laminates was determined against 7.62 mm ammunition. Results from fragmentation analysis indicated the influence of the glass sheet arrangement in the armor structure on the ballistic damage. The minimum required thickness of the front-face layer was also discussed. To verify the experimental results, finite element analysis was performed on all laminate systems; the computational results were in reasonable agreement with the experiments.

