Cross-layer knowledge distillation with KL divergence and offline ensemble for compressing deep neural network

Author(s):  
Hsing-Hung Chou ◽  
Ching-Te Chiu ◽  
Yi-Ping Liao

Deep neural networks (DNNs) have solved many tasks, including image classification, object detection, and semantic segmentation. However, a DNN model with a huge number of parameters and a high computational load is difficult to deploy on mobile devices. To address this difficulty, we propose an efficient compression method that can be split into three parts. First, we propose a cross-layer matrix to extract more features from the teacher model. Second, we adopt the Kullback-Leibler (KL) divergence in an offline environment to help the student model find a wider, more robust minimum. Finally, we propose an offline ensemble of pre-trained teachers to teach the student model. To handle the dimension mismatch between the teacher and student models, we adopt a $1\times 1$ convolution and two-stage knowledge distillation to relax this constraint. We conducted experiments with VGG and ResNet models on the CIFAR-100 dataset. With VGG-11 as the teacher model and VGG-6 as the student model, Top-1 accuracy increased by 3.57% with a $2.08\times$ compression rate and a $3.5\times$ computation rate. With ResNet-32 as the teacher model and ResNet-8 as the student model, Top-1 accuracy increased by 4.38% with a $6.11\times$ compression rate and a $5.27\times$ computation rate. In addition, we conducted experiments on the ImageNet $64\times 64$ dataset. With MobileNet-16 as the teacher model and MobileNet-9 as the student model, Top-1 accuracy increased by 3.98% with a $1.59\times$ compression rate and a $2.05\times$ computation rate.
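As a rough illustration of two of the ingredients above, the PyTorch sketch below pairs a $1\times 1$ convolution that projects student feature maps to the teacher's channel dimension with a temperature-softened KL divergence on the logits. The layer sizes, temperature, and loss weighting are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch only: 1x1-conv feature adapter plus softened-KL logit distillation.
# Channel counts, temperature, and loss weighting are assumed example values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Project student features to the teacher's channel count with a 1x1 conv."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)

def kd_kl_loss(student_logits, teacher_logits, tau: float = 4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau

# Example usage with random tensors standing in for real activations.
adapter = FeatureAdapter(student_channels=64, teacher_channels=128)
s_feat = torch.randn(8, 64, 16, 16)   # student feature map
t_feat = torch.randn(8, 128, 16, 16)  # teacher feature map
feat_loss = F.mse_loss(adapter(s_feat), t_feat)
logit_loss = kd_kl_loss(torch.randn(8, 100), torch.randn(8, 100))
total = feat_loss + logit_loss
```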

Author(s):  
Taehyeon Kim ◽  
Jaehoon Oh ◽  
Nak Yil Kim ◽  
Sangwook Cho ◽  
Se-Young Yun

Knowledge distillation (KD), which transfers knowledge from a cumbersome teacher model to a lightweight student model, has been investigated as a way to design efficient neural architectures. Generally, the objective function of KD is the Kullback-Leibler (KL) divergence loss between the softened probability distributions of the teacher model and the student model, controlled by the temperature-scaling hyperparameter τ. Despite its widespread use, few studies have discussed how such softening influences generalization. Here, we theoretically show that the KL divergence loss focuses on logit matching as τ increases and on label matching as τ goes to 0, and empirically show that logit matching is generally positively correlated with performance improvement. From this observation, we consider an intuitive KD loss function, the mean squared error (MSE) between the logit vectors, so that the student model can directly learn the logits of the teacher model. The MSE loss outperforms the KL divergence loss, which can be explained by the difference in the penultimate-layer representations induced by the two losses. Furthermore, we show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with small τ, mitigates label noise. The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
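The contrast between the two objectives can be made concrete with a short PyTorch sketch; the function names and the temperature value are illustrative, and the authors' reference implementation is available at the repository linked above.

```python
# Sketch contrasting the temperature-scaled KL divergence with the plain MSE
# between logit vectors. Batch size, class count, and tau are example values.
import torch
import torch.nn.functional as F

def kd_kl(student_logits, teacher_logits, tau=4.0):
    # Softening with tau and scaling by tau^2 keeps gradient magnitudes comparable.
    return F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau ** 2

def kd_mse(student_logits, teacher_logits):
    # Direct logit matching: the student regresses the teacher's logit vector.
    return F.mse_loss(student_logits, teacher_logits)

s = torch.randn(32, 10, requires_grad=True)
t = torch.randn(32, 10)
print(kd_kl(s, t, tau=20.0).item(), kd_mse(s, t).item())
```

As the abstract notes, with large τ the KL objective behaves more and more like logit matching, which the MSE loss performs directly.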


2020 ◽  
Vol 34 (04) ◽  
pp. 3430-3437
Author(s):  
Defang Chen ◽  
Jian-Ping Mei ◽  
Can Wang ◽  
Yan Feng ◽  
Chun Chen

Distillation is an effective knowledge-transfer technique that uses the predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained, high-capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets provide a good recipe for teacher-free distillation, group members are quickly homogenized by simple aggregation functions, leading to prematurely saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights, generated with an attention-based mechanism, to derive its own targets from the predictions of the other auxiliary peers. Learning from distinct target distributions helps boost peer diversity and thus the effectiveness of group-based distillation. The second-level distillation then transfers the knowledge of the ensemble of auxiliary peers to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently outperforms state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.
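The first-level, attention-weighted aggregation of peer predictions can be sketched as follows. The projection dimensions, tensor shapes, and the use of separate feature and logit inputs are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of attention-based aggregation: each peer derives its own soft targets
# as a weighted mixture of the peers' predicted distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeerAggregator(nn.Module):
    def __init__(self, feat_dim: int, att_dim: int = 64):
        super().__init__()
        self.query = nn.Linear(feat_dim, att_dim)
        self.key = nn.Linear(feat_dim, att_dim)

    def forward(self, peer_feats: torch.Tensor, peer_logits: torch.Tensor):
        # peer_feats: (P, B, feat_dim), peer_logits: (P, B, classes)
        q = self.query(peer_feats)                   # (P, B, A)
        k = self.key(peer_feats)                     # (P, B, A)
        scores = torch.einsum("pba,qba->bpq", q, k)  # per-sample peer-to-peer scores
        weights = F.softmax(scores, dim=-1)          # each peer's weights over peers
        probs = F.softmax(peer_logits, dim=-1)       # (P, B, C)
        targets = torch.einsum("bpq,qbc->bpc", weights, probs)
        return targets.permute(1, 0, 2)              # (P, B, C): one target set per peer

agg = PeerAggregator(feat_dim=128)
feats = torch.randn(3, 16, 128)   # 3 auxiliary peers, batch of 16
logits = torch.randn(3, 16, 10)
targets = agg(feats, logits)      # distinct soft targets for each auxiliary peer
```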


2021 ◽  
Vol 11 (9) ◽  
pp. 3921
Author(s):  
Paloma Carrasco ◽  
Francisco Cuesta ◽  
Rafael Caballero ◽  
Francisco J. Perez-Grau ◽  
Antidio Viguria

The use of unmanned aerial robots has increased exponentially in recent years, and industrial applications in environments with degraded satellite signals are becoming increasingly relevant. This article presents a solution for the 3D localization of aerial robots in such environments. To truly exploit these versatile platforms for added-value use cases in these scenarios, a high level of reliability is required. Hence, the proposed solution is based on a probabilistic approach that uses a 3D laser scanner, radio sensors, a previously built map of the environment, and input odometry to obtain pose estimates computed onboard the aerial platform. Experimental results show the feasibility of the approach in terms of accuracy, robustness and computational efficiency.
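For readers unfamiliar with this family of methods, the sketch below shows a generic Monte Carlo (particle filter) localization loop that fuses odometry with laser and radio likelihoods against a prior map. All motion and sensor models here are simplified placeholders and do not reflect the authors' implementation.

```python
# Generic particle-filter localization skeleton: predict with odometry,
# weight by sensor likelihoods, resample. Models are toy placeholders.
import numpy as np

def predict(particles, odometry_delta, motion_noise=0.05):
    # Propagate each particle (x, y, z, yaw) with the odometry increment plus noise.
    return particles + odometry_delta + np.random.normal(0, motion_noise, particles.shape)

def update(particles, weights, laser_likelihood, radio_likelihood):
    # Combine independent sensor likelihoods and renormalize.
    weights = weights * laser_likelihood(particles) * radio_likelihood(particles)
    weights += 1e-300  # avoid numerical collapse when all likelihoods are tiny
    return weights / weights.sum()

def resample(particles, weights):
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy run: 500 particles in (x, y, z, yaw) with dummy likelihood models.
particles = np.random.uniform(-1, 1, (500, 4))
weights = np.full(500, 1.0 / 500)
laser = lambda p: np.exp(-np.linalg.norm(p[:, :3], axis=1))
radio = lambda p: np.exp(-np.abs(p[:, 2]))
particles = predict(particles, np.array([0.1, 0.0, 0.05, 0.01]))
weights = update(particles, weights, laser, radio)
particles, weights = resample(particles, weights)
```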


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2595
Author(s):  
Balakrishnan Ramalingam ◽  
Abdullah Aamir Hayat ◽  
Mohan Rajesh Elara ◽  
Braulio Félix Gómez ◽  
Lim Yi ◽  
...  

The pavement inspection task, which mainly includes crack and garbage detection, is essential and carried out frequently. Instead of relying on humans or dedicated inspection systems, the inspection can be conveniently carried out by integrating it with pavement-sweeping machines. This work proposes a deep learning-based pavement inspection framework for the self-reconfigurable pavement-sweeping robot Panthera. The semantic segmentation framework SegNet is adopted to separate the pavement region from other objects. Deep Convolutional Neural Network (DCNN) based object detection is used to detect and localize pavement defects and garbage. Furthermore, a Mobile Mapping System (MMS) is used to geotag the detected defects. The proposed system was implemented and tested on the Panthera robot equipped with NVIDIA GPU hardware. Experimental results on crack and garbage detection show that the proposed technique identifies pavement defects and garbage with high accuracy and is suitable for real-time deployment in garbage detection and, eventually, in sweeping or cleaning tasks.
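The overall flow of such a two-stage pipeline can be sketched as follows; `inspect_frame`, the dummy segmentation and detection stand-ins, and the pavement class id are hypothetical placeholders, not the Panthera software stack.

```python
# Sketch of a segment-then-detect-then-geotag inspection pipeline.
import numpy as np

PAVEMENT_CLASS = 1  # assumed class id for the pavement label

def inspect_frame(frame, segment_fn, detect_fn, gps_reading):
    """Segment the pavement, detect defects/garbage on it, and geotag the results."""
    mask = segment_fn(frame)                               # (H, W) array of class ids
    pavement_only = np.where(mask[..., None] == PAVEMENT_CLASS, frame, 0)
    detections = detect_fn(pavement_only)                  # list of (label, box, score)
    return [(label, box, score, gps_reading) for label, box, score in detections]

# Toy stand-ins for the segmentation and detection models.
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
dummy_segment = lambda img: np.ones(img.shape[:2], dtype=np.int64)
dummy_detect = lambda img: [("crack", (10, 20, 60, 80), 0.9)]
print(inspect_frame(frame, dummy_segment, dummy_detect, (1.3521, 103.8198)))
```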


Author(s):  
Julie Roux ◽  
Katell Morin-Allory ◽  
Vincent Beroulle ◽  
Regis Leveugle ◽  
Lilian Bossuet ◽  
...  

Author(s):  
F. Politz ◽  
M. Sester

Over the past years, the algorithms for dense image matching (DIM) used to obtain point clouds from aerial images have improved significantly. Consequently, DIM point clouds are now a good alternative to the established Airborne Laser Scanning (ALS) point clouds for remote sensing applications. In order to derive high-level products such as digital terrain models or city models, each point within a point cloud must be assigned a class label. Usually, ALS and DIM point clouds are labelled with different classifiers due to their differing characteristics. In this work, we explore both point cloud types in a fully convolutional encoder-decoder network that learns to classify ALS as well as DIM point clouds. As input, we project the point clouds onto a 2D image raster plane and calculate the minimal, average and maximal height values for each raster cell. The network then differentiates between the classes ground, non-ground, building and no data. We test our network in six training setups using only one point cloud type, both point clouds, as well as several transfer-learning approaches. We quantitatively and qualitatively compare all results and discuss the advantages and disadvantages of all setups. The best network achieves an overall accuracy of 96% on an ALS and 83% on a DIM test set.
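The input encoding can be illustrated with a small NumPy sketch that bins points into a raster and stores the minimal, average, and maximal height per cell; the grid size, cell resolution, and no-data value are arbitrary example choices.

```python
# Sketch: project a point cloud onto a 2D raster of min/avg/max heights.
import numpy as np

def rasterize(points, cell_size=1.0, grid=(64, 64), no_data=-1.0):
    """points: (N, 3) array of x, y, z. Returns an (H, W, 3) min/avg/max height raster."""
    h, w = grid
    ix = np.clip((points[:, 0] / cell_size).astype(int), 0, w - 1)
    iy = np.clip((points[:, 1] / cell_size).astype(int), 0, h - 1)
    z = points[:, 2]

    counts = np.zeros((h, w))
    sums = np.zeros((h, w))
    mins = np.full((h, w), np.inf)
    maxs = np.full((h, w), -np.inf)
    np.add.at(counts, (iy, ix), 1)
    np.add.at(sums, (iy, ix), z)
    np.minimum.at(mins, (iy, ix), z)
    np.maximum.at(maxs, (iy, ix), z)

    raster = np.full((h, w, 3), no_data, dtype=np.float64)
    occupied = counts > 0                       # empty cells keep the no-data value
    raster[occupied, 0] = mins[occupied]
    raster[occupied, 1] = sums[occupied] / counts[occupied]
    raster[occupied, 2] = maxs[occupied]
    return raster

cloud = np.random.rand(10000, 3) * [64, 64, 30]   # synthetic ALS/DIM-like points
channels = rasterize(cloud)
print(channels.shape)   # (64, 64, 3): min, average, max height per cell
```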


Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, so exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While these hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration into reinforcement learning. Our curiosity signal is driven by a fast reward that handles local exploration and a slow reward that incentivizes long-horizon exploration strategies. We formulate curiosity as the error in an agent's ability to reconstruct observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
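A minimal sketch of such a reconstruction-based curiosity bonus is given below. Realizing the fast/slow split as two separately updated reconstructors, the network sizes, and the mixing weight are assumptions for illustration, not the authors' exact formulation.

```python
# Sketch: intrinsic reward from the error of reconstructing an observation
# from its context, mixing a fast and a slow reconstruction model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextReconstructor(nn.Module):
    def __init__(self, obs_dim: int, ctx_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(), nn.Linear(hidden, obs_dim)
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        return self.net(context)

def intrinsic_reward(obs, context, fast_model, slow_model, beta=0.5):
    # Reconstruction errors act as curiosity: the frequently updated fast model
    # drives local exploration, the slowly updated model rewards long-horizon novelty.
    with torch.no_grad():
        r_fast = F.mse_loss(fast_model(context), obs, reduction="none").mean(dim=-1)
        r_slow = F.mse_loss(slow_model(context), obs, reduction="none").mean(dim=-1)
    return beta * r_fast + (1.0 - beta) * r_slow

fast = ContextReconstructor(obs_dim=32, ctx_dim=64)
slow = ContextReconstructor(obs_dim=32, ctx_dim=64)
obs, ctx = torch.randn(16, 32), torch.randn(16, 64)
bonus = intrinsic_reward(obs, ctx, fast, slow)   # added to the extrinsic reward
```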


2014 ◽  
Vol 608 ◽  
pp. 253-258 ◽  
Author(s):  
Priawthida Jantharat ◽  
Ryan C. McCuiston ◽  
Chaiwut Gamonpilas ◽  
Sujarinee Kochawattana

The ballistic performance of transparent armor has been continuously improved for security applications. Generally, the ballistic performance of laminated glass increases with its thickness and weight, whereas users prefer a high level of ballistic protection in a thin, lightweight body. In this study, the fabrication of lightweight glass-PVB transparent armor with Level 3 protection according to the National Institute of Justice (NIJ) standard was attempted. The ballistic performance of various configurations of glass-PVB laminates was determined against 7.62 mm ammunition. Results from fragmentation analysis indicated the influence of the glass sheet arrangement in the armor structure on the ballistic damage. The minimum required thickness of the front-face layer was also discussed. To verify the experimental results, finite element analysis was performed on all laminate systems; the computational results were in reasonable agreement with the experiments.

