Light Multi-Segment Activation for Model Compression

2020 ◽  
Vol 34 (04) ◽  
pp. 6542-6549
Author(s):  
Zhenhui Xu ◽  
Guolin Ke ◽  
Jia Zhang ◽  
Jiang Bian ◽  
Tie-Yan Liu

Model compression has become necessary when applying neural networks (NNs) to real-world tasks that can tolerate slightly reduced accuracy but impose strict limits on model complexity. Recently, knowledge distillation, which transfers the knowledge of a well-trained, highly complex teacher model into a compact student model, has been widely used for model compression. However, under strict resource constraints it is quite challenging for the student model to achieve performance comparable to the teacher's, essentially because of the drastically reduced expressiveness of the compact student model. Inspired by the nature of expressiveness in NNs, we propose to use a multi-segment activation, which can significantly improve expressiveness at very little cost, in the compact student model. Specifically, we propose a highly efficient multi-segment activation, called Light Multi-segment Activation (LMA), which rapidly produces multiple linear regions with very few parameters by leveraging statistical information. With LMA, the compact student model achieves much better performance, effectively and efficiently, than a ReLU-equipped model of the same complexity. Furthermore, the proposed method is compatible with other model compression techniques, such as quantization, so they can be used jointly for better compression performance. Experiments with state-of-the-art NN architectures on real-world tasks demonstrate the effectiveness and extensibility of LMA.
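
To make the idea concrete, the following is a minimal PyTorch sketch of a multi-segment, piecewise-linear activation with a handful of learnable slopes. The fixed, evenly spaced breakpoints and their range are illustrative assumptions; the paper's LMA instead places its segments using statistical information.

```python
# A minimal sketch of a multi-segment (piecewise-linear) activation, assuming
# fixed, evenly spaced breakpoints; LMA itself positions segments statistically.
import torch
import torch.nn as nn

class MultiSegmentActivation(nn.Module):
    def __init__(self, num_segments=4, init_range=2.0):
        super().__init__()
        # Evenly spaced breakpoints in [-init_range, init_range] (assumed placement).
        self.breakpoints = torch.linspace(-init_range, init_range, num_segments - 1).tolist()
        # One learnable slope per segment: only a few extra parameters in total.
        self.slopes = nn.Parameter(torch.ones(num_segments))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b = self.breakpoints
        # Leftmost term: contributes slopes[0] * x below the first breakpoint,
        # and a constant above it, keeping the function continuous.
        out = self.slopes[0] * torch.clamp(x, max=b[0])
        # Each middle segment adds its own slope over its own interval only.
        for i in range(1, len(b)):
            out = out + self.slopes[i] * (torch.clamp(x, min=b[i - 1], max=b[i]) - b[i - 1])
        # Rightmost segment: its slope applies above the last breakpoint.
        out = out + self.slopes[-1] * torch.clamp(x - b[-1], min=0.0)
        return out + self.bias
```

A module like this could drop in wherever the student network would otherwise use ReLU, adding only `num_segments + 1` parameters per activation layer.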

2020 ◽  
Vol 34 (04) ◽  
pp. 3430-3437
Author(s):  
Defang Chen ◽  
Jian-Ping Mei ◽  
Can Wang ◽  
Yan Feng ◽  
Chun Chen

Distillation is an effective knowledge-transfer technique that uses the predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained, high-capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets give a good recipe for teacher-free distillation, group members are quickly homogenized by simple aggregation functions, leading to prematurely saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights, generated with an attention-based mechanism, to derive its own targets from the predictions of the other auxiliary peers. Learning from distinct target distributions helps boost peer diversity, which benefits group-based distillation. The second-level distillation then transfers the knowledge in the ensemble of auxiliary peers to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently outperforms state-of-the-art approaches without increasing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.
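
As a rough illustration of the first-level step, the sketch below (not the authors' code) computes attention-based aggregation weights among auxiliary peers and uses them to build a distinct soft-target distribution for each peer. The tensor shapes, the projected `queries`/`keys`, the function names, and the temperature `T` are assumptions for illustration.

```python
# A minimal sketch of attention-weighted peer aggregation in the spirit of
# OKDDip's first-level distillation; shapes and temperature are assumed.
import torch
import torch.nn.functional as F

def peer_distillation_targets(peer_logits, queries, keys, T=3.0):
    """peer_logits: (m, B, C) logits of m auxiliary peers.
    queries, keys: (m, B, d) projected features used for attention."""
    # Pairwise attention scores between peers, per sample: (B, m, m).
    scores = torch.einsum("ibd,jbd->bij", queries, keys)
    weights = F.softmax(scores, dim=-1)                    # each peer's aggregation weights
    soft = F.softmax(peer_logits / T, dim=-1)              # (m, B, C) softened predictions
    # Each peer i receives its own target: a weighted mix of the peers' soft outputs.
    targets = torch.einsum("bij,jbc->ibc", weights, soft)  # (m, B, C)
    return targets.detach()

def first_level_loss(peer_logits, targets, T=3.0):
    log_p = F.log_softmax(peer_logits / T, dim=-1)
    return F.kl_div(log_p, targets, reduction="batchmean") * (T * T)
```

Because every peer mixes the group predictions with its own weights, the targets differ across peers, which is what keeps the group from collapsing to a single homogeneous solution.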


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 137
Author(s):  
Zhou Lei ◽  
Kangkang Yang ◽  
Kai Jiang ◽  
Shengbo Chen

Person re-identification (Re-ID) based on deep convolutional neural networks (CNNs) has achieved remarkable success with fast inference speed. However, prevailing Re-ID models are usually built upon backbones that were manually designed for classification. In order to automatically design an effective Re-ID architecture, we propose a pedestrian re-identification algorithm based on knowledge distillation, called KDAS-ReID. As the knowledge of the teacher model is transferred to the student model, the importance of that knowledge gradually decreases as the student model's performance improves. Therefore, instead of applying the distillation loss function directly, we use dynamic temperatures during both the search stage and the training stage. Specifically, we start searching and training at a high temperature and gradually reduce the temperature to 1, so that the student model can better learn from the teacher model through soft targets. Extensive experiments demonstrate that KDAS-ReID not only performs better than other state-of-the-art Re-ID models on three benchmarks, but also outperforms the teacher model based on the ResNet-50 backbone.
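
A small sketch of the dynamic-temperature idea described above: the distillation temperature starts high and decays to 1 over training. The linear schedule, the starting value `T0`, and the function names are assumptions for illustration, not the paper's exact settings.

```python
# Distillation with a temperature that anneals from T0 down to 1 (assumed schedule).
import torch.nn.functional as F

def current_temperature(epoch, total_epochs, T0=10.0):
    # Linear decay: T = T0 at epoch 0, T = 1 at the final epoch.
    return 1.0 + (T0 - 1.0) * max(0.0, 1.0 - epoch / max(1, total_epochs - 1))

def distillation_loss(student_logits, teacher_logits, epoch, total_epochs, T0=10.0):
    T = current_temperature(epoch, total_epochs, T0)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # Scale by T^2 so the gradient magnitude stays comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```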


Author(s):  
Hui Wang ◽  
Hanbin Zhao ◽  
Xi Li ◽  
Xu Tan

As an important and challenging problem in machine learning and computer vision, neural network acceleration essentially aims to enhance the computational efficiency without sacrificing the model accuracy too much. In this paper, we propose a progressive blockwise learning scheme for teacher-student model distillation at the subnetwork block level. The proposed scheme is able to distill the knowledge of the entire teacher network by locally extracting the knowledge of each block in terms of progressive blockwise function approximation. Furthermore, we propose a structure design criterion for the student subnetwork block, which is able to effectively preserve the original receptive field from the teacher network. Experimental results demonstrate the effectiveness of the proposed scheme against the state-of-the-art approaches.
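
One plausible reading of the blockwise function approximation is sketched below, under assumptions about how the two networks are split into blocks: the k-th student block is trained to reproduce the output of the corresponding teacher block, given features produced by the already-distilled student blocks. The MSE objective, the freezing of earlier blocks, and the helper names are illustrative assumptions rather than the paper's exact formulation.

```python
# A hedged sketch of progressive blockwise distillation: train one student block
# at a time to approximate the matching teacher block's local function.
import torch
import torch.nn as nn

def blockwise_distill_step(teacher_blocks, student_blocks, k, x, optimizer):
    """teacher_blocks, student_blocks: lists of nn.Module; k: index of the block
    being distilled; x: a batch of network inputs; optimizer: over block k's params."""
    with torch.no_grad():
        h = x
        for i in range(k):
            h = student_blocks[i](h)      # previously distilled blocks, kept frozen
        target = teacher_blocks[k](h)     # local teacher output to approximate
    pred = student_blocks[k](h)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```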


2021 ◽  
Vol 11 (3) ◽  
pp. 1093
Author(s):  
Jeonghyun Lee ◽  
Sangkyun Lee

Convolutional neural networks (CNNs) have achieved tremendous success in solving complex classification problems. Motivated by this success, various compression methods have been proposed for downsizing CNNs so that they can be deployed on resource-constrained embedded systems. However, a new type of vulnerability of compressed CNNs, known as adversarial examples, has recently been discovered; it is critical for security-sensitive systems, because adversarial examples can cause CNNs to malfunction and can be crafted easily in many cases. In this paper, we propose a compression framework that produces compressed CNNs robust against such adversarial examples. To achieve this goal, our framework uses both pruning and knowledge distillation with adversarial training. We formulate our framework as an optimization problem and provide a solution algorithm based on the proximal gradient method, which is more memory-efficient than popular ADMM-based compression approaches. In experiments, we show that our framework improves the trade-off between adversarial robustness and compression rate compared to the existing state-of-the-art adversarial pruning approach.
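
As a stand-in for the optimization step described above, the sketch below shows a generic proximal-gradient update for sparsity-inducing pruning: an ordinary gradient step on the task loss followed by a proximal operator. The L1 penalty with soft-thresholding is a common choice assumed here, not necessarily the paper's exact regularizer.

```python
# A minimal sketch of one proximal-gradient step (assumed L1 penalty).
import torch

def proximal_gradient_step(params, lr=1e-2, l1_strength=1e-4):
    """params: iterable of tensors whose .grad holds the task-loss gradient."""
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            p -= lr * p.grad                      # gradient step on the smooth loss
            # Proximal operator of lr * l1_strength * ||p||_1: soft-thresholding,
            # which drives small weights exactly to zero (i.e., prunes them).
            thresh = lr * l1_strength
            p.copy_(torch.sign(p) * torch.clamp(p.abs() - thresh, min=0.0))
```

Because the proximal operator acts element-wise on the current iterate, no auxiliary copies of the weights are needed, which is the memory advantage over ADMM-style splitting mentioned in the abstract.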


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1614
Author(s):  
Jonghun Jeong ◽  
Jong Sung Park ◽  
Hoeseok Yang

Recently, the need to run high-performance neural networks (NNs) has been increasing even in resource-constrained embedded systems such as wearable devices. However, due to the high computational and memory requirements of NN applications, it is typically infeasible to execute them on a single device. Instead, it has been proposed to run a single NN application cooperatively on multiple devices, a so-called distributed neural network, in which the workload of a single large NN application is distributed over multiple tiny devices. While this approach effectively alleviates the computational burden, existing distributed NN techniques, such as MoDNN, still suffer from heavy inter-device traffic and vulnerability to communication failures. In order to remove this large communication overhead, a knowledge-distillation-based distributed NN, called Network of Neural Networks (NoNN), was proposed; it partitions the filters in the final convolutional layer of the original NN into multiple independent subsets and derives a smaller NN from each subset. However, NoNN also has limitations: the partitioning result may be unbalanced, and it considerably compromises the correlation between filters in the original NN, which may result in unacceptable accuracy degradation in case of communication failure. In this paper, to overcome these issues, we propose to enhance the partitioning strategy of NoNN in two aspects. First, we increase the redundancy of the filters that are used to derive the smaller NNs by means of averaging, to improve the distributed NN's resilience to communication failure. Second, we propose a novel partitioning technique, modified from eigenvector-based partitioning, to preserve the correlation between filters as much as possible while keeping the number of filters distributed to each device consistent. Through extensive experiments with the CIFAR-100 (Canadian Institute For Advanced Research-100) dataset, we observe that the proposed approach maintains high inference accuracy on average (over 70%, a 1.53× improvement over the state-of-the-art approach), even when half of the eight devices in a distributed NN fail to deliver their partial inference results.
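
For intuition only, the sketch below splits the final-layer filters into equally sized groups using an eigenvector (spectral) criterion on a filter-correlation matrix. The concrete grouping rule here (sorting by a Fiedler-like eigenvector and chunking) and the input representation are illustrative assumptions, not the paper's modified partitioning algorithm.

```python
# A rough, assumed sketch of eigenvector-based, size-balanced filter partitioning.
import numpy as np

def partition_filters(activations, num_devices):
    """activations: (num_samples, num_filters) average activations per filter."""
    corr = np.corrcoef(activations, rowvar=False)      # filter-to-filter correlation
    affinity = np.abs(corr)
    # Unnormalized graph Laplacian of the filter-affinity graph.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]                            # second-smallest eigenvector
    order = np.argsort(fiedler)
    # Chunk into equal-size groups so every device receives the same filter count.
    return np.array_split(order, num_devices)
```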


Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 264
Author(s):  
Jinghan Wang ◽  
Guangyue Li ◽  
Wenzhao Zhang

The powerful performance of deep learning is evident. As research has deepened, neural networks have become more complex and cannot easily be deployed on resource-constrained devices. The emergence of a series of model compression algorithms makes artificial intelligence on the edge possible. Among them, structured model pruning is widely used because of its versatility: it prunes the neural network itself, discarding relatively unimportant structures to compress the model's size. However, previous pruning work still suffers from problems such as network evaluation errors, empirically determined pruning rates, and low retraining efficiency. Therefore, we propose an accurate, objective, and efficient pruning algorithm, Combine-Net, which introduces Adaptive BN to eliminate evaluation errors, the Kneedle algorithm to determine the pruning rate objectively, and knowledge distillation to improve retraining efficiency. Results show that, without precision loss, Combine-Net achieves 95% parameter compression and 83% computation compression with VGG16 on CIFAR-10, and 71% parameter compression and 41% computation compression with ResNet50 on CIFAR-100. Experiments on different datasets and models demonstrate that Combine-Net can efficiently compress a neural network's parameters and computation.
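
A minimal sketch of the Adaptive BN idea referenced above: after pruning a candidate sub-network, the BatchNorm running statistics are re-estimated on a few batches before accuracy is measured, so evaluation is not skewed by stale statistics. The number of calibration batches and the helper name are assumptions.

```python
# Re-estimate BatchNorm statistics for a pruned sub-network before evaluating it.
import torch

def adaptive_bn_recalibrate(model, data_loader, num_batches=50, device="cpu"):
    model.train()                    # BN layers update running stats in train mode
    with torch.no_grad():            # no parameter updates, only BN statistics
        for i, (images, _) in enumerate(data_loader):
            if i >= num_batches:
                break
            model(images.to(device))
    model.eval()
    return model
```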


2021 ◽  
Vol 14 (4) ◽  
pp. 1-28
Author(s):  
Tao Yang ◽  
Zhezhi He ◽  
Tengchuan Kou ◽  
Qingzheng Li ◽  
Qi Han ◽  
...  

Field-programmable gate arrays (FPGAs) are a high-performance computing platform for convolutional neural network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain, which, however, results in irregular sparse patterns and hence low parallelism and reduced resource utilization. Moreover, few works discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely Sub-row-balanced Sparsity (SRBS), to overcome the challenge of irregular sparse patterns. We then develop a two-step hardware co-optimization approach to improve model accuracy with the SRBS pattern. Based on the pruned model, we apply mixed-precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that exploits the SRBS pattern to eliminate low-parallelism computation and irregular memory accesses, and the mixed-precision quantization to obtain layer-wise bit widths. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, and 12.74×/9.19× and 8.75×/8.81×/11.1× energy-efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design achieves 4.11× speedup on VGG16 compared with the state-of-the-art sparse Winograd accelerator [19].
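
To illustrate the balanced-sparsity idea, the sketch below builds a sub-row-wise magnitude-pruning mask in which every sub-row keeps the same number of weights, which keeps the sparse computation regular for hardware. The sub-row length, keep count, and function name are illustrative assumptions, not the paper's exact SRBS configuration.

```python
# A small sketch of a sub-row-balanced magnitude-pruning mask (assumed parameters).
import torch

def sub_row_balanced_mask(weights, sub_row_len=8, keep_per_sub_row=2):
    """weights: 2-D tensor whose rows divide evenly into sub-rows of sub_row_len."""
    rows, cols = weights.shape
    assert cols % sub_row_len == 0
    blocks = weights.abs().reshape(rows, cols // sub_row_len, sub_row_len)
    # Keep the top-k magnitudes inside every sub-row, zero out the rest.
    topk = blocks.topk(keep_per_sub_row, dim=-1).indices
    mask = torch.zeros_like(blocks)
    mask.scatter_(-1, topk, 1.0)
    return mask.reshape(rows, cols)
```

Because each sub-row carries an identical non-zero count, the accelerator can schedule the surviving multiplications with uniform parallelism and predictable memory access.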


2021 ◽  
Author(s):  
Zhiwei Hao ◽  
Yong Luo ◽  
Han Hu ◽  
Jianping An ◽  
Yonggang Wen

2021 ◽  
Vol 43 (13) ◽  
pp. 2888-2898
Author(s):  
Tianze Gao ◽  
Yunfeng Gao ◽  
Yu Li ◽  
Peiyuan Qin

An essential element of intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the ever-increasing advances in artificial neural networks (ANNs), researchers have proposed numerous ANN-based visual object detection methods that have proven to be effective. However, networks with cumbersome structures do not suit the real-time scenarios in M&RS, necessitating model compression techniques. In this paper, a novel approach to training light-weight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification and are not directly compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based visual detection method. Two strategies, positive sample retaining and early distribution softening, are employed to yield a natural adaptation. The mutual consistency between the teacher model and the student model is further promoted through hint-based distillation. Extensive controlled experiments show that the proposed method is effective in enhancing the light-weight network's performance by a large margin.
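
A minimal sketch of the hint-based distillation component mentioned above: an intermediate student feature map is regressed toward the corresponding teacher feature map through a small adaptation convolution. The layer choice, channel sizes, and class name are assumptions for illustration.

```python
# Hint-based feature distillation via a 1x1 adaptation conv and an MSE objective.
import torch
import torch.nn as nn

class HintLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv maps student features to the teacher's channel dimension.
        self.adapt = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # The teacher features are detached so gradients flow only into the student.
        return nn.functional.mse_loss(self.adapt(student_feat), teacher_feat.detach())
```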

