scholarly journals AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

2020 ◽  
Vol 34 (04) ◽  
pp. 4876-4883 ◽  
Author(s):  
Ning Liu ◽  
Xiaolong Ma ◽  
Zhiyuan Xu ◽  
Yanzhi Wang ◽  
Jian Tang ◽  
...  

Structured weight pruning is a representative model compression technique of DNNs to reduce the storage and computation requirements and accelerate inference. An automatic hyperparameter determination process is necessary due to the large number of flexible hyperparameters. This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique which has underlying incompatibility with the target pruning problem. Extensive experiments on CIFAR-10 and ImageNet datasets demonstrate that AutoCompress is the key to achieve ultra-high pruning rates on the number of weights and FLOPs that cannot be achieved before. As an example, AutoCompress outperforms the prior work on automatic model compression by up to 33× in pruning rate (120× reduction in the actual parameter count) under the same accuracy. Significant inference speedup has been observed from the AutoCompress framework on actual measurements on smartphone. We release models of this work at anonymous link: http://bit.ly/2VZ63dS.

2020 ◽  
Vol 34 (04) ◽  
pp. 6623-6630
Author(s):  
Li Yang ◽  
Zhezhi He ◽  
Deliang Fan

Deep convolutional neural network (DNN) has demonstrated phenomenal success and been widely used in many computer vision tasks. However, its enormous model size and high computing complexity prohibits its wide deployment into resource limited embedded system, such as FPGA and mGPU. As the two most widely adopted model compression techniques, weight pruning and quantization compress DNN model through introducing weight sparsity (i.e., forcing partial weights as zeros) and quantizing weights into limited bit-width values, respectively. Although there are works attempting to combine the weight pruning and quantization, we still observe disharmony between weight pruning and quantization, especially when more aggressive compression schemes (e.g., Structured pruning and low bit-width quantization) are used. In this work, taking FPGA as the test computing platform and Processing Elements (PE) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification with considering of the architecture of PE. In addition, we integrate it with an optimized weight ternarization approach which quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in DNN from multiplication-and-accumulation (MAC) to addition-only, as well as compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise Structured pruning and ternarization, through proposing a Weight Penalty Clipping (WPC) technique with self-adapting threshold. Our experiment shows that the fusion of our proposed techniques can achieve the best state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation of ResNet-18 on ImageNet dataset.


2021 ◽  
pp. 1-21
Author(s):  
Andrei C. Apostol ◽  
Maarten C. Stol ◽  
Patrick Forré

We propose a novel pruning method which uses the oscillations around 0, i.e. sign flips, that a weight has undergone during training in order to determine its saliency. Our method can perform pruning before the network has converged, requires little tuning effort due to having good default values for its hyperparameters, and can directly target the level of sparsity desired by the user. Our experiments, performed on a variety of object classification architectures, show that it is competitive with existing methods and achieves state-of-the-art performance for levels of sparsity of 99.6 % and above for 2 out of 3 of the architectures tested. Moreover, we demonstrate that our method is compatible with quantization, another model compression technique. For reproducibility, we release our code at https://github.com/AndreiXYZ/flipout.


2017 ◽  
Vol 164 ◽  
pp. 16-26 ◽  
Author(s):  
Andrew Holliday ◽  
Mohammadamin Barekatain ◽  
Johannes Laurmaa ◽  
Chetak Kandaswamy ◽  
Helmut Prendinger

2010 ◽  
Vol 4 (2) ◽  
pp. 29-36
Author(s):  
Jerzy Mikulik ◽  
Mirosław Zajdel

Assignment Problem (AP), which is well known combinatorial problem, has been studied extensively in the course of many operational and technical researches. It has been shown to be NP-hard for three or more dimensions and a few non-deterministic methods have been proposed to solve it. This paper pays attention on new heuristic search method for the n-dimensional assignment problem, based on swarm intelligence and comparing results with those obtained by other scientists. It indicates possible direction of solutions of problems and presents a way of behaviour using ant algorithm for multidimensional optimization complex systems. Results of researches in the form of computational simulations outcomes are presented.


Author(s):  
Yijue Wang ◽  
Chenghong Wang ◽  
Zigeng Wang ◽  
Shanglin Zhou ◽  
Hang Liu ◽  
...  

The large model size, high computational operations, and vulnerability against membership inference attack (MIA) have impeded deep learning or deep neural networks (DNNs) popularity, especially on mobile devices. To address the challenge, we envision that the weight pruning technique will help DNNs against MIA while reducing model storage and computational operation. In this work, we propose a pruning algorithm, and we show that the proposed algorithm can find a subnetwork that can prevent privacy leakage from MIA and achieves competitive accuracy with the original DNNs. We also verify our theoretical insights with experiments. Our experimental results illustrate that the attack accuracy using model compression is up to 13.6% and 10% lower than that of the baseline and Min-Max game, accordingly.


Author(s):  
Sanbao Su ◽  
Chen Zou ◽  
Weijiang Kong ◽  
Jie Han ◽  
Weikang Qian

2021 ◽  
Vol 17 (11) ◽  
pp. 155014772110570
Author(s):  
Yiting Li ◽  
Liyuan Sun ◽  
Jianping Gou ◽  
Lan Du ◽  
Weihua Ou

Deep neural networks have achieved a great success in a variety of applications, such as self-driving cars and intelligent robotics. Meanwhile, knowledge distillation has received increasing attention as an effective model compression technique for training very efficient deep models. The performance of the student network obtained through knowledge distillation heavily depends on whether the transfer of the teacher’s knowledge can effectively guide the student training. However, most existing knowledge distillation schemes require a large teacher network pre-trained on large-scale data sets, which can increase the difficulty of knowledge distillation in different applications. In this article, we propose a feature fusion-based collaborative learning for knowledge distillation. Specifically, during knowledge distillation, it enables networks to learn from each other using the feature/response-based knowledge in different network layers. We concatenate the features learned by the teacher and the student networks to obtain a more representative feature map for knowledge transfer. In addition, we also introduce a network regularization method to further improve the model performance by providing a positive knowledge during training. Experiments and ablation studies on two widely used data sets demonstrate that the proposed method, feature fusion-based collaborative learning, significantly outperforms recent state-of-the-art knowledge distillation methods.


Sign in / Sign up

Export Citation Format

Share Document