AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

Structured weight pruning is a representative model compression technique for DNNs that reduces storage and computation requirements and accelerates inference. Because the number of flexible hyperparameters is large, an automatic hyperparameter determination process is necessary. This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) it effectively incorporates the combination of structured pruning schemes in the automatic process; (ii) it adopts the state-of-the-art ADMM-based structured weight pruning as the core algorithm and proposes an innovative additional purification step for further weight reduction without accuracy loss; and (iii) it develops an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique, which has an underlying incompatibility with the target pruning problem. Extensive experiments on the CIFAR-10 and ImageNet datasets demonstrate that AutoCompress is the key to achieving ultra-high pruning rates on the number of weights and FLOPs that could not be achieved before. As an example, AutoCompress outperforms the prior work on automatic model compression by up to 33× in pruning rate (120× reduction in the actual parameter count) under the same accuracy. Significant inference speedup has been observed from the AutoCompress framework in actual measurements on a smartphone. We release models of this work at anonymous link: http://bit.ly/2VZ63dS.
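
To make the idea of structured pruning concrete, the following is a minimal sketch of filter-level pruning on a single toy layer. It is not the ADMM-based algorithm or the purification step described above; it assumes a simple magnitude (L2 norm) criterion, and the names `filter_norms`, `prune_filters`, and `pruning_rate` are hypothetical helpers introduced only for illustration.

```python
import math

def filter_norms(filters):
    """L2 norm of each filter; a filter is a flat list of weights."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]

def prune_filters(filters, keep):
    """Keep the `keep` filters with the largest L2 norm (assumed criterion).

    Whole filters are removed, so the remaining layer stays dense. This is
    what distinguishes structured pruning from zeroing individual weights.
    """
    norms = filter_norms(filters)
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore the original filter order
    return [filters[i] for i in kept]

def pruning_rate(original, pruned):
    """Pruning rate = parameters before / parameters after."""
    n_before = sum(len(f) for f in original)
    n_after = sum(len(f) for f in pruned)
    return n_before / n_after

# Toy layer with four filters of three weights each (made-up values).
layer = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.03, -0.02, 0.01],
]
pruned = prune_filters(layer, keep=2)
rate = pruning_rate(layer, pruned)
print(pruned, rate)
```

In this toy case, two of the four filters are dropped, so the pruning rate is 2×. An automatic framework would have to choose the `keep` value (and the pruning scheme) per layer, which is the hyperparameter search problem the abstract refers to.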

Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6138 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6623-6630

Author(s):

Li Yang ◽

Zhezhi He ◽

Deliang Fan

Keyword(s):

Embedded System ◽

Processing Elements ◽

Computing Platform ◽

Model Compression ◽

Computing Unit ◽

Resource Limited ◽

Adopted Model ◽

Weight Penalty ◽

Model Size ◽

Weight Pruning

Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and have been widely used in many computer vision tasks. However, their enormous model size and high computing complexity prohibit wide deployment on resource-limited embedded systems, such as FPGAs and mGPUs. As the two most widely adopted model compression techniques, weight pruning and quantization compress a DNN model by introducing weight sparsity (i.e., forcing some weights to zero) and by quantizing weights into limited bit-width values, respectively. Although there are works attempting to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking the FPGA as the test computing platform and Processing Elements (PE) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification while taking the architecture of the PE into account. In addition, we integrate it with an optimized weight ternarization approach that quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in a DNN from multiplication-and-accumulation (MAC) to addition-only, as well as compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiments show that the fusion of our proposed techniques achieves a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation for ResNet-18 on the ImageNet dataset.
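
The ternarization and the 16× storage figure can be illustrated with a small sketch. It is not the optimized ternarization or the WPC technique from the abstract; the threshold rule (0.7 times the mean absolute weight) is an assumed heuristic, and `ternarize`, `compression_ratio`, and `ternary_dot` are hypothetical names used only here.

```python
def ternarize(weights, delta_scale=0.7):
    """Map each weight to -1, 0, or +1 using a magnitude threshold.

    The threshold delta = delta_scale * mean(|w|) is an assumption made
    for this sketch, not the rule used in the paper.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_scale * mean_abs
    return [1 if w > delta else -1 if w < -delta else 0 for w in weights]

def compression_ratio(bits_before=32, bits_after=2):
    """Storage ratio from the bit widths alone (32-bit float to 2-bit ternary)."""
    return bits_before / bits_after

def ternary_dot(ternary_weights, inputs):
    """Dot product with ternary weights using only additions and subtractions."""
    acc = 0.0
    for t, x in zip(ternary_weights, inputs):
        if t == 1:
            acc += x
        elif t == -1:
            acc -= x
        # t == 0: the input is skipped entirely (sparsity)
    return acc

weights = [0.9, -0.05, -0.8, 0.02, 0.4, -0.6]  # made-up values
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ternary = ternarize(weights)
print(ternary)                       # ternary codes for each weight
print(compression_ratio())           # 16.0 from bit widths alone
print(ternary_dot(ternary, inputs))  # no multiplications needed
```

The bit-width ratio alone gives 16×; any zeros introduced by pruning reduce storage further, which is consistent with the "at least 16 times" wording.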

Pruning by leveraging training dynamics

AI Communications ◽

10.3233/aic-210127 ◽

2021 ◽

pp. 1-21

Author(s):

Andrei C. Apostol ◽

Maarten C. Stol ◽

Patrick Forré

Keyword(s):

State Of The Art ◽

Object Classification ◽

Compression Technique ◽

Pruning Method ◽

Model Compression ◽

Art Performance

We propose a novel pruning method that uses the oscillations around 0, i.e. sign flips, that a weight has undergone during training in order to determine its saliency. Our method can perform pruning before the network has converged, requires little tuning effort because its hyperparameters have good default values, and can directly target the level of sparsity desired by the user. Our experiments, performed on a variety of object classification architectures, show that it is competitive with existing methods and achieves state-of-the-art performance for sparsity levels of 99.6% and above for 2 of the 3 architectures tested. Moreover, we demonstrate that our method is compatible with quantization, another model compression technique. For reproducibility, we release our code at https://github.com/AndreiXYZ/flipout.
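
A minimal sketch of the sign-flip idea follows. It assumes that a weight with more sign flips is less salient and is pruned first; the exact saliency formula is not given in the abstract, so this ranking rule, along with the helper names `count_sign_flips` and `prune_mask`, is an assumption for illustration only.

```python
def count_sign_flips(history):
    """Count sign changes per weight across consecutive training snapshots.

    `history` is a list of snapshots; each snapshot is a list of floats.
    """
    flips = [0] * len(history[0])
    for prev, curr in zip(history, history[1:]):
        for i, (p, c) in enumerate(zip(prev, curr)):
            if p * c < 0:  # the weight crossed zero between the two steps
                flips[i] += 1
    return flips

def prune_mask(flips, sparsity):
    """Return a 0/1 mask that removes the `sparsity` fraction of weights
    with the most sign flips (assumed criterion: more flips, less salient)."""
    n_prune = int(round(sparsity * len(flips)))
    order = sorted(range(len(flips)), key=lambda i: flips[i], reverse=True)
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(flips))]

# Four weights tracked over four training steps (made-up values).
history = [
    [0.5, 0.1, -0.3, 0.02],
    [0.6, -0.1, -0.4, -0.01],
    [0.7, 0.1, -0.5, 0.03],
    [0.8, -0.2, -0.6, -0.02],
]
flips = count_sign_flips(history)
mask = prune_mask(flips, sparsity=0.5)
print(flips, mask)
```

Because the mask is built from a target fraction, the sparsity level is set directly by the user, which matches the property claimed in the abstract.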

Speedup of deep learning ensembles for semantic segmentation using a model compression technique

Computer Vision and Image Understanding ◽

10.1016/j.cviu.2017.05.004 ◽

2017 ◽

Vol 164 ◽

pp. 16-26 ◽

Cited By ~ 8

Author(s):

Andrew Holliday ◽

Mohammadamin Barekatain ◽

Johannes Laurmaa ◽

Chetak Kandaswamy ◽

Helmut Prendinger

Keyword(s):

Deep Learning ◽

Semantic Segmentation ◽

Compression Technique ◽

Model Compression ◽

Learning Ensembles

A Survey of Heuristic Search Method of Multimodal Optimum Point

Learning Systems and Intelligent Robots ◽

10.1007/978-1-4684-2106-4_7 ◽

1974 ◽

pp. 145-169 ◽

Cited By ~ 1

Author(s):

Moriya Oda ◽

Kahei Nakamura ◽

B. F. Womack

Keyword(s):

Heuristic Search ◽

Search Method ◽

Optimum Point ◽

Heuristic Search Method

A simple heuristic search method for the automatic generation of neural-based game artificial intelligence architectures in Ms. Pac-Man

10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010) ◽

10.1109/isspa.2010.5605407 ◽

2010 ◽

Author(s):

Tse Guan Tan ◽

Jason Teo ◽

Patricia Anthony

Keyword(s):

Artificial Intelligence ◽

Heuristic Search ◽

Automatic Generation ◽

Search Method ◽

Simple Heuristic ◽

Heuristic Search Method

A heuristic search method of adaptive interpolation filters in motion compensated predictive video coding

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). ◽

10.1109/icassp.2003.1199492 ◽

2003 ◽

Author(s):

Huipin Zhang ◽

F. Bossen

Keyword(s):

Video Coding ◽

Heuristic Search ◽

Search Method ◽

Interpolation Filters ◽

Adaptive Interpolation ◽

Heuristic Search Method

Ant Algorithm for AP-N Aimed at Optimization of Complex Systems

Decision Making in Manufacturing and Services ◽

10.7494/dmms.2010.4.2.29 ◽

2010 ◽

Vol 4 (2) ◽

pp. 29-36

Author(s):

Jerzy Mikulik ◽

Mirosław Zajdel

Keyword(s):

Complex Systems ◽

Heuristic Search ◽

Assignment Problem ◽

Combinatorial Problem ◽

Search Method ◽

Ant Algorithm ◽

Computational Simulations ◽

Heuristic Search Method ◽

Deterministic Methods ◽

Multidimensional Optimization

The Assignment Problem (AP), a well-known combinatorial problem, has been studied extensively in operational and technical research. It has been shown to be NP-hard for three or more dimensions, and a few non-deterministic methods have been proposed to solve it. This paper focuses on a new heuristic search method for the n-dimensional assignment problem, based on swarm intelligence, and compares its results with those obtained by other researchers. It indicates a possible direction for solving such problems and presents an approach that uses an ant algorithm for the multidimensional optimization of complex systems. The results of the research are presented in the form of computational simulation outcomes.
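
As a rough illustration of how an ant algorithm can search for assignments, the sketch below handles only the classic two-dimensional case (workers to tasks), not the n-dimensional problem the paper targets. The pheromone update rule, the parameter values, and the function names `assignment_cost` and `ant_colony_assignment` are all assumptions made for this example.

```python
import random

def assignment_cost(cost, perm):
    """Total cost when row i is assigned to column perm[i]."""
    return sum(cost[i][j] for i, j in enumerate(perm))

def ant_colony_assignment(cost, n_ants=20, n_iters=50, evaporation=0.5, seed=0):
    """Heuristic search for a low-cost assignment (non-negative costs assumed).

    Each ant builds a full assignment row by row, choosing columns at random
    with probability proportional to pheromone / (1 + cost).
    """
    rng = random.Random(seed)
    n = len(cost)
    pheromone = [[1.0] * n for _ in range(n)]
    best_perm, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            free = list(range(n))
            perm = []
            for i in range(n):
                weights = [pheromone[i][j] / (1.0 + cost[i][j]) for j in free]
                j = rng.choices(free, weights=weights, k=1)[0]
                perm.append(j)
                free.remove(j)
            c = assignment_cost(cost, perm)
            if c < best_cost:
                best_perm, best_cost = perm, c
        # Evaporate all trails, then reinforce the best assignment found so far.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        for i, j in enumerate(best_perm):
            pheromone[i][j] += 1.0 / (1.0 + best_cost)
    return best_perm, best_cost

# Made-up 4x4 cost matrix.
cost = [
    [9, 2, 7, 8],
    [6, 4, 3, 7],
    [5, 8, 1, 8],
    [7, 6, 9, 4],
]
best_perm, best_cost = ant_colony_assignment(cost)
print(best_perm, best_cost)
```

The search is stochastic and carries no optimality guarantee; for a matrix this small the result can be checked against exhaustive enumeration of all permutations.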

Against Membership Inference Attack: Pruning is All You Need

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/432 ◽

2021 ◽

Author(s):

Yijue Wang ◽

Chenghong Wang ◽

Zigeng Wang ◽

Shanglin Zhou ◽

Hang Liu ◽

...

Keyword(s):

Deep Neural Networks ◽

Pruning Algorithm ◽

Privacy Leakage ◽

Model Compression ◽

Computational Operation ◽

Model Size ◽

Inference Attack ◽

Weight Pruning ◽

Pruning Technique ◽

Large Model

The large model size, high computational cost, and vulnerability to membership inference attack (MIA) have impeded the popularity of deep learning, or deep neural networks (DNNs), especially on mobile devices. To address this challenge, we envision that the weight pruning technique will help DNNs defend against MIA while reducing model storage and computational operations. In this work, we propose a pruning algorithm, and we show that the proposed algorithm can find a subnetwork that prevents privacy leakage from MIA and achieves competitive accuracy with the original DNNs. We also verify our theoretical insights with experiments. Our experimental results illustrate that the attack accuracy using model compression is up to 13.6% and 10% lower than that of the baseline and the Min-Max game, respectively.

A Novel Heuristic Search Method for Two-Level Approximate Logic Synthesis

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ◽

10.1109/tcad.2018.2890532 ◽

2020 ◽

Vol 39 (3) ◽

pp. 654-669 ◽

Cited By ~ 2

Author(s):

Sanbao Su ◽

Chen Zou ◽

Weijiang Kong ◽

Jie Han ◽

Weikang Qian

Keyword(s):

Heuristic Search ◽

Logic Synthesis ◽

Search Method ◽

Heuristic Search Method

Feature fusion-based collaborative learning for knowledge distillation

International Journal of Distributed Sensor Networks ◽

10.1177/15501477211057037 ◽

2021 ◽

Vol 17 (11) ◽

pp. 155014772110570

Author(s):

Yiting Li ◽

Liyuan Sun ◽

Jianping Gou ◽

Lan Du ◽

Weihua Ou

Keyword(s):

Collaborative Learning ◽

Large Scale ◽

Feature Fusion ◽

Model Performance ◽

Data Sets ◽

Great Success ◽

Compression Technique ◽

Model Compression ◽

Knowledge Distillation ◽

Self Driving Cars

Deep neural networks have achieved great success in a variety of applications, such as self-driving cars and intelligent robotics. Meanwhile, knowledge distillation has received increasing attention as an effective model compression technique for training very efficient deep models. The performance of the student network obtained through knowledge distillation heavily depends on whether the transfer of the teacher’s knowledge can effectively guide the student training. However, most existing knowledge distillation schemes require a large teacher network pre-trained on large-scale data sets, which can increase the difficulty of knowledge distillation in different applications. In this article, we propose a feature fusion-based collaborative learning method for knowledge distillation. Specifically, during knowledge distillation, it enables networks to learn from each other using the feature/response-based knowledge in different network layers. We concatenate the features learned by the teacher and the student networks to obtain a more representative feature map for knowledge transfer. In addition, we introduce a network regularization method to further improve model performance by providing positive knowledge during training. Experiments and ablation studies on two widely used data sets demonstrate that the proposed method, feature fusion-based collaborative learning, significantly outperforms recent state-of-the-art knowledge distillation methods.
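
The two kinds of knowledge mentioned above can be sketched in a few lines. Feature fusion is shown as plain concatenation of teacher and student feature vectors, and response-based knowledge as the standard temperature-softened KL divergence between output distributions. This is a generic illustration, not the paper's loss or architecture; the function names and the temperature value are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fuse_features(teacher_feat, student_feat):
    """Feature fusion by concatenation (flat feature vectors assumed)."""
    return teacher_feat + student_feat

# Made-up logits and features for a single input.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 1.2, 0.3]
T = 4.0  # assumed distillation temperature

p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)
loss = kl_divergence(p_teacher, p_student)  # response-based distillation term

fused = fuse_features([0.2, 0.7], [0.1, 0.9, 0.4])  # feature-based knowledge
print(loss, fused)
```

In a collaborative setting, the fused vector would feed a further classifier whose output can guide both networks, but how that is wired is specific to the paper and is not reproduced here.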
