weight pruning
Recently Published Documents

TOTAL DOCUMENTS: 55 (FIVE YEARS: 44)
H-INDEX: 5 (FIVE YEARS: 3)

Processes ◽ 2021 ◽ Vol 10 (1) ◽ pp. 44
Author(s): Yuan Liu ◽ Takahiro Kawaguchi ◽ Song Xu ◽ Seiji Hashimoto

Recurrent Neural Networks (RNNs) have been widely applied in various fields. In real-world applications, however, most devices such as mobile phones have limited storage capacity for processing real-time information, so an over-parameterized model slows the system down and is unsuitable for deployment. In our proposed temperature control system, an RNN-based control model processes real-time temperature signals. When system resources are limited, the trained model must be compressed, with an acceptable loss of control performance, before it can be implemented on the actual controller. Inspired by layer-wise neuron pruning, in this paper we apply a nonlinear reconstruction error (NRE) guided layer-wise weight pruning method to the RNN-based temperature control system. The control system is built in MATLAB/Simulink. To compress the model and save memory on the temperature controller, we first validate the proposed reference-model (ref-model) guided RNN model for real-time online data processing on an actual temperature object; the corresponding experiments are implemented on a digital signal processor. On this basis, we then verify the NRE guided layer-wise weight pruning method on the well-trained temperature control model. Compared with a classical pruning method, experimental results indicate that the control model pruned with NRE guided layer-wise weight pruning effectively maintains high accuracy at the targeted network sparsity.
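As an illustration (not the authors' exact procedure), the following PyTorch-style sketch prunes one fully connected layer by a simple magnitude-based saliency and then measures the nonlinear reconstruction error (NRE) of its output on a calibration batch. The layer sizes, the tanh nonlinearity, the saliency proxy, and the calibration data are assumptions made for the example.

import torch
import torch.nn as nn

def prune_layer_by_reconstruction(layer: nn.Linear, calib_x: torch.Tensor, sparsity: float):
    with torch.no_grad():
        # Reference output of the unpruned layer through its nonlinearity (tanh assumed here).
        ref_out = torch.tanh(layer(calib_x))
        w = layer.weight.data
        # Simple saliency proxy: |weight| scaled by the mean |input| feeding that weight.
        saliency = w.abs() * calib_x.abs().mean(dim=0)
        k = int(sparsity * w.numel())
        drop = torch.topk(saliency.flatten(), k, largest=False).indices
        mask = torch.ones(w.numel(), device=w.device)
        mask[drop] = 0.0
        layer.weight.data = w * mask.view_as(w)
        # Nonlinear reconstruction error (NRE) introduced by pruning this layer.
        nre = (torch.tanh(layer(calib_x)) - ref_out).pow(2).mean()
    return nre

layer = nn.Linear(64, 32)
calib_x = torch.randn(256, 64)   # illustrative calibration batch
print(prune_layer_by_reconstruction(layer, calib_x, sparsity=0.8))

In a layer-wise scheme, each layer would be pruned in turn while monitoring the NRE it introduces, so that layers whose outputs are hard to reconstruct keep more of their weights.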


2021 ◽ Author(s): Tianyun Zhang ◽ Xiaolong Ma ◽ Zheng Zhan ◽ Shanglin Zhou ◽ Caiwen Ding ◽ ...

2021 ◽ Vol 20 (5s) ◽ pp. 1-25
Author(s): Chanyoung Oh ◽ Junhyuk So ◽ Sumin Kim ◽ Youngmin Yi

Over the past several years, the need for on-device deep learning has been rapidly increasing, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but most studies focus on weight sparsity obtained by weight pruning. Activation sparsity, by contrast, requires compression at runtime for every input tensor. Hence, research on activation sparsity mainly targets NPUs that can process it efficiently with their own hardware logic. In this paper, we observe that it is difficult to accelerate CNN inference on mobile GPUs with natural activation sparsity and that the widely used CSR-based sparse convolution is not sufficiently effective due to the compression overhead. We propose several novel sparsification methods that boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers to an extremely high sparsity and adopt sparse or dense convolution depending on the layer. Further, we present an efficient sparse convolution method that requires no compression and demonstrate that it can be faster than the CSR implementation. With ResNet-50, we achieved a 1.88× speedup over TFLite on a Mali-G76 GPU.
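A minimal sketch of the two ideas described above, boosting activation sparsity and then choosing a kernel per layer based on the measured sparsity; the threshold values and the 90% switch point are illustrative assumptions, not the paper's settings.

import torch
import torch.nn as nn

class ThresholdReLU(nn.Module):
    # ReLU variant that also zeroes small positive activations to boost sparsity.
    def __init__(self, tau: float = 0.1):
        super().__init__()
        self.tau = tau
    def forward(self, x):
        return torch.where(x > self.tau, x, torch.zeros_like(x))

def choose_kernel(act: torch.Tensor, sparse_threshold: float = 0.9) -> str:
    # Pick sparse or dense convolution for the next layer from the measured activation sparsity.
    sparsity = (act == 0).float().mean().item()
    return "sparse" if sparsity >= sparse_threshold else "dense"

act = ThresholdReLU(tau=0.2)(torch.randn(1, 64, 56, 56))
print(choose_kernel(act))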


2021 ◽ Author(s): George Retsinas ◽ Athena Elafrou ◽ Georgios Goumas ◽ Petros Maragos

2021 ◽ Author(s): Yael Ben-Guigui ◽ Jacob Goldberger ◽ Tammy Riklin-Raviv

Author(s): Yijue Wang ◽ Chenghong Wang ◽ Zigeng Wang ◽ Shanglin Zhou ◽ Hang Liu ◽ ...

The large model size, high computational cost, and vulnerability to membership inference attacks (MIA) have impeded the adoption of deep neural networks (DNNs), especially on mobile devices. To address these challenges, we envision that the weight pruning technique can help defend DNNs against MIA while reducing model storage and computation. In this work, we propose a pruning algorithm and show that it can find a subnetwork that prevents privacy leakage from MIA while achieving accuracy competitive with the original DNN. We also verify our theoretical insights with experiments. Our experimental results illustrate that the attack accuracy against the compressed model is up to 13.6% and 10% lower than that of the baseline and the Min-Max game, respectively.
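The abstract does not spell out the pruning algorithm, so the sketch below uses plain global magnitude pruning as a stand-in for obtaining a sparse subnetwork; in the paper's setting, the attack accuracy of a membership inference attack would then be measured on this pruned model.

import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float):
    # Zero the globally smallest-magnitude weights, leaving a sparse subnetwork.
    weights = [p for p in model.parameters() if p.dim() > 1]
    all_w = torch.cat([w.detach().abs().flatten() for w in weights])
    k = max(1, int(sparsity * all_w.numel()))
    threshold = torch.kthvalue(all_w, k).values
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
global_magnitude_prune(model, sparsity=0.9)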


Author(s): Xuan Shen ◽ Geng Yuan ◽ Wei Niu ◽ Xiaolong Ma ◽ Jiexiong Guan ◽ ...

The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition is creating increasing demand for applications based on multi-person pose estimation, especially on mobile platforms. However, to achieve high accuracy, state-of-the-art methods tend to have large model sizes and complex post-processing algorithms, which incur intensive computation and long end-to-end latency. To solve this problem, we propose an architecture optimization and weight pruning framework that accelerates inference of multi-person pose estimation on mobile devices. With our optimization framework, we achieve up to 2.51X faster model inference with higher accuracy than representative lightweight multi-person pose estimators.
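The abstract does not detail the pruning scheme; as an illustrative stand-in, the sketch below performs structured (channel) pruning, the form of weight pruning that translates most directly into latency savings on mobile runtimes. The following layer's input channels would also need adjusting, which is omitted here; the keep ratio and layer shapes are placeholders.

import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    # Keep only the output channels with the largest L1 weight norms, so the layer
    # itself shrinks and the saved FLOPs show up as real latency reduction.
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.topk(norms, n_keep).indices.sort().values
    new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv

slim = prune_conv_channels(nn.Conv2d(64, 128, 3, padding=1), keep_ratio=0.5)
print(slim)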


Author(s): Changsheng Zhao ◽ Ting Hua ◽ Yilin Shen ◽ Qian Lou ◽ Hongxia Jin

Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents their practical deployment on resource-constrained devices. Knowledge distillation, weight pruning, and quantization are the main directions in model compression. However, compact models obtained through knowledge distillation may suffer from a significant accuracy drop even at a relatively small compression ratio. On the other hand, only a few quantization approaches are designed for natural language processing tasks, and they usually require manual setting of hyper-parameters. In this paper, we propose an automatic mixed-precision quantization framework for BERT that conducts quantization and pruning simultaneously. Specifically, our method leverages Differentiable Neural Architecture Search to automatically assign scale and precision to the parameters in each sub-group, and at the same time prunes out redundant groups of parameters. Extensive evaluations on BERT downstream tasks show that our method outperforms the baselines, matching their performance with a much smaller model size. We also show that an extremely lightweight model can be obtained by combining our solution with orthogonal methods such as DistilBERT.
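A hedged sketch of the core idea, mixed-precision selection by differentiable architecture search: each parameter group mixes candidate bit-widths (including 0 bits, i.e., the group is pruned) with softmax-weighted architecture logits. The candidate bit-widths, the uniform fake-quantizer, and the lack of a straight-through estimator for the rounding step are simplifications, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Uniform fake-quantization of a weight tensor to the given bit-width.
    if bits == 0:                       # 0 bits == the whole group is pruned
        return torch.zeros_like(w)
    scale = w.abs().max() / (2 ** (bits - 1) - 1 + 1e-8)
    return torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

class MixedPrecisionGroup(nn.Module):
    # A parameter group whose bit-width is chosen by differentiable architecture search:
    # the forward pass is a softmax-weighted mix of the candidate quantized versions.
    def __init__(self, weight: torch.Tensor, candidates=(0, 2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(weight)
        self.candidates = candidates
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # architecture logits
    def forward(self):
        probs = F.softmax(self.alpha, dim=0)
        return sum(p * fake_quantize(self.weight, b) for p, b in zip(probs, self.candidates))

group = MixedPrecisionGroup(torch.randn(64, 64))
mixed_w = group()   # used in place of the original weight during the search phase
print(mixed_w.shape)

After the search converges, each group would be fixed to its highest-probability candidate; groups that settle on 0 bits are removed entirely, which is how quantization and pruning happen in one pass.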


Author(s): Deniz Gurevin ◽ Mikhail Bragin ◽ Caiwen Ding ◽ Shanglin Zhou ◽ Lynn Pepin ◽ ...

Network pruning is a widely used technique to reduce the computation cost and model size of deep neural networks. However, the typical three-stage pipeline, i.e., training, pruning, and retraining (fine-tuning), significantly increases the overall training time. In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR), which is tailored to overcome the difficulties caused by the discrete nature of the weight-pruning problem while ensuring fast convergence. We further accelerate the convergence of SLR by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate the proposed method on image classification using CIFAR-10 and ImageNet, object detection using COCO 2014, and lane detection with Ultra-Fast-Lane-Detection on the TuSimple dataset. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement. It also achieves high model accuracy even at the hard-pruning stage without retraining, reducing the traditional three-stage pruning pipeline to two stages. Given a limited budget of retraining epochs, our approach quickly recovers the model accuracy.
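A rough sketch of pruning posed as constrained optimization with Lagrange multipliers and a quadratic penalty (an ADMM-style simplification, not the exact SLR update rules from the paper); the learning rate, penalty coefficient rho, toy loss, and tensor shapes are placeholders.

import torch

def project_to_sparse(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Euclidean projection onto the set of tensors with the target sparsity:
    # keep the largest-magnitude entries, zero the rest.
    k = int(sparsity * w.numel())
    if k == 0:
        return w.clone()
    thresh = torch.kthvalue(w.abs().flatten(), k).values
    return w * (w.abs() > thresh).float()

def penalized_pruning_step(w, z, lam, loss_fn, lr=1e-2, rho=1e-3, sparsity=0.9):
    # One iteration: minimize loss(w) + <lam, w - z> + (rho/2)||w - z||^2 over w,
    # then re-project onto the sparse set and update the multipliers.
    loss = loss_fn(w) + (lam * (w - z)).sum() + 0.5 * rho * (w - z).pow(2).sum()
    grad = torch.autograd.grad(loss, w)[0]
    with torch.no_grad():
        w = (w - lr * grad).requires_grad_(True)
        z = project_to_sparse(w.detach(), sparsity)   # auxiliary sparse copy
        lam = lam + rho * (w.detach() - z)            # multiplier update
    return w, z, lam

w = torch.randn(32, 32, requires_grad=True)
z, lam = project_to_sparse(w.detach(), 0.9), torch.zeros_like(w)
loss_fn = lambda p: (p ** 2).mean()                   # stand-in for the network loss
for _ in range(10):
    w, z, lam = penalized_pruning_step(w, z, lam, loss_fn)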


2021 ◽ Vol 17 (4) ◽ pp. 1-16
Author(s): Qing Yang ◽ Jiachen Mao ◽ Zuoguan Wang ◽ Hai “Helen” Li

When deploying deep neural networks in embedded systems, it is crucial to decrease the model size and computational complexity to improve execution speed and efficiency. In addition to conventional compression techniques, e.g., weight pruning and quantization, removing unimportant activations can also dramatically reduce the amount of data communication and the computation cost. Unlike weight parameters, the pattern of activations is directly related to the input data and therefore changes dynamically. To regulate this dynamic activation sparsity (DAS), in this work we propose a generic low-cost approach based on a winners-take-all (WTA) dropout technique. The network enhanced by the proposed WTA dropout, namely DASNet, features structured activation sparsity with an improved sparsity level. Compared to static feature-map pruning methods, DASNets provide better computation cost reduction. The WTA dropout technique can easily be applied to deep neural networks without incurring additional training variables. More importantly, DASNet can be seamlessly integrated with other compression techniques, such as weight pruning and quantization, without compromising accuracy. Our experiments on various networks and datasets show significant runtime speedups with negligible accuracy losses.
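A minimal sketch of a winners-take-all activation dropout: for each input sample, only the top-k feature-map channels survive, yielding structured, dynamically determined activation sparsity. The channel-level ranking by mean absolute activation and the keep ratio are illustrative choices, not necessarily the paper's exact criterion.

import torch
import torch.nn as nn

class WTADropout(nn.Module):
    # Winners-take-all dropout: per sample, keep only the top-k channels
    # (ranked by mean absolute activation) and zero the rest.
    def __init__(self, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio
    def forward(self, x):                              # x: (N, C, H, W)
        n, c = x.shape[:2]
        k = max(1, int(self.keep_ratio * c))
        scores = x.abs().mean(dim=(2, 3))              # per-channel importance, (N, C)
        topk = torch.topk(scores, k, dim=1).indices
        mask = torch.zeros(n, c, device=x.device).scatter_(1, topk, 1.0)
        return x * mask[:, :, None, None]

x = torch.relu(torch.randn(8, 64, 28, 28))
y = WTADropout(keep_ratio=0.25)(x)
print((y == 0).float().mean())                         # measured dynamic sparsity

Because whole channels are zeroed, the resulting sparsity is structured, which is what lets downstream kernels skip computation without per-element bookkeeping.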

