Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks

Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and are widely used in many computer vision tasks. However, their enormous model size and high computational complexity prohibit wide deployment on resource-limited embedded systems, such as FPGAs and mGPUs. As the two most widely adopted model compression techniques, weight pruning and quantization compress DNN models by introducing weight sparsity (i.e., forcing a subset of weights to zero) and by quantizing weights to limited bit-width values, respectively. Although some works attempt to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking the FPGA as the test computing platform and Processing Elements (PEs) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsity while taking the PE architecture into account. In addition, we integrate it with an optimized weight ternarization approach that quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in the DNN from multiplication-and-accumulation (MAC) to addition-only, and compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. We then investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiments show that the fusion of the proposed techniques achieves a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation for ResNet-18 on the ImageNet dataset.
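
The ternarization idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' optimized method: the threshold rule (0.7 times the mean absolute weight) is a common heuristic assumed here for illustration, and the names `ternarize`, `ternary_dot`, and `delta_scale` are hypothetical. The sketch shows that, once weights are in {-1, 0, +1}, a dot product needs only additions and subtractions, and that going from 32 bits to 2 bits per weight gives the 16× figure.

```python
def ternarize(weights, delta_scale=0.7):
    """Map each weight to -1, 0, or +1 using a magnitude threshold."""
    # Assumed heuristic: threshold = delta_scale * mean(|w|).
    threshold = delta_scale * sum(abs(w) for w in weights) / len(weights)
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(ternary_weights, activations):
    """Dot product that uses only additions and subtractions."""
    total = 0.0
    for t, a in zip(ternary_weights, activations):
        if t == 1:
            total += a      # weight +1: add the activation
        elif t == -1:
            total -= a      # weight -1: subtract the activation
        # weight 0: skipped entirely (sparsity)
    return total


weights = [0.9, -0.05, -0.8, 0.1, 0.02, -0.6]
ternary = ternarize(weights)

# Storage ratio of 32-bit floats versus 2-bit ternary codes.
compression = 32 / 2
```

Zero-valued ternary weights are skipped in the loop, which is the same sparsity that pruning exploits.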

Against Membership Inference Attack: Pruning is All You Need

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/432 ◽

2021 ◽

Author(s):

Yijue Wang ◽

Chenghong Wang ◽

Zigeng Wang ◽

Shanglin Zhou ◽

Hang Liu ◽

...

Keyword(s):

Deep Neural Networks ◽

Pruning Algorithm ◽

Privacy Leakage ◽

Model Compression ◽

Computational Operation ◽

Model Size ◽

Inference Attack ◽

Weight Pruning ◽

Pruning Technique ◽

Large Model

Large model size, high computational cost, and vulnerability to membership inference attacks (MIA) have impeded the adoption of deep neural networks (DNNs), especially on mobile devices. To address this challenge, we envision that weight pruning can help defend DNNs against MIA while reducing model storage and computational operations. In this work, we propose a pruning algorithm and show that it can find a subnetwork that prevents privacy leakage from MIA while achieving accuracy competitive with the original DNN. We also verify our theoretical insights with experiments. Our experimental results show that the attack accuracy under model compression is up to 13.6% and 10% lower than that of the baseline and the Min-Max game, respectively.

Automatic Mixed-Precision Quantization Search of BERT

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/472 ◽

2021 ◽

Author(s):

Changsheng Zhao ◽

Ting Hua ◽

Yilin Shen ◽

Qian Lou ◽

Hongxia Jin

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Language Models ◽

Model Compression ◽

Mixed Precision ◽

Knowledge Distillation ◽

Model Size ◽

Orthogonal Methods ◽

Weight Pruning

Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents their practical deployment on resource-constrained devices. Knowledge distillation, weight pruning, and quantization are known to be the main directions in model compression. However, compact models obtained through knowledge distillation may suffer a significant accuracy drop even at a relatively small compression ratio. On the other hand, there are only a few quantization-based attempts designed for natural language processing tasks, and they usually require manual setting of hyper-parameters. In this paper, we propose an automatic mixed-precision quantization framework designed for BERT that can conduct quantization and pruning simultaneously. Specifically, our method leverages Differentiable Neural Architecture Search to automatically assign scale and precision to the parameters in each sub-group, while at the same time pruning out redundant groups of parameters. Extensive evaluations on BERT downstream tasks reveal that our method beats baselines by providing the same performance with a much smaller model size. We also show the possibility of obtaining an extremely lightweight model by combining our solution with orthogonal methods such as DistilBERT.
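
To make the size effect of per-group precision concrete, the following is a small arithmetic sketch. It is not the search procedure from the abstract; the group sizes, bit-widths, and the function name `model_size_bytes` are all hypothetical, and a pruned group is modeled as having 0 bits.

```python
def model_size_bytes(groups):
    """Return storage in bytes for a list of (num_params, bits) groups.

    A group with bits == 0 is treated as pruned and costs nothing.
    """
    total_bits = sum(num_params * bits for num_params, bits in groups)
    return total_bits / 8


# Hypothetical model with four parameter groups of 1000 weights each.
full_precision = [(1000, 32), (1000, 32), (1000, 32), (1000, 32)]
mixed_precision = [(1000, 8), (1000, 4), (1000, 2), (1000, 0)]  # last group pruned

ratio = model_size_bytes(full_precision) / model_size_bytes(mixed_precision)
```

In this toy setting the mixed-precision assignment is roughly nine times smaller than the 32-bit model; the real ratio depends on the precisions that the search actually selects.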

Security Framework for Smart Visual Sensor Networks

Countering Cyber Attacks and Preserving the Integrity and Availability of Critical Systems - Advances in Digital Crime, Forensics, and Cyber Terrorism ◽

10.4018/978-1-5225-8241-0.ch012 ◽

2019 ◽

pp. 230-252

Author(s):

G. Suseela ◽

Y. Asnath Victy Phamila

Keyword(s):

Sensor Networks ◽

Embedded System ◽

Image Encryption ◽

Image Sensor ◽

Visual Sensor ◽

Visual Sensor Network ◽

Partial Encryption ◽

Visual Sensor Networks ◽

Resource Limited ◽

Scalar Data

Due to the significance of image data over scalar data, camera-integrated wireless sensor networks have attracted the attention of researchers in the field of smart visual sensor networks (VSNs). These networks are inexpensive and have found wide application in surveillance and monitoring systems. The challenge is that these systems are resource-constrained. The visual sensor node is typically an embedded system made up of a lightweight processor, low memory, a low-bandwidth transceiver, and a low-cost image sensor unit. As these networks carry sensitive information about the surveillance region, security and privacy protection are critical needs of the VSN. Due to the resource-limited nature of the VSN, image encryption must be kept as lightweight as possible, and many approaches to image security in VSNs are based on selective or partial encryption systems. The secure transmission of images is therefore nontrivial. Thus, in this chapter, a security framework for smart visual sensor networks, built using energy-efficient image encryption and coding systems designed for VSNs, is presented.

QoS Scheduling with Opportunistic Spectrum Access for Multimedia

Advances in Wireless Technologies and Telecommunication - Cognitive Radio and Interference Management ◽

10.4018/978-1-4666-2005-6.ch009 ◽

2012 ◽

pp. 162-178

Author(s):

Pavol Polacek ◽

Chih-Wei Huang

Keyword(s):

User Experience ◽

Research Effort ◽

Research Area ◽

Opportunistic Spectrum Access ◽

System Capacity ◽

Research Trend ◽

Spectrum Access ◽

Computing Platform ◽

Resource Limited ◽

Application Data

Thanks to advances in multimedia applications, mobile computing platforms, and wireless communication technology, this research area has attracted serious attention, with the goal of seamlessly providing an interactive and ubiquitous user experience. To make this happen, the pursuit of higher system capacity in resource-limited wireless networks is never-ending. Cognitive radio (CR) represents an exciting new communication paradigm with advantages in spectrum management that increase channel utilization and capacity. Bandwidth-demanding multimedia applications are excellent candidates to fully exploit the potential of CR. However, research effort has focused mainly on spectrum access, while application-specific performance has received much less attention. Research that considers both spectrum access and application data scheduling is emerging to maximize user experience. In this chapter, the authors first discuss advances in opportunistic spectrum access (OSA) strategies as well as multimedia QoS scheduling schemes, and then introduce the research trend toward joint access and scheduling frameworks.

A low power hearing aid computing platform using lightweight processing elements

2012 IEEE International Symposium on Circuits and Systems ◽

10.1109/iscas.2012.6271888 ◽

2012 ◽

Cited By ~ 4

Author(s):

Kuo-Chiang Chang ◽

Yu-Wen Chen ◽

Yu-Ting Kuo ◽

Chih-Wei Liu

Keyword(s):

Low Power ◽

Hearing Aid ◽

Processing Elements ◽

Computing Platform

Super-Resolution Model Quantized in Multi-Precision

Electronics ◽

10.3390/electronics10172176 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2176

Author(s):

Jingyu Liu ◽

Qiong Wang ◽

Dunbo Zhang ◽

Li Shen

Keyword(s):

Super Resolution ◽

Original Model ◽

Data Mapping ◽

Model Compression ◽

Sensitive Stage ◽

Resolution Model ◽

Model Size ◽

Model Training ◽

And Storage ◽

Computing Capacity

Deep learning has achieved outstanding results in various machine learning tasks, against the background of rapidly increasing hardware computing capacity. However, as models achieve higher performance, their size grows, training and inference take longer, memory and storage occupancy increases, computing efficiency drops, and energy consumption rises. Consequently, it is difficult to run these models on edge devices such as micro and mobile devices. Model compression techniques, for instance model quantization, are gradually emerging and being researched. Quantization-aware training accounts for the accuracy loss caused by data mapping during model training: it clamps and approximates the data when updating parameters and introduces quantization errors into the model loss function. During quantization, we found that some stages of two super-resolution networks, SRGAN and ESRGAN, were sensitive to quantization, which greatly reduced performance. Therefore, we use higher-bit integer quantization for the sensitive stages and train the whole model with quantization-aware training. Although a little model size was sacrificed, accuracy approaching that of the original model was achieved. The ESRGAN model was still reduced by nearly 67.14% and the SRGAN model by nearly 68.48%, and the inference time was reduced by nearly 30.48% and 39.85%, respectively. In addition, the PI values of SRGAN and ESRGAN are 2.1049 and 2.2075, respectively.
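
The "clamp and approximate" step of quantization-aware training, and the reason a sensitive stage may benefit from more bits, can be sketched as follows. This is a generic uniform quantize-then-dequantize routine assumed for illustration, not the authors' implementation; `fake_quantize`, `max_error`, and the sample values are hypothetical.

```python
def fake_quantize(values, num_bits):
    """Quantize values to num_bits unsigned integers, then map back to floats."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1                  # highest integer code
    scale = (hi - lo) / levels if hi > lo else 1.0
    result = []
    for v in values:
        q = round((v - lo) / scale)             # approximate: snap to integer grid
        q = max(0, min(levels, q))              # clamp to the representable range
        result.append(lo + q * scale)           # dequantize back to a float
    return result


def max_error(values, num_bits):
    """Largest absolute error introduced by fake quantization."""
    quantized = fake_quantize(values, num_bits)
    return max(abs(a - b) for a, b in zip(values, quantized))


activations = [-1.0, -0.37, 0.0, 0.42, 0.5, 1.0]
err4 = max_error(activations, 4)   # coarse grid: 16 levels
err8 = max_error(activations, 8)   # finer grid: 256 levels
```

The 8-bit grid yields a much smaller worst-case error than the 4-bit grid, which is the trade-off behind assigning higher-bit quantization only to the stages that are sensitive.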

Security Framework for Smart Visual Sensor Networks

Research Anthology on Blockchain Technology in Business, Healthcare, Education, and Government ◽

10.4018/978-1-7998-5351-0.ch024 ◽

2021 ◽

pp. 406-423

Author(s):

G. Suseela ◽

Y. Asnath Victy Phamila

Keyword(s):

Sensor Networks ◽

Embedded System ◽

Image Encryption ◽

Image Sensor ◽

Visual Sensor ◽

Visual Sensor Network ◽

Partial Encryption ◽

Visual Sensor Networks ◽

Resource Limited ◽

Scalar Data

Emerging 5G IoT Smart System Based on Edge-to-Cloud Computing Platform

International Journal of e-Collaboration ◽

10.4018/ijec.2021100109 ◽

2021 ◽

Vol 17 (4) ◽

pp. 122-131

Author(s):

V. R. Niveditha ◽

D. Usha ◽

P. S. Rajakumar ◽

B. Dwarakanath ◽

Magesh S.

Keyword(s):

Radio Frequency Identification ◽

High Efficiency ◽

Internet Communication ◽

Embedded Devices ◽

Smart System ◽

Computing Platform ◽

Resource Limited ◽

New Ideas ◽

Frequency Identification ◽

Cloud Computing Platform

Securing internet communication has become difficult as technology grows ever faster and more capable, particularly on resource-limited devices such as wireless sensors, embedded devices, Internet of Things (IoT) devices, and radio frequency identification (RFID) tags. IoT, however, is a promising technology for the future that is expected to connect billions of computers. Hence, security, privacy, and authentication services must protect communication in the IoT. Several recent considerations, such as restricted computing capacity, register width, RAM size, ROM size, and specific operating environments, have compelled IoT to utilize conventional measures of security. These technologies require greater data speeds, high throughput, expanded power, lower bandwidth, and high efficiency. In addition, IoT has transformed the world in light of these new ideas by offering smooth communication between heterogeneous networks (HetNets).

Evaluation of Communication Induced Checkpointing in Resource Constrained Embedded Systems

Volume 3: 2011 ASME/IEEE International Conference on Mechatronic and Embedded Systems and Applications, Parts A and B ◽

10.1115/detc2011-48634 ◽

2011 ◽

Cited By ~ 1

Author(s):

Belal H. Sababha ◽

Osamah A. Rawashdeh

Keyword(s):

Embedded Systems ◽

Embedded System ◽

Area Network ◽

Resource Constrained ◽

Promising Technique ◽

Network Bandwidth ◽

Resource Limited ◽

Embedded Applications ◽

And Task

Reconfiguration-Based Fault-Tolerance is one approach for developing dependable safety-critical embedded applications. Compared to traditional hardware and software redundancy, this approach is a promising technique that may achieve the required dependability with a significant reduction in cost in terms of size, weight, price, and power consumption. Reconfiguration necessitates using proper checkpointing protocols to support state reservation and task migration. One of the most common approaches is to use Communication Induced Checkpointing (CIC) protocols, which are well developed and understood for large parallel and information systems, but not much has been done for resource-limited embedded systems. This paper implements four common CIC protocols in a resource-constrained distributed embedded system with a Controller Area Network (CAN) backbone. An example feedback control system implementation is used as a case study. The four implemented protocols are described and their performance is contrasted. The paper compares the protocols in terms of network bandwidth consumption, CPU usage, checkpointing times, and checkpoint sizes, in addition to the traditional measures of forced-to-local checkpoint ratios and total number of checkpoints.
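
As an illustration of how a CIC protocol triggers forced checkpoints, here is a minimal sketch of an index-based scheme in the style of the Briatico-Ciuffoletti-Simoncini (BCS) protocol. The abstract does not name the four protocols it evaluates, so this choice is an assumption, and the class and method names are hypothetical. Each process piggybacks its checkpoint index on outgoing messages and takes a forced checkpoint when it receives a higher index.

```python
class Process:
    """One node that takes local (basic) and forced checkpoints."""

    def __init__(self, name):
        self.name = name
        self.index = 0    # current checkpoint sequence number
        self.basic = 0    # checkpoints taken on the node's own schedule
        self.forced = 0   # checkpoints induced by incoming messages

    def local_checkpoint(self):
        # A basic checkpoint advances the local index.
        self.index += 1
        self.basic += 1

    def send(self, payload):
        # Piggyback the sender's checkpoint index on the message.
        return (self.index, payload)

    def receive(self, message):
        sender_index, payload = message
        if sender_index > self.index:
            # Forced checkpoint before delivery keeps checkpoints consistent.
            self.index = sender_index
            self.forced += 1
        return payload


a, b = Process("A"), Process("B")
a.local_checkpoint()                   # A moves to index 1
b.receive(a.send("sensor reading"))    # B sees index 1 > 0: forced checkpoint
a.receive(b.send("ack"))               # indices equal: no forced checkpoint

forced_to_local = (a.forced + b.forced) / (a.basic + b.basic)
```

The final ratio corresponds to the forced-to-local checkpoint measure mentioned in the abstract, computed here for a two-node toy exchange rather than a CAN-based system.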

Energy-saving cloud computing platform based on micro-embedded system

16th International Conference on Advanced Communication Technology ◽

10.1109/icact.2014.6779060 ◽

2014 ◽

Cited By ~ 2

Author(s):

Wen-Hsu Hsieh ◽

San-Peng Kao ◽

Kuang-Hung Tan ◽

Jiann-Liang Chen

Keyword(s):

Cloud Computing ◽

Embedded System ◽

Energy Saving ◽

Computing Platform ◽

Cloud Computing Platform
