Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks

2020 ◽  
Vol 34 (04) ◽  
pp. 6623-6630
Author(s):  
Li Yang ◽  
Zhezhi He ◽  
Deliang Fan

Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and are widely used in many computer vision tasks. However, their enormous model size and high computing complexity prohibit wide deployment on resource-limited embedded systems, such as FPGAs and mGPUs. As the two most widely adopted model compression techniques, weight pruning and quantization compress a DNN model by introducing weight sparsity (i.e., forcing a portion of the weights to zero) and by quantizing weights into limited bit-width values, respectively. Although prior works attempt to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking the FPGA as the test computing platform and Processing Elements (PEs) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification that takes the PE architecture into account. In addition, we integrate it with an optimized weight ternarization approach that quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in the DNN from multiplication-and-accumulation (MAC) to addition-only and compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiments show that the fusion of our proposed techniques achieves a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation for ResNet-18 on the ImageNet dataset.
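The ternarization step and the 32-bit to 2-bit storage arithmetic described in this abstract can be illustrated with a minimal sketch. It assumes a simple magnitude threshold set to a fraction of the mean absolute weight, which is a common heuristic and not necessarily the authors' optimized approach; it does not model PE-wise pruning or WPC. The names `ternarize`, `delta_ratio`, `ternary_dot`, and `compression_ratio` are hypothetical.

```python
def ternarize(weights, delta_ratio=0.7):
    """Map each weight to -1, 0, or +1 using a magnitude threshold.

    delta_ratio is an assumed hyper-parameter: the threshold is
    delta_ratio * mean(|w|). The paper's own threshold rule may differ.
    """
    if not weights:
        return []
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_ratio * mean_abs
    ternary = []
    for w in weights:
        if w > delta:
            ternary.append(1)
        elif w < -delta:
            ternary.append(-1)
        else:
            ternary.append(0)
    return ternary


def ternary_dot(ternary_weights, activations):
    """Dot product with ternary weights: additions and subtractions only."""
    total = 0.0
    for t, a in zip(ternary_weights, activations):
        if t == 1:
            total += a
        elif t == -1:
            total -= a
        # t == 0 contributes nothing, so the term is skipped entirely.
    return total


def compression_ratio(full_bits=32, ternary_bits=2):
    """Storage ratio per weight when moving from full_bits to ternary_bits."""
    return full_bits / ternary_bits
```

With the default arguments, `compression_ratio()` gives the factor of 16 quoted in the abstract; any additional sparsity from pruning would raise the effective rate further.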

Author(s):  
Yijue Wang ◽  
Chenghong Wang ◽  
Zigeng Wang ◽  
Shanglin Zhou ◽  
Hang Liu ◽  
...  

The large model size, high computational cost, and vulnerability to membership inference attacks (MIA) have impeded the popularity of deep learning, or deep neural networks (DNNs), especially on mobile devices. To address this challenge, we envision that weight pruning will help defend DNNs against MIA while reducing model storage and computational operations. In this work, we propose a pruning algorithm and show that it can find a subnetwork that prevents privacy leakage from MIA while achieving accuracy competitive with the original DNNs. We also verify our theoretical insights with experiments. Our experimental results illustrate that the attack accuracy using model compression is up to 13.6% and 10% lower than that of the baseline and the Min-Max game, respectively.


Author(s):  
Changsheng Zhao ◽  
Ting Hua ◽  
Yilin Shen ◽  
Qian Lou ◽  
Hongxia Jin

Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents their practical deployment on resource-constrained devices. Knowledge distillation, weight pruning, and quantization are known to be the main directions in model compression. However, compact models obtained through knowledge distillation may suffer a significant accuracy drop even at a relatively small compression ratio. On the other hand, there are only a few quantization-based attempts designed for natural language processing tasks, and they usually require manual setting of hyper-parameters. In this paper, we propose an automatic mixed-precision quantization framework designed for BERT that can conduct quantization and pruning simultaneously. Specifically, our proposed method leverages Differentiable Neural Architecture Search to assign scale and precision for parameters in each sub-group automatically, while at the same time pruning out redundant groups of parameters. Extensive evaluations on BERT downstream tasks reveal that our proposed method beats baselines by providing the same performance with a much smaller model size. We also show the possibility of obtaining an extremely lightweight model by combining our solution with orthogonal methods such as DistilBERT.
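The idea of giving each parameter sub-group its own precision, and pruning some groups outright, can be sketched as follows. This is only an illustration under stated assumptions: the bit-widths are chosen by hand rather than found by Differentiable Neural Architecture Search, the quantizer is a plain symmetric uniform one, and a bit-width of 0 is used here as a convention for a pruned group. The names `quantize_group` and `model_size_bits` are hypothetical.

```python
def quantize_group(values, bits):
    """Symmetric uniform quantization of one parameter group.

    bits == 0 is an assumed convention meaning the group is pruned.
    The scale is derived from the group's largest magnitude.
    """
    if bits == 0:
        return [0.0] * len(values)
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0] * len(values)
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = max_abs / levels
    quantized = []
    for v in values:
        q = round(v / scale)
        q = max(-levels, min(levels, q))  # clamp to the representable range
        quantized.append(q * scale)
    return quantized


def model_size_bits(groups, bits_per_group):
    """Total storage in bits, ignoring the per-group scale factors."""
    return sum(len(g) * b for g, b in zip(groups, bits_per_group))
```

For example, assigning 8 bits to one group of three parameters and 0 bits to a second group of two parameters gives `model_size_bits(...) == 24`, since the pruned group costs nothing.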


Author(s):  
G. Suseela ◽  
Y. Asnath Victy Phamila

Due to the significance of image data over scalar data, camera-integrated wireless sensor networks have attained the focus of researchers in the field of smart visual sensor networks (VSNs). These networks are inexpensive and have found wide application in surveillance and monitoring systems. The challenge is that these systems are resource-deprived. The visual sensor node is typically an embedded system made up of a lightweight processor, low memory, a low-bandwidth transceiver, and a low-cost image sensor unit. As these networks carry sensitive information about the surveillance region, security and privacy protection are critical needs of the VSN. Due to the resource-limited nature of the VSN, the image encryption is crooked into an optimally lower issue, and many findings of image security in VSN are based on selective or partial encryption systems. The secure transmission of images is more trivial. Thus, in this chapter, a security framework for smart visual sensor networks, built using energy-efficient image encryption and coding systems designed for VSN, is presented.


Author(s):  
Pavol Polacek ◽  
Chih-Wei Huang

Thanks to advances in multimedia applications, mobile computing platforms, and wireless communication technology, the research area has attracted serious attention with the aim of seamlessly providing an interactive and ubiquitous user experience. To make this happen, the pursuit of higher system capacity in resource-limited wireless networks is never-ending. Cognitive radio (CR) represents an exciting new communication paradigm with advantages in spectrum management that raise channel utilization and capacity. Bandwidth-demanding multimedia applications are excellent candidates to fully exploit the potential of CR. However, the research effort has focused mainly on spectrum access, while application-specific performance has received much less attention. Research considering both spectrum access and application data scheduling is emerging to maximize user experience. In this chapter, the authors first discuss advances in opportunistic spectrum access (OSA) strategies as well as multimedia QoS scheduling schemes, and then introduce the research trend on joint access and scheduling frameworks.


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2176
Author(s):  
Jingyu Liu ◽  
Qiong Wang ◽  
Dunbo Zhang ◽  
Li Shen

Deep learning has achieved outstanding results in various machine learning tasks against the background of a rapid increase in the computing capacity of equipment. However, as models achieve higher performance, model size grows, training and inference take longer, memory and storage occupancy increase, computing efficiency shrinks, and energy consumption rises. Consequently, it is difficult to run these models on edge devices such as micro and mobile devices. Model compression technology, for instance model quantization, is gradually emerging and being researched. Quantization-aware training accounts for the accuracy loss that results from data mapping during model training: it clamps and approximates the data when updating parameters, and introduces quantization errors into the model loss function. During quantization, we found that some stages of two super-resolution networks, SRGAN and ESRGAN, were sensitive to quantization, which greatly reduced performance. Therefore, we use higher-bit integer quantization for the sensitive stages and train the whole model together in quantization-aware training. Although a little model size was sacrificed, accuracy approaching that of the original model was achieved. The ESRGAN model was still reduced by nearly 67.14% and the SRGAN model by nearly 68.48%, and the inference time was reduced by nearly 30.48% and 39.85%, respectively. Moreover, the PI values of SRGAN and ESRGAN are 2.1049 and 2.2075, respectively.
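The "clamp and approximate" step that quantization-aware training applies during the forward pass can be sketched as a fake-quantization function. This is a minimal illustration, assuming a fixed value range and uniform levels; it is not the authors' training pipeline and omits gradient handling. The names `fake_quantize` and `mean_squared_quant_error` are hypothetical.

```python
def fake_quantize(x, num_bits=8, x_min=-1.0, x_max=1.0):
    """Clamp x to [x_min, x_max], snap it to a uniform grid, and map it back.

    The result is still a float, but it only takes 2**num_bits distinct
    values, so the model sees the quantization error during training.
    """
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    clamped = min(max(x, x_min), x_max)
    q = round((clamped - x_min) / scale)   # integer grid index
    return x_min + q * scale


def mean_squared_quant_error(values, num_bits):
    """Average squared gap between values and their fake-quantized versions."""
    errors = [(v - fake_quantize(v, num_bits)) ** 2 for v in values]
    return sum(errors) / len(errors)
```

Comparing `mean_squared_quant_error(values, 4)` with `mean_squared_quant_error(values, 8)` on the same data shows why a stage that is sensitive to quantization might be given a higher bit-width at a small cost in model size.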


2021 ◽  
Vol 17 (4) ◽  
pp. 122-131
Author(s):  
V. R. Niveditha ◽  
D. Usha ◽  
P. S. Rajakumar ◽  
B. Dwarakanath ◽  
Magesh S.

Securing internet communication has become difficult as technology grows more capable and faster, particularly for resource-limited devices such as wireless sensors, embedded devices, internet of things (IoT) devices, radio frequency identification (RFID) tags, etc. However, IoT is expected to connect billions of computers as a promising technology for the future. Hence, security, privacy, and authentication services must protect communication in IoT. There are several recent considerations, such as restricted computing capacity, register width, RAM size, specific operating environment, ROM size, etc., that have compelled IoT to utilize conventional measures of security. These technologies require greater data speeds, high throughput, expanded power, lower bandwidth, and high efficiency. In addition, IoT has transformed the world in light of these new ideas by offering smooth communication between heterogeneous networks (HetNets).


Author(s):  
Belal H. Sababha ◽  
Osamah A. Rawashdeh

Reconfiguration-based fault tolerance is one approach for developing dependable safety-critical embedded applications. Compared to traditional hardware and software redundancy, this approach is a promising technique that may achieve the required dependability with a significant reduction in cost in terms of size, weight, price, and power consumption. Reconfiguration necessitates using proper checkpointing protocols to support state reservation and task migration. One of the most common approaches is to use Communication Induced Checkpointing (CIC) protocols, which are well developed and understood for large parallel and information systems, but not much has been done for resource-limited embedded systems. This paper implements four common CIC protocols in a resource-constrained distributed embedded system with a Controller Area Network (CAN) backbone. An example feedback control system implementation is used as a case study. The four implemented protocols are described and their performance is contrasted. The paper compares the protocols in terms of network bandwidth consumption, CPU usage, checkpointing times, and checkpoint sizes, in addition to the traditional measures of forced-to-local checkpoint ratios and total number of checkpoints.
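How a CIC protocol induces checkpoints, and how a forced-to-local checkpoint ratio arises, can be illustrated with a minimal sketch of an index-based scheme. The abstract does not name the four protocols, so this is an assumption for illustration only: each node piggybacks its checkpoint index on outgoing messages, and a receiver takes a forced checkpoint before delivery when the incoming index is larger than its own. The CAN transport is not modeled, and the names `Process` and `forced_to_local_ratio` are hypothetical.

```python
class Process:
    """One node in an index-based CIC scheme (hypothetical illustration)."""

    def __init__(self, name):
        self.name = name
        self.sn = 0       # current checkpoint index
        self.local = 0    # checkpoints taken on the node's own schedule
        self.forced = 0   # checkpoints induced by incoming messages
        self.inbox = []

    def take_local_checkpoint(self):
        """Take a scheduled checkpoint and advance the index."""
        self.sn += 1
        self.local += 1

    def send(self, payload):
        """Piggyback the sender's checkpoint index on every message."""
        return (payload, self.sn)

    def receive(self, message):
        payload, sender_sn = message
        if sender_sn > self.sn:
            # Forced checkpoint BEFORE delivery, so the message is not
            # received in an older checkpoint interval than it was sent in.
            self.sn = sender_sn
            self.forced += 1
        self.inbox.append(payload)


def forced_to_local_ratio(processes):
    """Forced checkpoints divided by local checkpoints across all nodes."""
    local = sum(p.local for p in processes)
    forced = sum(p.forced for p in processes)
    if local == 0:
        raise ValueError("no local checkpoints taken")
    return forced / local
```

A lower ratio means fewer checkpoints are induced purely by communication, which matters on nodes with limited CPU time and memory.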

