HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression

Author(s): Chenhe Dong, Yaliang Li, Ying Shen, Minghui Qiu
2021
Author(s): Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, ...

Information, 2021, Vol 12 (7), pp. 264
Author(s): Jinghan Wang, Guangyue Li, Wenzhao Zhang

The power of deep learning is widely recognized. As research has deepened, however, neural networks have become increasingly complex and hard to deploy on resource-constrained devices. The emergence of a series of model compression algorithms makes artificial intelligence on the edge possible. Among them, structured pruning is widely used because of its versatility: it removes relatively unimportant structures from the network itself to reduce the model's size. However, previous pruning work still suffers from inaccurate evaluation of candidate networks, empirically chosen pruning rates, and inefficient retraining. We therefore propose an accurate, objective, and efficient pruning algorithm, Combine-Net, which introduces Adaptive BN to eliminate evaluation errors, the Kneedle algorithm to determine the pruning rate objectively, and knowledge distillation to improve retraining efficiency. Without loss of precision, Combine-Net achieves 95% parameter compression and 83% computation compression for VGG16 on CIFAR10, and 71% parameter compression and 41% computation compression for ResNet50 on CIFAR100. Experiments on different datasets and models show that Combine-Net can efficiently compress a neural network's parameters and computation.
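To make the evaluation step concrete, here is a minimal sketch of the Adaptive BN idea described in the abstract: after a candidate sub-network is pruned, its BatchNorm running statistics are re-estimated on a few training batches before the sub-network is evaluated, which avoids the evaluation error caused by stale statistics. This is written against PyTorch; the function name and the number of calibration batches are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch (not the authors' code): recalibrating BatchNorm statistics
# after pruning ("Adaptive BN") so candidate sub-networks are evaluated fairly.
import torch

def adaptive_bn_recalibrate(model, loader, num_batches=50, device="cpu"):
    """Reset BN running stats and re-estimate them on a few training batches."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()   # clear the stale pre-pruning statistics
            m.momentum = None         # accumulate a cumulative moving average
    model.train()                     # BN only updates its stats in train mode
    with torch.no_grad():             # forward passes only, no weight updates
        for i, (x, _) in enumerate(loader):
            if i >= num_batches:
                break
            model(x.to(device))
    model.eval()
    return model
```

The candidate with the best post-recalibration accuracy can then be selected for the knowledge-distillation retraining stage.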


2021
Author(s): Zhiwei Hao, Yong Luo, Han Hu, Jianping An, Yonggang Wen

2021, Vol 43 (13), pp. 2888-2898
Author(s): Tianze Gao, Yunfeng Gao, Yu Li, Peiyuan Qin

An essential element of intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the ever-increasing advance of artificial neural networks (ANN), researchers have proposed numerous ANN-based visual object detection methods that have proven to be effective. However, cumbersome networks are ill-suited to real-time scenarios in M&RS, necessitating model compression techniques. In this paper, a novel approach to training lightweight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification and are not directly compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based detection method. Two strategies, positive sample retaining and early distribution softening, are employed to yield a natural adaptation. The mutual consistency between the teacher and student models is further promoted through hint-based distillation. Extensive controlled experiments show that the proposed method improves the lightweight network's performance by a large margin.
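As an illustration of the hint-based distillation term mentioned in the abstract, the following PyTorch sketch combines an L2 hint loss on intermediate features with a temperature-softened logit loss, which is the general recipe such methods build on. The adapter layer, channel sizes, and temperature are assumptions for demonstration, not the authors' implementation.

```python
# Hedged sketch: generic hint-based distillation plus softened-logit distillation.
import torch
import torch.nn.functional as F

def hint_loss(student_feat, teacher_feat, adapter):
    """L2 loss between adapted student features and (detached) teacher features."""
    return F.mse_loss(adapter(student_feat), teacher_feat.detach())

def soft_logit_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Example adapter: a 1x1 conv mapping assumed student channels (64) to
# assumed teacher channels (256) so the feature maps can be compared.
adapter = torch.nn.Conv2d(in_channels=64, out_channels=256, kernel_size=1)
```

In practice these terms are weighted and added to the detector's task loss; the paper's positive sample retaining and early distribution softening refine how and when the softened targets are applied.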


Author(s): Shu Jiang, Zuchao Li, Hai Zhao, Bao-Liang Lu, Rui Wang

In recent years, research on dependency parsing has focused on improving accuracy on domain-specific (in-domain) test datasets and has made remarkable progress. However, innumerable real-world scenarios are not covered by these datasets, i.e., they are out-of-domain, and parsers that perform well on in-domain data usually suffer significant performance degradation on out-of-domain data. Therefore, to adapt existing high-performing in-domain parsers to a new domain, cross-domain transfer learning methods are essential. This paper examines two scenarios for cross-domain transfer learning: semi-supervised and unsupervised. Specifically, we adopt the pre-trained language model BERT, trained on the source-domain (in-domain) data at the subword level, and introduce self-training methods derived from tri-training for the two scenarios. Evaluation results on the NLPCC-2019 shared task and the universal dependency parsing task indicate the effectiveness of the adopted approaches for cross-domain transfer learning and show the potential of self-training for cross-lingual transfer learning.
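The self-training scheme summarized in the abstract amounts to: parse unlabeled target-domain sentences with the in-domain model, keep only high-confidence parses as pseudo-labels, and retrain on the union of gold and pseudo-labeled data. The sketch below uses hypothetical `train_fn` and `predict_fn` callables supplied by the caller; the confidence threshold and round count are illustrative, and the actual tri-training variants additionally cross-check pseudo-labels between multiple parsers.

```python
# Hedged sketch of a generic confidence-based self-training loop, in the spirit
# of the tri-training variants described above. train_fn and predict_fn are
# hypothetical placeholders, not the authors' API.
def self_train(train_fn, predict_fn, source_labeled, target_unlabeled,
               rounds=3, threshold=0.9):
    """train_fn(data) -> model; predict_fn(model, sent) -> (parse, confidence)."""
    model = train_fn(source_labeled)                # in-domain (source) parser
    for _ in range(rounds):
        pseudo = []
        for sent in target_unlabeled:
            parse, conf = predict_fn(model, sent)
            if conf >= threshold:                   # keep confident parses only
                pseudo.append((sent, parse))
        # retrain on gold source data plus confident target-domain pseudo-labels
        model = train_fn(source_labeled + pseudo)
    return model
```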

