Knowledge Distillation from BERT in Pre-Training and Fine-Tuning for Polyphone Disambiguation

Author(s):  
Hao Sun ◽  
Xu Tan ◽  
Jun-Wei Gan ◽  
Sheng Zhao ◽  
Dongxu Han ◽  
...


Author(s):
Kun Wei ◽  
Cheng Deng ◽  
Xu Yang

Zero-Shot Learning (ZSL) handles the problem that some testing classes never appear in the training set. Existing ZSL methods are designed for learning from a fixed training set and cannot capture and accumulate knowledge across multiple training sets, which makes them infeasible for many real-world applications. In this paper, we propose a new ZSL setting, named Lifelong Zero-Shot Learning (LZSL), which aims to accumulate knowledge while learning from multiple datasets and to recognize unseen classes of all trained datasets. We also propose a novel method to realize LZSL, which effectively alleviates catastrophic forgetting during continuous training. Specifically, since these datasets contain different semantic embeddings, we utilize a Variational Auto-Encoder to obtain unified semantic representations. Then, we leverage a selective retraining strategy to preserve the trained weights of previous tasks and avoid negative transfer when fine-tuning the entire model. Finally, knowledge distillation is employed to transfer knowledge from previous training stages to the current stage. We also design the LZSL evaluation protocol and challenging benchmarks. Extensive experiments on these benchmarks indicate that our method tackles the LZSL problem effectively, while existing ZSL methods fail.
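A minimal sketch (not the authors' code) of two of the LZSL ingredients described above: a Variational Auto-Encoder that maps dataset-specific semantic embeddings into a unified latent space, and a soft-target distillation loss that transfers knowledge from the previous stage's model to the current one. Class names, dimensions, and the temperature T are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticVAE(nn.Module):
    """Encodes a dataset-specific attribute vector into a shared latent space."""
    def __init__(self, attr_dim, latent_dim=64):
        super().__init__()
        self.enc = nn.Linear(attr_dim, 2 * latent_dim)  # outputs mu and log-variance
        self.dec = nn.Linear(latent_dim, attr_dim)

    def forward(self, a):
        mu, logvar = self.enc(a).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z)
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, F.mse_loss(recon, a) + kld

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target KD: match the current model to the previous stage's outputs."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```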


2020 ◽  
Vol 34 (05) ◽  
pp. 9266-9273
Author(s):  
Rongxiang Weng ◽  
Heng Yu ◽  
Shujian Huang ◽  
Shanbo Cheng ◽  
Weihua Luo

Pre-training and fine-tuning have achieved great success in natural language processing. The standard paradigm for exploiting them involves two steps: first, pre-training a model, e.g. BERT, on large-scale unlabeled monolingual data; then, fine-tuning the pre-trained model on labeled data from downstream tasks. However, in neural machine translation (NMT), the training objective of the bilingual task differs substantially from that of the monolingual pre-trained model. This gap means that fine-tuning alone cannot fully exploit the prior language knowledge in NMT. In this paper, we propose an Apt framework for transferring knowledge from pre-trained models to NMT. The proposed approach includes two modules: 1) a dynamic fusion mechanism that fuses task-specific features adapted from general knowledge into the NMT network, and 2) a knowledge distillation paradigm that learns language knowledge continuously during NMT training. The proposed approach integrates suitable knowledge from pre-trained models to improve NMT. Experimental results on WMT English-to-German, German-to-English, and Chinese-to-English machine translation tasks show that our model outperforms strong baselines and the fine-tuning counterparts.
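A hedged sketch, in the spirit of the dynamic fusion mechanism described above (not the paper's exact Apt implementation): a learned gate decides, position by position, how much adapted pre-trained knowledge to mix into the NMT hidden state. The module name and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, nmt_dim, bert_dim):
        super().__init__()
        self.adapt = nn.Linear(bert_dim, nmt_dim)   # adapt general features to the task
        self.gate = nn.Linear(2 * nmt_dim, nmt_dim)

    def forward(self, nmt_hidden, bert_hidden):
        adapted = torch.tanh(self.adapt(bert_hidden))
        g = torch.sigmoid(self.gate(torch.cat([nmt_hidden, adapted], dim=-1)))
        return g * adapted + (1 - g) * nmt_hidden   # position-wise gated fusion

# Usage sketch: fused = DynamicFusion(512, 768)(encoder_states, bert_states)
```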


2019 ◽  
Vol 31 (11) ◽  
pp. 2266-2291 ◽  
Author(s):  
Xin Yao ◽  
Tianchi Huang ◽  
Chenglei Wu ◽  
Rui-Xiao Zhang ◽  
Lifeng Sun

Humans are able to master a variety of knowledge and skills through ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed catastrophic forgetting, is one of the major roadblocks preventing deep neural networks from achieving human-level artificial intelligence. Several research efforts (e.g., lifelong or continual learning algorithms) have been proposed to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, require storing an excessive number of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this letter, we focus on the incremental multitask image classification scenario. Inspired by the learning process of students, who usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and the high-level semantic features serve as soft targets and guide the training process in multiple stages, providing sufficient supervision about the old tasks and helping to reduce forgetting. Owing to the knowledge distillation and regularization effects, the proposed method achieves even better performance than fine-tuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracy on new tasks and performance preservation on old tasks.
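A minimal sketch, under assumed feature shapes, of the idea described above: old-model features at two levels serve as soft targets for the new model, and an adversarial critic pushes new-task features toward the old feature distribution. This illustrates multi-stage feature alignment in general, not the letter's exact architecture; the critic design and loss weight are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureCritic(nn.Module):
    """Discriminates old-model features from new-model features."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, f):
        return self.net(f)

def alignment_loss(new_low, old_low, new_high, old_high, critic):
    # Soft-target regression at both feature levels (old features are frozen).
    l_low = F.mse_loss(new_low, old_low.detach())
    l_high = F.mse_loss(new_high, old_high.detach())
    # Adversarial term: the new high-level features should fool the critic.
    l_adv = F.binary_cross_entropy_with_logits(
        critic(new_high), torch.ones(new_high.size(0), 1))
    return l_low + l_high + 0.1 * l_adv
```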


Author(s):  
Gudavalli Sai Abhilash ◽  
Kantheti Rajesh ◽  
Jangam Dileep Shaleem ◽  
Grandi Sai Sarath ◽  
Palli R Krishna Prasad

Deployed face recognition models often need to identify low-resolution faces at extremely low computational cost. A feasible solution to this problem is to compress a complex face model, achieving higher speed and a lower memory footprint at the cost of a minimal performance drop. Inspired by this, this paper proposes a learning approach that recognizes low-resolution faces in live video via selective knowledge distillation. In this approach, a two-stream convolutional neural network (CNN) is first initialized to recognize high-resolution faces and resolution-degraded faces with a teacher stream and a student stream, respectively. The teacher stream is a complex CNN for high-accuracy recognition, while the student stream is a much simpler CNN for low-complexity recognition. To avoid a significant performance drop in the student stream, we then selectively distil the most informative facial features from the teacher stream by solving a sparse graph optimization problem; the selected features are used to regularize the fine-tuning of the student stream. In this way, the student stream is trained by simultaneously handling two tasks with limited computational resources: approximating the most informative facial cues via feature regression, and recovering the missing facial cues via low-resolution face classification.
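A hedged sketch of the training signal described above. The paper selects informative teacher features by solving a sparse graph optimization problem; here a simple top-k activation-energy proxy stands in for that selection step, so this illustrates the combined objective (classification plus feature regression), not the actual selection method. Function names and the weight lam are assumptions.

```python
import torch
import torch.nn.functional as F

def selective_distill_loss(student_feats, teacher_feats, k=128):
    """Regress student features onto the k most energetic teacher dimensions."""
    energy = teacher_feats.pow(2).mean(dim=0)   # per-dimension importance proxy
    idx = energy.topk(k).indices                # stand-in for sparse graph selection
    return F.mse_loss(student_feats[:, idx], teacher_feats[:, idx].detach())

def student_loss(logits, labels, s_feats, t_feats, lam=0.5):
    """Low-resolution face classification plus the teacher-feature regularizer."""
    return F.cross_entropy(logits, labels) + lam * selective_distill_loss(s_feats, t_feats)
```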


2020 ◽  
Vol 34 (05) ◽  
pp. 9233-9241
Author(s):  
Yong Wang ◽  
Longyue Wang ◽  
Shuming Shi ◽  
Victor O.K. Li ◽  
Zhaopeng Tu

The key challenge of multi-domain translation lies in simultaneously encoding, in a unified model, both the general knowledge shared across domains and the particular knowledge distinctive to each domain. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge but misses the domain-specific knowledge. In response to this problem, we augment the NMT model with additional domain transformation networks that transform the general representations into domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals that leverage the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves results comparable to the fine-tuning approach, which requires multiple models to preserve the domain-specific knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected.
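A minimal sketch, with assumed module names and layer choices, of a domain transformation network in the spirit described above: a small residual adapter per domain maps the shared encoder representation to a domain-specific one before it reaches the decoder. This mirrors the stated idea, not the paper's exact layers.

```python
import torch
import torch.nn as nn

class DomainTransform(nn.Module):
    def __init__(self, d_model, num_domains):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, d_model))
            for _ in range(num_domains))

    def forward(self, general_repr, domain_id):
        # Residual transformation keeps the shared knowledge and adds
        # domain-specific information on top of it.
        return general_repr + self.adapters[domain_id](general_repr)

# Usage sketch: specific = DomainTransform(512, num_domains=4)(enc_out, domain_id=2)
```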


ASHA Leader ◽  
2017 ◽  
Vol 22 (6) ◽  
Author(s):  
Christi Miller

2012 ◽  
Vol 82 (3) ◽  
pp. 216-222 ◽  
Author(s):  
Venkatesh Iyengar ◽  
Ibrahim Elmadfa

The food safety security (FSS) concept is perceived as an early warning system for minimizing food safety (FS) breaches, and it functions in conjunction with existing FS measures. Essentially, the function of FS and FSS measures can be visualized in two parts: (i) the FS preventive measures as actions taken at the stem level, and (ii) the FSS interventions as actions taken at the root level, to enhance the impact of the implemented safety steps. In practice, along with FS, FSS also draws its support from (i) legislative directives and regulatory measures for enforcing verifiable, timely, and effective compliance; (ii) measurement systems in place for sustained quality assurance; and (iii) shared responsibility to ensure cohesion among all the stakeholders, namely policy makers, regulators, food producers, processors and distributors, and consumers. However, the functional framework of FSS differs from that of FS by way of: (i) retooling the vulnerable segments of the preventive features of existing FS measures; (ii) fine-tuning response systems to efficiently preempt FS breaches; (iii) building a long-term nutrient and toxicant surveillance network based on validated measurement systems functioning in real time; (iv) focusing on crisp, clear, and correct communication that resonates among all the stakeholders; and (v) developing inter-disciplinary human resources to meet ever-increasing FS challenges. Important determinants of FSS include: (i) strengthening international dialogue for refining regulatory reforms and addressing emerging risks; (ii) developing innovative and strategic action points for intervention [in addition to Hazard Analysis and Critical Control Points (HACCP) procedures]; and (iii) introducing additional science-based tools such as metrology-based measurement systems.

