Knowledge Distillation from BERT in Pre-Training and Fine-Tuning for Polyphone Disambiguation

Author(s):  
Hao Sun ◽  
Xu Tan ◽  
Jun-Wei Gan ◽  
Sheng Zhao ◽  
Dongxu Han ◽  
...


Author(s):
Kun Wei ◽  
Cheng Deng ◽  
Xu Yang

Zero-Shot Learning (ZSL) handles the problem that some testing classes never appear in the training set. Existing ZSL methods are designed for learning from a fixed training set and cannot capture and accumulate knowledge across multiple training sets, which makes them infeasible for many real-world applications. In this paper, we propose a new ZSL setting, named Lifelong Zero-Shot Learning (LZSL), which aims to accumulate knowledge while learning from multiple datasets and to recognize unseen classes of all trained datasets. We also propose a novel method to realize LZSL, which effectively alleviates catastrophic forgetting during continuous training. Specifically, since these datasets contain different semantic embeddings, we utilize a Variational Auto-Encoder to obtain unified semantic representations. Then, we leverage a selective retraining strategy to preserve the trained weights of previous tasks and avoid negative transfer when fine-tuning the entire model. Finally, knowledge distillation is employed to transfer knowledge from previous training stages to the current stage. We also design the LZSL evaluation protocol and challenging benchmarks. Extensive experiments on these benchmarks indicate that our method tackles the LZSL problem effectively, while existing ZSL methods fail.
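A minimal sketch (not the authors' code) of two of the LZSL ingredients described above: a Variational Auto-Encoder that maps dataset-specific semantic embeddings into a unified latent space, and a soft-target distillation loss that transfers knowledge from the previous stage's model to the current one. Class names, dimensions, and the temperature T are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticVAE(nn.Module):
    """Encodes a dataset-specific attribute vector into a shared latent space."""
    def __init__(self, attr_dim, latent_dim=64):
        super().__init__()
        self.enc = nn.Linear(attr_dim, 2 * latent_dim)  # outputs mu and log-variance
        self.dec = nn.Linear(latent_dim, attr_dim)

    def forward(self, a):
        mu, logvar = self.enc(a).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z)
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, F.mse_loss(recon, a) + kld

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target KD: match the current model to the previous stage's outputs."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```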


2020 ◽  
Vol 34 (05) ◽  
pp. 9266-9273
Author(s):  
Rongxiang Weng ◽  
Heng Yu ◽  
Shujian Huang ◽  
Shanbo Cheng ◽  
Weihua Luo

Pre-training and fine-tuning have achieved great success in natural language processing. The standard paradigm for exploiting them involves two steps: first, pre-training a model, e.g. BERT, on large-scale unlabeled monolingual data; then, fine-tuning the pre-trained model on labeled data from downstream tasks. However, in neural machine translation (NMT), the training objective of the bilingual task differs substantially from that of the monolingual pre-trained model. This gap means that fine-tuning alone cannot fully exploit the prior language knowledge in NMT. In this paper, we propose an Apt framework for transferring knowledge from pre-trained models to NMT. The proposed approach includes two modules: 1) a dynamic fusion mechanism that fuses task-specific features adapted from general knowledge into the NMT network, and 2) a knowledge distillation paradigm that learns language knowledge continuously during NMT training. The proposed approach integrates suitable knowledge from pre-trained models to improve NMT. Experimental results on WMT English-to-German, German-to-English, and Chinese-to-English machine translation tasks show that our model outperforms strong baselines and the fine-tuning counterparts.
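A hedged sketch, in the spirit of the dynamic fusion mechanism described above (not the paper's exact Apt implementation): a learned gate decides, position by position, how much adapted pre-trained knowledge to mix into the NMT hidden state. The module name and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, nmt_dim, bert_dim):
        super().__init__()
        self.adapt = nn.Linear(bert_dim, nmt_dim)   # adapt general features to the task
        self.gate = nn.Linear(2 * nmt_dim, nmt_dim)

    def forward(self, nmt_hidden, bert_hidden):
        adapted = torch.tanh(self.adapt(bert_hidden))
        g = torch.sigmoid(self.gate(torch.cat([nmt_hidden, adapted], dim=-1)))
        return g * adapted + (1 - g) * nmt_hidden   # position-wise gated fusion

# Usage sketch: fused = DynamicFusion(512, 768)(encoder_states, bert_states)
```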


2019 ◽  
Vol 31 (11) ◽  
pp. 2266-2291 ◽  
Author(s):  
Xin Yao ◽  
Tianchi Huang ◽  
Chenglei Wu ◽  
Rui-Xiao Zhang ◽  
Lifeng Sun

Humans are able to master a variety of knowledge and skills through ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed catastrophic forgetting, is one of the major roadblocks preventing deep neural networks from achieving human-level artificial intelligence. Several research efforts (e.g., lifelong or continual learning algorithms) have been proposed to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, require storing an excessive number of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this letter, we focus on the incremental multitask image classification scenario. Inspired by the learning process of students, who usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and the high-level semantic features serve as soft targets and guide the training process in multiple stages, providing sufficient supervision about the old tasks and helping to reduce forgetting. Owing to the knowledge distillation and regularization effects, the proposed method achieves even better performance than fine-tuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracy on new tasks and performance preservation on old tasks.
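A minimal sketch, under assumed feature shapes, of the idea described above: old-model features at two levels serve as soft targets for the new model, and an adversarial critic pushes new-task features toward the old feature distribution. This illustrates multi-stage feature alignment in general, not the letter's exact architecture; the critic design and loss weight are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureCritic(nn.Module):
    """Discriminates old-model features from new-model features."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, f):
        return self.net(f)

def alignment_loss(new_low, old_low, new_high, old_high, critic):
    # Soft-target regression at both feature levels (old features are frozen).
    l_low = F.mse_loss(new_low, old_low.detach())
    l_high = F.mse_loss(new_high, old_high.detach())
    # Adversarial term: the new high-level features should fool the critic.
    l_adv = F.binary_cross_entropy_with_logits(
        critic(new_high), torch.ones(new_high.size(0), 1))
    return l_low + l_high + 0.1 * l_adv
```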


Author(s):  
Gudavalli Sai Abhilash ◽  
Kantheti Rajesh ◽  
Jangam Dileep Shaleem ◽  
Grandi Sai Sarath ◽  
Palli R Krishna Prasad

Deployed face recognition models often need to identify low-resolution faces at extremely low computational cost. A feasible solution to this problem is to compress a complex face model, achieving higher speed and a lower memory footprint at the cost of a minimal performance drop. Inspired by this, this paper proposes a learning approach that recognizes low-resolution faces in live video via selective knowledge distillation. In this approach, a two-stream convolutional neural network (CNN) is first initialized to recognize high-resolution faces and resolution-degraded faces with a teacher stream and a student stream, respectively. The teacher stream is a complex CNN for high-accuracy recognition, while the student stream is a much simpler CNN for low-complexity recognition. To avoid a significant performance drop in the student stream, we then selectively distil the most informative facial features from the teacher stream by solving a sparse graph optimization problem; the selected features are used to regularize the fine-tuning of the student stream. In this way, the student stream is trained by simultaneously handling two tasks with limited computational resources: approximating the most informative facial cues via feature regression, and recovering the missing facial cues via low-resolution face classification.
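A hedged sketch of the training signal described above. The paper selects informative teacher features by solving a sparse graph optimization problem; here a simple top-k activation-energy proxy stands in for that selection step, so this illustrates the combined objective (classification plus feature regression), not the actual selection method. Function names and the weight lam are assumptions.

```python
import torch
import torch.nn.functional as F

def selective_distill_loss(student_feats, teacher_feats, k=128):
    """Regress student features onto the k most energetic teacher dimensions."""
    energy = teacher_feats.pow(2).mean(dim=0)   # per-dimension importance proxy
    idx = energy.topk(k).indices                # stand-in for sparse graph selection
    return F.mse_loss(student_feats[:, idx], teacher_feats[:, idx].detach())

def student_loss(logits, labels, s_feats, t_feats, lam=0.5):
    """Low-resolution face classification plus the teacher-feature regularizer."""
    return F.cross_entropy(logits, labels) + lam * selective_distill_loss(s_feats, t_feats)
```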


2020 ◽  
Vol 34 (05) ◽  
pp. 9233-9241
Author(s):  
Yong Wang ◽  
Longyue Wang ◽  
Shuming Shi ◽  
Victor O.K. Li ◽  
Zhaopeng Tu

The key challenge of multi-domain translation lies in simultaneously encoding, in a unified model, both the general knowledge shared across domains and the particular knowledge distinctive to each domain. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge but misses the domain-specific knowledge. In response to this problem, we augment the NMT model with additional domain transformation networks that transform the general representations into domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals that leverage the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves results comparable to the fine-tuning approach, which requires multiple models to preserve the domain-specific knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected.
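A minimal sketch, with assumed module names and layer choices, of a domain transformation network in the spirit described above: a small residual adapter per domain maps the shared encoder representation to a domain-specific one before it reaches the decoder. This mirrors the stated idea, not the paper's exact layers.

```python
import torch
import torch.nn as nn

class DomainTransform(nn.Module):
    def __init__(self, d_model, num_domains):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                          nn.Linear(d_model, d_model))
            for _ in range(num_domains))

    def forward(self, general_repr, domain_id):
        # Residual transformation keeps the shared knowledge and adds
        # domain-specific information on top of it.
        return general_repr + self.adapters[domain_id](general_repr)

# Usage sketch: specific = DomainTransform(512, num_domains=4)(enc_out, domain_id=2)
```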


ASHA Leader ◽  
2017 ◽  
Vol 22 (6) ◽  
Author(s):  
Christi Miller

2012 ◽  
Vol 82 (3) ◽  
pp. 216-222 ◽  
Author(s):  
Venkatesh Iyengar ◽  
Ibrahim Elmadfa

The food safety security (FSS) concept is perceived as an early warning system for minimizing food safety (FS) breaches, and it functions in conjunction with existing FS measures. Essentially, the function of FS and FSS measures can be visualized in two parts: (i) the FS preventive measures as actions taken at the stem level, and (ii) the FSS interventions as actions taken at the root level, to enhance the impact of the implemented safety steps. In practice, along with FS, FSS also draws its support from (i) legislative directives and regulatory measures for enforcing verifiable, timely, and effective compliance; (ii) measurement systems in place for sustained quality assurance; and (iii) shared responsibility to ensure cohesion among all the stakeholders, namely policy makers, regulators, food producers, processors and distributors, and consumers. However, the functional framework of FSS differs from that of FS by way of: (i) retooling the vulnerable segments of the preventive features of existing FS measures; (ii) fine-tuning response systems to efficiently preempt FS breaches; (iii) building a long-term nutrient and toxicant surveillance network based on validated measurement systems functioning in real time; (iv) focusing on crisp, clear, and correct communication that resonates among all the stakeholders; and (v) developing inter-disciplinary human resources to meet ever-increasing FS challenges. Important determinants of FSS include: (i) strengthening international dialogue for refining regulatory reforms and addressing emerging risks; (ii) developing innovative and strategic action points for intervention [in addition to Hazard Analysis and Critical Control Points (HACCP) procedures]; and (iii) introducing additional science-based tools such as metrology-based measurement systems.

