Lifelong Zero-Shot Learning

Zero-Shot Learning (ZSL) handles the problem that some testing classes never appear in training set. Existing ZSL methods are designed for learning from a fixed training set, which do not have the ability to capture and accumulate the knowledge of multiple training sets, causing them infeasible to many real-world applications. In this paper, we propose a new ZSL setting, named as Lifelong Zero-Shot Learning (LZSL), which aims to accumulate the knowledge during the learning from multiple datasets and recognize unseen classes of all trained datasets. Besides, a novel method is conducted to realize LZSL, which effectively alleviates the Catastrophic Forgetting in the continuous training process. Specifically, considering those datasets containing different semantic embeddings, we utilize Variational Auto-Encoder to obtain unified semantic representations. Then, we leverage selective retraining strategy to preserve the trained weights of previous tasks and avoid negative transfer when fine-tuning the entire model. Finally, knowledge distillation is employed to transfer knowledge from previous training stages to current stage. We also design the LZSL evaluation protocol and the challenging benchmarks. Extensive experiments on these benchmarks indicate that our method tackles LZSL problem effectively, while existing ZSL methods fail.

Download Full-text

Embedding Undersampling Rotation Forest for Imbalanced Problem

Computational Intelligence and Neuroscience ◽

10.1155/2018/6798042 ◽

2018 ◽

Vol 2018 ◽

pp. 1-15 ◽

Cited By ~ 3

Author(s):

Huaping Guo ◽

Xiaoyu Diao ◽

Hongbing Liu

Keyword(s):

Imbalanced Data ◽

Feature Space ◽

Original Data ◽

Training Set ◽

Data Set ◽

Minority Class ◽

Rotation Forest ◽

Novel Method ◽

Individual Classifier ◽

The Cost

Rotation Forest is an ensemble learning approach achieving better performance comparing to Bagging and Boosting through building accurate and diverse classifiers using rotated feature space. However, like other conventional classifiers, Rotation Forest does not work well on the imbalanced data which are characterized as having much less examples of one class (minority class) than the other (majority class), and the cost of misclassifying minority class examples is often much more expensive than the contrary cases. This paper proposes a novel method called Embedding Undersampling Rotation Forest (EURF) to handle this problem (1) sampling subsets from the majority class and learning a projection matrix from each subset and (2) obtaining training sets by projecting re-undersampling subsets of the original data set to new spaces defined by the matrices and constructing an individual classifier from each training set. For the first method, undersampling is to force the rotation matrix to better capture the features of the minority class without harming the diversity between individual classifiers. With respect to the second method, the undersampling technique aims to improve the performance of individual classifiers on the minority class. The experimental results show that EURF achieves significantly better performance comparing to other state-of-the-art methods.

Download Full-text

Hierarchical Classification Based on Label Distribution Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015533 ◽

2019 ◽

Vol 33 ◽

pp. 5533-5540 ◽

Cited By ~ 3

Author(s):

Changdong Xu ◽

Xin Geng

Keyword(s):

Hierarchical Classification ◽

Experimental Results ◽

Challenging Problem ◽

Training Set ◽

True Label ◽

Major Bottleneck ◽

Novel Method ◽

Class Labels ◽

Label Distribution ◽

The Given

Hierarchical classification is a challenging problem where the class labels are organized in a predefined hierarchy. One primary challenge in hierarchical classification is the small training set issue of the local module. The local classifiers in the previous hierarchical classification approaches are prone to over-fitting, which becomes a major bottleneck of hierarchical classification. Fortunately, the labels in the local module are correlated, and the siblings of the true label can provide additional supervision information for the instance. This paper proposes a novel method to deal with the small training set issue. The key idea of the method is to represent the correlation among the labels by the label distribution. It generates a label distribution that contains the supervision information of each label for the given instance, and then learns a mapping from the instance to the label distribution. Experimental results on several hierarchical classification datasets show that our method significantly outperforms other state-of-theart hierarchical classification approaches.

Download Full-text

Superpixel-Guided Layer-Wise Embedding CNN for Remote Sensing Image Classification

Remote Sensing ◽

10.3390/rs11020174 ◽

2019 ◽

Vol 11 (2) ◽

pp. 174 ◽

Cited By ~ 4

Author(s):

Han Liu ◽

Jun Li ◽

Lin He ◽

Yu Wang

Keyword(s):

Remote Sensing ◽

Image Classification ◽

Remote Sensing Data ◽

Sampling Strategy ◽

Remote Sensing Image ◽

Fine Tuning ◽

Spatial Dependency ◽

Remote Sensing Images ◽

Training Set ◽

Remote Sensing Image Classification

Irregular spatial dependency is one of the major characteristics of remote sensing images, which brings about challenges for classification tasks. Deep supervised models such as convolutional neural networks (CNNs) have shown great capacity for remote sensing image classification. However, they generally require a huge labeled training set for the fine tuning of a deep neural network. To handle the irregular spatial dependency of remote sensing images and mitigate the conflict between limited labeled samples and training demand, we design a superpixel-guided layer-wise embedding CNN (SLE-CNN) for remote sensing image classification, which can efficiently exploit the information from both labeled and unlabeled samples. With the superpixel-guided sampling strategy for unlabeled samples, we can achieve an automatic determination of the neighborhood covering for a spatial dependency system and thus adapting to real scenes of remote sensing images. In the designed network, two types of loss costs are combined for the training of CNN, i.e., supervised cross entropy and unsupervised reconstruction cost on both labeled and unlabeled samples, respectively. Our experimental results are conducted with three types of remote sensing data, including hyperspectral, multispectral, and synthetic aperture radar (SAR) images. The designed SLE-CNN achieves excellent classification performance in all cases with a limited labeled training set, suggesting its good potential for remote sensing image classification.

Download Full-text

Knowledge Distillation from Bert in Pre-Training and Fine-Tuning for Polyphone Disambiguation

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) ◽

10.1109/asru46091.2019.9003918 ◽

2019 ◽

Author(s):

Hao Sun ◽

Xu Tan ◽

Jun-Wei Gan ◽

Sheng Zhao ◽

Dongxu Han ◽

...

Keyword(s):

Fine Tuning ◽

Knowledge Distillation

Download Full-text

Transfer Learning and Deep Domain Adaptation

Advances and Applications in Deep Learning ◽

10.5772/intechopen.94072 ◽

2020 ◽

Author(s):

Wen Xu ◽

Jing He ◽

Yanfeng Shu

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Transfer Learning ◽

Real World ◽

Deep Neural Networks ◽

Domain Adaptation ◽

Fine Tuning ◽

Real World Applications ◽

Comprehensive Survey ◽

Sample Reconstruction

Transfer learning is an emerging technique in machine learning, by which we can solve a new task with the knowledge obtained from an old task in order to address the lack of labeled data. In particular deep domain adaptation (a branch of transfer learning) gets the most attention in recently published articles. The intuition behind this is that deep neural networks usually have a large capacity to learn representation from one dataset and part of the information can be further used for a new task. In this research, we firstly present the complete scenarios of transfer learning according to the domains and tasks. Secondly, we conduct a comprehensive survey related to deep domain adaptation and categorize the recent advances into three types based on implementing approaches: fine-tuning networks, adversarial domain adaptation, and sample-reconstruction approaches. Thirdly, we discuss the details of these methods and introduce some typical real-world applications. Finally, we conclude our work and explore some potential issues to be further addressed.

Download Full-text

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6409 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8815-8821 ◽

Cited By ~ 3

Author(s):

Sheng Shen ◽

Zhen Dong ◽

Jiayu Ye ◽

Linjian Ma ◽

Zhewei Yao ◽

...

Keyword(s):

Language Processing ◽

Fine Tuning ◽

Model Parameters ◽

Quantization Scheme ◽

Performance Loss ◽

Extensive Analysis ◽

Constrained Environments ◽

Comparable Performance ◽

Novel Method ◽

Tuning Strategy

Transformer based architectures have become de-facto models used for a range of Natural Language Processing tasks. In particular, the BERT based models achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However, BERT based models have a prohibitive memory footprint and latency. As a result, deploying BERT based models in resource constrained environments has become a challenging task. In this work, we perform an extensive analysis of fine-tuned BERT models using second order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra low precision. In particular, we propose a new group-wise quantization scheme, and we use Hessian-based mix-precision method to compress the model further. We extensively test our proposed method on BERT downstream tasks of SST-2, MNLI, CoNLL-03, and SQuAD. We can achieve comparable performance to baseline with at most 2.3% performance degradation, even with ultra-low precision quantization down to 2 bits, corresponding up to 13× compression of the model parameters, and up to 4× compression of the embedding table as well as activations. Among all tasks, we observed the highest performance loss for BERT fine-tuned on SQuAD. By probing into the Hessian based analysis as well as visualization, we show that this is related to the fact that current training/fine-tuning strategy of BERT does not converge for SQuAD.

Download Full-text

Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018425 ◽

2019 ◽

Vol 33 ◽

pp. 8425-8432 ◽

Cited By ~ 7

Author(s):

Saihui Hou ◽

Zilei Wang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Random Number ◽

Random Number Generator ◽

Test Phase ◽

Deep Convolutional Neural Network ◽

Multiple Datasets ◽

Novel Method ◽

Fully Connected ◽

Activation Status

In this work, we propose a novel method named Weighted Channel Dropout (WCD) for the regularization of deep Convolutional Neural Network (CNN). Different from Dropout which randomly selects the neurons to set to zero in the fully-connected layers, WCD operates on the channels in the stack of convolutional layers. Specifically, WCD consists of two steps, i.e., Rating Channels and Selecting Channels, and three modules, i.e., Global Average Pooling, Weighted Random Selection and Random Number Generator. It filters the channels according to their activation status and can be plugged into any two consecutive layers, which unifies the original Dropout and Channel-Wise Dropout. WCD is totally parameter-free and deployed only in training phase with very slight computation cost. The network in test phase remains unchanged and thus the inference cost is not added at all. Besides, when combining with the existing networks, it requires no re-pretraining on ImageNet and thus is well-suited for the application on small datasets. Finally, WCD with VGGNet-16, ResNet-101, Inception-V3 are experimentally evaluated on multiple datasets. The extensive results demonstrate that WCD can bring consistent improvements over the baselines.

Download Full-text

Knowledge Distillation Inspired Fine-Tuning Of Tucker Decomposed CNNS and Adversarial Robustness Analysis

2020 IEEE International Conference on Image Processing (ICIP) ◽

10.1109/icip40778.2020.9190672 ◽

2020 ◽

Author(s):

Ranajoy Sadhukhan ◽

Avinab Saha ◽

Jayanta Mukhopadhyay ◽

Amit Patra

Keyword(s):

Robustness Analysis ◽

Fine Tuning ◽

Knowledge Distillation

Download Full-text

A Novel Method of Predicting Protein Disordered Regions Based on Sequence Features

BioMed Research International ◽

10.1155/2013/414327 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 5

Author(s):

Tong-Hui Zhao ◽

Min Jiang ◽

Tao Huang ◽

Bi-Qing Li ◽

Ning Zhang ◽

...

Keyword(s):

Conservation Score ◽

Query Sequence ◽

Disordered Proteins ◽

Training Set ◽

Disordered Structures ◽

Feature List ◽

Novel Method ◽

Scoring Matrix ◽

Fold Cross Validation ◽

Disordered Regions

With a large number of disordered proteins and their important functions discovered, it is highly desired to develop effective methods to computationally predict protein disordered regions. In this study, based on Random Forest (RF), Maximum Relevancy Minimum Redundancy (mRMR), and Incremental Feature Selection (IFS), we developed a new method to predict disordered regions in proteins. The mRMR criterion was used to rank the importance of all candidate features. Finally, top 128 features were selected from the ranked feature list to build the optimal model, including 92 Position Specific Scoring Matrix (PSSM) conservation score features and 36 secondary structure features. As a result, Matthews correlation coefficient (MCC) of 0.3895 was achieved on the training set by 10-fold cross-validation. On the basis of predicting results for each query sequence by using the method, we used the scanning and modification strategy to improve the performance. The accuracy (ACC) and MCC were increased by 4% and almost 0.2%, respectively, compared with other three popular predictors: DISOPRED, DISOclust, and OnD-CRF. The selected features may shed some light on the understanding of the formation mechanism of disordered structures, providing guidelines for experimental validation.

Download Full-text

Targeted Gradient Descent: A Novel Method for Convolutional Neural Networks Fine-Tuning and Online-Learning

10.1007/978-3-030-87199-4_3 ◽

2021 ◽

pp. 25-35

Author(s):

Junyu Chen ◽

Evren Asma ◽

Chung Chan

Keyword(s):

Neural Networks ◽

Online Learning ◽

Convolutional Neural Networks ◽

Gradient Descent ◽

Fine Tuning ◽

Novel Method

Download Full-text