Semi-Supervised Learning under Class Distribution Mismatch

Yanbei Chen; Xiatian Zhu; Wei Li; Shaogang Gong

doi:10.1609/aaai.v34i04.5763

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5763 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3569-3576

Author(s):

Yanbei Chen ◽

Xiatian Zhu ◽

Wei Li ◽

Shaogang Gong

Keyword(s):

Image Classification ◽

Supervised Learning ◽

Error Propagation ◽

State Of The Art ◽

Training Data ◽

Class Distribution ◽

Unlabelled Data ◽

Popular Image ◽

Soft Targets ◽

Novel Algorithm

Semi-supervised learning (SSL) aims to avoid the need for collecting prohibitively expensive labelled training data. Whilst demonstrating impressive performance boosts, existing SSL methods artificially assume that small labelled data and large unlabelled data are drawn from the same class distribution. In a more realistic scenario with class distribution mismatch between the two sets, they often suffer severe performance degradation due to error propagation introduced by irrelevant unlabelled samples. Our work addresses this under-studied and realistic SSL problem with a novel algorithm named Uncertainty-Aware Self-Distillation (UASD). Specifically, UASD produces soft targets that avoid catastrophic error propagation and enable effective learning from unconstrained unlabelled data containing out-of-distribution (OOD) samples. This is achieved by jointly performing Self-Distillation and OOD filtering in a unified formulation. Without bells and whistles, UASD significantly outperforms six state-of-the-art methods in more realistic SSL under class distribution mismatch on three popular image classification datasets: CIFAR10, CIFAR100, and TinyImageNet.
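
The abstract pairs soft targets with OOD filtering. The sketch below illustrates that idea under stated assumptions, not the exact UASD formulation: a sample's soft target is taken to be the average of the model's class-probability predictions over past epochs, and a sample is treated as OOD when the largest entry of that averaged vector falls below a threshold. All function names are hypothetical.

```python
import math
from typing import List, Optional

def averaged_soft_target(history: List[List[float]]) -> List[float]:
    """Average the class-probability vectors predicted for one sample over past epochs (assumed aggregation)."""
    n = len(history)
    return [sum(col) / n for col in zip(*history)]

def filter_ood(target: List[float], threshold: float) -> Optional[List[float]]:
    """Return None (treat as OOD) when the averaged confidence is below the threshold."""
    return target if max(target) >= threshold else None

def soft_ce(target: List[float], student: List[float]) -> float:
    """Cross-entropy H(q, p) between soft target q and the current prediction p."""
    return -sum(q * math.log(max(p, 1e-12)) for q, p in zip(target, student))

def unlabelled_loss(histories, students, threshold: float) -> float:
    """Mean self-distillation loss over the unlabelled samples that survive OOD filtering."""
    losses = []
    for hist, p in zip(histories, students):
        q = filter_ood(averaged_soft_target(hist), threshold)
        if q is not None:  # filtered samples contribute nothing
            losses.append(soft_ce(q, p))
    return sum(losses) / len(losses) if losses else 0.0
```

Because soft targets are averaged rather than hard argmax labels, a single wrong prediction on an irrelevant sample has a limited effect, which is the error-propagation concern the abstract raises.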

Improving Semi-Supervised Learning for Audio Classification with FixMatch

Electronics ◽

10.3390/electronics10151807 ◽

2021 ◽

Vol 10 (15) ◽

pp. 1807

Author(s):

Sascha Grollmisch ◽

Estefanía Cano

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Transfer Learning ◽

Data Transfer ◽

State Of The Art ◽

Training Data ◽

Audio Classification ◽

Image Domain ◽

Full Dataset ◽

Audio Data

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset, from acoustic scene classification, showing that there is still room for improvement.
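
For readers unfamiliar with FixMatch, the core of its unlabeled loss can be sketched as follows: a prediction on a weakly augmented input becomes a hard pseudo-label only if its confidence reaches a threshold tau, and the model's prediction on a strongly augmented view of the same input is trained toward that label. The default tau = 0.95 is the value reported in the original FixMatch paper; the threshold and augmentations used in this audio study may differ, and the function name is hypothetical. Model and augmentation code are omitted.

```python
import math
from typing import List, Sequence

def fixmatch_unlabelled_loss(weak_probs: Sequence[List[float]],
                             strong_probs: Sequence[List[float]],
                             tau: float = 0.95) -> float:
    """FixMatch-style unlabeled loss from class probabilities.

    weak_probs[b]   : prediction on the weakly augmented view of sample b
    strong_probs[b] : prediction on the strongly augmented view of sample b
    Masked (low-confidence) samples contribute 0 but still count in the
    denominator, i.e. the loss is averaged over the whole unlabeled batch.
    """
    if not weak_probs:
        return 0.0
    total = 0.0
    for pw, ps in zip(weak_probs, strong_probs):
        conf = max(pw)
        if conf >= tau:
            pseudo_label = pw.index(conf)  # hard pseudo-label from the weak view
            total += -math.log(max(ps[pseudo_label], 1e-12))
    return total / len(weak_probs)
```

Since this loss depends entirely on which augmentations count as "weak" and "strong", it is consistent with the abstract's emphasis on selecting suitable augmentation methods for audio.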

Improved CCG Parsing with Semi-supervised Supertagging

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00186 ◽

2014 ◽

Vol 2 ◽

pp. 327-338 ◽

Cited By ~ 7

Author(s):

Mike Lewis ◽

Mark Steedman

Keyword(s):

Wall Street Journal ◽

State Of The Art ◽

Training Data ◽

Word Embeddings ◽

Important Goal ◽

Wall Street ◽

Feature Sets ◽

Lexical Categories ◽

Unlabelled Data ◽

Pos Tagger

Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to accurately assign lexical categories without depending on a POS-tagger. Our approach leads to substantial improvements in dependency parsing results over the standard supervised CCG parser when evaluated on Wall Street Journal (0.8%), Wikipedia (1.8%) and biomedical (3.4%) text. We compare the performance of two recently proposed approaches for classification using a wide variety of word embeddings. We also give a detailed error analysis demonstrating where using embeddings outperforms traditional feature sets, and showing how including POS features can decrease accuracy.
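
One generic way to give a supertagger embedding-based features that need no POS tags is to concatenate the embeddings of the words in a fixed window around each token. The sketch below shows only that feature construction. It is an illustrative assumption rather than the paper's exact feature set: the lookup table `embeddings`, the window width, and the use of zero vectors for padding and unknown words are all hypothetical choices. The classifier that maps these features to lexical categories is not shown.

```python
from typing import Dict, List

def window_features(tokens: List[str],
                    embeddings: Dict[str, List[float]],
                    dim: int,
                    width: int = 2) -> List[List[float]]:
    """For each token, concatenate the embeddings of tokens i-width .. i+width.

    Positions outside the sentence and words missing from `embeddings`
    are represented by zero vectors (an assumption for this sketch).
    """
    zero = [0.0] * dim
    features = []
    for i in range(len(tokens)):
        vec: List[float] = []
        for j in range(i - width, i + width + 1):
            if 0 <= j < len(tokens):
                vec.extend(embeddings.get(tokens[j], zero))
            else:
                vec.extend(zero)  # sentence-boundary padding
        features.append(vec)
    return features
```

Each token thus yields a vector of length (2 * width + 1) * dim, which any classifier can consume.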

Improved semi-supervised learning technique for automatic detection of South African abusive language on Twitter

South African Computer Journal ◽

10.18489/sacj.v32i2.847 ◽

2020 ◽

Vol 32 (2) ◽

Author(s):

Oluwafemi Oriola ◽

Eduan Kotzé

Keyword(s):

Logistic Regression ◽

Supervised Learning ◽

South African ◽

Learning Curves ◽

Training Data ◽

Support Vector ◽

Learning Techniques ◽

Learning Technique ◽

Unlabelled Data ◽

Language Detection

Semi-supervised learning is a potential solution for improving training data in low-resourced abusive language detection contexts such as South African abusive language detection on Twitter. However, existing semi-supervised learning methods have been skewed towards small amounts of labelled data with small feature spaces. This paper therefore presents a semi-supervised learning technique that improves the distribution of training data by assigning labels to unlabelled data based on majority voting over different feature sets of labelled and unlabelled data clusters. The technique is applied to South African English corpora consisting of labelled and unlabelled abusive tweets. The proposed technique is compared with state-of-the-art self-learning and active learning techniques based on syntactic and semantic features. The performance of these techniques with Logistic Regression, Support Vector Machine and Neural Networks is evaluated. The proposed technique, with accuracy and F1-score of 0.97 and 0.95, respectively, outperforms existing semi-supervised learning techniques. The learning curves show that the proposed technique used the training data more efficiently than existing techniques. Overall, n-gram syntactic features with a Logistic Regression classifier recorded the highest performance. The paper concludes that the proposed semi-supervised learning technique effectively detected implicit and explicit South African abusive language on Twitter.
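
The labelling step relies on majority voting over labels proposed from different feature sets. The sketch below shows that voting rule in isolation. It assumes, as our own choice, that a label is assigned only when it wins a strict majority and that the sample stays unlabelled otherwise. How the paper derives each feature set's vote from the data clusters is not described here, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label with a strict majority of votes, or None if there is none."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None

def label_unlabelled(per_feature_labels: List[List[str]]) -> List[Optional[str]]:
    """per_feature_labels[k][i] is the label proposed for sample i by feature set k."""
    return [majority_label(votes) for votes in zip(*per_feature_labels)]
```

Samples that receive None can simply be left out of the augmented training set.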

Teaching Semi-Supervised Classifier via Generalized Distillation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/298 ◽

2018 ◽

Cited By ~ 2

Author(s):

Chen Gong ◽

Xiaojun Chang ◽

Meng Fang ◽

Jian Yang

Keyword(s):

Supervised Learning ◽

Error Bounds ◽

State Of The Art ◽

Training Data ◽

Training Process ◽

Rademacher Complexity ◽

Optimization Framework ◽

Specific Teaching ◽

Teaching Function ◽

Intelligent Teacher

Semi-Supervised Learning (SSL) is able to build a reliable classifier with very scarce labeled examples by properly utilizing the abundant unlabeled examples. However, existing SSL algorithms often yield unsatisfactory performance due to the lack of supervision information. To address this issue, this paper formulates SSL as a Generalized Distillation (GD) problem, which treats an existing SSL algorithm as a learner and introduces a teacher to guide the learner's training process. Specifically, the intelligent teacher holds the privileged knowledge that "explains" the training data but remains unknown to the learner, and the teacher should convey its rich knowledge to the imperfect learner through a specific teaching function. After that, the learner gains knowledge by "imitating" the output of the teaching function under an optimization framework. Therefore, the learner in our algorithm learns from both the teacher and the training data, so its output can be substantially distilled and enhanced. By deriving the Rademacher complexity and error bounds of the proposed algorithm, the usefulness of the introduced teacher is theoretically demonstrated. The superiority of our algorithm over the related state-of-the-art methods has also been empirically demonstrated by experiments on different datasets with various sources of privileged knowledge.
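
To make "imitating the output of the teaching function" concrete, the sketch below implements the standard generalized-distillation objective: a weighted sum of cross-entropy to the hard label and cross-entropy to the teacher's temperature-softened outputs. This is the generic GD loss and not necessarily the specific teaching function or optimization framework of this paper. The imitation weight `lam` and temperature `T` are illustrative defaults, and all names are hypothetical.

```python
import math
from typing import List

def softmax(z: List[float], T: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature T."""
    m = max(z)
    e = [math.exp((x - m) / T) for x in z]
    s = sum(e)
    return [x / s for x in e]

def gd_loss(student_logits: List[float], teacher_logits: List[float],
            y: int, lam: float = 0.5, T: float = 2.0) -> float:
    """(1 - lam) * CE(y, student) + lam * CE(softened teacher, student).

    teacher_logits are assumed to come from a teacher that saw privileged
    information; the student only sees the ordinary input.
    """
    p = softmax(student_logits)                 # student prediction
    hard = -math.log(max(p[y], 1e-12))          # fit the ground-truth label
    s = softmax(teacher_logits, T)              # softened teacher targets
    soft = -sum(si * math.log(max(pi, 1e-12)) for si, pi in zip(s, p))
    return (1.0 - lam) * hard + lam * soft
```

Setting `lam = 0` recovers ordinary supervised training, while `lam = 1` makes the learner rely purely on the teacher.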

Balanced Linear Contextual Bandits

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013445 ◽

2019 ◽

Vol 33 ◽

pp. 3445-3453 ◽

Cited By ~ 1

Author(s):

Maria Dimakopoulou ◽

Zhengyuan Zhou ◽

Susan Athey ◽

Guido Imbens

Keyword(s):

Supervised Learning ◽

State Of The Art ◽

Model Misspecification ◽

Estimation Method ◽

Training Data ◽

Estimation Bias ◽

Initial Training ◽

Practical Advantage ◽

Estimation Problems ◽

Regret Bound

Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias. We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state of the art theoretical guarantees. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model misspecification and prejudice in the initial training data.
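
One balancing method from causal inference is inverse-propensity weighting: each observation in an arm's outcome regression is weighted by 1 / p, where p is the probability with which that arm was chosen. The sketch below applies this weighting inside a per-arm ridge regression with a LinUCB-style bonus. For readability it assumes a single scalar context feature, so the matrix solve reduces to a division. It is an illustration under those assumptions, not the paper's algorithm, and the function names are hypothetical.

```python
import math
from typing import Sequence

def ipw_ridge_theta(xs: Sequence[float], ys: Sequence[float],
                    props: Sequence[float], lam: float = 1.0) -> float:
    """Weighted ridge estimate for one arm with weights w_i = 1 / propensity_i."""
    num = sum(x * y / p for x, y, p in zip(xs, ys, props))
    den = lam + sum(x * x / p for x, p in zip(xs, props))
    return num / den

def ucb_score(x: float, xs: Sequence[float], ys: Sequence[float],
              props: Sequence[float], lam: float = 1.0, alpha: float = 1.0) -> float:
    """Estimated payoff plus an exploration bonus from the same weighted design."""
    den = lam + sum(xi * xi / p for xi, p in zip(xs, props))
    theta = ipw_ridge_theta(xs, ys, props, lam)
    return theta * x + alpha * math.sqrt(x * x / den)
```

Observations collected under a low selection probability receive larger weights. This counteracts the bias that arises when past exploration favoured some contexts over others. At decision time, the arm with the largest `ucb_score` would be played.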

Cross-Relation Cross-Bag Attention for Distantly-Supervised Relation Extraction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301419 ◽

2019 ◽

Vol 33 ◽

pp. 419-426 ◽

Cited By ~ 6

Author(s):

Yujin Yuan ◽

Liyuan Liu ◽

Siliang Tang ◽

Zhongfei Zhang ◽

Yueting Zhuang ◽

...

Keyword(s):

Selective Attention ◽

Supervised Learning ◽

State Of The Art ◽

Relation Extraction ◽

Knowledge Bases ◽

Training Data ◽

Distant Supervision ◽

Sentence Level ◽

Noise Robust

Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train a relation extractor without human annotations. However, the generated training data typically contain massive noise and may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using the selective attention mechanism to achieve this goal. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
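
Sentence-level selective attention, which C2SA builds on, scores each sentence in a bag against a relation query, normalizes the scores with a softmax, and represents the bag as the attention-weighted sum of sentence vectors. The sketch below uses a plain dot-product score as a simplifying assumption. The cross-relation and cross-bag extensions described in the abstract are not shown, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

def selective_attention(sentence_vecs: List[List[float]],
                        relation_query: List[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, bag representation) for one entity-pair bag."""
    # Dot-product relevance of each sentence to the relation (assumed scoring).
    scores = [sum(a * b for a, b in zip(s, relation_query)) for s in sentence_vecs]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]  # stable softmax
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(sentence_vecs[0])
    # Noisy sentences with low scores contribute little to the bag vector.
    bag = [sum(a * s[d] for a, s in zip(alphas, sentence_vecs)) for d in range(dim)]
    return alphas, bag
```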

A Two-Stream Mutual Attention Network for Semi-Supervised Biomedical Segmentation with Noisy Labels

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014578 ◽

2019 ◽

Vol 33 ◽

pp. 4578-4585 ◽

Cited By ~ 9

Author(s):

Shaobo Min ◽

Xuejin Chen ◽

Zheng-Jun Zha ◽

Feng Wu ◽

Yongdong Zhang

Keyword(s):

Supervised Learning ◽

State Of The Art ◽

Training Data ◽

Attention Network ◽

Attention Model ◽

Learning Framework ◽

Propagation Analysis ◽

Supervised Methods ◽

Multi Level ◽

Noisy Labels

Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks that are connected by three types of attention models in different layers. The target of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing their inferred features using the same input. To achieve this, the attention models are designed based on the propagation analysis of noisy gradients at different layers. This allows the attention models to effectively discover incorrect labels and weaken their influence during the parameter updating process. By exchanging multi-level features within the two-stream architecture, the effects of noisy labels in each sub-network are reduced by decreasing the noisy gradients. Furthermore, a hierarchical distillation is developed to provide reliable pseudo labels for unlabelled data, which further boosts the performance of TSMAN. The experiments using both HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses the state-of-the-art fully-supervised results.

A Semi-Supervised Active Learning Framework for Image Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.4765 ◽

2014 ◽

Vol 556-562 ◽

pp. 4765-4769

Author(s):

Han Yi Li ◽

Ming Yang ◽

Nan Nan Kang ◽

Lu Lu Yue

Keyword(s):

Active Learning ◽

Image Classification ◽

Supervised Learning ◽

The Other ◽

Training Data ◽

Classification Error ◽

Classification Algorithms ◽

Learning Framework ◽

User Query ◽

Changing Rate

In this paper, a novel image classification method incorporating active learning and semi-supervised learning (SSL) is proposed. The method uses two classifiers: one is trained on labeled data and some unlabeled data, while the other is trained only on labeled data. Specifically, in each round, the two classifiers iteratively select useful examples as candidates for user query. Then we compute the label changing rate of every unlabeled example in each classifier. Examples whose label changing rate is zero and whose labels agree across the two classifiers are added to the training data of the first classifier. Our experimental results show that our method significantly reduces the need for labeled examples while also reducing classification error compared with widely used image classification algorithms.
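
The selection rule based on the label changing rate can be sketched as follows. We assume, as our own reading, that the rate is the fraction of consecutive rounds in which an example's predicted label changed, that it must be zero under both classifiers, and that agreement is checked on the latest predicted labels. The names are hypothetical.

```python
from typing import Hashable, List, Sequence

def label_changing_rate(history: Sequence[Hashable]) -> float:
    """Fraction of consecutive rounds in which the predicted label changed."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return changes / (len(history) - 1)

def select_for_training(hist_a: List[Sequence[Hashable]],
                        hist_b: List[Sequence[Hashable]]) -> List[int]:
    """Indices of unlabeled examples with stable labels in both classifiers that also agree.

    hist_a[i] / hist_b[i] hold the per-round predicted labels of example i
    under the first and second classifier respectively.
    """
    selected = []
    for i, (ha, hb) in enumerate(zip(hist_a, hist_b)):
        if not ha or not hb:
            continue  # no predictions yet
        stable = label_changing_rate(ha) == 0 and label_changing_rate(hb) == 0
        if stable and ha[-1] == hb[-1]:
            selected.append(i)
    return selected
```

The selected indices would then be added, with their agreed labels, to the first classifier's training data.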

Active semi-supervised framework with data editing

Computer Science and Information Systems ◽

10.2298/csis120202045z ◽

2012 ◽

Vol 9 (4) ◽

pp. 1513-1532 ◽

Cited By ~ 4

Author(s):

Xue Zhang ◽

Wangxin Xiao

Keyword(s):

Active Learning ◽

Supervised Learning ◽

Text Classification ◽

State Of The Art ◽

The Self ◽

Training Data ◽

Data Sets ◽

Text Data ◽

Data Editing ◽

Data Problem

In order to address the insufficient training data problem, many active semi-supervised algorithms have been proposed. The self-labeled training data in semi-supervised learning may contain much noise due to the insufficient training data. Such noise may snowball in the subsequent learning process and thus hurt the generalization ability of the final hypothesis. The extreme scarcity of labeled training data in sparsely labeled text classification aggravates this situation. If such noise could be identified and removed by some strategy, the performance of active semi-supervised algorithms should improve. However, techniques for identifying and removing such noise have seldom been explored in existing active semi-supervised algorithms. In this paper, we propose an active semi-supervised framework with data editing (ASSDE) to improve sparsely labeled text classification. A data editing technique is used to identify and remove noise introduced by semi-supervised labeling. We carry out data editing by fully utilizing the advantage of active learning, which, to our knowledge, is novel. The fusion of active learning with data editing makes ASSDE more robust to the sparsity and the distribution bias of the training data. It further simplifies the design of semi-supervised learning, which makes ASSDE more efficient. An extensive experimental study on several real-world text data sets shows encouraging results for the proposed framework in sparsely labeled text classification, compared with several state-of-the-art methods.
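
The abstract does not detail how ASSDE performs its active-learning-based editing. To illustrate the general idea of data editing, the sketch below uses a different, classic rule, Wilson's edited nearest neighbour: a self-labeled example is kept only if its label agrees with the majority label of its k nearest neighbours. The Euclidean distance, k = 3, and the function name are assumptions for this sketch. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import List, Sequence

def edit_self_labelled(points: Sequence[Sequence[float]],
                       labels: Sequence[str],
                       candidates: Sequence[int],
                       k: int = 3) -> List[int]:
    """Return the candidate indices whose label matches their k-NN majority label.

    `candidates` are indices of self-labeled examples to be checked; all
    points (labeled and self-labeled) serve as potential neighbours.
    """
    kept = []
    for i in candidates:
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(len(points)) if j != i)
        neighbour_labels = [labels[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority == labels[i]:
            kept.append(i)  # consistent with its neighbourhood: keep
    return kept
```

Removing inconsistent self-labeled examples before retraining is one way to stop early labeling noise from snowballing, which is the motivation the abstract gives.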

Semi-supervised learning for medical image classification using imbalanced training data

Computer Methods and Programs in Biomedicine ◽

10.1016/j.cmpb.2022.106628 ◽

2022 ◽

pp. 106628

Author(s):

Tri Huynh ◽

Aiden Nibali ◽

Zhen He

Keyword(s):

Image Classification ◽

Supervised Learning ◽

Medical Image ◽

Training Data ◽

Medical Image Classification ◽

Imbalanced Training Data