Efficient Active Learning by Querying Discriminative and Representative Samples and Fully Exploiting Unlabeled Data

Author(s): Bin Gu, Zhou Zhai, Cheng Deng, Heng Huang
Entropy, 2019, Vol 21 (10), pp. 988
Author(s): Fazakis, Kanas, Aridas, Karlos, Kotsiantis

One of the major factors affecting the performance of classification algorithms is the amount of labeled data available during the training phase. It is widely accepted that labeling vast amounts of data is both expensive and time-consuming, since it requires human expertise. In a wide variety of scientific fields, unlabeled examples are easy to collect but hard to exploit in a way that improves the information contained in a dataset. In this context, a variety of learning methods have been studied in the literature that aim to efficiently utilize large amounts of unlabeled data during the learning process. The most common approaches tackle problems of this kind by applying either active learning or semi-supervised learning methods in isolation. In this work, a combination of active learning and semi-supervised learning is proposed under a common self-training scheme, in order to efficiently utilize the available unlabeled data. Effective and robust metrics, based on the entropy and the probability distributions computed over the unlabeled set, are used to select the most suitable unlabeled examples for augmenting the initial labeled set. The superiority of the proposed scheme is validated by comparing it against baseline supervised, semi-supervised, and active learning approaches on a wide range of fifty-five benchmark datasets.
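The two selection rules described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `select_by_entropy` and `select_confident` are hypothetical helper names, and the specific threshold is an assumption. Entropy over the model's class probabilities picks uncertain samples for the human annotator (active learning), while high-confidence predictions become pseudo-labels for self-training (semi-supervised learning).

```python
import numpy as np

def select_by_entropy(probs, n_query):
    """Rank unlabeled samples by predictive entropy and return the
    indices of the n_query most uncertain ones (sent for human labeling).
    probs: (n_samples, n_classes) class-probability estimates."""
    eps = 1e-12  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # highest entropy first
    return np.argsort(entropy)[::-1][:n_query]

def select_confident(probs, threshold=0.95):
    """Return indices of samples the current model labels with high
    confidence (candidates for pseudo-labeling in self-training).
    The 0.95 threshold is illustrative, not from the paper."""
    return np.where(probs.max(axis=1) >= threshold)[0]
```

In a combined scheme, each self-training round would add the confident samples with their predicted labels and the high-entropy samples with oracle labels to the labeled set, then retrain.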


2011, Vol 5 (3), pp. 618-628
Author(s): Wei Di, Melba M. Crawford

A novel co-regularization framework for active learning is proposed for hyperspectral image classification. The first regularizer explores the intrinsic multi-view information embedded in the hyperspectral data. By adaptively and quantitatively measuring the disagreement level, it focuses only on samples with high uncertainty and builds a contention pool, a small subset of the overall unlabeled data pool, thereby mitigating the computational cost. The second regularizer is based on the "consistency assumption" and is designed on a spatial- or spectral-based manifold space. It serves to further focus on the most informative samples within the contention pool by penalizing rapid changes in the classification function evaluated on proximally close samples in a local region. Such changes may be due to the lack of capability of the current learner to describe the unlabeled data. Incorporating manifold learning into the active learning process enforces the clustering assumption and avoids the degradation of the distance measure associated with the original high-dimensional spectral features. One spatial and two local spectral embedding methods are considered in this study, in conjunction with the support vector machine (SVM) classifier implemented with a radial basis function (RBF) kernel. Experiments show excellent performance on AVIRIS and Hyperion hyperspectral data as compared to random sampling and the state-of-the-art SVMSIMPLE.
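The contention-pool idea, selecting the unlabeled samples on which per-view classifiers disagree most, can be sketched as follows. This is a simplified stand-in, assuming a generic multi-view setup with hard-label disagreement; the paper's actual disagreement measure is adaptive and quantitative, and `contention_pool` is a hypothetical name.

```python
import numpy as np

def contention_pool(view_probs, pool_size):
    """Build a contention pool: the unlabeled samples on which
    per-view classifiers disagree the most.
    view_probs: list of (n_samples, n_classes) probability matrices,
    one per view (e.g., disjoint spectral-band subsets)."""
    preds = np.stack([p.argmax(axis=1) for p in view_probs])  # (views, n)
    n_views = preds.shape[0]
    disagree = np.zeros(preds.shape[1])
    # disagreement = fraction of view pairs predicting different labels
    for i in range(n_views):
        for j in range(i + 1, n_views):
            disagree += (preds[i] != preds[j]).astype(float)
    disagree /= n_views * (n_views - 1) / 2
    # most contentious samples first
    return np.argsort(disagree)[::-1][:pool_size]
```

The manifold-based second regularizer would then rank samples within this pool, so the expensive per-sample manifold computation runs only on the small contention pool rather than the full unlabeled set.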


Author(s): Alireza Ghasemi, Hamid R. Rabiee, Mohsen Fadaee, Mohammad T. Manzuri, Mohammad H. Rohban

Author(s): Changsheng Li, Handong Ma, Zhao Kang, Ye Yuan, Xiao-Yu Zhang, ...

Unsupervised active learning has attracted increasing attention in recent years; its goal is to select representative samples in an unsupervised setting for human annotation. Most existing works are based on shallow linear models, assuming that each sample can be well approximated by the span (i.e., the set of all linear combinations) of certain selected samples, which are then taken as the representative ones to label. However, in practice, the data do not necessarily conform to linear models, and modeling the nonlinearity of data often becomes the key to success. In this paper, we present a novel Deep neural network framework for Unsupervised Active Learning, called DUAL. DUAL explicitly learns a nonlinear embedding that maps each input into a latent space through an encoder-decoder architecture, and introduces a selection block to choose representative samples in the learnt latent space. In the selection block, DUAL simultaneously preserves the whole input patterns as well as the cluster structure of the data. Extensive experiments are performed on six publicly available datasets, and the results clearly demonstrate the efficacy of our method compared with state-of-the-art approaches.
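Once an encoder has mapped the inputs into a latent space, a selection block must pick points that cover the data's cluster structure. As a rough illustration only, not DUAL's actual selection objective, a greedy k-center rule in the latent space achieves a similar coverage effect; `greedy_k_center` and the fixed starting index are assumptions for this sketch.

```python
import numpy as np

def greedy_k_center(z, k, start=0):
    """Select k representative points in a learned latent space
    z: (n_samples, latent_dim). Each new pick is the point farthest
    from the current selection, so picks spread across clusters.
    A simple stand-in for a latent-space selection block."""
    selected = [start]
    # distance of every point to its nearest selected point
    dist = np.linalg.norm(z - z[start], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())          # farthest uncovered point
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(z - z[nxt], axis=1))
    return selected
```

With two well-separated clusters in the latent space and k = 2, the rule picks one point from each cluster, which is the coverage behavior a cluster-preserving selection block is after.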

