On the Complexity of Learning a Class Ratio from Unlabeled Data

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12013 ◽

2020 ◽

Vol 69 ◽

Author(s):

Benjamin Fish ◽

Lev Reyzin

Keyword(s):

Computational Complexity ◽

Unlabeled Data ◽

Training Data ◽

Pac Learning ◽

Vc Dimension ◽

Standard Set

In the problem of learning a class ratio from unlabeled data, which we call CR learning, the training data is unlabeled, and only the ratios, or proportions, of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is applicable to a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we formally define this class and resolve foundational questions regarding the computational complexity of CR learning and characterize its relationship to PAC learning. Among our results, we show, perhaps surprisingly, that for finite VC classes what can be efficiently CR learned is a strict subset of what can be learned efficiently in PAC, under standard complexity assumptions. We also show that there exist classes of functions whose CR learnability is independent of ZFC, the standard set theoretic axioms. This implies that CR learning cannot be easily characterized (like PAC by VC dimension).

Download Full-text

On the Complexity of Learning from Label Proportions

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/232 ◽

2017 ◽

Cited By ~ 5

Author(s):

Benjamin Fish ◽

Lev Reyzin

Keyword(s):

Computational Complexity ◽

Training Data ◽

Pac Learning ◽

Vc Dimension ◽

Simple Version ◽

Learning From Label Proportions

In the problem of learning with label proportions (also known as the problem of estimating class ratios), the training data is unlabeled, and only the proportions of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is useful in a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we resolve foundational questions regarding the computational complexity of learning in this setting. We formalize a simple version of the setting, and we compare the computational complexity of learning in this model to classical PAC learning. Perhaps surprisingly, we show that what can be learned efficiently in this model is a strict subset of what may be leaned efficiently in PAC, under standard complexity assumptions. We give a characterization in terms of VC dimension, and we show that there are non-trivial problems in this model that can be efficiently learned. We also give an algorithm that demonstrates the feasibility of learning under well-behaved distributions.

Download Full-text

Investigating the Potential of Network Optimization for a Constrained Object Detection Problem

Journal of Imaging ◽

10.3390/jimaging7040064 ◽

2021 ◽

Vol 7 (4) ◽

pp. 64

Author(s):

Tanguy Ophoff ◽

Cédric Gullentops ◽

Kristof Van Beeck ◽

Toon Goedemé

Keyword(s):

Computational Complexity ◽

Object Detection ◽

Network Optimization ◽

Real Life ◽

Optimization Techniques ◽

Training Data ◽

Single Shot ◽

Standard Object ◽

Number Of Classes

Object detection models are usually trained and evaluated on highly complicated, challenging academic datasets, which results in deep networks requiring lots of computations. However, a lot of operational use-cases consist of more constrained situations: they have a limited number of classes to be detected, less intra-class variance, less lighting and background variance, constrained or even fixed camera viewpoints, etc. In these cases, we hypothesize that smaller networks could be used without deteriorating the accuracy. However, there are multiple reasons why this does not happen in practice. Firstly, overparameterized networks tend to learn better, and secondly, transfer learning is usually used to reduce the necessary amount of training data. In this paper, we investigate how much we can reduce the computational complexity of a standard object detection network in such constrained object detection problems. As a case study, we focus on a well-known single-shot object detector, YoloV2, and combine three different techniques to reduce the computational complexity of the model without reducing its accuracy on our target dataset. To investigate the influence of the problem complexity, we compare two datasets: a prototypical academic (Pascal VOC) and a real-life operational (LWIR person detection) dataset. The three optimization steps we exploited are: swapping all the convolutions for depth-wise separable convolutions, perform pruning and use weight quantization. The results of our case study indeed substantiate our hypothesis that the more constrained a problem is, the more the network can be optimized. On the constrained operational dataset, combining these optimization techniques allowed us to reduce the computational complexity with a factor of 349, as compared to only a factor 9.8 on the academic dataset. When running a benchmark on an Nvidia Jetson AGX Xavier, our fastest model runs more than 15 times faster than the original YoloV2 model, whilst increasing the accuracy by 5% Average Precision (AP).

Download Full-text

Towards Accurate and Efficient Chinese Part-of-Speech Tagging

Computational Linguistics ◽

10.1162/coli_a_00253 ◽

2016 ◽

Vol 42 (3) ◽

pp. 391-419 ◽

Cited By ~ 4

Author(s):

Weiwei Sun ◽

Xiaojun Wan

Keyword(s):

Hybrid Systems ◽

Language Processing ◽

Large Scale ◽

Unlabeled Data ◽

Training Data ◽

Test Time ◽

System Combination ◽

Pos Tagging ◽

Hybrid Approaches ◽

Lexical Relations

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Despite the effectiveness to boost accuracy, computationally expensive parsers make hybrid systems inappropriate for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are utilized to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high accuracy with respect to per token classification, but also serve as a front-end to a parser well.

Download Full-text

Semi-Supervised Learning

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch192 ◽

2011 ◽

pp. 1022-1027

Author(s):

Tobias Scheffer

Keyword(s):

Supervised Learning ◽

Supervised Classification ◽

Unlabeled Data ◽

Training Data ◽

Classification Algorithms ◽

Classification Problems

For many classification problems, unlabeled training data are inexpensive and readily available, whereas labeling training data imposes costs. Semi-supervised classification algorithms aim at utilizing information contained in unlabeled data in addition to the (few) labeled data.

Download Full-text

Clustering-Based Transductive Semi-Supervised Learning for Learning-to-Rank

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419510078 ◽

2019 ◽

Vol 33 (12) ◽

pp. 1951007 ◽

Cited By ~ 1

Author(s):

Ashwini Rahangdale ◽

Shital Raut

Keyword(s):

Supervised Learning ◽

Learning To Rank ◽

Cost Effective ◽

Unlabeled Data ◽

Training Data ◽

Density Region ◽

The Arts ◽

Proposed Model ◽

Supervised Learning Algorithms ◽

Multiple Loss

Learning-to-rank (LTR) is a very hot topic of research for information retrieval (IR). LTR framework usually learns the ranking function using available training data that are very cost-effective, time-consuming and biased. When sufficient amount of training data is not available, semi-supervised learning is one of the machine learning paradigms that can be applied to get pseudo label from unlabeled data. Cluster and label is a basic approach for semi-supervised learning to identify the high-density region in data space which is mainly used to support the supervised learning. However, clustering with conventional method may lead to prediction performance which is worse than supervised learning algorithms for application of LTR. Thus, we propose rank preserving clustering (RPC) with PLocalSearch and get pseudo label for unlabeled data. We present semi-supervised learning that adopts clustering-based transductive method and combine it with nonmeasure specific listwise approach to learn the LTR model. Moreover, each cluster follows the multi-task learning to avoid optimization of multiple loss functions. It reduces the training complexity of adopted listwise approach from an exponential order to a polynomial order. Empirical analysis on the standard datasets (LETOR) shows that the proposed model gives better results as compared to other state-of-the-arts.

Download Full-text

Time-Series Laplacian Semi-Supervised Learning for Indoor Localization †

Sensors ◽

10.3390/s19183867 ◽

2019 ◽

Vol 19 (18) ◽

pp. 3867 ◽

Cited By ~ 2

Author(s):

Jaehyun Yoo

Keyword(s):

Time Series ◽

Supervised Learning ◽

Indoor Localization ◽

Learning Algorithm ◽

Unlabeled Data ◽

Training Data ◽

Practical Implementation ◽

Additional Information ◽

Localization Scheme ◽

The Impact

Machine learning-based indoor localization used to suffer from the collection, construction, and maintenance of labeled training databases for practical implementation. Semi-supervised learning methods have been developed as efficient indoor localization methods to reduce use of labeled training data. To boost the efficiency and the accuracy of indoor localization, this paper proposes a new time-series semi-supervised learning algorithm. The key aspect of the developed method, which distinguishes it from conventional semi-supervised algorithms, is the use of unlabeled data. The learning algorithm finds spatio-temporal relationships in the unlabeled data, and pseudolabels are generated to compensate for the lack of labeled training data. In the next step, another balancing-optimization learning algorithm learns a positioning model. The proposed method is evaluated for estimating the location of a smartphone user by using a Wi-Fi received signal strength indicator (RSSI) measurement. The experimental results show that the developed learning algorithm outperforms some existing semi-supervised algorithms according to the variation of the number of training data and access points. Also, the proposed method is discussed in terms of why it gives better performance, by the analysis of the impact of the learning parameters. Moreover, the extended localization scheme in conjunction with a particle filter is executed to include additional information, such as a floor plan.

Download Full-text

SEMI-SUPERVISED SEQUENCE CLASSIFICATION WITH HMMs

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001405004034 ◽

2005 ◽

Vol 19 (02) ◽

pp. 165-182 ◽

Cited By ~ 7

Author(s):

SHI ZHONG

Keyword(s):

Supervised Learning ◽

Learning Strategies ◽

Test Data ◽

Unlabeled Data ◽

Training Data ◽

Model Complexity ◽

Model Parameters ◽

Training Process ◽

Transductive Learning ◽

Model Training

Using unlabeled data to help supervised learning has become an increasingly attractive methodology and proven to be effective in many applications. This paper applies semi-supervised classification algorithms, based on hidden Markov models, to classify sequences. For model-based classification, semi-supervised learning amounts to using both labeled and unlabeled data to train model parameters. We examine three different strategies of using labeled and unlabeled data in the model training process. These strategies differ in how and when labeled and unlabeled data contribute to the model training process. We also compare regular semi-supervised learning, where there are separate unlabeled training data and unlabeled test data, with transductive learning where we do not differentiate between unlabeled training data and unlabeled test data. Our experimental results on synthetic and real EEG time-series show that substantially improved classification accuracy can be achieved by these semi-supervised learning strategies. The effect of model complexity on semi-supervised learning is also studied in our experiments.

Download Full-text

PAC learning, VC dimension, and the arithmetic hierarchy

Archive for Mathematical Logic ◽

10.1007/s00153-015-0445-8 ◽

2015 ◽

Vol 54 (7-8) ◽

pp. 871-883

Author(s):

Wesley Calvert

Keyword(s):

Pac Learning ◽

Vc Dimension ◽

Arithmetic Hierarchy

Download Full-text

The VC Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs

Neural Computation ◽

10.1162/neco.1996.8.3.625 ◽

1996 ◽

Vol 8 (3) ◽

pp. 625-628 ◽

Cited By ~ 9

Author(s):

Peter L. Bartlett ◽

Robert C. Williamson

Keyword(s):

Neural Networks ◽

Basis Function ◽

Upper Bounds ◽

Pac Learning ◽

Learning Performance ◽

Sigmoid Function ◽

Vc Dimension ◽

Learning Framework ◽

Probably Approximately Correct ◽

Training Examples

We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from {−D, …,D}n. In Valiant's probably approximately correct (pac) learning framework for pattern classification, and in Haussler's generalization of this framework to nonlinear regression, the results imply that the number of training examples necessary for satisfactory learning performance grows no more rapidly than W log (WD), where W is the number of weights. The previous best bound for these networks was O(W4).

Download Full-text

Training Data Reduction and Classification Based on Greedy Kernel Principal Component Analysis and Fuzzy C-Means Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.347-350.2390 ◽

2013 ◽

Vol 347-350 ◽

pp. 2390-2394

Author(s):

Xiao Fang Liu ◽

Chun Yang

Keyword(s):

Principal Component Analysis ◽

Feature Extraction ◽

Computational Complexity ◽

Principal Component ◽

Large Data ◽

Component Analysis ◽

Training Data ◽

Kernel Principal Component Analysis ◽

Nonlinear Feature Extraction ◽

Nonlinear Feature

Nonlinear feature extraction used standard Kernel Principal Component Analysis (KPCA) method has large memories and high computational complexity in large datasets. A Greedy Kernel Principal Component Analysis (GKPCA) method is applied to reduce training data and deal with the nonlinear feature extraction problem for training data of large data in classification. First, a subset, which approximates to the original training data, is selected from the full training data using the greedy technique of the GKPCA method. Then, the feature extraction model is trained by the subset instead of the full training data. Finally, FCM algorithm classifies feature extraction data of the GKPCA, KPCA and PCA methods, respectively. The simulation results indicate that the feature extraction performance of both the GKPCA, and KPCA methods outperform the PCA method. In addition of retaining the performance of the KPCA method, the GKPCA method reduces computational complexity due to the reduced training set in classification.

Download Full-text