On the Complexity of Learning from Label Proportions

Author(s):  
Benjamin Fish ◽  
Lev Reyzin

In the problem of learning with label proportions (also known as the problem of estimating class ratios), the training data is unlabeled, and only the proportions of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is useful in a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we resolve foundational questions regarding the computational complexity of learning in this setting. We formalize a simple version of the setting, and we compare the computational complexity of learning in this model to classical PAC learning. Perhaps surprisingly, we show that what can be learned efficiently in this model is a strict subset of what may be learned efficiently in PAC, under standard complexity assumptions. We give a characterization in terms of VC dimension, and we show that there are non-trivial problems in this model that can be efficiently learned. We also give an algorithm that demonstrates the feasibility of learning under well-behaved distributions.

2020 ◽  
Vol 69 ◽  
Author(s):  
Benjamin Fish ◽  
Lev Reyzin

In the problem of learning a class ratio from unlabeled data, which we call CR learning, the training data is unlabeled, and only the ratios, or proportions, of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is applicable to a wide variety of settings, including predicting the number of votes for candidates in political elections from polls. In this paper, we formally define this class and resolve foundational questions regarding the computational complexity of CR learning and characterize its relationship to PAC learning. Among our results, we show, perhaps surprisingly, that for finite VC classes what can be efficiently CR learned is a strict subset of what can be learned efficiently in PAC, under standard complexity assumptions. We also show that there exist classes of functions whose CR learnability is independent of ZFC, the standard set theoretic axioms. This implies that CR learning cannot be easily characterized (the way PAC learning is characterized by the VC dimension).
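A minimal sketch of the CR/LLP setup described above, not the authors' algorithm: the learner sees an unlabeled sample plus the observed proportion of positive labels, and picks the hypothesis whose predicted proportion best matches it. The one-dimensional threshold class and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=500)                  # unlabeled sample from the distribution
true_labels = (X > 0.3).astype(int)       # hidden labels (never shown to the learner)
observed_proportion = true_labels.mean()  # the only supervision available

# Hypothesis class (illustrative): one-dimensional thresholds h_t(x) = 1[x > t].
thresholds = np.linspace(X.min(), X.max(), 200)
predicted = np.array([(X > t).mean() for t in thresholds])

# Choose the threshold whose predicted positive proportion is closest
# to the observed proportion on the sample.
best_t = thresholds[np.argmin(np.abs(predicted - observed_proportion))]
print(f"observed proportion: {observed_proportion:.3f}, "
      f"chosen threshold: {best_t:.3f}, "
      f"predicted proportion: {(X > best_t).mean():.3f}")
```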


2021 ◽  
Vol 7 (4) ◽  
pp. 64
Author(s):  
Tanguy Ophoff ◽  
Cédric Gullentops ◽  
Kristof Van Beeck ◽  
Toon Goedemé

Object detection models are usually trained and evaluated on highly complicated, challenging academic datasets, which results in deep networks requiring lots of computations. However, a lot of operational use-cases consist of more constrained situations: they have a limited number of classes to be detected, less intra-class variance, less lighting and background variance, constrained or even fixed camera viewpoints, etc. In these cases, we hypothesize that smaller networks could be used without deteriorating the accuracy. However, there are multiple reasons why this does not happen in practice. Firstly, overparameterized networks tend to learn better, and secondly, transfer learning is usually used to reduce the necessary amount of training data. In this paper, we investigate how much we can reduce the computational complexity of a standard object detection network in such constrained object detection problems. As a case study, we focus on a well-known single-shot object detector, YoloV2, and combine three different techniques to reduce the computational complexity of the model without reducing its accuracy on our target dataset. To investigate the influence of the problem complexity, we compare two datasets: a prototypical academic one (Pascal VOC) and a real-life operational one (LWIR person detection). The three optimization steps we exploited are: swapping all convolutions for depth-wise separable convolutions, pruning, and weight quantization. The results of our case study indeed substantiate our hypothesis that the more constrained a problem is, the more the network can be optimized. On the constrained operational dataset, combining these optimization techniques allowed us to reduce the computational complexity by a factor of 349, compared to only a factor of 9.8 on the academic dataset. When running a benchmark on an Nvidia Jetson AGX Xavier, our fastest model runs more than 15 times faster than the original YoloV2 model, whilst increasing the accuracy by 5% Average Precision (AP).
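A minimal PyTorch sketch of the first optimization step, swapping a standard convolution for a depth-wise separable one; the channel sizes and helper name are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, kernel_size=3, padding=1):
    """Replace a standard conv with a depth-wise conv followed by a
    point-wise (1x1) conv, reducing parameters and multiply-accumulates."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=padding, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

standard = nn.Conv2d(64, 128, 3, padding=1)
separable = depthwise_separable(64, 128)

x = torch.randn(1, 64, 56, 56)
assert standard(x).shape == separable(x).shape  # same output shape

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(standard), params(separable))  # 73856 vs. 8960 parameters
```

Pruning and weight quantization, the other two steps named in the abstract, would then be applied on top of such a reduced network.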


1996 ◽  
Vol 8 (3) ◽  
pp. 625-628 ◽  
Author(s):  
Peter L. Bartlett ◽  
Robert C. Williamson

We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from {−D, …, D}^n. In Valiant's probably approximately correct (PAC) learning framework for pattern classification, and in Haussler's generalization of this framework to nonlinear regression, the results imply that the number of training examples necessary for satisfactory learning performance grows no more rapidly than W log(WD), where W is the number of weights. The previous best bound for these networks was O(W^4).
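A small illustration of how the W log(WD) bound compares with the previous O(W^4) bound as the number of weights grows; the values of W and D below are arbitrary examples and constants are omitted.

```python
import math

D = 16  # inputs drawn from {-D, ..., D}^n (illustrative value)
for W in (10, 100, 1000):  # number of weights
    new_bound = W * math.log(W * D)
    old_bound = W ** 4
    print(f"W={W:5d}  W*log(WD)={new_bound:12.1f}  W^4={old_bound:16.0f}")
```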


2013 ◽  
Vol 347-350 ◽  
pp. 2390-2394
Author(s):  
Xiao Fang Liu ◽  
Chun Yang

Nonlinear feature extraction with the standard Kernel Principal Component Analysis (KPCA) method requires large amounts of memory and has high computational complexity on large datasets. A Greedy Kernel Principal Component Analysis (GKPCA) method is applied to reduce the training data and handle nonlinear feature extraction for large training sets in classification. First, a subset that approximates the original training data is selected from the full training data using the greedy technique of the GKPCA method. Then, the feature extraction model is trained on this subset instead of the full training data. Finally, the FCM algorithm classifies the features extracted by the GKPCA, KPCA, and PCA methods, respectively. The simulation results indicate that the feature extraction performance of both the GKPCA and KPCA methods outperforms that of the PCA method. In addition to retaining the performance of the KPCA method, the GKPCA method reduces computational complexity owing to the reduced training set used in classification.
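A rough sketch of the subset-then-KPCA idea; the greedy criterion below (farthest point in kernel-induced distance from the current subset) is a simple stand-in for the GKPCA greedy technique, and the data, kernel width, and subset size are illustrative.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))          # full training data (illustrative)

def greedy_subset(X, m, gamma=0.1):
    """Greedily add the point least well represented by the current subset,
    measured by feature-space distance to its nearest selected point."""
    idx = [0]
    for _ in range(m - 1):
        K_sub = rbf_kernel(X, X[idx], gamma=gamma)       # k(x_i, s_j)
        dist = 2.0 - 2.0 * K_sub                         # ||phi(x)-phi(s)||^2 for RBF
        idx.append(int(np.argmax(dist.min(axis=1))))
    return np.array(idx)

subset = X[greedy_subset(X, m=100)]

# Train the feature-extraction model on the subset instead of the full data,
# then extract features for the whole dataset.
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1).fit(subset)
features = kpca.transform(X)
print(features.shape)                    # (2000, 5)
```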


Author(s):  
Jiabin Liu ◽  
Bo Wang ◽  
Xin Shen ◽  
Zhiquan Qi ◽  
Yingjie Tian

Learning from label proportions (LLP) aims at learning an instance-level classifier from label proportions in grouped training data. Existing deep learning based LLP methods use end-to-end pipelines to obtain a proportional loss, the Kullback-Leibler divergence between the bag-level prior and posterior class distributions. However, unconstrained optimization of this objective can hardly reach a solution in accordance with the given proportions. Moreover, for a probabilistic classifier, this strategy unavoidably results in high-entropy conditional class distributions at the instance level. These issues further degrade the performance of instance-level classification. In this paper, we regard these problems as noisy pseudo labeling and instead impose strict proportion consistency on the classifier with a constrained optimization, used as a continuous training stage for existing LLP classifiers. In addition, we introduce the mixup strategy and symmetric cross-entropy to further reduce the label noise. Our framework is model-agnostic and demonstrates compelling performance improvement in extensive experiments when incorporated into other deep LLP models as a post-hoc phase.
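A hedged PyTorch sketch of the bag-level proportional loss that existing deep LLP methods use: the KL divergence between the given proportions (prior) and the mean of the classifier's instance-level posteriors over the bag. The constrained continuation stage, mixup, and symmetric cross-entropy from the paper are only noted in a comment.

```python
import torch
import torch.nn.functional as F

def proportion_kl_loss(logits, bag_proportions):
    """KL divergence between given bag-level proportions and the bag-level
    posterior obtained by averaging instance-level class posteriors.

    logits:          (bag_size, num_classes) instance-level scores
    bag_proportions: (num_classes,) given class proportions for the bag
    """
    posteriors = F.softmax(logits, dim=1)       # instance-level p(y | x)
    bag_posterior = posteriors.mean(dim=0)      # bag-level posterior
    p, q = bag_proportions.clamp_min(1e-8), bag_posterior.clamp_min(1e-8)
    return torch.sum(p * torch.log(p / q))

# Toy usage: one bag of 8 instances, 3 classes, proportions 50/25/25.
logits = torch.randn(8, 3, requires_grad=True)
props = torch.tensor([0.5, 0.25, 0.25])
loss = proportion_kl_loss(logits, props)
loss.backward()
print(float(loss))
# The paper's continuation stage instead imposes strict proportion consistency
# via a constrained optimization and adds mixup plus symmetric cross-entropy;
# that part is not sketched here.
```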


2009 ◽  
Vol 21 (6) ◽  
pp. 1776-1795 ◽  
Author(s):  
Dalei Wu

α-integration and the α-GMM have recently been proposed for integrated stochastic modeling. However, to date there has been no approach for estimating the model parameters of the α-GMM in a statistical way from a set of training data. In this letter, parameter updating formulas are mathematically derived under the maximum likelihood criterion using an adapted expectation-maximization algorithm. With this method, the model parameters of the α-GMM are re-estimated iteratively. The updating formulas turn out to be simple and systematically compatible with the GMM equations. This advantage renders the α-GMM a superset of the GMM, but with similar computational complexity. The method has been effectively applied to realistic speaker recognition applications.
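For reference only, a compact sketch of the expectation-maximization loop for an ordinary GMM, the model the abstract says the α-GMM updating formulas remain compatible with; the α-GMM re-estimation formulas themselves are not reproduced here, and the one-dimensional data and component count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
K, n = 2, len(x)
w, mu, var = np.full(K, 1 / K), np.array([-1.0, 1.0]), np.ones(K)

for _ in range(50):
    # E-step: responsibilities gamma[i, k] = p(component k | x_i)
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: maximum-likelihood re-estimation of weights, means, variances
    Nk = gamma.sum(axis=0)
    w, mu = Nk / n, (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(w.round(2), mu.round(2), var.round(2))
```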


Author(s):  
O. O. Stakhanska

The work deals with the computational complexity of a rule induction algorithm based on sequential covering, as used in the development of clinical diagnostic systems. The established complexity estimate is confirmed experimentally under variation of both the number of attributes and the volume of the training data sets.
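A schematic Python sketch of sequential covering, the rule-induction strategy whose complexity is analyzed above: each learned rule removes the examples it covers, so the total cost grows with both the cost of searching over attributes inside learn_one_rule and the number of passes over the shrinking training set. The learn_one_rule callable is a hypothetical placeholder, not the work's algorithm.

```python
def sequential_covering(examples, attributes, learn_one_rule):
    """Schematic sequential-covering loop: learn one rule, remove the
    examples it covers, and repeat until no useful rule remains.
    learn_one_rule(remaining, attributes) is assumed to return a predicate
    over examples; its cost typically grows with the number of attributes."""
    rules = []
    remaining = list(examples)
    while remaining:
        rule = learn_one_rule(remaining, attributes)     # greedy rule search
        covered = [e for e in remaining if rule(e)]
        if not covered:                                  # no progress possible
            break
        rules.append(rule)
        remaining = [e for e in remaining if not rule(e)]
    return rules
```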


Author(s):  
Martin Renqiang Min ◽  
Hongyu Guo ◽  
Dongjin Song

Metric learning methods for dimensionality reduction in combination with k-Nearest Neighbors (kNN) have been extensively deployed in many classification, data embedding, and information retrieval applications. However, most of these approaches involve pairwise comparisons of the training data and thus have quadratic computational complexity with respect to the size of the training set, preventing them from scaling to fairly big datasets. Moreover, during testing, comparing test data against all the training data points is also expensive in terms of both computational cost and resources required. Furthermore, previous metrics are either too constrained or too expressive to be well learned. To address these issues, we present an exemplar-centered supervised shallow parametric data embedding model based on a Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a shallow high-order parametric embedding function and compares training/test data only with learned or precomputed exemplars, resulting in a cost function with linear computational complexity for both training and testing. We also empirically demonstrate, using several benchmark datasets, that for classification in a two-dimensional embedding space, our approach not only speeds up kNN by hundreds of times, but also outperforms state-of-the-art supervised embedding approaches.
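A hedged PyTorch sketch of the exemplar-centered idea: each point is compared only against a small fixed set of exemplars rather than all other training points, so the loss is linear in the training-set size. The MCML-style objective is simplified here to a softmax over negative squared distances to the exemplars; the network, exemplar count, and data are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExemplarEmbedding(nn.Module):
    """Shallow parametric embedding trained against exemplars only."""
    def __init__(self, in_dim, embed_dim, exemplars, exemplar_labels):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, embed_dim))
        self.register_buffer("exemplars", exemplars)            # (m, in_dim)
        self.register_buffer("exemplar_labels", exemplar_labels)

    def loss(self, x, y):
        z = self.net(x)                          # (batch, embed_dim)
        e = self.net(self.exemplars)             # (m, embed_dim)
        d2 = torch.cdist(z, e) ** 2              # distances to exemplars only
        log_p = F.log_softmax(-d2, dim=1)        # collapse toward near exemplars
        target = (self.exemplar_labels[None, :] == y[:, None]).float()
        target = target / target.sum(dim=1, keepdim=True)  # same-class exemplars
        return -(target * log_p).sum(dim=1).mean()

# Toy usage: 20 exemplars covering 3 classes, 10-dimensional inputs.
ex = torch.randn(20, 10)
ex_y = torch.arange(20) % 3
model = ExemplarEmbedding(10, 2, ex, ex_y)
x, y = torch.randn(128, 10), torch.randint(0, 3, (128,))
print(float(model.loss(x, y)))
```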


Electronics ◽  
2019 ◽  
Vol 8 (6) ◽  
pp. 609 ◽  
Author(s):  
Fan Zhang ◽  
Jiabin Liu ◽  
Bo Wang ◽  
Zhiquan Qi ◽  
Yong Shi

Learning from label proportions (LLP) is a new kind of learning problem which has attracted wide interest in machine learning. Different from well-known supervised learning, the training data in LLP is organized into bags, and only the proportion of each class in each bag is available. Many modern applications can be abstracted to this problem, such as modeling voting behaviors and spam filtering. However, time-consuming training remains a challenge for LLP and becomes a bottleneck especially for large numbers of bags and large bag sizes. In this paper, we propose a fast algorithm called multi-class learning from label proportions by extreme learning machine (LLP-ELM), which takes advantage of the fast learning speed of an extreme learning machine to solve multi-class learning from label proportions. Firstly, we reshape the hidden layer output matrix and the training data target matrix of the extreme learning machine to use the proportion information instead of the real labels. Secondly, a robust loss function with a regularization term is formulated, and two efficient solutions are provided for different cases. Finally, various experiments demonstrate a significant speed-up of the proposed model with better accuracy on different datasets compared with several state-of-the-art methods.
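A rough numpy sketch of the core LLP-ELM idea: aggregate the random hidden-layer outputs per bag and solve a regularized least-squares problem against bag-level proportion targets instead of instance labels. The aggregation, the plain ridge term, and all sizes below are illustrative simplifications of the paper's reshaping and robust loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 bags of 50 instances each, 5 features, 3 classes.
n_bags, bag_size, d, n_hidden, n_classes = 4, 50, 5, 100, 3
X = rng.normal(size=(n_bags * bag_size, d))
bag_index = np.repeat(np.arange(n_bags), bag_size)
bag_proportions = rng.dirichlet(np.ones(n_classes), size=n_bags)  # the given supervision

# Random hidden layer of an extreme learning machine.
W, b = rng.normal(size=(d, n_hidden)), rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                                   # (N, n_hidden)

# Reshape to bag level: sum hidden outputs per bag, and use each bag's
# class counts (proportions * bag size) as the regression target.
H_bag = np.stack([H[bag_index == i].sum(axis=0) for i in range(n_bags)])
T_bag = bag_proportions * bag_size

# Regularized least squares for the output weights (ridge term C).
C = 1.0
beta = np.linalg.solve(H_bag.T @ H_bag + C * np.eye(n_hidden), H_bag.T @ T_bag)

# Instance-level predictions come from the ordinary ELM forward pass.
pred = (H @ beta).argmax(axis=1)
print(pred[:10])
```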

