Generalization Bound
Recently Published Documents


TOTAL DOCUMENTS

16
(FIVE YEARS 9)

H-INDEX

3
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Arezou Rezazadeh ◽  
Sharu Theresa Jose ◽  
Giuseppe Durisi ◽  
Osvaldo Simeone

Author(s):  
Tanguy Kerdoncuff ◽  
Rémi Emonet ◽  
Marc Sebban

Domain Adaptation aims at benefiting from a labeled dataset drawn from a source distribution to learn a model for examples generated from a different but related target distribution. Creating a domain-invariant representation between the source and target domains is the most widely used technique. A simple and robust way to perform this task consists in (i) representing the two domains by subspaces described by their respective eigenvectors and (ii) seeking a mapping function which aligns them. In this paper, we propose to use Optimal Transport (OT) and its associated Wasserstein distance to perform this alignment. While the idea of using OT in domain adaptation is not new, the original contribution of this paper is two-fold: (i) we derive a generalization bound on the target error involving several Wasserstein distances. This prompts us to optimize the ground metric of OT to reduce the target risk; (ii) from this theoretical analysis, we design an algorithm (MLOT) which optimizes a Mahalanobis distance leading to a transportation plan that adapts better. Extensive experiments demonstrate the effectiveness of this original approach.
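A minimal sketch of the ground-metric idea described in this abstract (not the authors' MLOT implementation): it computes an exact OT plan between source and target samples under a Mahalanobis ground cost using the POT library. The metric matrix A, the random data, and the barycentric mapping step are illustrative assumptions; MLOT optimizes A to reduce the target risk.

```python
# Sketch: OT alignment with a Mahalanobis ground cost (illustrative, not MLOT itself).
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
Xs = rng.normal(size=(50, 5))           # source samples (placeholder data)
Xt = rng.normal(loc=1.0, size=(60, 5))  # target samples (placeholder data)

A = np.eye(5)                           # Mahalanobis matrix; identity reduces to squared Euclidean
diff = Xs[:, None, :] - Xt[None, :, :]  # pairwise differences, shape (ns, nt, d)
C = np.einsum('ijk,kl,ijl->ij', diff, A, diff)  # C_ij = (xs_i - xt_j)^T A (xs_i - xt_j)

a = np.full(len(Xs), 1 / len(Xs))       # uniform source weights
b = np.full(len(Xt), 1 / len(Xt))       # uniform target weights
plan = ot.emd(a, b, C)                  # exact OT plan minimizing <plan, C>

Xs_aligned = len(Xs) * plan @ Xt        # barycentric mapping of source points onto the target domain
```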


Author(s):  
Thanh Tan Nguyen ◽  
Nan Ye ◽  
Peter Bartlett

We consider learning a convex combination of basis models, and present some new theoretical and empirical results that demonstrate the effectiveness of a greedy approach. Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull. We obtain a negative result: even the linear hull of very simple basis functions can have unbounded capacity, and is thus prone to overfitting; on the other hand, convex hulls are still rich but have bounded capacity. Second, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we discuss how a convex combination can be learned greedily with early stopping, and non-greedily when the number of basis models is known a priori. Our experiments suggest that the greedy scheme is competitive with or better than several baselines, including boosting and random forests. The greedy algorithm requires little effort in hyper-parameter tuning, and also seems able to adapt to the underlying complexity of the problem. Our code is available at https://github.com/tan1889/gce.
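An illustrative sketch of greedily learning a convex combination of fixed basis models (see the authors' repository above for their actual implementation): each round mixes in the basis model that most reduces the squared loss, using a Frank-Wolfe style step size. The basis class (threshold functions on one feature) and the data are placeholder assumptions.

```python
# Sketch: greedy learning over the convex hull of a fixed pool of basis models.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Small pool of basis models h_j(x); simple thresholds stand in for the real basis class.
thresholds = rng.normal(size=20)
basis_preds = np.stack([(X[:, 0] > t).astype(float) for t in thresholds], axis=1)

f = np.zeros(len(y))                        # predictions of the current convex combination
for t in range(50):                         # number of greedy rounds (early stopping controls this)
    gamma = 2.0 / (t + 2.0)                 # Frank-Wolfe style step size keeps f in the convex hull
    losses = [np.mean((y - ((1 - gamma) * f + gamma * basis_preds[:, j])) ** 2)
              for j in range(basis_preds.shape[1])]
    j_best = int(np.argmin(losses))         # basis model that most reduces squared loss
    f = (1 - gamma) * f + gamma * basis_preds[:, j_best]

print("training MSE:", np.mean((y - f) ** 2))
```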


2020 ◽  
Vol 34 (03) ◽  
pp. 2569-2576
Author(s):  
Ruijiang Gao ◽  
Maytal Saar-Tsechansky

Conventional active learning algorithms assume a single labeler that produces noiseless labels at a given, fixed cost, and aim to achieve the best generalization performance for a given classifier under a budget constraint. However, in many real settings, different labelers have different labeling costs and can yield different labeling accuracies. Moreover, a given labeler may exhibit different labeling accuracies for different instances. This setting can be referred to as active learning with diverse labelers with varying costs and accuracies, and it arises in many important real settings. It is therefore beneficial to understand how to effectively trade off labeling accuracy for different instances, labeling costs, and the informativeness of training instances, so as to achieve the best generalization performance at the lowest labeling cost. In this paper, we propose a new algorithm for selecting instances and labelers (with their corresponding costs and labeling accuracies) that employs a generalization bound for learning with label noise to select informative instances and labelers, so as to achieve higher generalization accuracy at a lower cost. Our proposed algorithm demonstrates state-of-the-art performance on five UCI datasets and a real crowdsourcing dataset.
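A hedged sketch of the kind of joint instance/labeler selection the abstract describes: score each (instance, labeler) pair by informativeness and estimated label accuracy per unit cost. The utility-over-cost scoring rule below is an illustrative assumption, not the paper's bound-based criterion.

```python
# Sketch: cost-aware selection of an (instance, labeler) pair (assumed scoring rule).
import numpy as np

def select_instance_and_labeler(uncertainty, accuracy, cost):
    """uncertainty: (n_instances,) model uncertainty per unlabeled instance
       accuracy:    (n_labelers, n_instances) estimated labeling accuracy
       cost:        (n_labelers,) price charged by each labeler
       Returns the (instance, labeler) pair maximizing an assumed utility/cost ratio."""
    utility = uncertainty[None, :] * (2 * accuracy - 1)   # noisier labels are worth less
    score = utility / cost[:, None]
    labeler, instance = np.unravel_index(np.argmax(score), score.shape)
    return instance, labeler

rng = np.random.default_rng(0)
inst, lab = select_instance_and_labeler(
    uncertainty=rng.uniform(size=100),
    accuracy=rng.uniform(0.6, 0.95, size=(5, 100)),
    cost=np.array([1.0, 1.5, 2.0, 3.0, 5.0]),
)
print(f"query instance {inst} from labeler {lab}")
```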


Author(s):  
Hao Chen ◽  
Zhanfeng Mo ◽  
Zhouwang Yang ◽  
Xiao Wang

This paper presents a framework for norm-based capacity control with respect to an lp,q-norm in weight-normalized Residual Neural Networks (ResNets). We first formulate the representation of each residual block. For the regression problem, we analyze the Rademacher complexity of the ResNets family, and we establish a tighter generalization upper bound for weight-normalized ResNets in a more general setting. Using the lp,q-norm weight normalization with 1/p + 1/q >= 1, we show that the resulting capacity control is width-independent and depends on the depth only through a square-root term. Several comparisons suggest that our result is tighter than previous work. Parallel results for Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are included by introducing the lp,q-norm weight normalization for DNNs and the lp,q-norm kernel normalization for CNNs. Numerical experiments also verify that ResNet structures contribute to better generalization properties.
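As a small sketch of the quantity this abstract revolves around, the snippet below computes an lp,q norm of a weight matrix (under one common column-then-row convention, which is an assumption) and rescales the matrix by it; the paper's actual normalization scheme and the role of the 1/p + 1/q >= 1 condition are not reproduced here.

```python
# Sketch: l_{p,q} norm of a weight matrix and a simple normalization step built on it.
import numpy as np

def lpq_norm(W, p, q):
    """||W||_{p,q} = ( sum_j ( sum_i |W_ij|^p )^(q/p) )^(1/q):
       an l_p norm over each column, then an l_q norm over the column norms
       (one common convention; the paper's exact definition may differ)."""
    col_norms = np.sum(np.abs(W) ** p, axis=0) ** (1.0 / p)
    return np.sum(col_norms ** q) ** (1.0 / q)

def weight_normalize(W, p=2.0, q=1.0):
    """Rescale W so its l_{p,q} norm is 1; note 1/p + 1/q = 1.5 >= 1 for this choice."""
    return W / lpq_norm(W, p, q)

W = np.random.default_rng(0).normal(size=(64, 32))
W_hat = weight_normalize(W)
print(lpq_norm(W_hat, 2.0, 1.0))  # ~1.0 after normalization
```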


Author(s):  
Zhiyong Yang ◽  
Qianqian Xu ◽  
Xiaochun Cao ◽  
Qingming Huang

Traditionally, most existing attribute learning methods are trained on the consensus of annotations aggregated from a limited number of annotators. However, this consensus might fail, especially when a wide spectrum of annotators with different interests and different comprehension of the attribute words are involved. In this paper, we develop a novel multi-task method to understand and predict personalized attribute annotations. Regarding the attribute preference learning for each annotator as a specific task, we first propose a multi-level task parameter decomposition to capture the evolution from the highly popular opinion of the mass to highly personalized choices that are specific to each person. Meanwhile, for personalized learning methods, ranking prediction is much more important than accurate classification. This motivates us to employ an Area Under the ROC Curve (AUC) based loss function in our model. On top of the AUC-based loss, we propose an efficient method to evaluate the loss and its gradients. Theoretically, we derive a novel closed-form solution for one of our non-convex subproblems, which leads to provable convergence behavior. Furthermore, we also provide a generalization bound to guarantee reasonable performance. Finally, empirical analysis consistently speaks to the efficacy of our proposed method.
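To make the AUC-based loss concrete, here is a hedged sketch of a pairwise surrogate: AUC maximization penalizes positive-negative score pairs that are ranked incorrectly. The squared hinge form, the margin, and the data are assumptions for illustration, not the paper's exact loss or its closed-form subproblem solution.

```python
# Sketch: a pairwise surrogate for AUC-based ranking loss (assumed form).
import numpy as np

def pairwise_auc_surrogate(scores, labels, margin=1.0):
    """Mean squared hinge over all (positive, negative) pairs:
       loss = mean( max(0, margin - (s_pos - s_neg))^2 )."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]          # all positive-negative score gaps
    return np.mean(np.maximum(0.0, margin - diffs) ** 2)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = labels + 0.5 * rng.normal(size=200)     # imperfect scores, illustrative
print(pairwise_auc_surrogate(scores, labels))
```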


Author(s):  
Shusen Wang

We study the distributed machine learning problem where n feature-response pairs are partitioned among m machines uniformly at random. The goal is to approximately solve an empirical risk minimization (ERM) problem with the minimum amount of communication. The divide-and-conquer (DC) method, proposed several years ago, lets every worker machine independently solve the same ERM problem using its local feature-response pairs and lets the driver machine combine the solutions. This approach is one-shot and thereby extremely communication-efficient. Although the DC method has been studied in many prior works, a reasonable generalization bound had not been established before this work. For the ridge regression problem, we show that the prediction error of the DC method on unseen test samples is at most ε times larger than the optimal. While there are constant-factor bounds in prior works, their sample complexities have a quadratic dependence on d, which does not match the setting of most real-world problems. In contrast, our bounds are much stronger. First, our 1 + ε error bound is much better than their constant-factor bounds. Second, our sample complexity is merely linear in d.
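A minimal sketch of the DC baseline analyzed in this abstract: each worker solves ridge regression on its local shard of the randomly partitioned data, and the driver averages the local solutions in a single communication round. The per-shard regularization scaling and the synthetic data are illustrative assumptions.

```python
# Sketch: one-shot divide-and-conquer (DC) ridge regression.
import numpy as np

def dc_ridge(X, y, n_workers, lam):
    d = X.shape[1]
    perm = np.random.default_rng(1).permutation(len(y))   # uniform random partition
    shards = np.array_split(perm, n_workers)
    w_locals = []
    for idx in shards:
        Xi, yi = X[idx], y[idx]
        # each worker solves its local ridge problem independently
        w_i = np.linalg.solve(Xi.T @ Xi + lam * len(idx) * np.eye(d), Xi.T @ yi)
        w_locals.append(w_i)
    return np.mean(w_locals, axis=0)                       # one-shot averaging on the driver

rng = np.random.default_rng(0)
n, d = 5000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w_dc = dc_ridge(X, y, n_workers=10, lam=1e-2)
print("parameter error:", np.linalg.norm(w_dc - w_true))
```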


2016 ◽  
Vol 28 (10) ◽  
pp. 2213-2249 ◽  
Author(s):  
Tongliang Liu ◽  
Dacheng Tao ◽  
Dong Xu

The k-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative k-dimensional vectors; they include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality-independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order [Formula: see text], where m is the dimension of the features, k is the number of columns in the linear implementation of the coding schemes, and n is the sample size, [Formula: see text] when n is finite and [Formula: see text] when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.
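As one concrete instance of the reconstruction error the bound controls, the sketch below computes the empirical reconstruction error for k-means coding (hard assignment to the nearest codeword), a special case of k-dimensional coding; the codebook and data are placeholder assumptions.

```python
# Sketch: empirical reconstruction error for k-means coding, a special case
# of the k-dimensional coding schemes discussed above.
import numpy as np

def kmeans_reconstruction_error(X, codebook):
    """Mean squared distance from each sample to its nearest codeword.
       The generalization bound relates this empirical quantity on n samples
       to its expectation under the data distribution."""
    d2 = np.sum((X[:, None, :] - codebook[None, :, :]) ** 2, axis=2)  # (n, k) squared distances
    return np.mean(np.min(d2, axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))        # n = 500 samples, m = 8 features (illustrative)
codebook = rng.normal(size=(10, 8))  # k = 10 codewords (illustrative)
print(kmeans_reconstruction_error(X, codebook))
```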

