Generalization Bound
Recently Published Documents


TOTAL DOCUMENTS

16
(FIVE YEARS 9)

H-INDEX

3
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Arezou Rezazadeh ◽  
Sharu Theresa Jose ◽  
Giuseppe Durisi ◽  
Osvaldo Simeone

Author(s):  
Tanguy Kerdoncuff ◽  
Rémi Emonet ◽  
Marc Sebban

Domain Adaptation aims at benefiting from a labeled dataset drawn from a source distribution to learn a model for examples generated from a different but related target distribution. Creating a domain-invariant representation between the source and target domains is the most widely used technique. A simple and robust way to perform this task consists in (i) representing the two domains by subspaces described by their respective eigenvectors and (ii) seeking a mapping function which aligns them. In this paper, we propose to use Optimal Transport (OT) and its associated Wasserstein distance to perform this alignment. While the idea of using OT in domain adaptation is not new, the original contribution of this paper is two-fold: (i) we derive a generalization bound on the target error involving several Wasserstein distances. This prompts us to optimize the ground metric of OT to reduce the target risk; (ii) from this theoretical analysis, we design an algorithm (MLOT) which optimizes a Mahalanobis distance leading to a transportation plan that adapts better. Extensive experiments demonstrate the effectiveness of this original approach.
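A minimal sketch of the ground-metric idea described in this abstract (not the authors' MLOT implementation): it computes an exact OT plan between source and target samples under a Mahalanobis ground cost using the POT library. The metric matrix A, the random data, and the barycentric mapping step are illustrative assumptions; MLOT optimizes A to reduce the target risk.

```python
# Sketch: OT alignment with a Mahalanobis ground cost (illustrative, not MLOT itself).
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
Xs = rng.normal(size=(50, 5))           # source samples (placeholder data)
Xt = rng.normal(loc=1.0, size=(60, 5))  # target samples (placeholder data)

A = np.eye(5)                           # Mahalanobis matrix; identity reduces to squared Euclidean
diff = Xs[:, None, :] - Xt[None, :, :]  # pairwise differences, shape (ns, nt, d)
C = np.einsum('ijk,kl,ijl->ij', diff, A, diff)  # C_ij = (xs_i - xt_j)^T A (xs_i - xt_j)

a = np.full(len(Xs), 1 / len(Xs))       # uniform source weights
b = np.full(len(Xt), 1 / len(Xt))       # uniform target weights
plan = ot.emd(a, b, C)                  # exact OT plan minimizing <plan, C>

Xs_aligned = len(Xs) * plan @ Xt        # barycentric mapping of source points onto the target domain
```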


Author(s):  
Thanh Tan Nguyen ◽  
Nan Ye ◽  
Peter Bartlett

We consider learning a convex combination of basis models, and present some new theoretical and empirical results that demonstrate the effectiveness of a greedy approach. Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull. We obtain a negative result: even the linear hull of very simple basis functions can have unbounded capacity, and is thus prone to overfitting; on the other hand, convex hulls are still rich but have bounded capacity. Second, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we discuss how a convex combination can be learned greedily with early stopping, and non-greedily when the number of basis models is known a priori. Our experiments suggest that the greedy scheme is competitive with or better than several baselines, including boosting and random forests. The greedy algorithm requires little effort in hyper-parameter tuning, and also seems able to adapt to the underlying complexity of the problem. Our code is available at https://github.com/tan1889/gce.
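An illustrative sketch of greedily learning a convex combination of fixed basis models (see the authors' repository above for their actual implementation): each round mixes in the basis model that most reduces the squared loss, using a Frank-Wolfe style step size. The basis class (threshold functions on one feature) and the data are placeholder assumptions.

```python
# Sketch: greedy learning over the convex hull of a fixed pool of basis models.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Small pool of basis models h_j(x); simple thresholds stand in for the real basis class.
thresholds = rng.normal(size=20)
basis_preds = np.stack([(X[:, 0] > t).astype(float) for t in thresholds], axis=1)

f = np.zeros(len(y))                        # predictions of the current convex combination
for t in range(50):                         # number of greedy rounds (early stopping controls this)
    gamma = 2.0 / (t + 2.0)                 # Frank-Wolfe style step size keeps f in the convex hull
    losses = [np.mean((y - ((1 - gamma) * f + gamma * basis_preds[:, j])) ** 2)
              for j in range(basis_preds.shape[1])]
    j_best = int(np.argmin(losses))         # basis model that most reduces squared loss
    f = (1 - gamma) * f + gamma * basis_preds[:, j_best]

print("training MSE:", np.mean((y - f) ** 2))
```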


2020 ◽  
Vol 34 (03) ◽  
pp. 2569-2576
Author(s):  
Ruijiang Gao ◽  
Maytal Saar-Tsechansky

Conventional active learning algorithms assume a single labeler that produces noiseless labels at a given, fixed cost, and aim to achieve the best generalization performance for a given classifier under a budget constraint. However, in many real settings, different labelers have different labeling costs and can yield different labeling accuracies. Moreover, a given labeler may exhibit different labeling accuracies for different instances. This setting can be referred to as active learning with diverse labelers with varying costs and accuracies, and it arises in many important real settings. It is therefore beneficial to understand how to effectively trade off labeling accuracy for different instances, labeling costs, and the informativeness of training instances, so as to achieve the best generalization performance at the lowest labeling cost. In this paper, we propose a new algorithm for selecting instances and labelers (with their corresponding costs and labeling accuracies) that employs a generalization bound for learning with label noise to select informative instances and labelers, so as to achieve higher generalization accuracy at a lower cost. Our proposed algorithm demonstrates state-of-the-art performance on five UCI datasets and a real crowdsourcing dataset.
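A hedged sketch of the kind of joint instance/labeler selection the abstract describes: score each (instance, labeler) pair by informativeness and estimated label accuracy per unit cost. The utility-over-cost scoring rule below is an illustrative assumption, not the paper's bound-based criterion.

```python
# Sketch: cost-aware selection of an (instance, labeler) pair (assumed scoring rule).
import numpy as np

def select_instance_and_labeler(uncertainty, accuracy, cost):
    """uncertainty: (n_instances,) model uncertainty per unlabeled instance
       accuracy:    (n_labelers, n_instances) estimated labeling accuracy
       cost:        (n_labelers,) price charged by each labeler
       Returns the (instance, labeler) pair maximizing an assumed utility/cost ratio."""
    utility = uncertainty[None, :] * (2 * accuracy - 1)   # noisier labels are worth less
    score = utility / cost[:, None]
    labeler, instance = np.unravel_index(np.argmax(score), score.shape)
    return instance, labeler

rng = np.random.default_rng(0)
inst, lab = select_instance_and_labeler(
    uncertainty=rng.uniform(size=100),
    accuracy=rng.uniform(0.6, 0.95, size=(5, 100)),
    cost=np.array([1.0, 1.5, 2.0, 3.0, 5.0]),
)
print(f"query instance {inst} from labeler {lab}")
```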


Author(s):  
Hao Chen ◽  
Zhanfeng Mo ◽  
Zhouwang Yang ◽  
Xiao Wang

This paper presents a framework for norm-based capacity control with respect to an lp,q-norm in weight-normalized Residual Neural Networks (ResNets). We first formulate the representation of each residual block. For the regression problem, we analyze the Rademacher complexity of the ResNets family, and we establish a tighter generalization upper bound for weight-normalized ResNets in a more general setting. Using the lp,q-norm weight normalization with 1/p + 1/q >= 1, we show that the resulting capacity control is width-independent and depends on the depth only through a square-root term. Several comparisons suggest that our result is tighter than previous work. Parallel results for Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are included by introducing the lp,q-norm weight normalization for DNNs and the lp,q-norm kernel normalization for CNNs. Numerical experiments also verify that ResNet structures contribute to better generalization properties.
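As a small sketch of the quantity this abstract revolves around, the snippet below computes an lp,q norm of a weight matrix (under one common column-then-row convention, which is an assumption) and rescales the matrix by it; the paper's actual normalization scheme and the role of the 1/p + 1/q >= 1 condition are not reproduced here.

```python
# Sketch: l_{p,q} norm of a weight matrix and a simple normalization step built on it.
import numpy as np

def lpq_norm(W, p, q):
    """||W||_{p,q} = ( sum_j ( sum_i |W_ij|^p )^(q/p) )^(1/q):
       an l_p norm over each column, then an l_q norm over the column norms
       (one common convention; the paper's exact definition may differ)."""
    col_norms = np.sum(np.abs(W) ** p, axis=0) ** (1.0 / p)
    return np.sum(col_norms ** q) ** (1.0 / q)

def weight_normalize(W, p=2.0, q=1.0):
    """Rescale W so its l_{p,q} norm is 1; note 1/p + 1/q = 1.5 >= 1 for this choice."""
    return W / lpq_norm(W, p, q)

W = np.random.default_rng(0).normal(size=(64, 32))
W_hat = weight_normalize(W)
print(lpq_norm(W_hat, 2.0, 1.0))  # ~1.0 after normalization
```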


Author(s):  
Zhiyong Yang ◽  
Qianqian Xu ◽  
Xiaochun Cao ◽  
Qingming Huang

Traditionally, most existing attribute learning methods are trained on the consensus of annotations aggregated from a limited number of annotators. However, this consensus might fail, especially when a wide spectrum of annotators with different interests and different comprehension of the attribute words are involved. In this paper, we develop a novel multi-task method to understand and predict personalized attribute annotations. Regarding the attribute preference learning for each annotator as a specific task, we first propose a multi-level task parameter decomposition to capture the evolution from the highly popular opinion of the mass to highly personalized choices that are specific to each person. Meanwhile, for personalized learning methods, ranking prediction is much more important than accurate classification. This motivates us to employ an Area Under the ROC Curve (AUC) based loss function in our model. On top of the AUC-based loss, we propose an efficient method to evaluate the loss and its gradients. Theoretically, we derive a novel closed-form solution for one of our non-convex subproblems, which leads to provable convergence behavior. Furthermore, we also provide a generalization bound to guarantee reasonable performance. Finally, empirical analysis consistently speaks to the efficacy of our proposed method.
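To make the AUC-based loss concrete, here is a hedged sketch of a pairwise surrogate: AUC maximization penalizes positive-negative score pairs that are ranked incorrectly. The squared hinge form, the margin, and the data are assumptions for illustration, not the paper's exact loss or its closed-form subproblem solution.

```python
# Sketch: a pairwise surrogate for AUC-based ranking loss (assumed form).
import numpy as np

def pairwise_auc_surrogate(scores, labels, margin=1.0):
    """Mean squared hinge over all (positive, negative) pairs:
       loss = mean( max(0, margin - (s_pos - s_neg))^2 )."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]          # all positive-negative score gaps
    return np.mean(np.maximum(0.0, margin - diffs) ** 2)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = labels + 0.5 * rng.normal(size=200)     # imperfect scores, illustrative
print(pairwise_auc_surrogate(scores, labels))
```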


Author(s):  
Shusen Wang

We study the distributed machine learning problem where n feature-response pairs are partitioned among m machines uniformly at random. The goal is to approximately solve an empirical risk minimization (ERM) problem with the minimum amount of communication. The divide-and-conquer (DC) method, proposed several years ago, lets every worker machine independently solve the same ERM problem using its local feature-response pairs and lets the driver machine combine the solutions. This approach is one-shot and thereby extremely communication-efficient. Although the DC method has been studied in many prior works, a reasonable generalization bound had not been established before this work. For the ridge regression problem, we show that the prediction error of the DC method on unseen test samples is at most ε times larger than the optimal. While there are constant-factor bounds in prior works, their sample complexities have a quadratic dependence on d, which does not match the setting of most real-world problems. In contrast, our bounds are much stronger. First, our 1 + ε error bound is much better than their constant-factor bounds. Second, our sample complexity is merely linear in d.
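A minimal sketch of the DC baseline analyzed in this abstract: each worker solves ridge regression on its local shard of the randomly partitioned data, and the driver averages the local solutions in a single communication round. The per-shard regularization scaling and the synthetic data are illustrative assumptions.

```python
# Sketch: one-shot divide-and-conquer (DC) ridge regression.
import numpy as np

def dc_ridge(X, y, n_workers, lam):
    d = X.shape[1]
    perm = np.random.default_rng(1).permutation(len(y))   # uniform random partition
    shards = np.array_split(perm, n_workers)
    w_locals = []
    for idx in shards:
        Xi, yi = X[idx], y[idx]
        # each worker solves its local ridge problem independently
        w_i = np.linalg.solve(Xi.T @ Xi + lam * len(idx) * np.eye(d), Xi.T @ yi)
        w_locals.append(w_i)
    return np.mean(w_locals, axis=0)                       # one-shot averaging on the driver

rng = np.random.default_rng(0)
n, d = 5000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w_dc = dc_ridge(X, y, n_workers=10, lam=1e-2)
print("parameter error:", np.linalg.norm(w_dc - w_true))
```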


2016 ◽  
Vol 28 (10) ◽  
pp. 2213-2249 ◽  
Author(s):  
Tongliang Liu ◽  
Dacheng Tao ◽  
Dong Xu

The k-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative k-dimensional vectors; they include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality-independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order [Formula: see text], where m is the dimension of the features, k is the number of columns in the linear implementation of the coding schemes, and n is the sample size, [Formula: see text] when n is finite and [Formula: see text] when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.
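As one concrete instance of the reconstruction error the bound controls, the sketch below computes the empirical reconstruction error for k-means coding (hard assignment to the nearest codeword), a special case of k-dimensional coding; the codebook and data are placeholder assumptions.

```python
# Sketch: empirical reconstruction error for k-means coding, a special case
# of the k-dimensional coding schemes discussed above.
import numpy as np

def kmeans_reconstruction_error(X, codebook):
    """Mean squared distance from each sample to its nearest codeword.
       The generalization bound relates this empirical quantity on n samples
       to its expectation under the data distribution."""
    d2 = np.sum((X[:, None, :] - codebook[None, :, :]) ** 2, axis=2)  # (n, k) squared distances
    return np.mean(np.min(d2, axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))        # n = 500 samples, m = 8 features (illustrative)
codebook = rng.normal(size=(10, 8))  # k = 10 codewords (illustrative)
print(kmeans_reconstruction_error(X, codebook))
```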

