Generalization error of GAN from the discriminator’s perspective

Traditional online algorithms encapsulate decision making under uncertainty, and give ways to hedge against all possible future events, while guaranteeing a nearly optimal solution, as compared to an offline optimum. On the other hand, machine learning algorithms are in the business of extrapolating patterns found in the data to predict the future, and usually come with strong guarantees on the expected generalization error. In this work, we develop a framework for augmenting online algorithms with a machine learned predictor to achieve competitive ratios that provably improve upon unconditional worst-case lower bounds when the predictor has low error. Our approach treats the predictor as a complete black box and is not dependent on its inner workings or the exact distribution of its errors. We apply this framework to the traditional caching problem: creating an eviction strategy for a cache of size k. We demonstrate that naively following the oracle’s recommendations may lead to very poor performance, even when the average error is quite low. Instead, we show how to modify the Marker algorithm to take into account the predictions and prove that this combined approach achieves a competitive ratio that both (i) decreases as the predictor’s error decreases and (ii) is always capped by O(log k), which can be achieved without any assistance from the predictor. We complement our results with an empirical evaluation of our algorithm on real-world datasets and show that it performs well empirically even when using simple off-the-shelf predictions.
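
For orientation, the classic randomized Marker algorithm that the abstract builds on can be sketched as follows. This is only the textbook, prediction-free baseline, not the prediction-augmented variant the abstract describes; the function name `marker_cache_misses` and the `seed` parameter are illustrative assumptions.

```python
import random


def marker_cache_misses(requests, k, seed=0):
    """Count cache misses of the classic randomized Marker algorithm.

    Illustrative baseline only: it ignores any predictor.
    """
    rng = random.Random(seed)
    cache = set()   # pages currently held in the cache
    marked = set()  # cached pages requested during the current phase
    misses = 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                if not (cache - marked):
                    # Every cached page is marked: a new phase begins.
                    marked.clear()
                # Evict an unmarked page chosen uniformly at random.
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
            cache.add(page)
        # A requested page is marked whether it was a hit or a miss.
        marked.add(page)
    return misses
```

With a cache of size 3, the request sequence `[1, 2, 3, 1, 2, 3]` incurs only the three compulsory misses, since no eviction is ever needed.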

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

Nature Communications ◽

10.1038/s41467-021-23103-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Abdulkadir Canatar ◽

Blake Bordelon ◽

Cengiz Pehlevan

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Kernel Regression ◽

Learning Task ◽

Learning Curves ◽

Generalization Error ◽

Theoretical Understanding ◽

Classical Statistics ◽

Deep Networks ◽

Model Alignment

Abstract: A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.

A comparison of tight generalization error bounds

Proceedings of the 22nd international conference on Machine learning - ICML '05 ◽

10.1145/1102351.1102403 ◽

2005 ◽

Cited By ~ 5

Author(s):

Matti Kääriäinen ◽

John Langford

Keyword(s):

Error Bounds ◽

Generalization Error

Maximizing minority accuracy for imbalanced pattern classification problems using cost-sensitive Localized Generalization Error Model

Applied Soft Computing ◽

10.1016/j.asoc.2021.107178 ◽

2021 ◽

Vol 104 ◽

pp. 107178

Author(s):

Wing W.Y. Ng ◽

Zhengxi Liu ◽

Jianjun Zhang ◽

Witold Pedrycz

Keyword(s):

Pattern Classification ◽

Error Model ◽

Generalization Error ◽

Classification Problems ◽

Localized Generalization Error

Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration

Applied and Computational Harmonic Analysis ◽

10.1016/j.acha.2021.12.003 ◽

2021 ◽

Author(s):

Song Mei ◽

Theodor Misiakiewicz ◽

Andrea Montanari

Keyword(s):

Kernel Methods ◽

Kernel Matrix ◽

Generalization Error ◽

Matrix Concentration

Generalization Error Analysis for FDR Controlled Classification

2007 IEEE/SP 14th Workshop on Statistical Signal Processing ◽

10.1109/ssp.2007.4301368 ◽

2007 ◽

Cited By ~ 1

Author(s):

Clayton Scott ◽

Gowtham Bellala ◽

Rebecca Willett

Keyword(s):

Error Analysis ◽

Generalization Error

FEATURE SELECTION AND THE CHESSBOARD PROBLEM

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.311.03 ◽

2015 ◽

Vol 1 (311) ◽

Author(s):

Mariusz Kubus

Keyword(s):

Feature Selection ◽

Data Structure ◽

Important Criterion ◽

Multivariate Approach ◽

Generalization Error ◽

Selection Methods ◽

Embedded Methods ◽

Feature Relevance

Feature selection methods are usually classified into three groups: filters, wrappers and embedded methods. A second important criterion for classifying them is whether feature relevance is evaluated individually or with a multivariate approach. The chessboard problem is an illustrative example, in which two variables that have no individual influence on the dependent variable can be essential for separating the classes. Classifiers that deal well with such a data structure are sensitive to irrelevant variables: the generalization error increases with the number of noisy variables. We discuss feature selection methods in the context of a chessboard-like structure in the data with numerous irrelevant variables.
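
The chessboard effect can be illustrated with a small sketch, assuming the simplest 2x2 board on the unit square and a regular grid of points. All names here (`chessboard_label`, `single_feature_accuracy`, and so on) are hypothetical and not taken from the paper. No threshold rule on a single variable does better than chance, while a rule using both variables separates the classes perfectly.

```python
from itertools import product


def chessboard_label(x1, x2):
    """Class of a point on a 2x2 chessboard over the unit square."""
    return int((x1 >= 0.5) != (x2 >= 0.5))


# Regular 10x10 grid of points, labelled by the colour of their cell.
grid = [(i + 0.5) / 10 for i in range(10)]
points = [(x1, x2, chessboard_label(x1, x2)) for x1, x2 in product(grid, grid)]


def accuracy(predict):
    """Fraction of grid points that a rule classifies correctly."""
    return sum(predict(x1, x2) == y for x1, x2, y in points) / len(points)


def single_feature_accuracy(index, threshold, flip):
    """Accuracy of a threshold rule that looks at one variable only."""
    def predict(x1, x2):
        value = (x1, x2)[index]
        return int((value >= threshold) != flip)
    return accuracy(predict)


# Best rule over both variables, all grid thresholds, both orientations.
best_single = max(
    single_feature_accuracy(index, j / 10, flip)
    for index in (0, 1)
    for j in range(11)
    for flip in (False, True)
)

# A rule that uses both variables jointly.
joint = accuracy(chessboard_label)
```

Here `best_single` stays at chance level (0.5) and `joint` reaches 1.0, which is why a purely individual evaluation of feature relevance would discard both variables.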

Training error, generalization error and learning curves in neural learning

Proceedings 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems ◽

10.1109/annes.1995.499426 ◽

2002 ◽

Cited By ~ 1

Author(s):

S.-I. Amari

Keyword(s):

Learning Curves ◽

Generalization Error ◽

Neural Learning ◽

Training Error

Flat Minima

Neural Computation ◽

10.1162/neco.1997.9.1.1 ◽

1997 ◽

Vol 9 (1) ◽

pp. 1-42 ◽

Cited By ~ 156

Author(s):

Sepp Hochreiter ◽

Jürgen Schmidhuber

Keyword(s):

Neural Networks ◽

Error Function ◽

Low Complexity ◽

Generalization Error ◽

Input Output ◽

Generalization Capability ◽

Training Set ◽

Weight Decay ◽

Optimal Brain Surgeon ◽

And Training

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a “flat” minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to “simple” networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a “good” weight prior. Instead we have a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. Automatically, it effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and “optimal brain surgeon/optimal brain damage.”

Generalization Error of Linear Neural Networks in Unidentifiable Cases

Lecture Notes in Computer Science - Algorithmic Learning Theory ◽

10.1007/3-540-46769-6_5 ◽

1999 ◽

pp. 51-62 ◽

Cited By ~ 9

Author(s):

Kenji Fukumizu

Keyword(s):

Neural Networks ◽

Generalization Error
