Rényi Divergence Based Bounds on Generalization Error

Traditional online algorithms encapsulate decision making under uncertainty, and give ways to hedge against all possible future events, while guaranteeing a nearly optimal solution, as compared to an offline optimum. On the other hand, machine learning algorithms are in the business of extrapolating patterns found in the data to predict the future, and usually come with strong guarantees on the expected generalization error. In this work, we develop a framework for augmenting online algorithms with a machine learned predictor to achieve competitive ratios that provably improve upon unconditional worst-case lower bounds when the predictor has low error. Our approach treats the predictor as a complete black box and is not dependent on its inner workings or the exact distribution of its errors. We apply this framework to the traditional caching problem—creating an eviction strategy for a cache of size k . We demonstrate that naively following the oracle’s recommendations may lead to very poor performance, even when the average error is quite low. Instead, we show how to modify the Marker algorithm to take into account the predictions and prove that this combined approach achieves a competitive ratio that both (i) decreases as the predictor’s error decreases and (ii) is always capped by O (log k ), which can be achieved without any assistance from the predictor. We complement our results with an empirical evaluation of our algorithm on real-world datasets and show that it performs well empirically even when using simple off-the-shelf predictions.

Download Full-text

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

Nature Communications ◽

10.1038/s41467-021-23103-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Abdulkadir Canatar ◽

Blake Bordelon ◽

Cengiz Pehlevan

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Kernel Regression ◽

Learning Task ◽

Learning Curves ◽

Generalization Error ◽

Theoretical Understanding ◽

Classical Statistics ◽

Deep Networks ◽

Model Alignment

AbstractA theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.

Download Full-text

A comparison of tight generalization error bounds

Proceedings of the 22nd international conference on Machine learning - ICML '05 ◽

10.1145/1102351.1102403 ◽

2005 ◽

Cited By ~ 5

Author(s):

Matti Kääriäinen ◽

John Langford

Keyword(s):

Error Bounds ◽

Generalization Error

Download Full-text

Fuzzy Clustering Methods with Rényi Relative Entropy and Cluster Size

Mathematics ◽

10.3390/math9121423 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1423

Author(s):

Javier Bonilla ◽

Daniel Vélez ◽

Javier Montero ◽

J. Tinguaro Rodríguez

Keyword(s):

Relative Entropy ◽

Fuzzy Clustering ◽

Cluster Size ◽

Computational Study ◽

Gaussian Kernel ◽

Clustering Methods ◽

Rényi Divergence ◽

Divergence Measures ◽

Fuzzy Clustering Methods ◽

Rényi Relative Entropy

In the last two decades, information entropy measures have been relevantly applied in fuzzy clustering problems in order to regularize solutions by avoiding the formation of partitions with excessively overlapping clusters. Following this idea, relative entropy or divergence measures have been similarly applied, particularly to enable that kind of entropy-based regularization to also take into account, as well as interact with, cluster size variables. Particularly, since Rényi divergence generalizes several other divergence measures, its application in fuzzy clustering seems promising for devising more general and potentially more effective methods. However, previous works making use of either Rényi entropy or divergence in fuzzy clustering, respectively, have not considered cluster sizes (thus applying regularization in terms of entropy, not divergence) or employed divergence without a regularization purpose. Then, the main contribution of this work is the introduction of a new regularization term based on Rényi relative entropy between membership degrees and observation ratios per cluster to penalize overlapping solutions in fuzzy clustering analysis. Specifically, such Rényi divergence-based term is added to the variance-based Fuzzy C-means objective function when allowing cluster sizes. This then leads to the development of two new fuzzy clustering methods exhibiting Rényi divergence-based regularization, the second one extending the first by considering a Gaussian kernel metric instead of the Euclidean distance. Iterative expressions for these methods are derived through the explicit application of Lagrange multipliers. An interesting feature of these expressions is that the proposed methods seem to take advantage of a greater amount of information in the updating steps for membership degrees and observations ratios per cluster. Finally, an extensive computational study is presented showing the feasibility and comparatively good performance of the proposed methods.

Download Full-text

Estimation of different entropies via Lidstone polynomial using Jensen-type functionals

Arabian Journal of Mathematics ◽

10.1007/s40065-020-00277-y ◽

2020 ◽

Vol 9 (3) ◽

pp. 613-631

Author(s):

Khuram Ali Khan ◽

Tasadduq Niaz ◽

Đilda Pečarić ◽

Josip Pečarić

Keyword(s):

Convex Function ◽

Shannon Entropy ◽

Rényi Divergence ◽

Lidstone Polynomial

Abstract In this work, some new functional of Jensen-type inequalities are constructed using Shannon entropy, f-divergence, and Rényi divergence, and some estimates are obtained for these new functionals. Also using the Zipf–Mandelbrot law and hybrid Zipf–Mandelbrot law, we investigate some bounds for these new functionals. Furthermore, we generalize these new functionals for m-convex function using Lidstone polynomial.

Download Full-text