generalization bounds Latest Research Papers

Hausdorff dimension, heavy tails, and generalization in neural networks*

Journal of Statistical Mechanics Theory and Experiment ◽

10.1088/1742-5468/ac3ae7 ◽

2021 ◽

Vol 2021 (12) ◽

pp. 124014

Author(s):

Umut Şimşekli ◽

Ozan Sener ◽

George Deligiannidis ◽

Murat A Erdogdu

Keyword(s):

Neural Networks ◽

Hausdorff Dimension ◽

Heavy Tails ◽

Stochastic Gradient Descent ◽

Generalization Error ◽

Generalization Bounds ◽

Wide Range ◽

Important Challenge ◽

Rigorous Treatment ◽

Heavy Tailed

Abstract: Despite the success of stochastic gradient descent (SGD) in a wide range of applications, characterizing its generalization properties in non-convex deep learning problems is still an important challenge. While modeling the trajectories of SGD via stochastic differential equations (SDE) under heavy-tailed gradient noise has recently shed light on several peculiar characteristics of SGD, a rigorous treatment of the generalization properties of such SDEs in a learning theoretical framework is still missing. Aiming to bridge this gap, in this paper, we prove generalization bounds for SGD under the assumption that its trajectories can be well-approximated by a Feller process, a member of a rich class of Markov processes that includes several recent SDE representations (both Brownian and heavy-tailed) as special cases. We show that the generalization error can be controlled by the Hausdorff dimension of the trajectories, which is intimately linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail-index of the process can be used as a notion of ‘capacity metric’. We support our theory with experiments on deep neural networks illustrating that the proposed capacity metric accurately estimates the generalization error, and that it does not necessarily grow with the number of parameters, unlike the existing capacity metrics in the literature.
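The abstract links generalization to the tail-index of the driving process but does not say how that index is estimated. As a rough illustration only, the sketch below uses the Hill estimator, a standard textbook tail-index estimator that is swapped in here as an assumption and is not necessarily the authors' method. The function name `hill_tail_index` and the cutoff `k` are hypothetical. A smaller estimate means a heavier tail.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index from the k largest absolute values.

    Assumption: the upper tail of |x| is approximately Pareto, so the
    log-excesses over the (k+1)-th largest value are roughly exponential.
    """
    xs = sorted((abs(x) for x in samples if x != 0), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < number of nonzero samples")
    threshold = xs[k]  # (k+1)-th largest absolute value
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic Pareto data with known tail indices (illustrative only).
    heavy = [rng.paretovariate(1.2) for _ in range(5000)]
    light = [rng.paretovariate(3.0) for _ in range(5000)]
    print("heavy-tailed estimate:", hill_tail_index(heavy, 500))
    print("lighter-tailed estimate:", hill_tail_index(light, 500))
```

The choice of `k` trades bias against variance: a small `k` uses only the extreme tail but is noisy, while a large `k` is stabler but may include samples that are not in the Pareto-like regime.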

Encoding-dependent generalization bounds for parametrized quantum circuits

Quantum ◽

10.22331/q-2021-11-17-582 ◽

2021 ◽

Vol 5 ◽

pp. 582

Author(s):

Matthias C. Caro ◽

Elies Gil-Fuster ◽

Johannes Jakob Meyer ◽

Jens Eisert ◽

Ryan Sweke

Keyword(s):

Large Body ◽

Quantum Circuits ◽

Risk Minimization ◽

Complexity Measures ◽

Generalization Bounds ◽

Data Encoding ◽

Unseen Data ◽

Out Of Sample ◽

Rigorous Framework ◽

Encoding Strategies

A large body of recent work has begun to explore the potential of parametrized quantum circuits (PQCs) as machine learning models, within the framework of hybrid quantum-classical optimization. In particular, theoretical guarantees on the out-of-sample performance of such models, in terms of generalization bounds, have emerged. However, none of these generalization bounds depend explicitly on how the classical input data is encoded into the PQC. We derive generalization bounds for PQC-based models that depend explicitly on the strategy used for data-encoding. These imply bounds on the performance of trained PQC-based models on unseen data. Moreover, our results facilitate the selection of optimal data-encoding strategies via structural risk minimization, a mathematically rigorous framework for model selection. We obtain our generalization bounds by bounding the complexity of PQC-based models as measured by the Rademacher complexity and the metric entropy, two complexity measures from statistical learning theory. To achieve this, we rely on a representation of PQC-based models via trigonometric functions. Our generalization bounds emphasize the importance of well-considered data-encoding strategies for PQC-based models.

Corrections to “Generalization Bounds via Information Density and Conditional Information Density”

IEEE Journal on Selected Areas in Information Theory ◽

10.1109/jsait.2021.3088240 ◽

2021 ◽

Vol 2 (3) ◽

pp. 1072-1073

Author(s):

Fredrik Hellstrom ◽

Giuseppe Durisi

Keyword(s):

Generalization Bounds ◽

Information Density ◽

Conditional Information

Stability and Generalization for Randomized Coordinate Descent

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/427 ◽

2021 ◽

Author(s):

Puyu Wang ◽

Liang Wu ◽

Yunwen Lei

Keyword(s):

Machine Learning ◽

Theoretical Analysis ◽

Optimization Algorithm ◽

Gradient Descent ◽

Coordinate Descent ◽

Learning Problems ◽

Stochastic Gradient Descent ◽

Strongly Convex ◽

Generalization Bounds ◽

Algorithmic Stability

Randomized coordinate descent (RCD) is a popular optimization algorithm with wide applications in various machine learning problems, which has motivated a lot of theoretical analysis of its convergence behavior. By contrast, there is no work studying how the models trained by RCD generalize to test examples. In this paper, we initiate the generalization analysis of RCD by leveraging the powerful tool of algorithmic stability. We establish argument stability bounds of RCD for both convex and strongly convex objectives, from which we develop optimal generalization bounds by showing how to early-stop the algorithm to trade off the estimation and optimization errors. Our analysis shows that RCD enjoys better stability than stochastic gradient descent.
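For readers unfamiliar with the algorithm, here is a minimal sketch of randomized coordinate descent on a least-squares objective. The objective, the exact coordinate-wise step size, and the name `rcd_least_squares` are assumptions chosen for illustration; the paper's setting and step-size rules may differ. The `steps` argument is the iteration budget, which is the quantity an early-stopping rule would control.

```python
import random


def rcd_least_squares(A, b, steps, seed=0):
    """Randomized coordinate descent for f(w) = 0.5 * ||A w - b||^2.

    A is a list of rows, b a list of targets. Each step picks one
    coordinate uniformly at random and minimizes f exactly along it.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    w = [0.0] * d
    residual = [-bi for bi in b]  # equals A w - b at w = 0
    # Squared column norms act as coordinate-wise curvature constants.
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(steps):
        j = rng.randrange(d)
        if col_sq[j] == 0.0:
            continue  # this coordinate does not affect the objective
        grad_j = sum(A[i][j] * residual[i] for i in range(n))
        delta = -grad_j / col_sq[j]
        w[j] += delta
        # Keep the residual in sync so each step costs O(n).
        for i in range(n):
            residual[i] += A[i][j] * delta
    return w


if __name__ == "__main__":
    # Toy consistent system whose solution is w = [1, -2].
    A = [[1.0, 2.0], [3.0, 1.0], [0.0, 1.0]]
    b = [-3.0, 1.0, -2.0]
    print(rcd_least_squares(A, b, steps=200))
```

Each update touches a single coordinate, so one step changes the iterate less than a full gradient step would, which gives some intuition for why a stability analysis is natural for this algorithm.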

Fine-grained Generalization Analysis of Structured Output Prediction

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/391 ◽

2021 ◽

Author(s):

Waleed Mustafa ◽

Yunwen Lei ◽

Antoine Ledent ◽

Marius Kloft

Keyword(s):

Language Processing ◽

Large Scale ◽

Fine Grained ◽

Generalization Bounds ◽

Probability Bounds ◽

Structured Output Prediction ◽

Algorithmic Stability ◽

Structured Output ◽

Weakly Dependent Data ◽

Prediction Problems

In machine learning we often encounter structured output prediction problems (SOPPs), i.e. problems where the output space admits a rich internal structure. Application domains where SOPPs naturally occur include natural language processing, speech recognition, and computer vision. Typical SOPPs have an extremely large label set, which grows exponentially as a function of the size of the output. Existing generalization analysis implies generalization bounds with at least a square-root dependency on the cardinality d of the label set, which can be vacuous in practice. In this paper, we significantly improve the state of the art by developing novel high-probability bounds with a logarithmic dependency on d. Furthermore, we leverage the lens of algorithmic stability to develop generalization bounds in expectation without any dependency on d. Our results therefore build a solid theoretical foundation for learning in large-scale SOPPs. Furthermore, we extend our results to learning with weakly dependent data.

Information Complexity and Generalization Bounds

10.1109/isit45174.2021.9517960 ◽

2021 ◽

Author(s):

Pradeep Kr. Banerjee ◽

Guido Montufar

Keyword(s):

Information Complexity ◽

Generalization Bounds

Coarse-refinement dilemma: on generalization bounds for data clustering

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115399 ◽

2021 ◽

pp. 115399

Author(s):

Yule Vaz ◽

Rodrigo Fernandes de Mello ◽

Carlos Henrique Grossi Ferreira

Keyword(s):

Data Clustering ◽

Generalization Bounds

Theoretical Investigation of Generalization Bounds for Adversarial Learning of Deep Neural Networks

Journal of Statistical Theory and Practice ◽

10.1007/s42519-021-00171-6 ◽

2021 ◽

Vol 15 (2) ◽

Author(s):

Qingyi Gao ◽

Xiao Wang

Keyword(s):

Neural Networks ◽

Theoretical Investigation ◽

Deep Neural Networks ◽

Adversarial Learning ◽

Generalization Bounds

Generalization Bounds and Algorithms for Learning to Communicate over Additive Noise Channels

IEEE Transactions on Information Theory ◽

10.1109/tit.2021.3129080 ◽

2021 ◽

pp. 1-1

Author(s):

Nir Weinberger

Keyword(s):

Additive Noise ◽

Generalization Bounds

Compression-based Network Interpretability Schemes

10.1101/2020.10.27.358226 ◽

2020 ◽

Author(s):

Jonathan Warrell ◽

Hussein Mohsen ◽

Mark Gerstein

Keyword(s):

Trust Model ◽

Model Transformations ◽

Network Decomposition ◽

Informative Feature ◽

Generalization Bounds ◽

Knowledge Based ◽

Interpretable Model ◽

Post Hoc ◽

The Relationship

Abstract: Deep learning methods have achieved state-of-the-art performance in many domains of artificial intelligence, but are typically hard to interpret. Network interpretation is important for multiple reasons, including knowledge discovery, hypothesis generation, fairness and establishing trust. Model transformations provide a general approach to interpreting a trained network post-hoc: the network is approximated by a model, which is typically compressed, whose structure can be more easily interpreted in some way (we call such approaches interpretability schemes). However, the relationship between compression and interpretation has not been fully explored: How much should a network be compressed for optimal extraction of interpretable information? Should compression be combined with other criteria when selecting model transformations? We investigate these issues using two different compression-based schemes, which aim to extract orthogonal kinds of information, pertaining to feature and data instance-based groupings respectively. The first (rank projection trees) uses a structured sparsification method such that nested groups of features can be extracted having potential joint interactions. The second (cascaded network decomposition) splits a network into a cascade of simpler networks, allowing groups of training instances with similar characteristics to be extracted at each stage of the cascade. We use predictive tasks in cancer and psychiatric genomics to assess the ability of these approaches to extract informative feature and data-point groupings from trained networks. We show that the generalization error of a network provides an indicator of the quality of the information extracted; further we derive PAC-Bayes generalization bounds for both schemes, which we show can be used as proxy indicators, and can thus provide a criterion for selecting the optimal compression. Finally, we show that the PAC-Bayes framework can be naturally modified to incorporate additional criteria alongside compression, such as prior knowledge based on previous models, which can enhance interpretable model selection.
