Statistical physics of neural networks

Abstract: We present a modelling framework for the investigation of supervised learning in non-stationary environments. Specifically, we model two example types of learning systems: prototype-based learning vector quantization (LVQ) for classification and shallow, layered neural networks for regression tasks. We investigate so-called student–teacher scenarios in which the systems are trained from a stream of high-dimensional, labeled data. Properties of the target task are considered to be non-stationary due to drift processes while the training is performed. Different types of concept drift are studied, which affect the density of example inputs only, the target rule itself, or both. By applying methods from statistical physics, we develop a modelling framework for the mathematical analysis of the training dynamics in non-stationary environments. Our results show that standard LVQ algorithms are already suitable for the training in non-stationary environments to a certain extent. However, the application of weight decay as an explicit mechanism of forgetting does not improve the performance under the considered drift processes. Furthermore, we investigate gradient-based training of layered neural networks with sigmoidal activation functions and compare with the use of rectified linear units. Our findings show that the sensitivity to concept drift and the effectiveness of weight decay differ significantly between the two types of activation function.
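To make the setting concrete, here is a minimal sketch of online LVQ1 training on a drifting data stream, with optional weight decay as the forgetting mechanism. It is an illustration under stated assumptions, not the model analysed in the paper: the two-cluster data, the rotating cluster centers, the multiplicative form of the decay, and all names (`lvq1_step`, `drifting_example`, `run_lvq_drift`) and parameter values are hypothetical.

```python
import math
import random


def lvq1_step(prototypes, labels, x, y, eta, gamma=0.0):
    """One online LVQ1 update, optionally followed by weight decay.

    prototypes: list of prototype vectors (lists of floats), updated in place
    labels:     class label carried by each prototype
    x, y:       one training input and its class label
    eta:        learning rate
    gamma:      weight decay rate (assumed multiplicative form; 0 disables it)
    """
    # The winner is the prototype closest to x in squared Euclidean distance.
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in prototypes]
    k = dists.index(min(dists))
    # Attract the winner if its label matches the example, repel it otherwise.
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] = [wi + sign * eta * (xi - wi) for wi, xi in zip(prototypes[k], x)]
    # Weight decay shrinks every prototype towards the origin ("forgetting").
    if gamma > 0.0:
        for j, w in enumerate(prototypes):
            prototypes[j] = [(1.0 - gamma) * wi for wi in w]
    return k


def drifting_example(t, dim, rng, speed=0.001, sep=2.0):
    """Draw one example from two Gaussian clusters centered at +/- center(t).

    Assumed drift: the center rotates slowly in the plane of the first two
    coordinates, so dim must be at least 2.
    """
    angle = speed * t
    center = [0.0] * dim
    center[0] = sep * math.cos(angle)
    center[1] = sep * math.sin(angle)
    y = rng.choice([0, 1])
    s = 1.0 if y == 1 else -1.0
    x = [s * c + rng.gauss(0.0, 1.0) for c in center]
    return x, y


def run_lvq_drift(gamma=0.0, steps=3000, dim=10, eta=0.05, seed=0):
    """Train on the drifting stream, then return accuracy at the final time."""
    rng = random.Random(seed)
    labels = [0, 1]
    prototypes = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in labels]
    for t in range(steps):
        x, y = drifting_example(t, dim, rng)
        lvq1_step(prototypes, labels, x, y, eta, gamma)
    # Evaluate on fresh examples drawn from the final cluster positions.
    n_test, correct = 1000, 0
    for _ in range(n_test):
        x, y = drifting_example(steps, dim, rng)
        dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in prototypes]
        correct += labels[dists.index(min(dists))] == y
    return correct / n_test


if __name__ == "__main__":
    for gamma in (0.0, 0.01):
        print("gamma =", gamma, "accuracy =", run_lvq_drift(gamma=gamma))
```

Comparing the two values of `gamma` gives a rough feel for whether explicit forgetting helps in this toy stream; the outcome depends on the assumed drift speed and learning rate and should not be read as a reproduction of the paper's results.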


The Statistical Physics of Learning Revisited: Typical Learning Curves in Model Scenarios

Lecture Notes in Computer Science - Brain-Inspired Computing ◽ 10.1007/978-3-030-82427-3_10 ◽ 2021 ◽ pp. 128-142

Author(s): Michael Biehl

Keyword(s): Machine Learning ◽ Neural Networks ◽ Deep Learning ◽ Statistical Learning ◽ Statistical Physics ◽ Interdisciplinary Approach ◽ Learning Curves ◽ Basic Concepts ◽ Model Scenarios

Abstract: The exchange of ideas between computer science and statistical physics has advanced the understanding of machine learning and inference significantly. This interdisciplinary approach is currently regaining momentum due to the revived interest in neural networks and deep learning. Methods borrowed from statistical mechanics complement other approaches to the theory of computational and statistical learning. In this brief review, we outline and illustrate some of the basic concepts. We exemplify the role of the statistical physics approach in terms of a particularly important contribution: the computation of typical learning curves in student-teacher scenarios of supervised learning. Two by now classical examples from the literature illustrate the approach: the learning of a linearly separable rule by a perceptron with continuous and with discrete weights, respectively. We address these prototypical problems in terms of the simplifying limit of stochastic training at high formal temperature and obtain the corresponding learning curves.
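As a rough illustration of what a learning curve in a student-teacher scenario looks like, the sketch below trains a perceptron student on labels produced by a random teacher and tracks the generalization error as a function of alpha = P/N, the number of examples per input dimension. It deliberately swaps in simple Hebbian learning in place of the high-temperature stochastic training treated in the review. For isotropic Gaussian inputs, the error follows from the normalized student-teacher overlap R as arccos(R)/pi. The function names and parameter values are hypothetical.

```python
import math
import random


def generalization_error(student, teacher):
    """Disagreement probability arccos(R)/pi for isotropic Gaussian inputs,
    where R is the normalized overlap between student and teacher vectors."""
    dot = sum(a * b for a, b in zip(student, teacher))
    norm = math.sqrt(sum(a * a for a in student)) * math.sqrt(sum(b * b for b in teacher))
    if norm == 0.0:
        return 0.5  # an untrained (zero) student can only guess
    r = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
    return math.acos(r) / math.pi


def hebbian_learning_curve(n, alphas, seed=0):
    """Return [(alpha, error)] for a student trained on alpha * n examples.

    Hebbian learning (an assumption of this sketch): every example (x, y)
    adds y * x / n to the student weight vector.
    """
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n)]
    student = [0.0] * n
    curve, seen = [], 0
    for alpha in sorted(alphas):
        target = int(alpha * n)
        while seen < target:
            x = [rng.gauss(0.0, 1.0) for _ in range(n)]
            # The teacher defines a linearly separable rule.
            y = 1.0 if sum(b * xi for b, xi in zip(teacher, x)) >= 0.0 else -1.0
            student = [wi + y * xi / n for wi, xi in zip(student, x)]
            seen += 1
        curve.append((alpha, generalization_error(student, teacher)))
    return curve


if __name__ == "__main__":
    for alpha, err in hebbian_learning_curve(n=50, alphas=[0.5, 1, 2, 5, 10, 20]):
        print("alpha =", alpha, "generalization error =", round(err, 3))
```

A single run at finite N fluctuates around the typical curve; averaging over several seeds or increasing `n` gives a smoother estimate.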


Generalisation error in learning with random features and the hidden manifold model*

Journal of Statistical Mechanics: Theory and Experiment ◽ 10.1088/1742-5468/ac3ae6 ◽ 2021 ◽ Vol 2021 (12) ◽ pp. 124013

Author(s): Federica Gerace ◽ Bruno Loureiro ◽ Florent Krzakala ◽ Marc Mézard ◽ Lenka Zdeborová

Keyword(s): Neural Networks ◽ Logistic Regression ◽ Linear Regression ◽ Closed Form ◽ Linear Model ◽ Statistical Physics ◽ Loss Functions ◽ Closed Form Expression ◽ High Dimensional ◽ Form Expression

Abstract: We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and, using the replica method from statistical physics, provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, we illustrate the superiority of orthogonal over random Gaussian projections in learning with random features, and we discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.
