Statistical Mechanics of On-Line Learning Under Concept Drift

We introduce a modeling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: In the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ). In the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e., the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics which have been used frequently for the exact description of training dynamics in stationary environments. Extensions of the approach allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient based training of layered neural networks for regression.

Download Full-text

Statistical Mechanics of On-Line Learning Under Concept Drift

10.20944/preprints201809.0104.v1 ◽

2018 ◽

Author(s):

Michiel Straat ◽

Fthi Abadi ◽

Christina Göpfert ◽

Barbara Hammer ◽

Michael Biehl

Keyword(s):

Neural Networks ◽

Statistical Physics ◽

Classification Scheme ◽

Concept Drift ◽

Specific Model ◽

Learning Curves ◽

Modelling Framework ◽

First Results ◽

Gradient Based ◽

On Line

We introduce a modelling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: In the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ). In the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e. the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics which have been used frequently for the exact description of training dynamics in stationary environments. Extensions of the approach allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient based training of layered neural networks for regression.

Download Full-text

Supervised learning in the presence of concept drift: a modelling framework

Neural Computing and Applications ◽

10.1007/s00521-021-06035-1 ◽

2021 ◽

Author(s):

M. Straat ◽

F. Abadi ◽

Z. Kan ◽

C. Göpfert ◽

B. Hammer ◽

...

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Statistical Physics ◽

Concept Drift ◽

Activation Function ◽

High Dimensional ◽

Weight Decay ◽

Modelling Framework ◽

Different Types ◽

Gradient Based

AbstractWe present a modelling framework for the investigation of supervised learning in non-stationary environments. Specifically, we model two example types of learning systems: prototype-based learning vector quantization (LVQ) for classification and shallow, layered neural networks for regression tasks. We investigate so-called student–teacher scenarios in which the systems are trained from a stream of high-dimensional, labeled data. Properties of the target task are considered to be non-stationary due to drift processes while the training is performed. Different types of concept drift are studied, which affect the density of example inputs only, the target rule itself, or both. By applying methods from statistical physics, we develop a modelling framework for the mathematical analysis of the training dynamics in non-stationary environments. Our results show that standard LVQ algorithms are already suitable for the training in non-stationary environments to a certain extent. However, the application of weight decay as an explicit mechanism of forgetting does not improve the performance under the considered drift processes. Furthermore, we investigate gradient-based training of layered neural networks with sigmoidal activation functions and compare with the use of rectified linear units. Our findings show that the sensitivity to concept drift and the effectiveness of weight decay differs significantly between the two types of activation function.

Download Full-text

The Statistical Physics of Learning Revisited: Typical Learning Curves in Model Scenarios

Lecture Notes in Computer Science - Brain-Inspired Computing ◽

10.1007/978-3-030-82427-3_10 ◽

2021 ◽

pp. 128-142

Author(s):

Michael Biehl

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Statistical Learning ◽

Statistical Physics ◽

Interdisciplinary Approach ◽

Learning Curves ◽

Basic Concepts ◽

Model Scenarios

AbstractThe exchange of ideas between computer science and statistical physics has advanced the understanding of machine learning and inference significantly. This interdisciplinary approach is currently regaining momentum due to the revived interest in neural networks and deep learning. Methods borrowed from statistical mechanics complement other approaches to the theory of computational and statistical learning. In this brief review, we outline and illustrate some of the basic concepts. We exemplify the role of the statistical physics approach in terms of a particularly important contribution: the computation of typical learning curves in student teacher scenarios of supervised learning. Two, by now classical examples from the literature illustrate the approach: the learning of a linearly separable rule by a perceptron with continuous and with discrete weights, respectively. We address these prototypical problems in terms of the simplifying limit of stochastic training at high formal temperature and obtain the corresponding learning curves.

Download Full-text

Probabilistic Models with Deep Neural Networks

Entropy ◽

10.3390/e23010117 ◽

2021 ◽

Vol 23 (1) ◽

pp. 117

Author(s):

Andrés R. Masegosa ◽

Rafael Cabañas ◽

Helge Langseth ◽

Thomas D. Nielsen ◽

Antonio Salmerón

Keyword(s):

Neural Networks ◽

Learning Community ◽

Statistical Physics ◽

Deep Neural Networks ◽

Probabilistic Models ◽

Broad Class ◽

Probabilistic Inference ◽

Probabilistic Modeling ◽

Stochastic Gradient Descent ◽

Modeling Framework

Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to very restricted model classes, where exact or approximate probabilistic inference is feasible. However, developments in variational inference, a general form of approximate probabilistic inference that originated in statistical physics, have enabled probabilistic modeling to overcome these limitations: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computing engines allow probabilistic modeling to be applied to massive data sets. One important practical consequence of these advances is the possibility to include deep neural networks within probabilistic models, thereby capturing complex non-linear stochastic relationships between the random variables. These advances, in conjunction with the release of novel probabilistic modeling toolboxes, have greatly expanded the scope of applications of probabilistic models, and allowed the models to take advantage of the recent strides made by the deep learning community. In this paper, we provide an overview of the main concepts, methods, and tools needed to use deep neural networks within a probabilistic modeling framework.

Download Full-text

Task-space regulation of rigid-link electrically-driven robots with uncertain kinematics using neural networks

Measurement and Control ◽

10.1177/0020294020983383 ◽

2021 ◽

Vol 54 (1-2) ◽

pp. 102-115

Author(s):

Wenhui Si ◽

Lingyan Zhao ◽

Jianping Wei ◽

Zhiguang Guan

Keyword(s):

Neural Networks ◽

Asymptotic Stability ◽

Control Method ◽

Joint Space ◽

Robot Kinematics ◽

Rbf Neural Networks ◽

Task Space ◽

Actuator Dynamics ◽

Rigid Link ◽

On Line

Extensive research efforts have been made to address the motion control of rigid-link electrically-driven (RLED) robots in literature. However, most existing results were designed in joint space and need to be converted to task space as more and more control tasks are defined in their operational space. In this work, the direct task-space regulation of RLED robots with uncertain kinematics is studied by using neural networks (NN) technique. Radial basis function (RBF) neural networks are used to estimate complicated and calibration heavy robot kinematics and dynamics. The NN weights are updated on-line through two adaptation laws without the necessity of off-line training. Compared with most existing NN-based robot control results, the novelty of the proposed method lies in that asymptotic stability of the overall system can be achieved instead of just uniformly ultimately bounded (UUB) stability. Moreover, the proposed control method can tolerate not only the actuator dynamics uncertainty but also the uncertainty in robot kinematics by adopting an adaptive Jacobian matrix. The asymptotic stability of the overall system is proven rigorously through Lyapunov analysis. Numerical studies have been carried out to verify efficiency of the proposed method.

Download Full-text

Improving Adversarial Attacks on Deep Neural Networks via Constricted Gradient-based Perturbations

Information Sciences ◽

10.1016/j.ins.2021.04.033 ◽

2021 ◽

Author(s):

Yatie Xiao ◽

Chi-Man Pun

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Gradient Based

Download Full-text

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

Nature Communications ◽

10.1038/s41467-021-23103-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Abdulkadir Canatar ◽

Blake Bordelon ◽

Cengiz Pehlevan

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Kernel Regression ◽

Learning Task ◽

Learning Curves ◽

Generalization Error ◽

Theoretical Understanding ◽

Classical Statistics ◽

Deep Networks ◽

Model Alignment

AbstractA theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.

Download Full-text