Supervised learning in the presence of concept drift: a modelling framework

Neural Computing and Applications ◽

10.1007/s00521-021-06035-1 ◽

2021 ◽

Author(s):

M. Straat ◽

F. Abadi ◽

Z. Kan ◽

C. Göpfert ◽

B. Hammer ◽

...

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Statistical Physics ◽

Concept Drift ◽

Activation Function ◽

High Dimensional ◽

Weight Decay ◽

Modelling Framework ◽

Different Types ◽

Gradient Based

We present a modelling framework for the investigation of supervised learning in non-stationary environments. Specifically, we model two example types of learning systems: prototype-based learning vector quantization (LVQ) for classification and shallow, layered neural networks for regression tasks. We investigate so-called student–teacher scenarios in which the systems are trained from a stream of high-dimensional, labeled data. Properties of the target task are assumed to be non-stationary, owing to drift processes that act while training is performed. We study different types of concept drift, which affect only the density of example inputs, the target rule itself, or both. By applying methods from statistical physics, we develop a modelling framework for the mathematical analysis of the training dynamics in non-stationary environments. Our results show that standard LVQ algorithms are already suitable for training in non-stationary environments to a certain extent. However, the application of weight decay as an explicit mechanism of forgetting does not improve the performance under the considered drift processes. Furthermore, we investigate gradient-based training of layered neural networks with sigmoidal activation functions and compare it with the use of rectified linear units. Our findings show that the sensitivity to concept drift and the effectiveness of weight decay differ significantly between the two types of activation function.
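The LVQ part of this setting can be illustrated with a small simulation. This is a hypothetical toy, not the authors' model or their statistical-physics analysis: two Gaussian clusters at +lam·B and -lam·B whose unit direction `B` performs a small random walk, learned on-line by LVQ1 with weight decay implemented as a multiplicative shrinkage `(1 - gamma)` of all prototypes after every step. The helper names `lvq1_wd_step` and `run`, the drift model and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def lvq1_wd_step(W, c, x, y, eta, gamma):
    """One on-line LVQ1 step followed by weight decay (hypothetical helper).

    W: (K, N) prototypes, c: (K,) prototype labels, x: (N,) input, y: its label.
    gamma = 0 recovers plain LVQ1.
    """
    # Winner: the prototype closest to x in squared Euclidean distance.
    j = int(np.argmin(np.sum((W - x) ** 2, axis=1)))
    # Attract the winner if its label matches y, repel it otherwise.
    sign = 1.0 if c[j] == y else -1.0
    W[j] += eta * sign * (x - W[j])
    # Weight decay as explicit forgetting: shrink every prototype towards the origin.
    W *= 1.0 - gamma
    return W


def run(gamma, n_steps=5000, N=20, lam=3.0, delta=0.02, eta=0.05, seed=0):
    """Train on a drifting two-cluster stream; return accuracy on the final concept."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal(N)
    B /= np.linalg.norm(B)
    W = 0.01 * rng.standard_normal((2, N))
    c = np.array([0, 1])
    for _ in range(n_steps):
        # Drift (assumption): the unit vector B takes a small random step and is renormalized.
        B += delta * rng.standard_normal(N) / np.sqrt(N)
        B /= np.linalg.norm(B)
        # Class 0 is centred at +lam*B, class 1 at -lam*B, with unit-variance noise.
        y = int(rng.integers(2))
        x = (lam if y == 0 else -lam) * B + rng.standard_normal(N)
        W = lvq1_wd_step(W, c, x, y, eta, gamma)
    # Nearest-prototype accuracy on fresh samples from the *current* clusters.
    ys = rng.integers(2, size=2000)
    X = np.where(ys[:, None] == 0, lam, -lam) * B + rng.standard_normal((2000, N))
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    return float(np.mean(c[np.argmin(d, axis=1)] == ys))


if __name__ == "__main__":
    print("plain LVQ1:", run(gamma=0.0))
    print("with weight decay:", run(gamma=1e-3))
```

Comparing `run(0.0)` with `run(gamma)` for several `gamma` gives a rough, finite-size feel for the question the paper answers analytically; the printed numbers depend on the assumed drift model and are not the paper's results.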

Statistical Mechanics of On-Line Learning Under Concept Drift

10.20944/preprints201809.0104.v1 ◽

2018 ◽

Author(s):

Michiel Straat ◽

Fthi Abadi ◽

Christina Göpfert ◽

Barbara Hammer ◽

Michael Biehl

Keyword(s):

Neural Networks ◽

Statistical Physics ◽

Classification Scheme ◽

Concept Drift ◽

Specific Model ◽

Learning Curves ◽

Modelling Framework ◽

First Results ◽

Gradient Based ◽

On Line

We introduce a modelling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: In the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ). In the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e. the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics that have frequently been used for the exact description of training dynamics in stationary environments. Extensions of the approach allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient-based training of layered neural networks for regression.
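For the regression case, a minimal finite-size simulation of a student–teacher scenario under drift can be written down directly. It is an assumption-laden sketch rather than the authors' method: a soft committee machine student with `K` hidden units learns on-line, by stochastic gradient descent on the squared error, from a teacher of the same architecture whose weight vectors perform a normalized random walk of strength `delta`. `tanh` stands in for a generic sigmoidal activation, and the generalization error is estimated by Monte Carlo sampling instead of being computed from order-parameter equations. All names and parameter values are hypothetical.

```python
import numpy as np


def g(x):
    # Generic sigmoidal activation (tanh is an assumption for this sketch).
    return np.tanh(x)


def g_prime(x):
    return 1.0 - np.tanh(x) ** 2


def gen_error(W, B, rng, P=2000):
    """Monte Carlo estimate of 0.5 * E[(student - teacher)^2] on fresh Gaussian inputs."""
    N = W.shape[1]
    X = rng.standard_normal((P, N))
    sigma = g(X @ W.T / np.sqrt(N)).sum(axis=1)   # student output
    tau = g(X @ B.T / np.sqrt(N)).sum(axis=1)     # teacher output
    return 0.5 * float(np.mean((sigma - tau) ** 2))


def train(N=50, K=2, eta=0.5, delta=0.01, alpha_max=200, seed=0):
    rng = np.random.default_rng(seed)
    # Teacher vectors normalized to |B_m|^2 = N; the student starts near zero.
    B = rng.standard_normal((K, N))
    B *= np.sqrt(N) / np.linalg.norm(B, axis=1, keepdims=True)
    W = 0.01 * rng.standard_normal((K, N))
    errs = [gen_error(W, B, rng)]
    for t in range(1, alpha_max * N + 1):
        xi = rng.standard_normal(N)
        x = W @ xi / np.sqrt(N)                    # student hidden fields
        err = g(B @ xi / np.sqrt(N)).sum() - g(x).sum()
        # On-line gradient step (learning rate eta) on 0.5 * err^2 for each student vector.
        W += (eta / np.sqrt(N)) * err * g_prime(x)[:, None] * xi[None, :]
        # Drift (assumption): random step of each teacher vector, then renormalize.
        B += delta * rng.standard_normal((K, N))
        B *= np.sqrt(N) / np.linalg.norm(B, axis=1, keepdims=True)
        if t % (10 * N) == 0:                      # record every 10 units of alpha = t / N
            errs.append(gen_error(W, B, rng))
    return errs
```

Plotting `train()` for a few values of `delta` shows how the learning curve responds to drift in this toy; whether a plateau persists depends on the chosen parameters.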

Statistical Mechanics of On-Line Learning Under Concept Drift

Entropy ◽

10.3390/e20100775 ◽

2018 ◽

Vol 20 (10) ◽

pp. 775 ◽

Cited By ~ 4

Author(s):

Michiel Straat ◽

Fthi Abadi ◽

Christina Göpfert ◽

Barbara Hammer ◽

Michael Biehl

Keyword(s):

Neural Networks ◽

Statistical Physics ◽

Classification Scheme ◽

Concept Drift ◽

Specific Model ◽

Learning Curves ◽

Modeling Framework ◽

First Results ◽

Gradient Based ◽

On Line

We introduce a modeling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: In the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ). In the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e., the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics that have frequently been used for the exact description of training dynamics in stationary environments. Extensions of the approach allow for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient-based training of layered neural networks for regression.

Efficient approximation of solutions of parametric linear transport equations by ReLU DNNs

Advances in Computational Mathematics ◽

10.1007/s10444-020-09834-7 ◽

2021 ◽

Vol 47 (1) ◽

Author(s):

Fabian Laakmann ◽

Philipp Petersen

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Initial Conditions ◽

Activation Function ◽

Transport Equations ◽

High Dimensional ◽

Linear Transport ◽

Approximation Rates ◽

Curse Of Dimension ◽

Efficient Approximation

We demonstrate that deep neural networks with the ReLU activation function can efficiently approximate the solutions of various types of parametric linear transport equations. For non-smooth initial conditions, the solutions of these PDEs are high-dimensional and non-smooth. Therefore, approximating these functions suffers from the curse of dimension. We demonstrate that, through their inherent compositionality, deep neural networks can resolve the characteristic flow underlying the transport equations and thereby allow approximation rates independent of the parameter dimension.
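The role of compositionality can be illustrated with the simplest case, a one-dimensional transport equation u_t + a u_x = 0 with constant velocity a, whose solution is u(t, x; a) = u0(x - a t). In the sketch below, a non-smooth hat-shaped initial condition u0 is represented exactly by three ReLU units, and the solution is obtained by composing that network with the characteristic map (t, x, a) -> x - a t. This is only a toy illustration of the idea, not the construction or the rates proved in the paper, and the function names are hypothetical. For a fixed time t the inner map is linear in (x, a), but as a function of all three inputs it contains the product a t, which a ReLU network can only approximate.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def hat_relu(z):
    # Exact ReLU representation of the hat function: 0 outside [0, 2], peak 1 at z = 1.
    return relu(z) - 2.0 * relu(z - 1.0) + relu(z - 2.0)


def hat_reference(z):
    # The same hat function written directly, for comparison.
    return np.clip(1.0 - np.abs(z - 1.0), 0.0, None)


def transport_solution(t, x, a):
    # u(t, x; a) = u0(x - a t): compose the characteristic map with the ReLU net for u0.
    return hat_relu(x - a * t)


if __name__ == "__main__":
    xs = np.linspace(-1.0, 4.0, 6)
    print(transport_solution(1.0, xs, 1.5))
```

Because the solution is constant along characteristics, moving from (t, x) to (t + h, x + a h) leaves the value unchanged, which is easy to check numerically with this sketch.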

SSTDP: Supervised Spike Timing Dependent Plasticity for Efficient Spiking Neural Network Training

Frontiers in Neuroscience ◽

10.3389/fnins.2021.756876 ◽

2021 ◽

Vol 15 ◽

Author(s):

Fangxin Liu ◽

Wenbo Zhao ◽

Yongbiao Chen ◽

Zongwu Wang ◽

Tao Yang ◽

...

Keyword(s):

Neural Networks ◽

Learning Algorithm ◽

Activation Function ◽

Spike Timing ◽

Extraction Property ◽

Dependent Plasticity ◽

Neuromorphic Hardware ◽

Gradient Based ◽

Spatio Temporal ◽

Spike Time Dependent Plasticity

Spiking Neural Networks (SNNs) are a pathway that could potentially empower low-power, event-driven neuromorphic hardware due to their spatio-temporal information processing capability and high biological plausibility. Although SNNs are currently more efficient than artificial neural networks (ANNs), they are not as accurate. Error backpropagation is the most common method for directly training neural networks and has driven the success of ANNs in various deep learning fields. However, since the signals transmitted in an SNN are non-differentiable, discrete binary spike events, the spike-based activation function makes it difficult to apply gradient-based optimization algorithms directly to SNNs, leading to a performance gap (i.e., in accuracy and latency) between SNNs and ANNs. This paper introduces a new learning algorithm, called SSTDP, which bridges the gap between backpropagation (BP)-based learning and spike-timing-dependent plasticity (STDP)-based learning to train SNNs efficiently. The scheme incorporates the global optimization process from BP and the efficient weight update derived from STDP. It not only avoids the non-differentiable derivative computation in the BP process but also exploits the local feature extraction property of STDP. Consequently, our method can lower the likelihood of vanishing spikes in BP training and reduce the number of time steps, which reduces network latency. In SSTDP, we employ temporal-based coding and use Integrate-and-Fire (IF) neurons as the neuron model to provide considerable computational benefits. Our experiments show the effectiveness of the proposed SSTDP learning algorithm on SNNs, achieving the best classification accuracy of 99.3% on the Caltech 101 dataset, 98.1% on the MNIST dataset, and 91.3% on the CIFAR-10 dataset compared with other SNNs trained with other learning methods. It also surpasses the best inference accuracy of directly trained SNNs with 25–32× lower inference latency. Moreover, we analyze event-based computations to demonstrate the efficacy of the SNN for inference in the spiking domain; SSTDP achieves 1.3–37.7× fewer addition operations per inference. The code is available at: https://github.com/MXHX7199/SNN-SSTDP.
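Two ingredients named above, a temporally coded integrate-and-fire neuron and an STDP-style weight update, can be sketched in a few lines. The sketch uses a generic non-leaky IF neuron that fires at the first input time at which its accumulated weight reaches a threshold, and the classic pair-based exponential STDP window. It does not reproduce SSTDP itself, which additionally incorporates the global optimization process of BP as described in the paper and the linked repository. The function names and parameter values (`theta`, `a_plus`, `a_minus`, `tau`) are illustrative assumptions.

```python
import numpy as np


def if_first_spike(t_in, w, theta, t_max):
    """First output spike time of a non-leaky IF neuron, or None if it never fires.

    t_in: input spike times (one spike per input, temporal coding), w: synaptic weights.
    """
    v = 0.0
    for i in np.argsort(t_in):
        if t_in[i] > t_max:
            break
        v += w[i]                 # each input spike adds its weight to the membrane potential
        if v >= theta:
            return float(t_in[i])
    return None


def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=5.0):
    """Classic pair-based exponential STDP window (not the SSTDP rule itself)."""
    dt = t_post - t_pre
    # Pre before post -> potentiation; post before pre -> depression.
    return np.where(dt >= 0, a_plus * np.exp(-dt / tau), -a_minus * np.exp(dt / tau))
```

With inputs at times 1, 2 and 3 carrying weights 0.5, 0.6 and 0.4, a threshold of 1.0 is first crossed by the second input spike, so the neuron fires at t = 2; under temporal coding, an earlier output spike encodes a stronger response.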

Generalisation error in learning with random features and the hidden manifold model

Journal of Statistical Mechanics Theory and Experiment ◽

10.1088/1742-5468/ac3ae6 ◽

2021 ◽

Vol 2021 (12) ◽

pp. 124013

Author(s):

Federica Gerace ◽

Bruno Loureiro ◽

Florent Krzakala ◽

Marc Mézard ◽

Lenka Zdeborová

Keyword(s):

Neural Networks ◽

Logistic Regression ◽

Linear Regression ◽

Closed Form ◽

Linear Model ◽

Statistical Physics ◽

Loss Functions ◽

Closed Form Expression ◽

High Dimensional ◽

Form Expression

We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and, using the replica method from statistical physics, provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, illustrate the superiority of orthogonal over random Gaussian projections in learning with random features, and discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.
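The setting analysed above can also be explored numerically at finite size. The sketch below generates data from a simple hidden-manifold-style model (inputs x = tanh(F^T z / sqrt(d)) for a latent Gaussian vector z, with a label that depends on z only), maps them through ReLU random features using either an i.i.d. Gaussian or a row-orthogonal projection, and fits ridge regression on the features. It is a hypothetical Monte Carlo illustration under these assumptions, not the replica computation from the paper; it uses square loss rather than the logistic loss for which the double-descent peak is discussed, and all function names and parameter values are illustrative. The orthogonal projection requires P <= N.

```python
import numpy as np


def projection(P, N, rng, orthogonal=False):
    """P x N projection: i.i.d. Gaussian, or with orthogonal rows of norm sqrt(N) (P <= N)."""
    G = rng.standard_normal((P, N))
    if not orthogonal:
        return G
    Q, _ = np.linalg.qr(G.T)      # (N, P) matrix with orthonormal columns
    return np.sqrt(N) * Q.T


def hidden_manifold(n, d, N, F, theta, rng):
    """Inputs on a d-dimensional manifold in R^N; labels depend on the latent z only."""
    Z = rng.standard_normal((n, d))
    X = np.tanh(Z @ F / np.sqrt(d))
    y = Z @ theta / np.sqrt(d)
    return X, y


def rf_ridge_test_error(P, orthogonal, lam=1e-3, n=200, d=20, N=400, seed=0):
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((d, N))
    theta = rng.standard_normal(d)
    X_tr, y_tr = hidden_manifold(n, d, N, F, theta, rng)
    X_te, y_te = hidden_manifold(2000, d, N, F, theta, rng)
    W = projection(P, N, rng, orthogonal)

    def features(X):
        return np.maximum(X @ W.T / np.sqrt(N), 0.0)   # ReLU random features

    A = features(X_tr)
    # Ridge regression (square loss) on the random features.
    coef = np.linalg.solve(A.T @ A + lam * np.eye(P), A.T @ y_tr)
    return float(np.mean((features(X_te) @ coef - y_te) ** 2))


if __name__ == "__main__":
    for P in (50, 100, 200, 300, 400):
        print(P, rf_ridge_test_error(P, False), rf_ridge_test_error(P, True))
```

Sweeping P across the number of training samples n (here 200) with a small ridge parameter is one way to look for an interpolation peak in this toy; the size and location of any peak depend on the assumed data model and regularization.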

Absum: Simple Regularization Method for Reducing Structural Sensitivity of Convolutional Neural Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5865 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4394-4403

Author(s):

Sekitoshi Kanai ◽

Yasutoshi Ida ◽

Yasuhiro Fujiwara ◽

Masanori Yamada ◽

Shuichi Adachi

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Regularization Method ◽

Regularization Methods ◽

Frequency Noise ◽

Weight Decay ◽

Structural Sensitivity ◽

Convolution Filter ◽

Gradient Based ◽

Standard Regularization

We propose Absum, a regularization method for improving the adversarial robustness of convolutional neural networks (CNNs). Although CNNs can accurately recognize images, recent studies have shown that the convolution operations in CNNs commonly have structural sensitivity to specific noise composed of Fourier basis functions. Exploiting this sensitivity, those studies proposed a simple black-box adversarial attack: the single Fourier attack. To reduce structural sensitivity, we can regularize the convolution filter weights, since the sensitivity of a linear transform can be assessed by the norm of its weights. However, standard regularization methods can prevent minimization of the loss function because they impose a tight constraint to obtain high robustness. To solve this problem, Absum imposes a loose constraint: it penalizes the absolute values of the summation of the parameters in the convolution layers. Absum can improve robustness against the single Fourier attack while being as simple and efficient as standard regularization methods (e.g., weight decay and L1 regularization). Our experiments demonstrate that Absum improves robustness against the single Fourier attack more than standard regularization methods do. Furthermore, we reveal that CNNs trained with Absum are more robust than those trained with standard regularization methods, both against transferred attacks, owing to decreased common sensitivity, and against high-frequency noise. We also reveal that Absum can improve robustness against gradient-based attacks (projected gradient descent) when used with adversarial training.
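The difference between Absum and weight decay can be made concrete. Assuming, as one reading of the description above, that the sum is taken over the weights of each individual two-dimensional kernel, the penalty is lam * sum_k |sum_{i,j} w_{k,i,j}|. A kernel whose weights sum to zero is not penalized at all, which is what makes the constraint loose compared with weight decay. The kernel sum matters because the response of a filter to a constant (zero-frequency) input patch of value c is c times the sum of its weights. The sketch computes both penalties and a plain subgradient of the Absum term; the per-kernel grouping, the function names and the use of a subgradient (the authors' optimization scheme may differ) are assumptions.

```python
import numpy as np


def absum_penalty(kernels, lam):
    """lam * sum over 2-D kernels of |sum of that kernel's weights|.

    kernels: array of shape (out_channels, in_channels, kh, kw).
    """
    return lam * float(np.abs(kernels.sum(axis=(-2, -1))).sum())


def absum_subgradient(kernels, lam):
    # d|s|/dw = sign(s) for every weight w of a kernel with sum s (0 chosen at s = 0).
    s = kernels.sum(axis=(-2, -1), keepdims=True)
    return lam * np.sign(s) * np.ones_like(kernels)


def weight_decay_penalty(kernels, lam):
    # Standard L2 penalty for comparison: 0.5 * lam * ||w||^2.
    return 0.5 * lam * float(np.sum(kernels ** 2))


if __name__ == "__main__":
    # A zero-sum kernel: ignored by Absum, but penalized by weight decay.
    k = np.array([[[[1.0, -1.0], [-1.0, 1.0]]]])
    print(absum_penalty(k, 0.1), weight_decay_penalty(k, 0.1))
```

In a training loop, `absum_subgradient` would simply be added to the loss gradient of the convolution weights, exactly where a weight-decay gradient would otherwise go.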

SCORING MODELING BASED ON NEURAL NETWORKS FOR DETERMINING A BANK BORROWER'S RATING

Economy of Ukraine ◽

10.15407/economyukr.2020.10.054 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 54-62

Author(s):

Oleksii VASYLIEV ◽

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Architecture ◽

Statistical Data ◽

Activation Function ◽

Decision Making Process ◽

Neural Network Architecture ◽

Acceptable Accuracy ◽

The Neural Network ◽

Sigmoid Activation Function

The problem of applying neural networks to calculate the ratings that banks use when deciding whether to grant loans to borrowers is considered. The task is to determine the borrower's rating function from a set of statistical data on the performance of loans provided by the bank. When a regression model is constructed to calculate the rating function, its general form must be known; the task is then to calculate the parameters that appear in the expression for the rating function. In contrast, when neural networks are used, there is no need to specify the general form of the rating function. Instead, a certain neural network architecture is chosen and its parameters are calculated from the statistical data. Importantly, the same neural network architecture can be used to process different sets of statistical data. The disadvantages of using neural networks include the need to calculate a large number of parameters. There is also no universal algorithm that determines the optimal neural network architecture. As an example of using neural networks to determine a borrower's rating, a model system is considered in which the borrower's rating is determined by a known non-analytical rating function. A neural network with two hidden layers, containing three and two neurons respectively, each with a sigmoid activation function, is used for modeling. It is shown that the neural network can restore the borrower's rating function with quite acceptable accuracy.
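The model system described above can be sketched as follows. Only the two sigmoid hidden layers with three and two neurons follow the description; everything else is a hypothetical assumption for illustration: two input features scaled to [0, 1], a linear output unit, a made-up non-analytical rating function min(x1, 1 - x2) standing in for the unspecified one, and full-batch gradient descent with hand-written backpropagation on the mean squared error.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rating(X):
    # Hypothetical non-analytical rating function of two features scaled to [0, 1].
    return np.minimum(X[:, 0], 1.0 - X[:, 1])


def init(rng, sizes=(2, 3, 2, 1)):
    # Inputs -> 3 sigmoid units -> 2 sigmoid units -> 1 linear output (output type assumed).
    return [(rng.normal(0.0, 1.0, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else sigmoid(z))
    return acts


def train(X, y, steps=5000, lr=0.1, seed=0):
    params = init(np.random.default_rng(seed))
    losses = []
    for _ in range(steps):
        acts = forward(params, X)
        err = acts[-1][:, 0] - y
        losses.append(float(np.mean(err ** 2)))
        delta = (2.0 / len(y)) * err[:, None]           # dLoss / d(output pre-activation)
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:
                # Backpropagate through layer i and the sigmoid that produced acts[i].
                delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params, losses
```

Usage: draw `X = np.random.default_rng(1).uniform(size=(500, 2))`, call `train(X, rating(X))` and inspect the loss curve; the same architecture can be retrained unchanged on a different data set, which is the property the abstract emphasizes.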

Squeak and rattle noise classification using radial basis function neural networks

Noise Control Engineering Journal ◽

10.3397/1/376824 ◽

2020 ◽

Vol 68 (4) ◽

pp. 283-293

Author(s):

Oleksandr Pogorilyi ◽

Mohammad Fard ◽

John Davy ◽

School of Mechanical and Automotive Engineering ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

High Accuracy ◽

Training Method ◽

Vehicle Interior ◽

Trained Classifier ◽

Different Types ◽

Noise Classification ◽

Automatic Tool ◽

Multi Class Classification

In this article, an artificial neural network is proposed to classify short audio sequences of squeak and rattle (S&R) noises. The aim of the classification is to determine how accurately the trained classifier can recognize different types of S&R sounds. A high-accuracy model that recognizes audible S&R noises could help to build an automatic tool able to identify unpleasant vehicle interior sounds within seconds from a short audio recording. In this article, the training method of the classifier is proposed, and the results show that the trained model can identify various classes of S&R noises, in both simple (binary classification) and complex (multi-class classification) settings.
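The title names radial basis function (RBF) networks, but the abstract does not give the architecture or the audio features. As a hedged illustration of a generic RBF classifier only, the sketch below uses Gaussian basis functions centred on randomly chosen training points with a shared width, plus a bias, and fits the output weights by linear least squares to one-hot targets; the same code handles binary and multi-class problems. The feature vectors in any usage are synthetic stand-ins for features extracted from S&R recordings, and the function names and defaults are hypothetical.

```python
import numpy as np


def rbf_features(X, centres, width):
    # Gaussian activation of every sample (rows) for every centre (columns).
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))


def design(X, centres, width):
    # RBF layer plus a constant bias column.
    return np.hstack([rbf_features(X, centres, width), np.ones((len(X), 1))])


def fit_rbf(X, y, n_classes, n_centres=20, width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=n_centres, replace=False)]
    T = np.eye(n_classes)[y]                          # one-hot targets
    W_out, *_ = np.linalg.lstsq(design(X, centres, width), T, rcond=None)
    return centres, width, W_out


def predict_rbf(model, X):
    centres, width, W_out = model
    return np.argmax(design(X, centres, width) @ W_out, axis=1)
```

Choosing centres by k-means instead of random sampling, or tuning `width` on held-out data, are common alternatives; which choices the authors made is not stated in the abstract.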

The New Activation Function for Complex Valued Neural Networks: Complex Swish Function

4th International Symposium on Innovative Approaches in Engineering and Natural Sciences Proceedings ◽

10.36287/setsci.4.6.050 ◽

2019 ◽

Author(s):

Mehmet Çelebi ◽

Murat Ceylan

Keyword(s):

Neural Networks ◽

Activation Function ◽

Complex Valued

Analysis of Non-Linear Activation Functions for Classification Tasks Using Convolutional Neural Networks

Recent Patents on Computer Science ◽

10.2174/2213275911666181025143029 ◽

2019 ◽

Vol 12 (3) ◽

pp. 156-161 ◽

Cited By ~ 3

Author(s):

Aman Dureja ◽

Payal Pahwa

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Activation Function ◽

Primary Objective ◽

Experimental Comparison ◽

Activation Functions ◽

Practical Applications ◽

Network Activation ◽

Non Linear ◽

Hidden Layer

Background: Activation functions play an important role in building deep neural networks, and the choice of activation function affects both the optimization of the network and the quality of its results. Several activation functions have been introduced in machine learning for many practical applications, but which activation function should be used in the hidden layers of deep neural networks has not been established. Objective: The primary objective of this analysis was to determine which activation function should be used in the hidden layers of deep neural networks to solve complex non-linear problems. Methods: The comparative model was configured using a two-class (Cat/Dog) dataset. The network used three convolutional layers, each followed by a pooling layer. The dataset was divided into two parts: the first 8000 images were used for training the network and the next 2000 images for testing it. Results: The experimental comparison was performed by training the network with different activation functions in each layer of the CNN. The validation error and accuracy on the Cat/Dog dataset were analyzed for the activation functions ReLU, Tanh, SELU, PReLU and ELU across the hidden layers. Overall, ReLU gave the best performance, with a validation loss of 0.3912 and a validation accuracy of 0.8320 at the 25th epoch. Conclusion: A CNN model with ReLU hidden layers (three hidden layers here) gives the best results and improves overall performance in terms of both accuracy and speed. These advantages of ReLU in the hidden layers of CNNs can help retrieve images from databases effectively and quickly.
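For reference, the five activation functions compared above can be written down in NumPy from their standard definitions. SELU uses its fixed published constants (alpha ≈ 1.6733, scale ≈ 1.0507); PReLU's negative slope `a` is a learnable parameter in practice and is simply passed in here. This is an illustrative sketch, not the code used in the study.

```python
import numpy as np

# Fixed SELU constants (alpha, scale) from the standard SELU definition.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    return np.maximum(x, 0.0)


def tanh(x):
    return np.tanh(x)


def elu(x, alpha=1.0):
    # expm1 is evaluated on min(x, 0) only, so large positive x cannot overflow.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def selu(x):
    return SELU_SCALE * elu(x, SELU_ALPHA)


def prelu(x, a):
    # a is the (normally learned) slope for negative inputs.
    return np.where(x > 0, x, a * x)
```

ReLU, ELU, SELU and PReLU coincide up to scaling for positive inputs and differ only in how they treat negative inputs, while Tanh saturates on both sides; this is the main axis along which such comparisons vary.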
