Two adaptive stepsize rules for gradient descent and their application to the training of feedforward artificial neural networks

Author(s):  
M. Mohandes ◽  
C.W. Codrington ◽  
S.B. Gelfand
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ximing Li ◽  
Luna Rizik ◽  
Valeriia Kravchik ◽  
Maria Khoury ◽  
Netanel Korin ◽  
...  

Abstract: Complex biological systems in nature comprise cells that act collectively to solve sophisticated tasks. Synthetic biological systems, in contrast, are designed for specific tasks, following computational principles including logic gates and analog design. Yet such approaches cannot be easily adapted for multiple tasks in biological contexts. Alternatively, artificial neural networks, comprised of flexible interactions for computation, support adaptive designs and are adopted for diverse applications. Here, motivated by the structural similarity between artificial neural networks and cellular networks, we implement neural-like computing in bacteria consortia for recognizing patterns. Specifically, receiver bacteria collectively interact with sender bacteria for decision-making through quorum sensing. Input patterns formed by chemical inducers activate senders to produce signaling molecules at varying levels. These levels, which act as weights, are programmed by tuning the sender promoter strength. Furthermore, a gradient descent based algorithm that enables weight optimization was developed. Weights were experimentally examined for recognizing 3 × 3-bit patterns.
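As a purely software analogue of the weight optimization described above, the following minimal sketch trains a single sigmoid unit by batch gradient descent to recognize one 3 × 3-bit pattern. The target pattern, the negative examples, the loss, and all function names are illustrative assumptions; nothing here reproduces the paper's algorithm or its bacterial implementation.

```python
import math

# Hypothetical 3x3 target pattern, flattened row by row (not from the paper).
TARGET = [1, 0, 1,
          0, 1, 0,
          1, 0, 1]
# Hypothetical negative patterns: the complement, all zeros, all ones.
NEGATIVES = [[1 - b for b in TARGET], [0] * 9, [1] * 9]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, bits):
    """Output of one sigmoid unit for a flattened 9-bit input."""
    return sigmoid(sum(w * b for w, b in zip(weights, bits)) + bias)

def train(samples, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    weights, bias = [0.0] * 9, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * 9, 0.0
        for bits, label in samples:
            # For sigmoid + cross-entropy, dLoss/dz is (prediction - label).
            err = predict(weights, bias, bits) - label
            for i, b in enumerate(bits):
                grad_w[i] += err * b / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

samples = [(TARGET, 1)] + [(bits, 0) for bits in NEGATIVES]
weights, bias = train(samples)
print("target score:", round(predict(weights, bias, TARGET), 3))
```

In this toy setting the learned weights play the role the abstract assigns to signaling-molecule levels: larger weights on the bits that distinguish the target pattern.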


2021 ◽  
Author(s):  
Ruthvik Vaila

Spiking neural networks are biologically plausible counterparts of artificial neural networks. Artificial neural networks are usually trained with stochastic gradient descent (SGD), whereas spiking neural networks are trained with bioinspired spike-timing-dependent plasticity (STDP). Spiking networks could potentially help in reducing power usage owing to their binary activations. In this work, we use unsupervised STDP in the feature extraction layers of a neural network with instantaneous neurons to extract meaningful features. The extracted binary feature vectors are then classified using classification layers containing neurons with binary activations. Gradient descent (backpropagation) is used only on the output layer to perform training for classification. Surrogate gradients are proposed to perform backpropagation with binary gradients. The accuracies obtained for MNIST and the balanced EMNIST data set compare favorably with other approaches. The effect of the SGD approximations on the learning capabilities of our network is also explored. We also study catastrophic forgetting and its effect on spiking neural networks (SNNs). For the catastrophic forgetting experiments, in the classification sections of the network we use a modified synaptic intelligence, which we refer to as the cost-per-synapse metric, as a regularizer to immunize the network against catastrophic forgetting in a Single-Incremental-Task (SIT) scenario. In these experiments, we use the MNIST and EMNIST handwritten digit datasets, divided into five and ten incremental subtasks respectively. We also examine the behavior of the spiking neural network and empirically study the effect of various hyperparameters on its learning capabilities using the software tool SPYKEFLOW that we developed. We employ the MNIST, EMNIST and NMNIST data sets to produce our results.
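The idea of training a binary-activation output layer with surrogate gradients can be sketched as follows. The abstract does not state which surrogate is used, so the boxcar window below is an assumption, as are the squared-error loss, the toy OR task, and every name in the code; this is not the SPYKEFLOW implementation.

```python
def step(z):
    """Binary (Heaviside) activation used in the forward pass."""
    return 1.0 if z >= 0.0 else 0.0

def surrogate_grad(z, width=1.0):
    """Boxcar stand-in for the derivative of step (an assumed choice)."""
    return 1.0 / (2.0 * width) if abs(z) <= width else 0.0

def forward(w, b, x):
    """Pre-activation of a single output neuron."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_output_neuron(samples, lr=0.1, epochs=50):
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, target in samples:
            z = forward(w, b, x)
            # Chain rule for 0.5 * (step(z) - target)**2, with the true
            # derivative of step (zero almost everywhere) replaced by the surrogate.
            delta = (step(z) - target) * surrogate_grad(z)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# Toy binary feature vectors labelled with logical OR (illustrative only).
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w, b = train_output_neuron(samples)
predictions = [step(forward(w, b, x)) for x, _ in samples]
print(predictions)
```

The forward pass stays strictly binary; only the backward pass uses the smooth stand-in, which is what lets gradient descent make progress despite the step function's zero derivative.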


2021 ◽  
Vol 7 ◽  
pp. e429
Author(s):  
Yuri Antonacci ◽  
Ludovico Minati ◽  
Luca Faes ◽  
Riccardo Pernice ◽  
Giandomenico Nollo ◽  
...  

One of the most challenging problems in the study of complex dynamical systems is to find the statistical interdependencies among the system components. Granger causality (GC) represents one of the most employed approaches, based on modeling the system dynamics with a linear vector autoregressive (VAR) model and on evaluating the information flow between two processes in terms of prediction error variances. In its most advanced setting, GC analysis is performed through a state-space (SS) representation of the VAR model that allows computing both conditional and unconditional forms of GC by solving only one regression problem. While this problem is typically solved through Ordinary Least Squares (OLS) estimation, a viable alternative is to use Artificial Neural Networks (ANNs) implemented in a simple structure with one input and one output layer and trained in such a way that the weights matrix corresponds to the matrix of VAR parameters. In this work, we introduce an ANN combined with SS models for the computation of GC. The ANN is trained through the Stochastic Gradient Descent L1 (SGD-L1) algorithm, and a cumulative penalty inspired by penalized regression is applied to the network weights to encourage sparsity. Simulating networks of coupled Gaussian systems, we show how the combination of ANNs and SGD-L1 mitigates the strong reduction in accuracy of OLS identification in settings with a low ratio between the number of time series points and the number of VAR parameters. We also report how the performance in GC estimation is influenced by the number of iterations of gradient descent and by the learning rate used for training the ANN. We recommend using some specific combinations for these parameters to optimize the performance of GC estimation. Then, the performance of ANN and OLS is compared in terms of GC magnitude and statistical significance to highlight the potential of the new approach to reconstruct causal coupling strength and network topology even in challenging conditions of data paucity. The results highlight the importance of a proper selection of the regularization parameter, which determines the degree of sparsity in the estimated network. Furthermore, we apply the two approaches to real data scenarios, to study the physiological network of brain and peripheral interactions in humans under different conditions of rest and mental stress, and the effects of the newly emerged concept of remote synchronization on the information exchanged in a ring of electronic oscillators. The results highlight how ANNs provide a mesoscopic description of the information exchanged in networks of multiple interacting physiological systems, preserving the most active causal interactions between cardiovascular, respiratory and brain systems. Moreover, ANNs can reconstruct the flow of directed information in a ring of oscillators whose statistical properties can be related to those of physiological networks.
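To make the training scheme concrete, here is a minimal sketch of a single-layer linear network whose weight matrix estimates the coefficients of a VAR(1) model, trained by SGD with a clipping-style cumulative L1 penalty. The exact penalty update, the simulated two-variable system, and all names and hyperparameters are assumptions chosen for illustration; they may differ from the paper's SGD-L1 implementation, and the state-space GC computation is not shown.

```python
import random

def simulate_var1(A, n_steps, seed=0):
    """Simulate x_t = A x_{t-1} + e_t with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    dim = len(A)
    x = [0.0] * dim
    series = [x]
    for _ in range(n_steps):
        x = [sum(A[i][j] * x[j] for j in range(dim)) + rng.gauss(0.0, 1.0)
             for i in range(dim)]
        series.append(x)
    return series

def fit_sgd_l1(series, lr=0.002, lam=0.05, epochs=5):
    """Fit W in x_t ~ W x_{t-1} by SGD with a cumulative L1 penalty."""
    dim = len(series[0])
    W = [[0.0] * dim for _ in range(dim)]
    q = [[0.0] * dim for _ in range(dim)]  # penalty actually applied per weight
    u = 0.0                                # penalty each weight could have received
    for _ in range(epochs):
        for prev, cur in zip(series, series[1:]):
            u += lr * lam
            for i in range(dim):
                err = cur[i] - sum(W[i][j] * prev[j] for j in range(dim))
                for j in range(dim):
                    w = W[i][j] + lr * err * prev[j]  # SGD step on squared error
                    z = w
                    # Apply the outstanding penalty, clipping at zero so the
                    # weight never changes sign because of the penalty alone.
                    if w > 0.0:
                        w = max(0.0, w - (u + q[i][j]))
                    elif w < 0.0:
                        w = min(0.0, w + (u - q[i][j]))
                    q[i][j] += w - z
                    W[i][j] = w
    return W

# Hypothetical system: variable 0 drives variable 1, but not the reverse.
A_true = [[0.5, 0.0],
          [0.4, 0.5]]
W = fit_sgd_l1(simulate_var1(A_true, 2000))
for row in W:
    print([round(v, 3) for v in row])
```

The entry `W[0][1]` corresponds to the absent coupling from variable 1 to variable 0, so the penalty should keep it at or near zero, while `lam` controls how aggressively weak couplings are pruned.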


2017 ◽  
Vol 28 (5) ◽  
pp. 893-903 ◽  
Author(s):  
S. Sankar Ganesh ◽  
Pachaiyappan Arulmozhivarman ◽  
Rao Tatavarti

Abstract: Air is the most essential constituent for the sustenance of life on earth. The air we inhale has a tremendous impact on our health and well-being. Hence, it is always advisable to monitor the quality of air in our environment. To forecast the air quality index (AQI), artificial neural networks (ANNs) trained with conjugate gradient descent (CGD), such as multilayer perceptron (MLP), cascade forward neural network, Elman neural network, radial basis function (RBF) neural network, and nonlinear autoregressive model with exogenous input (NARX), along with regression models such as multiple linear regression (MLR) consisting of batch gradient descent (BGD), stochastic gradient descent (SGD), mini-BGD (MBGD) and CGD algorithms, and support vector regression (SVR), are implemented. In these models, the AQI is the dependent variable and the concentrations of NO2, CO, O3, PM2.5, SO2, and PM10 for the years 2010–2016 in Houston and Los Angeles are the independent variables. For the final forecast, several ensemble models of individual neural network predictors and individual regression predictors are presented. This proposed approach performs with the highest efficiency in terms of forecasting the air quality index.
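The three gradient descent variants named for multiple linear regression differ only in how many samples contribute to each update, which the following minimal sketch illustrates. The data are synthetic and noise-free, and the function name, learning rate, and coefficients are illustrative assumptions; no air quality data or results from the paper are reproduced.

```python
import random

def fit_linear(X, y, batch_size, lr=0.3, epochs=500, seed=0):
    """Least-squares linear regression by (mini-)batch gradient descent.

    batch_size == len(X) gives BGD, batch_size == 1 gives SGD,
    and anything in between gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            gw, gb = [0.0] * d, 0.0
            for k in batch:
                err = sum(wi * xi for wi, xi in zip(w, X[k])) + b - y[k]
                for j in range(d):
                    gw[j] += err * X[k][j] / len(batch)
                gb += err / len(batch)
            w = [wi - lr * g for wi, g in zip(w, gw)]
            b -= lr * gb
    return w, b

# Synthetic data: y = 2*x1 - 1*x2 + 0.5 on a 3x3 grid of inputs.
grid = [0.0, 0.5, 1.0]
X = [[x1, x2] for x1 in grid for x2 in grid]
y = [2.0 * x1 - 1.0 * x2 + 0.5 for x1, x2 in X]

results = {name: fit_linear(X, y, size)
           for name, size in [("BGD", len(X)), ("SGD", 1), ("MBGD", 3)]}
for name, (w, b) in results.items():
    print(name, [round(v, 3) for v in w], round(b, 3))
```

On noise-free data all three variants reach the same coefficients; on real measurements they would trade off update cost against gradient noise.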

