Asymptotics of Reinforcement Learning with Neural Networks

Stochastic Systems ◽

10.1287/stsy.2021.0072 ◽

2021 ◽

Author(s):

Justin Sirignano ◽

Konstantinos Spiliopoulos

Keyword(s):

Differential Equation ◽

Neural Networks ◽

Stationary Solution ◽

Gradient Descent ◽

Learning Algorithm ◽

Single Layer ◽

Stochastic Gradient Descent ◽

Distributed Data ◽

Limiting Behavior ◽

Q Learning

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
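As a rough illustration of the setting in this abstract, the sketch below trains a single-hidden-layer network with a semi-gradient Q-learning update. The 1/sqrt(N) output scaling, the tanh activation, the standard normal initialization, and all function names are assumptions made for illustration; the abstract does not give the paper's exact parametrization.

```python
import math
import random

def init_network(n_hidden, n_inputs, rng):
    """Draw output weights c_i and hidden weights w_i i.i.d. from N(0, 1) (assumed)."""
    c = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    return c, w

def q_value(c, w, x):
    """Q(x) = (1/sqrt(N)) * sum_i c_i * tanh(w_i . x), x encoding a state-action pair."""
    total = sum(ci * math.tanh(sum(wij * xj for wij, xj in zip(wi, x)))
                for ci, wi in zip(c, w))
    return total / math.sqrt(len(c))

def q_learning_step(c, w, x, reward, gamma, next_inputs, lr):
    """One semi-gradient Q-learning step; the bootstrap target is held fixed."""
    n = len(c)
    target = reward + gamma * max(q_value(c, w, nx) for nx in next_inputs)
    td_error = target - q_value(c, w, x)
    scale = lr * td_error / math.sqrt(n)
    for i in range(n):
        h = math.tanh(sum(wij * xj for wij, xj in zip(w[i], x)))
        grad_w_common = c[i] * (1.0 - h * h)  # computed before c[i] is updated
        c[i] += scale * h
        for j in range(len(x)):
            w[i][j] += scale * grad_w_common * x[j]
    return td_error
```

Here `next_inputs` is a hypothetical list of encoded (next state, action) pairs over which the maximum is taken; with `gamma = 0` the update reduces to plain regression on the reward.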


Cognitive Electronic Jamming Decision-Making Method Based on Improved Q-Learning Algorithm

International Journal of Aerospace Engineering ◽

10.1155/2021/8647386 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Huiqin Li ◽

Yanling Li ◽

Chuan He ◽

Jianwei Zhan ◽

Hui Zhang

Keyword(s):

Decision Making ◽

Gradient Descent ◽

Learning Algorithm ◽

Learning Rate ◽

Stochastic Gradient Descent ◽

Local Optima ◽

Q Learning ◽

Exploration Strategy ◽

Decision Making Model ◽

Metropolis Criterion

In this paper, a cognitive electronic jamming decision-making method based on improved Q-learning is proposed to improve the efficiency of radar jamming decision-making. First, the method adopts the Metropolis criterion of the simulated annealing (SA) algorithm to enhance the exploration strategy, balancing the trade-off between exploration and exploitation to avoid falling into local optima. At the same time, the idea of stochastic gradient descent with warm restarts (SGDR) is introduced to adjust the learning rate of the algorithm, which reduces oscillation and improves convergence speed in the later stage of the iteration. Then, a cognitive electronic jamming decision-making model is constructed, and the specific steps of the improved Q-learning algorithm are given. The simulation experiment takes a multifunctional radar as an example to analyze the influence of exploration strategy and learning rate on decision-making performance. The results show that, compared with the traditional Q-learning algorithm, the improved Q-learning algorithm explores more fully, exploits more efficiently, and converges to a better solution faster. The number of iterations can be reduced by more than 50%, which supports the feasibility and effectiveness of the method for cognitive electronic jamming decision-making.
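The two ingredients named in this abstract can be sketched as follows. This is a minimal sketch, not the authors' formulation: the acceptance rule, the cosine schedule, and the names `metropolis_action` and `sgdr_learning_rate` are assumptions chosen to match the standard Metropolis criterion and the standard SGDR schedule.

```python
import math
import random

def metropolis_action(q_row, temperature, rng):
    """Pick an action with a Metropolis-style acceptance test.

    A uniformly random candidate is accepted with probability
    exp((Q[candidate] - Q[greedy]) / temperature); otherwise the
    greedy action is kept. High temperature favors exploration.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]  # never positive
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy

def sgdr_learning_rate(step, period, lr_min, lr_max):
    """Cosine-annealed learning rate that restarts every `period` steps."""
    t = step % period
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / period))
```

Lowering `temperature` over time makes the selection increasingly greedy, while each warm restart briefly raises the learning rate back to `lr_max`.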


Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/422 ◽

2020 ◽

Author(s):

Bowen Weng ◽

Huaqing Xiong ◽

Yingbin Liang ◽

Wei Zhang

Keyword(s):

Convergence Rate ◽

Gradient Descent ◽

Learning Algorithm ◽

Learning Algorithms ◽

Linear Quadratic Regulator ◽

Stochastic Gradient Descent ◽

Learning Method ◽

Linear Quadratic ◽

Q Learning ◽

Moment Estimation

Existing convergence analyses of Q-learning mostly focus on vanilla stochastic gradient descent (SGD) type updates. Although Adaptive Moment Estimation (Adam) is commonly used in practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate of Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis). To further improve performance, we propose to incorporate a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem demonstrate that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
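For orientation, the standard AMSGrad update that this abstract builds on can be written in a few lines. The sketch below is illustrative only: the function names, the default hyperparameters, and the tabular temporal-difference gradient are assumptions, and the momentum restart rule of Q-AMSGradR is not reproduced because the abstract does not specify it.

```python
import math

def td_gradient(q_sa, reward, gamma, max_q_next):
    """Gradient of 0.5 * (Q(s, a) - target)**2 w.r.t. Q(s, a), target held fixed."""
    return q_sa - (reward + gamma * max_q_next)

def amsgrad_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one AMSGrad update to a list of parameters.

    `state` is a tuple (m, v, v_hat) of lists with the same length as theta.
    Unlike Adam, v_hat is a running maximum of v, so the effective
    step size never increases.
    """
    m, v, v_hat = state
    new_theta, new_m, new_v, new_v_hat = [], [], [], []
    for th, g, mi, vi, vhi in zip(theta, grad, m, v, v_hat):
        mi = beta1 * mi + (1.0 - beta1) * g
        vi = beta2 * vi + (1.0 - beta2) * g * g
        vhi = max(vhi, vi)
        new_theta.append(th - lr * mi / (math.sqrt(vhi) + eps))
        new_m.append(mi)
        new_v.append(vi)
        new_v_hat.append(vhi)
    return new_theta, (new_m, new_v, new_v_hat)
```

In a tabular setting, `td_gradient` would supply the entry of `grad` for the visited state-action pair, with zeros elsewhere.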


Fractional Stochastic Gradient Descent Based Learning Algorithm For Multi-layer Perceptron Neural Networks

10.1109/icias49414.2021.9642687 ◽

2021 ◽

Author(s):

Alishba Sadiq ◽

Norashikin Yahya

Keyword(s):

Neural Networks ◽

Gradient Descent ◽

Learning Algorithm ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Multi Layer Perceptron


Identify PMSM's Parameters by Single-Layer Neural Networks with Gradient Descent

2010 International Conference on Electrical and Control Engineering ◽

10.1109/icece.2010.930 ◽

2010 ◽

Cited By ~ 2

Author(s):

Wang Shaowei ◽

Wan Shanming

Keyword(s):

Neural Networks ◽

Gradient Descent ◽

Single Layer


Optical Recognition of Handwritten Logic Formulas Using Neural Networks

Electronics ◽

10.3390/electronics10222761 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2761

Author(s):

Vaios Ampelakiotis ◽

Isidoros Perikos ◽

Ioannis Hatzilygeroudis ◽

George Tsihrintzis

Keyword(s):

Neural Networks ◽

Character Recognition ◽

Gradient Descent ◽

Feedforward Neural Networks ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Training Algorithms ◽

Gradient Descent Algorithm ◽

Two Stages ◽

And Training

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize handwritten first-order logic formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after testing three training algorithms (backpropagation, resilient propagation and stochastic gradient descent), we created and trained an NN with the stochastic gradient descent algorithm, optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 each and a test set of 7947 samples. The final accuracy achieved is 90.13%. The general methodology consists of two stages: image processing, and NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows for creating new, user-oriented training sets and parameter settings, and thus new NN models.


A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Stochastic Systems ◽

10.1287/stsy.2021.0083 ◽

2021 ◽

Author(s):

Tianyi Liu ◽

Zhehui Chen ◽

Enlu Zhou ◽

Tuo Zhao

Keyword(s):

Neural Networks ◽

Nonconvex Optimization ◽

Gradient Descent ◽

Deep Neural Networks ◽

Optimization Problems ◽

Saddle Points ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Nonconvex Optimization Problems ◽

Empirical Success

The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks and variational Bayesian inference). Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that momentum helps escape from saddle points but hurts convergence within the neighborhood of optima (in the absence of step size annealing or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
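The saddle-escape claim can be illustrated on a toy problem. The sketch below is a deterministic simplification, not the paper's stochastic diffusion analysis: it runs heavy-ball updates on the assumed test function f(x, y) = x**2 - y**2, which has a strict saddle at the origin, and the function name and default values are hypothetical.

```python
def escape_saddle(momentum, lr=0.01, steps=100):
    """Run heavy-ball gradient descent on f(x, y) = x**2 - y**2.

    Starts near the saddle at the origin, slightly off the stable x-axis,
    and returns the final (x, y). Setting momentum=0.0 gives plain
    gradient descent.
    """
    x, y = 1.0, 1e-3
    vx, vy = 0.0, 0.0
    for _ in range(steps):
        gx, gy = 2.0 * x, -2.0 * y  # gradient of f
        vx = momentum * vx - lr * gx
        vy = momentum * vy - lr * gy
        x, y = x + vx, y + vy
    return x, y
```

Comparing `abs(escape_saddle(0.9)[1])` with `abs(escape_saddle(0.0)[1])` shows that the iterate moves away from the saddle along the unstable y direction much faster when momentum is used.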


Layer-Wise Compressive Training for Convolutional Neural Networks

Future Internet ◽

10.3390/fi11010007 ◽

2018 ◽

Vol 11 (1) ◽

pp. 7 ◽

Cited By ~ 3

Author(s):

Matteo Grimaldi ◽

Valerio Tenace ◽

Andrea Calimera

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Gradient Descent ◽

Computational Models ◽

Stochastic Gradient Descent ◽

Training Algorithm ◽

Heuristic Rules ◽

Human Capabilities ◽

Model Size ◽

Large Model

Convolutional Neural Networks (CNNs) are brain-inspired computational models designed to recognize patterns. Recent advances demonstrate that CNNs are able to achieve, and often exceed, human capabilities in many application domains. With several million parameters, even the simplest CNN has a large model size. This characteristic is a serious concern for deployment on resource-constrained embedded systems, where compression stages are needed to meet stringent hardware constraints. In this paper, we introduce a novel accuracy-driven compressive training algorithm. It consists of a two-stage flow: first, layers are sorted by means of heuristic rules according to their significance; second, a modified stochastic gradient descent optimization is applied to less significant layers such that their representation is collapsed into a constrained subspace. Experimental results demonstrate that our approach achieves remarkable compression rates with low accuracy loss (<1%).


Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding

Entropy ◽

10.3390/e22020213 ◽

2020 ◽

Vol 22 (2) ◽

pp. 213 ◽

Cited By ~ 1

Author(s):

Yiğit Uğur ◽

George Arvanitakis ◽

Abdellatif Zaidi

Keyword(s):

Neural Networks ◽

Lower Bound ◽

Gradient Descent ◽

Gaussian Mixture ◽

Variational Inference ◽

Stochastic Gradient Descent ◽

Information Bottleneck ◽

Latent Space ◽

Type Algorithm ◽

The Cost

In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, in our approach, we use the variational information bottleneck method and model the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational inference type algorithm that allows computing it. In the algorithm, the coders’ mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets are provided to support the efficiency of our method.


Gradient Descent Learning Algorithm Based on Spike Selection Mechanism for Multilayer Spiking Neural Networks

10.1007/978-3-030-92238-2_4 ◽

2021 ◽

pp. 40-51

Author(s):

Xianghong Lin ◽

Tiandou Hu ◽

Xiangwen Wang ◽

Han Lu

Keyword(s):

Neural Networks ◽

Gradient Descent ◽

Learning Algorithm ◽

Spiking Neural Networks ◽

Selection Mechanism


Accelerated Learning by Active Example Selection

International Journal of Neural Systems ◽

10.1142/s0129065794000086 ◽

1994 ◽

Vol 05 (01) ◽

pp. 67-75 ◽

Cited By ~ 32

Author(s):

Byoung-Tak Zhang

Keyword(s):

Neural Networks ◽

Gradient Descent ◽

Learning Algorithm ◽

Accelerated Learning ◽

Training Set ◽

Alternative Approach ◽

Speed Up ◽

Multilayer Neural Networks ◽

Training Examples ◽

The Given

Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
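The incremental scheme described in this abstract can be sketched as a loop that grows the training set with the most critical remaining examples. The abstract does not define the criticality measure, so the sketch below assumes the current per-example error as a stand-in; `select_critical`, `grow_training_set`, and `errors_fn` are hypothetical names.

```python
def select_critical(errors, selected, k):
    """Return indices of the k not-yet-selected examples with the largest error."""
    candidates = [i for i in range(len(errors)) if i not in selected]
    candidates.sort(key=lambda i: errors[i], reverse=True)
    return candidates[:k]

def grow_training_set(errors_fn, initial, k, rounds):
    """Start from a small training set and add k critical examples per round.

    `errors_fn(selected)` is assumed to retrain on the selected examples
    and return the current error of every example in the full pool.
    """
    selected = set(initial)
    for _ in range(rounds):
        errors = errors_fn(selected)
        selected.update(select_critical(errors, selected, k))
    return selected
```

In practice `errors_fn` would wrap a training run of the network on the currently selected subset, so each round trades one extra evaluation pass for a smaller, more informative training set.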
