scholarly journals Asymptotics of Reinforcement Learning with Neural Networks

2021 ◽  
Author(s):  
Justin Sirignano ◽  
Konstantinos Spiliopoulos

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Huiqin Li ◽  
Yanling Li ◽  
Chuan He ◽  
Jianwei Zhan ◽  
Hui Zhang

In this paper, a cognitive electronic jamming decision-making method based on improved Q -learning is proposed to improve the efficiency of radar jamming decision-making. First, the method adopts the simulated annealing (SA) algorithm’s Metropolis criterion to enhance the exploration strategy, balancing the contradictory relationship between exploration and utilization in the algorithm to avoid falling into local optima. At the same time, the idea of stochastic gradient descent with warm restarts (SGDR) is introduced to improve the learning rate of the algorithm, which reduces the oscillation and improves convergence speed at the later stage of the algorithm iteration. Then, a cognitive electronic jamming decision-making model is constructed, and the improved Q -learning algorithm’s specific steps are given. The simulation experiment takes a multifunctional radar as an example to analyze the influence of exploration strategy and learning rate on decision-making performance. The results reveal that compared with the traditional Q -learning algorithm, the improved Q -learning algorithm proposed in this paper can fully explore and efficiently utilize and converge the results to a better solution at a faster speed. The number of iterations can be reduced to more than 50%, which proves the feasibility and effectiveness of the method applied to cognitive electronic jamming decision-making.


Author(s):  
Bowen Weng ◽  
Huaqing Xiong ◽  
Yingbin Liang ◽  
Wei Zhang

Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of updates. Despite the Adaptive Moment Estimation (Adam) has been commonly used for practical Q-learning algorithms, there has not been any convergence guarantee provided for Q-learning with such type of updates. In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update (a commonly adopted alternative of Adam for theoretical analysis). To further improve the performance, we propose to incorporate the momentum restart scheme to Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem demonstrate that the two proposed Q-learning algorithms outperform the vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2761
Author(s):  
Vaios Ampelakiotis ◽  
Isidoros Perikos ◽  
Ioannis Hatzilygeroudis ◽  
George Tsihrintzis

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize first-order logic handwritten formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after three training algorithms (backpropagation, resilient propagation and stochastic gradient descent) had been tested, we created and trained an NN with the stochastic gradient descent algorithm, optimized by the Adam update rule, which was proved to be the best, using a trainset of 16,750 handwritten image samples of 28 × 28 each and a testset of 7947 samples. The final accuracy achieved is 90.13%. The general methodology followed consists of two stages: the image processing and the NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows for creating new, user-oriented training sets and parameter settings, and thus new NN models.


2021 ◽  
Author(s):  
Tianyi Liu ◽  
Zhehui Chen ◽  
Enlu Zhou ◽  
Tuo Zhao

Momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks, variational Bayesian inference, etc.). Despite its empirical success, there is still a lack of theoretical understanding of convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that the momentum helps escape from saddle points but hurts the convergence within the neighborhood of optima (if without the step size annealing or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.


2018 ◽  
Vol 11 (1) ◽  
pp. 7 ◽  
Author(s):  
Matteo Grimaldi ◽  
Valerio Tenace ◽  
Andrea Calimera

Convolutional Neural Networks (CNNs) are brain-inspired computational models designed to recognize patterns. Recent advances demonstrate that CNNs are able to achieve, and often exceed, human capabilities in many application domains. Made of several millions of parameters, even the simplest CNN shows large model size. This characteristic is a serious concern for the deployment on resource-constrained embedded-systems, where compression stages are needed to meet the stringent hardware constraints. In this paper, we introduce a novel accuracy-driven compressive training algorithm. It consists of a two-stage flow: first, layers are sorted by means of heuristic rules according to their significance; second, a modified stochastic gradient descent optimization is applied on less significant layers such that their representation is collapsed into a constrained subspace. Experimental results demonstrate that our approach achieves remarkable compression rates with low accuracy loss (<1%).


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 213 ◽  
Author(s):  
Yiğit Uğur ◽  
George Arvanitakis ◽  
Abdellatif Zaidi

In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, in our approach, we use the variational information bottleneck method and model the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational inference type algorithm that allows computing it. In the algorithm, the coders’ mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets are provided to support the efficiency of our method.


1994 ◽  
Vol 05 (01) ◽  
pp. 67-75 ◽  
Author(s):  
BYOUNG-TAK ZHANG

Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.


Sign in / Sign up

Export Citation Format

Share Document