Riemannian structure of some new gradient descent learning algorithms

Author(s):  
R.E. Mahony ◽
R.C. Williamson
2000 ◽  
Vol 12 (4) ◽  
pp. 881-901 ◽  
Author(s):  
Tom Heskes

Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster “natural” learning algorithms and more robust pruning procedures.
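The update these algorithms share is the natural gradient step, which preconditions the ordinary gradient with the inverse Fisher matrix. Below is a minimal sketch of one such step using a damped empirical Fisher matrix (in the spirit of the Levenberg-Marquardt connection mentioned above); it is an illustrative reconstruction, not the layered approximation developed in the article, and the function name and default parameters are assumptions.

```python
import numpy as np

# Illustrative natural gradient step: w <- w - eta * F^{-1} g, with F the
# empirical Fisher matrix built from per-sample gradients of the log-likelihood.
# Not the article's layered approximation; names and defaults are assumptions.
def natural_gradient_step(w, per_sample_grads, eta=0.1, damping=1e-3):
    """w: parameter vector (n_params,); per_sample_grads: (n_samples, n_params)."""
    g = per_sample_grads.mean(axis=0)                  # ordinary gradient
    n = per_sample_grads.shape[0]
    F = per_sample_grads.T @ per_sample_grads / n      # empirical Fisher matrix
    F += damping * np.eye(w.size)                      # damping, Levenberg-Marquardt style
    return w - eta * np.linalg.solve(F, g)             # natural gradient update
```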


2018 ◽  
Author(s):  
Robert C. Wilson ◽  
Amitai Shenhav ◽  
Mark Straccia ◽  
Jonathan D. Cohen

Abstract Researchers and educators have long wrestled with the question of how best to teach their clients, be they human, animal, or machine. Here we focus on the role of a single variable, the difficulty of training, and examine its effect on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks, in which ambiguous stimuli must be sorted into one of two classes. For all of these gradient-descent-based learning algorithms we find that the optimal error rate for training is around 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and for biologically plausible neural networks thought to describe human and animal learning.
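Numerically, the quoted 15.87% is the standard normal tail probability Φ(−1); the small sketch below simply reproduces that number and its complement, and is not taken from the paper's code.

```python
from scipy.stats import norm

# The optimal training error rate quoted in the abstract equals the standard
# normal tail Phi(-1); its complement is the "85%" accuracy of the rule.
optimal_error = norm.cdf(-1.0)           # ~= 0.1587
optimal_accuracy = 1.0 - optimal_error   # ~= 0.8413
print(f"optimal error rate {optimal_error:.4f}, optimal accuracy {optimal_accuracy:.4f}")
```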


2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Xiao-Ming Zhang ◽  
Zezhu Wei ◽  
Raza Asad ◽  
Xu-Chen Yang ◽  
Xin Wang

Abstract Reinforcement learning has been widely used in many problems, including quantum control of qubits. However, such problems can also be solved by traditional, non-machine-learning methods such as stochastic gradient descent and Krotov algorithms, and it remains unclear which approach is most suitable when the control is subject to specific constraints. In this work, we perform a comparative study of the efficacy of three reinforcement learning algorithms (tabular Q-learning, deep Q-learning, and policy gradient) and two non-machine-learning methods (stochastic gradient descent and Krotov algorithms) on the problem of preparing a desired quantum state. We find that, overall, the deep Q-learning and policy gradient algorithms outperform the others when the problem is discretized, e.g. when only discrete control values are allowed, and when the problem scales up. The reinforcement learning algorithms can also adaptively reduce the complexity of the control sequences, shortening the operation time and improving the fidelity. Our comparison provides insights into the suitability of reinforcement learning in quantum control problems.
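For concreteness, here is a toy version of the underlying optimization problem, not the paper's setup or any of its five algorithms: finite-difference gradient ascent on the state-preparation fidelity of a single qubit with piecewise-constant control amplitudes. The Hamiltonian, target state, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm

# Toy state-preparation problem (illustrative only): maximize the fidelity of
# reaching |+> from |0> under H(a_k) = sigma_z + a_k * sigma_x applied in
# piecewise-constant segments, using finite-difference gradient ascent.
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
psi0 = np.array([1.0, 0.0], dtype=complex)                 # initial state |0>
target = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)  # target state |+>

def fidelity(a, dt=0.5):
    psi = psi0
    for ak in a:                                           # evolve segment by segment
        psi = expm(-1j * dt * (sz + ak * sx)) @ psi
    return abs(target.conj() @ psi) ** 2

a = np.zeros(5)                                            # five control amplitudes
for _ in range(200):
    grad = np.array([(fidelity(a + 1e-4 * e) - fidelity(a - 1e-4 * e)) / 2e-4
                     for e in np.eye(a.size)])             # finite-difference gradient
    a += 0.5 * grad                                        # gradient ascent step
print(f"final fidelity: {fidelity(a):.4f}")
```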


Author(s):  
Bowen Weng ◽  
Huaqing Xiong ◽  
Yingbin Liang ◽  
Wei Zhang

Existing convergence analyses of Q-learning mostly focus on the vanilla stochastic gradient descent (SGD) type of update. Although Adaptive Moment Estimation (Adam) is commonly used in practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate for Q-AMSGrad, the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis). To further improve the performance, we propose to incorporate a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem demonstrate that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
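For reference, the AMSGrad update used inside Q-AMSGrad replaces Adam's second-moment estimate with its running maximum, so effective step sizes never increase, which is what makes the convergence analysis tractable. The sketch below shows that update in isolation; it is not the paper's Q-learning procedure, and the function signature and defaults are assumptions.

```python
import numpy as np

# AMSGrad update in isolation (the optimizer behind Q-AMSGrad); not the paper's
# full Q-learning algorithm. Like Adam, but v_hat keeps the running maximum of
# the second-moment estimate, so effective step sizes are non-increasing.
def amsgrad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    v_hat = np.maximum(v_hat, v)                # running maximum: the AMSGrad modification
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, (m, v, v_hat)

# usage sketch: state = tuple(np.zeros_like(theta) for _ in range(3)),
# then theta, state = amsgrad_step(theta, grad, state) at each gradient step.
```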


2007 ◽  
Vol 8 (5) ◽  
pp. 561-596 ◽  
Author(s):  
Yiming Ying ◽  
Massimiliano Pontil

1991 ◽  
Vol 3 (4) ◽  
pp. 526-545 ◽  
Author(s):  
Pierre Baldi ◽  
Fernando Pineda

The concept of Contrastive Learning (CL) is developed as a family of possible learning algorithms for neural networks. CL is an extension of Deterministic Boltzmann Machines to more general dynamical systems. During learning, the network oscillates between two phases: one with a teacher signal and one without. The weights are updated using a learning rule that corresponds to gradient descent on a contrast function measuring the discrepancy between the free network and the network with a teacher signal. The CL approach provides a general, unified framework for developing new learning algorithms and shows that many different types of clamping and teacher signals are possible. Several examples are given, and an analysis of the landscape of the contrast function is proposed, with some relevant predictions for the CL curves. An approach that may be suitable for collective analog implementations is described. Simulation results and possible extensions are briefly discussed, together with a new conjecture regarding the function of certain oscillations in the brain. In the appendix, we also examine two extensions of contrastive learning to time-dependent trajectories.
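In the Deterministic Boltzmann Machine special case, the contrastive rule reduces to the difference of unit co-activations between the teacher-clamped phase and the free phase. The sketch below shows that special case only, with the settling of the two phases assumed to have been computed elsewhere; it is not the paper's general framework.

```python
import numpy as np

# Two-phase contrastive weight update (Deterministic Boltzmann Machine flavour):
# gradient descent on a contrast function via the difference of co-activations
# between the teacher-clamped phase and the free phase. Settling the network to
# its two fixed points is assumed to happen elsewhere.
def contrastive_update(W, s_clamped, s_free, lr=0.01):
    """s_clamped, s_free: settled unit activations from the two phases."""
    dW = np.outer(s_clamped, s_clamped) - np.outer(s_free, s_free)
    np.fill_diagonal(dW, 0.0)               # keep weights free of self-connections
    return W + lr * dW
```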

