On “Natural” Learning and Pruning in Multilayered Perceptrons

2000 ◽  
Vol 12 (4) ◽  
pp. 881-901 ◽  
Author(s):  
Tom Heskes

Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster “natural” learning algorithms and more robust pruning procedures.
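The abstract gives no formulas, but the central role of the Fisher matrix in natural gradient descent can be illustrated with a minimal NumPy sketch. The toy logistic-regression problem, the damped empirical Fisher built from per-example gradients, the learning rate, and the damping constant below are all illustrative assumptions of mine, not the paper's algorithm or its layered approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression data (illustrative only).
N, d = 200, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def per_example_grads(w):
    """Gradients of the per-example negative log-likelihood, shape (N, d)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))    # predicted probabilities
    return (p - y)[:, None] * X

w = np.zeros(d)
eta, damping = 0.5, 1e-3
for step in range(100):
    G = per_example_grads(w)
    g = G.mean(axis=0)                       # ordinary gradient
    F = G.T @ G / N + damping * np.eye(d)    # damped empirical Fisher matrix
    w -= eta * np.linalg.solve(F, g)         # natural gradient step

print("final weights:", np.round(w, 2))
print("true weights: ", np.round(w_true, 2))
```

The point of the sketch is only that a natural gradient step preconditions the ordinary gradient with an (approximate) inverse Fisher matrix; the layered approximation discussed in the article replaces this exact Fisher matrix with a cheaper structure tailored to multilayered perceptrons.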

1998 ◽  
Vol 81 (24) ◽  
pp. 5461-5464 ◽  
Author(s):  
Magnus Rattray ◽  
David Saad ◽  
Shun-ichi Amari

2002 ◽  
Vol 14 (7) ◽  
pp. 1723-1738 ◽  
Author(s):  
Nicol N. Schraudolph

We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for on-line learning, matrix momentum and stochastic meta-descent (SMD), implement this approach. Since both were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
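As a much-simplified illustration of the "never form the curvature matrix" idea, the NumPy sketch below computes a Gauss-Newton matrix-vector product for logistic regression by chaining ordinary matrix-vector products, so each product stays linear in the number of parameters. This special case is my own example; the paper itself develops general-purpose curvature matrix-vector products for arbitrary models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: a logistic-regression model with N examples and d parameters.
N, d = 1000, 50
X = rng.normal(size=(N, d))
w = rng.normal(size=d)

def gauss_newton_vec(v):
    """Gauss-Newton matrix-vector product G @ v without forming G.

    For logistic regression, G = X.T @ diag(s) @ X with s = p * (1 - p).
    Chaining the matrix-vector products costs O(N * d) per call,
    versus O(N * d**2) to build G explicitly.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))   # model probabilities
    s = p * (1.0 - p)                  # per-example curvature of the loss
    return X.T @ (s * (X @ v))

v = rng.normal(size=d)
Gv = gauss_newton_vec(v)

# Sanity check against the explicitly formed matrix (feasible only for small d).
p = 1.0 / (1.0 + np.exp(-X @ w))
G = X.T @ ((p * (1.0 - p))[:, None] * X)
assert np.allclose(G @ v, Gv)
```

Such products are all that iterative solvers like conjugate gradient need, which is what makes the second-order steps above affordable at large parameter counts.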


1999 ◽  
Vol 10 (2) ◽  
pp. 253-271 ◽  
Author(s):  
P. Campolucci ◽  
A. Uncini ◽  
F. Piazza ◽  
B.D. Rao

2004 ◽  
Vol 50 (9) ◽  
pp. 2050-2057 ◽  
Author(s):  
N. Cesa-Bianchi ◽  
A. Conconi ◽  
C. Gentile

2002 ◽  
Vol 64 (1) ◽  
pp. 48-75 ◽  
Author(s):  
Peter Auer ◽  
Nicolò Cesa-Bianchi ◽  
Claudio Gentile

1994 ◽  
Vol 6 (2) ◽  
pp. 307-318 ◽  
Author(s):  
Pierre Baldi ◽  
Yves Chauvin

A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations. Unlike other classical algorithms such as the Baum-Welch algorithm, the algorithms described are smooth and can be used on-line (after each example presentation) or in batch mode, with or without the usual Viterbi most-likely-path approximation. The algorithms have simple expressions that result from using a normalized-exponential representation for the HMM parameters. All the algorithms presented are proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent. These algorithms can also be cast in the more general EM (Expectation-Maximization) framework, where they can be viewed as exact or approximate GEM (Generalized Expectation-Maximization) algorithms. The mathematical properties of the algorithms are derived in the appendix.
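To make the "normalized-exponential representation" concrete, the NumPy sketch below keeps HMM transition and emission probabilities as softmax functions of unconstrained logits and takes an on-line gradient step on the log-likelihood of each sequence, with expected counts from a scaled forward-backward pass. The model sizes, learning rate, and random stand-in training sequences are illustrative assumptions, not the paper's algorithm or data.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Tiny discrete HMM: S hidden states, M symbols. Rows of A = softmax(Wa) and
# B = softmax(Wb) are always valid distributions (normalized-exponential form).
S, M = 2, 3
Wa = rng.normal(size=(S, S))   # transition logits
Wb = rng.normal(size=(S, M))   # emission logits
pi = np.full(S, 1.0 / S)       # fixed uniform initial distribution

def forward_backward(obs, A, B):
    """Scaled forward-backward pass; returns expected transition counts,
    expected emission counts, and the log-likelihood of the sequence."""
    T = len(obs)
    alpha = np.zeros((T, S)); beta = np.zeros((T, S)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                       # state posteriors
    xi = np.zeros((S, S))                      # expected transition counts
    for t in range(T - 1):
        xi += (alpha[t][:, None] * A
               * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / c[t + 1]
    em = np.zeros((S, M))                      # expected emission counts
    for t in range(T):
        em[:, obs[t]] += gamma[t]
    return xi, em, np.log(c).sum()

# On-line gradient ascent on the log-likelihood in the logit parameterization:
# d logL / d Wa[i, j] = xi[i, j] - A[i, j] * xi[i, :].sum(), likewise for Wb.
eta = 0.1
for _ in range(200):
    obs = rng.integers(0, M, size=20)          # stand-in for a training sequence
    A, B = softmax(Wa), softmax(Wb)
    xi, em, logL = forward_backward(obs, A, B)
    Wa += eta * (xi - A * xi.sum(axis=1, keepdims=True))
    Wb += eta * (em - B * em.sum(axis=1, keepdims=True))
```

Because the update acts on unconstrained logits, the probabilities stay normalized after every on-line step without any explicit reprojection, which is what makes this family of updates smooth compared with hard Baum-Welch reestimation.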

