On “Natural” Learning and Pruning in Multilayered Perceptrons

2000 ◽  
Vol 12 (4) ◽  
pp. 881-901 ◽  
Author(s):  
Tom Heskes

Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster “natural” learning algorithms and more robust pruning procedures.
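The abstract gives no formulas, but the central role of the Fisher matrix in natural gradient descent can be illustrated with a minimal NumPy sketch. The toy logistic-regression problem, the damped empirical Fisher built from per-example gradients, the learning rate, and the damping constant below are all illustrative assumptions of mine, not the paper's algorithm or its layered approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression data (illustrative only).
N, d = 200, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def per_example_grads(w):
    """Gradients of the per-example negative log-likelihood, shape (N, d)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))    # predicted probabilities
    return (p - y)[:, None] * X

w = np.zeros(d)
eta, damping = 0.5, 1e-3
for step in range(100):
    G = per_example_grads(w)
    g = G.mean(axis=0)                       # ordinary gradient
    F = G.T @ G / N + damping * np.eye(d)    # damped empirical Fisher matrix
    w -= eta * np.linalg.solve(F, g)         # natural gradient step

print("final weights:", np.round(w, 2))
print("true weights: ", np.round(w_true, 2))
```

The point of the sketch is only that a natural gradient step preconditions the ordinary gradient with an (approximate) inverse Fisher matrix; the layered approximation discussed in the article replaces this exact Fisher matrix with a cheaper structure tailored to multilayered perceptrons.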

1998 ◽  
Vol 81 (24) ◽  
pp. 5461-5464 ◽  
Author(s):  
Magnus Rattray ◽  
David Saad ◽  
Shun-ichi Amari

2002 ◽  
Vol 14 (7) ◽  
pp. 1723-1738 ◽  
Author(s):  
Nicol N. Schraudolph

We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for on-line learning, matrix momentum and stochastic meta-descent (SMD), implement this approach. Since both were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
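As a much-simplified illustration of the "never form the curvature matrix" idea, the NumPy sketch below computes a Gauss-Newton matrix-vector product for logistic regression by chaining ordinary matrix-vector products, so each product stays linear in the number of parameters. This special case is my own example; the paper itself develops general-purpose curvature matrix-vector products for arbitrary models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: a logistic-regression model with N examples and d parameters.
N, d = 1000, 50
X = rng.normal(size=(N, d))
w = rng.normal(size=d)

def gauss_newton_vec(v):
    """Gauss-Newton matrix-vector product G @ v without forming G.

    For logistic regression, G = X.T @ diag(s) @ X with s = p * (1 - p).
    Chaining the matrix-vector products costs O(N * d) per call,
    versus O(N * d**2) to build G explicitly.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))   # model probabilities
    s = p * (1.0 - p)                  # per-example curvature of the loss
    return X.T @ (s * (X @ v))

v = rng.normal(size=d)
Gv = gauss_newton_vec(v)

# Sanity check against the explicitly formed matrix (feasible only for small d).
p = 1.0 / (1.0 + np.exp(-X @ w))
G = X.T @ ((p * (1.0 - p))[:, None] * X)
assert np.allclose(G @ v, Gv)
```

Such products are all that iterative solvers like conjugate gradient need, which is what makes the second-order steps above affordable at large parameter counts.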


1999 ◽  
Vol 10 (2) ◽  
pp. 253-271 ◽  
Author(s):  
P. Campolucci ◽  
A. Uncini ◽  
F. Piazza ◽  
B.D. Rao

2004 ◽  
Vol 50 (9) ◽  
pp. 2050-2057 ◽  
Author(s):  
N. Cesa-Bianchi ◽  
A. Conconi ◽  
C. Gentile

2002 ◽  
Vol 64 (1) ◽  
pp. 48-75 ◽  
Author(s):  
Peter Auer ◽  
Nicolò Cesa-Bianchi ◽  
Claudio Gentile

1994 ◽  
Vol 6 (2) ◽  
pp. 307-318 ◽  
Author(s):  
Pierre Baldi ◽  
Yves Chauvin

A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations. Unlike other classical algorithms such as the Baum-Welch algorithm, the algorithms described are smooth and can be used on-line (after each example presentation) or in batch mode, with or without the usual Viterbi most-likely-path approximation. The algorithms have simple expressions that result from using a normalized-exponential representation for the HMM parameters. All the algorithms presented are proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent. These algorithms can also be cast in the more general EM (Expectation-Maximization) framework, where they can be viewed as exact or approximate GEM (Generalized Expectation-Maximization) algorithms. The mathematical properties of the algorithms are derived in the appendix.
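To make the "normalized-exponential representation" concrete, the NumPy sketch below keeps HMM transition and emission probabilities as softmax functions of unconstrained logits and takes an on-line gradient step on the log-likelihood of each sequence, with expected counts from a scaled forward-backward pass. The model sizes, learning rate, and random stand-in training sequences are illustrative assumptions, not the paper's algorithm or data.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Tiny discrete HMM: S hidden states, M symbols. Rows of A = softmax(Wa) and
# B = softmax(Wb) are always valid distributions (normalized-exponential form).
S, M = 2, 3
Wa = rng.normal(size=(S, S))   # transition logits
Wb = rng.normal(size=(S, M))   # emission logits
pi = np.full(S, 1.0 / S)       # fixed uniform initial distribution

def forward_backward(obs, A, B):
    """Scaled forward-backward pass; returns expected transition counts,
    expected emission counts, and the log-likelihood of the sequence."""
    T = len(obs)
    alpha = np.zeros((T, S)); beta = np.zeros((T, S)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                       # state posteriors
    xi = np.zeros((S, S))                      # expected transition counts
    for t in range(T - 1):
        xi += (alpha[t][:, None] * A
               * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / c[t + 1]
    em = np.zeros((S, M))                      # expected emission counts
    for t in range(T):
        em[:, obs[t]] += gamma[t]
    return xi, em, np.log(c).sum()

# On-line gradient ascent on the log-likelihood in the logit parameterization:
# d logL / d Wa[i, j] = xi[i, j] - A[i, j] * xi[i, :].sum(), likewise for Wb.
eta = 0.1
for _ in range(200):
    obs = rng.integers(0, M, size=20)          # stand-in for a training sequence
    A, B = softmax(Wa), softmax(Wb)
    xi, em, logL = forward_backward(obs, A, B)
    Wa += eta * (xi - A * xi.sum(axis=1, keepdims=True))
    Wb += eta * (em - B * em.sum(axis=1, keepdims=True))
```

Because the update acts on unconstrained logits, the probabilities stay normalized after every on-line step without any explicit reprojection, which is what makes this family of updates smooth compared with hard Baum-Welch reestimation.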

