Practical Consideration on Generalization Property of Natural Gradient Learning

Author(s): Hyeyoung Park

2008 ◽ Vol 88 (3) ◽ pp. 761-766 ◽ Author(s): S. Squartini, A. Arcangeli, F. Piazza

2018 ◽ Vol 30 (1) ◽ pp. 1-33 ◽ Author(s): Shun-ichi Amari, Tomoko Ozeki, Ryo Karakida, Yuki Yoshida, Masato Okada

The dynamics of supervised learning play a central role in deep learning, taking place in the parameter space of a multilayer perceptron (MLP). We review the history of supervised stochastic gradient learning, focusing on its singular structure and the natural gradient. The parameter space includes singular regions in which parameters are not identifiable. One of our results is a full exploration of the dynamical behaviors of stochastic gradient learning in an elementary singular network. The bad news is its pathological nature: part of the singular region becomes an attractor while another part becomes a repulser, forming a Milnor attractor. A learning trajectory is drawn into the attractor region and stays there for a long time before escaping the singular region through the repulser region. This is typical of plateau phenomena in learning. We demonstrate the strange topology of a singular region by introducing blow-down coordinates, which are useful for analyzing the natural gradient dynamics. We confirm that the natural gradient dynamics are free of critical slowdown. The second main result is the good news: the interactions of elementary singular networks eliminate the attractor part, and the Milnor-type attractors disappear. This explains why large-scale networks do not suffer from serious critical slowdowns due to singularities. Finally, we show that the unit-wise natural gradient is effective for learning despite its low computational cost.
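To make the unit-wise natural gradient concrete, here is a minimal NumPy sketch of the idea under simplifying assumptions (it is not the paper's algorithm or experimental setup): a one-hidden-layer tanh network for regression, where each hidden unit's incoming weights are updated with a damped, per-unit empirical Fisher block in place of the full Fisher information matrix. The toy data, learning rate, and damping constant are illustrative choices.

```python
# Sketch: unit-wise (block-diagonal) natural gradient for a small tanh MLP.
# The per-unit blocks use the empirical Fisher (outer products of per-sample
# loss gradients), damped so each block is invertible.  Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: target is a sum of two tanh ridge functions.
n, d, h = 512, 3, 4                  # samples, input dim, hidden units
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d)) + 0.5 * np.tanh(X @ rng.normal(size=d))

W = 0.1 * rng.normal(size=(h, d))    # hidden-layer weights (one row per unit)
v = 0.1 * rng.normal(size=h)         # output weights

eta, eps = 0.02, 1e-2                # learning rate and Fisher damping

for step in range(201):
    Z = np.tanh(X @ W.T)             # hidden activations, shape (n, h)
    err = Z @ v - y                  # residuals, shape (n,)

    # Per-sample gradient factor for each hidden unit k:
    # dL_i/dW[k] = err_i * v[k] * (1 - z_ik**2) * x_i
    G = err[:, None] * v[None, :] * (1.0 - Z**2)      # shape (n, h)

    for k in range(h):
        g_samples = G[:, k:k + 1] * X                 # per-sample grads, (n, d)
        grad_k = g_samples.mean(axis=0)
        # Unit-wise empirical Fisher block (d x d), damped to stay invertible.
        F_k = g_samples.T @ g_samples / n + eps * np.eye(d)
        W[k] -= eta * np.linalg.solve(F_k, grad_k)    # natural gradient step

    v -= eta * (Z.T @ err) / n       # plain gradient step for output weights

    if step % 50 == 0:
        print(f"step {step:3d}  mse {np.mean(err**2):.4f}")
```

The block-diagonal structure is what keeps the cost low: each update inverts h small d-by-d matrices instead of one full Fisher matrix over all parameters.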


1999 ◽ Vol 11 (8) ◽ pp. 1875-1883 ◽ Author(s): Shun-ichi Amari

Independent component analysis, or blind source separation, is a technique for extracting independent signals from mixtures. It is applicable even when the number of independent sources is unknown and is larger or smaller than the number of observed mixture signals. This article extends the natural gradient learning algorithm so that it applies to these overcomplete and undercomplete cases. The observed signals are assumed to be whitened by preprocessing, so we use the natural Riemannian gradient on Stiefel manifolds.
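As a concrete point of reference, the sketch below shows the familiar square case (equal numbers of sources and mixtures) of natural gradient ICA on whitened data, with the demixing matrix kept orthogonal by retaining only the skew-symmetric part of the relative gradient; the article's Stiefel-manifold formulation generalizes this to overcomplete and undercomplete demixing matrices. The Laplace sources, tanh nonlinearity, and step size are illustrative assumptions, not the article's experiments.

```python
# Sketch: square-case natural (relative) gradient ICA on whitened data,
# restricted to the orthogonal group.  Illustrative only.
import numpy as np

rng = np.random.default_rng(1)

# Two super-Gaussian (Laplace) sources, mixed and then whitened.
n = 5000
S = rng.laplace(size=(2, n))                   # source signals (rows)
A = rng.normal(size=(2, 2))                    # unknown mixing matrix
Xm = A @ S                                     # observed mixtures
Xm -= Xm.mean(axis=1, keepdims=True)
d_vals, E = np.linalg.eigh(np.cov(Xm))
V = E @ np.diag(d_vals ** -0.5) @ E.T          # whitening matrix
X = V @ Xm                                     # whitened observations

W = np.linalg.qr(rng.normal(size=(2, 2)))[0]   # orthogonal initial demixer
eta = 0.1

for _ in range(500):
    Y = W @ X                                  # current source estimates
    phi = np.tanh(Y)                           # score nonlinearity (super-Gaussian)
    # Relative (natural) gradient restricted to the orthogonal group:
    # only the skew-symmetric part is kept, so W stays on the manifold.
    Gm = (phi @ Y.T - Y @ phi.T) / n
    W = W - eta * Gm @ W
    # Re-orthonormalize to counter numerical drift.
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt

print("W @ V @ A (should be close to a scaled signed permutation):")
print(W @ V @ A)
```

Because the data are whitened, the separating transform can be searched for among orthogonal matrices only, which is what makes the Stiefel-manifold (and here orthogonal-group) geometry the natural setting for the update.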

