Analytical form of Fisher information matrix of bipolar-activation-function-based multilayer perceptrons

Author(s): Weili Guo, Liping Xie, Zhenyong Fu, Jianhui Guo, Guochen Pang, ...
2019 · Vol 49 (8) · pp. 3088-3098
Author(s): Weili Guo, Yew-Soon Ong, Yingjiang Zhou, Jaime Rubio Hervas, Aiguo Song, ...

2006 · Vol 18 (5) · pp. 1007-1065
Author(s): Shun-ichi Amari, Hyeyoung Park, Tomoko Ozeki

The parameter spaces of hierarchical systems such as multilayer perceptrons include singularities due to the symmetry and degeneration of hidden units. A parameter space forms a geometrical manifold, called the neuromanifold in the case of neural networks. Such a model is identified with a statistical model, and a Riemannian metric is given by the Fisher information matrix. However, the matrix degenerates at singularities. Such a singular structure is ubiquitous not only in multilayer perceptrons but also in gaussian mixture probability densities, ARMA time-series models, and many other cases. The standard statistical paradigm of the Cramér-Rao theorem does not hold, and the singularity gives rise to strange behaviors in parameter estimation, hypothesis testing, Bayesian inference, model selection, and, in particular, the dynamics of learning from examples. Prevailing theories so far have paid little attention to the problems caused by singularity, relying only on ordinary statistical theories developed for regular (nonsingular) models. Only recently have researchers remarked on the effects of singularity, and theories are now being developed. This article gives an overview of the phenomena caused by the singularities of statistical manifolds related to multilayer perceptrons and gaussian mixtures, and demonstrates our recent results on these problems. Simple toy models are used to show explicit solutions. We explain that the maximum likelihood estimator is no longer subject to the gaussian distribution even asymptotically, because the Fisher information matrix degenerates; that model selection criteria such as AIC, BIC, and MDL fail to hold in these models; that a smooth Bayesian prior becomes singular in such models; and that the trajectories of the dynamics of learning are strongly affected by the singularity, causing plateaus or slow manifolds in the parameter space. The natural gradient method is shown to perform well because it takes the singular geometrical structure into account. The generalization error and the training error are studied in some examples.
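To make the degeneracy concrete, the following NumPy sketch (an illustration, not code from the article) estimates the Fisher information matrix of a toy 1-2-1 tanh perceptron with gaussian output noise, and shows that two eigenvalues collapse to zero when the two hidden units coincide (w1 = w2), i.e., at a singular point of the neuromanifold:

```python
import numpy as np

rng = np.random.default_rng(0)

def fisher(params, xs, sigma=1.0):
    """Monte Carlo estimate of the Fisher information matrix for the toy
    regression model y = v1*tanh(w1*x) + v2*tanh(w2*x) + N(0, sigma^2).
    For gaussian noise, F = E_x[grad_mu grad_mu^T] / sigma^2, where mu is
    the network output and the gradient is w.r.t. (w1, w2, v1, v2)."""
    w1, w2, v1, v2 = params
    F = np.zeros((4, 4))
    for x in xs:
        g = np.array([
            v1 * x / np.cosh(w1 * x) ** 2,  # d mu / d w1
            v2 * x / np.cosh(w2 * x) ** 2,  # d mu / d w2
            np.tanh(w1 * x),                # d mu / d v1
            np.tanh(w2 * x),                # d mu / d v2
        ])
        F += np.outer(g, g)
    return F / (len(xs) * sigma ** 2)

xs = rng.normal(size=2000)

F_regular = fisher((1.0, -0.5, 0.8, 0.3), xs)   # distinct hidden units
F_singular = fisher((1.0, 1.0, 0.8, 0.3), xs)   # w1 == w2: redundant units

print(np.linalg.eigvalsh(F_regular))   # four clearly positive eigenvalues
print(np.linalg.eigvalsh(F_singular))  # two eigenvalues are ~0: F degenerates
```

At w1 = w2 the per-example gradient lies in a two-dimensional subspace (the w-components are proportional and the v-components are equal), so the 4x4 Fisher matrix has rank at most 2, which is exactly the degeneration the abstract describes.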


2011 · Vol 2011 · pp. 1-9
Author(s): Michael R. Bastian, Jacob H. Gunther, Todd K. Moon

Adaptive natural gradient learning avoids singularities in the parameter space of multilayer perceptrons. However, it requires many additional parameters beyond those of ordinary backpropagation, in the form of the Fisher information matrix. This paper describes a new approach to natural gradient learning that uses a smaller Fisher information matrix, together with a prior distribution on the neural network parameters and an annealed learning rate. While this new approach is computationally simpler, its performance is comparable to that of adaptive natural gradient learning.
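As a hedged illustration of how the ingredients named in the abstract (a prior on the parameters and an annealed learning rate) can combine in a natural-gradient step, consider the generic sketch below; the constants eta0 and lam are illustrative choices, not values or formulas from the paper:

```python
import numpy as np

def natural_gradient_step(theta, grad, F, t, eta0=0.5, lam=1e-3):
    """One generic natural-gradient step with a prior and annealing.

    lam * I plays the role of the precision of a gaussian prior on the
    weights; it keeps F + lam*I invertible even where F itself is
    singular. eta0 / (1 + t) anneals the step size over iterations t.
    Both constants are illustrative, not taken from the paper.
    """
    eta = eta0 / (1.0 + t)
    step = np.linalg.solve(F + lam * np.eye(len(theta)), grad)
    return theta - eta * step
```

The prior term serves double duty here: it regularizes the weights and, at singular points where F degenerates, it keeps the linear solve well posed.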


2000 · Vol 12 (6) · pp. 1399-1409
Author(s): Shun-ichi Amari, Hyeyoung Park, Kenji Fukumizu

The natural gradient learning method is known to have ideal performance for on-line training of multilayer perceptrons: it avoids the plateaus that cause the slow convergence of the backpropagation method, and it is Fisher efficient, whereas the conventional method is not. However, implementing the method requires calculating the Fisher information matrix and its inverse, which is practically very difficult. This article proposes an adaptive method of directly obtaining the inverse of the Fisher information matrix. It generalizes the adaptive Gauss-Newton algorithms and provides a solid theoretical justification for them. Simulations show that the proposed adaptive method works very well for realizing natural gradient learning.
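One standard form of such an adaptive inverse update, obtained by expanding the inverse of a moving-average Fisher estimate to first order in the step size eps, is sketched below; it is in the spirit of this line of work, and the exact update and eps schedule in the article may differ:

```python
import numpy as np

def update_inverse_fisher(G_inv, score, eps):
    """Rank-one update of an estimate of the inverse Fisher matrix.

    Expanding the inverse of the moving average
        G_{t+1} = (1 - eps) * G_t + eps * score score^T
    to first order in eps gives
        G_inv_{t+1} ~= (1 + eps) * G_inv - eps * (G_inv s)(G_inv s)^T,
    which tracks the inverse without ever inverting a matrix explicitly.
    """
    v = G_inv @ score
    return (1.0 + eps) * G_inv - eps * np.outer(v, v)

# Sketch of use inside a training loop (theta: weights, score: gradient
# of the log-likelihood at one example, eta: learning rate):
#   G_inv = update_inverse_fisher(G_inv, score, eps)
#   theta = theta - eta * (G_inv @ grad)
```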


1998 · Vol 10 (8) · pp. 2137-2157
Author(s): Howard Hua Yang, Shun-ichi Amari

The natural gradient descent method is applied to train an n-m-1 multilayer perceptron. Based on an efficient scheme to represent the Fisher information matrix for an n-m-1 stochastic multilayer perceptron, a new algorithm is proposed to calculate the natural gradient without inverting the Fisher information matrix explicitly. When the input dimension n is much larger than the number of hidden neurons m, the time complexity of computing the natural gradient is O(n).
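The abstract does not spell out the representation, but the flavor of the saving can be seen with the Woodbury identity: applying the inverse of a "diagonal plus rank-m" matrix to a vector costs time linear in n for fixed m, instead of the O(n^3) of a dense inverse. The sketch below illustrates that kind of structure exploitation; it is not the paper's exact scheme for the n-m-1 Fisher matrix:

```python
import numpy as np

def solve_diag_plus_lowrank(d, U, C, g):
    """Solve (diag(d) + U @ C @ U.T) x = g via the Woodbury identity.

    d: length-n diagonal, U: n x m factor, C: m x m matrix, with m << n.
    The dominant costs are O(n*m) matrix-vector products plus one m x m
    solve, so the computation is linear in n for fixed m.
    """
    Dinv_g = g / d                            # diag(d)^{-1} g
    Dinv_U = U / d[:, None]                   # diag(d)^{-1} U
    core = np.linalg.inv(C) + U.T @ Dinv_U    # m x m core matrix
    return Dinv_g - Dinv_U @ np.linalg.solve(core, U.T @ Dinv_g)
```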

