Two Further Gradient BYY Learning Rules for Gaussian Mixture with Automated Model Selection

Author(s):  
Jinwen Ma ◽  
Bin Gao ◽  
Yang Wang ◽  
Qiansheng Cheng


2006 ◽
Vol 18 (5) ◽  
pp. 1007-1065 ◽  
Author(s):  
Shun-ichi Amari ◽  
Hyeyoung Park ◽  
Tomoko Ozeki

The parameter spaces of hierarchical systems such as multilayer perceptrons include singularities due to the symmetry and degeneration of hidden units. A parameter space forms a geometrical manifold, called the neuromanifold in the case of neural networks. Such a model is identified with a statistical model, and a Riemannian metric is given by the Fisher information matrix. However, the matrix degenerates at singularities. Such a singular structure is ubiquitous not only in multilayer perceptrons but also in gaussian mixture probability densities, ARMA time-series models, and many other cases. The standard statistical paradigm of the Cramér-Rao theorem does not hold, and the singularity gives rise to strange behaviors in parameter estimation, hypothesis testing, Bayesian inference, model selection, and in particular, the dynamics of learning from examples. Prevailing theories so far have not paid much attention to the problems caused by singularity, relying only on ordinary statistical theories developed for regular (nonsingular) models. Only recently have researchers remarked on the effects of singularity, and theories are now being developed. This article gives an overview of the phenomena caused by the singularities of statistical manifolds related to multilayer perceptrons and gaussian mixtures. We demonstrate our recent results on these problems. Simple toy models are also used to show explicit solutions. We explain that the maximum likelihood estimator is no longer subject to the gaussian distribution even asymptotically, because the Fisher information matrix degenerates, that model selection criteria such as AIC, BIC, and MDL fail to hold in these models, that a smooth Bayesian prior becomes singular in such models, and that the trajectories of dynamics of learning are strongly affected by the singularity, causing plateaus or slow manifolds in the parameter space. The natural gradient method is shown to perform well because it takes the singular geometrical structure into account. The generalization error and the training error are studied in some examples.
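To make the degenerate Fisher metric concrete, a toy gaussian-mixture model of the kind alluded to above can be written down explicitly; the following display is an illustrative sketch in the spirit of the article, not an equation quoted from it:

\[
p(x \mid w, \mu) = (1 - w)\,\varphi(x) + w\,\varphi(x - \mu),
\qquad
\varphi(x) = \tfrac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}.
\]

Every parameter pair on the set \( \{w = 0\} \cup \{\mu = 0\} \) yields the same density \( \varphi(x) \), so the parameters are not identifiable there; the score with respect to one of the coordinates vanishes identically, the Fisher information matrix \( G(w,\mu) = \mathbb{E}\!\left[\nabla_{\theta} \log p \,\nabla_{\theta} \log p^{\top}\right] \) loses rank, and the usual Cramér-Rao and asymptotic-normality arguments break down on exactly this singular set.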


2001 ◽  
Vol 11 (01) ◽  
pp. 43-69 ◽  
Author(s):  
Lei Xu

After introducing the fundamentals of the BYY system and harmony learning, which have been developed over the past several years as a unified statistical framework for parameter learning, regularization and model selection, we systematically discuss this BYY harmony learning on systems with discrete inner representations. First, we show that one special case leads to unsupervised learning on Gaussian mixtures. We show how harmony learning not only leads us to the EM algorithm for maximum likelihood (ML) learning and the corresponding extended k-means algorithms for Mahalanobis clustering with criteria for selecting the number of Gaussians or clusters, but also provides two new regularization techniques and a unified scheme that includes the previous rival penalized competitive learning (RPCL) as well as its various variants and extensions and that performs model selection automatically during parameter learning. Moreover, as a by-product, we also obtain a new approach for determining a set of 'support vectors' for Parzen window density estimation. Second, we show that other special cases lead to three typical supervised learning models with several new results. For the three-layer net, we obtain (i) a new regularized ML learning, (ii) a new criterion for selecting the number of hidden units, and (iii) a family of EM-like algorithms that combine harmony learning with new regularization techniques. For the original and alternative models of the mixture-of-experts (ME) as well as radial basis function (RBF) nets, we obtain not only a new type of criterion for selecting the number of experts or basis functions but also a new type of EM-like algorithm that combines regularization techniques and RPCL learning for parameter learning, with either a least-complexity nature on the original ME model or automated model selection on the alternative ME model and RBF nets. Moreover, all the results for the alternative ME model also apply to two other popular nonparametric statistical approaches, namely kernel regression and the support vector machine. In particular, we obtain not only an easily implemented approach for determining the smoothing parameter in kernel regression, but also an alternative approach for deciding the set of support vectors in the support vector machine.
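Since the rival penalized competitive learning (RPCL) mechanism is central to the automatic model selection described above, a minimal sketch of its basic update may help. The learning rates, the pruning rule, and the omission of the conscience (frequency-sensitive) factor below are simplifying assumptions of this illustration, not details taken from the paper.

```python
# Minimal RPCL sketch: each sample attracts its nearest center (the winner)
# and slightly repels the second-nearest center (the rival); superfluous
# centers drift away from the data and can be pruned afterwards, which is
# how the number of clusters is selected automatically.
import numpy as np

def rpcl(X, k_init=10, alpha_win=0.05, alpha_rival=0.002,
         n_epochs=50, prune_dist=None, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k_init, replace=False)].copy()

    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            d = np.linalg.norm(centers - x, axis=1)
            winner, rival = np.argsort(d)[:2]
            centers[winner] += alpha_win * (x - centers[winner])     # attract winner
            centers[rival] -= alpha_rival * (x - centers[rival])     # repel rival

    if prune_dist is not None:
        # Keep only centers that ended up close to at least one sample.
        keep = np.array([np.min(np.linalg.norm(X - c, axis=1)) < prune_dist
                         for c in centers])
        centers = centers[keep]
    return centers
```

With the rival rate much smaller than the winner rate, the repulsion is gentle enough not to disturb well-placed centers but strong enough to expel extra ones, so an overestimated initial number of clusters is corrected during learning.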


Author(s):  
Jinwen Ma ◽  
Bin Gao ◽  
Yang Wang ◽  
Qiansheng Cheng

Under the Bayesian Ying–Yang (BYY) harmony learning theory, a harmony function has been developed on a BI-directional architecture of the BYY system for Gaussian mixtures, with the important feature that, when it is maximized through a general gradient rule, model selection can be made automatically during parameter learning on a set of sample data drawn from a Gaussian mixture. This paper further proposes conjugate and natural gradient rules to implement the maximization of the harmony function, i.e., the BYY harmony learning, on Gaussian mixtures more efficiently. Simulation experiments demonstrate that these two new gradient rules not only work well but also converge more quickly than the general gradient rule.
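For context, the harmony function referred to above takes, in the related BYY harmony learning literature on Gaussian mixtures, essentially the following form; it is reproduced here from that broader literature as background, and the notation may differ in detail from the paper itself:

\[
J(\Theta_k) = \frac{1}{N} \sum_{t=1}^{N} \sum_{j=1}^{k} p(j \mid x_t)\,
\ln\!\bigl[\alpha_j\, q(x_t \mid m_j, \Sigma_j)\bigr],
\qquad
p(j \mid x_t) = \frac{\alpha_j\, q(x_t \mid m_j, \Sigma_j)}
{\sum_{i=1}^{k} \alpha_i\, q(x_t \mid m_i, \Sigma_i)},
\]

where \(q(\cdot \mid m_j, \Sigma_j)\) is a Gaussian density with mean \(m_j\) and covariance \(\Sigma_j\), and the \(\alpha_j\) are the mixing proportions. Gradient ascent on \(J\) drives the mixing proportions of superfluous components toward zero, which is what makes the model selection automatic; a conjugate gradient rule replaces the raw ascent direction with directions conjugate to the previous ones, while a natural gradient rule premultiplies the ordinary gradient by the inverse of the Riemannian (Fisher) metric of the parameter manifold, and, as the abstract notes, both reach the maximum in fewer iterations than plain gradient ascent.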

