Best Harmony, Unified RPCL and Automated Model Selection for Unsupervised and Supervised Learning on Gaussian Mixtures, Three-Layer Nets and ME-RBF-SVM Models

2001 ◽  
Vol 11 (01) ◽  
pp. 43-69 ◽  
Author(s):  
Lei Xu

After introducing the fundamentals of the BYY system and harmony learning, which have been developed over the past several years as a unified statistical framework for parameter learning, regularization and model selection, we systematically discuss BYY harmony learning on systems with discrete inner representations. First, we show that one special case leads to unsupervised learning on Gaussian mixtures. We show how harmony learning not only leads us to the EM algorithm for maximum likelihood (ML) learning and the corresponding extended KMEAN algorithms for Mahalanobis clustering, with criteria for selecting the number of Gaussians or clusters, but also provides two new regularization techniques and a unified scheme that includes the previous rival penalized competitive learning (RPCL), together with its various variants and extensions, and performs model selection automatically during parameter learning. Moreover, as a by-product, we also obtain a new approach for determining a set of 'supporting vectors' for Parzen window density estimation. Second, we show that other special cases lead to three typical supervised learning models with several new results. On the three-layer net, we get (i) a new regularized ML learning, (ii) a new criterion for selecting the number of hidden units, and (iii) a family of EM-like algorithms that combine harmony learning with new regularization techniques. On the original and alternative models of mixture-of-experts (ME) as well as radial basis function (RBF) nets, we get not only a new type of criterion for selecting the number of experts or basis functions but also a new type of EM-like algorithm that combines regularization techniques and RPCL learning for parameter learning, with either a least-complexity nature on the original ME model or automated model selection on the alternative ME model and RBF nets.
Moreover, all the results for the alternative ME model also apply to two other popular nonparametric statistical approaches, namely kernel regression and the support vector machine. In particular, we obtain not only an easily implemented approach for determining the smoothing parameter in kernel regression, but also an alternative approach for deciding the set of support vectors in the support vector machine.
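The RPCL scheme that this unified framework subsumes can be sketched in a few lines. The following is a minimal illustrative implementation, not the paper's algorithm: for each sample, the winning centre is attracted toward it while the runner-up (the "rival") is pushed away with a much smaller de-learning rate, so surplus centres are driven out of the data region and the number of clusters is effectively selected during learning. The function name and the rates `eta_win`/`eta_rival` are illustrative assumptions.

```python
import numpy as np

def rpcl(X, k_max=6, eta_win=0.05, eta_rival=0.005, epochs=50, seed=0):
    """Rival Penalized Competitive Learning (illustrative sketch).

    The winner moves toward each sample; the rival (second-nearest centre)
    is pushed away, so extra centres among the k_max candidates are driven
    out of the data -- a form of automatic cluster-number selection.
    """
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k_max, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            d = np.linalg.norm(centres - x, axis=1)
            win, rival = np.argsort(d)[:2]        # nearest and second-nearest
            centres[win] += eta_win * (x - centres[win])
            centres[rival] -= eta_rival * (x - centres[rival])
    return centres
```

After training, centres that stayed inside the data mark the selected clusters, while repelled centres end up far from all samples and can be discarded by a distance threshold.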

Author(s):  
JINWEN MA ◽  
BIN GAO ◽  
YANG WANG ◽  
QIANSHENG CHENG

Under the Bayesian Ying–Yang (BYY) harmony learning theory, a harmony function has been developed on a bi-directional architecture of the BYY system for the Gaussian mixture, with the important feature that, via its maximization through a general gradient rule, model selection can be made automatically during parameter learning on a set of sample data from a Gaussian mixture. This paper further proposes conjugate and natural gradient rules to efficiently implement the maximization of the harmony function, i.e. BYY harmony learning, on the Gaussian mixture. Simulation experiments demonstrate that these two new gradient rules not only work well, but also converge more quickly than the general gradient rule.
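A minimal sketch of the objective being maximized: the harmony function for a Gaussian mixture weights each joint log-density ln(α_j q(x|θ_j)) by the posterior p(j|x), and the "general" gradient rule is plain gradient ascent on it. The code below is only an illustration under simplifying assumptions (isotropic components, ascent on the means only, finite-difference gradients); the function names and learning rate are hypothetical, and the paper's conjugate and natural gradient rules would replace the naive step.

```python
import numpy as np

def harmony(X, means, sigmas, logits):
    """H(theta) = sum_t sum_j p(j|x_t) ln(alpha_j q(x_t|theta_j))
    for an isotropic Gaussian mixture (simplified sketch)."""
    alpha = np.exp(logits) / np.exp(logits).sum()
    d = X.shape[1]
    log_q = np.stack([
        -0.5 * np.sum((X - m) ** 2, axis=1) / s**2
        - 0.5 * d * np.log(2 * np.pi * s**2)
        for m, s in zip(means, sigmas)], axis=1)     # (n, k)
    log_joint = np.log(alpha) + log_q                # ln alpha_j q(x|j)
    post = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)          # posterior p(j|x)
    return np.sum(post * log_joint)

def ascend_means(X, means, sigmas, logits, lr=0.05, eps=1e-4, steps=20):
    """Baseline 'general' gradient ascent on H w.r.t. the means,
    via central finite differences -- what the conjugate and natural
    gradient rules are designed to accelerate."""
    means = means.copy()
    for _ in range(steps):
        g = np.zeros_like(means)
        for i in range(means.shape[0]):
            for j in range(means.shape[1]):
                mp = means.copy(); mp[i, j] += eps
                mm = means.copy(); mm[i, j] -= eps
                g[i, j] = (harmony(X, mp, sigmas, logits)
                           - harmony(X, mm, sigmas, logits)) / (2 * eps)
        means += lr * g / len(X)
    return means
```

Because the posterior concentrates mass on the winning component, maximizing H pushes the mixing weights of redundant components toward zero, which is the mechanism behind the automatic model selection mentioned above.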


2021 ◽  
Vol 69 (4) ◽  
pp. 59-65
Author(s):  
Zheng Li ◽  
Wei Feng ◽  
Ze Wang ◽  
He Chen ◽  
...  

Non-intrusive load identification plays an important role in daily life: it can monitor and predict grid load while collecting and analysing user electricity information. Aiming at the problems of weak non-intrusive load decomposition ability and low precision when two electrical appliances are started and stopped at the same time, a new clustering and decomposition algorithm is proposed. The algorithm first analyses the measured power and uses DBSCAN to filter out the noise in the collected data. Secondly, the remaining power points are clustered using the Adaptive Gaussian Mixture Model (AGMM) to obtain the cluster centres of the electrical appliances, and finally the corresponding current waveforms are correlated to establish a load characteristic database. For load decomposition, a mathematical model is established for the magnitude of the changing power and current. The Grasshopper Optimization Algorithm (GOA) is improved by introducing simulated annealing (SA) to identify and decompose electrical appliances that start and stop at the same time. The decomposition result is checked by a current-similarity test to determine whether it is correct, thereby improving recognition accuracy. Experimental data show that the combination of DBSCAN and GMM can identify similar power characteristics. The introduction of SA makes up for the weakness of GOA while retaining GOA's high identification efficiency. Finally, the method is tested on load detection data for the simultaneous start and stop of two appliances. The test results show that the proposed method can effectively identify the simultaneous start and stop of two loads and can solve the problem of a low recognition rate caused by similar load powers, which lays a foundation for the future development of non-intrusive load identification.
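The first stage of the pipeline (DBSCAN noise filtering followed by GMM clustering of steady-state power) can be sketched with scikit-learn. This is a hedged reconstruction, not the authors' code: BIC-based model-order selection stands in for the adaptive GMM, and the parameter values (`eps`, `min_samples`, `max_k`) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture

def power_cluster_centres(power, eps=15.0, min_samples=5, max_k=6, seed=0):
    """Sketch of the two-stage clustering: DBSCAN discards noisy power
    samples, then a Gaussian mixture (order chosen by BIC, standing in
    for the adaptive GMM) yields one centre per appliance power state."""
    P = np.asarray(power, float).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(P)
    clean = P[labels != -1]                    # DBSCAN marks noise as -1
    best, best_bic = None, np.inf
    for k in range(1, max_k + 1):              # pick order by minimum BIC
        gmm = GaussianMixture(k, random_state=seed).fit(clean)
        if gmm.bic(clean) < best_bic:
            best, best_bic = gmm, gmm.bic(clean)
    return np.sort(best.means_.ravel())        # one centre per power level
```

Each returned centre would then be associated with its current waveform to build the load characteristic database, and the SA-GOA decomposition stage would match observed power/current changes against combinations of these signatures.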


2008 ◽  
Vol 41 (2) ◽  
pp. 6088-6093
Author(s):  
Gang Wang ◽  
Shiyin Qin ◽  
Pipei Huang

2006 ◽  
Vol 18 (5) ◽  
pp. 1007-1065 ◽  
Author(s):  
Shun-ichi Amari ◽  
Hyeyoung Park ◽  
Tomoko Ozeki

The parameter spaces of hierarchical systems such as multilayer perceptrons include singularities due to the symmetry and degeneration of hidden units. A parameter space forms a geometrical manifold, called the neuromanifold in the case of neural networks. Such a model is identified with a statistical model, and a Riemannian metric is given by the Fisher information matrix. However, the matrix degenerates at singularities. Such a singular structure is ubiquitous not only in multilayer perceptrons but also in the gaussian mixture probability densities, ARMA time-series model, and many other cases. The standard statistical paradigm of the Cramér-Rao theorem does not hold, and the singularity gives rise to strange behaviors in parameter estimation, hypothesis testing, Bayesian inference, model selection, and in particular, the dynamics of learning from examples. Prevailing theories so far have not paid much attention to the problem caused by singularity, relying only on ordinary statistical theories developed for regular (nonsingular) models. Only recently have researchers remarked on the effects of singularity, and theories are now being developed. This article gives an overview of the phenomena caused by the singularities of statistical manifolds related to multilayer perceptrons and gaussian mixtures. We demonstrate our recent results on these problems. Simple toy models are also used to show explicit solutions. We explain that the maximum likelihood estimator is no longer subject to the gaussian distribution even asymptotically, because the Fisher information matrix degenerates, that the model selection criteria such as AIC, BIC, and MDL fail to hold in these models, that a smooth Bayesian prior becomes singular in such models, and that the trajectories of dynamics of learning are strongly affected by the singularity, causing plateaus or slow manifolds in the parameter space. 
The natural gradient method is shown to perform well because it takes the singular geometrical structure into account. The generalization error and the training error are studied in some examples.
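In a regular model where the Fisher information G is known in closed form, the natural gradient method amounts to preconditioning the ordinary gradient by G⁻¹. The one-dimensional Gaussian fit below is a minimal sketch of that idea only (the function name and step size are illustrative); this regular case deliberately avoids the singular behaviour discussed above, where G degenerates.

```python
import numpy as np

def natural_gradient_fit(x, steps=200, lr=0.1):
    """Toy natural-gradient maximum-likelihood fit of a 1-D Gaussian.

    For theta = (mu, sigma) the Fisher information is
    G = diag(1/sigma^2, 2/sigma^2), so the natural gradient is the
    ordinary gradient multiplied by G^{-1} = diag(sigma^2, sigma^2/2).
    """
    mu, sigma = 0.0, 1.0
    for _ in range(steps):
        # ordinary gradients of the mean log-likelihood
        g_mu = np.mean(x - mu) / sigma**2
        g_sigma = np.mean((x - mu) ** 2 - sigma**2) / sigma**3
        # precondition by the inverse Fisher information
        mu += lr * sigma**2 * g_mu
        sigma += lr * (sigma**2 / 2) * g_sigma
        sigma = max(sigma, 1e-6)               # keep the scale positive
    return mu, sigma
```

Note that the preconditioned mean update reduces to mu += lr * (x̄ - mu), independent of the current scale: the natural gradient equalizes convergence speed across directions that the raw parametrization treats very unequally, which is the intuition behind its good behaviour near plateaus.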

