Data Assimilation with Gaussian Mixture Models Using the Dynamically Orthogonal Field Equations. Part I: Theory and Scheme

2013, Vol 141 (6), pp. 1737-1760
Author(s): Thomas Sondergaard, Pierre F. J. Lermusiaux

Abstract This work introduces and derives an efficient, data-driven assimilation scheme, focused on a time-dependent stochastic subspace that respects nonlinear dynamics and captures non-Gaussian statistics as they occur. The motivation is to obtain a filter that is applicable to realistic geophysical applications, but that also rigorously combines the governing dynamical equations with information theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of classical filters, the underlying theory and algorithmic implementation of the new filter are developed and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric Gaussian Mixture Models (GMMs) using the Expectation-Maximization (EM) algorithm and the Bayesian Information Criterion (BIC). Bayes's law is then carried out efficiently and analytically within the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example. Variations of the GMM-DO filter are also provided, along with comparisons with related schemes.
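As a concrete illustration of the fitting step described above, the sketch below fits GMMs of increasing complexity to a synthetic ensemble (a stand-in for the DO realizations) and selects the number of components by BIC, using scikit-learn's EM-based GaussianMixture. The ensemble and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: fit GMMs with 1..5 components to an ensemble of subspace
# coefficients and keep the one preferred by the Bayesian Information
# Criterion. The bimodal 2D ensemble below is synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for DO realizations: a bimodal ensemble in a 2D stochastic subspace.
coeffs = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(500, 2)),
    rng.normal(loc=+2.0, scale=0.5, size=(500, 2)),
])

best_gmm, best_bic = None, np.inf
for n_components in range(1, 6):
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(coeffs)                # EM algorithm
    bic = gmm.bic(coeffs)          # Bayesian Information Criterion
    if bic < best_bic:
        best_gmm, best_bic = gmm, bic

print(f"BIC selects {best_gmm.n_components} mixture components")
```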

2011, Vol 23 (6), pp. 1605-1622
Author(s): Lingyan Ruan, Ming Yuan, Hui Zou

Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficulty. The ℓ1-type penalty we impose on the inverse covariance matrices encourages sparsity in their entries and therefore helps to reduce the effective dimensionality of the problem. We show that the proposed estimate can be computed efficiently using an expectation-maximization algorithm. To illustrate the practical merits of the proposed method, we consider its applications in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for high-dimensional data analysis.
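A minimal sketch of the penalized-EM idea, assuming a graphical-lasso-style ℓ1 penalty on each component's inverse covariance: the standard M-step covariance update is replaced by a graphical lasso fit, which sparsifies the precision matrices. The function and toy data below are an illustrative reconstruction, not the authors' reference code.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def penalized_em(X, k, alpha=0.1, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(k, 1.0 / k)
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    covs = np.array([np.cov(X.T) + 1e-3 * np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = np.column_stack([
            w * multivariate_normal.pdf(X, m, c)
            for w, m, c in zip(weights, means, covs)
        ])
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: weights and means as usual; each covariance update is
        # replaced by a graphical lasso fit, imposing the l1 penalty on the
        # entries of the inverse covariance.
        for j in range(k):
            r = resp[:, j]
            weights[j] = r.mean()
            means[j] = r @ X / r.sum()
            diff = X - means[j]
            emp_cov = (r[:, None] * diff).T @ diff / r.sum()
            covs[j], _ = graphical_lasso(emp_cov, alpha=alpha)
    return weights, means, covs

# Toy usage: two well-separated 5-dimensional clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 5)), rng.normal(2, 1, (200, 5))])
w, mu, S = penalized_em(X, k=2)
print("weights:", np.round(w, 2))
```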


2011, Vol 474-476, pp. 442-447
Author(s): Zhi Gao Zeng, Li Xin Ding, Sheng Qiu Yi, San You Zeng, Zi Hua Qiu

In order to improve the accuracy of image segmentation in video surveillance sequences and to overcome the limits of traditional clustering algorithms, which cannot accurately model image data sets that contain noise, the paper presents an automatic and accurate video image segmentation algorithm that uses Gaussian mixture models, informed by spatial properties, to segment the image. Because the expectation-maximization algorithm is very sensitive to initial values and easily falls into local optima, the paper presents a differential evolution-based parameter estimation for Gaussian mixture models. The experimental results show that segmentation accuracy is greatly improved over traditional segmentation algorithms.
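The sketch below illustrates the general idea of estimating GMM parameters with differential evolution instead of EM, as the abstract proposes. The 1D toy problem, parameter encoding (means, log standard deviations, weight logits), and bounds are all assumptions for illustration.

```python
# Maximize the GMM log-likelihood directly with differential evolution,
# which is less sensitive to initialization than EM.
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import norm

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 0.7, 300)])
K = 2

def neg_log_likelihood(theta):
    # theta = [mu_1..mu_K, log_sigma_1..log_sigma_K, weight logits].
    mu = theta[:K]
    sigma = np.exp(theta[K:2 * K])
    w = np.exp(theta[2 * K:])
    w /= w.sum()                               # softmax-style mixture weights
    pdf = sum(w[j] * norm.pdf(data, mu[j], sigma[j]) for j in range(K))
    return -np.sum(np.log(pdf + 1e-300))

bounds = [(-10, 10)] * K + [(-3, 3)] * K + [(-5, 5)] * K
result = differential_evolution(neg_log_likelihood, bounds, seed=0)
print("estimated means:", np.sort(result.x[:K]))
```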


2013, Vol 141 (6), pp. 1761-1785
Author(s): Thomas Sondergaard, Pierre F. J. Lermusiaux

Abstract The properties and capabilities of the Gaussian Mixture Model–Dynamically Orthogonal (GMM-DO) filter are assessed and exemplified by applications to two dynamical systems: 1) the double-well diffusion and 2) sudden expansion flows, both of which admit far-from-Gaussian statistics. The former test case, a twin experiment, validates the use of the Expectation-Maximization (EM) algorithm and Bayesian Information Criterion with GMMs in a filtering context; the latter further exemplifies the filter's ability to efficiently handle state vectors of nontrivial dimensionality and dynamics with jets and eddies. For each test case, qualitative and quantitative comparisons are made with contemporary filters. The sensitivity to input parameters is illustrated and discussed. Properties of the filter are examined and its estimates are described, including the equation-based and adaptive prediction of the probability densities; the evolution of the mean field, stochastic subspace modes, and stochastic coefficients; the fitting of GMMs; and the efficient and analytical Bayesian updates at assimilation times and the corresponding data impacts. The advantages of respecting nonlinear dynamics and preserving non-Gaussian statistics are brought to light. For realistic test cases admitting complex distributions and with sparse or noisy measurements, the GMM-DO filter is shown to fundamentally improve the filtering skill, outperforming simpler schemes invoking the Gaussian parametric distribution.
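For readers unfamiliar with the first test case, the following sketch simulates a generic double-well diffusion ensemble with an Euler-Maruyama step. The drift, noise amplitude, and discretization are illustrative assumptions rather than the paper's exact configuration, but they reproduce the bimodal, far-from-Gaussian statistics that motivate a GMM prior.

```python
# An ensemble of scalar trajectories in a double-well potential; the
# long-run statistics are bimodal, hence far from Gaussian.
import numpy as np

rng = np.random.default_rng(2)
n_ens, n_steps, dt, kappa = 1000, 5000, 1e-3, 0.4
x = rng.normal(0.0, 0.1, size=n_ens)        # ensemble starts near the barrier

for _ in range(n_steps):
    drift = 4.0 * x - 4.0 * x ** 3           # gradient of -(x**2 - 1)**2
    x += drift * dt + kappa * np.sqrt(dt) * rng.standard_normal(n_ens)

# The ensemble settles into two modes near x = -1 and x = +1, which is why a
# single Gaussian prior is a poor fit and a two-component GMM is natural.
print("fraction in right well:", np.mean(x > 0))
```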


2019, Vol 19 (11), pp. 2050204
Author(s): Sara Shirinkam, Adel Alaeddini, Elizabeth Gross

Using Gaussian mixture models for clustering is a statistically mature method in data science, with numerous successful applications in science and engineering. The parameters of a Gaussian mixture model (GMM) are typically estimated from training data using the iterative expectation-maximization algorithm, which requires the number of Gaussian components a priori. In this study, we propose two algorithms rooted in numerical algebraic geometry (NAG), namely an area-based algorithm and a local maxima algorithm, to identify the optimal number of components. The area-based algorithm transforms several GMMs with varying numbers of components into sets of equivalent polynomial regression splines. Next, it uses homotopy continuation methods to evaluate the resulting splines and identify the number of components most compatible with the gradient data. The local maxima algorithm forms a set of polynomials by fitting a smoothing spline to a dataset. Next, it uses NAG to solve the system of first derivatives and find the local maxima of the resulting smoothing spline, which represent the mixture components; the local maxima algorithm thereby also identifies the locations of the centers of the Gaussian components. Using a real-world case study in automotive manufacturing and extensive simulations, we demonstrate that the performance of the proposed algorithms is comparable with that of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are popular methods in the literature. We also show that the proposed algorithms are more robust than AIC and BIC when the Gaussian assumption is violated.
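The following is a simplified numerical stand-in for the local maxima algorithm: fit a smoothing spline to a density estimate, solve for the critical points of its first derivative, and keep those with negative curvature. The paper solves the resulting polynomial system with homotopy continuation (NAG software); ordinary spline root-finding is used here purely for illustration, and the data and smoothing parameters are assumptions.

```python
# Count local maxima of a smoothing spline fit to a histogram density:
# each maximum stands in for one mixture component and its center.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-2, 0.6, 400), rng.normal(2, 0.6, 400)])
density, edges = np.histogram(data, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

spline = UnivariateSpline(centers, density, k=4, s=0.01)
d1 = spline.derivative(1)
crit = d1.roots()                             # critical points of the spline
maxima = [t for t in crit if spline.derivative(2)(t) < 0]
print("estimated number of components:", len(maxima))
print("estimated centers:", np.round(maxima, 2))
```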


Author(s): Ching-Hua Chuan

This paper presents an audio classification and retrieval system that uses wavelets to extract low-level acoustic features. The author performs multiple-level decomposition with the discrete wavelet transform to extract acoustic features from audio recordings at different scales and times. The extracted features are then translated into a compact vector representation. Gaussian mixture models, fit with the expectation-maximization algorithm, are used to build models for audio classes and individual audio examples. The system is evaluated on three audio classification tasks: speech/music, male/female speech, and music genre. The author also shows how wavelets and Gaussian mixture models can be used for class-based audio retrieval in two approaches: indexing using only wavelets versus indexing by Gaussian components. By evaluating the system through 10-fold cross-validation, the author demonstrates the promising capability of wavelets and Gaussian mixture models for audio classification and retrieval, and compares how parameters, including frame size, wavelet level, number of Gaussian components, and sampling size, affect the performance of the Gaussian models.
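A hedged sketch of the described pipeline: multi-level discrete wavelet decomposition of audio frames, a compact per-frame statistics vector, and one GMM per class scored by log-likelihood. The wavelet family, decomposition level, feature statistics, and synthetic audio below are assumptions, not the paper's settings.

```python
import numpy as np
import pywt
from sklearn.mixture import GaussianMixture

def wavelet_features(frame, wavelet="db4", level=4):
    # Multi-level DWT; compact vector = (mean |coef|, std) per subband.
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    return np.array([s for c in coeffs
                     for s in (np.mean(np.abs(c)), np.std(c))])

rng = np.random.default_rng(4)
def make_frames(freq, n=200, size=1024):
    # Synthetic "audio": noisy sinusoids standing in for real recordings.
    t = np.arange(size) / 8000.0
    return [np.sin(2 * np.pi * freq * t) + 0.3 * rng.standard_normal(size)
            for _ in range(n)]

classes = {"low": make_frames(200.0), "high": make_frames(1200.0)}
models = {}
for label, frames in classes.items():
    X = np.array([wavelet_features(f) for f in frames])
    # Diagonal covariances keep the per-class GMMs well conditioned here.
    models[label] = GaussianMixture(n_components=4, covariance_type="diag",
                                    random_state=0).fit(X)

# Classify a new frame by the class model with the highest log-likelihood.
test = wavelet_features(make_frames(1200.0, n=1)[0]).reshape(1, -1)
print(max(models, key=lambda lab: models[lab].score(test)))
```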


Filomat, 2019, Vol 33 (15), pp. 4753-4767
Author(s): Khalil Masmoudi, Afif Masmoudi

In this paper, we introduce finite mixture models with singular multivariate normal components. These models are useful when the observed data involve collinearities, that is, when the covariance matrices are singular. They are also useful when the covariance matrices are ill-conditioned: in the latter case, classical approaches may lead to numerical instabilities and inaccurate estimates. Hence, an extension of the Expectation-Maximization algorithm, with a complete proof, is proposed to derive the maximum likelihood estimators and cluster the data instances for mixtures of singular multivariate normal distributions. The accuracy of the proposed algorithm is then demonstrated through several numerical experiments. Finally, we discuss the application of the proposed distribution to modeling financial asset returns and portfolio selection.
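To make the singular-covariance setting concrete, the sketch below evaluates the log-density of a singular multivariate normal on its support via the eigendecomposition, using the pseudo-inverse quadratic form and pseudo-determinant. This illustrates only the density-evaluation ingredient such an EM extension needs, not the authors' full estimator; the tolerance and test data are assumptions.

```python
import numpy as np

def singular_normal_logpdf(x, mean, cov, tol=1e-10):
    # Density of N(mean, cov) restricted to the support of a rank-deficient
    # covariance: keep only the nonzero eigenvalues.
    diff = x - mean
    eigval, eigvec = np.linalg.eigh(cov)
    support = eigval > tol
    rank = support.sum()
    pseudo_det = np.prod(eigval[support])     # product of nonzero eigenvalues
    # Project onto the support and use the pseudo-inverse quadratic form.
    z = eigvec[:, support].T @ diff
    quad = np.sum(z ** 2 / eigval[support])
    return -0.5 * (rank * np.log(2 * np.pi) + np.log(pseudo_det) + quad)

# Rank-1 covariance in 2D: all probability mass lives on the line y = x.
cov = np.array([[1.0, 1.0], [1.0, 1.0]])
print(singular_normal_logpdf(np.array([0.5, 0.5]), np.zeros(2), cov))
```

For comparison, scipy.stats.multivariate_normal accepts allow_singular=True and handles this case the same way internally.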

