Gaussian Mixture Models Based on Principal Components and Applications

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Nada A. Alqahtani ◽  
Zakiah I. Kalantan

Data scientists use various machine learning algorithms to discover patterns in large data that can lead to actionable insights. In general, high-dimensional data are reduced by obtaining a set of principal components so as to highlight similarities and differences. In this work, we model the reduced data with a bivariate Gaussian mixture model. We discuss a heuristic for detecting important components by choosing the initial values of the location parameters using two different techniques: cluster means obtained from k-means and hierarchical clustering, and the default values in the “mixtools” R package. The parameters of the model are obtained via an expectation maximization algorithm. Bayesian criteria are evaluated for both techniques, demonstrating that both are efficient with respect to computation capacity. The effectiveness of the discussed techniques is demonstrated through a simulation study and using real data sets from different fields.
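For orientation, the model and fitting procedure summarized in this abstract can be written out in standard textbook form. This is a sketch under two assumptions: each component has an unrestricted 2×2 covariance matrix, and the "Bayesian criteria" include the Bayesian information criterion (BIC). The abstract confirms neither, and the symbols below are generic notation rather than the authors' own.

```latex
% Bivariate Gaussian mixture with K components, x_i \in \mathbb{R}^2, i = 1, \dots, n
\begin{align*}
p(x_i \mid \Theta) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),
  \qquad \sum_{k=1}^{K} \pi_k = 1 \\[4pt]
% E-step: responsibility of component k for observation i
\gamma_{ik} &= \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                    {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)} \\[4pt]
% M-step: weighted updates
\pi_k &= \frac{1}{n} \sum_{i=1}^{n} \gamma_{ik}, \qquad
\mu_k = \frac{\sum_{i} \gamma_{ik} \, x_i}{\sum_{i} \gamma_{ik}}, \qquad
\Sigma_k = \frac{\sum_{i} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}}{\sum_{i} \gamma_{ik}} \\[4pt]
% BIC (assumed criterion): (K - 1) weights + 2K means + 3K covariance entries
\mathrm{BIC} &= -2 \log L(\hat{\Theta}) + (6K - 1) \log n
\end{align*}
```

The choice of initial values for the location parameters enters only through the starting values of the means; the updates themselves are unchanged.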


2005 ◽  
Vol 128 (3) ◽  
pp. 479-483
Author(s):  
Hani Hamdan ◽  
Gérard Govaert

In this paper, we present a new and original mixture model approach for acoustic emission (AE) data clustering. AE techniques have been used in a variety of applications in industrial plants. These techniques can provide the most sophisticated monitoring test and can generally be done with the plant/pressure equipment operating at several conditions. Since the AE clusters may present several constraints (different proportions, volumes, orientations, and shapes), we propose to base the AE cluster analysis on Gaussian mixture models, which will be, in such situations, a powerful approach. Furthermore, the diagonal Gaussian mixture model seems to be well adapted to the detection and monitoring of defect classes since the weldings of cylindrical pressure equipment are lengthened horizontally and vertically (cluster shapes lengthened along the axes). The EM (Expectation-Maximization) algorithm applied to a diagonal Gaussian mixture model provides a satisfactory solution, but the real-time constraints imposed in our problem make the application of this algorithm impossible if the number of points becomes too large. The solution that we propose is to use the CEM (Classification Expectation-Maximization) algorithm, which converges faster and generates comparable solutions in terms of the resulting partition. The practical results on real data are very satisfactory from the experts' point of view.
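The difference between EM and CEM can be illustrated with a minimal Python sketch of classification EM for a diagonal Gaussian mixture. It is not the authors' implementation: the function names, the unit initial variances, the variance floor, and the toy points are all assumptions made for illustration. The key step is the hard assignment (C-step) between the E-step and the M-step, which lets the loop stop as soon as the partition stops changing.

```python
import math

def log_diag_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance, evaluated at x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def cem_diag_gmm(points, init_means, n_iter=50, var_floor=1e-6):
    """Classification EM for a diagonal Gaussian mixture (illustrative sketch)."""
    k, dim, n = len(init_means), len(points[0]), len(points)
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]  # assumed starting variances
    weights = [1.0 / k] * k
    labels = None
    for _ in range(n_iter):
        # E-step and C-step: score every component, then hard-assign each
        # point to the component with the largest posterior.
        new_labels = [
            max(
                range(k),
                key=lambda j: math.log(weights[j])
                + log_diag_gauss(x, means[j], variances[j]),
            )
            for x in points
        ]
        if new_labels == labels:
            break  # stable partition: the parameters would not change either
        labels = new_labels
        # M-step: re-estimate each component from its own cluster only.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # empty cluster: keep its previous parameters
            weights[j] = len(members) / n
            for d in range(dim):
                col = [x[d] for x in members]
                mu = sum(col) / len(col)
                means[j][d] = mu
                variances[j][d] = max(
                    sum((c - mu) ** 2 for c in col) / len(col), var_floor
                )
    return labels, means, variances, weights

# Toy data: one cluster stretched along x, one stretched along y.
toy_points = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0),
              (0.1, 10.0), (-0.1, 11.0), (0.0, 12.0), (0.1, 13.0)]
toy_labels, toy_means, toy_vars, toy_weights = cem_diag_gmm(
    toy_points, init_means=[(1.0, 0.0), (0.0, 11.0)]
)
```

Because each M-step uses only the points assigned to a cluster, one iteration costs less than a full EM iteration with soft responsibilities, which is consistent with the speed advantage the abstract reports.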



2011 ◽  
Vol 23 (6) ◽  
pp. 1605-1622 ◽  
Author(s):  
Lingyan Ruan ◽  
Ming Yuan ◽  
Hui Zou

Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficulty. The [Formula: see text]-type penalty we impose on the inverse covariance matrices encourages sparsity in their entries and therefore helps to reduce the effective dimensionality of the problem. We show that the proposed estimate can be efficiently computed using an expectation-maximization algorithm. To illustrate the practical merits of the proposed method, we consider its applications in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for high-dimensional data analysis.



2021 ◽  
Author(s):  
Kehinde Olobatuyi

Similar to many machine learning models, both the accuracy and the speed of cluster weighted models (CWMs) can be hampered by high-dimensional data, which has led to previous work on a parsimonious technique to reduce the effect of the “curse of dimensionality” on mixture models. In this work, we review the background of CWMs. We further show that a parsimonious technique is not sufficient for mixture models to thrive in the presence of very high-dimensional data. We discuss a heuristic for detecting the hidden components by choosing the initial values of the location parameters using the default values in the “FlexCWM” R package. We introduce a dimensionality reduction technique called t-distributed stochastic neighbor embedding (t-SNE) to enhance the parsimonious CWMs in high-dimensional space. Originally, CWMs are suited for regression, but for classification purposes, all multi-class variables are transformed logarithmically with some noise. The parameters of the model are obtained via an expectation maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.



2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Johannes Hertrich ◽  
Dang-Phuong-Lan Nguyen ◽  
Jean-Francois Aujol ◽  
Dominique Bernard ◽  
Yannick Berthoumieu ◽  
...  

Despite the rapid development of computational hardware, the treatment of large and high dimensional data sets is still a challenging problem. The contribution of this paper to the topic is twofold. First, we propose a Gaussian mixture model in conjunction with a reduction of the dimensionality of the data in each component of the model by principal component analysis, which we call PCA-GMM. To learn the (low dimensional) parameters of the mixture model we propose an EM algorithm whose M-step requires the solution of constrained optimization problems. Fortunately, these constrained problems do not depend on the usually large number of samples and can be solved efficiently by an (inertial) proximal alternating linearized minimization algorithm. Second, we apply our PCA-GMM for the superresolution of 2D and 3D material images based on the approach of Sandeep and Jacob. Numerical results confirm the moderate influence of the dimensionality reduction on the overall superresolution result.



2013 ◽  
Vol 141 (6) ◽  
pp. 1737-1760 ◽  
Author(s):  
Thomas Sondergaard ◽  
Pierre F. J. Lermusiaux

This work introduces and derives an efficient, data-driven assimilation scheme, focused on a time-dependent stochastic subspace that respects nonlinear dynamics and captures non-Gaussian statistics as it occurs. The motivation is to obtain a filter that is applicable to realistic geophysical applications, but that also rigorously utilizes the governing dynamical equations with information theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of classical filters, the underlying theory and algorithmic implementation of the new filter are developed and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric Gaussian Mixture Models (GMMs) using the Expectation-Maximization algorithm and the Bayesian Information Criterion. Bayes’s law is then efficiently carried out analytically within the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example. Variations of the GMM-DO filter are also provided along with comparisons with related schemes.
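Bayes's law can be applied analytically to a GMM prior because of a standard property of Gaussian sums. The sketch below states that property under assumptions the abstract does not spell out: a linear observation operator H, additive Gaussian observation noise with covariance R, and generic notation. It illustrates the general mechanism and is not the paper's subspace-specific derivation.

```latex
% Prior: Gaussian mixture over the state x; observation y = H x + \varepsilon,
% \varepsilon \sim \mathcal{N}(0, R)  (H and R are assumptions of this sketch)
\begin{align*}
p(x) &= \sum_{j=1}^{M} \pi_j \, \mathcal{N}(x \mid \mu_j, P_j) \\[4pt]
% Each component is updated as in a Kalman analysis step
K_j &= P_j H^{\top} \left( H P_j H^{\top} + R \right)^{-1} \\
\mu_j^{a} &= \mu_j + K_j \, (y - H \mu_j), \qquad
P_j^{a} = (I - K_j H) \, P_j \\[4pt]
% Weights are rescaled by how well each component predicted the observation
\pi_j^{a} &= \frac{\pi_j \, \mathcal{N}(y \mid H \mu_j, \; H P_j H^{\top} + R)}
                  {\sum_{m=1}^{M} \pi_m \, \mathcal{N}(y \mid H \mu_m, \; H P_m H^{\top} + R)} \\[4pt]
p(x \mid y) &= \sum_{j=1}^{M} \pi_j^{a} \, \mathcal{N}(x \mid \mu_j^{a}, P_j^{a})
\end{align*}
```

The posterior is again a Gaussian mixture with the same number of components, so no sampling or numerical integration is needed at the update step.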



2021 ◽  
Vol 87 (9) ◽  
pp. 615-630
Author(s):  
Longjie Ye ◽  
Ka Zhang ◽  
Wen Xiao ◽  
Yehua Sheng ◽  
Dong Su ◽  
...  

This paper proposes a ground filtering method that uses a Gaussian mixture model and is based on hierarchical curvature constraints. Firstly, the thin plate spline function is iteratively applied to interpolate the reference surface. Secondly, a gradually changing grid size and curvature threshold are used to construct hierarchical constraints. Finally, an adaptive height difference classifier based on the Gaussian mixture model is proposed. Using the latent variables obtained by the expectation-maximization algorithm, the posterior probability of each point is computed. As a result, ground and objects can be marked separately according to the calculated probability. Fifteen data samples provided by the International Society for Photogrammetry and Remote Sensing are used to verify the proposed method, which is also compared with eight classical filtering algorithms. Experimental results demonstrate that the average total error and average Cohen's kappa coefficient of the proposed method are 6.91% and 80.9%, respectively. In general, it has better performance in areas with terrain discontinuities and bridges.
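The posterior-probability step can be sketched in Python with a two-component, one-dimensional Gaussian mixture fitted to height differences. Everything specific here is an assumption for illustration: the function names, the initial means, the 0.5 decision threshold, and the made-up height values. The paper's thin plate spline interpolation and hierarchical constraints are not reproduced.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_two_component_gmm(values, init_means, n_iter=100, var_floor=1e-6):
    """EM for a 1-D two-component GMM; returns parameters and posteriors."""
    weights = [0.5, 0.5]
    means = list(init_means)
    variances = [1.0, 1.0]  # assumed starting variances
    n = len(values)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in values:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj,
                var_floor,
            )
    # resp holds the posteriors from the final E-step.
    return weights, means, variances, resp

# Made-up height differences (metres) relative to a reference surface:
# six near-ground points followed by four object points.
height_diffs = [0.02, -0.05, 0.1, 0.0, -0.08, 0.05, 2.9, 3.2, 3.0, 3.4]
w, mu, var, posteriors = fit_two_component_gmm(height_diffs, init_means=[0.0, 3.0])

# Component 0 was initialised near zero, so it is treated as "ground" here.
is_ground = [p[0] > 0.5 for p in posteriors]
```

Working with plain densities keeps the sketch short; with widely separated data or many points, computing the E-step in the log domain would be the more robust choice.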



Author(s):  
Zachary R. McCaw ◽  
Hanna Julienne ◽  
Hugues Aschard

Although missing data are prevalent in applications, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that, when the underlying distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than state-of-the-art imputation followed by a standard GMM. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty even when the generative distribution is not a GMM. This assessment may be used to identify unassignable observations. MGMM is available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.



2018 ◽  
Author(s):  
Bryan C. Souza ◽  
Vítor Lopes-dos-Santos ◽  
João Bacelo ◽  
Adriano B. L. Tort

The shape of extracellularly recorded action potentials is a product of several variables, such as the biophysical and anatomical properties of the neuron and the relative position of the electrode. This allows for isolating spikes of different neurons recorded in the same channel into clusters based on waveform features. However, correctly classifying spike waveforms into their underlying neuronal sources remains a main challenge. This process, called spike sorting, typically consists of two steps: (1) extracting relevant waveform features (e.g., height, width), and (2) clustering them into non-overlapping groups believed to correspond to different neurons. In this study, we explored the performance of Gaussian mixture models (GMMs) in these two steps. We extracted relevant waveform features using a combination of common techniques (e.g., principal components and wavelets) and GMM fitting parameters (e.g., standard deviations and peak distances). Then, we developed an approach to perform unsupervised clustering using GMMs, which estimates cluster properties in a data-driven way. Our results show that the proposed GMM-based framework outperforms previously established methods when using realistic simulations of extracellular spikes and actual extracellular recordings to evaluate sorting performance. We also discuss potentially better techniques for feature extraction than the widely used principal components. Finally, we provide a friendly graphical user interface in MATLAB to run our algorithm, which allows for manual adjustment of the automatic results.



2022 ◽  
pp. 27-50
Author(s):  
Rajalaxmi Prabhu B. ◽  
Seema S.

A lot of user-generated data is available these days from large platforms, blogs, websites, and other review sites. These data are usually unstructured. Analyzing sentiments from these data automatically is considered an important challenge. Several machine learning algorithms are implemented to check the opinions from large data sets. Much research has been undertaken to understand machine learning approaches for analyzing sentiments. Machine learning mainly depends on the data required for model building, and hence suitable feature extraction techniques also need to be carried out. In this chapter, several deep learning approaches, their challenges, and future issues will be addressed. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze deep learning techniques for predicting sentiments and to understand the importance of several approaches for mining opinions and determining sentiment polarity.


