Gaussian Mixture Models Based on Principal Components and Applications

Mathematical Problems in Engineering ◽

10.1155/2020/1202307 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Nada A. Alqahtani ◽

Zakiah I. Kalantan

Keyword(s):

Mixture Model ◽

Principal Components ◽

Gaussian Mixture Models ◽

Expectation Maximization Algorithm ◽

Large Data ◽

Real Data ◽

R Package ◽

Gaussian Mixture ◽

Machine Learning Algorithms ◽

Data Sets

Data scientists use various machine learning algorithms to discover patterns in large data that can lead to actionable insights. In general, high-dimensional data are reduced by obtaining a set of principal components so as to highlight similarities and differences. In this work, we handle the reduced data with a bivariate mixture model, learning a bivariate Gaussian mixture model. We discuss a heuristic for detecting important components by choosing the initial values of the location parameters using two different techniques: cluster means obtained from k-means and hierarchical clustering, and the default values in the “mixtools” R package. The parameters of the model are obtained via an expectation maximization algorithm. Criteria from the Bayesian point of view are evaluated for both techniques, demonstrating that both techniques are efficient in terms of computational capacity. The effectiveness of the discussed techniques is demonstrated through a simulation study and using real data sets from different fields.
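The role of the initial location parameters in EM can be illustrated with a minimal Python sketch. It is not the authors' implementation: it is univariate rather than bivariate, it does not use the “mixtools” package, and the helper names (`split_means`, `em_gmm_1d`, `bic`) are hypothetical. The cluster-mean initialisation is approximated by averaging contiguous chunks of the sorted data, as a stand-in for k-means or hierarchical clustering, and the criterion is assumed to be the standard BIC.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def split_means(data, k):
    """Crude stand-in for cluster means: average k contiguous chunks of the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    return [sum(c) / len(c) for c in chunks]


def em_gmm_1d(data, init_means, n_iter=200, tol=1e-8):
    """Fit a univariate Gaussian mixture by EM, starting from the given means.

    Returns weights, means, variances and the log-likelihood from the last E-step.
    """
    k, n = len(init_means), len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [overall_var] * k  # assumption: start every component at the pooled variance
    prev_ll, ll = -math.inf, -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each observation.
        resp, ll = [], 0.0
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsing component
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances, ll


def bic(log_likelihood, k, n):
    """Standard BIC for a univariate k-component mixture (3k - 1 free parameters)."""
    return -2.0 * log_likelihood + (3 * k - 1) * math.log(n)


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    sample += [random.gauss(6.0, 1.0) for _ in range(200)]
    for k in (1, 2, 3):
        _, _, _, ll = em_gmm_1d(sample, split_means(sample, k))
        print(k, bic(ll, k, len(sample)))
```

Comparing the BIC values across candidate numbers of components, or across initialisation strategies, is one way such a comparison could be set up; lower values are preferred.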

Mixture Model Approach for Acoustic Emission Control of Cylindrical Pressure Equipment

Journal of Pressure Vessel Technology ◽

10.1115/1.2222377 ◽

2005 ◽

Vol 128 (3) ◽

pp. 479-483

Author(s):

Hani Hamdan ◽

Gérard Govaert

Keyword(s):

Acoustic Emission ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Expectation Maximization ◽

Gaussian Mixture Models ◽

Expectation Maximization Algorithm ◽

Real Data ◽

Gaussian Mixture ◽

Mixture Model Approach ◽

Model Approach

In this paper, we present a new and original mixture model approach for acoustic emission (AE) data clustering. AE techniques have been used in a variety of applications in industrial plants. These techniques can provide the most sophisticated monitoring test and can generally be carried out with the plant/pressure equipment operating under several conditions. Since the AE clusters may present several constraints (different proportions, volumes, orientations, and shapes), we propose to base the AE cluster analysis on Gaussian mixture models, which are a powerful approach in such situations. Furthermore, the diagonal Gaussian mixture model seems to be well adapted to the detection and monitoring of defect classes, since the welds of cylindrical pressure equipment are elongated horizontally and vertically (cluster shapes elongated along the axes). The EM (Expectation-Maximization) algorithm applied to a diagonal Gaussian mixture model provides a satisfactory solution, but the real-time constraints imposed in our problem make the application of this algorithm impossible if the number of points becomes too large. The solution that we propose is to use the CEM (Classification Expectation-Maximization) algorithm, which converges faster and generates comparable solutions in terms of the resulting partition. The practical results on real data are very satisfactory from the experts' point of view.
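The difference between EM and CEM is the extra classification step: each point is assigned outright to its most probable component, and the parameters are then re-estimated from that hard partition. A minimal sketch for a diagonal Gaussian mixture follows. The function names, the unit starting variances and the stopping rule are assumptions made for illustration, not details taken from the paper.

```python
import math


def log_diag_gauss(point, mean, var):
    """Log-density of a Gaussian with diagonal covariance (one variance per axis)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
        for x, m, v in zip(point, mean, var)
    )


def cem_diagonal(points, init_means, n_iter=100, min_var=1e-6):
    """Classification EM for a diagonal Gaussian mixture.

    Returns (weights, means, variances, labels). Stops when the partition
    no longer changes or after n_iter iterations.
    """
    k, n, dim = len(init_means), len(points), len(points[0])
    weights = [1.0 / k] * k
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]  # assumption: unit starting variances
    labels = None
    for _ in range(n_iter):
        # E-step and C-step combined: hard-assign each point to the component
        # with the largest weighted log-density (equivalently, largest posterior).
        new_labels = []
        for p in points:
            scores = [
                math.log(weights[j]) + log_diag_gauss(p, means[j], variances[j])
                for j in range(k)
            ]
            new_labels.append(max(range(k), key=lambda j: scores[j]))
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # M-step: re-estimate each component from the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # empty cluster: keep its previous parameters
            weights[j] = len(members) / n
            for d in range(dim):
                mean_d = sum(p[d] for p in members) / len(members)
                var_d = sum((p[d] - mean_d) ** 2 for p in members) / len(members)
                means[j][d] = mean_d
                variances[j][d] = max(var_d, min_var)
    return weights, means, variances, labels
```

Because the M-step only needs per-cluster sums over a hard partition, each iteration is cheap and the loop ends as soon as the labels stop changing, which is consistent with the faster convergence the abstract reports.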

Regularized Parameter Estimation in High-Dimensional Gaussian Mixture Models

Neural Computation ◽

10.1162/neco_a_00128 ◽

2011 ◽

Vol 23 (6) ◽

pp. 1605-1622 ◽

Cited By ~ 12

Author(s):

Lingyan Ruan ◽

Ming Yuan ◽

Hui Zou

Keyword(s):

Parameter Estimation ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Expectation Maximization Algorithm ◽

Real Data ◽

Gaussian Mixture ◽

High Dimensional ◽

Model Based Clustering ◽

Text Type ◽

Effective Dimensionality

Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficulty. The [Formula: see text]-type penalty we impose on the inverse covariance matrices encourages sparsity in their entries and therefore helps to reduce the effective dimensionality of the problem. We show that the proposed estimate can be efficiently computed using an expectation-maximization algorithm. To illustrate the practical merits of the proposed method, we consider its applications in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for high-dimensional data analysis.

Cluster Weighted Model Based on TSNE Algorithm for High-Dimensional Data

10.21203/rs.3.rs-347795/v1 ◽

2021 ◽

Author(s):

Kehinde Olobatuyi

Keyword(s):

Mixture Models ◽

Dimensional Space ◽

High Dimensional Data ◽

Expectation Maximization Algorithm ◽

Real Data ◽

R Package ◽

High Dimensional ◽

Data Sets ◽

Dimensionality Reduction Technique ◽

Weighted Model

Similar to many machine learning models, both the accuracy and the speed of cluster weighted models (CWMs) can be hampered by high-dimensional data, which has led to previous work on parsimonious techniques to reduce the effect of the “curse of dimensionality” on mixture models. In this work, we review the background of CWMs. We further show that a parsimonious technique is not sufficient for mixture models to thrive in the presence of very large high-dimensional data. We discuss a heuristic for detecting the hidden components by choosing the initial values of the location parameters using the default values in the “FlexCWM” R package. We introduce a dimensionality reduction technique called t-distributed stochastic neighbor embedding (TSNE) to enhance the parsimonious CWMs in high-dimensional space. Originally, CWMs are suited for regression, but for classification purposes all multi-class variables are transformed logarithmically with some noise. The parameters of the model are obtained via an expectation maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.

PCA reduced Gaussian mixture models with applications in superresolution

Inverse Problems and Imaging ◽

10.3934/ipi.2021053 ◽

2021

Author(s):

Johannes Hertrich ◽

Dang-Phuong-Lan Nguyen ◽

Jean-Francois Aujol ◽

Dominique Bernard ◽

Yannick Berthoumieu ◽

...

Keyword(s):

Mixture Model ◽

Optimization Problems ◽

Rapid Development ◽

Gaussian Mixture Models ◽

Principal Component ◽

Gaussian Mixture ◽

Data Sets ◽

Constrained Problems ◽

Dimensional Parameters ◽

Low Dimensional

Despite the rapid development of computational hardware, the treatment of large and high dimensional data sets is still a challenging problem. The contribution of this paper to the topic is twofold. First, we propose a Gaussian mixture model in conjunction with a reduction of the dimensionality of the data in each component of the model by principal component analysis, which we call PCA-GMM. To learn the (low dimensional) parameters of the mixture model we propose an EM algorithm whose M-step requires the solution of constrained optimization problems. Fortunately, these constrained problems do not depend on the usually large number of samples and can be solved efficiently by an (inertial) proximal alternating linearized minimization algorithm. Second, we apply our PCA-GMM for the superresolution of 2D and 3D material images based on the approach of Sandeep and Jacob. Numerical results confirm the moderate influence of the dimensionality reduction on the overall superresolution result.

Data Assimilation with Gaussian Mixture Models Using the Dynamically Orthogonal Field Equations. Part I: Theory and Scheme

Monthly Weather Review ◽

10.1175/mwr-d-11-00295.1 ◽

2013 ◽

Vol 141 (6) ◽

pp. 1737-1760 ◽

Cited By ~ 31

Author(s):

Thomas Sondergaard ◽

Pierre F. J. Lermusiaux

Keyword(s):

Data Assimilation ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Expectation Maximization Algorithm ◽

Planck Equation ◽

Information Criterion ◽

Gaussian Mixture ◽

Field Equations ◽

Dynamical Equations ◽

Prior Probabilities

This work introduces and derives an efficient, data-driven assimilation scheme, focused on a time-dependent stochastic subspace that respects nonlinear dynamics and captures non-Gaussian statistics as it occurs. The motivation is to obtain a filter that is applicable to realistic geophysical applications, but that also rigorously utilizes the governing dynamical equations with information theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of classical filters, the underlying theory and algorithmic implementation of the new filter are developed and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric Gaussian Mixture Models (GMMs) using the Expectation-Maximization algorithm and the Bayesian Information Criterion. Bayes’s law is then efficiently carried out analytically within the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example. Variations of the GMM-DO filter are also provided along with comparisons with related schemes.
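For orientation, the Bayesian Information Criterion in its standard form is written out below. This is the textbook definition, not a formula quoted from the paper. The symbols are assumptions introduced here: $n$ realizations, a subspace of dimension $d$, $M$ mixture components with unrestricted covariance matrices, and $\hat{L}_M$ the maximized likelihood returned by EM.

```latex
\mathrm{BIC}(M) = -2 \ln \hat{L}_M + k_M \ln n,
\qquad
k_M = \underbrace{(M - 1)}_{\text{weights}}
    + \underbrace{M d}_{\text{means}}
    + \underbrace{M \, \frac{d(d+1)}{2}}_{\text{covariances}}
```

Under this convention the number of components would be chosen as the $M$ that minimizes $\mathrm{BIC}(M)$, trading goodness of fit against the number of free parameters $k_M$.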

Gaussian Mixture Model of Ground Filtering Based on Hierarchical Curvature Constraints for Airborne Lidar Point Clouds

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.20-00080 ◽

2021 ◽

Vol 87 (9) ◽

pp. 615-630

Author(s):

Longjie Ye ◽

Ka Zhang ◽

Wen Xiao ◽

Yehua Sheng ◽

Dong Su ◽

...

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Expectation Maximization ◽

Latent Variables ◽

Expectation Maximization Algorithm ◽

Point Clouds ◽

Gaussian Mixture ◽

Airborne Lidar ◽

Height Difference ◽

Hierarchical Constraints

This paper proposes a ground filtering method based on a Gaussian mixture model with hierarchical curvature constraints. Firstly, the thin plate spline function is iteratively applied to interpolate the reference surface. Secondly, a gradually changing grid size and curvature threshold are used to construct hierarchical constraints. Finally, an adaptive height difference classifier based on the Gaussian mixture model is proposed. Using the latent variables obtained by the expectation-maximization algorithm, the posterior probability of each point is computed. As a result, ground and objects can be marked separately according to the calculated probability. Fifteen data samples provided by the International Society for Photogrammetry and Remote Sensing are used to verify the proposed method, which is also compared with eight classical filtering algorithms. Experimental results demonstrate that the average total error and average Cohen's kappa coefficient of the proposed method are 6.91% and 80.9%, respectively. In general, it has better performance in areas with terrain discontinuities and bridges.
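The final classification step can be sketched as follows, assuming a two-component univariate mixture over height differences has already been fitted. The parameter values, the 0.5 threshold and the function names are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def ground_posterior(height_diff, ground, obj):
    """Posterior probability that a point belongs to the ground component.

    `ground` and `obj` are (weight, mean, std) tuples of an already fitted mixture.
    """
    g = ground[0] * normal_pdf(height_diff, ground[1], ground[2])
    o = obj[0] * normal_pdf(height_diff, obj[1], obj[2])
    return g / (g + o)


def classify(height_diff, ground, obj, threshold=0.5):
    """Label a point by thresholding its ground posterior (threshold is an assumption)."""
    return "ground" if ground_posterior(height_diff, ground, obj) >= threshold else "object"


# Hypothetical fitted parameters: (weight, mean, std), heights in metres.
GROUND = (0.6, 0.0, 0.2)
OBJECT = (0.4, 3.0, 1.5)

if __name__ == "__main__":
    for h in (0.05, 0.8, 3.0):
        print(h, classify(h, GROUND, OBJECT))
```

The posterior here is the same quantity as the responsibility computed in the E-step of EM, so no additional model is needed once the mixture has been fitted.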

MGMM: An R Package for fitting Gaussian Mixture Models on Incomplete Data

10.1101/2019.12.20.884551 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zachary R. McCaw ◽

Hanna Julienne ◽

Hugues Aschard

Keyword(s):

Missing Data ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Model Fitting ◽

Simulated Data ◽

R Package ◽

Gaussian Mixture ◽

Parameter Estimates ◽

Cluster Assignment ◽

Underlying Distribution

Although missing data are prevalent in applications, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that, when the underlying distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than state-of-the-art imputation followed by a standard GMM. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty even when the generative distribution is not a GMM. This assessment may be used to identify unassignable observations. MGMM is available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.
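One way to see why a mixture model can use incomplete rows directly is that cluster responsibilities can be computed from the observed coordinates alone. The Python sketch below illustrates that idea under a simplifying assumption of diagonal covariances, where integrating out a missing coordinate just drops its factor. It is not the MGMM package's API or algorithm, and the function names are hypothetical.

```python
import math


def observed_log_density(point, mean, var):
    """Log-density of a diagonal-covariance Gaussian over observed coordinates only.

    Missing coordinates are marked with None. With a diagonal covariance,
    marginalising them out simply removes their terms from the sum.
    """
    total = 0.0
    for x, m, v in zip(point, mean, var):
        if x is None:
            continue
        total += -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
    return total


def responsibilities(point, weights, means, variances):
    """Posterior cluster probabilities for a possibly incomplete observation."""
    logs = [
        math.log(w) + observed_log_density(point, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    norm = sum(exps)
    return [e / norm for e in exps]
```

A row with every coordinate missing carries no information, so its responsibilities fall back to the mixture weights; values near the weights for partially observed rows are one possible signal of an observation that cannot be confidently assigned.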

Spike sorting with Gaussian mixture models

10.1101/248864 ◽

2018 ◽

Cited By ~ 2

Author(s):

Bryan C. Souza ◽

Vítor Lopes-dos-Santos ◽

João Bacelo ◽

Adriano B. L. Tort

Keyword(s):

Mixture Models ◽

Principal Components ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Spike Sorting ◽

Main Challenge ◽

Manual Adjustment ◽

Cluster Properties ◽

Spike Waveforms ◽

Friendly Graphical User Interface

The shape of extracellularly recorded action potentials is a product of several variables, such as the biophysical and anatomical properties of the neuron and the relative position of the electrode. This allows for isolating spikes of different neurons recorded in the same channel into clusters based on waveform features. However, correctly classifying spike waveforms into their underlying neuronal sources remains a main challenge. This process, called spike sorting, typically consists of two steps: (1) extracting relevant waveform features (e.g., height, width), and (2) clustering them into non-overlapping groups believed to correspond to different neurons. In this study, we explored the performance of Gaussian mixture models (GMMs) in these two steps. We extracted relevant waveform features using a combination of common techniques (e.g., principal components and wavelets) and GMM fitting parameters (e.g., standard deviations and peak distances). Then, we developed an approach to perform unsupervised clustering using GMMs, which estimates cluster properties in a data-driven way. Our results show that the proposed GMM-based framework outperforms previously established methods when using realistic simulations of extracellular spikes and actual extracellular recordings to evaluate sorting performance. We also discuss potentially better techniques for feature extraction than the widely used principal components. Finally, we provide a friendly graphical user interface in MATLAB to run our algorithm, which allows for manual adjustment of the automatic results.

Deep Learning Approaches for Sentiment Analysis Challenges and Future Issues

10.4018/978-1-7998-8161-2.ch003 ◽

2022 ◽

pp. 27-50

Author(s):

Rajalaxmi Prabhu B. ◽

Seema S.

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Model Building ◽

Large Data ◽

Machine Learning Algorithms ◽

Large Data Sets ◽

Data Sets ◽

Learning Approaches ◽

Learning Techniques ◽

Important Challenge

A large amount of user-generated data is available these days from large platforms, blogs, websites, and other review sites. These data are usually unstructured. Analyzing sentiments from these data automatically is considered an important challenge. Several machine learning algorithms have been implemented to extract opinions from large data sets. Much research has been undertaken to understand machine learning approaches for analyzing sentiments. Machine learning depends mainly on the data required for model building, and hence suitable feature extraction techniques also need to be carried out. In this chapter, several deep learning approaches, their challenges, and future issues are addressed. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze deep learning techniques for predicting sentiments and to understand the importance of several approaches for mining opinions and determining sentiment polarity.

A Gaussian Mixture Model with Firm Expectation-Maximization Algorithm for Effective Signal Power Coverage Estimation

Communications in Computer and Information Science - Information and Communication Technology and Applications ◽

10.1007/978-3-030-69143-1_8 ◽

2021 ◽

pp. 93-106

Author(s):

Isabona Joseph ◽

Ojuh O. Divine

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Expectation Maximization ◽

Expectation Maximization Algorithm ◽

Gaussian Mixture ◽

Signal Power ◽

Effective Signal ◽

Coverage Estimation