Finite mixture models and model-based clustering

2010 ◽  
Vol 4 (0) ◽  
pp. 80-116 ◽  
Author(s):  
Volodymyr Melnykov ◽  
Ranjan Maitra


Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 185
Author(s):  
Wanli Zhang ◽  
Yanming Di

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling—called MCLUST-ME—that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we discovered that under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data, compared with when errors are simply ignored, and that the degree of improvement depends on factors such as the distribution of the error covariance matrices.
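The core idea of the abstract—that each observation's density is a convolution of a Gaussian component with its own measurement-error distribution, so that the component covariance is inflated by an observation-specific error covariance—can be sketched in a few lines. This is a minimal illustration of that mechanism, not the authors' MCLUST-ME implementation; the function name and the example parameters are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify_with_errors(y, S, means, covs, weights):
    """Posterior component probabilities for one observation y whose
    estimation-error covariance S is known: each component density is
    N(y; mu_k, Sigma_k + S), so the effective classification boundary
    depends on S (a sketch of the abstract's idea, not MCLUST-ME itself)."""
    dens = np.array([
        w * multivariate_normal.pdf(y, mean=m, cov=c + S)
        for w, m, c in zip(weights, means, covs)
    ])
    return dens / dens.sum()

# Two illustrative components; the same point classified under a small
# and a large error covariance.
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
weights = [0.5, 0.5]
y = np.array([1.6, 1.6])

p_small = classify_with_errors(y, 0.1 * np.eye(2), means, covs, weights)
p_large = classify_with_errors(y, 5.0 * np.eye(2), means, covs, weights)
```

With a large error covariance the posterior probabilities are pulled toward the mixing weights, so the assignment of a borderline point becomes less confident—the behavior that makes the grouping differ from error-ignoring MCLUST.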


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1503
Author(s):  
Shunki Kyoya ◽  
Kenji Yamanishi

Finite mixture models are widely used for modeling and clustering data. When they are used for clustering, they are often interpreted by regarding each component as one cluster. However, this assumption may be invalid when the components overlap, which raises the issue of analyzing such overlaps in order to understand the models correctly. The primary purpose of this paper is to establish a theoretical framework for interpreting overlapping mixture models by estimating how they overlap, using measures of information such as entropy and mutual information. This is achieved by merging components so that multiple components are regarded as one cluster, and by summarizing the merging results. First, we propose three conditions that any merging criterion should satisfy. Then, we investigate whether several existing merging criteria satisfy the conditions and modify them to fulfill more conditions. Second, we propose a novel concept named clustering summarization to evaluate the merging results. With it, we can quantify how overlapped and biased the clusters are, using mutual information-based criteria. Using artificial and real datasets, we empirically demonstrate that our methods of modifying criteria and summarizing results are effective for understanding the cluster structures. We therefore give a new view of interpretability/explainability for model-based clustering.
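The merging step the abstract describes—treating several components as one cluster and judging the result with an information measure—can be illustrated with posterior assignment probabilities. The sketch below uses the classical entropy-of-assignments criterion (in the spirit of earlier entropy-based merging work); it is an assumption-laden simplification, not the paper's modified criteria or its clustering summarization.

```python
import numpy as np

def soft_entropy(tau):
    """Mean entropy of the posterior assignment probabilities (rows of tau);
    lower values mean more confident, less overlapping clusters."""
    t = np.clip(tau, 1e-12, 1.0)
    return float(-(t * np.log(t)).sum(axis=1).mean())

def merge(tau, j, k):
    """Merge components j and k (j < k) by summing their posterior columns,
    so the pair is regarded as a single cluster."""
    merged = np.delete(tau, k, axis=1)
    merged[:, j] = tau[:, j] + tau[:, k]
    return merged

# Illustrative posteriors: components 0 and 1 overlap heavily (mass split
# between them), while component 2 is well separated.
tau = np.array([
    [0.50, 0.45, 0.05],
    [0.45, 0.50, 0.05],
    [0.48, 0.47, 0.05],
    [0.02, 0.03, 0.95],
])
h_before = soft_entropy(tau)
h_after = soft_entropy(merge(tau, 0, 1))
```

Merging the two overlapping components sharply reduces the assignment entropy, which is why entropy-style measures are natural candidates for deciding which components belong to the same cluster.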


Risks ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 115
Author(s):  
Despoina Makariou ◽  
Pauline Barrieu ◽  
George Tzougas

The key purpose of this paper is to present an alternative viewpoint for combining expert opinions based on finite mixture models. In particular, the components of the mixture are not assumed to come from the same parametric family. This approach enables the agent to make informed decisions about the uncertain quantity of interest in a flexible manner that accounts for multiple sources of heterogeneity in the experts' opinions: the parametric family, the parameters of each component density, and the mixing weights. Finally, the proposed models are employed for numerically computing quantile-based risk measures in a collective decision-making context.
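The final step the abstract mentions—numerically computing a quantile-based risk measure from a mixture whose components belong to different parametric families—can be sketched by inverting the mixture CDF. The two "expert" distributions, their parameters, and the weights below are purely illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.stats import norm, lognorm
from scipy.optimize import brentq

# Hypothetical pooled opinion: one expert's view modeled as a normal,
# another's as a lognormal, combined with illustrative mixing weights.
weights = [0.6, 0.4]
components = [norm(loc=10.0, scale=2.0), lognorm(s=0.5, scale=12.0)]

def mixture_cdf(x):
    """CDF of the mixture is the weighted sum of the component CDFs."""
    return sum(w * c.cdf(x) for w, c in zip(weights, components))

def mixture_quantile(p, lo=-50.0, hi=500.0):
    """Invert the mixture CDF by root-finding to obtain a quantile-based
    risk measure (e.g., Value-at-Risk at level p)."""
    return brentq(lambda x: mixture_cdf(x) - p, lo, hi)

var_95 = mixture_quantile(0.95)
```

Because a cross-family mixture generally has no closed-form quantile function, root-finding on the CDF is a standard, robust way to evaluate such risk measures.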


2021 ◽  
Vol 31 (1) ◽  
Author(s):  
Javier Juan-Albarracín ◽  
Elies Fuster-Garcia ◽  
Alfons Juan ◽  
Juan M. García-Gómez
