Robust Clustering Method in the Presence of Scattered Observations

2016 · Vol 28 (6) · pp. 1141-1162
Author(s): Akifumi Notsu, Shinto Eguchi

Contamination by scattered observations, which are either featureless or unlike the other observations, frequently degrades the performance of standard methods such as K-means and model-based clustering. In this letter, we propose a robust clustering method in the presence of scattered observations called Gamma-clust. Gamma-clust is based on robust estimation of cluster centers using the gamma-divergence. It provides a proper solution for clustering in which the distributions of the clustered data are nonnormal, such as t-distributions with different variance-covariance matrices and degrees of freedom. As demonstrated in a simulation study and data analysis, Gamma-clust is more flexible and provides superior results compared to robustified K-means and model-based clustering.
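The core idea, downweighting observations far from a cluster center so that scattered points barely influence the estimate, can be illustrated with a minimal sketch. This is not the authors' Gamma-clust implementation: the function name, the fixed-point iteration, and the median-based scale choice are assumptions for illustration only.

```python
import numpy as np

def robust_center(X, gamma=0.5, n_iter=50):
    """Gamma-divergence-style robust center estimate (illustrative sketch):
    points far from the current center receive exponentially small weights,
    so scattered observations contribute almost nothing to the update."""
    mu = X.mean(axis=0)
    for _ in range(n_iter):
        d2 = ((X - mu) ** 2).sum(axis=1)
        # Hypothetical scale choice: normalize by the median squared distance.
        w = np.exp(-0.5 * gamma * d2 / np.median(d2))
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
    return mu

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),     # one tight cluster near the origin
               rng.uniform(-10, 10, size=(10, 2))])  # scattered observations
print(robust_center(X, gamma=0.5))  # stays near the cluster, unlike X.mean(axis=0)
```

With `gamma = 0` every weight is 1 and the update reduces to the ordinary mean; larger `gamma` discounts outlying points more aggressively.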

2018 · Vol 29 (4) · pp. 791-819
Author(s): Michael Fop, Thomas Brendan Murphy, Luca Scrucca

Genes · 2020 · Vol 11 (2) · pp. 185
Author(s): Wanli Zhang, Yanming Di

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling—called MCLUST-ME—that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we discovered that under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data, compared with when errors are simply ignored, and that the degree of improvement depends on factors such as the distribution of error covariance matrices.
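The per-observation boundary can be sketched as an E-step in which component k's covariance is inflated by observation i's error covariance S[i]. This is a hypothetical illustration of the idea described in the abstract, not the MCLUST-ME code; the function name and signature are assumptions.

```python
import numpy as np

def responsibilities(X, S, means, covs, weights):
    """E-step sketch for a measurement-error-aware Gaussian mixture:
    observation i under component k has covariance covs[k] + S[i],
    so each unique error covariance yields its own classification boundary."""
    n, K = len(X), len(means)
    logp = np.empty((n, K))
    for k in range(K):
        for i in range(n):
            C = covs[k] + S[i]              # component covariance + error covariance
            diff = X[i] - means[k]
            _, logdet = np.linalg.slogdet(C)
            quad = diff @ np.linalg.solve(C, diff)
            logp[i, k] = np.log(weights[k]) - 0.5 * (logdet + quad)
    logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Same point, two different error covariances: when the components have
# unequal covariances, the posterior classification changes with S.
means = np.array([[0.0, 0.0], [3.0, 0.0]])
covs = [np.eye(2), 4 * np.eye(2)]
x = np.array([[1.5, 0.0]])
r_small = responsibilities(x, [0.01 * np.eye(2)], means, covs, [0.5, 0.5])
r_large = responsibilities(x, [10.0 * np.eye(2)], means, covs, [0.5, 0.5])
print(r_small, r_large)
```

In an ordinary Gaussian mixture (all S[i] equal) the boundary is shared by every observation; here a noisier measurement is classified less decisively, which is the mechanism behind the "different grouping from MCLUST" noted above.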


2009 · Vol 3 (0) · pp. 1473-1496
Author(s): Hui Zhou, Wei Pan, Xiaotong Shen

2021 · pp. 133-178
Author(s): Magy Seif El-Nasr, Truong Huy Nguyen Dinh, Alessandro Canossa, Anders Drachen

This chapter discusses different clustering methods and their application to game data. In particular, the chapter details K-means, Fuzzy C-Means, Hierarchical Clustering, Archetypal Analysis, and Model-based clustering techniques. It weighs the advantages and disadvantages of each method and discusses when you might prefer one over another. It also shows ways to visualize the results to make sense of the resulting clusters, and details how to evaluate such clusters and apply the algorithms to a game dataset. The chapter includes labs to delve deeper into the application of these algorithms on real game data.
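Of the methods listed, K-means is the simplest to state: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points. A minimal sketch of this standard Lloyd's algorithm (not the chapter's lab code) on synthetic data resembling two player segments:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                 # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated synthetic "player segments".
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(20, 2)),
               rng.normal(10, 1, size=(20, 2))])
labels, centers = kmeans(X, 2)
print(labels)
```

The other methods in the chapter relax different parts of this loop: Fuzzy C-Means replaces the hard `argmin` assignment with soft memberships, and model-based clustering replaces the distance criterion with component likelihoods.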

