Robust Clustering Method in the Presence of Scattered Observations

2016 · Vol 28 (6) · pp. 1141-1162
Author(s): Akifumi Notsu, Shinto Eguchi

Contamination by scattered observations, which are either featureless or unlike the other observations, frequently degrades the performance of standard methods such as K-means and model-based clustering. In this letter, we propose a robust clustering method in the presence of scattered observations called Gamma-clust. Gamma-clust is based on robust estimation of cluster centers using the gamma-divergence. It provides a proper solution for clustering in which the distributions of the clustered data are nonnormal, such as t-distributions with different variance-covariance matrices and degrees of freedom. As demonstrated in a simulation study and data analysis, Gamma-clust is more flexible and provides superior results compared to robustified K-means and model-based clustering.
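The core idea, downweighting observations far from a cluster center so that scattered points barely influence the estimate, can be illustrated with a minimal sketch. This is not the authors' Gamma-clust implementation: the function name, the fixed-point iteration, and the median-based scale choice are assumptions for illustration only.

```python
import numpy as np

def robust_center(X, gamma=0.5, n_iter=50):
    """Gamma-divergence-style robust center estimate (illustrative sketch):
    points far from the current center receive exponentially small weights,
    so scattered observations contribute almost nothing to the update."""
    mu = X.mean(axis=0)
    for _ in range(n_iter):
        d2 = ((X - mu) ** 2).sum(axis=1)
        # Hypothetical scale choice: normalize by the median squared distance.
        w = np.exp(-0.5 * gamma * d2 / np.median(d2))
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
    return mu

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),     # one tight cluster near the origin
               rng.uniform(-10, 10, size=(10, 2))])  # scattered observations
print(robust_center(X, gamma=0.5))  # stays near the cluster, unlike X.mean(axis=0)
```

With `gamma = 0` every weight is 1 and the update reduces to the ordinary mean; larger `gamma` discounts outlying points more aggressively.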

2018 · Vol 29 (4) · pp. 791-819
Author(s): Michael Fop, Thomas Brendan Murphy, Luca Scrucca

Genes · 2020 · Vol 11 (2) · pp. 185
Author(s): Wanli Zhang, Yanming Di

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling—called MCLUST-ME—that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we discovered that under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data, compared with when errors are simply ignored, and that the degree of improvement depends on factors such as the distribution of error covariance matrices.
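The per-observation boundary can be sketched as an E-step in which component k's covariance is inflated by observation i's error covariance S[i]. This is a hypothetical illustration of the idea described in the abstract, not the MCLUST-ME code; the function name and signature are assumptions.

```python
import numpy as np

def responsibilities(X, S, means, covs, weights):
    """E-step sketch for a measurement-error-aware Gaussian mixture:
    observation i under component k has covariance covs[k] + S[i],
    so each unique error covariance yields its own classification boundary."""
    n, K = len(X), len(means)
    logp = np.empty((n, K))
    for k in range(K):
        for i in range(n):
            C = covs[k] + S[i]              # component covariance + error covariance
            diff = X[i] - means[k]
            _, logdet = np.linalg.slogdet(C)
            quad = diff @ np.linalg.solve(C, diff)
            logp[i, k] = np.log(weights[k]) - 0.5 * (logdet + quad)
    logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Same point, two different error covariances: when the components have
# unequal covariances, the posterior classification changes with S.
means = np.array([[0.0, 0.0], [3.0, 0.0]])
covs = [np.eye(2), 4 * np.eye(2)]
x = np.array([[1.5, 0.0]])
r_small = responsibilities(x, [0.01 * np.eye(2)], means, covs, [0.5, 0.5])
r_large = responsibilities(x, [10.0 * np.eye(2)], means, covs, [0.5, 0.5])
print(r_small, r_large)
```

In an ordinary Gaussian mixture (all S[i] equal) the boundary is shared by every observation; here a noisier measurement is classified less decisively, which is the mechanism behind the "different grouping from MCLUST" noted above.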


2009 · Vol 3 (0) · pp. 1473-1496
Author(s): Hui Zhou, Wei Pan, Xiaotong Shen

2021 · pp. 133-178
Author(s): Magy Seif El-Nasr, Truong Huy Nguyen Dinh, Alessandro Canossa, Anders Drachen

This chapter discusses different clustering methods and their application to game data. In particular, the chapter details K-means, Fuzzy C-Means, Hierarchical Clustering, Archetypal Analysis, and Model-based clustering techniques. It weighs the advantages and disadvantages of each method and discusses when you might prefer one over another. It also shows ways to visualize the results to make sense of the resulting clusters, and details how to evaluate such clusters and apply the algorithms to a game dataset. The chapter includes labs to delve deeper into the application of these algorithms on real game data.
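Of the methods listed, K-means is the simplest to state: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points. A minimal sketch of this standard Lloyd's algorithm (not the chapter's lab code) on synthetic data resembling two player segments:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                 # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated synthetic "player segments".
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(20, 2)),
               rng.normal(10, 1, size=(20, 2))])
labels, centers = kmeans(X, 2)
print(labels)
```

The other methods in the chapter relax different parts of this loop: Fuzzy C-Means replaces the hard `argmin` assignment with soft memberships, and model-based clustering replaces the distance criterion with component likelihoods.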

