Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis

Abstract: A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
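As a rough illustration of the idea in this abstract, the sketch below scores candidate cluster counts with Akaike's information criterion, AIC = -2 log L + 2k, and keeps the count with the smallest score. The parameter count assumes a plain K-component multinomial mixture over C categories, which may differ from the model in the paper; the function names and the log-likelihood values are hypothetical and made up for illustration only.

```python
from typing import Dict


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: -2 log L + 2 k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def multinomial_mixture_params(n_clusters: int, n_categories: int) -> int:
    """Free parameters of an assumed K-component multinomial mixture:
    (K - 1) mixing weights plus K * (C - 1) category probabilities."""
    return (n_clusters - 1) + n_clusters * (n_categories - 1)


def select_n_clusters(log_likelihoods: Dict[int, float], n_categories: int) -> int:
    """Return the candidate cluster count with the lowest AIC."""
    scores = {
        k: aic(ll, multinomial_mixture_params(k, n_categories))
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)


# Hypothetical maximized log-likelihoods per cluster count (not real results).
example = {1: -520.0, 2: -480.0, 3: -477.0, 4: -474.5}
print(select_n_clusters(example, n_categories=4))  # prints 2
```

In this made-up example the log-likelihood keeps improving as clusters are added, but beyond two clusters the gain no longer offsets the 2k penalty, so AIC settles on two.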


Robust mixture model cluster analysis using adaptive kernels

Journal of Applied Statistics ◽

10.1080/02664763.2012.740630 ◽

2013 ◽

Vol 40 (2) ◽

pp. 320-336 ◽

Cited By ~ 3

Author(s):

J. Andrew Howe ◽

Hamparsum Bozdogan

Keyword(s):

Cluster Analysis ◽

Mixture Model ◽

Model Cluster


ABOUT PARAMETRIZATION OF SELECTION OF SIGNIFICANT CLUSTERS

Interexpo GEO-Siberia ◽

10.33764/2618-981x-2019-4-1-64-67 ◽

2019 ◽

Vol 4 (1) ◽

pp. 64-67

Author(s):

Pavel Kim

Keyword(s):

Cluster Analysis ◽

A Priori ◽

Multidimensional Data ◽

Number Of Clusters ◽

Clustering Model ◽

Measure Of Similarity ◽

Selection Of

One of the fundamental tasks of cluster analysis is partitioning multidimensional data samples into clusters: groups of objects that are close in the sense of some given measure of similarity. In some problems the number of clusters is set a priori, but more often it must be determined in the course of clustering. With a large number of clusters, especially if the data are “noisy,” the result becomes difficult for experts to analyze, so the number of clusters under consideration is artificially reduced. Formal means of merging “neighboring” clusters are considered, creating a basis for parameterizing the number of significant clusters in the “natural” clustering model [1].


Mixture-Model Cluster Analysis Using Model Selection Criteria and a New Informational Measure of Complexity

Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach ◽

10.1007/978-94-011-0800-3_3 ◽

1994 ◽

pp. 69-113 ◽

Cited By ~ 52

Author(s):

Hamparsum Bozdogan

Keyword(s):

Cluster Analysis ◽

Model Selection ◽

Mixture Model ◽

Selection Criteria ◽

Model Cluster ◽

Model Selection Criteria


Mixture-model cluster analysis using information theoretical criteria

Intelligent Data Analysis ◽

10.3233/ida-2007-11204 ◽

2007 ◽

Vol 11 (2) ◽

pp. 155-173 ◽

Cited By ~ 57

Author(s):

Jaime R.S. Fonseca ◽

Margarida G.M.S. Cardoso

Keyword(s):

Cluster Analysis ◽

Mixture Model ◽

Model Cluster


SELECTION OF VARIABLES IN MARKETING BINARY DATA CLUSTER ANALYSIS

Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu ◽

10.15611/pn.2018.508.09 ◽

2018 ◽

pp. 89-95

Author(s):

Jerzy Korzeniewski

Keyword(s):

Cluster Analysis ◽

Binary Data ◽

Selection Of Variables ◽

Selection Of


Multi-Attribute Utility Theory Based K-Means Clustering Applications

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2017040101 ◽

2017 ◽

Vol 13 (2) ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Jungmok Ma

Keyword(s):

Cluster Analysis ◽

Utility Theory ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

User Preferences ◽

Number Of Clusters ◽

Clustering Problem ◽

Multi Attribute Utility Theory ◽

Systematic Framework ◽

Selection Of

One of the major obstacles in applying the k-means clustering algorithm is the selection of the number of clusters k. The multi-attribute utility theory (MAUT)-based k-means clustering algorithm is proposed to tackle this problem by incorporating user preferences. Using MAUT, the decision maker's value structure for the number of clusters and other attributes can be modeled quantitatively and used as the objective function of k-means. A target clustering problem for a military targeting process is used to demonstrate the MAUT-based k-means and provide a comparative study. The results show that existing clustering algorithms do not necessarily reflect user preferences, while the MAUT-based k-means provides a systematic framework for modeling preferences in cluster analysis.
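To make the role of user preferences concrete, here is a minimal sketch that picks k by maximizing a weighted additive utility over two attributes: clustering fit and a preference for fewer clusters. It is a simplified stand-in and not the algorithm from the paper; the one-dimensional k-means, the two attributes, the weights, the function names, and the toy data are all assumptions chosen for illustration.

```python
from statistics import mean
from typing import Dict, List, Sequence


def kmeans_1d_sse(points: Sequence[float], k: int, iters: int = 50) -> float:
    """Lloyd's k-means on 1-D data with deterministic quantile seeding.
    Returns the within-cluster sum of squared errors (SSE)."""
    xs = sorted(points)
    n = len(xs)
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the previous center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((x - c) ** 2 for c in centers) for x in xs)


def choose_k(points: Sequence[float], candidates: Sequence[int],
             w_fit: float, w_simplicity: float) -> int:
    """Pick k maximizing w_fit * u_fit + w_simplicity * u_simplicity."""
    total = kmeans_1d_sse(points, 1)  # total sum of squares (assumed nonzero)
    k_min, k_max = min(candidates), max(candidates)
    span = max(k_max - k_min, 1)
    utilities: Dict[int, float] = {}
    for k in candidates:
        u_fit = 1.0 - kmeans_1d_sse(points, k) / total  # 1 means perfect fit
        u_simplicity = (k_max - k) / span               # 1 means fewest clusters
        utilities[k] = w_fit * u_fit + w_simplicity * u_simplicity
    return max(utilities, key=utilities.get)


# Toy data with three visible groups (hypothetical, for illustration only).
data = [0, 1, 2, 10, 11, 12, 30, 31, 32]
print(choose_k(data, (1, 2, 3, 4), w_fit=0.9, w_simplicity=0.1))  # prints 3
print(choose_k(data, (1, 2, 3, 4), w_fit=0.7, w_simplicity=0.3))  # prints 2
```

On the same data, shifting weight from fit toward simplicity changes the selected k from 3 to 2, which is the kind of preference dependence the abstract describes.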


Weighting and selection of variables for cluster analysis

Journal of Classification ◽

10.1007/bf01202271 ◽

1995 ◽

Vol 12 (1) ◽

pp. 113-136 ◽

Cited By ~ 85

Author(s):

R. Gnanadesikan ◽

J. R. Kettenring ◽

S. L. Tsao

Keyword(s):

Cluster Analysis ◽

Selection Of Variables ◽

Selection Of


Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

Psychometrika ◽

10.1007/s11336-007-9019-y ◽

2007 ◽

Vol 73 (1) ◽

pp. 125-144 ◽

Cited By ~ 67

Author(s):

Douglas Steinley ◽

Michael J. Brusco

Keyword(s):

Cluster Analysis ◽

Empirical Comparison ◽

Selection Of Variables ◽

Selection Of


Influence of meteorological input data on backtrajectory cluster analysis – a seven-year study for southeastern Spain

Advances in Science and Research ◽

10.5194/asr-2-65-2008 ◽

2008 ◽

Vol 2 (1) ◽

pp. 65-70 ◽

Cited By ~ 12

Author(s):

M. Cabello ◽

J. A. G. Orza ◽

V. Galiano ◽

G. Ruiz

Keyword(s):

Cluster Analysis ◽

Input Data ◽

Meteorological Data ◽

Data Sets ◽

Number Of Clusters ◽

Southeastern Spain ◽

Meteorological Input ◽

Initial Selection ◽

Southeast Spain ◽

Selection Of

Abstract. Backtrajectory differences and clustering sensitivity to the meteorological input data are studied. Trajectories arriving in Southeast Spain (Elche) at 3000, 1500 and 500 m for the 7-year period 2000–2006 have been computed employing two widely used meteorological data sets: the NCEP/NCAR Reanalysis and the FNL data sets. Differences between trajectories grow linearly at least up to 48 h and grow faster after 72 h. A k-means cluster analysis performed on each set of trajectories shows differences in the identified clusters (main flows), partially because the number of clusters of each clustering solution differs for the trajectories arriving at 3000 and 1500 m. Trajectory membership to the identified flows is in general more sensitive to the input meteorological data than to the initial selection of cluster centroids.
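One way to quantify how much cluster membership changes between two clustering solutions (for example, the same trajectories clustered with different input data or different initial centroids) is a pair-counting agreement score such as the Rand index. The abstract does not say which measure the authors used, so the sketch below is only an assumed illustration; the function name and the label lists are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def rand_index(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Fraction of item pairs on which two clusterings agree, i.e. both
    place the pair in the same cluster or both place it in different ones."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical memberships of four trajectories under two clusterings.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled: 1.0
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # mostly different partition
```

Because the score compares pairs rather than label values, it is unaffected by the arbitrary numbering of clusters, and it can also be applied when the two solutions have different numbers of clusters.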
