Spectral Clustering of Mixed-Type Data

Felix Mbuga; Cristina Tortora

doi:10.3390/stats5010001

Stats ◽

10.3390/stats5010001 ◽

2021 ◽

Vol 5 (1) ◽

pp. 1-11

Author(s):

Felix Mbuga ◽

Cristina Tortora

Keyword(s):

Spectral Clustering ◽

Mixed Type ◽

New Method ◽

Gaussian Kernel ◽

Categorical Variables ◽

Continuous Data ◽

Dissimilarity Matrix ◽

Kernel Parameter ◽

Variable Weight ◽

Type Data

Cluster analysis seeks to assign objects with similar characteristics into groups called clusters so that objects within a group are similar to each other and dissimilar to objects in other groups. Spectral clustering has been shown to perform well in different scenarios on continuous data: it can detect convex and non-convex clusters, and can detect overlapping clusters. However, the restriction to continuous data can be limiting in real applications, where data are often of mixed type, i.e., they contain both continuous and categorical features. This paper looks at extending spectral clustering to mixed-type data. The new method replaces the Euclidean-based similarity used in conventional spectral clustering with different dissimilarity measures for continuous and categorical variables. A global dissimilarity measure is then computed using a weighted sum, and a Gaussian kernel is used to convert the dissimilarity matrix into a similarity matrix. The new method includes automatic tuning of the variable weight and kernel parameter. The performance of spectral clustering in different scenarios is compared with that of two state-of-the-art mixed-type data clustering methods, k-prototypes and KAMILA, using several simulated and real data sets.
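
The pipeline described in the abstract above (separate dissimilarities, weighted sum, Gaussian kernel, spectral embedding) can be illustrated with a minimal sketch. This is not the authors' implementation. The specific choices here are assumptions made for illustration: scaled Euclidean distance for the continuous part, simple matching for the categorical part, a fixed `weight` and `sigma` instead of the automatic tuning the paper describes, and a small hand-rolled k-means on the embedding. The function names `mixed_dissimilarity` and `spectral_clusters` and the toy data are hypothetical.

```python
import numpy as np


def mixed_dissimilarity(X_num, X_cat, weight=0.5):
    """Weighted sum of a continuous and a categorical dissimilarity (assumed forms)."""
    # Continuous part: Euclidean distance on standardized columns, scaled to [0, 1].
    Z = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)
    d_num = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    d_num = d_num / d_num.max()
    # Categorical part: simple matching, i.e. the share of mismatched variables.
    d_cat = (X_cat[:, None, :] != X_cat[None, :, :]).mean(axis=2)
    return weight * d_num + (1.0 - weight) * d_cat


def spectral_clusters(D, k, sigma=0.3, n_iter=50):
    """Spectral clustering on a precomputed dissimilarity matrix D."""
    # Gaussian kernel turns dissimilarities into similarities.
    S = np.exp(-D ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # Symmetrically normalized similarity matrix D^{-1/2} S D^{-1/2}.
    deg = S.sum(axis=1)
    L = S / np.sqrt(np.outer(deg, deg))
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Plain k-means on the rows of U with a deterministic farthest-point start.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


# Toy data (synthetic, for illustration only): two well-separated groups.
rng = np.random.default_rng(0)
X_num = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (20, 2))])
X_cat = np.array([["a"]] * 20 + [["b"]] * 20)

D = mixed_dissimilarity(X_num, X_cat, weight=0.5)
labels = spectral_clusters(D, k=2, sigma=0.3)
```

In this sketch `weight` balances the two variable types and `sigma` sets how quickly similarity decays with dissimilarity; both are fixed by hand here, whereas the abstract states that the proposed method tunes them automatically.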

Towards Geostatistical Learning for the Geosciences: A Case Study in Improving the Spatial Awareness of Spectral Clustering

Mathematical Geosciences ◽

10.1007/s11004-020-09867-0 ◽

2020 ◽

Vol 52 (8) ◽

pp. 1035-1048

Author(s):

H. Talebi ◽

L. J. M. Peeters ◽

U. Mueller ◽

R. Tolosana-Delgado ◽

K. G. van den Boogaart

Keyword(s):

Statistical Learning ◽

Spatial Data ◽

Spectral Clustering ◽

Learning Algorithms ◽

Categorical Variables ◽

Spatial Awareness ◽

Dissimilarity Matrix ◽

Model Free ◽

Minimum Criteria ◽

Physical Realism

The particularities of geosystems and geoscience data must be understood before any development or implementation of statistical learning algorithms. Without such knowledge, the predictions and inferences may not be accurate and physically consistent. Accuracy, transparency and interpretability, credibility, and physical realism are minimum criteria for statistical learning algorithms when applied to the geosciences. This study briefly reviews several characteristics of geoscience data and challenges for novel statistical learning algorithms. A novel spatial spectral clustering approach is introduced to illustrate how statistical learners can be adapted for modelling geoscience data. The spatial awareness and physical realism of the spectral clustering are improved by utilising a dissimilarity matrix based on nonparametric higher-order spatial statistics. The proposed model-free technique can identify meaningful spatial clusters (i.e. meaningful geographical subregions) from multivariate spatial data at different scales without the need to define a model of co-dependence. Several mixed (e.g. continuous and categorical) variables can be used as inputs to the proposed clustering technique. The proposed technique is illustrated using synthetic and real mining datasets. The results of the case studies confirm the usefulness of the proposed method for modelling spatial data.

A Solution to Treat Mixed-Type Human Datasets from Socio-Ecological Systems

Journal of Environmental Geography ◽

10.2478/jengeo-2020-0012 ◽

2020 ◽

Vol 13 (3-4) ◽

pp. 51-60

Author(s):

Lisa B. Clark ◽

Eduardo González ◽

Annie L. Henry ◽

Anna A. Sher

Keyword(s):

Mixed Type ◽

Invasive Plant ◽

Mixed Data ◽

Categorical Variables ◽

Human Dimension ◽

Data Types ◽

Dissimilarity Matrix ◽

Homogeneous Groups ◽

Natural Systems ◽

Partitioning Around Medoids

Coupled human and natural systems (CHANS) are frequently represented by large datasets with varied data including continuous, ordinal, and categorical variables. Conventional multivariate analyses cannot handle these mixed data types. In this paper, our goal was to show how a clustering method not previously applied to understanding the human dimension of CHANS, a Gower dissimilarity matrix with partitioning around medoids (PAM), can be used to treat mixed-type human datasets. A case study of land managers responsible for invasive plant control projects across rivers of the southwestern U.S. was used to characterize managers’ backgrounds and decisions, and project properties, through clustering. Results showed that managers could be classified as “federal multitaskers” or as “educated specialists”. Decisions were characterized as either “quick and active” or “thorough and careful”. Project goals were either comprehensive with ecological goals or more limited in scope. This study shows that clustering with Gower and PAM can simplify the complex human dimension of this system, demonstrating the utility of this approach for systems frequently composed of mixed-type data such as CHANS. This clustering approach can be used to direct scientific recommendations towards homogeneous groups of managers and project types.
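
As a rough illustration of the approach named in the abstract above, the following sketch computes a Gower dissimilarity matrix and partitions it around medoids. It is not the study's code. Two simplifications are assumptions: ordinal variables are not handled separately, and the medoid search is a simple alternating update rather than the full PAM build-and-swap procedure. The names `gower_matrix` and `k_medoids` and the six-row toy table are hypothetical.

```python
import numpy as np


def gower_matrix(X_num, X_cat):
    """Gower dissimilarity: mean of per-variable dissimilarities in [0, 1]."""
    # Numeric variables: absolute difference divided by the variable's range.
    ranges = X_num.max(axis=0) - X_num.min(axis=0)
    d_num = np.abs(X_num[:, None, :] - X_num[None, :, :]) / ranges
    # Categorical variables: 0 if the categories match, 1 otherwise.
    d_cat = (X_cat[:, None, :] != X_cat[None, :, :]).astype(float)
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)


def k_medoids(D, k, n_iter=100):
    """Alternating k-medoids on a dissimilarity matrix (simplified, not full PAM)."""
    # Start from the most central point, then add the farthest remaining points.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # The new medoid minimizes total dissimilarity within its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


# Toy table (synthetic): one numeric and one categorical variable.
X_num = np.array([[1.0], [2.0], [1.5], [9.0], [10.0], [9.5]])
X_cat = np.array([["a"], ["a"], ["a"], ["b"], ["b"], ["b"]])

D = gower_matrix(X_num, X_cat)
medoids, labels = k_medoids(D, k=2)
```

Because the medoids are actual rows of the data, each cluster can be summarized by a real, representative observation, which is part of what makes this combination convenient for interpreting mixed-type survey data.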

Integrated dimensionality reduction technique for mixed-type data involving categorical values

Applied Soft Computing ◽

10.1016/j.asoc.2016.02.015 ◽

2016 ◽

Vol 43 ◽

pp. 199-209 ◽

Cited By ~ 5

Author(s):

Chung-Chian Hsu ◽

Wei-Hao Huang

Keyword(s):

Dimensionality Reduction ◽

Mixed Type ◽

Reduction Technique ◽

Dimensionality Reduction Technique ◽

Type Data

Growing Self-Organizing Map with cross insert for mixed-type data clustering

Applied Soft Computing ◽

10.1016/j.asoc.2012.04.004 ◽

2012 ◽

Vol 12 (9) ◽

pp. 2856-2866 ◽

Cited By ~ 11

Author(s):

Wei-Shen Tai ◽

Chung-Chian Hsu

Keyword(s):

Data Clustering ◽

Mixed Type ◽

Self Organizing Map ◽

Type Data ◽

Self Organizing

Clustering Mixed-Type Data: A Benchmark Study on KAMILA and K-Prototypes

Data Analysis and Rationality in a Complex World - Studies in Classification, Data Analysis, and Knowledge Organization ◽

10.1007/978-3-030-60104-1_10 ◽

2021 ◽

pp. 83-91

Author(s):

Jarrett Jimeno ◽

Madhumita Roy ◽

Cristina Tortora

Keyword(s):

Mixed Type ◽

Benchmark Study ◽

Type Data

Co-clustering Based Exploratory Analysis of Mixed-Type Data Tables

Advances in Knowledge Discovery and Management - Studies in Computational Intelligence ◽

10.1007/978-3-030-18129-1_2 ◽

2019 ◽

pp. 23-41 ◽

Cited By ~ 1

Author(s):

Aichetou Bouchareb ◽

Marc Boullé ◽

Fabrice Clérot ◽

Fabrice Rossi

Keyword(s):

Mixed Type ◽

Exploratory Analysis ◽

Data Tables ◽

Type Data

Cluster Analysis: An Application to a Real Mixed-Type Data Set

Models and Theories in Social Systems - Studies in Systems, Decision and Control ◽

10.1007/978-3-030-00084-4_27 ◽

2018 ◽

pp. 525-533 ◽

Cited By ~ 2

Author(s):

G. Caruso ◽

S. A. Gattone ◽

A. Balzanella ◽

T. Di Battista

Keyword(s):

Cluster Analysis ◽

Mixed Type ◽

Data Set ◽

Type Data

An improved mixed-type data based kernel clustering algorithm

2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) ◽

10.1109/fskd.2016.7603350 ◽

2016 ◽

Author(s):

Min Ren ◽

Peiyu Liu ◽

Zhihao Wang ◽

Xiao Pan

Keyword(s):

Mixed Type ◽

Clustering Algorithm ◽

Kernel Clustering ◽

Type Data

An Estimation of the Optimal Gaussian Kernel Parameter for Support Vector Classification

Lecture Notes in Computer Science - Advances in Neural Networks - ISNN 2008 ◽

10.1007/978-3-540-87732-5_70 ◽

2008 ◽

pp. 627-635 ◽

Cited By ~ 1

Author(s):

Wenjian Wang ◽

Liang Ma

Keyword(s):

Gaussian Kernel ◽

Support Vector ◽

Kernel Parameter

An Adaptive Three-Way Clustering Algorithm for Mixed-Type Data

Lecture Notes in Computer Science - Foundations of Intelligent Systems ◽

10.1007/978-3-030-01851-1_36 ◽

2018 ◽

pp. 379-388 ◽

Cited By ~ 1

Author(s):

Jing Xiong ◽

Hong Yu

Keyword(s):

Mixed Type ◽

Clustering Algorithm ◽

Type Data
