Initial Seed Selection for Mixed Data Using Modified K-means Clustering Algorithm

This work provides a procedure for constructing and visualizing profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques: multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS on large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower’s interpolation formula is then used to project the remaining individuals onto this configuration. Throughout the process, Gower’s distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the proposal avoids the high computational cost of classical MDS while incurring low error.
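
The sample-then-interpolate idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes unweighted individuals, equal variable weights, squared Gower distance defined as one minus Gower's similarity, and numeric ranges supplied by the caller. The function names (`gower_sq_dist`, `classical_mds`, `gower_interpolate`) are hypothetical, and the k-prototypes clustering step is omitted.

```python
import numpy as np

def gower_sq_dist(num_a, cat_a, num_b, cat_b, ranges):
    """Squared Gower distance (1 - similarity) between rows of A and rows of B.

    Assumption: every variable has equal weight; `ranges` holds the nonzero
    range of each numeric variable, ideally computed on the full dataset.
    """
    # Numeric variables: similarity is 1 - |x - y| / range.
    num_sim = 1.0 - np.abs(num_a[:, None, :] - num_b[None, :, :]) / ranges
    # Categorical variables: similarity is 1 on a match, 0 otherwise.
    cat_sim = (cat_a[:, None, :] == cat_b[None, :, :]).astype(float)
    n_vars = num_a.shape[1] + cat_a.shape[1]
    return 1.0 - (num_sim.sum(axis=2) + cat_sim.sum(axis=2)) / n_vars

def classical_mds(sq_dist, k=2):
    """Classical MDS of an (n, n) squared-distance matrix.

    Returns the (n, k) coordinates, the k leading eigenvalues (assumed
    positive), and the diagonal of the doubly centred matrix B.
    """
    n = sq_dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b_mat = -0.5 * centring @ sq_dist @ centring
    eigval, eigvec = np.linalg.eigh(b_mat)
    top = np.argsort(eigval)[::-1][:k]
    lam = eigval[top]
    coords = eigvec[:, top] * np.sqrt(lam)
    return coords, lam, np.diag(b_mat)

def gower_interpolate(sq_dist_new, coords, lam, b_diag):
    """Gower's interpolation: x = 1/2 * Lambda^{-1} X^T (b - d^2).

    `sq_dist_new` is (m, n): squared distances from m new individuals
    to the n sampled individuals that were mapped by classical MDS.
    """
    return 0.5 * (b_diag - sq_dist_new) @ coords / lam
```

In use, one would compute `gower_sq_dist` within the random sample, pass the result to `classical_mds`, and then call `gower_interpolate` with the squared distances from each remaining individual to the sample. Only the sample requires an eigendecomposition, which is what keeps the cost manageable for large datasets.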


Variable Selection for Mixed Data Clustering: Application in Human Population Genomics

Journal of Classification ◽

10.1007/s00357-018-9301-y ◽

2019 ◽

Vol 37 (1) ◽

pp. 124-142 ◽

Cited By ~ 1

Author(s):

Matthieu Marbac ◽

Mohammed Sedki ◽

Etienne Patin

Keyword(s):

Variable Selection ◽

Data Clustering ◽

Human Population ◽

Population Genomics ◽

Mixed Data ◽

Selection For


Energy Efficient Distributed Unequal Clustering Algorithm with Relay Node Selection for Underwater Wireless Sensor Networks

Emerging Trends in Computing and Expert Technology - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-32150-5_154 ◽

2019 ◽

pp. 1526-1536

Author(s):

M. Priyanga ◽

S. Leones Sherwin Vimalraj ◽

J. Lydia

Keyword(s):

Wireless Sensor Networks ◽

Sensor Networks ◽

Energy Efficient ◽

Clustering Algorithm ◽

Relay Node ◽

Wireless Sensor ◽

Underwater Wireless Sensor Networks ◽

Unequal Clustering ◽

Relay Node Selection ◽

Selection For


A SELF-ORGANIZING MAP FOR MIXED CONTINUOUS AND CATEGORICAL DATA

International Journal of Computing ◽

10.47839/ijc.10.1.733 ◽

2011 ◽

pp. 24-32 ◽

Cited By ~ 1

Author(s):

Nicoleta Rogovschi ◽

Mustapha Lebbah ◽

Younès Bennani

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Mixed Data ◽

Categorical Variables ◽

Data Sets ◽

Self Organizing Map ◽

Data Set ◽

Public Data ◽

Self Organizing

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher the weight of a variable, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set and three other mixed data sets. The results show a good quality of topological ordering and homogeneous clustering.
