A Comparison of Categorical Attribute Data Clustering Methods

A Study on Two-Stage Mixed Attribute Data Clustering Based on Density Peaks

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/2 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Shihua Liu ◽

Hao Zhang ◽

Xianghua Liu

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Two Stage ◽

One Dimensional ◽

Attribute Data ◽

Numerical Attributes ◽

Density Peaks ◽

Density Peaks Clustering ◽

Categorical Attribute ◽

Attribute Clustering

A two-stage clustering framework and a clustering algorithm for mixed attribute data, based on density peaks and the Goodall distance, are proposed. First, the numerical attributes of the dataset are clustered, and the result is mapped to a one-dimensional categorical attribute and added to the categorical attributes. Then the new dataset is clustered by the density peaks clustering algorithm to obtain the final result. Experiments on three commonly used UCI datasets show that the algorithm can effectively cluster mixed attribute data and produces better clustering results than the traditional K-prototypes algorithm. The clustering accuracy on the Acute, Heart, and Credit datasets is on average 17%, 24%, and 21% higher, respectively, than that of K-prototypes.
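The two-stage idea in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`density_peaks`, `goodall_distance`, `two_stage`) are hypothetical, the use of density peaks with Euclidean distance in stage one is an assumption (the abstract does not say how the numerical attributes are clustered), the Goodall-style measure is a simplified variant in which a match on value x scores 1 - p(x)^2, and the cutoff values `dc_num` and `dc_cat` are arbitrary.

```python
import numpy as np

def density_peaks(dist, k, dc):
    """Density-peaks style clustering from a pairwise distance matrix (sketch)."""
    n = dist.shape[0]
    # Gaussian-kernel local density; subtract the self term exp(0) = 1.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)                      # distance to nearest denser point
    parent = np.full(n, -1)                  # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()         # densest point: use its largest distance
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j
    gamma = rho * delta
    gamma[order[0]] = np.inf                 # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:k]
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)
    for i in order:                          # denser points are labelled first
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def goodall_distance(cat):
    """Simplified Goodall-style distance (assumption): matching on a rare value
    counts as more similar than matching on a frequent one."""
    n, m = cat.shape
    sim = np.zeros((n, n))
    for a in range(m):
        col = cat[:, a]
        _, inverse, counts = np.unique(col, return_inverse=True, return_counts=True)
        p2 = (counts / n) ** 2               # squared relative frequency per value
        match = col[:, None] == col[None, :]
        sim += np.where(match, 1.0 - p2[inverse][:, None], 0.0)
    dist = 1.0 - sim / m
    np.fill_diagonal(dist, 0.0)              # a point is at distance 0 from itself
    return dist

def two_stage(num, cat, k_num, k_final, dc_num, dc_cat):
    # Stage 1: cluster the standardized numerical attributes.
    z = (num - num.mean(axis=0)) / (num.std(axis=0) + 1e-12)
    d_num = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    stage1 = density_peaks(d_num, k_num, dc_num)
    # Map the stage-1 labels to one extra categorical attribute.
    extra = np.array([f"c{label}" for label in stage1])
    cat_plus = np.column_stack([cat, extra])
    # Stage 2: cluster the extended categorical data.
    return density_peaks(goodall_distance(cat_plus), k_final, dc_cat)

# Toy data: two groups that differ in both attribute types.
num = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
cat = np.array([["a", "x"], ["a", "x"], ["a", "x"],
                ["b", "y"], ["b", "y"], ["b", "y"]])
labels = two_stage(num, cat, k_num=2, k_final=2, dc_num=0.5, dc_cat=0.5)
print(labels)
```

Folding the numerical result into a single categorical column lets the second stage work with one distance measure on purely categorical data, which is the design choice the abstract describes.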


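Comparison of the K-Means and K-Medoids Clustering Methods with the Fuzzy RFM Model for Customer Grouping (KOMPARASI METODE CLUSTERING K-MEANS DAN K-MEDOIDS DENGAN MODEL FUZZY RFM UNTUK PENGELOMPOKAN PELANGGAN)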

Evolusi : Jurnal Sains dan Manajemen ◽

10.31294/evolusi.v6i2.4600 ◽

2018 ◽

Vol 6 (2) ◽

Author(s):

Elly Muningsih - AMIK BSI Yogyakarta

Keyword(s):

Data Clustering ◽

Small Data ◽

Clustering Methods ◽

Monetary Model ◽

Clustering Method ◽

Online Sales ◽

Rfm Model ◽

Potential Customers ◽

Cluster 2 ◽

Better Than

Abstract ~ The K-Means method is one of the clustering methods widely used in data clustering research, while the K-Medoids method is an efficient method for processing small datasets. This study compares the two clustering methods by grouping customers into 3 clusters according to their characteristics: very potential (loyal) customers, potential customers, and non-potential customers. The data used are online sales transactions. The clustering methods are tested using a Fuzzy RFM (Recency, Frequency, and Monetary) model in which the mean of the three values is taken. The tests show that the K-Means method performs better than the K-Medoids method, with an accuracy of 90.47%. The data processing shows that cluster 1 has 16 members (customers), cluster 2 has 11 members, and cluster 3 has 15 members. Keywords: clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.


Enhanced Affinity for Spectral Clustering using Topological Node Features (TNFS)

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9450.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 974-987

Keyword(s):

Local Structure ◽

Data Clustering ◽

Spectral Clustering ◽

Clustering Coefficient ◽

Complex Data ◽

Clustering Methods ◽

Pairwise Similarity ◽

Synthetic Datasets ◽

Summation Index ◽

Affinity Measure

Data clustering is an active research topic with applications in fields such as biology, management, statistics, and pattern recognition. Spectral Clustering (SC) has gained popularity in recent times due to its ability to handle complex data and its ease of implementation. A crucial step in spectral clustering is the construction of the affinity matrix, which is based on a pairwise similarity measure. The varied characteristics of datasets affect the performance of a spectral clustering technique. In this paper, we propose an affinity measure based on Topological Node Features (TNFs), namely the Clustering Coefficient (CC) and the Summation Index (SI), to define the notion of density and local structure. These features are shown to improve the performance of SC in clustering the data. The experiments were conducted on synthetic datasets, UCI datasets, and the MNIST handwritten digit dataset. The results show that the proposed affinity metric outperforms several recent spectral clustering methods in terms of accuracy.
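To make the role of the affinity matrix concrete, here is a minimal sketch of ordinary spectral bipartitioning. It uses a plain Gaussian affinity, not the TNF-based affinity proposed in the paper, and the function name `spectral_bipartition` and the bandwidth `sigma` are assumptions made for illustration. It splits the data into two groups by the sign of the second eigenvector of the normalized Laplacian.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split points into two clusters using a Gaussian affinity matrix (sketch)."""
    # Pairwise similarity: W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no self-loops
    d = W.sum(axis=1)                        # node degrees (assumed > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    fiedler = vecs[:, 1]                     # eigenvector of the 2nd smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy data: two small, moderately separated groups in the plane.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [4.0, 0.0], [4.0, 1.0], [5.0, 0.0]])
print(spectral_bipartition(X))
```

Changing how `W` is built is the only step a different affinity measure would replace; the Laplacian and eigenvector steps stay the same.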


Data Clustering Algorithms Using Rough Sets

Handbook of Research on Computational Intelligence for Engineering, Science, and Business ◽

10.4018/978-1-4666-2518-1.ch012 ◽

2013 ◽

pp. 297-327 ◽

Cited By ~ 6

Author(s):

B.K. Tripathy ◽

Adhir Ghosh

Keyword(s):

Comparative Study ◽

Rough Set ◽

Fuzzy Clustering ◽

Fuzzy Set ◽

Rough Sets ◽

Data Clustering ◽

Clustering Algorithms ◽

Clustering Methods ◽

Future Studies ◽

Multiple Clusters

Researchers have pursued the development of data clustering algorithms since the introduction of the k-means algorithm (MacQueen 1967; Lloyd 1982). These algorithms were subsequently modified to handle categorical data. To handle situations where objects can have memberships in multiple clusters, fuzzy clustering and rough clustering methods were introduced (Lingras et al 2003, 2004a). There are many extensions of these initial algorithms (Lingras et al 2004b; Lingras 2007; Mitra 2004; Peters 2006, 2007). The MMR algorithm (Parmar et al 2007), its extensions (Tripathy et al 2009, 2011a, 2011b), and the MADE algorithm (Herawan et al 2010) use rough set techniques for clustering. In this chapter, the authors focus on rough set based clustering algorithms and provide a comparative study of all the fuzzy set based and rough set based clustering algorithms in terms of their efficiency. They also present problems for future study in the direction of the topics covered.


Data Clustering

Web Data Management Practices ◽

10.4018/978-1-59904-228-2.ch001 ◽

2007 ◽

pp. 1-33 ◽

Cited By ~ 4

Author(s):

Dušan Husek ◽

Jaroslav Pokorny ◽

Hana Rezankova ◽

Václav Snasel

Keyword(s):

Information Retrieval ◽

Data Clustering ◽

Important Task ◽

Clustering Methods ◽

Web Documents ◽

Web Communities

Document and information retrieval (IR) is an important task for Web communities. In this chapter, we introduce some clustering methods and focus on their use for the clustering, classification, and retrieval of Web documents.


Single-cell RNA-seq data clustering: A survey with performance comparison study

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720020400053 ◽

2020 ◽

Vol 18 (04) ◽

pp. 2040005

Author(s):

Ruiyi Li ◽

Jihong Guan ◽

Shuigeng Zhou

Keyword(s):

Single Cell ◽

Data Clustering ◽

Performance Metrics ◽

Clustering Algorithms ◽

Cell Types ◽

Performance Comparison ◽

Cellular Heterogeneity ◽

Clustering Methods ◽

Multiple Perspectives ◽

Underlying Mechanisms

Clustering analysis has been widely applied to single-cell RNA-sequencing (scRNA-seq) data to discover cell types and cell states. Algorithms developed in recent years have greatly helped the understanding of cellular heterogeneity and the underlying mechanisms of biological processes. However, these algorithms often use different techniques, were evaluated on different datasets, and were compared with only some of their counterparts, usually using different performance metrics. Consequently, an accurate and complete picture of their merits and demerits is lacking, which makes it difficult for users to select proper algorithms for analyzing their data. To fill this gap, we first review the major existing scRNA-seq data clustering methods, and then conduct a comprehensive performance comparison among them from multiple perspectives. We consider 13 state-of-the-art scRNA-seq data clustering algorithms, and collect 12 publicly available real scRNA-seq datasets from existing works to evaluate and compare these algorithms. Our comparative study shows that the existing methods are very diverse in performance. Even the top-performing algorithms do not perform well on all datasets, especially those with complex structures. This suggests that further research is required to explore more stable, accurate, and efficient clustering algorithms for scRNA-seq data.


Evolutionary Algorithms for Robust Density-Based Data Clustering

ISRN Computational Mathematics ◽

10.1155/2013/931019 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Amit Banerjee

Keyword(s):

Evolutionary Algorithms ◽

Evolutionary Computation ◽

Data Clustering ◽

Relational Data ◽

Clustering Methods ◽

Density Based Clustering ◽

Selection Of

Density-based clustering methods are known to be robust against outliers in data; however, they are sensitive to user-specified parameters, the selection of which is not trivial. Moreover, relational data clustering is an area that has received considerably less attention than object data clustering. In this paper, two approaches to robust density-based clustering for relational data using evolutionary computation are investigated.


Performance Comparison of Social Spider Optimization for Data Clustering with Other Clustering Methods

2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS) ◽

10.1109/iccons.2018.8662994 ◽

2018 ◽

Author(s):

T. Ravi Chandran ◽

A. V. Reddy ◽

B. Janet

Keyword(s):

Data Clustering ◽

Performance Comparison ◽

Clustering Methods ◽

Social Spider ◽

Social Spider Optimization


Bayesian network model for quality control with categorical attribute data

Applied Soft Computing ◽

10.1016/j.asoc.2019.105746 ◽

2019 ◽

Vol 84 ◽

pp. 105746 ◽

Cited By ~ 2

Author(s):

Barry R. Cobb ◽

Linda Li

Keyword(s):

Quality Control ◽

Bayesian Network ◽

Network Model ◽

Bayesian Network Model ◽

Attribute Data ◽

Categorical Attribute


A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2019.2899381 ◽

2020 ◽

Vol 31 (1) ◽

pp. 39-52 ◽

Cited By ~ 1

Author(s):

Yiqun Zhang ◽

Yiu-Ming Cheung ◽

Kay Chen Tan

Keyword(s):

Data Clustering ◽

Distance Metric ◽

Attribute Data
