Improving categorical data clustering algorithm by weighting uncommon attribute value matches

This paper presents an improved Squeezer algorithm for categorical data clustering by giving greater weight to uncommon attribute value matches in similarity computations. Experimental results on real life datasets show that, the modified algorithm is superior to the original Squeezer algorithm and other clustering algorithm with respect to clustering accuracy.

Download Full-text

Universality of Logarithmic Loss in Fixed-Length Lossy Compression

Entropy ◽

10.3390/e21060580 ◽

2019 ◽

Vol 21 (6) ◽

pp. 580

Author(s):

Albert No

Keyword(s):

Categorical Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Original Problem ◽

Lossy Compression ◽

Finite Alphabet ◽

Clustering Problem ◽

Fixed Length ◽

Logarithmic Loss ◽

Categorical Data Clustering

We established a universality of logarithmic loss over a finite alphabet as a distortion criterion in fixed-length lossy compression. For any fixed-length lossy-compression problem under an arbitrary distortion criterion, we show that there is an equivalent lossy-compression problem under logarithmic loss. The equivalence is in the strong sense that we show that finding good schemes in corresponding lossy compression under logarithmic loss is essentially equivalent to finding good schemes in the original problem. This equivalence relation also provides an algebraic structure in the reconstruction alphabet, which allows us to use known techniques in the clustering literature. Furthermore, our result naturally suggests a new clustering algorithm in the categorical data-clustering problem.

Download Full-text

A categorical data clustering algorithm and its efficient parallel implementation

2016 5th International Conference on Computer Science and Network Technology (ICCSNT) ◽

10.1109/iccsnt.2016.8070153 ◽

2016 ◽

Author(s):

Xiangwu Ding ◽

Jia Tan ◽

Mei Wang

Keyword(s):

Categorical Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Parallel Implementation ◽

Categorical Data Clustering

Download Full-text

Generalized similarity measure for categorical data clustering

2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI) ◽

10.1109/icacci.2016.7732138 ◽

2016 ◽

Cited By ~ 1

Author(s):

Shruti Sharma ◽

Manoj Singh

Keyword(s):

Similarity Measure ◽

Categorical Data ◽

Data Clustering ◽

Categorical Data Clustering

Download Full-text

A Novel Cosine Similarity Like Data Clustering Method for Effective Data Classification in Data Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.h6417.069820 ◽

2020 ◽

Vol 9 (8) ◽

pp. 340-346

Keyword(s):

Data Mining ◽

Similarity Measure ◽

Categorical Data ◽

Data Clustering ◽

Similarity Measures ◽

Numerical Data ◽

Data Classification ◽

Fundamental Goal ◽

Learning Technique ◽

Categorical Data Clustering

In data mining ample techniques use distance based measures for data clustering. Improving clustering performance is the fundamental goal in cluster domain related tasks. Many techniques are available for clustering numerical data as well as categorical data. Clustering is an unsupervised learning technique and objects are grouped or clustered based on similarity among the objects. A new cluster similarity finding measure, which is cosine like cluster similarity measure (CLCSM), is proposed in this paper. The proposed cluster similarity measure is used for data classification. Extensive experiments are conducted by taking UCI machine learning datasets. The experimental results have shown that the proposed cosinelike cluster similarity measure is superior to many of the existing cluster similarity measures for data classification.

Download Full-text

Improved minimum-minimum roughness algorithm for clustering categorical data

International Journal of ADVANCED AND APPLIED SCIENCES ◽

10.21833/ijaas.2021.10.006 ◽

2021 ◽

Vol 8 (10) ◽

pp. 43-50

Author(s):

Truong et al. ◽

Keyword(s):

Machine Learning ◽

Data Mining ◽

Hierarchical Clustering ◽

Categorical Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Results ◽

Data Sets ◽

Top Down ◽

Hierarchical Clustering Algorithm

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers are interested in the problem of clustering categorical data and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR) which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.

Download Full-text

Classification of Web Documents using Fuzzy Logic Categorical Data Clustering

IFIP The International Federation for Information Processing - Artificial Intelligence and Innovations 2007: from Theory to Applications ◽

10.1007/978-0-387-74161-1_11 ◽

2007 ◽

pp. 93-100 ◽

Cited By ~ 6

Author(s):

George E. Tsekouras ◽

Christos Anagnostopoulos ◽

Damianos Gavalas ◽

Economou Dafhi

Keyword(s):

Fuzzy Logic ◽

Categorical Data ◽

Data Clustering ◽

Web Documents ◽

Categorical Data Clustering

Download Full-text

High-performance link-based cluster ensemble approach for categorical data clustering

The Journal of Supercomputing ◽

10.1007/s11227-018-2526-z ◽

2018 ◽

Vol 76 (6) ◽

pp. 4556-4579 ◽

Cited By ~ 3

Author(s):

N. Yuvaraj ◽

C. Suresh Ghana Dhas

Keyword(s):

Categorical Data ◽

Data Clustering ◽

High Performance ◽

Cluster Ensemble ◽

Ensemble Approach ◽

Categorical Data Clustering

Download Full-text

Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient

Mathematical Problems in Engineering ◽

10.1155/2020/5143797 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Ziqi Jia ◽

Ling Song

Keyword(s):

Categorical Data ◽

Clustering Algorithm ◽

Numerical Data ◽

Experimental Results ◽

Cluster Center ◽

Real Dataset ◽

Dissimilarity Coefficient ◽

Initial Cluster ◽

Data Objects ◽

Selection Of

The k-prototypes algorithm is a hybrid clustering algorithm that can process Categorical Data and Numerical Data. In this study, the method of initial Cluster Center selection was improved and a new Hybrid Dissimilarity Coefficient was proposed. Based on the proposed Hybrid Dissimilarity Coefficient, a weighted k-prototype clustering algorithm based on the hybrid dissimilarity coefficient was proposed (WKPCA). The proposed WKPCA algorithm not only improves the selection of initial Cluster Centers, but also puts a new method to calculate the dissimilarity between data objects and Cluster Centers. The real dataset of UCI was used to test the WKPCA algorithm. Experimental results show that WKPCA algorithm is more efficient and robust than other k-prototypes algorithms.

Download Full-text