Soft Set Multivariate Distribution for Categorical Data Clustering

Many data mining techniques rely on distance-based measures for clustering, and improving clustering performance is a fundamental goal of clustering-related tasks. Techniques exist for clustering both numerical and categorical data. Clustering is an unsupervised learning technique in which objects are grouped according to their similarity. This paper proposes a new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), and uses it for data classification. Extensive experiments were conducted on datasets from the UCI Machine Learning Repository. The results show that the proposed cosine-like cluster similarity measure outperforms many existing cluster similarity measures for data classification.
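The abstract does not define CLCSM, so the following is only a minimal sketch of what a cosine-style similarity between two clusters of categorical objects could look like. It assumes each cluster is summarized by the counts of its (attribute index, value) pairs and that the two count vectors are compared with ordinary cosine similarity. The function names `cluster_profile` and `cosine_cluster_similarity` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def cluster_profile(cluster):
    """Count every (attribute index, value) pair over the objects of a cluster."""
    counts = Counter()
    for obj in cluster:
        for index, value in enumerate(obj):
            counts[(index, value)] += 1
    return counts


def cosine_cluster_similarity(cluster_a, cluster_b):
    """Cosine similarity between the count profiles of two clusters.

    Illustrative assumption only; this is not the CLCSM formula.
    """
    profile_a = cluster_profile(cluster_a)
    profile_b = cluster_profile(cluster_b)
    # Only pairs present in both profiles contribute to the dot product.
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[key] * profile_b[key] for key in shared)
    norm_a = sqrt(sum(c * c for c in profile_a.values()))
    norm_b = sqrt(sum(c * c for c in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty cluster is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    warm = [("red", "small"), ("red", "large")]
    cool = [("blue", "medium")]
    print(cosine_cluster_similarity(warm, warm))
    print(cosine_cluster_similarity(warm, cool))
```

Under this assumption, two clusters with identical value distributions score close to 1 and clusters that share no attribute values score 0.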

Classification of Web Documents using Fuzzy Logic Categorical Data Clustering

IFIP The International Federation for Information Processing - Artificial Intelligence and Innovations 2007: from Theory to Applications ◽

10.1007/978-0-387-74161-1_11 ◽

2007 ◽

pp. 93-100 ◽

Cited By ~ 6

Author(s):

George E. Tsekouras ◽

Christos Anagnostopoulos ◽

Damianos Gavalas ◽

Economou Dafhi

Keyword(s):

Fuzzy Logic ◽

Categorical Data ◽

Data Clustering ◽

Web Documents ◽

Categorical Data Clustering

High-performance link-based cluster ensemble approach for categorical data clustering

The Journal of Supercomputing ◽

10.1007/s11227-018-2526-z ◽

2018 ◽

Vol 76 (6) ◽

pp. 4556-4579 ◽

Cited By ~ 3

Author(s):

N. Yuvaraj ◽

C. Suresh Gnana Dhas

Keyword(s):

Categorical Data ◽

Data Clustering ◽

High Performance ◽

Cluster Ensemble ◽

Ensemble Approach ◽

Categorical Data Clustering

Categorical Data Clustering

Encyclopedia of Machine Learning and Data Mining ◽

10.1007/978-1-4899-7502-7_35-1 ◽

2016 ◽

pp. 1-6

Author(s):

Periklis Andritsos ◽

Panayiotis Tsaparas

Keyword(s):

Categorical Data ◽

Data Clustering ◽

Categorical Data Clustering

Categorical Data Clustering Based on Cluster Ensemble Process

Proceedings of the International Congress on Information and Communication Technology - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-10-0755-2_12 ◽

2016 ◽

pp. 101-111

Author(s):

D. Veeraiah ◽

D. Vasumathi

Keyword(s):

Categorical Data ◽

Data Clustering ◽

Cluster Ensemble ◽

Categorical Data Clustering

Categorical Data Clustering

Encyclopedia of Machine Learning ◽

10.1007/978-0-387-30164-8_99 ◽

2011 ◽

pp. 154-159

Author(s):

Thomas R. Shultz ◽

Scott E. Fahlman ◽

Susan Craw ◽

Periklis Andritsos ◽

Panayiotis Tsaparas ◽

...

Keyword(s):

Categorical Data ◽

Data Clustering ◽

Categorical Data Clustering

Categorical Data Clustering Method Based on Improved Fruit Fly Optimization Algorithm

Advances in Intelligent, Interactive Systems and Applications - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-02804-6_96 ◽

2019 ◽

pp. 736-744

Author(s):

Dong Li ◽

Huifeng Xue ◽

Wenyu Zhang ◽

Yan Zhang

Keyword(s):

Optimization Algorithm ◽

Categorical Data ◽

Data Clustering ◽

Fruit Fly ◽

Fruit Fly Optimization Algorithm ◽

Clustering Method ◽

Fruit Fly Optimization ◽

Categorical Data Clustering

Understanding and Enhancement of Internal Clustering Validation Indexes for Categorical Data

Algorithms ◽

10.3390/a11110177 ◽

2018 ◽

Vol 11 (11) ◽

pp. 177 ◽

Cited By ~ 2

Author(s):

Xuedong Gao ◽

Minghan Yang

Keyword(s):

Machine Learning ◽

Categorical Data ◽

Data Clustering ◽

Information Gain ◽

Clustering Algorithms ◽

Number Of Clusters ◽

Cluster Compactness ◽

Clustering Validation ◽

Categorical Data Clustering

Clustering is one of the main tasks of machine learning. Internal clustering validation indexes (CVIs) are used to measure the quality of several clustered partitions to determine the locally optimal clustering results in an unsupervised manner, and can act as the objective function of clustering algorithms. In this paper, we first studied several well-known internal CVIs for categorical data clustering, and proved that they are ineffective at evaluating partitions with different numbers of clusters unless they include inter-cluster separation measures or assumptions; the accuracy of the separation measure, along with its coordination with the intra-cluster compactness measures, can notably affect performance. Then, aiming to enhance internal clustering validation, we proposed a new internal CVI, clustering utility based on the averaged information gain of isolating each cluster (CUBAGE), which measures both the compactness and the separation of the partition. The experimental results supported our findings with regard to the existing internal CVIs, and showed that the proposed CUBAGE outperforms other internal CVIs with or without a pre-known number of clusters.
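The abstract names the idea behind CUBAGE but does not give its formula. The sketch below illustrates only the general notion of "information gain of isolating a cluster", under stated assumptions: the entropy of a set of rows is taken as the sum of the Shannon entropies of its attributes, the gain of isolating a cluster is the entropy drop obtained by splitting the data into that cluster and everything else, and the gains are averaged over clusters. The helper names are hypothetical, and the result is not claimed to equal the published index.

```python
from collections import Counter
from math import log2


def attribute_entropy(rows):
    """Sum of per-attribute Shannon entropies (in bits) over a list of rows."""
    if not rows:
        return 0.0
    n = len(rows)
    total = 0.0
    for column in zip(*rows):
        for count in Counter(column).values():
            p = count / n
            total -= p * log2(p)
    return total


def isolation_gain(rows, labels, target):
    """Entropy reduction from splitting the data into `target` versus the rest."""
    inside = [r for r, lab in zip(rows, labels) if lab == target]
    outside = [r for r, lab in zip(rows, labels) if lab != target]
    n = len(rows)
    weighted = (len(inside) / n) * attribute_entropy(inside) \
        + (len(outside) / n) * attribute_entropy(outside)
    return attribute_entropy(rows) - weighted


def averaged_isolation_gain(rows, labels):
    """Mean isolation gain over all clusters (illustrative, not CUBAGE itself)."""
    clusters = sorted(set(labels))
    return sum(isolation_gain(rows, labels, c) for c in clusters) / len(clusters)


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
    print(averaged_isolation_gain(rows, [0, 0, 1, 1]))  # partition follows the values
    print(averaged_isolation_gain(rows, [0, 1, 0, 1]))  # partition ignores the values
```

In this toy setup, a partition that follows the attribute values scores higher than one that cuts across them, which is the behavior an index combining compactness and separation is meant to reward.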
