MiCS-P: Parallel Mutual-information Computation of Big Categorical Data on Spark

Author(s): Junli Li, Chaowei Zhang, Jifu Zhang, Xiao Qin, Lihua Hu

2009, Vol. 52 (1), pp. 17-31
Author(s): Chong Sun Hong, Beom Jun Kim

2008, Vol. 9 (2), pp. 223-233
Author(s): Zengyou He, Xiaofei Xu, Shengchun Deng

Author(s): Iwan Tri Riyadi Yanto, Ririn Setiyowati, Edi Sutoyo, Nur Azizah, Rasyidah

Clustering is the process of grouping a set of objects into clusters so that similar objects are placed in the same cluster and dissimilar objects are placed in different clusters. The fuzzy k-means algorithm is a clustering algorithm that partitions data into k clusters using Euclidean distance as its distance function. This research addresses the clustering of categorical data using Fuzzy k-Means with Kullback-Leibler divergence. The distance between a data object and a cluster center is determined using mutual information, that is, the Kullback-Leibler divergence between the joint distribution and the product of the two marginal distributions. Extensive theoretical analysis was performed to show the effectiveness of the proposed method. In addition, the proposed method was compared with the Fuzzy Centroid and Fuzzy k-Partition approaches in terms of response time and clustering accuracy on several datasets from the UCI Machine Learning Repository. The experimental results show that, for categorical data, the proposed algorithm achieves good clustering quality and accuracy compared with Fuzzy Centroid and Fuzzy k-Partition.
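The quantity the abstract relies on is the standard identity I(X;Y) = KL(P(X,Y) || P(X)P(Y)) = sum over x,y of p(x,y) log[p(x,y) / (p(x) p(y))]. The Python sketch below computes it for two categorical attributes from empirical frequencies, assuming natural logarithms (nats). It illustrates only this textbook definition. How the paper turns it into a distance between an object and a fuzzy cluster center is not described in the abstract, and the function names `kl_divergence` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log


def kl_divergence(p, q):
    """KL(p || q) in nats for dicts mapping outcomes to probabilities.

    Assumes q[k] > 0 wherever p[k] > 0; outcomes with p[k] == 0 contribute 0.
    """
    return sum(pv * log(pv / q[k]) for k, pv in p.items() if pv > 0)


def mutual_information(xs, ys):
    """Mutual information of two categorical sequences, computed as the KL
    divergence between the empirical joint distribution and the product of
    the empirical marginal distributions (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not xs:
        raise ValueError("sequences must be non-empty")
    n = len(xs)
    # Empirical joint distribution P(X, Y) over observed value pairs.
    joint = {pair: c / n for pair, c in Counter(zip(xs, ys)).items()}
    # Empirical marginals P(X) and P(Y).
    px = {x: c / n for x, c in Counter(xs).items()}
    py = {y: c / n for y, c in Counter(ys).items()}
    # Product of marginals, needed only on the joint's support because
    # pairs with p(x, y) = 0 add nothing to the sum.
    product = {(x, y): px[x] * py[y] for (x, y) in joint}
    return kl_divergence(joint, product)


if __name__ == "__main__":
    a = ["red", "red", "blue", "blue"]
    print(mutual_information(a, a))                         # identical attributes
    print(mutual_information(a, ["s", "m", "s", "m"]))      # independent attributes
```

For identical attributes the result equals the entropy of the attribute (log 2 for two equally frequent values), and for statistically independent attributes it is 0. This matches the intuition that mutual information measures how far the joint distribution is from independence.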
