A dissimilarity measure for the k-Modes clustering algorithm

2012 ◽  
Vol 26 ◽  
pp. 120-127 ◽  
Author(s):  
Fuyuan Cao ◽  
Jiye Liang ◽  
Deyu Li ◽  
Liang Bai ◽  
Chuangyin Dang

2017 ◽  
Vol 2017 ◽  
pp. 1-7 ◽  
Author(s):  
Hongfang Zhou ◽  
Yihui Zhang ◽  
Yibin Liu

The k-modes clustering algorithm has been widely used to cluster categorical data. In this paper, we first analyze the k-modes algorithm and its dissimilarity measure. Based on this analysis, we propose a novel dissimilarity measure named GRD. GRD considers not only the relationships between an object and all cluster modes but also the differences between attributes. Finally, experiments were conducted on four real data sets from UCI. The results show that GRD achieves better performance than the two existing dissimilarity measures used in the k-modes and Cao’s algorithms.
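For orientation, the baseline that measures such as GRD are compared against can be sketched in a few lines. The code below is a minimal illustration of the simple matching dissimilarity conventionally used in k-modes, together with the attribute-wise mode update. It does not implement GRD, whose definition is not given in this abstract, and the function names are hypothetical.

```python
from collections import Counter


def simple_matching(x, y):
    """Count the attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def cluster_mode(objects):
    """Return the attribute-wise most frequent value of a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


def assign(obj, modes):
    """Return the index of the nearest mode (ties go to the lowest index)."""
    return min(range(len(modes)), key=lambda i: simple_matching(obj, modes[i]))
```

Because simple matching treats every attribute and every mismatch equally, ties between modes are common, which is one motivation for the refined measures discussed in this line of work.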


2007 ◽  
Vol 29 (3) ◽  
pp. 503-507 ◽  
Author(s):  
Michael Ng ◽  
Mark Li ◽  
Joshua Huang ◽  
Zengyou He

2016 ◽  
Vol 43 (4) ◽  
pp. 480-491 ◽  
Author(s):  
Dilip Singh Sisodia ◽  
Shrish Verma ◽  
Om Prakash Vyas

Clustering is a very useful technique to categorise Web users with common browsing activities, access patterns and navigational behaviour. Web user clustering is used to build Web visitor profiles that form the core of a personalised information recommender system. These systems are used to comprehend Web users’ surfing activities by offering tailored content to Web users with similar interests. The principal objective of Web user session clustering is to maximise intra-group similarity while minimising inter-group similarity. Efficient clustering of Web users’ sessions depends not only on the nature of the clustering algorithm but also on how well user concerns are captured and accommodated by the dissimilarity measure that is used. Determining the right dissimilarity measure to capture the access behaviour of the Web user is therefore important for meaningful clustering. In this paper, an intuitive dissimilarity measure is presented to estimate a Web user’s concern from augmented Web user sessions. The proposed usage dissimilarity measure between two Web user sessions is based on the relevance of the accessed pages, the syntactic structure of the page URL and the hierarchical structure of the website. The proposed measure was used with the K-Medoids clustering algorithm for experimentation, and the results were compared with other independent dissimilarity measures. The quality of the generated clusters was evaluated by two unsupervised cluster validity indexes. The experimental results show that the intuitive augmented session dissimilarity measure is more realistic than, and superior to, the other independent dissimilarity measures in terms of the cluster validity indexes.
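K-Medoids is a natural fit for a custom session dissimilarity because it only needs a precomputed dissimilarity matrix. The following is a minimal sketch of that idea, assuming a simple alternating (assign, then update) variant rather than the full PAM swap procedure. The paper's own dissimilarity measure is not reproduced here, and the function name and parameters are hypothetical.

```python
def k_medoids(dist, init_medoids, max_iter=100):
    """Cluster n objects given an n x n dissimilarity matrix (list of lists).

    init_medoids holds the indices of the starting medoids.
    Returns (medoid indices, cluster label per object).
    """
    n = len(dist)
    medoids = list(init_medoids)

    def assign(current):
        # Label each object with the position of its nearest medoid.
        return [min(range(len(current)), key=lambda c: dist[i][current[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the previous medoid if the cluster became empty.
                updated.append(old)
                continue
            # The new medoid minimises total dissimilarity to its cluster.
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)
```

Any pairwise session dissimilarity can be plugged in by filling `dist` beforehand, which is why the choice of measure, rather than the algorithm, drives the clustering result in this setting.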


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 215 ◽  
Author(s):  
Dragutin Mihailović ◽  
Emilija Nikolić-Đorić ◽  
Slavica Malinović-Milićević ◽  
Vijay Singh ◽  
Anja Mihailović ◽  
...  

The purpose of this paper was to choose an appropriate information dissimilarity measure for hierarchical clustering of daily streamflow discharge data from twelve gauging stations on the Brazos River in Texas (USA) for the period 1989–2016. For that purpose, we selected and compared the average-linkage hierarchical clustering algorithm based on the compression-based dissimilarity measure (NCD), the permutation distribution dissimilarity measure (PDDM), and the Kolmogorov distance (KD). The algorithm was also compared with K-means clustering based on Kolmogorov complexity (KC), the highest value of the Kolmogorov complexity spectrum (KCM), and the largest Lyapunov exponent (LLE). Using a dissimilarity matrix based on NCD, PDDM, and KD for daily streamflow, the agglomerative average-linkage hierarchical algorithm was applied. The key findings of this study are that: (i) the KD-based clustering algorithm is the most suitable of those compared; (ii) ANOVA shows that there exist highly significant differences between the mean values of the four clusters, confirming that the number of clusters was suitably chosen; and (iii) from the clustering we found that the predictability of streamflow data of the Brazos River given by the Lyapunov time (LT), corrected for randomness by the Kolmogorov time (KT) in days, lies in the interval from two to five days.
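To make the compression-based measure concrete, the sketch below assumes that NCD refers to the normalized compression distance in its usual form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. Using zlib as the compressor is an assumption made for illustration; the abstract does not say which compressor or series encoding the authors used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two non-empty byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 indicate that one series adds little new information to the other, while values near 1 indicate largely unrelated series. A matrix of pairwise values can then be passed to an average-linkage hierarchical clustering routine.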


2020 ◽  
Vol 39 (6) ◽  
pp. 8139-8147 ◽  
Author(s):  
Ranganathan Arun ◽  
Rangaswamy Balamurugan

In Wireless Sensor Networks (WSNs), the energy of sensor nodes is limited. To extend the lifetime of a WSN, it is essential to minimize energy consumption. Selecting a group head, or Cluster Head (CH), from the nodes with higher energy is a well-known method for extending WSN lifetime. Both intra-cluster and inter-cluster communication depend on the CH, so the energy level of the CH governs the lifetime of its cluster. When designing clustering algorithms, the difficult task is to determine the energy consumption of heterogeneous WSNs (HWSNs). The proposed work, based on Chaotic Firefly Algorithm CH (CFACH) selection, is named the “Novel Distributed Entropy Energy-Efficient Clustering Algorithm”, or DEEEC, for HWSNs. The DEEEC algorithm, a CH selection scheme, has two main stages. In the first stage, temporary CHs and their entropy values are identified using a correlative measure of residual and original energy. In addition, the rotating epoch and its entropy value must be predicted automatically by the sensor nodes. In the second stage, if any member of the cluster has larger residual energy, the temporary CHs in the deciding set are modified accordingly. The probability that nodes with large energy become CHs is determined by these two stages of CH selection. The DEEEC algorithm was simulated in MATLAB. The simulation results show that DEEEC performs well in terms of energy and network lifetime when compared with the traditional clustering protocols currently used in heterogeneous WSNs.


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. A POS tagger and the Porter stemmer are used for tagging. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding similarity between words. To disambiguate words, sense identification is performed for the adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, achieving 91.2%.

