CLUSTERING QUALITY MEASURES BASED ON COMPARING THE PROXIMITY MATRICES FOR THE MEMBERSHIP VECTORS AND THE OBJECTS

Author(s):  
ROELOF K. BROUWER

There are several commonly accepted clustering quality measures (clustering quality as opposed to cluster quality), such as the Rand index, the adjusted Rand index, and the Jaccard index. Each of these, however, is based on comparing the partition produced by the clustering process to a correct partition, so they can only be used to assess a clustering process when the correct partition is known. This paper therefore proposes a clustering quality measure that does not require comparison to a correct partition. The proposed measure is based on the assumption that the proximities between the membership vectors should correlate positively with the proximities between the objects, which may be the proximities between their feature vectors. The components of the membership vector for a pattern are the membership degrees of that pattern in the various clusters; the membership vector is thus just another object data vector, or type of feature vector, whose feature values are the membership values of the object in the various clusters. Based on this premise, the paper describes new clustering quality metrics derived from standard correlation measures and other proposed correlation metrics. Simulations on data with a wide range of clusterability, or separability, show that comparing the proximity matrix based on the membership matrix to the object proximity matrix is quite effective.
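The core idea above can be sketched in a few lines: build one pairwise-distance matrix from the feature vectors and one from the membership vectors, then correlate their entries. This is a minimal illustration using Pearson correlation on Euclidean distances; the paper considers several correlation measures, and the function name `proximity_correlation` is an assumed label, not the author's.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def proximity_correlation(X, U):
    """Correlate object proximities with membership-vector proximities.

    X : (n, d) array of object feature vectors.
    U : (n, k) array of membership degrees (one row per object, one
        column per cluster; rows sum to 1 for a fuzzy clustering).
    Returns the Pearson correlation between the two condensed pairwise
    distance matrices; values near 1 suggest the clustering preserves
    the geometry of the objects.
    """
    d_obj = pdist(X)  # pairwise distances between objects
    d_mem = pdist(U)  # pairwise distances between membership vectors
    r, _ = pearsonr(d_obj, d_mem)
    return r
```

For two well-separated blobs with crisp memberships, the within-cluster distances are small in both matrices and the between-cluster distances are large in both, so the correlation is close to 1.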

2018 ◽  
Vol 16 (2) ◽  
pp. 107-119
Author(s):  
Supavit KONGWUDHIKUNAKORN ◽  
Kitsana WAIYAMAI

This paper presents a method for clustering short text documents such as instant messages, SMS, or news headlines. Vocabularies in the texts are expanded using external knowledge sources and represented by a distributed word representation. Clustering is done with the k-means algorithm, using Word Mover's Distance as the distance metric. Experiments compared the clustering quality of this method and several leading methods on large datasets drawn from BBC headlines, SearchSnippets, StackExchange, and Twitter. For all datasets, the proposed algorithm produced document clusters with higher accuracy, precision, F1-score, and adjusted Rand index. We also observe that a cluster description can be inferred from the keywords represented in each cluster.
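Word Mover's Distance is an optimal-transport distance between the word-embedding clouds of two documents. As a rough sketch: when both documents have the same number of tokens and uniform word weights, the transport problem reduces to a minimum-cost one-to-one matching, which can be solved with the Hungarian algorithm. This simplification is an assumption for illustration; the method in the paper uses the general transport formulation over word frequencies.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def simplified_wmd(E1, E2):
    """Simplified Word Mover's Distance between two documents.

    E1, E2 : (m, d) arrays of word embeddings, one row per token.
    With equal token counts and uniform weights, WMD reduces to a
    minimum-cost assignment between the two sets of embeddings.
    """
    cost = cdist(E1, E2)                      # pairwise embedding distances
    rows, cols = linear_sum_assignment(cost)  # cheapest one-to-one matching
    return cost[rows, cols].mean()
```

Two documents whose tokens have identical embeddings (in any order) get distance zero, which is the property that makes WMD robust to word reordering.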


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 421
Author(s):  
Dariusz Puchala ◽  
Kamil Stokfiszewski ◽  
Mykhaylo Yatsymirskyy

In this paper, the authors analyze in more detail an image encryption scheme, proposed in their earlier work, which preserves input image statistics and can be used in connection with the JPEG compression standard. The image encryption process takes advantage of fast linear transforms parametrized with private keys and is carried out prior to the compression stage in a way that does not alter those statistical characteristics of the input image that are crucial for the subsequent compression. This feature makes the encryption process transparent to the compression stage and enables the JPEG algorithm to maintain its full compression capabilities even though it operates on encrypted image data. The main advantage of the considered approach is that the JPEG algorithm can be used without any modifications as part of an encrypt-then-compress image processing framework. The paper includes a detailed mathematical model of the examined scheme, allowing for theoretical analysis of the impact of the image encryption step on the effectiveness of the compression process. A combinatorial and statistical analysis of the encryption process is also included, which allows its cryptographic strength to be evaluated. In addition, the paper considers several practical use-case scenarios with different characteristics of the compression and encryption stages. The final part of the paper contains additional results of experimental studies on the general effectiveness of the presented scheme. The results show that, for a wide range of compression ratios, the considered scheme performs comparably to the JPEG algorithm alone (that is, without the encryption stage) in terms of the quality measures of reconstructed images. Moreover, the results of the statistical analysis, as well as those obtained with generally approved quality measures of image cryptographic systems, demonstrate the high strength and efficiency of the scheme's encryption stage.
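To make the encrypt-then-compress idea concrete, here is a deliberately simplified toy: a keyed permutation of 8x8 pixel blocks. This is not the paper's transform (the authors use key-parametrized fast linear transforms), but it illustrates the principle that an encryption step which leaves blockwise statistics intact lets a block-based codec such as JPEG compress the encrypted image about as well as the original.

```python
import numpy as np

def shuffle_blocks(img, key, block=8, inverse=False):
    """Keyed permutation of 8x8 blocks as a toy encrypt-then-compress step.

    Illustrative only. A block permutation keeps every block's pixel
    statistics unchanged, so JPEG-style compression of the shuffled
    image behaves much like compression of the original. Image
    dimensions must be multiples of `block`.
    """
    h, w = img.shape
    bh, bw = h // block, w // block
    # split the image into a flat list of (block x block) tiles
    blocks = img.reshape(bh, block, bw, block).swapaxes(1, 2).reshape(-1, block, block)
    perm = np.random.default_rng(key).permutation(len(blocks))  # key -> permutation
    if inverse:
        out = np.empty_like(blocks)
        out[perm] = blocks       # undo the permutation
    else:
        out = blocks[perm]       # apply the permutation
    return out.reshape(bh, bw, block, block).swapaxes(1, 2).reshape(h, w)
```

Decryption with the same key is exact, and the global pixel histogram of the encrypted image is identical to that of the original, which is the statistics-preserving property the scheme relies on.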


Author(s):  
Vijay Kumar ◽  
Dinesh Kumar

Clustering techniques suffer from problems of cluster-center initialization and local optima. In this chapter, a new metaheuristic, the Sine Cosine Algorithm (SCA), is used as a search method to solve these problems. SCA explores the search space of a given dataset to find near-optimal cluster centers. A center-based encoding scheme is used to evolve the cluster centers. The proposed SCA-based clustering technique is evaluated on four real-life datasets, and its performance is compared with recently developed clustering techniques. The experimental results reveal that SCA-based clustering gives better values of the cluster quality measures.
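A minimal sketch of the approach: each search agent encodes k candidate centers (the center-based encoding), fitness is the within-cluster sum of squared distances, and agents move using the standard SCA update x += r1*sin(r2)*|r3*best - x| (or the cosine variant). The hyperparameter choices here (population size, iteration count, a = 2) are illustrative assumptions, not the chapter's settings.

```python
import numpy as np

def sca_cluster_centers(X, k, agents=20, iters=100, seed=0):
    """Sketch of a Sine Cosine Algorithm search for k cluster centers."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(0), X.max(0)
    pop = rng.uniform(lo, hi, (agents, k, X.shape[1]))  # center-based encoding

    def sse(c):  # fitness: sum of squared distances to nearest center
        return ((X[:, None, :] - c[None]) ** 2).sum(-1).min(1).sum()

    fit = np.array([sse(c) for c in pop])
    best = pop[fit.argmin()].copy()
    for t in range(iters):
        r1 = 2 - 2 * t / iters                 # shrinks: exploration -> exploitation
        r2 = rng.uniform(0, 2 * np.pi, pop.shape)
        r3 = rng.uniform(0, 2, pop.shape)
        r4 = rng.uniform(size=pop.shape)
        step = r1 * np.where(r4 < 0.5, np.sin(r2), np.cos(r2))
        pop = np.clip(pop + step * np.abs(r3 * best - pop), lo, hi)
        fit = np.array([sse(c) for c in pop])
        if fit.min() < sse(best):              # keep the best solution found so far
            best = pop[fit.argmin()].copy()
    return best
```

Because the best-so-far solution is only replaced when fitness improves, the returned centers are monotonically no worse than the best random initialization.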


1996 ◽  
Vol 13 (1) ◽  
pp. 169-172 ◽  
Author(s):  
Robert Saltstone ◽  
Ken Stange

Author(s):  
Kazushi Okamoto

This study proposes the concept of families of triangular norm (t-norm)-based kernel functions and discusses their positive-definiteness and the conditions on applicable t-norms. A clustering experiment with kernel k-means is performed to analyze the characteristics of the proposed concept, as well as the effects of t-norm and parameter selection. The obtained clusters are evaluated in terms of the adjusted Rand index, and the experimental results suggest the following: (1) the adjusted Rand index values obtained by the proposed method were almost the same as or higher than those produced using the linear kernel for all of the datasets; (2) the proposed method slightly improved the adjusted Rand index values for some datasets compared with the radial basis function (RBF) kernel; (3) the proposed method tended to map data to a higher-dimensional feature space than the linear kernel, but the dimension was lower than that of the RBF kernel.
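As a concrete instance of a t-norm-based kernel, summing the minimum t-norm over coordinates, k(x, y) = Σ_i min(x_i, y_i), gives the histogram-intersection kernel, which is known to be positive definite on nonnegative vectors. The sketch below builds the Gram matrix for an arbitrary elementwise t-norm; whether other t-norms yield a positive-definite kernel is exactly the kind of condition the paper analyzes.

```python
import numpy as np

def tnorm_kernel(X, tnorm=np.minimum):
    """Gram matrix for a t-norm based kernel (illustrative sketch).

    X : (n, d) array of nonnegative feature vectors.
    tnorm : elementwise binary operation; np.minimum (default) gives
    the histogram-intersection kernel. Other t-norms can be swapped
    in, subject to positive-definiteness conditions.
    """
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = tnorm(X[i], X[j]).sum()  # k(x, y) = sum_i T(x_i, y_i)
    return K
```

The resulting Gram matrix can be fed directly to kernel k-means or any other kernel method; for the minimum t-norm its eigenvalues are nonnegative up to numerical error.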


2011 ◽  
Vol 12 (Suppl 9) ◽  
pp. S9 ◽  
Author(s):  
Dunarel Badescu ◽  
Alix Boc ◽  
Abdoulaye Diallo ◽  
Vladimir Makarenkov

2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
D. Ho-Kieu ◽  
T. Vo-Van ◽  
T. Nguyen-Trang

This paper proposes a novel and efficient clustering algorithm for probability density functions based on k-medoids. Further, a scheme for selecting powerful initial medoids is suggested, which speeds up the computation significantly. A general proof of convergence of the proposed algorithm is also presented. The effectiveness and feasibility of the proposed algorithm are verified and compared with various existing algorithms on both artificial and real datasets in terms of adjusted Rand index, computational time, and iteration count. The numerical results reveal the outstanding performance of the proposed algorithm as well as its potential applications in real life.
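The k-medoids loop for densities can be sketched by discretizing each pdf on a common grid and alternating assignment and medoid-update steps. The L1 distance between discretized densities and the random initialization below are illustrative assumptions; the paper's contribution includes a specific scheme for choosing strong initial medoids, which is not reproduced here.

```python
import numpy as np

def kmedoids_pdfs(F, k, iters=50, seed=0):
    """Sketch of k-medoids clustering of discretized density functions.

    F : (n, m) array; each row is a pdf evaluated on a common grid.
    Uses the L1 distance between densities and random initial medoids.
    Returns (medoid_indices, labels).
    """
    rng = np.random.default_rng(seed)
    D = np.abs(F[:, None, :] - F[None, :, :]).sum(-1)  # pairwise L1 distances
    medoids = rng.choice(len(F), k, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(1)               # assign to nearest medoid
        # new medoid of each cluster: member with least total distance to the rest
        new = np.array([
            np.flatnonzero(labels == c)[D[np.ix_(labels == c, labels == c)].sum(0).argmin()]
            for c in range(k)
        ])
        if np.array_equal(new, medoids):
            break                                      # converged
        medoids = new
    labels = D[:, medoids].argmin(1)
    return medoids, labels
```

Applied to two groups of Gaussian densities with well-separated means, the loop recovers the two groups after a handful of iterations.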

