CLUSTERING QUALITY MEASURES BASED ON COMPARING THE PROXIMITY MATRICES FOR THE MEMBERSHIP VECTORS AND THE OBJECTS

Author(s):  
ROELOF K. BROUWER

There are several commonly accepted clustering quality measures (clustering quality as opposed to cluster quality), such as the Rand index, the adjusted Rand index, and the Jaccard index. Each of these, however, is based on comparing the partition produced by the clustering process to a correct partition, so they can only be used to assess a clustering process when the correct partition is known. This paper therefore proposes a clustering quality measure that does not require comparison to a correct partition. The proposed measure is based on the assumption that the proximities between the membership vectors should correlate positively with the proximities between the objects, which may be the proximities between their feature vectors. The components of the membership vector for a pattern are the membership degrees of that pattern in the various clusters; the membership vector is thus just another object data vector, or type of feature vector, whose feature values are the membership values of the object in the various clusters. Based on this premise, the paper describes new clustering quality metrics derived from standard correlation measures and other proposed correlation metrics. Simulations on data with a wide range of clusterability, or separability, show that comparing the proximity matrix based on the membership matrix to the object proximity matrix is quite effective.
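The core idea above can be sketched in a few lines: build one pairwise-distance matrix from the feature vectors and one from the membership vectors, then correlate their entries. This is a minimal illustration using Pearson correlation on Euclidean distances; the paper considers several correlation measures, and the function name `proximity_correlation` is an assumed label, not the author's.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def proximity_correlation(X, U):
    """Correlate object proximities with membership-vector proximities.

    X : (n, d) array of object feature vectors.
    U : (n, k) array of membership degrees (one row per object, one
        column per cluster; rows sum to 1 for a fuzzy clustering).
    Returns the Pearson correlation between the two condensed pairwise
    distance matrices; values near 1 suggest the clustering preserves
    the geometry of the objects.
    """
    d_obj = pdist(X)  # pairwise distances between objects
    d_mem = pdist(U)  # pairwise distances between membership vectors
    r, _ = pearsonr(d_obj, d_mem)
    return r
```

For two well-separated blobs with crisp memberships, the within-cluster distances are small in both matrices and the between-cluster distances are large in both, so the correlation is close to 1.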

2018 ◽  
Vol 16 (2) ◽  
pp. 107-119
Author(s):  
Supavit KONGWUDHIKUNAKORN ◽  
Kitsana WAIYAMAI

This paper presents a method for clustering short text documents such as instant messages, SMS, or news headlines. Vocabularies in the texts are expanded using external knowledge sources and represented by a distributed word representation. Clustering is done with the k-means algorithm, using Word Mover's Distance as the distance metric. Experiments compared the clustering quality of this method and several leading methods on large datasets drawn from BBC headlines, SearchSnippets, StackExchange, and Twitter. For all datasets, the proposed algorithm produced document clusters with higher accuracy, precision, F1-score, and adjusted Rand index. We also observe that a cluster description can be inferred from the keywords represented in each cluster.
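Word Mover's Distance is an optimal-transport distance between the word-embedding clouds of two documents. As a rough sketch: when both documents have the same number of tokens and uniform word weights, the transport problem reduces to a minimum-cost one-to-one matching, which can be solved with the Hungarian algorithm. This simplification is an assumption for illustration; the method in the paper uses the general transport formulation over word frequencies.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def simplified_wmd(E1, E2):
    """Simplified Word Mover's Distance between two documents.

    E1, E2 : (m, d) arrays of word embeddings, one row per token.
    With equal token counts and uniform weights, WMD reduces to a
    minimum-cost assignment between the two sets of embeddings.
    """
    cost = cdist(E1, E2)                      # pairwise embedding distances
    rows, cols = linear_sum_assignment(cost)  # cheapest one-to-one matching
    return cost[rows, cols].mean()
```

Two documents whose tokens have identical embeddings (in any order) get distance zero, which is the property that makes WMD robust to word reordering.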


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 421
Author(s):  
Dariusz Puchala ◽  
Kamil Stokfiszewski ◽  
Mykhaylo Yatsymirskyy

In this paper, the authors analyze in more detail an image encryption scheme, proposed in their earlier work, which preserves input image statistics and can be used in connection with the JPEG compression standard. The image encryption process takes advantage of fast linear transforms parametrized with private keys and is carried out prior to the compression stage in a way that does not alter those statistical characteristics of the input image that are crucial for the subsequent compression. This feature makes the encryption process transparent to the compression stage and enables the JPEG algorithm to maintain its full compression capabilities even though it operates on encrypted image data. The main advantage of the considered approach is that the JPEG algorithm can be used without any modifications as part of an encrypt-then-compress image processing framework. The paper includes a detailed mathematical model of the examined scheme, allowing for theoretical analysis of the impact of the image encryption step on the effectiveness of the compression process. A combinatorial and statistical analysis of the encryption process is also included, which allows its cryptographic strength to be evaluated. In addition, the paper considers several practical use-case scenarios with different characteristics of the compression and encryption stages. The final part of the paper contains additional results of experimental studies on the general effectiveness of the presented scheme. The results show that, for a wide range of compression ratios, the considered scheme performs comparably to the JPEG algorithm alone (that is, without the encryption stage) in terms of the quality measures of reconstructed images. Moreover, the results of the statistical analysis, as well as those obtained with generally approved quality measures of image cryptographic systems, demonstrate the high strength and efficiency of the scheme's encryption stage.
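To make the encrypt-then-compress idea concrete, here is a deliberately simplified toy: a keyed permutation of 8x8 pixel blocks. This is not the paper's transform (the authors use key-parametrized fast linear transforms), but it illustrates the principle that an encryption step which leaves blockwise statistics intact lets a block-based codec such as JPEG compress the encrypted image about as well as the original.

```python
import numpy as np

def shuffle_blocks(img, key, block=8, inverse=False):
    """Keyed permutation of 8x8 blocks as a toy encrypt-then-compress step.

    Illustrative only. A block permutation keeps every block's pixel
    statistics unchanged, so JPEG-style compression of the shuffled
    image behaves much like compression of the original. Image
    dimensions must be multiples of `block`.
    """
    h, w = img.shape
    bh, bw = h // block, w // block
    # split the image into a flat list of (block x block) tiles
    blocks = img.reshape(bh, block, bw, block).swapaxes(1, 2).reshape(-1, block, block)
    perm = np.random.default_rng(key).permutation(len(blocks))  # key -> permutation
    if inverse:
        out = np.empty_like(blocks)
        out[perm] = blocks       # undo the permutation
    else:
        out = blocks[perm]       # apply the permutation
    return out.reshape(bh, bw, block, block).swapaxes(1, 2).reshape(h, w)
```

Decryption with the same key is exact, and the global pixel histogram of the encrypted image is identical to that of the original, which is the statistics-preserving property the scheme relies on.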


Author(s):  
Vijay Kumar ◽  
Dinesh Kumar

Clustering techniques suffer from problems of cluster-center initialization and local optima. In this chapter, a new metaheuristic, the Sine Cosine Algorithm (SCA), is used as a search method to solve these problems. SCA explores the search space of a given dataset to find near-optimal cluster centers. A center-based encoding scheme is used to evolve the cluster centers. The proposed SCA-based clustering technique is evaluated on four real-life datasets, and its performance is compared with recently developed clustering techniques. The experimental results reveal that SCA-based clustering gives better values of the cluster quality measures.
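A minimal sketch of the approach: each search agent encodes k candidate centers (the center-based encoding), fitness is the within-cluster sum of squared distances, and agents move using the standard SCA update x += r1*sin(r2)*|r3*best - x| (or the cosine variant). The hyperparameter choices here (population size, iteration count, a = 2) are illustrative assumptions, not the chapter's settings.

```python
import numpy as np

def sca_cluster_centers(X, k, agents=20, iters=100, seed=0):
    """Sketch of a Sine Cosine Algorithm search for k cluster centers."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(0), X.max(0)
    pop = rng.uniform(lo, hi, (agents, k, X.shape[1]))  # center-based encoding

    def sse(c):  # fitness: sum of squared distances to nearest center
        return ((X[:, None, :] - c[None]) ** 2).sum(-1).min(1).sum()

    fit = np.array([sse(c) for c in pop])
    best = pop[fit.argmin()].copy()
    for t in range(iters):
        r1 = 2 - 2 * t / iters                 # shrinks: exploration -> exploitation
        r2 = rng.uniform(0, 2 * np.pi, pop.shape)
        r3 = rng.uniform(0, 2, pop.shape)
        r4 = rng.uniform(size=pop.shape)
        step = r1 * np.where(r4 < 0.5, np.sin(r2), np.cos(r2))
        pop = np.clip(pop + step * np.abs(r3 * best - pop), lo, hi)
        fit = np.array([sse(c) for c in pop])
        if fit.min() < sse(best):              # keep the best solution found so far
            best = pop[fit.argmin()].copy()
    return best
```

Because the best-so-far solution is only replaced when fitness improves, the returned centers are monotonically no worse than the best random initialization.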


1996 ◽  
Vol 13 (1) ◽  
pp. 169-172 ◽  
Author(s):  
Robert Saltstone ◽  
Ken Stange

Author(s):  
Kazushi Okamoto

This study proposes the concept of families of triangular norm (t-norm)-based kernel functions and discusses their positive-definiteness and the conditions on applicable t-norms. A clustering experiment with kernel k-means is performed to analyze the characteristics of the proposed concept, as well as the effects of t-norm and parameter selection. The obtained clusters are evaluated in terms of the adjusted Rand index, and the experimental results suggest the following: (1) the adjusted Rand index values obtained by the proposed method were almost the same as or higher than those produced using the linear kernel for all of the datasets; (2) the proposed method slightly improved the adjusted Rand index values for some datasets compared with the radial basis function (RBF) kernel; (3) the proposed method tended to map data to a higher-dimensional feature space than the linear kernel, but the dimension was lower than that of the RBF kernel.
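As a concrete instance of a t-norm-based kernel, summing the minimum t-norm over coordinates, k(x, y) = Σ_i min(x_i, y_i), gives the histogram-intersection kernel, which is known to be positive definite on nonnegative vectors. The sketch below builds the Gram matrix for an arbitrary elementwise t-norm; whether other t-norms yield a positive-definite kernel is exactly the kind of condition the paper analyzes.

```python
import numpy as np

def tnorm_kernel(X, tnorm=np.minimum):
    """Gram matrix for a t-norm based kernel (illustrative sketch).

    X : (n, d) array of nonnegative feature vectors.
    tnorm : elementwise binary operation; np.minimum (default) gives
    the histogram-intersection kernel. Other t-norms can be swapped
    in, subject to positive-definiteness conditions.
    """
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = tnorm(X[i], X[j]).sum()  # k(x, y) = sum_i T(x_i, y_i)
    return K
```

The resulting Gram matrix can be fed directly to kernel k-means or any other kernel method; for the minimum t-norm its eigenvalues are nonnegative up to numerical error.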


2011 ◽  
Vol 12 (Suppl 9) ◽  
pp. S9 ◽  
Author(s):  
Dunarel Badescu ◽  
Alix Boc ◽  
Abdoulaye Diallo ◽  
Vladimir Makarenkov

2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
D. Ho-Kieu ◽  
T. Vo-Van ◽  
T. Nguyen-Trang

This paper proposes a novel and efficient clustering algorithm for probability density functions based on k-medoids. Further, a scheme for selecting powerful initial medoids is suggested, which speeds up the computation significantly. A general proof of convergence of the proposed algorithm is also presented. The effectiveness and feasibility of the proposed algorithm are verified and compared with various existing algorithms on both artificial and real datasets in terms of adjusted Rand index, computational time, and iteration count. The numerical results reveal the outstanding performance of the proposed algorithm as well as its potential applications in real life.
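The k-medoids loop for densities can be sketched by discretizing each pdf on a common grid and alternating assignment and medoid-update steps. The L1 distance between discretized densities and the random initialization below are illustrative assumptions; the paper's contribution includes a specific scheme for choosing strong initial medoids, which is not reproduced here.

```python
import numpy as np

def kmedoids_pdfs(F, k, iters=50, seed=0):
    """Sketch of k-medoids clustering of discretized density functions.

    F : (n, m) array; each row is a pdf evaluated on a common grid.
    Uses the L1 distance between densities and random initial medoids.
    Returns (medoid_indices, labels).
    """
    rng = np.random.default_rng(seed)
    D = np.abs(F[:, None, :] - F[None, :, :]).sum(-1)  # pairwise L1 distances
    medoids = rng.choice(len(F), k, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(1)               # assign to nearest medoid
        # new medoid of each cluster: member with least total distance to the rest
        new = np.array([
            np.flatnonzero(labels == c)[D[np.ix_(labels == c, labels == c)].sum(0).argmin()]
            for c in range(k)
        ])
        if np.array_equal(new, medoids):
            break                                      # converged
        medoids = new
    labels = D[:, medoids].argmin(1)
    return medoids, labels
```

Applied to two groups of Gaussian densities with well-separated means, the loop recovers the two groups after a handful of iterations.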

