CLUSTERING QUALITY MEASURES BASED ON COMPARING THE PROXIMITY MATRICES FOR THE MEMBERSHIP VECTORS AND THE OBJECTS
Several widely accepted clustering quality measures (clustering quality, as opposed to cluster quality) exist, such as the Rand index, the adjusted Rand index, and the Jaccard index. Each of these, however, compares the partition produced by the clustering process to a correct partition, so they can only be used to assess a clustering process when the correct partition is known. This paper therefore proposes a clustering quality measure that does not require comparison to a correct partition. The proposed measure rests on the assumption that the proximities between the membership vectors should correlate positively with the proximities between the objects, which may be taken as the proximities between their feature vectors. The components of the membership vector corresponding to a pattern are the membership degrees of that pattern in the various clusters. The membership vector is thus just another type of feature vector, whose feature values for an object are the object's membership values in the various clusters. Based on this premise, this paper describes several new clustering quality metrics derived from standard correlation measures and from other proposed correlation metrics. Simulations on data with a wide range of clusterability, or separability, show that comparing the proximity matrix derived from the membership matrix with the object proximity matrix is quite effective.
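
To make the premise concrete, the following is a minimal sketch of one possible instance of the idea, not the specific metrics proposed in this paper. It assumes Euclidean distance as the proximity and the Pearson coefficient as the correlation measure; both choices, along with the function names `pairwise_distances` and `proximity_correlation`, are illustrative assumptions. The membership matrix has one row per object and one column per cluster, and its rows may be crisp (one-hot) or fuzzy.

```python
import numpy as np


def pairwise_distances(X):
    """Return the n x n Euclidean distance matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def proximity_correlation(features, memberships):
    """Pearson correlation between object and membership proximities.

    Assumption for illustration: Euclidean distance serves as the
    proximity for both the feature vectors and the membership vectors.
    Only the upper triangle (i < j) is used, so each pair of objects
    is counted once and the zero diagonal is excluded.
    The result is NaN if either set of proximities is constant,
    e.g. when every object is assigned to the same cluster.
    """
    d_obj = pairwise_distances(features)
    d_mem = pairwise_distances(memberships)
    iu = np.triu_indices(d_obj.shape[0], k=1)
    return np.corrcoef(d_obj[iu], d_mem[iu])[0, 1]


# Two well-separated groups of three points each (made-up toy data).
features = np.array([[0, 0], [0, 1], [1, 0],
                     [10, 10], [10, 11], [11, 10]])

# Crisp memberships that match the two groups.
matching = np.array([[1, 0], [1, 0], [1, 0],
                     [0, 1], [0, 1], [0, 1]])

# Crisp memberships that ignore the group structure.
alternating = np.array([[1, 0], [0, 1], [1, 0],
                        [0, 1], [1, 0], [0, 1]])

score_matching = proximity_correlation(features, matching)
score_alternating = proximity_correlation(features, alternating)
```

On this toy data, the memberships that follow the group structure yield a correlation close to 1, while the alternating assignment yields a correlation near zero, which is the behaviour the premise predicts.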