Active Sampling for Constrained Clustering
Constrained clustering is a framework for improving clustering performance by using constraints about data pairs. Since performance of constrained clustering depends on the set of constraints used, a method is needed to select good constraints that promote clustering performance. In this paper, we propose an active sampling method working with a constrained cluster ensemble algorithm that aggregates clustering results that a modified COP-Kmeans iteratively produces by changing the priorities of constraints. Our method follows the approach of uncertainty sampling and measures uncertainty using variations of clustering results where data pairs are clustered together in some results but not in others. It selects the data pair to be labeled that has the most variable result during cluster ensemble process. Experimental results show that our method outperforms random sampling. We further investigate the effect of important parameters.