Semi-Supervised Point Prototype Clustering
This paper describes a class of models we call semi-supervised clustering. Algorithms in this category are clustering methods that use the information carried by labeled training data Xd ⊂ ℜ^p as well as the structural information that resides in the unlabeled data Xu ⊂ ℜ^p. The labels are used together with the unlabeled data to help a clustering algorithm partition Xu; the algorithm then terminates without the capability to label other points in ℜ^p. This is very different from supervised learning, in which the training data endow a classifier with the ability to label every point in ℜ^p. The methodology is applicable in domains such as image segmentation, where a user may have a small set of labeled data and can use it to semi-supervise the classification of the remaining pixels in a single image. The model can be used with many different point prototype clustering algorithms; we illustrate how to attach it to one particular algorithm, fuzzy c-means. We then give two numerical examples showing that it overcomes the failure of many point prototype clustering schemes when confronted with data that possess overlapping and/or nonuniformly distributed clusters. Finally, the new method compares favorably with the fully supervised k-nearest neighbor rule when applied to the Iris data.
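To make the idea concrete, the following is a minimal sketch of one way labels might be attached to fuzzy c-means. It is not the algorithm developed in the paper. It assumes that labeled points keep fixed, crisp memberships, that unlabeled points receive the standard fuzzy c-means membership update, and that both sets contribute to the prototype update. The names ss_fcm and fcm_memberships, and the weight parameter that scales the influence of the labeled points, are hypothetical and chosen only for illustration.

```python
import math


def fcm_memberships(x, protos, m):
    """Standard fuzzy c-means memberships of point x in each prototype."""
    d = [math.dist(x, v) for v in protos]
    zeros = [dj == 0.0 for dj in d]
    if any(zeros):
        # x coincides with one or more prototypes: share membership among them.
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    inv = [dj ** -power for dj in d]
    total = sum(inv)
    return [v / total for v in inv]


def ss_fcm(labeled, labels, unlabeled, c, m=2.0, weight=1.0,
           iters=100, tol=1e-6):
    """Illustrative semi-supervised fuzzy c-means (assumed scheme).

    labeled   : list of points (tuples of floats) with known classes
    labels    : class index in range(c) for each labeled point
    unlabeled : list of points to be partitioned
    Returns (prototypes, memberships of the unlabeled points).
    Assumes every class has at least one labeled point and weight > 0.
    """
    p = len(unlabeled[0])
    # Labeled points carry fixed, crisp memberships (assumption).
    u_lab = [[1.0 if j == y else 0.0 for j in range(c)] for y in labels]

    # Initialise each prototype at the mean of its labeled points.
    protos = []
    for j in range(c):
        members = [x for x, y in zip(labeled, labels) if y == j]
        protos.append([sum(x[k] for x in members) / len(members)
                       for k in range(p)])

    for _ in range(iters):
        # Only the unlabeled memberships are updated.
        u_unl = [fcm_memberships(x, protos, m) for x in unlabeled]

        # Prototypes are weighted means over labeled and unlabeled points.
        new_protos = []
        for j in range(c):
            num = [0.0] * p
            den = 0.0
            for x, u in zip(labeled, u_lab):
                w = weight * u[j] ** m
                den += w
                for k in range(p):
                    num[k] += w * x[k]
            for x, u in zip(unlabeled, u_unl):
                w = u[j] ** m
                den += w
                for k in range(p):
                    num[k] += w * x[k]
            new_protos.append([n / den for n in num])

        shift = max(math.dist(a, b) for a, b in zip(protos, new_protos))
        protos = new_protos
        if shift < tol:
            break

    # Final memberships relative to the converged prototypes.
    u_unl = [fcm_memberships(x, protos, m) for x in unlabeled]
    return protos, u_unl
```

Under these assumptions, a call such as ss_fcm(labeled, labels, unlabeled, c=2) returns prototypes and a fuzzy partition of the unlabeled points only. Consistent with the description above, the procedure stops once the unlabeled data are partitioned and does not by itself produce a classifier for new points in ℜ^p.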