Graph Based Semi-Supervised Learning Method for Imbalanced Dataset
In real-world applications, datasets are often highly imbalanced: some classes contain far more instances than others. A classifier trained on such data tends to adapt to the majority class, achieving high predictive accuracy on the majority class but poor accuracy on the minority class. To address this problem, we propose a novel graph-based semi-supervised learning method for imbalanced datasets, called GSMID. GSMID characterizes the class equilibrium constraint as the smoothness of class labels. It is expected to derive the optimal assignment of class memberships to unlabeled samples by maximizing the correlations of classes while keeping the assignment as smooth as possible on the instance graph. Experiments comparing GSMID with SVM and other graph-based semi-supervised learning methods on several real-world datasets show that GSMID can effectively improve classification accuracy on imbalanced datasets, especially when the data are highly skewed.
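As a rough illustration of the general setting (not the GSMID formulation itself, which the abstract does not spell out), the sketch below propagates labels over an instance graph using the standard local-and-global-consistency closed form, then rescales each class column to equal total mass as a simple stand-in for a class-balance constraint. The function name `propagate_labels`, the RBF affinity, and the parameters `alpha`, `sigma`, and `balance` are assumptions introduced only for this example.

```python
import numpy as np


def propagate_labels(X, y, alpha=0.9, sigma=1.0, balance=True):
    """Graph-based label propagation (illustrative sketch, not GSMID).

    X: (n, d) feature array.
    y: (n,) integer labels, with -1 marking unlabeled samples.
    Returns an (n,) array of predicted class labels.
    """
    n = X.shape[0]
    classes = np.unique(y[y >= 0])

    # Dense RBF affinity matrix over all instances, with no self-loops.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))

    # One-hot matrix of the known labels; unlabeled rows stay all-zero.
    Y = np.zeros((n, len(classes)))
    for k, c in enumerate(classes):
        Y[y == c, k] = 1.0

    # Closed-form smooth solution: F = (I - alpha * S)^{-1} Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)

    if balance:
        # Assumed balancing heuristic: give every class the same total
        # score mass so the majority class cannot dominate by label count.
        F = F / F.sum(axis=0, keepdims=True)

    return classes[np.argmax(F, axis=1)]
```

Rescaling the columns is only one simple way to counteract skewed label counts; a method that encodes class equilibrium directly in the objective would replace that final step.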