Graph Based Semi-Supervised Learning Method for Imbalanced Dataset

2014 ◽  
Vol 556-562 ◽  
pp. 4040-4044
Author(s):  
Chen Guang Zhang ◽  
Yan Zhang ◽  
Xia Huan Zhang

In real-world applications, datasets are often highly imbalanced: some classes have far more instances than others. A classifier trained on such a dataset tends to adapt to the majority class, which can yield high predictive accuracy on the majority class but poor accuracy on the minority class. To address this problem, we put forward a novel graph-based semi-supervised learning method for imbalanced datasets, called GSMID. GSMID characterizes the class equilibrium constraint as the smoothness of class labels. It is expected to derive the optimal assignment of class membership for unlabeled samples by maximizing the correlations of classes while keeping the assignment as smooth as possible on the instance graph. Experiments comparing GSMID with SVM and other graph-based semi-supervised learning methods on several real-world datasets show that GSMID can effectively improve classification accuracy on imbalanced datasets, especially when the data is highly skewed.
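For readers unfamiliar with the setting, the following is a minimal sketch of generic graph-based label propagation on an instance graph. It is not the GSMID method, whose objective is not given in the abstract. The function name `propagate_labels`, the Gaussian affinity, and the parameters `sigma` and `alpha` are assumptions chosen for illustration, and the toy data is synthetic.

```python
import numpy as np

def propagate_labels(X, y, sigma=1.0, alpha=0.9):
    """Generic label propagation sketch (NOT GSMID).

    X: (n, d) feature array. y: int array with -1 marking unlabeled samples.
    sigma and alpha are illustrative assumptions, not values from the paper.
    """
    n = X.shape[0]
    classes = np.unique(y[y >= 0])
    # Gaussian affinities between all pairs of instances (the instance graph).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    # One-hot matrix of the known labels; unlabeled rows stay zero.
    Y = np.zeros((n, len(classes)))
    for k, c in enumerate(classes):
        Y[y == c, k] = 1.0
    # Closed-form smooth solution F = (I - alpha * S)^{-1} Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[F.argmax(axis=1)]

# Synthetic imbalanced toy data: 20 majority points, 5 minority points,
# with a single labeled sample per class.
rng = np.random.default_rng(0)
majority = rng.normal(0.0, 0.3, size=(20, 2))
minority = rng.normal(5.0, 0.3, size=(5, 2))
X = np.vstack([majority, minority])
y_true = np.array([0] * 20 + [1] * 5)
y = np.full(25, -1)
y[0] = 0
y[20] = 1
pred = propagate_labels(X, y)
```

On well-separated clusters like these, plain propagation is enough. The abstract's point is that on skewed, overlapping data an additional class-balance term is needed, which this sketch does not include.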

Classification is a supervised learning task that assigns items to groups on the basis of class labels. Algorithms are trained on labeled datasets to accomplish this task, so the dataset plays an important role. If the instances of one class (the majority class) greatly outnumber those of another class (the minority class), to the point that a classifier struggles to learn the characteristics of the minority class, the dataset is termed imbalanced. Such datasets cause biased predictions or misclassification in the real world: a model trained on them may report very high accuracy during training but, being unfamiliar with minority-class instances, fail to predict the minority class. This survey presents various techniques that researchers have proposed for handling imbalanced data and compares them on the basis of F-measure.
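A small worked example may help show why F-measure, rather than accuracy, is used for this comparison. The sketch below uses made-up labels (95 majority, 5 minority) and a hypothetical classifier that always predicts the majority class; the helper names `accuracy` and `f_measure` are illustrative, not taken from the survey.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred, positive=1):
    """F1 score for the class treated as positive (here, the minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up imbalanced labels: class 0 is the majority, class 1 the minority.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)   # high, despite missing every minority instance
f1 = f_measure(y_true, y_pred)   # zero, because no minority instance is found
```

Accuracy here is 0.95 while the minority-class F-measure is 0.0, which is the biased-prediction failure the abstract describes.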


2020 ◽  
Author(s):  
Xu Zheng ◽  
Yan Song ◽  
Jie Yan ◽  
Li-Rong Dai ◽  
Ian McLoughlin ◽  
...  

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 779
Author(s):  
Ruriko Yoshida

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
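As an illustration, the sketch below runs an ordinary majority-vote KNN using the tropical metric on the tropical projective torus, d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). It is a simplified stand-in under stated assumptions, not the paper's construction: the function names, the toy points, and the voting rule are all hypothetical.

```python
from collections import Counter

import numpy as np

def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i).

    Adding the same constant to every coordinate of u or v leaves the
    distance unchanged, as expected on the tropical projective torus.
    """
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(diff.max() - diff.min())

def tropical_knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points closest to x (illustrative)."""
    dists = [tropical_distance(x, p) for p in X_train]
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set with hypothetical labels "a" and "b".
X_train = [[0, 0, 0], [1, 1, 1.1], [0, 5, 0], [0, 6, 0], [2, 2, 2.2]]
y_train = ["a", "a", "b", "b", "a"]

label_1 = tropical_knn_predict(X_train, y_train, [3, 3, 3], k=3)
label_2 = tropical_knn_predict(X_train, y_train, [0, 5.5, 0], k=3)
```

The query `[3, 3, 3]` is at tropical distance 0 from `[0, 0, 0]` because the two differ by a constant vector, so the smallest ball around it that holds three training points contains only points labeled "a".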


2014 ◽  
Vol 144 ◽  
pp. 526-536 ◽  
Author(s):  
Jinling Wang ◽  
Ammar Belatreche ◽  
Liam Maguire ◽  
Thomas Martin McGinnity
