Graph Based Semi-Supervised Learning Method for Imbalanced Dataset

In real-world applications, datasets are often highly imbalanced: some classes have far more instances than others. When learning from a highly imbalanced dataset, a classifier tends to adapt to the majority class, which can yield high predictive accuracy on the majority class but poor accuracy on the minority class. To address this problem, we put forward a novel graph-based semi-supervised learning method for imbalanced datasets, called GSMID. GSMID characterizes the class equilibrium constraint as the smoothness of class labels. It is expected to derive the optimal assignment of class membership to unlabeled samples by maximizing the correlations of classes while remaining as smooth as possible on the instance graph. Experiments comparing GSMID with SVM and other graph-based semi-supervised learning methods on several real-world datasets show that GSMID can effectively improve classification accuracy on imbalanced datasets, especially when the data is highly skewed.
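The abstract does not give GSMID's objective, so the following is only a minimal sketch of the general idea it builds on: spreading a few known labels over an instance graph so that connected samples receive similar labels. It shows plain label propagation with clamped labels, not GSMID itself, and it omits the class equilibrium constraint. The function name `propagate_labels`, its parameters, and the toy graph are assumptions made for illustration.

```python
def propagate_labels(weights, labels, n_classes, n_iter=50):
    """Plain label propagation on a weighted graph (illustrative, not GSMID).

    weights: symmetric n x n list of lists of non-negative edge weights.
    labels:  list of length n; a class index for labeled nodes, None otherwise.
    Returns one predicted class index per node.
    """
    n = len(weights)
    # Soft scores: one-hot for labeled nodes, uniform for unlabeled nodes.
    scores = []
    for i in range(n):
        if labels[i] is None:
            scores.append([1.0 / n_classes] * n_classes)
        else:
            scores.append([1.0 if labels[i] == c else 0.0 for c in range(n_classes)])

    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0:
                # Known labels stay clamped; isolated nodes keep their scores.
                new_scores.append(scores[i])
                continue
            # Each unlabeled node takes the weighted average of its neighbors.
            new_scores.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        scores = new_scores

    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Toy graph (assumed data): two tight clusters joined by one weak edge (2-3).
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
known = [0, None, None, None, None, 1]  # only two labeled samples
predicted = propagate_labels(W, known, n_classes=2)
print(predicted)
```

On this toy graph each cluster inherits the label of its single labeled node. With imbalanced labels, plain propagation of this kind can drift toward the majority class, which is the failure mode the abstract says GSMID is designed to counter.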


A Survey on Imbalanced Data Handling Techniques for Classification

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2021/089102021 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1341-1347

Keyword(s):

Real World ◽

Imbalanced Data ◽

Learning Task ◽

High Accuracy ◽

Data Handling ◽

Imbalanced Dataset ◽

Minority Class ◽

Class Labels ◽

Very High ◽

F Measure

Classification is a supervised learning task that assigns items to groups on the basis of class labels. Algorithms are trained on labeled datasets to accomplish this task, so the dataset plays an important role. If instances of one class (the majority class) far outnumber instances of another class (the minority class), to the point that a classifier struggles to learn the characteristics of the minority class, the dataset is termed an imbalanced dataset. Such datasets cause biased prediction or misclassification in the real world: a model trained on them may report very high accuracy during training but, being unfamiliar with minority-class instances, fail to predict the minority class. This paper surveys the techniques researchers have proposed for handling imbalanced data and compares them on the basis of F-measure.
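A small worked example may help show why accuracy can mislead on imbalanced data and why F-measure is used for comparison. The sketch below assumes a made-up dataset of 95 majority and 5 minority instances and a hypothetical classifier that always predicts the majority class. The function names and data are illustrative and not taken from the survey.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed toy data: class 0 is the majority, class 1 the minority.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always predicts the majority class

print("accuracy:", accuracy(y_true, y_pred))
print("minority F-measure:", f_measure(y_true, y_pred, positive=1))
```

The always-majority classifier is right on 95 of 100 instances, yet its F-measure on the minority class is zero because it never identifies a single minority instance.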


The Lateral Conflict Risk Assessment for Low-altitude Training Airspace using Weakly Supervised Learning Method

Intelligent Automation & Soft Computing ◽

10.31209/2018.100000027 ◽

2018 ◽

Vol 24 (3) ◽

pp. 603-611

Author(s):

Kaijun Xu ◽

Xueting Chen ◽

Yusheng Yao ◽

Shanshan Li

Keyword(s):

Risk Assessment ◽

Supervised Learning ◽

Altitude Training ◽

Learning Method ◽

Weakly Supervised Learning ◽

Low Altitude ◽

Weakly Supervised ◽

Conflict Risk


An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection

10.21437/interspeech.2020-2329 ◽

2020 ◽

Author(s):

Xu Zheng ◽

Yan Song ◽

Jie Yan ◽

Li-Rong Dai ◽

Ian McLoughlin ◽

...

Keyword(s):

Supervised Learning ◽

Event Detection ◽

Learning Method ◽

Sound Event ◽

Sound Event Detection


A Semi-Supervised Learning Method for MiRNA-Disease Association Prediction Based on Variational Autoencoder

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3067338 ◽

2021 ◽

pp. 1-1

Author(s):

Cunmei Ji ◽

Yu-Tian Wang ◽

Zhen Gao ◽

Lei Li ◽

Jian-Cheng Ni ◽

...

Keyword(s):

Supervised Learning ◽

Disease Association ◽

Learning Method ◽

Variational Autoencoder


Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽

10.3390/math9070779 ◽

2021 ◽

Vol 9 (7) ◽

pp. 779

Author(s):

Ruriko Yoshida

Keyword(s):

Supervised Learning ◽

Phylogenetic Trees ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

High Dimensional ◽

Learning Method ◽

Dimensional Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. We then discuss their application to the K nearest neighbors (KNN) algorithm, a supervised learning method that classifies a high-dimensional vector into given categories by looking at a ball centered at the vector that contains K vectors in the space.
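As a rough illustration of the idea, the sketch below runs KNN with the tropical metric, which for two vectors u and v is the largest coordinate of u - v minus the smallest. Adding the same constant to every coordinate of a vector leaves this distance unchanged, which is why it is well defined on the tropical projective torus. The function names, the majority-vote rule, and the toy points are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def tropical_distance(u, v):
    """Tropical metric: max coordinate of (u - v) minus min coordinate."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def knn_classify(query, points, labels, k):
    """Label a query by majority vote among its k tropically nearest points."""
    ranked = sorted(
        range(len(points)),
        key=lambda i: tropical_distance(query, points[i]),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Assumed toy data: two groups of 3-dimensional vectors.
points = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 5, 5), (0, 6, 5), (0, 5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify((0, 1, 1), points, labels, k=3))
# Shifting every coordinate by the same constant does not change the distance.
print(tropical_distance((0, 1, 1), (0, 0, 0)), tropical_distance((3, 4, 4), (0, 0, 0)))
```

The K nearest points are exactly those inside the smallest tropical ball around the query that contains K labeled vectors, which connects the classifier to the balls studied in the abstract.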
