A design of information granule-based under-sampling method in imbalanced data classification

2020 ◽  
Vol 24 (22) ◽  
pp. 17333-17347 ◽  
Author(s):  
Tianyu Liu ◽  
Xiubin Zhu ◽  
Witold Pedrycz ◽  
Zhiwu Li


2019 ◽
Vol 8 (2) ◽  
pp. 2463-2468

Learning from class-imbalanced data is a challenging issue in the machine learning community, as most classification algorithms assume balanced datasets. Several methods are available to tackle this issue, among which the resampling techniques, undersampling and oversampling, are the most flexible and versatile. This paper introduces a new undersampling concept based on the Center of Gravity principle, which reduces the excess instances of the majority class. The method is designed for binary classification problems. The proposed technique, CoGBUS, overcomes the class imbalance problem and achieves the best results in the study. F-score, G-mean, and ROC are used to evaluate the performance of the method.
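The abstract does not spell out how the Center of Gravity principle is applied, so the following is a minimal sketch of one plausible reading, not the published CoGBUS procedure: the majority-class centroid is computed and only the majority instances closest to it are retained until the two classes are the same size. The function name `cog_undersample` and all parameters are illustrative.

```python
import numpy as np

def cog_undersample(X, y, majority_label, minority_label, seed=None):
    """Center-of-gravity style undersampling (illustrative sketch, not the
    published CoGBUS algorithm): keep the majority instances closest to the
    majority-class centroid until both classes have the same size."""
    rng = np.random.default_rng(seed)
    X_maj, X_min = X[y == majority_label], X[y == minority_label]

    centroid = X_maj.mean(axis=0)                      # center of gravity of the majority class
    dist = np.linalg.norm(X_maj - centroid, axis=1)    # distance of each majority sample to it
    keep = np.argsort(dist)[: len(X_min)]              # retain the closest ones

    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.concatenate([np.full(len(keep), majority_label),
                            np.full(len(X_min), minority_label)])
    perm = rng.permutation(len(y_bal))                 # shuffle the balanced set
    return X_bal[perm], y_bal[perm]
```

The resulting balanced set can then be scored with the metrics named above (F-score, G-mean, ROC) using any standard evaluation library.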


2021 ◽  
Vol 10 (5) ◽  
pp. 2789-2795
Author(s):  
Seyyed Mohammad Javadi Moghaddam ◽  
Asadollah Noroozi

Data classification performance suffers when the class distribution is imbalanced, because classifiers tend toward the majority class, which holds most of the instances. A popular remedy is to balance the dataset using over- and under-sampling methods. This paper presents a novel pre-processing technique that applies both over- and under-sampling to an imbalanced dataset. The proposed method uses the SMOTE algorithm to enlarge the minority class; a cluster-based approach then reduces the majority class, taking the new size of the minority class into account. Experimental results on 10 imbalanced datasets show that the suggested algorithm performs better than previous approaches.
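A hedged sketch of this over/under-sampling pipeline, assuming scikit-learn and imbalanced-learn: SMOTE enlarges the minority class, then the majority class is reduced by drawing proportionally from k-means clusters until it matches the new minority size. The proportional-sampling rule and all names (`smote_cluster_balance`, `minority_ratio`) are stand-ins, since the paper's exact cluster-based reduction is not given here.

```python
import numpy as np
from imblearn.over_sampling import SMOTE          # pip install imbalanced-learn
from sklearn.cluster import KMeans

def smote_cluster_balance(X, y, minority_ratio=0.5, n_clusters=5, seed=0):
    """Hybrid resampling sketch: SMOTE the minority class up to
    `minority_ratio` of the majority size, then under-sample the majority by
    drawing proportionally from k-means clusters until both classes match."""
    rng = np.random.default_rng(seed)

    # 1) Over-sample the minority class with SMOTE.
    X_os, y_os = SMOTE(sampling_strategy=minority_ratio,
                       random_state=seed).fit_resample(X, y)

    labels, counts = np.unique(y_os, return_counts=True)
    maj, mino = labels[np.argmax(counts)], labels[np.argmin(counts)]
    X_maj, X_min = X_os[y_os == maj], X_os[y_os == mino]

    # 2) Cluster the majority class and sample from each cluster in
    #    proportion to its size, so the cluster structure is preserved.
    cl = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=seed).fit_predict(X_maj)
    target = len(X_min)
    keep = []
    for c in range(n_clusters):
        idx = np.where(cl == c)[0]
        n_keep = max(1, int(round(target * len(idx) / len(X_maj))))
        keep.extend(rng.choice(idx, size=min(n_keep, len(idx)), replace=False))

    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.concatenate([np.full(len(keep), maj), np.full(len(X_min), mino)])
    return X_bal, y_bal
```

Drawing proportionally from each cluster is only one reasonable choice; the cited paper sizes the reduction by the post-SMOTE minority count, which the `target` variable mirrors here.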


Author(s):  
Bo Huang ◽  
Yimin Zhu ◽  
Zhongzhen Wang ◽  
Zhijun Fang

Class-imbalance learning is one of the most significant research topics in data mining and machine learning. The imbalance problem arises when one class has far more samples than the others. To address the issues of low classification accuracy and high time complexity, this paper proposes a novel imbalanced-data classification algorithm based on clustering and SVM. The algorithm under-samples the majority class according to the distribution characteristics of the minority samples. First, specific clusters are detected by cluster analysis of the minority class. Second, a cluster-boundary strategy is proposed to eliminate the adverse influence of noise samples. To construct a balanced dataset, the paper proposes three principles for under-sampling the majority class according to the characteristics of the samples in each cluster. Finally, the optimal classification model is obtained from a linear combination of hybrid-kernel SVMs. Experiments on datasets from the UCI and KEEL repositories show that the algorithm effectively reduces the interference of noise samples. Compared with SMOTE and Fast-CBUS, the proposed algorithm not only reduces the feature dimension but also generally improves the precision of the minority class under different labeled-sample rates.
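The sketch below approximates the overall pipeline rather than the paper's exact method: the minority class is clustered with k-means, the majority samples closest to any minority-cluster centroid are kept as a simple surrogate for the three under-sampling principles, and a linear combination of a linear and an RBF kernel stands in for the hybrid-kernel SVM. All function names, the 0.5 mixing weight, and the noise-handling shortcut are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVC

def hybrid_kernel(alpha=0.5, gamma=0.1):
    """Linear combination of a linear and an RBF kernel, used here as a
    stand-in for the paper's hybrid-kernel SVM (exact mixture unspecified)."""
    def k(X, Y):
        return alpha * linear_kernel(X, Y) + (1 - alpha) * rbf_kernel(X, Y, gamma=gamma)
    return k

def cluster_undersample_svm(X, y, maj, mino, n_clusters=3, seed=0):
    """Pipeline sketch: cluster the minority class, keep the majority samples
    nearest to any minority-cluster centroid (one simple surrogate for the
    paper's under-sampling principles), then train the hybrid-kernel SVM."""
    X_maj, X_min = X[y == maj], X[y == mino]

    centers = KMeans(n_clusters=n_clusters, n_init=10,
                     random_state=seed).fit(X_min).cluster_centers_
    # Distance from each majority sample to its nearest minority-cluster center.
    d = np.min(np.linalg.norm(X_maj[:, None, :] - centers[None, :, :], axis=2), axis=1)
    keep = np.argsort(d)[: len(X_min)]        # majority samples near the class boundary

    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.concatenate([np.full(len(keep), maj), np.full(len(X_min), mino)])

    clf = SVC(kernel=hybrid_kernel(alpha=0.5, gamma=0.1), C=1.0)
    return clf.fit(X_bal, y_bal)
```

Passing a callable as `kernel` makes scikit-learn compute the Gram matrix with the mixed kernel, which is the simplest way to prototype a hybrid-kernel SVM without implementing a custom solver.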

