Imbalanced Data Classification Algorithm Based on Clustering and SVM

Class-imbalance learning is one of the most significant research topics in data mining and machine learning. The imbalance problem arises when one class has far more samples than the others. To address the resulting low classification accuracy and high time complexity, this paper proposes a novel imbalanced data classification algorithm based on clustering and SVM. The algorithm under-samples the majority class according to the distribution characteristics of the minority samples. First, clusters are detected by cluster analysis on the minority class. Second, a cluster boundary strategy is proposed to eliminate the adverse influence of noise samples. To construct a balanced dataset from the imbalanced data, the paper then proposes three principles for under-sampling the majority samples according to the characteristics of the samples in each cluster. Finally, the optimal classification model is obtained from a linear combination of hybrid-kernel SVMs. Experiments on datasets from the UCI and KEEL repositories show that the algorithm effectively decreases the interference of noise samples. Compared with SMOTE and Fast-CBUS, the proposed algorithm not only reduces the feature dimension but also generally improves the precision on the minority classes under different labeled-sample rates.
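The abstract does not say how the hybrid kernel is built. As a rough illustration only, one common reading is a weighted sum of two standard kernels. The sketch below assumes an RBF kernel and a polynomial kernel; the function names and the parameters `gamma`, `degree`, `coef0` and `weight` are hypothetical choices for the example, not values taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def hybrid_kernel(x, y, weight=0.5):
    """Convex combination of the two kernels above.

    Assumption: `weight` in [0, 1] is a mixing coefficient that would be
    tuned on validation data; the paper's actual combination may differ.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, y) + (1.0 - weight) * poly_kernel(x, y)


def gram_matrix(points, weight=0.5):
    """Kernel (Gram) matrix that a precomputed-kernel SVM solver could consume."""
    return [[hybrid_kernel(p, q, weight) for q in points] for p in points]
```

A convex combination of valid kernels is itself a valid kernel, so the resulting Gram matrix could in principle be passed to any SVM solver that accepts a precomputed kernel.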


CoGBUS- Center of Gravity based under Sampling Method for Imbalanced Data Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2077.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2463-2468

Keyword(s):

Learning Community ◽

Sampling Method ◽

Class Imbalance ◽

Imbalanced Data ◽

Center Of Gravity ◽

Classification Algorithms ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Imbalanced Data Classification ◽

Under Sampling

Learning from class-imbalanced data is a challenging issue in the machine learning community, as most classification algorithms are designed to work with balanced datasets. Several methods are available to tackle this issue, among which the resampling techniques, undersampling and oversampling, are the more flexible and versatile. This paper introduces a new undersampling concept based on the Center of Gravity principle, which helps to reduce the excess instances of the majority class. The work is suited to binary-class problems. The proposed technique, CoGBUS, overcomes the class imbalance problem and yields the best results in the study. F-Score, G-Mean and ROC are used to evaluate the performance of the method.
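The abstract gives no detail on how the center of gravity is used to select instances. Purely as an illustration of the general idea, the sketch below computes the centroid of the majority class and keeps the instances nearest to it. The selection rule and the names `center_of_gravity` and `undersample_by_centroid` are assumptions made for this example and should not be read as the CoGBUS procedure itself.

```python
import math


def center_of_gravity(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]


def undersample_by_centroid(majority, n_keep):
    """Keep the n_keep majority points closest to the majority centroid.

    Assumption: discarding the points farthest from the centroid is only
    one possible strategy; the paper may rank or select differently.
    """
    if n_keep >= len(majority):
        return list(majority)
    cog = center_of_gravity(majority)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(majority, key=lambda p: math.dist(p, cog))
    return ranked[:n_keep]


# Toy binary problem: six majority points, two minority points.
majority = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1), (10, 10)]
minority = [(5, 5), (6, 5)]
balanced_majority = undersample_by_centroid(majority, len(minority))
```

In this toy case the far-away point `(10, 10)` is among those discarded, and the majority class is reduced to the size of the minority class.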


An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification

2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA) ◽

10.1109/skima47702.2019.8982391 ◽

2019 ◽

Cited By ~ 1

Author(s):

Md. Yasir Arafat ◽

Sabera Hoque ◽

Shuxiang Xu ◽

Dewan Md. Farid

Keyword(s):

Sampling Method ◽

Imbalanced Data ◽

Data Classification ◽

Support Vectors ◽

Imbalanced Data Classification ◽

Under Sampling


Imbalanced data classification algorithm based on boosting and cascade model

2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/icsmc.2012.6378183 ◽

2012 ◽

Author(s):

Xiaolong Zhang ◽

Chao Cheng

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Classification Algorithm ◽

Cascade Model ◽

Imbalanced Data Classification


Imbalanced data classification algorithm with support vector machine kernel extensions

Evolutionary Intelligence ◽

10.1007/s12065-018-0182-0 ◽

2018 ◽

Vol 12 (3) ◽

pp. 341-347 ◽

Cited By ~ 2

Author(s):

Feng Wang ◽

Shaojiang Liu ◽

Weichuan Ni ◽

Zhiming Xu ◽

Zemin Qiu ◽

...

Keyword(s):

Support Vector Machine ◽

Imbalanced Data ◽

Data Classification ◽

Classification Algorithm ◽

Support Vector ◽

Imbalanced Data Classification


An under-sampling technique for imbalanced data classification based on DBSCAN algorithm

2020 8th Iranian Joint Congress on Fuzzy and intelligent Systems (CFIS) ◽

10.1109/cfis49607.2020.9238718 ◽

2020 ◽

Author(s):

Behzad Mirzaei ◽

Bahareh Nikpour ◽

Hossein Nezamabadi-Pour

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Sampling Technique ◽

Dbscan Algorithm ◽

Imbalanced Data Classification ◽

Under Sampling


A Novel Model for Imbalanced Data Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6145 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6680-6687

Author(s):

Jian Yin ◽

Chunjing Gan ◽

Kaiqi Zhao ◽

Xuan Lin ◽

Zhe Quan ◽

...

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Classification Performance ◽

Classification Model ◽

Proposed Model ◽

Imbalanced Data Classification ◽

Public Datasets ◽

Distribution Cost ◽

Novel Model ◽

Learning Data

Recently, imbalanced data classification has received much attention due to its wide range of applications. Existing studies have attempted to improve classification performance by considering various factors such as imbalanced distribution, cost-sensitive learning, data space improvement, and ensemble learning. Nevertheless, most existing methods focus on only some of these aspects. In this work, we propose a novel imbalanced data classification model that considers all of them. To evaluate the proposed model, we conducted experiments on 14 public datasets. The results show that our model outperforms state-of-the-art methods in terms of recall, G-mean, F-measure and AUC.
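For readers unfamiliar with these measures, the sketch below shows how recall, G-mean and F-measure are commonly computed for a binary problem from predicted and true labels. It uses the standard textbook definitions and toy labels invented for the example; none of the numbers come from the paper's experiments. AUC is left out because it requires ranked scores rather than hard labels.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, specificity, precision, G-mean and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Report 0.0 when a rate is undefined (empty denominator).
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # true positive rate (minority class)
    specificity = ratio(tn, tn + fp)   # true negative rate (majority class)
    precision = ratio(tp, tp + fp)
    g_mean = math.sqrt(recall * specificity)
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
        "f_measure": f_measure,
    }


# Toy labels: 4 minority (1) and 6 majority (0) samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = imbalance_metrics(y_true, y_pred)
```

G-mean is useful on imbalanced data because it stays low whenever either class is poorly recognized, which plain accuracy can hide.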
