Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning

Imbalanced data classification aims to predict a dependent variable whose class distribution is skewed. Such problems arise in many application areas, such as medical disease diagnosis, risk management, and fault detection, and they remain challenging in machine learning and data mining. In this paper, a K-Means cluster-based oversampling algorithm is proposed to solve the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms the oversampling algorithms of previous studies.
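The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way cluster-based oversampling could work. It assumes that the minority class is clustered with plain k-means and that synthetic samples are interpolated between two minority samples of the same cluster. The function names (`kmeans_clusters`, `cluster_oversample`) and parameters are hypothetical and not taken from the paper.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_clusters(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda j: dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]

def cluster_oversample(minority, n_majority, k=2, seed=0):
    """Grow the minority class to n_majority samples (assumed scheme, not the paper's)."""
    rng = random.Random(seed)
    clusters = kmeans_clusters(minority, k, seed=seed)
    synthetic = []
    while len(minority) + len(synthetic) < n_majority:
        cluster = rng.choice(clusters)
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        # The new point lies on the segment between two samples of one cluster.
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return minority + synthetic
```

Interpolating only within a cluster keeps synthetic points inside regions that already contain minority samples, which is the usual motivation for clustering before oversampling.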

A Survey on Solution of Imbalanced Data Classification Problem Using SMOTE and Extreme Learning Machine

Communication and Intelligent Systems - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-1089-9_4 ◽

2021 ◽

pp. 31-44

Author(s):

Ankur Goyal ◽

Likhita Rathore ◽

Sandeep Kumar

Keyword(s):

Extreme Learning Machine ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Learning Machine

Multi Labeled Imbalanced Data Classification Based on Advanced Min-Max Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3718.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1776-1778

Keyword(s):

Machine Learning ◽

Uncertain Data ◽

Imbalanced Data ◽

Data Classification ◽

Data Sets ◽

Different Types ◽

Measured System ◽

Traditional Approaches

Some real-world applications, such as text categorization and subcellular localization of protein sequences, involve multi-label classification with imbalanced data. Various traditional approaches have been introduced to describe the relation between heuristic and task formulations and to classify different attributes of imbalanced, uncertain data sets. This paper addresses these issues with the min-max modular system, which decomposes a multi-label problem into a series of small two-class subproblems whose outputs can then be combined by two simple principles. Several decomposition strategies are also presented to improve the performance of min-max modular systems. Experimental results on subcellular localization show that the method generalizes better than traditional SVMs on multi-label and imbalanced data problems, and that it is also much faster than traditional SVMs.
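The decomposition described in this abstract can be illustrated with a small sketch. It assumes that the two combining principles are minimization and maximization, as the "min-max" name suggests, and it replaces the paper's base classifiers with a toy nearest-centroid scorer. All names (`make_module`, `train_min_max`, `predict_min_max`) are hypothetical. For a multi-label task, one such two-class problem would presumably be built per label.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def make_module(pos, neg):
    """Toy base learner (an assumption): score > 0 means closer to the positive centroid."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: dist2(x, cn) - dist2(x, cp)

def train_min_max(pos, neg, n_pos_parts, n_neg_parts):
    """Split both classes into parts and train one small module per (pos part, neg part)."""
    pos_parts = [pos[i::n_pos_parts] for i in range(n_pos_parts)]
    neg_parts = [neg[j::n_neg_parts] for j in range(n_neg_parts)]
    # Row i holds every module that shares positive part i.
    return [[make_module(p, n) for n in neg_parts] for p in pos_parts]

def predict_min_max(modules, x):
    """MIN over modules sharing a positive part, then MAX over the positive parts."""
    return max(min(m(x) for m in row) for row in modules)
```

Each module sees only a small, more balanced slice of the data, which is why such a decomposition can help when the negative class is much larger than the positive one.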

Integrating cluster analysis with granular computing for imbalanced data classification problem – A case study on prostate cancer prognosis

Computers & Industrial Engineering ◽

10.1016/j.cie.2018.08.031 ◽

2018 ◽

Vol 125 ◽

pp. 319-332 ◽

Cited By ~ 6

Author(s):

R.J. Kuo ◽

P.Y. Su ◽

Ferani E. Zulvia ◽

C.C. Lin

Keyword(s):

Prostate Cancer ◽

Cluster Analysis ◽

Granular Computing ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Cancer Prognosis ◽

Prostate Cancer Prognosis

Solving the Imbalanced Data Classification Problem with the Particle Swarm Optimization Based Support Vector Machine

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.134.788 ◽

2014 ◽

Vol 134 (6) ◽

pp. 788-795

Author(s):

Zhenyuan Xu ◽

Junzo Watada ◽

Mingnan Wu ◽

Zuwarie Ibrahim ◽

Marzuki Khalid

Keyword(s):

Support Vector Machine ◽

Particle Swarm Optimization ◽

Particle Swarm ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Support Vector ◽

Swarm Optimization

Handling Imbalanced Data Classification Problem using Artificial Immune System with Mahalanobis Distance

2019 20th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) ◽

10.1109/snpd.2019.8935760 ◽

2019 ◽

Author(s):

Duangjai Jitkongchuen ◽

Warattha Sukpongthai

Keyword(s):

Immune System ◽

Mahalanobis Distance ◽

Artificial Immune System ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Artificial Immune

K-Means Cluster Based Undersampling Ensemble for Imbalanced Data Classification

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5188.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2074-2079

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Classification Algorithms ◽

Classification Problems ◽

Imbalanced Classification ◽

Traditional Classification ◽

Boosting Method ◽

Ensemble Algorithms

Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. It arises in many application areas, such as rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with a boosting method. The experimental results show that the proposed algorithm outperforms the sampling ensemble algorithms of previous studies.
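As a rough illustration of the undersampling half of this method, the sketch below clusters the majority class into as many clusters as there are minority samples and keeps the real majority sample nearest each centroid. This selection rule is an assumption, since the abstract gives no details, and the boosting stage is not shown. The names `kmeans_centroids` and `cluster_undersample` are hypothetical, and the majority points are assumed to be distinct.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_centroids(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist2(p, centroids[j]))].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def cluster_undersample(majority, n_minority, seed=0):
    """Keep n_minority real majority samples, one nearest to each centroid."""
    centroids = kmeans_centroids(majority, n_minority, seed=seed)
    kept = []
    for c in centroids:
        # Skip samples already chosen so the result has no repeats.
        pool = [p for p in majority if p not in kept]
        kept.append(min(pool, key=lambda p: dist2(p, c)))
    return kept
```

In an ensemble setting, one could repeat the selection with different seeds and train one boosted learner per balanced subset; that step is left out here.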

Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem

Computer Information Systems and Industrial Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-45378-1_17 ◽

2016 ◽

pp. 183-194 ◽

Cited By ~ 1

Author(s):

Paweł Szeszko ◽

Magdalena Topczewska

Keyword(s):

Performance Measures ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Assessment Of Performance ◽

Empirical Assessment

A rough-granular approach to the imbalanced data classification problem

Applied Soft Computing ◽

10.1016/j.asoc.2019.105607 ◽

2019 ◽

Vol 83 ◽

pp. 105607 ◽

Cited By ~ 1

Author(s):

K. Borowska ◽

J. Stepaniuk

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

A Hybrid Sampling SVM Approach to Imbalanced Data Classification

Abstract and Applied Analysis ◽

10.1155/2014/972786 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 19

Author(s):

Qiang Wang

Keyword(s):

Real World ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Experimental Results ◽

Training Dataset ◽

Real World Datasets ◽

Classification Information ◽

Hybrid Sampling

Imbalanced datasets are frequently found in real applications. Resampling is an effective remedy because it produces a relatively balanced class distribution. In this paper, a hybrid sampling SVM approach is proposed that combines an oversampling technique and an undersampling technique to address the imbalanced data classification problem. The proposed approach first uses an undersampling technique to delete majority-class samples that carry less classification information and then applies an oversampling technique to gradually create new positive samples. A balanced training dataset is thus generated to replace the original imbalanced training dataset. Finally, experimental results on real-world datasets show that the proposed approach can identify informative samples and deal with the imbalanced data classification problem.
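The two-stage procedure described above can be sketched as follows. The sketch makes two assumptions that the abstract does not confirm: majority samples far from every minority sample are treated as carrying "less classification information", and new positive samples are created by interpolating between random pairs of minority samples. The function name `hybrid_resample` and the `target` parameter are hypothetical, and the SVM training step is not shown.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def hybrid_resample(majority, minority, target, seed=0):
    """Return (kept_majority, grown_minority), each with `target` samples.

    Assumes len(majority) >= target and len(minority) >= 2.
    """
    rng = random.Random(seed)
    # Step 1 (undersampling): rank majority samples by distance to the nearest
    # minority sample and keep the closest ones, which sit near the class boundary.
    ranked = sorted(majority, key=lambda p: min(dist2(p, m) for m in minority))
    kept_majority = ranked[:target]
    # Step 2 (oversampling): add interpolated minority samples one at a time.
    grown_minority = list(minority)
    while len(grown_minority) < target:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        grown_minority.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return kept_majority, grown_minority
```

The two returned lists together form the balanced training set that would replace the original imbalanced one before a classifier is fitted.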