Review of Imbalanced Data Classification and Approaches Relating to Real-Time Applications

Data Preprocessing, Active Learning, and Cost Perceptive Approaches for Resolving Data Imbalance - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-7371-6.ch001 ◽

2021 ◽

pp. 1-22

Author(s):

Anjali S. More ◽

Dipti P. Rana

Keyword(s):

Real Time ◽

Performance Metrics ◽

Data Distribution ◽

Imbalanced Data ◽

Data Classification ◽

Disease Diagnosis ◽

Skewed Data ◽

Data Imbalance ◽

Imbalanced Data Classification ◽

Real Time Applications

Many data mining applications today must classify imbalanced data and cope with its impact on performance metrics. Skewed data distributions occur in a wide range of real-time applications, which has drawn the attention of researchers. Fraud detection in finance, disease diagnosis in medicine, oil spill detection, electricity theft detection, anomaly and intrusion detection in security, and other real-time applications all exhibit uneven data distributions. Data imbalance degrades classification performance metrics and raises the error rate. These challenges have prompted researchers to investigate imbalanced data applications and related machine learning approaches. This work reviews a wide variety of imbalanced data applications with skewed distributions, covering both binary-class and multiclass imbalance, the problems encountered, the approaches used to resolve data imbalance, and possible open research areas.
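The effect of imbalance on performance metrics can be illustrated with a small sketch. It is not taken from the chapter: the 990/10 class split, the trivial "always predict the majority class" classifier, and all function names are assumptions chosen for illustration. It shows how plain accuracy can look high while recall on the minority class is zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


def recall(y_true, y_pred):
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


# Hypothetical data: 990 majority (0) and 10 minority (1) instances.
y_true = [0] * 990 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))           # 0.99
print(recall(y_true, y_pred))             # 0.0
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Metrics that weight each class equally, such as balanced accuracy, expose the failure that overall accuracy hides.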


K-Means Cluster Based Oversampling Algorithm for Imbalanced Data Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6535.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 3436-3440

Keyword(s):

Machine Learning ◽

Data Distribution ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Disease Diagnosis ◽

Skewed Data ◽

Challenging Problem ◽

Classification Problems ◽

Imbalanced Data Classification

Imbalanced data classification problems seek to predict a dependent variable under a skewed data distribution. They arise in many application areas, such as medical disease diagnosis, risk management, and fault detection, and they remain a challenging problem in machine learning and data mining. This paper proposes a K-Means cluster-based oversampling algorithm to address the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms the oversampling algorithms of previous studies.


A novel imbalanced data classification approach using both under and over sampling

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i5.2785 ◽

2021 ◽

Vol 10 (5) ◽

pp. 2789-2795

Author(s):

Seyyed Mohammad Javadi Moghaddam ◽

Asadollah Noroozi

Keyword(s):

Sampling Methods ◽

Data Distribution ◽

Imbalanced Data ◽

Data Classification ◽

Processing Technique ◽

Imbalanced Dataset ◽

Minority Class ◽

Imbalanced Data Classification ◽

Under Sampling ◽

Sampling Algorithms

Classification performance suffers when the data distribution is imbalanced, because classifiers tend to favor the majority class, which contains most of the instances. A popular remedy is to balance the dataset using over- and under-sampling methods. This paper presents a novel pre-processing technique that applies both over- and under-sampling to an imbalanced dataset. The proposed method uses the SMOTE algorithm to enlarge the minority class. A cluster-based approach then shrinks the majority class, taking into account the new size of the minority class. Experimental results on 10 imbalanced datasets show that the suggested algorithm performs better than previous approaches.
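The general idea of combining SMOTE-style oversampling with cluster-based undersampling can be sketched as follows. This is an illustrative sketch only, not the authors' algorithm: the abstract does not specify the clustering method or how majority instances are selected, so the use of k-means, the choice to keep the real point nearest each centroid, and every function name and parameter here are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k, rng):
    """SMOTE-style interpolation: each synthetic point lies on the segment
    between a minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns final centroids and their clusters."""
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[idx].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


def cluster_undersample(majority, target_size, rng):
    """Assumed selection rule: keep the real majority point closest to each
    cluster mean. Empty clusters are skipped, so fewer points may be returned."""
    k = min(target_size, len(majority))
    centroids, clusters = kmeans(majority, k, rng)
    return [min(cl, key=lambda p: sq_dist(p, c))
            for c, cl in zip(centroids, clusters) if cl]


def hybrid_resample(majority, minority, n_synthetic, k=3, seed=0):
    rng = random.Random(seed)
    new_minority = minority + smote(minority, n_synthetic, k, rng)
    # The undersampling target depends on the new minority size.
    new_majority = cluster_undersample(majority, len(new_minority), rng)
    return new_majority, new_minority


# Hypothetical 2-D data: 40 majority points and 5 minority points.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(5)]

new_majority, new_minority = hybrid_resample(majority, minority, n_synthetic=5)
print(len(new_majority), len(new_minority))
```

In practice, a maintained library implementation of SMOTE would normally be preferred over a hand-written one; the pure-Python version above is only meant to make the two steps explicit.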


A Data-Distribution-Based Imbalanced Data Classification Method for Credit Scoring Using Neural Networks

2013 Sixth International Conference on Business Intelligence and Financial Engineering ◽

10.1109/bife.2013.116 ◽

2013 ◽

Author(s):

Dailing Zhang ◽

Wei Xu

Keyword(s):

Neural Networks ◽

Credit Scoring ◽

Data Distribution ◽

Imbalanced Data ◽

Data Classification ◽

Classification Method ◽

Imbalanced Data Classification


GWO Optimized K-Means Cluster based Oversampling Algorithm

INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING ◽

10.47164/ijngc.v12i3.694 ◽

2021 ◽

Vol 12 (3) ◽

Author(s):

Santha Subbulaxmi S ◽

Arumugam G

Keyword(s):

Data Distribution ◽

Imbalanced Data ◽

Classification Problem ◽

The Other ◽

Classification Algorithms ◽

Skewed Data ◽

Grey Wolf ◽

Imbalanced Data Classification ◽

Traditional Classification ◽

Real World Applications

Skewed data distributions prevail in many real-world applications. The skewness stems from imbalance in the class distribution, and it degrades the performance of traditional classification algorithms. In this paper, we provide a Grey Wolf optimized K-Means cluster-based oversampling algorithm to handle the skewness and solve the imbalanced data classification problem. Experiments are conducted on the proposed algorithm, and it is compared with popular benchmark algorithms. The results reveal that the proposed algorithm outperforms the other benchmark algorithms.
