A Data-Distribution-Based Imbalanced Data Classification Method for Credit Scoring Using Neural Networks

Much real-world data is imbalanced, i.e., one category contains significantly more samples than the others. Traditional classification methods treat all categories equally and are often ineffective on such data. Based on a comprehensive analysis of existing research, we propose a new imbalanced data classification method based on clustering. The method first clusters both the majority class and the minority class. The clustered minority class is then over-sampled with SMOTE, while the clustered majority class is randomly under-sampled. Through clustering, the proposed method can avoid losing useful information during resampling. Experiments on several UCI datasets show that the proposed method can effectively improve classification results on imbalanced data.
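The resampling idea in this abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here, and the function names (`kmeans`, `smote_like`, `resample`), the `target` size, and the rule of splitting the budget across clusters in proportion to cluster size are all hypothetical. The over-sampling step is a simplified SMOTE that interpolates between random pairs inside a cluster rather than toward nearest neighbours.

```python
import random

def kmeans(points, k, iters=20, rng=None):
    """Tiny Lloyd's k-means (assumed clustering step); returns non-empty clusters."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Recompute centers; keep the old center if a cluster went empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]

def smote_like(cluster, n_new, rng):
    """Simplified SMOTE: interpolate between random pairs inside one cluster.
    (Standard SMOTE interpolates toward one of the k nearest neighbours.)"""
    out = []
    for _ in range(n_new):
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return out

def resample(majority, minority, target, k=2, seed=0):
    """Bring both classes to roughly `target` samples, cluster by cluster."""
    rng = random.Random(seed)
    # Randomly under-sample each majority cluster in proportion to its size,
    # so every region of the majority class stays represented.
    maj_out = []
    for cl in kmeans(majority, k, rng=rng):
        keep = max(1, round(target * len(cl) / len(majority)))
        maj_out += rng.sample(cl, min(keep, len(cl)))
    # Over-sample each minority cluster in proportion to its size.
    min_out = list(minority)
    for cl in kmeans(minority, k, rng=rng):
        need = round((target - len(minority)) * len(cl) / len(minority))
        min_out += smote_like(cl, max(0, need), rng)
    return maj_out, min_out
```

Because the per-cluster counts are rounded, the resulting class sizes may differ from `target` by a sample or so. Sampling inside clusters is what keeps small sub-groups of the majority class from disappearing, which is the information-loss argument the abstract makes.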

Evolving Neural Networks with Maximum AUC for Imbalanced Data Classification

Lecture Notes in Computer Science - Hybrid Artificial Intelligence Systems ◽ 10.1007/978-3-642-13769-3_41 ◽ 2010 ◽ pp. 335-342 ◽ Cited By ~ 2

Author(s): Xiaofen Lu ◽ Ke Tang ◽ Xin Yao

Keyword(s): Neural Networks ◽ Imbalanced Data ◽ Data Classification ◽ Imbalanced Data Classification

An Ensemble Learning Imbalanced Data Classification Method Based on Sample Combination Optimization

Journal of Physics Conference Series ◽ 10.1088/1742-6596/1284/1/012035 ◽ 2019 ◽ Vol 1284 ◽ pp. 012035

Author(s): Yuxin Wang

Keyword(s): Ensemble Learning ◽ Imbalanced Data ◽ Data Classification ◽ Classification Method ◽ Combination Optimization ◽ Imbalanced Data Classification

Imbalanced Data Classification Method Based on Ensemble Learning

Lecture Notes in Electrical Engineering - Communications, Signal Processing, and Systems ◽ 10.1007/978-981-13-6508-9_3 ◽ 2019 ◽ pp. 18-24 ◽ Cited By ~ 1

Author(s): Yu Xiang ◽ Yongping Xie

Keyword(s): Ensemble Learning ◽ Imbalanced Data ◽ Data Classification ◽ Classification Method ◽ Imbalanced Data Classification

An Improved SMOTE Imbalanced Data Classification Method Based on Support Degree

2014 International Conference on Identification, Information and Knowledge in the Internet of Things ◽ 10.1109/iiki.2014.14 ◽ 2014 ◽ Cited By ~ 7

Author(s): Kewen Li ◽ Wenrong Zhang ◽ Qinghua Lu ◽ Xianghua Fang

Keyword(s): Imbalanced Data ◽ Data Classification ◽ Classification Method ◽ Imbalanced Data Classification

ADANOISE: Training neural networks with adaptive noise for imbalanced data classification

Expert Systems with Applications ◽ 10.1016/j.eswa.2021.116364 ◽ 2021 ◽ pp. 116364

Author(s): Kyoham Shin ◽ Seokho Kang

Keyword(s): Neural Networks ◽ Imbalanced Data ◽ Data Classification ◽ Imbalanced Data Classification ◽ Adaptive Noise

Imbalanced Data Classification Method Based on Clustering and Voting Mechanism

Lecture Notes in Electrical Engineering - Proceedings of the 2012 International Conference on Information Technology and Software Engineering ◽ 10.1007/978-3-642-34522-7_71 ◽ 2012 ◽ pp. 667-674 ◽ Cited By ~ 1

Author(s): Rui Tang ◽ Yuquan Zhu ◽ Geng Chen

Keyword(s): Imbalanced Data ◽ Data Classification ◽ Classification Method ◽ Imbalanced Data Classification ◽ Voting Mechanism

An imbalanced data classification method based on automatic clustering under-sampling

2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC) ◽ 10.1109/pccc.2016.7820640 ◽ 2016 ◽ Cited By ~ 5

Author(s): Xiaoheng Deng ◽ Weijian Zhong ◽ Ju Ren ◽ Detian Zeng ◽ Honggang Zhang

Keyword(s): Imbalanced Data ◽ Data Classification ◽ Classification Method ◽ Automatic Clustering ◽ Imbalanced Data Classification ◽ Under Sampling

K-Means Cluster Based Oversampling Algorithm for Imbalanced Data Classification

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.e6535.018520 ◽ 2020 ◽ Vol 8 (5) ◽ pp. 3436-3440

Keyword(s): Machine Learning ◽ Data Distribution ◽ Imbalanced Data ◽ Data Classification ◽ Classification Problem ◽ Disease Diagnosis ◽ Skewed Data ◽ Challenging Problem ◽ Classification Problems ◽ Imbalanced Data Classification

Imbalanced data classification problems aim to predict a dependent variable under a skewed data distribution. They arise in many application areas, such as medical disease diagnosis, risk management, and fault detection, and they remain challenging in machine learning and data mining. In this paper, a K-Means cluster-based oversampling algorithm is proposed to solve the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms the existing oversampling algorithms of previous studies.

A novel imbalanced data classification approach using both under and over sampling

Bulletin of Electrical Engineering and Informatics ◽ 10.11591/eei.v10i5.2785 ◽ 2021 ◽ Vol 10 (5) ◽ pp. 2789-2795

Author(s): Seyyed Mohammad Javadi Moghaddam ◽ Asadollah Noroozi

Keyword(s): Sampling Methods ◽ Data Distribution ◽ Imbalanced Data ◽ Data Classification ◽ Processing Technique ◽ Imbalanced Dataset ◽ Minority Class ◽ Imbalanced Data Classification ◽ Under Sampling ◽ Sampling Algorithms

Classification performance suffers when the data distribution is imbalanced, because classifiers become biased toward the majority class, which holds most of the instances. One popular remedy is to balance the dataset using over- and under-sampling methods. This paper presents a novel pre-processing technique that applies both over- and under-sampling algorithms to an imbalanced dataset. The proposed method uses the SMOTE algorithm to enlarge the minority class. In addition, a cluster-based approach reduces the majority class, taking into account the new size of the minority class. The experimental results on 10 imbalanced datasets show that the suggested algorithm performs better than previous approaches.
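A rough sketch of this two-stage idea, using only the Python standard library, might look like the following. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the clusters are formed or which majority samples are kept, so the code assumes k-means with as many clusters as the enlarged minority class has samples, and keeps the real majority sample nearest each cluster center. The names `smote`, `cluster_undersample`, `balance`, and the `oversample_ratio` parameter are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def smote(minority, n_new, k=3, seed=0):
    """SMOTE: interpolate a chosen sample toward one of its k nearest
    minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: dist2(p, q))[:k]
        q = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

def cluster_undersample(majority, n_keep, iters=15, seed=0):
    """Assumed reduction rule: run k-means with n_keep clusters and keep the
    real majority sample closest to each cluster center (two centers may
    select the same sample, so duplicates are possible)."""
    rng = random.Random(seed)
    centers = rng.sample(majority, n_keep)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in majority:
            nearest = min(range(n_keep), key=lambda j: dist2(p, centers[j]))
            groups[nearest].append(p)
        # Recompute centers; keep the old center if a cluster went empty.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return [min(majority, key=lambda p: dist2(p, c)) for c in centers]

def balance(majority, minority, oversample_ratio=2.0, seed=0):
    """Grow the minority class first, then shrink the majority class to match
    the NEW minority size, as the abstract describes."""
    n_new = int(len(minority) * (oversample_ratio - 1))
    new_minority = list(minority) + smote(minority, n_new, seed=seed)
    n_keep = min(len(majority), len(new_minority))
    return cluster_undersample(majority, n_keep, seed=seed), new_minority
```

Sizing the majority reduction from the enlarged minority class, rather than from the original one, means fewer majority samples need to be discarded to reach balance.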
