GWO Optimized K-Means Cluster based Oversampling Algorithm

Author(s):  
Santha Subbulaxmi S ◽  
Arumugam G

Skewed data distributions prevail in many real-world applications. The skewness arises from imbalance in the class distribution and degrades the performance of traditional classification algorithms. In this paper, we provide a Grey Wolf Optimized K-Means cluster based oversampling algorithm to handle the skewness and solve the imbalanced data classification problem. Experiments are conducted on the proposed algorithm and its performance is compared with popular benchmark algorithms. The results reveal that the proposed algorithm outperforms the benchmark algorithms.
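
The abstract does not spell out the algorithm's individual steps, so the following Python sketch only illustrates the general idea under stated assumptions: a minimal Grey Wolf Optimizer tunes the initial K-Means centroids for the minority class (using cluster inertia as the fitness), and synthetic minority points are then generated by SMOTE-style interpolation inside the resulting clusters. The function names gwo_minimize and gwo_kmeans_oversample, the fitness choice, and the interpolation rule are illustrative assumptions, not the authors' published procedure.

import numpy as np
from sklearn.cluster import KMeans

def gwo_minimize(f, dim, lo, hi, n_wolves=8, n_iter=30, seed=0):
    # Minimal Grey Wolf Optimizer: minimize f over the box [lo, hi]^dim.
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    fit = np.array([f(w) for w in wolves])
    for t in range(n_iter):
        alpha, beta, delta = wolves[np.argsort(fit)[:3]]   # three best wolves lead
        a = 2.0 * (1.0 - t / n_iter)                       # coefficient decays to 0
        for i in range(n_wolves):
            pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                pos += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(pos / 3.0, lo, hi)
            fit[i] = f(wolves[i])
    return wolves[int(np.argmin(fit))]

def gwo_kmeans_oversample(X_min, n_new, k=3, seed=0):
    # Cluster the minority class with GWO-tuned initial centroids (fitness =
    # K-Means inertia, an assumption), then create n_new synthetic points by
    # interpolating between random member pairs of a randomly chosen cluster.
    rng = np.random.default_rng(seed)
    d = X_min.shape[1]

    def inertia(flat_centroids):
        init = flat_centroids.reshape(k, d)
        return KMeans(n_clusters=k, init=init, n_init=1, max_iter=50).fit(X_min).inertia_

    best = gwo_minimize(inertia, dim=k * d, lo=X_min.min(), hi=X_min.max(), seed=seed)
    km = KMeans(n_clusters=k, init=best.reshape(k, d), n_init=1).fit(X_min)

    synthetic = []
    for _ in range(n_new):
        members = X_min[km.labels_ == rng.integers(k)]
        if len(members) < 2:                 # tiny cluster: just copy its point
            synthetic.append(members[0])
            continue
        a, b = members[rng.choice(len(members), 2, replace=False)]
        synthetic.append(a + rng.random() * (b - a))   # SMOTE-style interpolation
    return np.vstack([X_min, np.array(synthetic)])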

Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. It arises in many application areas such as rare-disease diagnosis, risk management, fault detection, etc. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster based undersampling with a boosting method. The experimental results show that the proposed algorithm outperforms the sampling-based ensemble algorithms of previous studies.
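
The abstract says K-Means based undersampling is combined with boosting but gives no further detail, so the sketch below is only one plausible reading, not the authors' exact method: keep, for each majority-class cluster, the instance nearest the cluster centre, balance the training set against the minority class, and train an AdaBoost ensemble on the result. The nearest-to-centroid rule and the use of scikit-learn's AdaBoostClassifier are assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def kmeans_undersample_boost(X, y, minority_label, k=None, seed=0):
    # Undersample the majority class to one representative per K-Means cluster
    # (the instance nearest the cluster centre), then boost on the balanced set.
    X_min, X_maj = X[y == minority_label], X[y != minority_label]
    y_min, y_maj = y[y == minority_label], y[y != minority_label]
    k = k or len(X_min)            # one cluster per minority sample -> 1:1 balance
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_maj)

    reps = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X_maj[members] - km.cluster_centers_[c], axis=1)
        reps.append(members[np.argmin(dists)])

    X_bal = np.vstack([X_min, X_maj[reps]])
    y_bal = np.concatenate([y_min, y_maj[reps]])
    return AdaBoostClassifier(n_estimators=100, random_state=seed).fit(X_bal, y_bal)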


2020 ◽  
Vol 8 (5) ◽  
pp. 3436-3440

Imbalanced data classification problems aim to predict a dependent variable from a skewed data distribution. They arise in many application areas such as medical disease diagnosis, risk management, fault detection, etc., and remain a challenging problem in machine learning and data mining. In this paper, a K-Means cluster based oversampling algorithm is proposed to solve the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms the existing oversampling algorithms of previous studies.
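
For readers who want a ready-made baseline along the same lines (not the algorithm proposed in this paper), imbalanced-learn ships KMeansSMOTE, which clusters the feature space with K-Means and applies SMOTE inside clusters that are sufficiently minority-dense:

from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import KMeansSMOTE

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

# cluster_balance_threshold may need tuning per dataset: KMeansSMOTE refuses to
# resample when no cluster is sufficiently dominated by the minority class.
sampler = KMeansSMOTE(cluster_balance_threshold=0.1, random_state=42)
X_res, y_res = sampler.fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))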


Author(s):  
Anjali S. More ◽  
Dipti P. Rana

In today's era, many data mining applications face the challenge of handling imbalanced data classification and its impact on performance metrics. Skewed data distributions appear in a wide range of real-time applications and have drawn the attention of researchers. Fraud detection in finance, disease diagnosis in medical applications, oil spill detection, electricity theft, anomaly detection and intrusion detection in security, and other real-time applications exhibit uneven data distributions. Data imbalance affects classification performance metrics and increases the error rate. These challenges have prompted researchers to investigate imbalanced data applications and related machine learning approaches. The intent of this research work is to review a wide variety of imbalanced data applications covering both binary-class and multiclass imbalance, the problems encountered, the variety of approaches to resolve the data imbalance, and possible open research areas.


2013 ◽  
Vol 443 ◽  
pp. 741-745
Author(s):  
Hu Li ◽  
Peng Zou ◽  
Wei Hong Han ◽  
Rong Ze Xia

Many real-world datasets are imbalanced, i.e., one category contains significantly more samples than the others. Traditional classification methods treat all categories equally and are often ineffective on such data. Based on a comprehensive analysis of existing research, we propose a new imbalanced data classification method based on clustering. The method first clusters both the majority class and the minority class. Then the clustered minority class is over-sampled by SMOTE while the clustered majority class is under-sampled randomly. Through clustering, the proposed method avoids the loss of useful information during resampling. Experiments on several UCI datasets show that the proposed method can effectively improve classification results on imbalanced data.
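
A simplified sketch of this procedure, with illustrative cluster counts and resampling ratios (the paper does not fix these values), might look as follows: each class is clustered separately, minority clusters are grown by SMOTE-style interpolation between random member pairs, and each majority cluster is randomly thinned.

import numpy as np
from sklearn.cluster import KMeans

def cluster_resample(X, y, minority_label, k_min=3, k_maj=5,
                     oversample_ratio=2.0, keep_fraction=0.5, seed=0):
    # Cluster each class separately, grow minority clusters by interpolation,
    # randomly thin majority clusters. Assumes binary labels; the cluster
    # counts and ratios here are illustrative, not values from the paper.
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == minority_label], X[y != minority_label]
    maj_label = y[y != minority_label][0]

    # Minority class: K-Means clusters, then interpolation inside each cluster.
    km_min = KMeans(n_clusters=k_min, n_init=10, random_state=seed).fit(X_min)
    new_points = []
    for c in range(k_min):
        members = X_min[km_min.labels_ == c]
        for _ in range(int(len(members) * (oversample_ratio - 1))):
            if len(members) < 2:
                new_points.append(members[0])
                continue
            a, b = members[rng.choice(len(members), 2, replace=False)]
            new_points.append(a + rng.random() * (b - a))
    X_min_res = np.vstack([X_min, np.array(new_points)]) if new_points else X_min

    # Majority class: K-Means clusters, then keep a random fraction of each cluster.
    km_maj = KMeans(n_clusters=k_maj, n_init=10, random_state=seed).fit(X_maj)
    kept = []
    for c in range(k_maj):
        members = np.where(km_maj.labels_ == c)[0]
        n_keep = max(1, int(len(members) * keep_fraction))
        kept.extend(rng.choice(members, n_keep, replace=False))
    X_maj_res = X_maj[kept]

    X_res = np.vstack([X_min_res, X_maj_res])
    y_res = np.array([minority_label] * len(X_min_res) + [maj_label] * len(X_maj_res))
    return X_res, y_res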


This chapter is an attempt to address the various challenges, opportunities, and scope for formulating and designing new procedures for the imbalanced classification problem, which poses a challenge to predictive modelling because many of the AI, ML, and DL algorithms widely used for classification are designed under the assumption of an equal number of examples per class. This leads to poor efficiency and performance, especially on the minority class. The minority class is crucial, highly sensitive to classification errors, and of utmost importance in imbalanced classification. The chapter discusses and gives novel and deep insights into the unequal distribution of classes in training datasets. Most real-world classification tasks exhibit imbalanced distributions and therefore need specialized techniques to build more sophisticated models with minimal errors and improved performance.


2019 ◽  
Vol 8 (2) ◽  
pp. 2463-2468

Learning from class-imbalanced data has become a challenging issue in the machine learning community, as most classification algorithms are designed for balanced datasets. Several methods are available to tackle this issue, among which the resampling techniques, undersampling and oversampling, are the most flexible and versatile. This paper introduces a new undersampling concept based on the center-of-gravity principle, which helps to reduce the excess instances of the majority class. The work is suited to binary-class problems. The proposed technique, CoGBUS, overcomes the class imbalance problem and yields the best results in the study. F-score, G-mean, and ROC are used for the performance evaluation of the method.
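
Since the abstract does not publish the exact center-of-gravity rule, the sketch below is only one plausible reading: the majority class's center of gravity is taken as its mean vector, and the majority instances closest to it are assumed to be the most redundant and are dropped until the classes are balanced. The evaluation helper computes the three metrics named in the abstract (F-score, G-mean, ROC AUC).

import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

def cog_undersample(X, y, majority_label):
    # Illustrative reading of centre-of-gravity based undersampling: keep the
    # minority class intact and retain only the majority instances farthest
    # from the majority centroid, until a 1:1 class ratio is reached.
    maj_idx = np.where(y == majority_label)[0]
    min_idx = np.where(y != majority_label)[0]
    centroid = X[maj_idx].mean(axis=0)                      # centre of gravity
    dists = np.linalg.norm(X[maj_idx] - centroid, axis=1)
    keep_maj = maj_idx[np.argsort(dists)[::-1][:len(min_idx)]]
    keep = np.concatenate([min_idx, keep_maj])
    return X[keep], y[keep]

def evaluate(y_true, y_pred, y_score):
    # F-score, G-mean and ROC AUC, the metrics named in the abstract.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {"f_score": f1_score(y_true, y_pred),
            "g_mean": np.sqrt(sensitivity * specificity),
            "roc_auc": roc_auc_score(y_true, y_score)}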


2011 ◽  
Vol 14 (1) ◽  
Author(s):  
Everton Alvares Cherman ◽  
Maria Carolina Monard ◽  
Jean Metz

Traditional classification algorithms consider learning problems that contain only one label, i.e., each example is associated with a single nominal target variable characterizing its property. However, the number of practical applications involving data with multiple target variables has increased. To learn from this sort of data, multi-label classification algorithms should be used. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. In this work, two well-known methods based on this approach are used, as well as a third method we propose to overcome some deficiencies of one of them, in a case study on textual data related to medical findings structured using the bag-of-words approach. The experimental study shows an improvement in the results obtained by our proposed multi-label classification method.
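
The abstract does not name the two transformation methods used; binary relevance, i.e., training one binary classifier per label, is one of the standard ones and is sketched below with scikit-learn. The toy documents and label names are invented purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

docs = ["chest pain and shortness of breath",
        "fracture of the left femur",
        "chest pain after a fall with rib fracture"]
labels = [["cardiac"], ["orthopedic"], ["cardiac", "orthopedic"]]

X = CountVectorizer().fit_transform(docs)           # bag-of-words features
Y = MultiLabelBinarizer().fit_transform(labels)     # binary indicator matrix

# Binary relevance: OneVsRestClassifier fits one logistic regression per label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(clf.predict(X))                               # one column per label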

