An approach to class imbalance problem based on stacking and inverse random under sampling methods

2021 ◽

Vol 9 (9) ◽

pp. 1535-1543

Author(s):

Himani Tiwari

Keyword(s):

Learning Community ◽

Sampling Methods ◽

Evaluation Criteria ◽

Class Imbalance ◽

Imbalanced Data ◽

Simulated Data ◽

Data Sets ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling

Abstract: The class imbalance problem is one of the most challenging problems faced by the machine learning community. Class imbalance refers to the situation in which some classes have relatively few instances compared with the other classes. A number of over-sampling and under-sampling approaches have been applied in an attempt to balance the classes. This study provides an overview of the class imbalance issue and examines various balancing methods for dealing with it. To illustrate the differences, an experiment is conducted on multiple simulated data sets, comparing the performance of these over-sampling methods on different classifiers using various evaluation criteria. In addition, the effect of parameters such as the number of features and the imbalance ratio on classifier performance is evaluated. Keywords: Imbalanced learning, Over-sampling methods, Under-sampling methods, Classifier performances, Evaluation metrics
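The abstract contrasts over-sampling with under-sampling and varies the imbalance ratio. As a rough illustration only (not the author's experimental code), the sketch below shows plain random under-sampling and random over-sampling in Python. It assumes the imbalance ratio is the majority-class count divided by the minority-class count, and the function names are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_under_sample(X, y, seed=0):
    """Randomly drop rows (without replacement) so every class matches the minority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows (with replacement) so every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_max = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        extra = [rng.choice(idx) for _ in range(n_max - n)]
        X_out.extend(X[i] for i in extra)
        y_out.extend(y[i] for i in extra)
    return X_out, y_out
```

With ten majority rows and two minority rows, the imbalance ratio is 5.0; under-sampling keeps two rows per class, while over-sampling grows the data to ten rows per class. Under-sampling discards information, and over-sampling by duplication can encourage overfitting, which is one reason studies like this compare both on several classifiers.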

An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419590377 ◽

2019 ◽

Vol 33 (12) ◽

pp. 1959037 ◽

Cited By ~ 5

Author(s):

Shaojian Qiu ◽

Lu Lu ◽

Siyu Jiang ◽

Yang Guo

Keyword(s):

Ensemble Learning ◽

Class Imbalance ◽

Training Data ◽

Defect Prediction ◽

Class Imbalance Problem ◽

Learning Methods ◽

Imbalance Problem ◽

Intelligent Software ◽

Under Sampling ◽

Cross Project

Machine-learning-based software defect prediction (SDP) methods are receiving considerable attention from researchers in intelligent software engineering. Most existing SDP methods are applied in a within-project setting. However, for a new SDP task there is usually little or no within-project training data from which to learn a supervised prediction model. Cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was therefore proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and strongly affects the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods through extensive experiments on 31 open-source projects drawn from five datasets. Analyzing a total of 37504 results, we found that in most cases the IEL method combining under-sampling and bagging was more effective than the other investigated methods.
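The most effective family reported here combines under-sampling with bagging. The sketch below is a generic UnderBagging-style ensemble, not the configuration evaluated in the study. Each member is trained on a balanced bootstrap sample in which every class is drawn down to the minority-class size, and predictions are combined by majority vote. The `NearestCentroid` base learner, the function names, and `n_estimators=11` are hypothetical choices that keep the example self-contained; any classifier could be substituted.

```python
import random
from collections import Counter


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class NearestCentroid:
    """Hypothetical toy base learner: predicts the class whose centroid is closest."""

    def fit(self, X, y):
        self.centroids = {
            c: centroid([x for x, label in zip(X, y) if label == c]) for c in set(y)
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))


def under_bagging(X, y, n_estimators=11, seed=0):
    """Train each member on a balanced bootstrap sample drawn at minority-class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    by_class = {c: [i for i, label in enumerate(y) if label == c] for c in counts}
    models = []
    for _ in range(n_estimators):
        idx = []
        for members in by_class.values():
            # Sample n_min indices with replacement from every class,
            # so the majority class is under-sampled in each bag.
            idx.extend(rng.choice(members) for _ in range(n_min))
        bag_X = [X[i] for i in idx]
        bag_y = [y[i] for i in idx]
        models.append(NearestCentroid().fit(bag_X, bag_y))
    return models


def predict(models, x):
    """Combine member predictions by majority vote."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]
```

Each bag sees the whole minority class at its natural size while every member sees a different subset of the majority class. Bagging therefore recovers some of the majority-class information that a single under-sampled training set would throw away.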

CoGBUS- Center of Gravity based under Sampling Method for Imbalanced Data Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2077.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2463-2468

Keyword(s):

Learning Community ◽

Sampling Method ◽

Class Imbalance ◽

Imbalanced Data ◽

Center Of Gravity ◽

Classification Algorithms ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Imbalanced Data Classification ◽

Under Sampling

Learning from class-imbalanced data is a challenging issue in the machine learning community because standard classification algorithms are designed to work on balanced datasets. Several methods are available to tackle this issue. Among them, the resampling techniques, undersampling and oversampling, are the more flexible and versatile. This paper introduces a new undersampling concept based on the Center of Gravity principle, which helps to remove the excess instances of the majority class. The method is suited to binary classification problems. The proposed technique, CoGBUS, overcomes the class imbalance problem and achieved the best results in the study. F-score, G-mean, and ROC are used to evaluate the performance of the method.
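The abstract names F-score, G-mean, and ROC as evaluation criteria but does not define them. The sketch below uses the standard definitions for a binary problem and assumes the minority class is the positive label. F-score is the harmonic mean of precision and recall, G-mean is the geometric mean of sensitivity and specificity, and ROC AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. It does not reproduce CoGBUS itself, whose details are not given here, and the function names are hypothetical.

```python
import math


def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) counts for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def f_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall) and specificity."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For `y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]` and `y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]`, accuracy is 0.9. By contrast, `f_score` returns about 0.667 and `g_mean` about 0.707, because half of the minority class is missed. This is why imbalance studies prefer these criteria to plain accuracy.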

Credit Card Fraud Detection: An Exploration of Different Sampling Methods to Solve the Class Imbalance Problem

Algorithms for Intelligent Systems - Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences ◽

10.1007/978-981-16-5747-4_71 ◽

2022 ◽

pp. 825-837

Author(s):

Mythili Krishnan ◽

Madhan Kumar Srinivasan

Keyword(s):

Credit Card ◽

Sampling Methods ◽

Class Imbalance ◽

Fraud Detection ◽

Class Imbalance Problem ◽

Credit Card Fraud ◽

Imbalance Problem

A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling

Multiple Classifier Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-02326-2_9 ◽

2009 ◽

pp. 82-91 ◽

Cited By ~ 29

Author(s):

Muhammad Atif Tahir ◽

Josef Kittler ◽

Krystian Mikolajczyk ◽

Fei Yan

Keyword(s):

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling

Under-sampling by algorithm with performance guaranteed for class-imbalance problem

2014 International Computer Science and Engineering Conference (ICSEC) ◽

10.1109/icsec.2014.6978197 ◽

2014 ◽

Cited By ~ 4

Author(s):

Wattana Jindaluang ◽

Varin Chouvatut ◽

Sanpawat Kantabutra

Keyword(s):

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling

Resampling Methods for Solving Class Imbalance Problem in Traffic Incident Detection

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.744-746.1985 ◽

2015 ◽

Vol 744-746 ◽

pp. 1985-1989 ◽

Cited By ~ 1

Author(s):

Miao Hua Li ◽

Shu Yan Chen

Keyword(s):

Class Imbalance ◽

Detection Performance ◽

Incident Detection ◽

Traffic Data ◽

Class Imbalance Problem ◽

Classification Rate ◽

Resampling Methods ◽

Traffic Incident ◽

Imbalance Problem ◽

Under Sampling

Traffic data is highly skewed, with traffic incidents rare in the real world, and most existing automatic incident detection (AID) algorithms are limited by their inability to detect incidents under imbalanced traffic data conditions. This paper develops feasible AID algorithms based on resampling methods to process imbalanced traffic data. To identify the best sampling method for incident detection, we compare the detection performance of different AID algorithms based on various resampling methods. Detection performance is evaluated with common criteria, including classification rate, detection rate, false alarm rate, mean time to detection, and an integrated performance index. The I-880 dataset is used in experiments to verify the proposed algorithms. The experimental results indicate that the proposed resampling-based AID algorithm achieves better performance by handling the imbalanced traffic data problem. Moreover, under-sampling is more competitive than over-sampling for traffic incident detection.
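Definitions of these AID criteria vary between studies, and the abstract does not state which ones were used. The sketch below assumes common instance-level definitions. Detection rate (DR) is the fraction of incident instances that are flagged, false alarm rate (FAR) is the fraction of non-incident instances that are flagged, and classification rate (CR) is overall accuracy. Mean time to detection averages the delay from incident onset to first alarm over detected incidents. The function names and input format are hypothetical, and the integrated performance index is omitted because its formula is not given.

```python
import math


def aid_metrics(y_true, y_pred):
    """Instance-level DR, FAR and CR for binary labels (1 = incident, 0 = normal traffic)."""
    incidents = sum(1 for t in y_true if t == 1)
    normals = len(y_true) - incidents
    detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "DR": detected / incidents if incidents else 0.0,
        "FAR": false_alarms / normals if normals else 0.0,
        "CR": correct / len(y_true),
    }


def mean_time_to_detection(incidents):
    """Average onset-to-alarm delay over detected incidents.

    incidents: list of (onset_time, first_alarm_time) pairs, where
    first_alarm_time is None for an incident that was never detected.
    """
    delays = [alarm - onset for onset, alarm in incidents if alarm is not None]
    return sum(delays) / len(delays) if delays else math.nan
```

Because incidents are rare, a detector that never raises an alarm still scores a high CR with a DR of zero. Reporting DR and FAR alongside CR exposes exactly the weakness that resampling is meant to address.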
