Under-sampling by algorithm with performance guaranteed for class-imbalance problem

Machine-learning-based software defect prediction (SDP) methods are receiving considerable attention from researchers in intelligent software engineering. Most existing SDP methods operate in a within-project setting. However, a new SDP task usually has little or no within-project training data from which to learn a usable supervised prediction model. Cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was therefore proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and strongly affects the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods in extensive experiments on 31 open-source projects drawn from five datasets. Analyzing a total of 37504 results, we found that in most cases the IEL method that combines under-sampling and bagging was more effective than the other investigated methods.
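The combination of under-sampling and bagging mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the study's implementation: the nearest-centroid base learner, the function names, the ensemble size, and the synthetic data are all assumptions made for the example, and the abstract does not say which base learners or sampling ratios were used.

```python
import random
from collections import Counter


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Toy base learner (an assumption): one centroid per class label."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}


def predict_nearest_centroid(model, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(x, model[label])))


def fit_under_bagging(X, y, n_members=11, seed=0):
    """Binary case: train each member on all minority rows plus an
    equally sized random sample of the majority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, t in enumerate(y) if t == minority]
    maj_idx = [i for i, t in enumerate(y) if t != minority]
    members = []
    for _ in range(n_members):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        members.append(fit_nearest_centroid([X[i] for i in idx],
                                            [y[i] for i in idx]))
    return members


def predict_under_bagging(members, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_nearest_centroid(m, x) for m in members)
    return votes.most_common(1)[0][0]


# Synthetic, made-up data: 200 clean modules (0) and 20 defective ones (1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20

ensemble = fit_under_bagging(X, y)
print(predict_under_bagging(ensemble, [3.0, 3.0]))
print(predict_under_bagging(ensemble, [0.0, 0.0]))
```

Each member sees a balanced sample, so no single model is dominated by the majority class, while voting across members recovers some of the majority information that any one under-sample discards.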

CoGBUS- Center of Gravity based under Sampling Method for Imbalanced Data Classification

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.b2077.078219 ◽ 2019 ◽ Vol 8 (2) ◽ pp. 2463-2468

Keyword(s): Learning Community ◽ Sampling Method ◽ Class Imbalance ◽ Imbalanced Data ◽ Center Of Gravity ◽ Classification Algorithms ◽ Class Imbalance Problem ◽ Imbalance Problem ◽ Imbalanced Data Classification ◽ Under Sampling

Learning from class-imbalanced data has become a challenging issue in the machine learning community because most classification algorithms are designed to work on balanced datasets. Several methods are available to tackle this issue, among which the resampling techniques, undersampling and oversampling, are the more flexible and versatile. This paper introduces a new undersampling concept based on the Center of Gravity principle, which helps to reduce the excess instances of the majority class. The work is suited to binary-class problems. The proposed technique, CoGBUS, overcomes the class imbalance problem and achieves the best results in the study. We use F-Score, GMean and ROC to evaluate the performance of the method.
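Two of the evaluation measures named in this abstract, F-Score and GMean, can be computed from binary confusion counts as in the sketch below. The formulas are the standard definitions; the abstract does not state which F-Score variant (value of beta) the paper uses, so beta = 1 is an assumption, and the function names and label vectors are invented for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def f_score(tp, fn, fp, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 assumed)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Made-up labels: 10 minority (1) and 90 majority (0) instances.
y_true = [1] * 10 + [0] * 90

# A classifier that always predicts the majority class.
always_majority = [0] * 100
tp, fn, fp, tn = confusion_counts(y_true, always_majority)
accuracy_majority = (tp + tn) / len(y_true)
f_majority = f_score(tp, fn, fp)
g_majority = g_mean(tp, fn, fp, tn)

# A classifier that finds 8 of 10 minority cases with 9 false alarms.
balanced_pred = [1] * 8 + [0] * 2 + [1] * 9 + [0] * 81
tp, fn, fp, tn = confusion_counts(y_true, balanced_pred)
f_balanced = f_score(tp, fn, fp)
g_balanced = g_mean(tp, fn, fp, tn)

print(accuracy_majority, f_majority, g_majority)
print(round(f_balanced, 4), round(g_balanced, 4))
```

The always-majority classifier reaches 90% accuracy yet scores zero on both measures, which is why such measures are preferred over plain accuracy on imbalanced data.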

A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling

Multiple Classifier Systems - Lecture Notes in Computer Science ◽ 10.1007/978-3-642-02326-2_9 ◽ 2009 ◽ pp. 82-91 ◽ Cited By ~ 29

Author(s): Muhammad Atif Tahir ◽ Josef Kittler ◽ Krystian Mikolajczyk ◽ Fei Yan

Keyword(s): Class Imbalance ◽ Class Imbalance Problem ◽ Imbalance Problem ◽ Under Sampling

An approach to class imbalance problem based on stacking and inverse random under sampling methods

2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC) ◽ 10.1109/icnsc.2018.8361344 ◽ 2018 ◽ Cited By ~ 4

Author(s): Yuwei Zhang ◽ Guanjun Liu ◽ Wenjing Luan ◽ Chungang Yan ◽ Changjun Jiang

Keyword(s): Sampling Methods ◽ Class Imbalance ◽ Class Imbalance Problem ◽ Imbalance Problem ◽ Under Sampling

Resampling Methods for Solving Class Imbalance Problem in Traffic Incident Detection

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.744-746.1985 ◽ 2015 ◽ Vol 744-746 ◽ pp. 1985-1989 ◽ Cited By ~ 1

Author(s): Miao Hua Li ◽ Shu Yan Chen

Keyword(s): Class Imbalance ◽ Detection Performance ◽ Incident Detection ◽ Traffic Data ◽ Class Imbalance Problem ◽ Classification Rate ◽ Resampling Methods ◽ Traffic Incident ◽ Imbalance Problem ◽ Under Sampling

Traffic data are highly skewed in the real world because traffic incidents are rare, and most existing automatic incident detection (AID) algorithms are limited by their inability to detect incidents under imbalanced traffic data conditions. This paper develops feasible AID algorithms that use resampling methods to process imbalanced traffic data. To identify the best sampling method for incident detection, we compare the detection performance of different AID algorithms based on various resampling methods. Detection performance is evaluated with common criteria: classification rate, detection rate, false alarm rate, mean time to detection, and an integrated performance index. The I-880 dataset is used in experiments to verify the proposed algorithms. The experimental results indicate that the proposed resampling-based AID algorithms achieve better performance by handling the imbalance in traffic data. Moreover, under-sampling is more competitive than over-sampling for traffic incident detection.
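The abstract does not name the resampling methods that were compared. As an illustration only, the sketch below shows the two simplest representatives, random under-sampling and random over-sampling; the function names and the synthetic sensor records are assumptions made for the example.

```python
import random
from collections import Counter


def group_by_class(X, y):
    """Map each label to the list of its feature rows."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_under_sample(X, y, seed=0):
    """Randomly keep rows of larger classes down to the smallest class size."""
    rng = random.Random(seed)
    by_class = group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Made-up records: 95 normal-traffic rows (0) and 5 incident rows (1).
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_under_sample(X, y)
X_over, y_over = random_over_sample(X, y)
print(Counter(y_under))
print(Counter(y_over))
```

Under-sampling discards majority rows, giving a small balanced set, whereas over-sampling repeats minority rows, giving a large balanced set that contains exact duplicates.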

A Hybrid Approach for Class Imbalance Problem in Customer Churn Prediction: A Novel Extension to Under-sampling

International Journal of Intelligent Systems and Applications ◽ 10.5815/ijisa.2018.05.08 ◽ 2018 ◽ Vol 10 (5) ◽ pp. 71-81

Author(s): Uma R. Salunkhe ◽ Suresh N. Mali

Keyword(s): Hybrid Approach ◽ Class Imbalance ◽ Churn Prediction ◽ Class Imbalance Problem ◽ Customer Churn ◽ Imbalance Problem ◽ Under Sampling ◽ Customer Churn Prediction

Inverse random under sampling for class imbalance problem and its application to multi-label classification

Pattern Recognition ◽ 10.1016/j.patcog.2012.03.014 ◽ 2012 ◽ Vol 45 (10) ◽ pp. 3738-3750 ◽ Cited By ~ 112

Author(s): Muhammad Atif Tahir ◽ Josef Kittler ◽ Fei Yan

Keyword(s): Class Imbalance ◽ Class Imbalance Problem ◽ Imbalance Problem ◽ Under Sampling

Performance Analysis of Under-Sampling and Over-Sampling Techniques for Solving Class Imbalance Problem

SSRN Electronic Journal ◽ 10.2139/ssrn.3356374 ◽ 2019

Author(s): Rekha G ◽ Amit Kumar Tyagi ◽ V. Krishna Reddy

Keyword(s): Performance Analysis ◽ Class Imbalance ◽ Sampling Techniques ◽ Class Imbalance Problem ◽ Imbalance Problem ◽ Under Sampling

RUESVMs: An Ensemble Method to Handle the Class Imbalance Problem in Land Cover Mapping Using Google Earth Engine

Remote Sensing ◽ 10.3390/rs12213484 ◽ 2020 ◽ Vol 12 (21) ◽ pp. 3484 ◽ Cited By ~ 2

Author(s): Amin Naboureh ◽ Hamid Ebrahimy ◽ Mohsen Azadbakht ◽ Jinhu Bian ◽ Meisam Amani

Keyword(s): Vegetation Index ◽ Normalized Difference Vegetation Index ◽ Geometric Mean ◽ Class Imbalance ◽ Google Earth ◽ Support Vector ◽ Class Imbalance Problem ◽ Imbalance Problem ◽ Percentage Points ◽ Under Sampling

Timely and accurate Land Cover (LC) information is required for various applications, such as climate change analysis and sustainable development. Although machine learning algorithms are often successful in LC mapping tasks, the class imbalance problem is a common challenge. The problem arises during the training phase and reduces classification accuracy for infrequent and rare LC classes. To address it, this study proposes a new method that integrates random under-sampling of majority classes with an ensemble of Support Vector Machines, named Random Under-sampling Ensemble of Support Vector Machines (RUESVMs). The performance of RUESVMs for LC classification was evaluated in Google Earth Engine (GEE) over two different case studies using Sentinel-2 time-series data and five well-known spectral indices: the Normalized Difference Vegetation Index (NDVI), Green Normalized Difference Vegetation Index (GNDVI), Soil-Adjusted Vegetation Index (SAVI), Normalized Difference Built-up Index (NDBI), and Normalized Difference Water Index (NDWI). RUESVMs was also compared with the traditional SVM and with combinations of SVM and three benchmark data-balancing techniques, namely Random Over-Sampling (ROS), Random Under-Sampling (RUS), and the Synthetic Minority Over-sampling Technique (SMOTE). The proposed method considerably improved the accuracy of LC classification, especially for the minority classes. After adopting RUESVMs, the overall accuracy of the generated LC map increased by approximately 4.95 percentage points, and the geometric mean of producer’s accuracies by almost 3.75 percentage points, in comparison to the most accurate data balancing method (i.e., SVM-SMOTE). Regarding the geometric mean of users’ accuracies, RUESVMs also outperformed the SVM-SMOTE method with an average increase of 6.45 percentage points.
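The accuracy measures reported in this abstract can be computed from a multi-class confusion matrix as sketched below. The sketch assumes the common remote-sensing convention that rows are reference classes and columns are mapped (predicted) classes, and that every class occurs at least once in both; the matrix values are invented and are not taken from the paper.

```python
import math


def overall_accuracy(cm):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def producers_accuracies(cm):
    """Per class: correct / reference total (row sum), i.e. per-class recall."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def users_accuracies(cm):
    """Per class: correct / mapped total (column sum), i.e. per-class precision."""
    return [cm[i][i] / sum(row[i] for row in cm) for i in range(len(cm))]


def geometric_mean(values):
    """n-th root of the product; drops sharply if any class scores low."""
    return math.prod(values) ** (1.0 / len(values))


# Made-up 3-class confusion matrix; the third class is rare.
cm = [
    [90, 5, 5],
    [10, 80, 10],
    [4, 6, 10],
]

oa = overall_accuracy(cm)
pa = producers_accuracies(cm)
ua = users_accuracies(cm)
gm_pa = geometric_mean(pa)
gm_ua = geometric_mean(ua)

print(round(oa, 4))
print([round(v, 4) for v in pa], round(gm_pa, 4))
print([round(v, 4) for v in ua], round(gm_ua, 4))
```

Because the geometric mean multiplies the per-class values, a poorly classified rare class pulls it well below the overall accuracy, which makes it a useful companion measure for imbalanced land-cover maps.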

A Novel Hybrid-Based Ensemble for Class Imbalance Problem

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213018500252 ◽ 2018 ◽ Vol 27 (06) ◽ pp. 1850025 ◽ Cited By ~ 1

Author(s): Huaping Guo ◽ Jun Zhou ◽ Chang-an Wu ◽ Wei She

Keyword(s): Class Imbalance ◽ Imbalanced Data ◽ Sampling Technique ◽ Original Training ◽ Training Set ◽ Class Imbalance Problem ◽ Minority Class ◽ Imbalance Problem ◽ Under Sampling ◽ Feature Projection

Class imbalance is very common in the real world. However, conventional advanced methods do not work well on imbalanced data because of the skewed class distribution. This paper proposes a simple but effective Hybrid-based Ensemble (HE) for the two-class imbalance problem. HE learns a hybrid ensemble in two stages: (1) learning several projection matrices from the rebalanced data obtained by under-sampling the original training set, and constructing new training sets by projecting the original training set into the different spaces defined by those matrices; and (2) under-sampling several subsets from each new training set and training a model on each subset. Here, feature projection aims to improve the diversity between ensemble members, and under-sampling aims to improve the generalization ability of individual members on the minority class. Experimental results show that, compared with other state-of-the-art methods, HE performs significantly better on AUC, G-mean, F-measure and recall.
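The two-stage structure described in this abstract can be sketched as below. The abstract does not say how the projection matrices are learned or which base model is trained, so both are placeholders here: the "projection matrix" is a single unit-length row along the difference of class means, and each member is a midpoint threshold in the projected space. All names, sizes, and data are assumptions for illustration, not the paper's HE algorithm.

```python
import random


def mean_vec(rows):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def balanced_subset(pos, neg, rng):
    """Under-sample the majority (neg) down to the minority (pos) size."""
    return pos, rng.sample(neg, len(pos))


def learn_projection(pos, neg):
    """Placeholder projection: unit vector from majority to minority mean."""
    diff = [a - b for a, b in zip(mean_vec(pos), mean_vec(neg))]
    norm = sum(v * v for v in diff) ** 0.5 or 1.0
    return [v / norm for v in diff]


def project(w, x):
    """Project x onto the one-row projection w."""
    return sum(a * b for a, b in zip(w, x))


def fit_he(pos, neg, n_proj=3, n_subsets=3, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_proj):
        # Stage 1: learn a projection from rebalanced data, then project
        # the whole original training set into the new space.
        w = learn_projection(*balanced_subset(pos, neg, rng))
        z_pos = [project(w, x) for x in pos]
        z_neg = [project(w, x) for x in neg]
        for _ in range(n_subsets):
            # Stage 2: under-sample the projected set and train one member
            # (placeholder model: midpoint between projected class means).
            s_pos, s_neg = balanced_subset(z_pos, z_neg, rng)
            threshold = (sum(s_pos) / len(s_pos) + sum(s_neg) / len(s_neg)) / 2
            members.append((w, threshold))
    return members


def predict_he(members, x):
    """Majority vote; a member votes 1 (minority) above its threshold."""
    votes = sum(1 for w, t in members if project(w, x) > t)
    return 1 if 2 * votes > len(members) else 0


# Made-up data: 15 minority rows near (2, 2), 150 majority rows near (0, 0).
data_rng = random.Random(7)
pos = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(15)]
neg = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(150)]

members = fit_he(pos, neg)
print(len(members), predict_he(members, [2.0, 2.0]), predict_he(members, [0.0, 0.0]))
```

Drawing a fresh under-sample before learning each projection is what makes the projected spaces differ, and the second round of under-sampling gives every member a balanced view of the minority class.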

An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction
