Class Imbalance Learning to Heterogeneous Cross Software Projects Defect Prediction

International Journal of Software Innovation ◽

10.4018/ijsi.292021 ◽

2022 ◽

Vol 10 (1) ◽

pp. 0-0

Keyword(s):

Research Work ◽

Class Imbalance ◽

Training Dataset ◽

Software Projects ◽

Class Imbalance Problem ◽

Software Application ◽

Imbalance Problem ◽

Under Sampling ◽

Imbalance Learning ◽

Class Imbalance Learning

Heterogeneous cross-project defect prediction (HCPDP) attempts to forecast defects in a software application that has insufficient historical defect data. From a Class Imbalance Problem (CIP) perspective, however, one needs a clear view of the data distribution in the training dataset; otherwise the trained model will produce biased classification results. Class Imbalance Learning (CIL) is the practice of bringing the two classes of an imbalanced dataset to a balanced ratio. A range of effective solutions exists for managing CIP, notably resampling techniques such as Over-Sampling (OS) and Under-Sampling (US). The proposed research work employs the Synthetic Minority Oversampling TEchnique (SMOTE) and Random Under-Sampling (RUS) to handle CIP. In addition, the paper proposes a novel four-phase HCPDP model and compares the performance of the basic HCPDP model under CIP with its performance after CIP is handled using SMOTE and RUS, across three prediction pairs. Results show that training performance improves substantially with SMOTE, whereas RUS shows variations relative to HCPDP for all three prediction pairs.
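The two resampling ideas named in this abstract can be illustrated with a minimal sketch. It is not the paper's four-phase HCPDP model: the toy data, the function names `random_under_sample` and `smote_like`, and the parameters `k` and `seed` are all assumptions made for illustration, and the SMOTE-style step is a simplified interpolation between minority neighbours for a binary problem.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def random_under_sample(X, y, majority=0, seed=0):
    """RUS: keep all minority rows and a random, equally sized subset of majority rows."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_like(X, y, minority=1, k=2, seed=0):
    """SMOTE-style over-sampling (simplified): add synthetic minority rows on the
    segment between a minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    mino = [x for x, label in zip(X, y) if label == minority]
    n_needed = len(y) - 2 * len(mino)  # majority count minus minority count (binary case)
    X_new, y_new = list(X), list(y)
    for _ in range(max(0, n_needed)):
        base = rng.choice(mino)
        neighbours = sorted((p for p in mino if p is not base),
                            key=lambda p: sq_dist(p, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb
        X_new.append([a + gap * (b - a) for a, b in zip(base, nb)])
        y_new.append(minority)
    return X_new, y_new

# Hypothetical toy data: 6 non-defective rows (label 0), 3 defective rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.1], [0.5, 0.4],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_rus, y_rus = random_under_sample(X, y)
X_sm, y_sm = smote_like(X, y)
print("RUS class counts:", y_rus.count(0), y_rus.count(1))
print("SMOTE-like class counts:", y_sm.count(0), y_sm.count(1))
```

RUS discards majority information to reach balance, while the SMOTE-style step keeps every original row and adds interpolated minority rows; that difference is one plausible reason the two techniques can behave differently on the same prediction pair.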

Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise

Advances in Intelligent Systems and Computing - ICT Based Innovations ◽

10.1007/978-981-10-6602-3_3 ◽

2017 ◽

pp. 23-30 ◽

Cited By ~ 11

Author(s):

Prabhjot Kaur ◽

Anjana Gosain

Keyword(s):

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Imbalance Learning ◽

Class Imbalance Learning

SMOTEMultiBoost: Leveraging the SMOTE with MultiBoost to Confront the Class Imbalance in Supervised Learning

Journal of Information Communication Technologies and Robotic Applications ◽

10.51239/jictra.v0i0.227 ◽

2020 ◽

Author(s):

Naveed Ahmad Khan Jhamat ◽

Ghulam Mustafa ◽

Zhendong Niu

Keyword(s):

False Negative ◽

Class Imbalance ◽

Sampling Technique ◽

Classification Algorithms ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Limiting Error ◽

Improved Performance ◽

Imbalance Learning ◽

Class Imbalance Learning

Researchers confront the class imbalance problem in many forms owing to the growing volume of complex data. Common classification algorithms perform poorly on imbalanced datasets. In class imbalance learning, cases of the larger class typically outnumber cases of the smaller class. Because of this imbalance, and because they aim at overall accuracy, common classification algorithms raise performance on the larger class while lowering performance on the smaller class. Furthermore, these algorithms treat false positives and false negatives evenly and assume an equal cost for every misclassification. Various ensemble solutions have been proposed over the years for class imbalance learning, but these approaches hamper performance on the larger class because they emphasize the small-class cases. The likely causes of this overall degraded outcome are low diversity in ensemble solutions and overfitting or underfitting in data resampling techniques. To overcome these problems, we suggest a hybrid ensemble method that combines the MultiBoost ensemble with the Synthetic Minority Over-sampling TEchnique (SMOTE). Our suggested solution leverages the effectiveness of both elements. It therefore improves the outcome for the smaller class by reinforcing its space and limiting prediction error. In experiments, the proposed method shows improved performance compared with numerous other algorithms and techniques.

An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419590377 ◽

2019 ◽

Vol 33 (12) ◽

pp. 1959037 ◽

Cited By ~ 5

Author(s):

Shaojian Qiu ◽

Lu Lu ◽

Siyu Jiang ◽

Yang Guo

Keyword(s):

Ensemble Learning ◽

Class Imbalance ◽

Training Data ◽

Defect Prediction ◽

Class Imbalance Problem ◽

Learning Methods ◽

Imbalance Problem ◽

Intelligent Software ◽

Under Sampling ◽

Cross Project

Machine-learning-based software defect prediction (SDP) methods are receiving great attention from researchers in intelligent software engineering. Most existing SDP methods operate in a within-project setting. However, a new SDP task usually has little or no within-project training data from which to learn a usable supervised prediction model. Cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was therefore proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and has a great impact on the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods in extensive experiments on 31 open-source projects drawn from five datasets. From an analysis of 37504 results in total, we found that in most cases the IEL method combining under-sampling and bagging is more effective than the other investigated methods.
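One way to combine under-sampling with bagging is to train each ensemble member on a class-balanced bootstrap sample and take a majority vote. The sketch below illustrates that general idea only; it is not the specific IEL method evaluated in the study. The nearest-centroid base learner, the toy data, and every function name are assumptions chosen to keep the example dependency-free.

```python
import random
from collections import Counter

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_nearest_centroid(X, y):
    """Toy base learner: one centroid per class label."""
    return {label: centroid([x for x, l in zip(X, y) if l == label])
            for label in set(y)}

def predict_one(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(x, model[label])))

def fit_under_bagging(X, y, n_estimators=5, seed=0):
    """Train each member on a bag holding n_min bootstrapped rows per class,
    where n_min is the minority class size, so every bag is balanced."""
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    n_min = min(len(rows) for rows in by_class.values())
    models = []
    for _ in range(n_estimators):
        Xb, yb = [], []
        for label, rows in by_class.items():
            Xb += [rng.choice(rows) for _ in range(n_min)]
            yb += [label] * n_min
        models.append(fit_nearest_centroid(Xb, yb))
    return models

def predict(models, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical source-project data: label 1 marks the (rare) defective modules.
X_src = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.1], [0.5, 0.4],
         [2.0, 2.0], [2.2, 2.1]]
y_src = [0, 0, 0, 0, 0, 0, 1, 1]

ensemble = fit_under_bagging(X_src, y_src)
print(predict(ensemble, [2.1, 2.0]), predict(ensemble, [0.1, 0.1]))
```

Because each bag draws a different subset of the majority class, the ensemble as a whole sees more of the majority data than a single under-sampled training set would.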

A New Diversity Technique for Imbalance Learning Ensembles

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.11251 ◽

2018 ◽

Vol 7 (2.14) ◽

pp. 478 ◽

Cited By ~ 2

Author(s):

Hartono ◽

Opim Salim Sitompul ◽

Erna Budhiarti Nababan ◽

Tulus ◽

Dahlan Abdullah ◽

...

Keyword(s):

Hybrid Approach ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Classifier Ensembles ◽

Classification Problems ◽

Class Imbalance Problem ◽

Weighting Method ◽

Imbalance Problem ◽

Learning Ensembles ◽

Imbalance Learning

Data mining and machine learning techniques designed to solve classification problems require a balanced class distribution. In reality, however, datasets sometimes contain one class represented by a large number of instances alongside classes with far fewer instances. This problem is known as the class imbalance problem. Classifier ensembles are a method often used to overcome class imbalance problems. Data diversity is one of the cornerstones of ensembles. An ideal ensemble system should have accurate individual classifiers, and any errors should occur on different objects or instances. This research presents an overview and an experimental study of the Hybrid Approach Redefinition (HAR) method for handling class imbalance, which is at the same time expected to achieve better data diversity. The research uses 6 datasets with different imbalance ratios and compares HAR with SMOTEBoost, one of the re-weighting methods often used to handle class imbalance. The study shows that data diversity is related to performance in imbalance learning ensembles and that the proposed method can obtain better data diversity.

An Empirical Study of Boosting Methods on Severely Imbalanced Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.2510 ◽

2014 ◽

Vol 513-517 ◽

pp. 2510-2513 ◽

Cited By ~ 1

Author(s):

Xu Ying Liu

Keyword(s):

Empirical Study ◽

Real World ◽

Class Imbalance ◽

Imbalanced Data ◽

Real World Applications ◽

Under Sampling ◽

The Difference ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

F Measure

Nowadays real-world applications involve large volumes of data, which poses great challenges to class-imbalance learning: a large number of majority class examples and severe class-imbalance. Previous studies on class-imbalance learning mainly focused on small or moderate class-imbalance. In this paper we conduct an empirical study to explore the difference between learning with small or moderate class-imbalance and learning with severe class-imbalance. The experimental results show that: (1) Traditional methods cannot handle severe class-imbalance effectively. (2) AUC, G-mean and F-measure can be very inconsistent under severe class-imbalance, which seldom happens when class-imbalance is moderate. G-mean is not appropriate for severe class-imbalance learning because it is not sensitive to changes in the imbalance ratio. (3) When AUC and G-mean are the evaluation metrics, EasyEnsemble is the best method, followed by BalanceCascade and under-sampling. (4) For under-sampling, a little under-full balance is better for handling severe class-imbalance. It is also important to handle false positives when designing methods for severe class-imbalance.
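The remark that G-mean is insensitive to the imbalance ratio follows from the standard definitions, which the sketch below works through. G-mean is the geometric mean of the true positive rate and the true negative rate, so it depends only on per-class rates, whereas F-measure depends on precision and therefore on the absolute number of false positives. The confusion-matrix counts are hypothetical and are not taken from the paper's experiments.

```python
import math

def g_mean(tp, fn, fp, tn):
    """Geometric mean of true positive rate and true negative rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)

def f_measure(tp, fn, fp, tn):
    """F1: harmonic mean of precision and recall for the positive (minority) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Same per-class rates (TPR = 0.8, TNR = 0.9) at two imbalance ratios.
moderate = dict(tp=80, fn=20, fp=100, tn=900)    # 100 positives vs 1000 negatives
severe = dict(tp=80, fn=20, fp=1000, tn=9000)    # 100 positives vs 10000 negatives

for name, counts in (("moderate", moderate), ("severe", severe)):
    print(name, round(g_mean(**counts), 4), round(f_measure(**counts), 4))
```

With the per-class rates held fixed, G-mean is identical in both settings while F-measure drops as the majority class grows, which is one way the metrics can disagree under severe imbalance.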

CoGBUS- Center of Gravity based under Sampling Method for Imbalanced Data Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2077.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2463-2468

Keyword(s):

Learning Community ◽

Sampling Method ◽

Class Imbalance ◽

Imbalanced Data ◽

Center Of Gravity ◽

Classification Algorithms ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Imbalanced Data Classification ◽

Under Sampling

Learning from class-imbalanced data has become a challenging issue in the machine learning community because classification algorithms are generally designed to work on balanced datasets. Several methods are available to tackle this issue, among which the resampling techniques, undersampling and oversampling, are the more flexible and versatile. This paper introduces a new undersampling concept based on the Center of Gravity principle, which helps to reduce the excess instances of the majority class. The work is suited to binary class problems. The proposed technique, CoGBUS, overcomes the class imbalance problem and gives the best results in the study. We use F-Score, G-Mean and ROC to evaluate the method's performance.

A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling

Multiple Classifier Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-02326-2_9 ◽

2009 ◽

pp. 82-91 ◽

Cited By ~ 29

Author(s):

Muhammad Atif Tahir ◽

Josef Kittler ◽

Krystian Mikolajczyk ◽

Fei Yan

Keyword(s):

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling

A novel framework for class imbalance learning using intelligent under-sampling

Progress in Artificial Intelligence ◽

10.1007/s13748-012-0038-2 ◽

2012 ◽

Vol 2 (1) ◽

pp. 73-84 ◽

Cited By ~ 4

Author(s):

Satuluri Naganjaneyulu ◽

Mrithyumjaya Rao Kuppa

Keyword(s):

Class Imbalance ◽

Under Sampling ◽

Imbalance Learning ◽

Class Imbalance Learning

Under-sampling by algorithm with performance guaranteed for class-imbalance problem

2014 International Computer Science and Engineering Conference (ICSEC) ◽

10.1109/icsec.2014.6978197 ◽

2014 ◽

Cited By ~ 4

Author(s):

Wattana Jindaluang ◽

Varin Chouvatut ◽

Sanpawat Kantabutra

Keyword(s):

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling

An approach to class imbalance problem based on stacking and inverse random under sampling methods

2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC) ◽

10.1109/icnsc.2018.8361344 ◽

2018 ◽

Cited By ~ 4

Author(s):

Yuwei Zhang ◽

Guanjun Liu ◽

Wenjing Luan ◽

Chungang Yan ◽

Changjun Jiang

Keyword(s):

Sampling Methods ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling
