Random Forest and Novel Under-Sampling Strategy for Data Imbalance in Software Defect Prediction

2018 ◽ Vol 7 (4.15) ◽ pp. 39
Author(s): Utomo Pujianto

Data imbalance is one of the characteristics of software quality datasets that can negatively affect the performance of software defect prediction models. This study proposed an alternative to the random under-sampling strategy that uses only the subset of non-defective data with the largest distance to the centroid of the defective data. Combined with random forest classification, the proposed method outperformed both the random under-sampling and the non-sampling approaches in terms of accuracy, AUC, F-measure, and true positive rate.
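
A minimal Python sketch of the centroid-distance idea described above is given below. The Euclidean distance metric, the size of the retained non-defective subset (matched to the defective class), and the scikit-learn random forest settings are assumptions for illustration; the paper's exact configuration may differ.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def centroid_distance_undersample(X, y, defect_label=1):
    """Keep only the non-defective samples farthest from the defective centroid."""
    X_defect = X[y == defect_label]
    X_clean = X[y != defect_label]
    centroid = X_defect.mean(axis=0)                   # centroid of the defective class
    dist = np.linalg.norm(X_clean - centroid, axis=1)  # distance of each non-defective sample
    keep = np.argsort(dist)[::-1][:len(X_defect)]      # indices of the farthest samples
    X_bal = np.vstack([X_defect, X_clean[keep]])
    y_bal = np.concatenate([np.full(len(X_defect), defect_label),
                            np.full(len(keep), 1 - defect_label)])
    return X_bal, y_bal

# Illustrative usage on synthetic, imbalanced data (~10% defective).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (rng.random(500) < 0.1).astype(int)
X_bal, y_bal = centroid_distance_undersample(X, y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)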

2019 ◽ Vol 9 (13) ◽ pp. 2764
Author(s): Abdullateef Oluwagbemiga Balogun ◽ Shuib Basri ◽ Said Jadid Abdulkadir ◽ Ahmad Sobri Hashim

Software Defect Prediction (SDP) models are built using software metrics derived from software systems. The quality of SDP models depends largely on the quality of the software metrics (dataset) used to build them. High dimensionality is one of the data quality problems that affect the performance of SDP models. Feature selection (FS) is a proven method for addressing the dimensionality problem. However, the choice of FS method for SDP remains an open problem, as most empirical studies on FS methods for SDP report contradictory and inconsistent results. FS methods behave differently because of their different underlying computational characteristics, and in particular because of the search methods they use, since the impact of FS depends on the choice of search method. It is therefore important to comparatively analyze the performance of FS methods based on different search methods in SDP. In this paper, four filter feature ranking (FFR) and fourteen filter feature subset selection (FSS) methods were evaluated using four different classifiers over five software defect datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental analysis showed that applying FS improves the predictive performance of classifiers, and that the performance of FS methods can vary across datasets and classifiers. Among the FFR methods, Information Gain produced the greatest improvements in the prediction models. Among the FSS methods, Consistency Feature Subset Selection based on Best First Search had the strongest influence on the prediction models. However, prediction models based on FFR proved more stable than those based on FSS methods. We therefore conclude that FS methods improve the performance of SDP models and that there is no single best FS method, as performance varies with the dataset and the choice of prediction model. Nevertheless, we recommend the FFR methods, as the prediction models based on FFR are more stable in terms of predictive performance.
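
As an illustration of the filter feature ranking step, the sketch below ranks features by Information Gain (using scikit-learn's mutual_info_classif as a stand-in) and keeps the top k before training a classifier. The synthetic data, the value of k, and the decision tree classifier are assumptions for illustration, not the paper's experimental setup.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a software defect dataset: 40 metrics, binary defect label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = (X[:, 0] + rng.normal(size=300) > 0).astype(int)

# Rank features by information gain (mutual information), keep the top 10,
# and evaluate the resulting prediction model with cross-validated AUC.
pipe = make_pipeline(SelectKBest(mutual_info_classif, k=10),
                     DecisionTreeClassifier(random_state=0))
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print("Mean AUC with Information Gain ranking:", scores.mean())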

