SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction

Software defect prediction (SDP) is the process of predicting defects in software modules; it identifies the modules that are defective and require extensive testing. Classification algorithms that help to predict software defects play a major role in the software engineering process. Some studies have shown that ensembles are often more accurate than single classifiers. However, findings vary across studies, some of which posit that the efficiency of learning algorithms may differ depending on the performance measure used. This is because most studies on SDP consider the accuracy of the model or classifier above other performance metrics. This paper evaluated the performance of single classifiers (SMO, MLP, kNN and Decision Tree) and ensembles (Bagging, Boosting, Stacking and Voting) in SDP, considering major performance metrics, using the Analytic Network Process (ANP) multi-criteria decision method. The experiment was based on 11 performance metrics over 11 software defect datasets. Boosted SMO, Voting and Stacking ensemble methods ranked highest, with priority levels of 0.0493, 0.0493 and 0.0445 respectively. Decision Tree ranked highest among single classifiers with 0.0410. These results show that ensemble methods can give better classification results in SDP, with the Boosting method giving the best result. In essence, all performance metrics should be considered before deciding which model or classifier is better for software defect prediction.

Keywords: Data mining, Machine Learning, Multi Criteria Decision Making, Software Defect Prediction
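The abstract's point that accuracy alone can mislead is easy to illustrate on imbalanced data. The sketch below is not taken from the paper: the function name `classification_metrics` and the toy labels (10 defective modules out of 100) are assumptions chosen only to show how accuracy, precision, recall and F1 can disagree.

```python
def classification_metrics(actual, predicted):
    """Return accuracy, precision, recall and F1 for binary labels (1 = defective)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when nothing is predicted or actually defective.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 10 defective modules followed by 90 clean ones.
actual = [1] * 10 + [0] * 90

# Classifier A labels every module clean.
all_clean = [0] * 100

# Classifier B finds 8 of the 10 defects but raises 12 false alarms.
flags_defects = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

metrics_a = classification_metrics(actual, all_clean)
metrics_b = classification_metrics(actual, flags_defects)
print(metrics_a)
print(metrics_b)
```

In this toy case classifier A has the higher accuracy yet finds no defects at all, which is the kind of disagreement between metrics that motivates a multi-criteria ranking.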

Software Defect Prediction Based on Feature Subset Selection and Ensemble Classification

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.2020142.224489 ◽

2020 ◽

Vol 14 (2) ◽

pp. 213-228

Author(s):

Ahmad A Saifan ◽

Lina Abu-wardih

Keyword(s):

Feature Selection ◽

Ensemble Methods ◽

Feature Subset Selection ◽

Defect Prediction ◽

Software Defect Prediction ◽

Pearson’s Correlation ◽

Ensemble Models ◽

Software Defect ◽

Feature Selection Techniques

Two primary issues have emerged in the machine learning and data mining community: how to deal with imbalanced data and how to choose appropriate features. These are of particular concern in the software engineering domain, and more specifically in the field of software defect prediction. This research presents a procedure that combines a feature selection technique, to single out relevant attributes, with an ensemble technique, to handle the class-imbalance issue. To determine the advantages of feature selection and ensemble methods, we look at two scenarios: (1) ensemble models constructed from the original datasets, without feature selection; (2) ensemble models constructed from the reduced datasets after feature selection has been applied. Four feature selection techniques are employed: Principal Component Analysis (PCA), Pearson’s correlation, Greedy Stepwise Forward selection, and Information Gain (IG). The aim of this research is to assess the effectiveness of feature selection techniques when combined with ensemble techniques. Five datasets, obtained from the PROMISE software repository, are analyzed; tentative results indicate that ensemble methods can improve the model’s performance without the use of feature selection techniques. PCA feature selection combined with bagging based on K-NN performs better than both bagging based on SVM and boosting based on K-NN and SVM. Feature selection techniques including Pearson’s correlation, Greedy Stepwise, and IG weaken the ensemble models’ performance.
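As a rough illustration of one of the four techniques named above, the following sketch ranks features by the absolute value of their Pearson correlation with the defect label and keeps the top k. It is not the authors' implementation: the function names, the choice of k, and the toy module metrics (lines of code, a complexity score, and a constant column) are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(rows, labels, k):
    """Return the indices of the k features most correlated with the label."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest absolute correlation first; ties broken by feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return sorted(j for _, j in scores[:k])


def project(rows, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in rows]


# Hypothetical modules: [lines of code, complexity score, constant column].
rows = [
    [10, 1, 5],
    [20, 2, 5],
    [200, 9, 5],
    [300, 12, 5],
    [15, 1, 5],
    [250, 10, 5],
]
labels = [0, 0, 1, 1, 0, 1]  # 1 = defective

keep = select_top_k(rows, labels, k=2)
reduced = project(rows, keep)
print(keep)
print(reduced[0])
```

The reduced rows would then be passed to whichever ensemble learner is being compared, corresponding to scenario (2) in the abstract; scenario (1) simply skips the selection step.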

ENSEMBLE OF SOFTWARE DEFECT PREDICTORS: AN AHP-BASED EVALUATION METHOD

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622011004282 ◽

2011 ◽

Vol 10 (01) ◽

pp. 187-206 ◽

Cited By ~ 116

Author(s):

YI PENG ◽

GANG KOU ◽

GUOXUN WANG ◽

WENSHUAI WU ◽

YONG SHI

Keyword(s):

Evaluation Method ◽

Nearest Neighbor ◽

Performance Metrics ◽

Ensemble Methods ◽

Multicriteria Decision Making ◽

Defect Prediction ◽

Average Error ◽

Software Defect Prediction ◽

K Nearest Neighbor ◽

Software Defect

Classification algorithms that help to identify software defects or faults play a crucial role in software risk management. Experimental results have shown that ensembles of classifiers are often more accurate and more robust to the effects of noisy data, and achieve a lower average error rate, than any of the constituent classifiers. However, inconsistencies exist across studies, and the performance of learning algorithms may vary with different performance measures and under different circumstances. Therefore, more research is needed to evaluate the performance of ensemble algorithms in software defect prediction. The goal of this paper is to assess the quality of ensemble methods in software defect prediction with the analytic hierarchy process (AHP), a multicriteria decision-making approach that prioritizes decision alternatives based on pairwise comparisons. Through the application of the AHP, this study experimentally compares the performance of several popular ensemble methods using 13 different performance metrics over 10 public-domain software defect datasets from the NASA Metrics Data Program (MDP) repository. The results indicate that ensemble methods can improve the classification results of software defect prediction in general, and that AdaBoost gives the best results. In addition, tree- and rule-based classifiers perform better in software defect prediction than the other types of classifiers included in the experiment. Among single classifiers, K-nearest-neighbor, C4.5, and Naïve Bayes tree ranked higher than the other classifiers.
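The pairwise-comparison step of the AHP mentioned above can be sketched as follows. This is a generic illustration, not the paper's procedure: the alternatives "A", "B" and "C" and the comparison values are made up, and the priority vector is approximated by power iteration on the comparison matrix. The random-index values are Saaty's commonly tabulated ones.

```python
def ahp_priorities(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]  # normalize to sum to 1
    return weights


# Saaty's random consistency index for matrices of size 3 to 10.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; values below about 0.1 are usually accepted."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical judgments: entry [i][j] says how strongly alternative i is
# preferred over alternative j; the matrix is reciprocal by construction.
alternatives = ["A", "B", "C"]
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]

priorities = ahp_priorities(comparisons)
ratio = consistency_ratio(comparisons, priorities)
ranking = sorted(zip(alternatives, priorities), key=lambda item: -item[1])
print(ranking)
print(ratio)
```

In a full AHP evaluation, one such matrix would be built per criterion (here, per performance metric) and the resulting priority vectors combined with the criteria weights; the sketch covers only a single matrix.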
