A comparative study of iterative and non-iterative feature selection techniques for software defect prediction

2013 ◽  
Vol 16 (5) ◽  
pp. 801-822 ◽  
Author(s):  
Taghi M. Khoshgoftaar ◽  
Kehan Gao ◽  
Amri Napolitano ◽  
Randall Wald
Author(s):  
Ahmad A Saifan ◽  
Lina Abu-wardih

Two primary issues have emerged in the machine learning and data mining community: how to deal with imbalanced data and how to choose appropriate features. These are of particular concern in the software engineering domain, and more specifically the field of software defect prediction. This research highlights a procedure which includes a feature selection technique to single out relevant attributes, and an ensemble technique to handle the class-imbalance issue. In order to determine the advantages of feature selection and ensemble methods we look at two potential scenarios: (1) Ensemble models constructed from the original datasets, without feature selection; (2) Ensemble models constructed from the reduced datasets after feature selection has been applied. Four feature selection techniques are employed: Principal Component Analysis (PCA), Pearson’s correlation, Greedy Stepwise Forward selection, and Information Gain (IG). The aim of this research is to assess the effectiveness of feature selection techniques using ensemble techniques. Five datasets, obtained from the PROMISE software depository, are analyzed; tentative results indicate that ensemble methods can improve the model's performance without the use of feature selection techniques. PCA feature selection and bagging based on K-NN perform better than both bagging based on SVM and boosting based on K-NN and SVM, and feature selection techniques including Pearson’s correlation, Greedy stepwise, and IG weaken the ensemble models’ performance.


Sign in / Sign up

Export Citation Format

Share Document