Training Data Selection Using Ensemble Dataset Approach for Software Defect Prediction

Author(s):  
Md Fahimuzzman Sohan ◽  
Md Alamgir Kabir ◽  
Mostafijur Rahman ◽  
S. M. Hasan Mahmud ◽  
Touhid Bhuiyan
2021 ◽  
Vol 94 ◽  
pp. 107370
Author(s):  
Shang Zheng ◽  
Jinjing Gai ◽  
Hualong Yu ◽  
Haitao Zou ◽  
Shang Gao

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 110059-110072
Author(s):  
Jie Zhang ◽  
Jiajing Wu ◽  
Chuan Chen ◽  
Zibin Zheng ◽  
Michael R. Lyu

Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 212
Author(s):  
Le Son ◽  
Nakul Pritam ◽  
Manju Khari ◽  
Raghvendra Kumar ◽  
Pham Phuong ◽  
...  

Software defect prediction has been one of the key areas of exploration in the domain of software quality. In this paper, we perform a systematic mapping to analyze all the software defect prediction literature available from 1995 to 2018 using a multi-stage process. A total of 156 studies are selected in the first step, and the final mapping is conducted based on these studies. The ability of a model to learn from data that does not come from the same project or organization will help organizations that do not have sufficient training data or are about to start work on new projects. The findings of this research are useful not only to the software engineering domain but also to empirical studies that mainly focus on symmetry, as they provide step-by-step solutions to the questions raised in the article.


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7535
Author(s):  
Haoyu Luo ◽  
Heng Dai ◽  
Weiqiang Peng ◽  
Wenhua Hu ◽  
Fuyang Li

Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (a.k.a. within-project) data and target project (a.k.a. cross-project) data, which evidently degrades prediction performance. To investigate the impact of training data selection methods on the performance of ROCPDP models, we examined the practical effects of nine training data selection methods, including a global filter, which does not filter out any cross-project data. Additionally, the prediction performance of ROCPDP models trained on cross-project data filtered by these training data selection methods was compared with that of ranking-oriented within-project defect prediction (ROWPDP) models trained on sufficient and limited within-project data. Eleven available defect datasets from industrial projects were considered and evaluated using two ranking performance measures, i.e., FPA and Norm(Popt). The results showed no statistically significant differences among the nine training data selection methods in terms of FPA and Norm(Popt). The performance of ROCPDP models trained on filtered cross-project data was not comparable with that of ROWPDP models trained on sufficient historical within-project data. However, ROCPDP models trained on filtered cross-project data achieved better performance values than ROWPDP models trained on limited historical within-project data. Therefore, we recommend that software quality teams exploit other project datasets to perform ROCPDP when there is no or limited within-project data.
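
The abstract above names FPA as a ranking measure but does not define it. As an illustration only, the sketch below assumes the commonly used definition of fault-percentile-average: modules are ranked by predicted defect count, and the score is the average, over every cutoff m, of the share of actual defects found in the top m modules. The function name `fpa` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fpa(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Fault-percentile-average under the assumed definition above.

    predicted: predicted defect count (or density) per module.
    actual:    actual defect count per module, in the same order.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total_defects = sum(actual)
    if total_defects == 0:
        raise ValueError("FPA is undefined when there are no actual defects")

    # Rank modules from highest to lowest predicted defect count.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)

    found = 0      # actual defects contained in the top-m modules so far
    score = 0.0    # running sum of the top-m defect shares
    for i in order:
        found += actual[i]
        score += found / total_defects
    return score / len(order)


if __name__ == "__main__":
    # Illustrative toy data, not taken from any dataset in the study.
    print(fpa([3.0, 2.0, 1.0], [3, 2, 1]))  # ranking agrees with actual defects
    print(fpa([1.0, 2.0, 3.0], [3, 2, 1]))  # ranking is reversed
```

Under this definition, a ranking that places the most defective modules first scores higher than one that places them last, which is the property a ranking-oriented model is evaluated on.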


2016 ◽  
Vol 136 (12) ◽  
pp. 898-907 ◽  
Author(s):  
Joao Gari da Silva Fonseca Junior ◽  
Hideaki Ohtake ◽  
Takashi Oozeki ◽  
Kazuhiko Ogimoto
