A differential evolution-based approach for effort-aware just-in-time software defect prediction

Author(s):  
Xingguang Yang ◽  
Huiqun Yu ◽  
Guisheng Fan ◽  
Kang Yang

Software defect prediction is an effective approach to save testing resources and improve software quality, and it is widely studied in the field of software engineering. Effort-aware just-in-time software defect prediction (JIT-SDP) aims to identify defective software changes with limited software testing resources. Although many methods have been proposed for JIT-SDP, the effort-aware prediction performance of existing models still needs to be improved. To this end, we propose DEJIT, a supervised method based on differential evolution (DE), to build JIT-SDP models. Specifically, we first propose a metric called density-percentile-average (DPA), which is used as the optimization objective on the training set. Then, we use logistic regression (LR) to build a prediction model. To make the LR model obtain the maximum DPA on the training set, we use the DE algorithm to determine the coefficients of the LR model. The experiment uses defect data sets from six open-source projects. We compare the proposed method with four state-of-the-art supervised models and four unsupervised models in cross-validation, cross-project-validation and timewise-cross-validation scenarios. The empirical results demonstrate that the DEJIT method can significantly improve the effort-aware prediction performance in the three evaluation scenarios. Therefore, the DEJIT method is promising for effort-aware JIT-SDP.
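
The idea of letting DE choose the LR coefficients directly can be illustrated with a minimal sketch. The abstract does not define DPA, so the objective below is a stand-in and not the authors' metric: the share of defective changes reached when changes are inspected in descending score order until 20% of the total effort is spent. The function names, the toy data, and the DE settings (`pop_size`, `f`, `cr`, `bound`) are all assumptions made for illustration.

```python
import math
import random

def lr_score(weights, features):
    """Logistic regression score; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def recall_at_effort(weights, X, y, effort, budget=0.2):
    """Stand-in objective (NOT the paper's DPA): fraction of defective changes
    found when inspecting by descending score until the first change that
    would exceed `budget` of the total effort."""
    order = sorted(range(len(X)), key=lambda i: lr_score(weights, X[i]), reverse=True)
    limit = budget * sum(effort)
    spent, found = 0.0, 0
    for i in order:
        if spent + effort[i] > limit:
            break
        spent += effort[i]
        found += y[i]
    total = sum(y)
    return found / total if total else 0.0

def differential_evolution(objective, dim, pop_size=20, generations=40,
                           f=0.5, cr=0.9, bound=5.0, seed=0):
    """Classic DE/rand/1/bin that maximises `objective` over [-bound, bound]^dim."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation uses three distinct individuals other than the target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(max(-bound, min(bound, v)))
                else:
                    trial.append(pop[i][j])
            trial_fit = objective(trial)
            if trial_fit >= fitness[i]:  # greedy one-to-one selection
                pop[i], fitness[i] = trial, trial_fit
    best = max(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]

# Toy data (hypothetical): one feature per change, defect label, effort (e.g. LOC).
X = [[0.9], [0.8], [0.7], [0.2], [0.1], [0.3], [0.6], [0.4]]
y = [1, 1, 0, 0, 0, 0, 1, 0]
effort = [10, 20, 50, 30, 40, 60, 15, 25]

best_w, best_fit = differential_evolution(
    lambda w: recall_at_effort(w, X, y, effort), dim=2)
```

Because the objective depends only on the ranking of changes, it is piecewise constant in the coefficients, which is one reason a gradient-free optimizer such as DE is a natural fit for this kind of effort-aware target.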


Author(s):  
Liqiong Chen ◽  
Shilong Song ◽  
Can Wang

Just-in-time software defect prediction (JIT-SDP) is a fine-grained software defect prediction technology that aims to identify defective code changes in software systems. Effort-aware software defect prediction takes the cost of code inspection into consideration, so that more defective code changes can be found with limited test resources. Traditional effort-aware defect prediction models mainly measure effort by the number of lines of code (LOC) and rarely consider additional factors. This paper proposes a novel effort measurement method called Multi-Metric Joint Calculation (MMJC). When measuring effort, MMJC takes into account not only LOC, but also the distribution of modified code across different files (Entropy), the number of developers that changed the files (NDEV) and the developer experience (EXP). In the simulation experiment, MMJC is combined with Linear Regression, Decision Tree, Random Forest, LightGBM, Support Vector Machine and Neural Network, respectively, to build software defect prediction models. Several comparative experiments are conducted between the MMJC-based models and baseline models. The results show that the indicators ACC and [Formula: see text] of the MMJC-based models improve by 35.3% and 15.9% on average, respectively, in the three verification scenarios compared with the baseline models.


2019 ◽  
Vol 2019 ◽  
pp. 1-13 ◽  
Author(s):  
Xingguang Yang ◽  
Huiqun Yu ◽  
Guisheng Fan ◽  
Kai Shi ◽  
Liqiong Chen

Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction, which aims to identify defect-inducing changes. Recently, some studies have found that the variability of defect data sets can affect the performance of defect predictors, and that using local models can help improve prediction performance. However, previous studies have focused on module-level defect prediction. Whether local models are still valid in the context of JIT-SDP is an important issue. To this end, we compare the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project-validation, and timewise-cross-validation. To build local models, the experiment uses k-medoids to divide the training set into several homogeneous regions. In addition, logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models have significantly better effort-aware prediction performance than global models in the cross-validation and cross-project-validation scenarios. In particular, when the number of clusters k is set to 2, local models obtain optimal effort-aware prediction performance. Therefore, local models are promising for effort-aware JIT-SDP.
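
The local-model workflow described here (partition the training set with k-medoids, fit one model per region, route each new change to the model of its nearest medoid) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the clustering is a simple alternating k-medoids with a deterministic farthest-first start, and `mean_rate_model` is a hypothetical placeholder where logistic regression or EALR would be fitted. All names and the toy data are invented for the example.

```python
def euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: euclidean(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, max_iter=50):
    """Alternating k-medoids; returns (medoid indices, cluster labels)."""
    # Farthest-first initialisation (an assumption, chosen to be deterministic).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(points)),
            key=lambda i: min(euclidean(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(
                members,
                key=lambda i: sum(euclidean(points[i], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)

def mean_rate_model(X, y):
    """Placeholder learner: predicts the region's defect rate for any input."""
    rate = sum(y) / len(y) if y else 0.0
    return lambda x: rate

def fit_local_models(points, targets, k, fit=mean_rate_model):
    """Fit one model per k-medoids region."""
    medoids, labels = k_medoids(points, k)
    models = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models.append(fit([points[i] for i in idx], [targets[i] for i in idx]))
    return medoids, labels, models

def predict_local(x, points, medoids, models):
    """Route x to the model of its nearest medoid."""
    c = min(range(len(medoids)), key=lambda c: euclidean(x, points[medoids[c]]))
    return models[c](x)

# Toy training set (hypothetical): two well-separated groups of changes.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
targets = [0, 0, 1, 1, 1, 1]
medoids, labels, models = fit_local_models(points, targets, k=2)
low = predict_local([0.5, 0.5], points, medoids, models)
high = predict_local([10.5, 10.5], points, medoids, models)
```

A global model would instead call `fit` once on the whole training set; the comparison in the study is between that single model and the per-region models.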


2021 ◽  
Vol 11 (5) ◽  
pp. 2002
Author(s):  
Jonggu Kang ◽  
Sunjae Kwon ◽  
Duksan Ryu ◽  
Jongmoon Baik

Software plays the most important role in recent vehicle innovations, and consequently the amount of software has grown rapidly in recent decades. The safety-critical nature of ships, one sort of vehicle, makes software quality assurance (SQA) a fundamental prerequisite. Just-in-time software defect prediction (JIT-SDP) aims to conduct software defect prediction (SDP) on commit-level code changes to achieve effective SQA resource allocation. The first case study of SDP in the maritime domain reported feasible prediction performance. However, the prediction model still has room for improvement, since the parameters of the model have not yet been optimized. Harmony search (HS) is a widely used music-inspired meta-heuristic optimization algorithm. In this article, we demonstrate that JIT-SDP can achieve better prediction performance by applying HS-based parameter optimization with balance as the fitness value. Using two real-world datasets from a maritime software project, we obtained an optimized model that exceeds the baseline performance of the previous case study across various defect-to-non-defect class imbalance ratios. Experiments with open-source software also showed better recall for all datasets, even though balance was the performance index being optimized. HS-based parameter-optimized JIT-SDP can be applied to maritime domain software with a high class imbalance ratio. Finally, we expect that our research can be extended to improve the performance of JIT-SDP not only in maritime domain software but also in open-source software.
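
A rough sketch of HS-based tuning with a balance fitness is given below. Several points are assumptions rather than details from the article: balance is taken to be the commonly used measure 1 - sqrt((0 - PF)^2 + (1 - PD)^2) / sqrt(2), where PD is the probability of detection and PF the probability of false alarm; the single tuned parameter is a hypothetical decision threshold on precomputed scores; and the HS settings (`hms`, `hmcr`, `par`, `bw`) and toy data are illustrative only.

```python
import math
import random

def balance(pd, pf):
    """Distance-based balance: 1 at the ideal point (pd=1, pf=0)."""
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)

def fitness(params, scores, labels):
    """Balance of a threshold classifier; params[0] is the tuned threshold."""
    threshold = params[0]
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    pd = tp / pos if pos else 0.0
    pf = fp / neg if neg else 0.0
    return balance(pd, pf)

def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iterations=300, seed=0):
    """Basic harmony search that maximises `objective` within `bounds`."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    values = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                x = memory[rng.randrange(hms)][d]          # memory consideration
                if rng.random() < par:
                    x += rng.uniform(-bw, bw) * (hi - lo)  # pitch adjustment
            else:
                x = rng.uniform(lo, hi)                    # random selection
            new.append(max(lo, min(hi, x)))
        value = objective(new)
        worst = min(range(hms), key=lambda i: values[i])
        if value > values[worst]:  # replace the worst harmony if improved
            memory[worst], values[worst] = new, value
    best = max(range(hms), key=lambda i: values[i])
    return memory[best], values[best]

# Toy scores (hypothetical): model outputs for three defective and four clean changes.
scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0]

best_params, best_fit = harmony_search(
    lambda p: fitness(p, scores, labels), bounds=[(0.0, 1.0)])
```

Balance rewards a high detection rate and a low false-alarm rate jointly, which makes it less sensitive to class imbalance than plain accuracy; that is presumably why it is attractive as a fitness value for highly imbalanced defect data.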

