Local versus Global Models for Just-In-Time Software Defect Prediction

2019, Vol 2019, pp. 1-13
Author(s): Xingguang Yang, Huiqun Yu, Guisheng Fan, Kai Shi, Liqiong Chen

Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction that aims to identify defect-inducing changes. Recent studies have found that the variability of defect data sets can affect the performance of defect predictors, and that using local models can help improve prediction performance. However, previous studies have focused on module-level defect prediction, so whether local models remain valid in the context of JIT-SDP is an important question. To this end, we compare the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project-validation, and timewise-cross-validation. To build local models, the experiment uses the k-medoids algorithm to divide the training set into several homogeneous regions. In addition, logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models achieve significantly better effort-aware prediction performance than global models in the cross-validation and cross-project-validation scenarios. In particular, local models obtain their best effort-aware prediction performance when the number of clusters k is set to 2. Therefore, local models are promising for effort-aware JIT-SDP.
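The local-model setup described in this abstract (partition the training changes with k-medoids, fit one classifier per region, route each new change to the model of its nearest medoid) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the farthest-first initialisation, the gradient-descent logistic regression, the Euclidean distance, and every function name are hypothetical choices made for this sketch, and it assumes each region ends up with at least one training change.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: dist(p, centers[c]))

def k_medoids(points, k, max_iter=50):
    """Alternating k-medoids; returns (medoid indices, cluster label per point)."""
    # Farthest-first initialisation: an assumption chosen here for determinism.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(points)),
                           key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        labels = [nearest(p, [points[m] for m in medoids]) for p in points]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:               # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members,
                                   key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [nearest(p, [points[m] for m in medoids]) for p in points]
    return medoids, labels

def predict_proba(w, x):
    """Logistic regression probability; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))          # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_proba(w, xi) - yi
            w[0] -= lr * g
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * g * xj
    return w

def train_local_models(X, y, k):
    """Fit one logistic regression per k-medoids region."""
    medoids, labels = k_medoids(X, k)
    models = []
    for c in range(k):
        Xc = [x for x, lab in zip(X, labels) if lab == c]
        yc = [t for t, lab in zip(y, labels) if lab == c]
        models.append(fit_logistic(Xc, yc))
    return [X[m] for m in medoids], models

def predict_local(x, medoid_points, models):
    """Route a new change to the model of its nearest medoid."""
    return predict_proba(models[nearest(x, medoid_points)], x)
```

A global model in this sketch is simply `fit_logistic(X, y)` on the whole training set, so the two variants differ only in the partitioning and routing steps.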

Author(s): Xingguang Yang, Huiqun Yu, Guisheng Fan, Kang Yang

Software defect prediction is an effective approach to saving testing resources and improving software quality, and it is widely studied in the field of software engineering. Effort-aware just-in-time software defect prediction (JIT-SDP) aims to identify defective software changes with limited software testing resources. Although many methods have been proposed for JIT-SDP, the effort-aware prediction performance of existing models still needs to be improved. To this end, we propose DEJIT, a supervised method based on differential evolution (DE), to build JIT-SDP models. Specifically, we first propose a metric called density-percentile-average (DPA), which is used as the optimization objective on the training set. Then, we use logistic regression (LR) to build a prediction model. To make the LR model obtain the maximum DPA on the training set, we use the DE algorithm to determine the coefficients of the LR model. The experiment uses defect data sets from six open-source projects. We compare the proposed method with four state-of-the-art supervised models and four unsupervised models in cross-validation, cross-project-validation, and timewise-cross-validation scenarios. The empirical results demonstrate that the DEJIT method can significantly improve effort-aware prediction performance in the three evaluation scenarios. Therefore, the DEJIT method is promising for effort-aware JIT-SDP.
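The idea of letting differential evolution choose logistic regression coefficients so that a training-set objective is maximised can be sketched as below. This is a hedged illustration, not DEJIT itself: the abstract does not define DPA, so the objective here is a stand-in (plain training accuracy), and the DE/rand/1/bin variant, the population size, the values of `F` and `CR`, the initial range, and all function names are assumptions made for this sketch.

```python
import math
import random

def lr_score(w, x):
    """Logistic regression probability; w[0] is the intercept."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))          # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def make_accuracy_objective(X, y):
    """Stand-in objective (NOT the paper's DPA): training accuracy of the LR."""
    def objective(w):
        hits = sum((lr_score(w, x) >= 0.5) == bool(t) for x, t in zip(X, y))
        return hits / len(y)
    return objective

def differential_evolution(objective, dim, pop_size=20, gens=50,
                           F=0.5, CR=0.9, seed=0):
    """Maximise `objective` over R^dim with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)   # guarantees at least one mutated gene
            trial = [pop[a][d] + F * (pop[b][d] - pop[c][d])
                     if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            f_trial = objective(trial)
            if f_trial >= fit[i]:         # greedy selection keeps the better vector
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

For a data set with two features, `differential_evolution(make_accuracy_objective(X, y), dim=3)` returns an intercept plus two coefficients. Swapping in a different objective only requires passing another function of the coefficient vector.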


2021, Vol 11 (11), pp. 4793
Author(s): Cong Pan, Minyan Lu, Biao Xu

Deep learning-based software defect prediction has become popular in recent years. The recent release of the CodeBERT model has made it possible to perform many software engineering tasks. We propose several CodeBERT models targeting software defect prediction, including CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT. We perform empirical studies using these models in cross-version and cross-project software defect prediction to investigate whether a neural language model like CodeBERT can improve prediction performance. We also investigate the effects of different prediction patterns in software defect prediction using CodeBERT models. The empirical results are further discussed.


Author(s): Liqiong Chen, Shilong Song, Can Wang

Just-in-time software defect prediction (JIT-SDP) is a fine-grained software defect prediction technology that aims to identify defective code changes in software systems. Effort-aware software defect prediction takes the cost of code inspection into consideration, so that more defective code changes can be found with limited testing resources. Traditional effort-aware defect prediction models mainly measure effort by the number of lines of code (LOC) and rarely consider additional factors. This paper proposes a novel effort measure called Multi-Metric Joint Calculation (MMJC). When measuring effort, MMJC takes into account not only LOC but also the distribution of modified code across different files (Entropy), the number of developers that changed the files (NDEV), and the developer experience (EXP). In the simulation experiment, MMJC is combined with Linear Regression, Decision Tree, Random Forest, LightGBM, Support Vector Machine, and Neural Network, respectively, to build software defect prediction models. Several comparative experiments are conducted between the models based on MMJC and baseline models. The results show that the indicators ACC and [Formula: see text] of the models based on MMJC are improved by 35.3% and 15.9% on average, respectively, in the three verification scenarios compared with the baseline models.
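To make the role of the effort measure concrete, the sketch below computes an effort-aware recall under a fixed inspection budget. It assumes the definition of ACC commonly used in effort-aware JIT-SDP work (the share of defective changes found when 20% of the total effort is spent); the abstract does not state the definition, so treat this as an assumption. The function name `acc_at_effort` and the toy numbers are hypothetical, and the effort values are plain LOC because the abstract does not give the MMJC formula.

```python
def acc_at_effort(scores, efforts, labels, budget=0.2):
    """Fraction of defective changes found within `budget` of the total effort.

    scores  -- predicted risk per change (higher is inspected first)
    efforts -- inspection cost per change (LOC here; any effort measure fits)
    labels  -- 1 if the change is defective, else 0
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    limit = budget * sum(efforts)
    spent, found = 0.0, 0
    for i in order:
        if spent + efforts[i] > limit:    # the next change would exceed the budget
            break
        spent += efforts[i]
        found += labels[i]
    total = sum(labels)
    return found / total if total else 0.0

# Toy data (hypothetical): four changes, total effort 100, budget 20.
scores = [0.9, 0.8, 0.3, 0.1]
efforts = [10, 5, 40, 45]
labels = [1, 0, 1, 1]
print(acc_at_effort(scores, efforts, labels))
```

Because `efforts` is just a list of numbers, a different effort measure changes both the budget and which changes fit inside it, which is why the choice of measure can shift the reported ACC.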

