Local versus Global Models for Just-In-Time Software Defect Prediction

Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction that aims to identify defect-inducing changes. Recent studies have found that the variability of defect data sets can affect the performance of defect predictors, and that local models can help improve prediction performance. However, previous studies have focused on module-level defect prediction, so whether local models remain effective in the context of JIT-SDP is an open question. To this end, we compare the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project-validation, and timewise-cross-validation. To build local models, the experiment uses k-medoids to divide the training set into several homogeneous regions. In addition, logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models achieve significantly better effort-aware prediction performance than global models in the cross-validation and cross-project-validation scenarios. In particular, local models obtain their best effort-aware prediction performance when the number of clusters k is set to 2. Therefore, local models are promising for effort-aware JIT-SDP.
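The local-versus-global setup described in this abstract can be illustrated with a minimal sketch. It assumes a simple alternating k-medoids procedure and a small gradient-descent logistic regression written from scratch. The toy feature vectors, labels, and all function names are hypothetical and are not taken from the paper, whose exact clustering and training details may differ.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
    return clusters

def k_medoids(points, k, iters=20, seed=0):
    """Alternate between assigning points and re-picking each cluster's medoid."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(iters):
        clusters = assign(points, medoids)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda c: sum(dist(points[c], points[j]) for j in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """Defect probability for x; the last entry of w is the bias."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w

def train_local(X, y, k):
    """Fit one logistic model per k-medoids region of the training set."""
    _, clusters = k_medoids(X, k)
    return [(X[m], train_logistic([X[i] for i in idx], [y[i] for i in idx]))
            for m, idx in clusters.items() if idx]

def predict_local(local, x):
    """Route x to the model of its nearest medoid."""
    _, w = min(local, key=lambda mw: dist(x, mw[0]))
    return predict_proba(w, x)

# Hypothetical toy data: two change-metric features, 1 = defect-inducing.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
     [5.0, 5.1], [5.2, 4.9], [5.1, 5.3], [4.8, 5.0]]
y = [0, 0, 1, 1, 1, 1, 0, 0]

global_model = train_logistic(X, y)   # one model for all changes
local_model = train_local(X, y, k=2)  # one model per region
new_change = [5.0, 5.0]
print(predict_proba(global_model, new_change), predict_local(local_model, new_change))
```

A global model fits all training changes at once, while the local variant fits one model per region and sends each new change to the model of its nearest medoid. The sketch covers only the classification side and does not implement EALR.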


DEJIT: A Differential Evolution Algorithm for Effort-Aware Just-in-Time Software Defect Prediction

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500108 ◽

2021 ◽

Vol 31 (03) ◽

pp. 289-310

Author(s):

Xingguang Yang ◽

Huiqun Yu ◽

Guisheng Fan ◽

Kang Yang

Keyword(s):

Differential Evolution ◽

Cross Validation ◽

Differential Evolution Algorithm ◽

Prediction Performance ◽

Defect Prediction ◽

Just In Time ◽

Software Defect Prediction ◽

Training Set ◽

De Algorithm ◽

Software Defect

Software defect prediction is a widely studied approach in software engineering for saving testing resources and improving software quality. Effort-aware just-in-time software defect prediction (JIT-SDP) aims to identify defective software changes with limited software testing resources. Although many methods have been proposed for JIT-SDP, the effort-aware prediction performance of existing models still needs improvement. To this end, we propose DEJIT, a supervised method based on differential evolution (DE), to build JIT-SDP models. First, we propose a metric called density-percentile-average (DPA), which is used as the optimization objective on the training set. Then, we use logistic regression (LR) to build a prediction model. To make the LR model obtain the maximum DPA on the training set, we use the DE algorithm to determine the LR coefficients. The experiment uses defect data sets from six open-source projects. We compare the proposed method with four state-of-the-art supervised models and four unsupervised models in cross-validation, cross-project-validation, and timewise-cross-validation scenarios. The empirical results demonstrate that DEJIT can significantly improve effort-aware prediction performance in the three evaluation scenarios. Therefore, DEJIT is promising for effort-aware JIT-SDP.
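The idea of letting differential evolution pick logistic regression coefficients that maximize an effort-aware objective can be sketched as follows. The abstract does not define DPA, so this sketch substitutes a stand-in objective: the fraction of defective changes reached when inspecting changes in order of predicted probability per unit of effort, within 20% of total effort. That objective, the DE/rand/1/bin settings, the toy data, and all names are assumptions for illustration only.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Logistic regression output; the last entry of w is the bias."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))

def recall_at_effort(w, X, effort, y, budget=0.2):
    """Stand-in objective (NOT the paper's DPA): share of defective changes
    found when inspecting by descending score/effort within the effort budget."""
    order = sorted(range(len(X)),
                   key=lambda i: score(w, X[i]) / effort[i], reverse=True)
    limit = budget * sum(effort)
    spent = found = 0
    for i in order:
        if spent + effort[i] > limit:
            break
        spent += effort[i]
        found += y[i]
    return found / max(1, sum(y))

def differential_evolution(objective, dim, pop_size=20, F=0.5, CR=0.9,
                           gens=30, seed=0):
    """DE/rand/1/bin maximizing `objective` over real vectors of length dim."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            # Binomial crossover between the mutant and the current individual.
            trial = [
                pop[a][d] + F * (pop[b][d] - pop[c][d])
                if (rng.random() < CR or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            f = objective(trial)
            if f >= fit[i]:  # greedy selection keeps the better vector
                pop[i], fit[i] = trial, f
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]

# Hypothetical toy training set: two features, effort in changed lines, labels.
X = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.3], [0.2, 0.1],
     [0.7, 0.9], [0.3, 0.4], [0.6, 0.7], [0.1, 0.1]]
effort = [120, 15, 300, 10, 80, 40, 60, 5]
y = [1, 0, 1, 0, 1, 0, 1, 0]

objective = lambda w: recall_at_effort(w, X, effort, y)
best_w, best_fit = differential_evolution(objective, dim=len(X[0]) + 1)
print(best_w, best_fit)
```

Because the objective is a ranking measure rather than a differentiable loss, a population-based search such as DE can optimize the coefficients directly, where gradient-based fitting of logistic regression would target classification likelihood instead.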


An investigation of cross-project learning in online just-in-time software defect prediction

Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering ◽

10.1145/3377811.3380403 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sadia Tabassum ◽

Leandro L. Minku ◽

Danyi Feng ◽

George G. Cabral ◽

Liyan Song

Keyword(s):

Defect Prediction ◽

Just In Time ◽

Software Defect Prediction ◽

Project Learning ◽

Software Defect ◽

Cross Project


An Empirical Study on Software Defect Prediction Using CodeBERT Model

Applied Sciences ◽

10.3390/app11114793 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4793

Author(s):

Cong Pan ◽

Minyan Lu ◽

Biao Xu

Keyword(s):

Deep Learning ◽

Software Engineering ◽

Empirical Study ◽

Empirical Studies ◽

Language Model ◽

Prediction Performance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Cross Project

Deep learning-based software defect prediction has become popular in recent years. The recent release of the CodeBERT model has made it possible to perform many software engineering tasks. We propose several CodeBERT models targeting software defect prediction: CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT. We perform empirical studies using these models in cross-version and cross-project software defect prediction to investigate whether a neural language model like CodeBERT can improve prediction performance. We also investigate the effects of different prediction patterns in software defect prediction using CodeBERT models. The empirical results are further discussed.


Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models


MULTI: Multi-objective effort-aware just-in-time software defect prediction

Information and Software Technology ◽

10.1016/j.infsof.2017.08.004 ◽

2018 ◽

Vol 93 ◽

pp. 1-13 ◽

Cited By ~ 33

Author(s):

Xiang Chen ◽

Yingquan Zhao ◽

Qiuping Wang ◽

Zhidan Yuan

Keyword(s):

Defect Prediction ◽

Just In Time ◽

Software Defect Prediction ◽

Multi Objective ◽

Software Defect


Impact of the Distribution Parameter of Data Sampling Approaches on Software Defect Prediction Models

2017 24th Asia-Pacific Software Engineering Conference (APSEC) ◽

10.1109/apsec.2017.76 ◽

2017 ◽

Author(s):

Kwabena Ebo Bennin ◽

Jacky Keung ◽

Akito Monden

Keyword(s):

Prediction Models ◽

Distribution Parameter ◽

Defect Prediction ◽

Software Defect Prediction ◽

Data Sampling ◽

Software Defect ◽

Defect Prediction Models


A Novel Effort Measure Method for Effort-Aware Just-in-Time Software Defect Prediction

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500364 ◽

2021 ◽

Vol 31 (08) ◽

pp. 1145-1169

Author(s):

Liqiong Chen ◽

Shilong Song ◽

Can Wang

Keyword(s):

Prediction Model ◽

Defect Prediction ◽

Software Systems ◽

Support Vector ◽

Just In Time ◽

Software Defect Prediction ◽

Fine Grained ◽

Software Defect ◽

Code Changes ◽

The Cost

Just-in-time software defect prediction (JIT-SDP) is a fine-grained software defect prediction technology that aims to identify defective code changes in software systems. Effort-aware software defect prediction takes the cost of code inspection into consideration, so that more defective code changes can be found with limited test resources. Traditional effort-aware defect prediction models mainly measure effort by the number of lines of code (LOC) and rarely consider additional factors. This paper proposes a novel effort measure method called Multi-Metric Joint Calculation (MMJC). When measuring effort, MMJC takes into account not only LOC, but also the distribution of modified code across different files (Entropy), the number of developers that changed the files (NDEV), and the developer experience (EXP). In the simulation experiment, MMJC is combined with Linear Regression, Decision Tree, Random Forest, LightGBM, Support Vector Machine, and Neural Network, respectively, to build software defect prediction models. Several comparative experiments are conducted between the models based on MMJC and baseline models. The results show that, compared with the baseline models, the indicators ACC and [Formula: see text] of the models based on MMJC improve by 35.3% and 15.9% on average, respectively, across the three verification scenarios.
