SLDeep: Statement-level software defect prediction using deep-learning model on static code features

This paper studies software defect prediction methods in a cloud environment and applies deep learning to the prediction task. A cost penalty term is added to the supervised part of the deep ladder network; that is, the misclassification costs of the different classes are built into the model. The resulting cost-sensitive deep ladder network-based software defect prediction model effectively mitigates the negative impact of class imbalance on defect prediction. To address missing or insufficient historical data from the same project, a manifold learning-based geodesic cross-project software defect prediction method is proposed. Drawing on data from other projects, a transfer learning approach embeds the source and target datasets into a Gaussian manifold. The kernel captures the incremental changes between the two domains, covering both their differences and their commonalities. The resulting subspace approximates the distributions of the transformed source and target data, and traditional within-project software defect classifiers are then used to predict labels. Real-time defect prediction is found to be more practical because it requires less code to be reviewed: only individual changes need to be reviewed rather than entire files or packages, which also makes it easier for developers to assign fixes to defects.
More importantly, this paper combines deep belief network (DBN) techniques with fine-grained real-time defect prediction and uses TCA techniques to deal with data imbalance, proposing an improved DBN approach for real-time defect prediction. The experiments also vary the machine learning classifier underlying the DBN. The results validate the effectiveness of TCA techniques for the data imbalance problem and show that the defect prediction model learned by the improved method has better prediction performance.
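The cost penalty idea in the abstract above can be illustrated with a class-weighted cross-entropy loss. This is a minimal sketch, not the paper's ladder-network loss: the function name and the cost values `cost_fn` and `cost_fp` are hypothetical, chosen only to show how a missed defect can be penalized more heavily than a false alarm.

```python
import math

def cost_sensitive_cross_entropy(probs, labels, cost_fn=5.0, cost_fp=1.0):
    """Mean binary cross-entropy with per-class misclassification costs.

    probs   : predicted probability that each module is defective
    labels  : 1 for defective, 0 for clean
    cost_fn : assumed cost of missing a defect (false negative)
    cost_fp : assumed cost of a false alarm (false positive)
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            # Error on the (usually rare) defective class, scaled by cost_fn.
            total += -cost_fn * math.log(p)
        else:
            # Error on the clean class, scaled by cost_fp.
            total += -cost_fp * math.log(1.0 - p)
    return total / len(labels)

# The same confidence error costs more on a defective module than on a clean one.
loss_missed_defect = cost_sensitive_cross_entropy([0.2], [1])
loss_false_alarm = cost_sensitive_cross_entropy([0.8], [0])
```

Setting both costs to 1.0 recovers the ordinary binary cross-entropy, so the cost ratio is the only extra knob this sketch introduces.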


Deep learning based software defect prediction

Neurocomputing ◽ 10.1016/j.neucom.2019.11.067 ◽ 2020 ◽ Vol 385 ◽ pp. 100-110 ◽ Cited By ~ 2

Author(s): Lei Qiao ◽ Xuesong Li ◽ Qasim Umer ◽ Ping Guo

Keyword(s): Deep Learning ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect


Research on software defect prediction technology based on deep learning

2021 2nd International Conference on Computing and Data Science (CDS) ◽ 10.1109/cds52072.2021.00024 ◽ 2021

Author(s): Pengcheng Jiang

Keyword(s): Deep Learning ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect


An Empirical Study on Software Defect Prediction Using CodeBERT Model

Applied Sciences ◽ 10.3390/app11114793 ◽ 2021 ◽ Vol 11 (11) ◽ pp. 4793

Author(s): Cong Pan ◽ Minyan Lu ◽ Biao Xu

Keyword(s): Deep Learning ◽ Software Engineering ◽ Empirical Study ◽ Empirical Studies ◽ Language Model ◽ Prediction Performance ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect ◽ Cross Project

Deep learning-based software defect prediction has been popular these days. Recently, the publishing of the CodeBERT model has made it possible to perform many software engineering tasks. We propose various CodeBERT models targeting software defect prediction, including CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT. We perform empirical studies using such models in cross-version and cross-project software defect prediction to investigate if using a neural language model like CodeBERT could improve prediction performance. We also investigate the effects of different prediction patterns in software defect prediction using CodeBERT models. The empirical results are further discussed.


Optimal Machine learning Model for Software Defect Prediction

International Journal of Intelligent Systems and Applications ◽ 10.5815/ijisa.2019.02.05 ◽ 2019 ◽ Vol 11 (2) ◽ pp. 36-48

Author(s): Tripti Lamba ◽ Kavita ◽ A.K. Mishra

Keyword(s): Machine Learning ◽ Learning Model ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect ◽ Machine Learning Model ◽ Optimal Machine


Deep Learning for Software Defect Prediction in time

2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC) ◽ 10.1109/pdgc.2018.8745804 ◽ 2018

Author(s): Monika Yadav ◽ Vijendra Singh ◽ Priyanka Rastogi

Keyword(s): Deep Learning ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect


An Improved CNN Model for Within-Project Software Defect Prediction

Applied Sciences ◽ 10.3390/app9102138 ◽ 2019 ◽ Vol 9 (10) ◽ pp. 2138 ◽ Cited By ~ 4

Author(s): Cong Pan ◽ Minyan Lu ◽ Biao Xu ◽ Houleng Gao

Keyword(s): Deep Learning ◽ State Of The Art ◽ Source Code ◽ The State ◽ Defect Prediction ◽ Software Defect Prediction ◽ Learning Models ◽ Software Defect ◽ Original Dataset ◽ Holdout Validation

To improve software reliability, software defect prediction is used to find software bugs and prioritize testing efforts. Recently, some researchers introduced deep learning models, such as the deep belief network (DBN) and the state-of-the-art convolutional neural network (CNN), and used automatically generated features extracted from abstract syntax trees (ASTs) and deep learning models to improve defect prediction performance. However, the research on the CNN model failed to reveal clear conclusions due to its limited dataset size, insufficiently repeated experiments, and outdated baseline selection. To solve these problems, we built the PROMISE Source Code (PSC) dataset to enlarge the original dataset in the CNN research, which we named the Simplified PROMISE Source Code (SPSC) dataset. Then, we proposed an improved CNN model for within-project defect prediction (WPDP) and compared our results to existing CNN results and an empirical study. Our experiment was based on a 30-repetition holdout validation and a 10 * 10 cross-validation. Experimental results showed that our improved CNN model was comparable to the existing CNN model, and it outperformed the state-of-the-art machine learning models significantly for WPDP. Furthermore, we defined hyperparameter instability and examined the threat and opportunity it presents for deep learning models on defect prediction.
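The two evaluation protocols named in this abstract can be sketched as index generators. This is an illustrative sketch under stated assumptions, not the authors' experimental code: "10 * 10 cross-validation" is read here as 10 repetitions of 10-fold cross-validation, the 20% test fraction for the holdout is a hypothetical choice, and both function names are invented for the example.

```python
import random

def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repetition, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        # Reshuffle once per repetition so each repetition uses new folds.
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            # Every n_folds-th shuffled index goes to this fold's test set.
            test_idx = order[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield rep, fold, train_idx, test_idx

def repeated_holdout_indices(n_samples, test_fraction=0.2, n_repeats=30, seed=0):
    """Yield (train_idx, test_idx) for repeated random holdout splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))  # assumed split ratio
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        yield order[n_test:], order[:n_test]

# 10 repetitions x 10 folds gives 100 train/test splits; the holdout gives 30.
cv_splits = list(repeated_kfold_indices(50))
holdout_splits = list(repeated_holdout_indices(50))
```

A model would be trained and scored once per split, and the scores averaged over all splits. For imbalanced defect data, a stratified variant that preserves the defective-to-clean ratio in each fold is usually preferable to the plain shuffling used here.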
