Software Defect Prediction Using a Hybrid Model Based on Semantic Features Learned from the Source Code

Research on software defect prediction has made considerable progress in building predictors. To build more accurate predictors, a number of hand-crafted features have been proposed, such as static code features, process features, and social network features. Few models, however, consider the semantic and structural features of programs, even though the context information of source code files could explain much about the causes of software defects. In this paper, we leverage representation learning to generate semantic and structural features. Specifically, we first extract token vectors of code files based on their Abstract Syntax Trees (ASTs) and then feed the token vectors into a Convolutional Neural Network (CNN) to automatically learn semantic features. Meanwhile, we construct a complex network model based on the dependencies between code files, namely, a software network (SN), and apply a network embedding method to the resulting SN to learn the structural features. Finally, we build a novel software defect prediction model based on the learned semantic and structural features (SDP-S2S). We evaluated our method on 6 projects collected from the public PROMISE repository. The results suggest that the contribution of the structural features extracted from the software network is prominent, and that combining them with semantic features appears to improve the results further. In addition, compared with traditional hand-crafted features, the F-measure values of SDP-S2S generally increase, with a maximum growth rate of 99.5%. We also explore parameter sensitivity in the learning process of semantic and structural features and provide guidance for the optimization of predictors.
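The first step described in the abstract, turning a code file into a token vector derived from its AST, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Python's standard `ast` module stands in for a Java parser (the PROMISE projects are Java), and the selection of node types, the helper names `ast_tokens` and `encode`, and the zero-padding scheme are all hypothetical choices made for the example.

```python
import ast

def ast_tokens(source):
    """Return a token sequence from a depth-first walk of the AST.

    Assumed node selection: declarations and plain calls contribute
    their names; control-flow nodes contribute their node type.
    """
    tokens = []

    def visit(node):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens.append(node.func.id)
        elif isinstance(node, (ast.If, ast.For, ast.While, ast.Try, ast.Return)):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens

def encode(token_lists, pad_to):
    """Map tokens to integer ids (0 is reserved for padding) and
    pad or truncate every sequence to the same length."""
    vocab = {}
    vectors = []
    for tokens in token_lists:
        ids = [vocab.setdefault(t, len(vocab) + 1) for t in tokens]
        vectors.append((ids + [0] * pad_to)[:pad_to])
    return vectors, vocab

example = "def f(x):\n    if x:\n        return g(x)\n    return 0\n"
tokens = ast_tokens(example)
vectors, vocab = encode([tokens], pad_to=8)
```

Fixed-length integer vectors of this kind are the usual input to an embedding layer followed by convolutional layers; the network itself is omitted here because the abstract gives no architectural details.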

Research of Software Defect Prediction Model Based on ACO-SVM

Chinese Journal of Computers ◽ 10.3724/sp.j.1016.2011.01148 ◽ 2011 ◽ Vol 34 (6) ◽ pp. 1148-1154 ◽ Cited By ~ 13

Author(s): Hui-Yan JIANG ◽ Mao ZONG ◽ Xiang-Ying LIU

Keyword(s): Prediction Model ◽ Defect Prediction ◽ Software Defect Prediction ◽ Model Based ◽ Software Defect

Research of Software Defect Prediction Model Based on Gray Theory

2009 International Conference on Management and Service Science ◽ 10.1109/icmss.2009.5301677 ◽ 2009 ◽ Cited By ~ 1

Author(s): Zhuo-yuan Xiang ◽ Zhitao Tang

Keyword(s): Prediction Model ◽ Defect Prediction ◽ Software Defect Prediction ◽ Model Based ◽ Software Defect ◽ Gray Theory

Learning Semantic Features for Software Defect Prediction by Code Comments Embedding

2018 IEEE International Conference on Data Mining (ICDM) ◽ 10.1109/icdm.2018.00133 ◽ 2018 ◽ Cited By ~ 1

Author(s): Xuan Huo ◽ Yang Yang ◽ Ming Li ◽ De-Chuan Zhan

Keyword(s): Defect Prediction ◽ Semantic Features ◽ Software Defect Prediction ◽ Software Defect

Software Defect Prediction Model Based on Stacked Denoising Auto-Encoder

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Artificial Intelligence for Communications and Networks ◽ 10.1007/978-3-030-22971-9_2 ◽ 2019 ◽ pp. 18-27

Author(s): Yu Zhu ◽ Dongjin Yin ◽ Yingtao Gan ◽ Lanlan Rui ◽ Guoxin Xia

Keyword(s): Prediction Model ◽ Defect Prediction ◽ Software Defect Prediction ◽ Model Based ◽ Software Defect

A Novel Model Based on Nonlinear Manifold Detection for Software Defect Prediction

2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS) ◽ 10.1109/iccons.2018.8663026 ◽ 2018 ◽ Cited By ~ 2

Author(s): Soumi Ghosh ◽ Ajay Rana ◽ Vineet Kansal

Keyword(s): Defect Prediction ◽ Software Defect Prediction ◽ Model Based ◽ Software Defect ◽ Novel Model

Defects in The Next Release; Software Defect Prediction Based on Source Code Versions

Electrical Engineering (ICEE), Iranian Conference on ◽ 10.1109/icee.2018.8472535 ◽ 2018

Author(s): Molouk Mishmast Nehi ◽ Zahra Fakhrpoor ◽ Mohammad R. Moosavi

Keyword(s): Source Code ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect

Research on an Educational Software Defect Prediction Model Based on SVM

Entertainment for Education. Digital Techniques and Systems - Lecture Notes in Computer Science ◽ 10.1007/978-3-642-14533-9_22 ◽ 2010 ◽ pp. 215-222 ◽ Cited By ~ 1

Author(s): Guang-jie Liu ◽ Wen-yong Wang

Keyword(s): Prediction Model ◽ Educational Software ◽ Defect Prediction ◽ Software Defect Prediction ◽ Model Based ◽ Software Defect

An Improved CNN Model for Within-Project Software Defect Prediction

Applied Sciences ◽ 10.3390/app9102138 ◽ 2019 ◽ Vol 9 (10) ◽ pp. 2138 ◽ Cited By ~ 4

Author(s): Cong Pan ◽ Minyan Lu ◽ Biao Xu ◽ Houleng Gao

Keyword(s): Deep Learning ◽ State Of The Art ◽ Source Code ◽ The State ◽ Defect Prediction ◽ Software Defect Prediction ◽ Learning Models ◽ Software Defect ◽ Original Dataset ◽ Holdout Validation

To improve software reliability, software defect prediction is used to find software bugs and prioritize testing efforts. Recently, some researchers have introduced deep learning models, such as the deep belief network (DBN) and the state-of-the-art convolutional neural network (CNN), and used features generated automatically from abstract syntax trees (ASTs) to improve defect prediction performance. However, the research on the CNN model failed to reach clear conclusions because of its limited dataset size, insufficiently repeated experiments, and outdated baseline selection. To solve these problems, we built the PROMISE Source Code (PSC) dataset to enlarge the original dataset used in the CNN research, which we refer to as the Simplified PROMISE Source Code (SPSC) dataset. Then, we proposed an improved CNN model for within-project defect prediction (WPDP) and compared our results to existing CNN results and an empirical study. Our experiment was based on a 30-repetition holdout validation and a 10 × 10 cross-validation. Experimental results showed that our improved CNN model was comparable to the existing CNN model, and that it significantly outperformed the state-of-the-art machine learning models for WPDP. Furthermore, we defined hyperparameter instability and examined the threat and opportunity it presents for deep learning models in defect prediction.
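The two evaluation protocols named in the abstract can be sketched with the standard library alone. This is an illustrative sketch rather than the authors' experimental setup: it assumes that "10 × 10 cross-validation" means 10 repetitions of 10-fold cross-validation, and the test fraction of 0.3, the seeds, the absence of stratification, and the function names `repeated_holdout` and `repeated_kfold` are all assumptions introduced for the example.

```python
import random

def repeated_holdout(n_samples, repetitions=30, test_fraction=0.3, seed=0):
    """Yield (train, test) index lists from independent random splits.

    test_fraction is an assumed value; the abstract does not state one.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]

def repeated_kfold(n_samples, folds=10, repetitions=10, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation.

    Each repetition reshuffles the data and partitions it into `folds`
    disjoint test sets, so every sample is tested once per repetition.
    """
    rng = random.Random(seed)
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for k in range(folds):
            test = indices[k::folds]          # every folds-th shuffled index
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test

holdout_splits = list(repeated_holdout(50))
kfold_splits = list(repeated_kfold(50))
```

With the defaults this produces 30 holdout splits and 100 cross-validation splits; a model would be trained and scored once per split and the scores averaged. Defect datasets are typically imbalanced, so a stratified variant may be preferable in practice.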

Software Defect Prediction Model Based On KPCA-SVM

2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) ◽ 10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00244 ◽ 2019

Author(s): Yan Zhou ◽ Chun Shan ◽ Shiyou Sun ◽ Shengjun Wei ◽ Sicong Zhang

Keyword(s): Prediction Model ◽ Defect Prediction ◽ Software Defect Prediction ◽ Model Based ◽ Software Defect
