Revisiting heterogeneous defect prediction methods: How far are we?

2021 ◽  
Vol 130 ◽  
pp. 106441
Author(s):  
Xiang Chen ◽  
Yanzhou Mu ◽  
Ke Liu ◽  
Zhanqi Cui ◽  
Chao Ni

2019 ◽  
Vol E102.D (3) ◽  
pp. 537-549 ◽  
Author(s):  
Lina GONG ◽  
Shujuan JIANG ◽  
Qiao YU ◽  
Li JIANG


Author(s):  
Ying Sun ◽  
Xiao-Yuan Jing ◽  
Fei Wu ◽  
Xiwei Dong ◽  
Yanfei Sun ◽  
...  

The heterogeneous defect prediction (HDP) technique, which predicts defects in a target company using heterogeneous metric data from an external company, has received substantial research attention. However, existing HDP methods assume that the source data is labeled, and labeling data is expensive. Semi-supervised defect prediction techniques can perform defect prediction with only a few labeled data. In this paper, we investigate a new problem: semi-supervised HDP (SHDP). To solve it, we propose a new approach named cost-sensitive kernel semi-supervised correlation analysis (CKSCA). It introduces a unified metric representation and canonical correlation analysis to make the data distributions of different company projects more similar. CKSCA also designs a cost-sensitive kernel semi-supervised discriminant analysis mechanism to utilize the limited labeled data and the sufficient real-life unlabeled data from different companies. In addition, we collect many open-source projects from GitHub to construct a new large-scale unlabeled dataset called the GITHUB dataset. It contains 26,407 modules and is larger than each public project dataset. It is publicly available online and can be extended continuously. Experiments on the GITHUB dataset and other public datasets indicate that unlabeled GITHUB data can help a prediction model improve its prediction performance, and that CKSCA is effective and efficient for solving the SHDP problem.
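The abstract above does not say how the unified metric representation is built. One plausible reading, assumed here purely for illustration, is that both datasets are zero-padded into a single shared column layout so that a method such as canonical correlation analysis can operate on vectors of equal length. The function name `unify_metrics`, the `n_common` parameter, and the column ordering are hypothetical and not taken from the paper.

```python
def unify_metrics(source_rows, target_rows, n_common):
    """Pad two metric tables to one layout: [common | source-only | target-only].

    Assumption (not from the paper): the first `n_common` columns of each
    table are the metrics the two companies share, in the same order.
    """
    n_src_only = len(source_rows[0]) - n_common
    n_tgt_only = len(target_rows[0]) - n_common

    # Source modules have no values for target-only metrics: append zeros.
    src = [list(row) + [0.0] * n_tgt_only for row in source_rows]

    # Target modules have no values for source-only metrics: insert zeros
    # between the common block and the target-only block.
    tgt = [
        list(row[:n_common]) + [0.0] * n_src_only + list(row[n_common:])
        for row in target_rows
    ]
    return src, tgt


# Toy example: one shared metric, two source-only metrics, one target-only metric.
src, tgt = unify_metrics([[1.0, 2.0, 3.0]], [[4.0, 5.0]], n_common=1)
```

After padding, every module from either company is described by a vector of the same length, which is the precondition for learning a joint projection.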



2019 ◽  
Vol 32 (5) ◽  
Author(s):  
Xiang Chen ◽  
Yanzhou Mu ◽  
Yubin Qu ◽  
Chao Ni ◽  
Meng Liu ◽  
...  




2018 ◽  
Vol 44 (9) ◽  
pp. 874-896 ◽  
Author(s):  
Jaechang Nam ◽  
Wei Fu ◽  
Sunghun Kim ◽  
Tim Menzies ◽  
Lin Tan


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Wenjian Liu ◽  
Baoping Wang ◽  
Wennan Wang

This paper provides an in-depth study and analysis of software defect prediction methods in a cloud environment and uses a deep learning approach to justify software prediction. A cost penalty term is added to the supervised part of the deep ladder network; that is, the misclassification cost of each class is added to the model. The resulting cost-sensitive deep ladder network-based software defect prediction model effectively mitigates the negative impact of the class imbalance problem on defect prediction. To address the lack or insufficiency of historical data from the same project, a manifold learning-based geodesic cross-project software defect prediction method is proposed. Drawing on data from other projects, a transfer learning approach is used to embed the source and target datasets into a Gaussian manifold. The kernel encapsulates the incremental changes between the two domains, capturing both their differences and their commonalities. The resulting subspace is the space of two distributional approximations formed by the source and target data transformations, and traditional in-project software defect classifiers are used to predict labels. Real-time defect prediction is found to be more practical because it has a smaller amount of code to review: only individual changes need to be reviewed rather than entire files or packages, which also makes it easier for developers to assign fixes to defects.
More importantly, this paper combines deep belief network (DBN) techniques with fine-grained real-time defect prediction and TCA techniques to deal with data imbalance, and proposes an improved deep belief network approach for real-time defect prediction. The machine learning classifier underlying the DBN is varied across experimental studies. The results not only validate the effectiveness of using TCA techniques to solve the data imbalance problem but also show that the defect prediction model learned by the improved method has better prediction performance.
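The cost penalty idea in the abstract above, adding the misclassification cost of each class to the supervised loss, can be illustrated with a minimal sketch. This is a generic cost-weighted binary cross-entropy, not the authors' ladder-network loss. The function name and the cost values `cost_fn` and `cost_fp` are hypothetical choices for illustration.

```python
import math


def cost_sensitive_cross_entropy(probs, labels, cost_fn=5.0, cost_fp=1.0):
    """Mean binary cross-entropy with a separate cost per class.

    probs   : predicted probability that each module is defective
    labels  : 1 for defective, 0 for clean
    cost_fn : assumed penalty weight for missing a defective module
    cost_fp : assumed penalty weight for flagging a clean module
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -cost_fn * math.log(p)
        else:
            total += -cost_fp * math.log(1.0 - p)
    return total / len(labels)


# An equally confident mistake costs more on the rare (defective) class.
missed_defect = cost_sensitive_cross_entropy([0.1], [1])
false_alarm = cost_sensitive_cross_entropy([0.9], [0])
```

Weighting the minority class more heavily pushes the optimizer to reduce missed defects, which is one common way to counter class imbalance.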




