Revisiting heterogeneous defect prediction methods: How far are we?

A Novel Feature Selection Approach based on Binary Particle Swarm Optimization and Ensemble Learning for Heterogeneous Defect Prediction

2021 3rd Asia Pacific Information Technology Conference ◽

10.1145/3449365.3449384 ◽

2021 ◽

Author(s):

Ruchika Malhotra ◽

Anmol Budhiraja ◽

Abhinav Kumar Singh ◽

Ishani Ghoshal

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Ensemble Learning ◽

Particle Swarm ◽

Defect Prediction ◽

Binary Particle Swarm Optimization ◽

Swarm Optimization ◽

Selection Approach ◽

Feature Selection Approach ◽

Heterogeneous Defect Prediction

Unsupervised Deep Domain Adaptation for Heterogeneous Defect Prediction

IEICE Transactions on Information and Systems ◽

10.1587/transinf.2018edp7289 ◽

2019 ◽

Vol E102.D (3) ◽

pp. 537-549 ◽

Cited By ~ 2

Author(s):

Lina GONG ◽

Shujuan JIANG ◽

Qiao YU ◽

Li JIANG

Keyword(s):

Domain Adaptation ◽

Defect Prediction ◽

Heterogeneous Defect Prediction

Semi-supervised Heterogeneous Defect Prediction with Open-source Projects on GitHub

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500273 ◽

2021 ◽

Vol 31 (06) ◽

pp. 889-916

Author(s):

Ying Sun ◽

Xiao-Yuan Jing ◽

Fei Wu ◽

Xiwei Dong ◽

Yanfei Sun ◽

...

Keyword(s):

Correlation Analysis ◽

Open Source ◽

Large Scale ◽

Real Life ◽

Defect Prediction ◽

Public Project ◽

Research Attention ◽

Target Company ◽

Public Datasets ◽

Heterogeneous Defect Prediction

The heterogeneous defect prediction (HDP) technique, which predicts defects in a target company using heterogeneous metric data from external companies, has received substantial research attention. However, existing HDP methods assume that the source data is labeled, and labeling data is expensive. Semi-supervised defect prediction techniques can perform defect prediction with only a small amount of labeled data. In this paper, we investigate a new problem: semi-supervised HDP (SHDP). To solve it, we propose a new approach named cost-sensitive kernel semi-supervised correlation analysis (CKSCA). It introduces a unified metric representation and canonical correlation analysis to make the data distributions of different company projects more similar. CKSCA also designs a cost-sensitive kernel semi-supervised discriminant analysis mechanism to utilize the limited labeled data and the abundant real-life unlabeled data from different companies. In addition, we collect a large number of open-source projects from GitHub to construct a new large-scale unlabeled dataset called the GITHUB dataset. It contains 26,407 modules and is larger than any of the public project datasets. It has been made publicly available online and can be extended continuously. Experiments on the GITHUB dataset and other public datasets indicate that unlabeled GITHUB data can help the prediction model improve its performance, and that CKSCA is effective and efficient for solving the SHDP problem.
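
The abstract above does not spell out how the unified metric representation is built. The following is a minimal sketch, assuming a zero-padding scheme in which every project is mapped onto the union of all metric names and missing metrics are filled with zeros, so that source and target vectors share one dimensionality. The function `unify_metrics` and the metric names are hypothetical and are not taken from the paper.

```python
def unify_metrics(source_rows, target_rows):
    """Map two projects with different metric sets onto one shared layout.

    Assumption: a metric that a project does not record is padded with 0.0.
    """
    # The shared layout is the sorted union of every metric name seen.
    names = sorted({key for row in source_rows + target_rows for key in row})

    def to_vector(row):
        return [float(row.get(name, 0.0)) for name in names]

    source_vectors = [to_vector(row) for row in source_rows]
    target_vectors = [to_vector(row) for row in target_rows]
    return names, source_vectors, target_vectors


# Illustrative modules: the two projects share only the "loc" metric.
source = [{"loc": 120, "cyclomatic": 7}]
target = [{"loc": 80, "num_commits": 3}]

names, xs, xt = unify_metrics(source, target)
print(names)
print(xs)
print(xt)
```

Once both projects share one layout, a correlation-analysis step such as the one the abstract mentions could operate on vectors of equal length.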

Heterogeneous Defect Prediction Using Ensemble Learning Technique

Advances in Intelligent Systems and Computing - Artificial Intelligence and Evolutionary Computations in Engineering Systems ◽

10.1007/978-981-15-0199-9_25 ◽

2020 ◽

pp. 283-293

Author(s):

Arsalan Ahmed Ansari ◽

Amaan Iqbal ◽

Bibhudatta Sahoo

Keyword(s):

Ensemble Learning ◽

Defect Prediction ◽

Learning Technique ◽

Heterogeneous Defect Prediction

Do different cross‐project defect prediction methods identify the same defective modules?

Journal of Software Evolution and Process ◽

10.1002/smr.2234 ◽

2019 ◽

Vol 32 (5) ◽

Cited By ~ 1

Author(s):

Xiang Chen ◽

Yanzhou Mu ◽

Yubin Qu ◽

Chao Ni ◽

Meng Liu ◽

...

Keyword(s):

Defect Prediction ◽

Prediction Methods ◽

Cross Project

Kernel Spectral Embedding Transfer Ensemble for Heterogeneous Defect Prediction

IEEE Transactions on Software Engineering ◽

10.1109/tse.2019.2939303 ◽

2019 ◽

pp. 1-1 ◽

Cited By ~ 2

Author(s):

Haonan Tong ◽

Bin Liu ◽

Shihai Wang

Keyword(s):

Defect Prediction ◽

Spectral Embedding ◽

Heterogeneous Defect Prediction

Heterogeneous Defect Prediction

IEEE Transactions on Software Engineering ◽

10.1109/tse.2017.2720603 ◽

2018 ◽

Vol 44 (9) ◽

pp. 874-896 ◽

Cited By ~ 48

Author(s):

Jaechang Nam ◽

Wei Fu ◽

Sunghun Kim ◽

Tim Menzies ◽

Lin Tan

Keyword(s):

Defect Prediction ◽

Heterogeneous Defect Prediction

Deep Learning Software Defect Prediction Methods for Cloud Environments Research

Scientific Programming ◽

10.1155/2021/2323100 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Wenjian Liu ◽

Baoping Wang ◽

Wennan Wang

Keyword(s):

Deep Learning ◽

Prediction Model ◽

Real Time ◽

Defect Prediction ◽

Prediction Methods ◽

Learning Approach ◽

Software Defect Prediction ◽

Imbalance Problem ◽

Software Defect ◽

Ladder Network

This paper presents an in-depth study and analysis of software defect prediction methods in a cloud environment, using a deep learning approach. A cost penalty term is added to the supervised part of the deep ladder network; that is, the misclassification cost of different classes is added to the model. A cost-sensitive deep ladder network-based software defect prediction model is proposed, which effectively mitigates the negative impact of the class imbalance problem on defect prediction. To address the lack or insufficiency of historical data from the same project, a flow learning-based geodesic cross-project software defect prediction method is proposed. Drawing on data from other projects, a migration learning approach is used to embed the source and target datasets into a Gaussian manifold. The kernel encapsulates the incremental changes in the differences and commonalities between the two domains. At this point, the subspace is the space in which the distributions of the transformed source and target data approximate each other, and traditional in-project software defect classifiers are used to predict labels. Real-time defect prediction is found to be more practical because it leaves less code to review: only individual changes need to be reviewed rather than entire files or packages, which also makes it easier for developers to assign fixes to defects.
More importantly, this paper combines deep belief network (DBN) techniques with fine-grained real-time defect prediction and TCA techniques to deal with data imbalance, and proposes an improved deep belief network approach for real-time defect prediction. The machine learning classifier underlying the DBN is also varied across the experimental studies. The results not only validate the effectiveness of using TCA techniques to solve the data imbalance problem but also show that the defect prediction model learned by the improved method has better prediction performance.
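
The cost penalty described in the abstract above can be illustrated with a class-weighted cross-entropy loss. This is a minimal sketch under assumptions, not the paper's model: the function name `cost_sensitive_bce` and the cost values are hypothetical, and the ladder network itself is omitted.

```python
import math


def cost_sensitive_bce(y_true, p_pred, cost_fn=5.0, cost_fp=1.0, eps=1e-12):
    """Mean binary cross-entropy with per-class misclassification costs.

    cost_fn weights errors on defective modules (label 1) and cost_fp
    weights errors on clean modules (label 0). Both values are illustrative.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("labels and predictions must have the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -cost_fn * math.log(p)
        else:
            total += -cost_fp * math.log(1.0 - p)
    return total / len(y_true)


# One defective module among four, mimicking class imbalance.
labels = [1, 0, 0, 0]
missed_defect = [0.1, 0.1, 0.1, 0.1]  # the defective module is scored as clean
false_alarm = [0.9, 0.9, 0.1, 0.1]    # one clean module is scored as defective

print(cost_sensitive_bce(labels, missed_defect))
print(cost_sensitive_bce(labels, false_alarm))
```

With equal costs the two prediction sets receive the same loss, because each contains one confident mistake. Raising `cost_fn` makes the missed defect the more expensive error, which is the effect a cost-sensitive model relies on to counter class imbalance.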

MVSE: Effort-Aware Heterogeneous Defect Prediction via Multiple-View Spectral Embedding

2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS) ◽

10.1109/qrs.2019.00015 ◽

2019 ◽

Author(s):

Zhou Xu ◽

Sizhe Ye ◽

Tao Zhang ◽

Zhen Xia ◽

Shuai Pang ◽

...

Keyword(s):

Defect Prediction ◽

Multiple View ◽

Spectral Embedding ◽

Heterogeneous Defect Prediction

Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction

Automated Software Engineering ◽

10.1007/s10515-017-0220-7 ◽

2017 ◽

Vol 25 (2) ◽

pp. 201-245 ◽

Cited By ~ 23

Author(s):

Zhiqiang Li ◽

Xiao-Yuan Jing ◽

Fei Wu ◽

Xiaoke Zhu ◽

Baowen Xu ◽

...

Keyword(s):

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Defect Prediction ◽

Kernel Canonical Correlation Analysis ◽

Heterogeneous Defect Prediction