An Empirical Study of Heterogeneous Cross-Project Defect Prediction Using Various Statistical Techniques
Cross-project defect prediction (CPDP) forecasts flaws in a target project through defect prediction models (DPM) trained by defect data of another project. However, CPDP has a prevalent problem (i.e., distinct projects must have identical features to describe themselves). This article emphasizes on heterogeneous CPDP (HCPDP) modeling that does not require same metric set between two applications and builds DPM based on metrics showing comparable distribution in their values for a given pair of datasets. This paper evaluates empirically and theoretically HCPDP modeling, which comprises of three main phases: feature ranking and feature selection, metric matching, and finally, predicting defects in the target application. The research work has been experimented on 13 benchmarked datasets of three open source projects. Results show that performance of HCPDP is very much comparable to baseline within project defect prediction (WPDP) and XG boosting classification model gives best results when used in conjunction with Kendall's method of correlation as compared to other set of classifiers.