A cross‐project defect prediction method based on multi‐adaptation and nuclear norm

IET Software ◽  
2021 ◽  
Author(s):  
Qingan Huang ◽  
Le Ma ◽  
Siyu Jiang ◽  
Guobin Wu ◽  
Hengjie Song ◽  
...  
2021 ◽  
Vol 13 (8) ◽  
pp. 216
Author(s):  
Yu Zhao ◽  
Yi Zhu ◽  
Qiao Yu ◽  
Xiaoying Chen

Traditional software defect prediction methods train the prediction model on part of a project's data and predict the defect labels of the remaining data in the same project. In practice, however, the project to be predicted is usually brand new and lacks enough labeled data to build a defect prediction model, so traditional methods no longer apply. Cross-project defect prediction instead builds the model from the labeled data of a source project of the same type as, and similar to, the target project, which addresses the shortage of labeled data in traditional methods. However, the difference in data distribution between the source project and the target project reduces prediction performance. To solve this problem, this paper proposes a cross-project defect prediction method based on manifold feature transformation. The method transforms the original feature space of the projects into a manifold space, reduces the difference in data distribution between the transformed source project and the transformed target project in that space, and finally uses the transformed source project to train a naive Bayes prediction model with better performance. A comparative experiment was carried out on the Relink and AEEEM datasets. The results show that, compared with the benchmark method and several cross-project defect prediction methods, the proposed method effectively reduces the difference in data distribution between the source and target projects and obtains a higher F1 value, an indicator commonly used to measure the performance of two-class models.
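The F1 value reported above is the harmonic mean of precision and recall for the defective class. A minimal sketch of that computation follows; the function name `f1_score`, the label encoding (1 = defective, 0 = clean), and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 value of a two-class prediction.

    `positive` marks the defective class (an assumed encoding).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for eight modules (hypothetical data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

Because defect datasets are typically imbalanced, F1 is often preferred over plain accuracy: a model that labels every module as clean scores 0 here.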


Algorithms ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 13
Author(s):  
Shengbing Ren ◽  
Wanying Zhang ◽  
Hafiz Shahbaz Munir ◽  
Lei Xia

Software defect prediction is an important means of guaranteeing software quality. Because a project often lacks sufficient historical data to train a classifier, cross-project defect prediction (CPDP) has been recognized as a fundamental approach. However, traditional defect prediction methods represent samples by their feature attributes, which cannot avoid negative transfer and may result in a poorly performing model in CPDP. This paper proposes a multi-source cross-project defect prediction method based on dissimilarity space (DM-CPDP). The method not only retains the original information but also captures each sample's relationship with other objects, so it can enhance the discriminant ability of the sample attributes with respect to the class label. The method first uses a density-based clustering method to construct the prototype set from the cluster centers of samples in the target set. Then the arc-cosine kernel is used to calculate the dissimilarities between the prototype set and the source domain or the target set, forming the dissimilarity space. In this space, the training set is obtained with the earth mover’s distance (EMD) method. The unlabeled samples converted from the target set are labeled with the k-nearest neighbor (KNN) algorithm. Finally, the model is learned from the training data with the TrAdaBoost method and used to predict new potential defects. The experimental results show that this approach performs better than other traditional CPDP methods.
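The mapping into a dissimilarity space can be sketched as follows. The abstract does not say which arc-cosine kernel order is used or how kernel values become dissimilarities, so this sketch assumes the order-0 kernel k(x, y) = 1 - θ/π (θ being the angle between x and y) and the kernel-induced distance sqrt(k(x,x) + k(p,p) - 2k(x,p)), which reduces to sqrt(2 - 2k(x,p)) because k(x,x) = 1 for any non-zero x. All function names and the toy vectors are hypothetical.

```python
import math


def arc_cosine_kernel0(x, y):
    """Order-0 arc-cosine kernel: 1 - theta/pi for the angle theta between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("the angle to a zero vector is undefined")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    theta = math.acos(max(-1.0, min(1.0, dot / norm)))
    return 1.0 - theta / math.pi


def dissimilarity(x, p):
    """Kernel-induced distance (an assumed choice, not confirmed by the abstract)."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * arc_cosine_kernel0(x, p)))


def to_dissimilarity_space(samples, prototypes):
    """Represent each sample by its dissimilarities to every prototype."""
    return [[dissimilarity(x, p) for p in prototypes] for x in samples]


# Two hypothetical prototypes (e.g. cluster centers) and two samples.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
samples = [[2.0, 0.0], [1.0, 1.0]]
embedded = to_dissimilarity_space(samples, prototypes)
```

Each sample ends up with one coordinate per prototype, so the dimensionality of the new space equals the size of the prototype set rather than the number of original metrics.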


Author(s):  
Shengbing Ren ◽  
Wanying Zhang ◽  
Hafiz Shahbaz Munir ◽  
Lei Xia

Software defect prediction is an important means of guaranteeing software quality. Because a project often lacks sufficient historical data to train a classifier, cross-project defect prediction (CPDP) has been recognized as a fundamental approach. However, traditional defect prediction methods that represent samples by their feature attributes cannot avoid negative transfer and may result in a poorly performing model in CPDP. This paper proposes a multi-source cross-project defect prediction method based on dissimilarity space (DM-CPDP). The method first uses a density-based clustering method to construct the prototype set from the cluster centers of samples in the target set. Then the arc-cosine kernel is used to form the dissimilarity space, and in this space the training set is obtained with the earth mover’s distance (EMD) method. The unlabeled samples converted from the target set are labeled with the KNN algorithm. Finally, we use the TrAdaBoost method to establish the prediction model. The experimental results show that our approach performs better than other traditional CPDP methods.
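The KNN labelling step can be illustrated with a small majority-vote sketch. The abstract gives neither the value of k nor the distance measure, so k = 3 and Euclidean distance are assumptions here, and `knn_label` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_label(query, labeled, k=3):
    """Label `query` by majority vote among its k nearest labeled neighbours.

    `labeled` is a list of (vector, label) pairs; distance is Euclidean (assumed).
    """
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled samples: 1 = defective, 0 = clean.
labeled = [
    ([0.1, 0.9], 1),
    ([0.2, 0.8], 1),
    ([0.9, 0.1], 0),
    ([0.8, 0.2], 0),
    ([0.7, 0.3], 0),
]
label_a = knn_label([0.15, 0.85], labeled)  # two of the three neighbours are defective
label_b = knn_label([0.85, 0.15], labeled)  # all three neighbours are clean
```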


2020 ◽  
Author(s):  
Sonali Srivastava ◽  
Shikha Rani ◽  
Shailly Singh ◽  
Saurabh Singh ◽  
Rohit Vashisht

2021 ◽  
Author(s):  
Bruno Sotto-Mayor ◽  
Meir Kalech
