A Comparison of Semi-Supervised Classification Approaches for Software Defect Prediction

2014 ◽  
Vol 23 (1) ◽  
pp. 75-82 ◽  
Author(s):  
Cagatay Catal

Abstract: Predicting defect-prone modules when few previous defect labels are available is a challenging problem in the software industry. Supervised classification approaches cannot build high-performance prediction models from little defect data, which creates a need for new methods, techniques, and tools. One solution is to combine labeled data points with unlabeled data points during the learning phase. Semi-supervised classification methods use not only labeled data points but also unlabeled ones to improve generalization capability. In this study, we evaluated four semi-supervised classification methods for defect prediction. The low-density separation (LDS), support vector machine (SVM), expectation-maximization (EM-SEMI), and class mass normalization (CMN) methods were investigated on four NASA data sets: CM1, KC1, KC2, and PC1. Experimental results showed that the SVM and LDS algorithms outperform the CMN and EM-SEMI algorithms. In addition, the LDS algorithm performs much better than SVM when the data set is large. The LDS-based prediction approach is therefore suggested for software defect prediction when fault data are limited.

Author(s):  
Liqiong Chen ◽  
Shilong Song ◽  
Can Wang

Just-in-time software defect prediction (JIT-SDP) is a fine-grained software defect prediction technology that aims to identify defective code changes in software systems. Effort-aware software defect prediction takes the cost of code inspection into consideration, so that more defective code changes can be found with limited test resources. Traditional effort-aware defect prediction models mainly measure effort by the number of lines of code (LOC) and rarely consider additional factors. This paper proposes a novel effort measurement method called Multi-Metric Joint Calculation (MMJC). When measuring effort, MMJC takes into account not only LOC but also the distribution of modified code across different files (Entropy), the number of developers that changed the files (NDEV), and developer experience (EXP). In the simulation experiment, MMJC is combined with Linear Regression, Decision Tree, Random Forest, LightGBM, Support Vector Machine, and Neural Network, respectively, to build software defect prediction models. Several comparative experiments are conducted between the MMJC-based models and baseline models. The results show that the indicators ACC and [Formula: see text] of the MMJC-based models improve by 35.3% and 15.9% on average, respectively, across the three verification scenarios compared with the baseline models.
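The LOC-based effort measure that the abstract calls traditional can be illustrated with a minimal sketch. It is not the MMJC method, whose formula the abstract does not give. It assumes that each change has a predicted defect probability (`risk`) and a size (`loc`), that changes are ranked by risk per line, and that inspection stops at a budget expressed as a fraction of total LOC. The function names, the dictionary keys, and the 20% default budget are all hypothetical.

```python
def rank_by_risk_density(changes):
    """Order changes by predicted risk per line of code, highest first.

    Each change is a dict with hypothetical keys:
    'id', 'risk' (predicted defect probability) and 'loc' (lines changed).
    """
    return sorted(changes, key=lambda c: c["risk"] / max(c["loc"], 1), reverse=True)


def inspect_within_budget(changes, budget_fraction=0.2):
    """Pick changes to inspect without exceeding a LOC budget.

    The budget is a fraction of the total LOC over all changes
    (the 20% default is an assumption, not taken from the abstract).
    """
    budget = budget_fraction * sum(c["loc"] for c in changes)
    spent, picked = 0, []
    for change in rank_by_risk_density(changes):
        if spent + change["loc"] > budget:
            continue  # too large for the remaining budget; a smaller one may still fit
        spent += change["loc"]
        picked.append(change["id"])
    return picked
```

Replacing `loc` in the denominator with a richer effort value is where a measure such as MMJC would plug in.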


2020 ◽  
pp. 1577-1597
Author(s):  
Mohammed Akour ◽  
Wasen Yahya Melhem

This article describes how classification methods for software defect prediction are widely researched because of the need to increase software quality and decrease testing effort. However, past research on this issue has not shown any classifier to be superior to the others. Additionally, there is a lack of research on the effects and accuracy of genetic programming in software defect prediction. To address this problem, a comparative software defect prediction experiment between genetic programming and neural networks is performed on four datasets from the NASA Metrics Data repository. In general, an interesting degree of accuracy is observed, which shows that metric-based classification is useful. Nevertheless, this article recommends genetic programming because of the detailed analysis it provides and because this classification method makes it possible to view each attribute's impact in the dataset.


2021 ◽  
Author(s):  
Anjali Bansal

A great deal of research has been done in the field of software defect prediction, but most of it uses static code metrics as independent variables. The main objective of this paper is to analyze the effect of process metrics on prediction performance using various classification and ensemble techniques. Both the AUC and MCC measures are used to analyze the results. We conclude that process metrics are as effective as static code metrics.
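The two measures named above can be sketched briefly. This is a minimal illustration using the standard definitions of the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC) for binary labels (1 = defective, 0 = clean). It is not the paper's evaluation code, and the function names are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """AUC as the probability that a random defective module is scored
    higher than a random clean one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

MCC is computed from hard predictions and uses all four confusion-matrix cells, while AUC is computed from scores and is independent of any threshold, which is one reason to report both on imbalanced defect data.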


Author(s):  
Yan Wang

To address the low efficiency of software operation, defect prediction for monitoring configuration software needs to be studied. Current methods predict software defects with low efficiency. This paper therefore proposes a software defect prediction method based on a genetically optimized support vector machine. The method performs feature selection on software complexity measures and builds a defect prediction model with the genetically optimized support vector machine, yielding an efficient method for predicting software defects. Experimental results show that the proposed method effectively improves software quality.


2017 ◽  
Vol 26 (02) ◽  
pp. 1750001 ◽  
Author(s):  
Stamatis Karlos ◽  
Nikos Fazakis ◽  
Sotiris Kotsiantis ◽  
Kyriakos Sgarbas

The most important characteristic of semi-supervised learning methods is that they combine the available unlabeled data with a much smaller set of labeled examples in order to increase learning accuracy over standard supervised methods, which use only the labeled data during the training phase. In this work, we have implemented a hybrid self-trained system that combines a Support Vector Machine, a Decision Tree, a Lazy Learner, and a Bayesian algorithm using a Stacking variant methodology. We performed an in-depth comparison with other well-known semi-supervised classification methods on standard benchmark datasets and concluded that the presented technique had better accuracy in most cases.
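The basic self-training loop behind such a system can be sketched as follows. This is a simplified illustration, not the authors' hybrid stacking method: it swaps in a single nearest-centroid base learner and uses the distance gap between the two closest centroids as a confidence score. The function names and the `min_margin` and `max_rounds` parameters are hypothetical.

```python
from math import dist


def centroids(X, y):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict_with_margin(cents, x):
    """Return (label, margin): the nearest centroid's label and the
    distance gap to the runner-up, used here as a confidence proxy."""
    ranked = sorted((dist(x, c), label) for label, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unl, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and retrain.

    Returns the final centroids and the points left unlabeled.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing confident enough to add; stop early
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    return centroids(X_lab, y_lab), pool
```

Only the labeled set grows between rounds. Points that never reach the confidence threshold stay in the pool, which limits the spread of wrong pseudo-labels.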

