A Comparison of Semi-Supervised Classification Approaches for Software Defect Prediction

Abstract: Predicting defect-prone modules when few defect labels are available for previous modules is a challenging problem in the software industry. Supervised classification approaches cannot build high-performance prediction models from so little defect data, which creates a need for new methods, techniques, and tools. One solution is to combine labeled data points with unlabeled data points during the learning phase. Semi-supervised classification methods use not only labeled data points but also unlabeled ones to improve generalization capability. In this study, we evaluated four semi-supervised classification methods for defect prediction. Low-density separation (LDS), support vector machine (SVM), expectation-maximization (EM-SEMI), and class mass normalization (CMN) methods were investigated on the NASA data sets CM1, KC1, KC2, and PC1. Experimental results showed that the SVM and LDS algorithms outperform the CMN and EM-SEMI algorithms. In addition, the LDS algorithm performs much better than SVM when the data set is large. We therefore suggest the LDS-based prediction approach for software defect prediction when fault data are limited.


Comparison of Tree Based Supervised Classification Methods with Mammogram Data Set

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i4.504506 ◽

2019 ◽

Vol 7 (4) ◽

pp. 504-506

Author(s):

M. Vasantha

Keyword(s):

Supervised Classification ◽

Classification Methods ◽

Data Set ◽

Supervised Classification Methods


Support Vector based Oversampling Technique for Handling Class Imbalance in Software Defect Prediction

2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence) ◽

10.1109/confluence51648.2021.9377068 ◽

2021 ◽

Author(s):

Ruchika Malhotra ◽

Vaibhav Agrawal ◽

Vedansh Pal ◽

Tushar Agarwal

Keyword(s):

Class Imbalance ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Software Defect


A Novel Effort Measure Method for Effort-Aware Just-in-Time Software Defect Prediction

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500364 ◽

2021 ◽

Vol 31 (08) ◽

pp. 1145-1169

Author(s):

Liqiong Chen ◽

Shilong Song ◽

Can Wang

Keyword(s):

Prediction Model ◽

Defect Prediction ◽

Software Systems ◽

Support Vector ◽

Just In Time ◽

Software Defect Prediction ◽

Fine Grained ◽

Software Defect ◽

Code Changes ◽

The Cost

Just-in-time software defect prediction (JIT-SDP) is a fine-grained software defect prediction technology that aims to identify defective code changes in software systems. Effort-aware software defect prediction additionally takes the cost of code inspection into consideration, so that more defective code changes can be found with limited test resources. The traditional effort-aware defect prediction model mainly measures effort by the number of lines of code (LOC) and rarely considers additional factors. This paper proposes a novel effort measure method called Multi-Metric Joint Calculation (MMJC). When measuring effort, MMJC takes into account not only LOC but also the distribution of modified code across different files (Entropy), the number of developers that changed the files (NDEV), and the developer experience (EXP). In the simulation experiment, MMJC is combined with Linear Regression, Decision Tree, Random Forest, LightGBM, Support Vector Machine, and Neural Network, respectively, to build the software defect prediction model. Several comparative experiments are conducted between the models based on MMJC and baseline models. The results show that the indicators ACC and [Formula: see text] of the models based on MMJC are improved by 35.3% and 15.9% on average, respectively, in the three verification scenarios compared with the baseline models.
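The Entropy input and the idea of effort-aware ranking can be illustrated with a minimal Python sketch. It is not the MMJC formula, which the abstract does not give. It assumes Entropy is the Shannon entropy of modified lines across files, and it ranks changes by predicted defect probability per unit of effort, a common effort-aware convention. The function names `change_entropy` and `rank_by_density`, the sample values, and the use of LOC as the effort figure are all assumptions made for illustration.

```python
import math

def change_entropy(lines_per_file):
    """Shannon entropy (base 2) of how the modified lines spread across files.

    Assumed definition: p_k is the share of modified lines that fall in file k.
    """
    total = sum(lines_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in lines_per_file:
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy

def rank_by_density(changes):
    """Order changes by predicted defect probability per unit of effort.

    Each change is a tuple (change_id, defect_probability, effort).
    The effort value is supplied by the caller; LOC is used in the example
    below only because it is the traditional measure, not the MMJC measure.
    """
    return sorted(changes, key=lambda c: c[1] / max(c[2], 1), reverse=True)

# Hypothetical changes: (id, predicted defect probability, LOC as effort).
changes = [("a", 0.9, 300), ("b", 0.6, 20), ("c", 0.3, 50)]
ranking = [change_id for change_id, _, _ in rank_by_density(changes)]
```

Under this sketch, a change spread evenly over more files has higher entropy, and a small change with a moderate defect probability can outrank a large change with a high one. Replacing the effort value with a richer measure changes the ranking without changing the ranking code.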


Software Defect Prediction Using Genetic Programming and Neural Networks

Deep Learning and Neural Networks ◽

10.4018/978-1-7998-0414-7.ch088 ◽

2020 ◽

pp. 1577-1597

Author(s):

Mohammed Akour ◽

Wasen Yahya Melhem

Keyword(s):

Neural Networks ◽

Genetic Programming ◽

Detailed Analysis ◽

Software Quality ◽

The Other ◽

Defect Prediction ◽

Data Repository ◽

Software Defect Prediction ◽

Classification Methods ◽

Software Defect

This article describes how classification methods for software defect prediction are widely researched because of the need to increase software quality and decrease testing effort. However, past research on this issue has not shown any classifier to be superior to the others. Additionally, there is a lack of research on the effects and accuracy of genetic programming in software defect prediction. To address this gap, a comparative software defect prediction experiment between genetic programming and neural networks is performed on four datasets from the NASA Metrics Data repository. Generally, an interesting degree of accuracy is detected, which shows that metric-based classification is useful. Nevertheless, this article recommends genetic programming because of the detailed analysis it provides and because it allows the impact of each attribute in the dataset to be viewed.


Using the Support Vector Machine as a Classification Method for Software Defect Prediction with Static Code Metrics

Engineering Applications of Neural Networks - Communications in Computer and Information Science ◽

10.1007/978-3-642-03969-0_21 ◽

2009 ◽

pp. 223-234 ◽

Cited By ~ 33

Author(s):

David Gray ◽

David Bowes ◽

Neil Davey ◽

Yi Sun ◽

Bruce Christianson

Keyword(s):

Support Vector Machine ◽

Classification Method ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Software Defect ◽

Code Metrics


Software Defect Prediction Using Dynamic Support Vector Machine

2013 Ninth International Conference on Computational Intelligence and Security ◽

10.1109/cis.2013.61 ◽

2013 ◽

Cited By ~ 6

Author(s):

Bo Shuai ◽

Haifeng Li ◽

Mengjun Li ◽

Quan Zhang ◽

Chaojing Tang

Keyword(s):

Support Vector Machine ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Software Defect ◽

Dynamic Support


COMPARATIVE ANALYSIS OF CLASSIFICATION METHODS FOR PREDICTION SOFTWARE FAULT PRONENESS USING PROCESS METRICS

10.36227/techrxiv.16586354.v1 ◽

2021 ◽

Author(s):

Anjali Bansal

Keyword(s):

Prediction Performance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Classification Methods ◽

Software Defect ◽

Ensemble Techniques ◽

Code Metrics ◽

Independent Variable ◽

Software Fault ◽

Process Metrics

Much research has been done in the field of software defect prediction, but most of it uses static code metrics as the independent variables. The main objective of this paper is to analyze the effect of process metrics on prediction performance using various classification and ensemble techniques. Both the AUC and MCC measures are used to analyze the results. We conclude that process metrics are as effective as static code metrics.
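For readers unfamiliar with the two measures named above, the following Python sketch shows one standard way to compute them. It assumes binary labels (1 = defective, 0 = clean) and uses the pairwise-ranking formulation of AUC. The function names `mcc` and `auc` and the sample data are illustrative and are not taken from the paper.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention used here: return 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom

def auc(labels, scores):
    """Area under the ROC curve as the fraction of (defective, clean) pairs
    in which the defective module receives the higher score; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for four modules.
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

MCC uses all four cells of the confusion matrix, which makes it informative on imbalanced defect data. AUC is threshold-free and depends only on how the scores order the modules.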


Effective software defect prediction using support vector machines (SVMs)

International Journal of Systems Assurance Engineering and Management ◽

10.1007/s13198-021-01326-1 ◽

2021 ◽

Author(s):

Somya Goyal

Keyword(s):

Support Vector Machines ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Software Defect ◽

Vector Machines


Efficient Prediction Method of Defect of Monitor Configuration Software

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2019.p0340 ◽

2019 ◽

Vol 23 (2) ◽

pp. 340-344

Author(s):

Yan Wang ◽

Keyword(s):

Prediction Method ◽

Current Method ◽

Configuration Software ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Software Defect ◽

Vector Machines ◽

Efficient Prediction ◽

Low Efficiency

To address the low efficiency of software operation, defect prediction for monitoring configuration software needs to be studied. Current methods predict software defects with low efficiency. This paper therefore proposes a software defect prediction method based on genetically optimized support vector machines. The method performs feature selection on software complexity measures and builds a defect prediction model with a genetically optimized support vector machine, yielding an efficient prediction method for software defects. Experimental results show that the proposed method improves the quality of software effectively.


Self-Trained Stacking Model for Semi-Supervised Learning

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213017500014 ◽

2017 ◽

Vol 26 (02) ◽

pp. 1750001 ◽

Cited By ~ 4

Author(s):

Stamatis Karlos ◽

Nikos Fazakis ◽

Sotiris Kotsiantis ◽

Kyriakos Sgarbas

Keyword(s):

Supervised Learning ◽

Supervised Classification ◽

The Other ◽

Support Vector ◽

Training Phase ◽

Classification Methods ◽

Bayesian Algorithm ◽

Benchmark Datasets ◽

Supervised Methods ◽

Supervised Classification Methods

The most important characteristic of semi-supervised learning methods is that they combine the available unlabeled data with a much smaller set of labeled examples to increase learning accuracy compared with the default supervised procedure, which uses only the labeled data during the training phase. In this work, we have implemented a hybrid self-trained system that combines a Support Vector Machine, a Decision Tree, a Lazy Learner and a Bayesian algorithm using a Stacking variant methodology. We performed an in-depth comparison with other well-known semi-supervised classification methods on standard benchmark datasets and concluded that the presented technique had better accuracy in most cases.
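The general self-training loop can be sketched in a few lines of Python. This is a simplified illustration and not the authors' system: a single nearest-centroid classifier stands in for the stacked ensemble, and confidence is approximated by the distance gap between the two nearest class centroids. The names `centroids`, `predict_with_margin`, and `self_train`, the `min_margin` threshold, and the sample points are all assumptions.

```python
import math

def centroids(X, y):
    """Mean feature vector per class (the stand-in base learner)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_with_margin(cents, x):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        remaining, added = [], False
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            if margin >= min_margin:
                # Confident prediction: move the point into the labeled set.
                X_lab.append(x)
                y_lab.append(label)
                added = True
            else:
                remaining.append(x)
        pool = remaining
        if not added or not pool:
            break
    return centroids(X_lab, y_lab)

# Two labeled points, three unlabeled ones (hypothetical data).
model = self_train([(0.0, 0.0), (10.0, 10.0)], [0, 1],
                   [(1.0, 1.0), (9.0, 9.0), (5.0, 5.2)])
```

In this example the two unlabeled points near the labeled ones are pseudo-labeled in the first round, while the ambiguous point near the middle never clears the margin threshold and is left out. Swapping in a stronger base learner changes only `centroids` and `predict_with_margin`; the loop itself stays the same.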
