Using Clustering Techniques Goes with Genetic Algorithm to Improve Software Defect Prediction

Author(s):  
F.E. Usman-Hamza ◽  
A.F. Atte ◽  
A.O. Balogun ◽  
H.A. Mojeed ◽  
A.O. Bajeh ◽  
...  

Software testing using software defect prediction aims to detect as many defects as possible in software before the software release. This plays an important role in ensuring quality and reliability. Software defect prediction can be modeled as a classification problem that classifies software modules into two classes: defective and non-defective; and classification algorithms are used for this process. This study investigated the impact of feature selection methods on classification via clustering techniques for software defect prediction. Three clustering techniques were selected; Farthest First Clusterer, K-Means and Make-Density Clusterer, and three feature selection methods: Chi-Square, Clustering Variation, and Information Gain were used on software defect datasets from NASA repository. The best software defect prediction model was farthest-first using information gain feature selection method with an accuracy of 78.69%, precision value of 0.804 and recall value of 0.788. The experimental results showed that the use of clustering techniques as a classifier gave a good predictive performance and feature selection methods further enhanced their performance. This indicates that classification via clustering techniques can give competitive results against standard classification methods with the advantage of not having to train any model using labeled dataset; as it can be used on the unlabeled datasets.Keywords: Classification, Clustering, Feature Selection, Software Defect PredictionVol. 26, No 1, June, 2019


Author(s):  
Waheeda Almayyan

The purpose of software defect prediction is to improve the quality of a software project by building a predictive model to decide whether a software module is or is not fault prone. In recent years, much research in using machine learning techniques in this topic has been performed. Our aim was to evaluate the performance of clustering techniques with feature selection schemes to address the problem of software defect prediction problem. We analysed the National Aeronautics and Space Administration (NASA) dataset benchmarks using three clustering algorithms: (1) Farthest First, (2) X-Means, and (3) selforganizing map (SOM). In order to evaluate different feature selection algorithms, this article presents a comparative analysis involving software defects prediction based on Bat, Cuckoo, Grey Wolf Optimizer (GWO), and particle swarm optimizer (PSO). The results obtained with the proposed clustering models enabled us to build an efficient predictive model with a satisfactory detection rate and acceptable number of features.


Sign in / Sign up

Export Citation Format

Share Document