Evaluating the Impact of Sampling-Based Nonlinear Manifold Detection Model on Software Defect Prediction Problem

The impact of the distance metric and measure on SMOTE-based techniques in software defect prediction

Information and Software Technology ◽

10.1016/j.infsof.2021.106742 ◽

2021 ◽

pp. 106742

Author(s):

Shuo Feng ◽

Jacky Keung ◽

Peichang Zhang ◽

Yan Xiao ◽

Miao Zhang

Keyword(s):

Defect Prediction ◽

Software Defect Prediction ◽

Distance Metric ◽

Software Defect ◽

The Impact

Revisiting the Impact of Dependency Network Metrics on Software Defect Prediction

IEEE Transactions on Software Engineering ◽

10.1109/tse.2021.3131950 ◽

2021 ◽

pp. 1-1

Author(s):

Lina Gong ◽

Gopi Krishnan Rajbahadur ◽

Ahmed E. Hassan ◽

S. Jiang

Keyword(s):

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Dependency Network ◽

Network Metrics ◽

The Impact

What is the Impact of Imbalance on Software Defect Prediction Performance?

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering - PROMISE '15 ◽

10.1145/2810146.2810150 ◽

2015 ◽

Cited By ~ 13

Author(s):

Zaheed Mahmood ◽

David Bowes ◽

Peter C. R. Lane ◽

Tracy Hall

Keyword(s):

Prediction Performance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

The Impact

Software Defect Prediction Based on Fuzzy Weighted Extreme Learning Machine with Relative Density Information

Scientific Programming ◽

10.1155/2020/8852705 ◽

2020 ◽

Vol 2020 ◽

pp. 1-18

Author(s):

Shang Zheng ◽

Jinjing Gai ◽

Hualong Yu ◽

Haitao Zou ◽

Shang Gao

Keyword(s):

Relative Density ◽

Extreme Learning Machine ◽

Data Distribution ◽

Defect Prediction ◽

Classification Error ◽

Software Defect Prediction ◽

Software Defect ◽

Weighted Extreme Learning Machine ◽

Learning Machine ◽

The Impact

To identify software modules that are more likely to be defective, machine learning has been used to construct software defect prediction (SDP) models. However, several previous works have found that the imbalanced nature of software defect data can degrade model performance. In this paper, we discuss how to handle imbalanced data distributions in the context of SDP, with the aim of finding better prediction methods. First, a relative density was introduced to reflect the significance of each instance within its class; because it is independent of the scale of the data distribution in feature space, it can be more robust than absolute distance information. Second, a strategy similar to K-nearest-neighbors-based probability density estimation (KNN-PDE) was used to calculate the relative density of each training instance. Third, the fuzzy memberships of samples were designed based on relative density in order to eliminate classification errors caused by noise and outlier samples. Finally, two algorithms based on the weighted extreme learning machine were proposed to train software defect prediction models. The proposed algorithms were compared with traditional SDP methods on benchmark data sets. The experimental results show that the proposed methods achieve much better overall performance in terms of G-mean, AUC, and Balance. The proposed algorithms are more robust and adaptive to different SDP data distributions, estimate the significance of each instance more accurately, and assign identical total fuzzy coefficients to the two classes regardless of the data scale.
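The abstract names the ingredients (a scale-free relative density, KNN-PDE-style estimation, fuzzy memberships with equal class totals, a weighted extreme learning machine, and G-mean/Balance) but gives no formulas. The following is a minimal sketch under stated assumptions, not the authors' algorithm. The relative density is *assumed* to be a within-class k-th-nearest-neighbour density divided by its class mean, which makes it scale-free. Memberships are normalised so each class sums to 1. The output weights use the standard weighted-ELM ridge solution β = (I/C + HᵀWH)⁻¹HᵀWT. The tanh activation, `n_hidden`, `C`, `seed`, and all function names are hypothetical choices. G-mean and Balance use the common SDP definitions based on pd (recall) and pf (false-alarm rate).

```python
import numpy as np

def knn_relative_density(X, y, k=5):
    """KNN-style density of each instance relative to its own class (assumed definition)."""
    rd = np.empty(len(X))
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        Xc = X[idx]
        # Pairwise Euclidean distances between instances of the same class.
        d = np.sqrt(((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1))
        kk = min(k, len(idx) - 1)
        # Column 0 of each sorted row is the instance itself (distance 0).
        dk = np.sort(d, axis=1)[:, kk]
        dens = 1.0 / (dk + 1e-12)      # KNN-PDE-style density proxy
        rd[idx] = dens / dens.mean()   # dividing by the class mean removes the scale
    return rd

def fuzzy_memberships(rd, y):
    """Normalise relative densities so that every class has the same total weight (1)."""
    s = np.empty_like(rd)
    for c in np.unique(y):
        mask = y == c
        s[mask] = rd[mask] / rd[mask].sum()
    return s

def train_weighted_elm(X, y, weights, n_hidden=50, C=1e3, seed=0):
    """Weighted ELM: random tanh hidden layer, weighted ridge solution for output weights."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b)                # hidden-layer output matrix
    T = np.where(y == 1, 1.0, -1.0)          # defective = +1, clean = -1
    HtW = H.T * weights                      # H^T W with W = diag(weights)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return W_in, b, beta

def predict(model, X):
    W_in, b, beta = model
    return (np.tanh(X @ W_in + b) @ beta > 0).astype(int)

def g_mean_and_balance(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    pd, pf = tp / (tp + fn), fp / (fp + tn)  # recall and false-alarm rate
    g_mean = np.sqrt(pd * (1 - pf))
    balance = 1 - np.sqrt(pf ** 2 + (1 - pd) ** 2) / np.sqrt(2)
    return g_mean, balance
```

Because every class's memberships sum to the same total, the minority (defective) class receives as much overall weight in the ridge solution as the majority class. Within a class, low-density instances (likely noise or outliers) receive smaller weights.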

Impact of feature selection on classification via clustering techniques in software defect prediction

Journal of Computer Science and Its Application ◽

10.4314/jcsia.v26i1.8 ◽

2020 ◽

Vol 26 (1) ◽

Cited By ~ 1

Author(s):

F.E. Usman-Hamza ◽

A.F. Atte ◽

A.O. Balogun ◽

H.A. Mojeed ◽

A.O. Bajeh ◽

...

Keyword(s):

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Predictive Performance ◽

Defect Prediction ◽

Software Defect Prediction ◽

Selection Methods ◽

Clustering Techniques ◽

Software Defect ◽

The Impact

Software testing guided by software defect prediction aims to detect as many defects as possible before the software is released, which plays an important role in ensuring quality and reliability. Software defect prediction can be modeled as a classification problem that assigns software modules to one of two classes, defective or non-defective, using classification algorithms. This study investigated the impact of feature selection methods on classification via clustering techniques for software defect prediction. Three clustering techniques (Farthest First Clusterer, K-Means, and Make-Density Clusterer) and three feature selection methods (Chi-Square, Clustering Variation, and Information Gain) were applied to software defect datasets from the NASA repository. The best software defect prediction model was Farthest First with the Information Gain feature selection method, with an accuracy of 78.69%, a precision of 0.804, and a recall of 0.788. The experimental results showed that clustering techniques used as classifiers gave good predictive performance and that feature selection methods further enhanced it. This indicates that classification via clustering can give results competitive with standard classification methods, with the advantage that no model has to be trained on a labeled dataset, since the technique can be applied to unlabeled datasets.

Keywords: Classification, Clustering, Feature Selection, Software Defect Prediction

Vol. 26, No. 1, June 2019
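The abstract does not spell out how a clusterer acts as a classifier or how Information Gain scores a feature. The sketch below illustrates both ideas under stated assumptions and is not the study's actual tool setup. Farthest-first traversal picks a random first centre and then repeatedly adds the point farthest from all chosen centres. Each point joins its nearest centre, and each cluster is then named by the majority label of its members; this naming step assumes that some labels are available. Information Gain is computed as H(class) minus H(class | feature). Each numeric feature is discretised by a median split, an assumption made for brevity. All function names are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain(x, y):
    """IG of one numeric feature, discretised by a median split (assumption)."""
    left = x <= np.median(x)
    # Conditional entropy: weighted entropy of the labels in each bin.
    h_cond = sum(m.mean() * entropy(y[m]) for m in (left, ~left) if m.any())
    return entropy(y) - h_cond

def farthest_first(X, k, seed=0):
    """Farthest-first traversal: returns the k centres and each point's cluster index."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))            # point farthest from all chosen centres
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    C = X[centers]
    assign = np.argmin(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
    return C, assign

def cluster_to_class(assign, y):
    """Name each cluster by the majority label of its members (assumes integer labels >= 0)."""
    return {int(c): int(np.bincount(y[assign == c]).argmax()) for c in np.unique(assign)}
```

In a pipeline like the one the abstract describes, features would first be ranked with `info_gain` and the top-ranked ones kept before clustering. The cluster-to-class mapping then turns cluster indices into defective/non-defective predictions.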

Hybrid algorithm for two-objective software defect prediction problem

International Journal of Innovative Computing and Applications ◽

10.1504/ijica.2017.10009205 ◽

2017 ◽

Vol 8 (4) ◽

pp. 207

Author(s):

Xiaotao Rong ◽

Zhihua Cui

Keyword(s):

Hybrid Algorithm ◽

Defect Prediction ◽

Software Defect Prediction ◽

Prediction Problem ◽

Software Defect

Hybrid algorithm for two-objective software defect prediction problem

International Journal of Innovative Computing and Applications ◽

10.1504/ijica.2017.088162 ◽

2017 ◽

Vol 8 (4) ◽

pp. 207 ◽

Cited By ~ 2

Author(s):

Xiaotao Rong ◽

Zhihua Cui

Keyword(s):

Hybrid Algorithm ◽

Defect Prediction ◽

Software Defect Prediction ◽

Prediction Problem ◽

Software Defect

Empirical Evaluation of the Impact of Class Overlap on Software Defect Prediction

2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) ◽

10.1109/ase.2019.00071 ◽

2019 ◽

Cited By ~ 1

Author(s):

Lina Gong ◽

Shujuan Jiang ◽

Rongcun Wang ◽

Li Jiang

Keyword(s):

Empirical Evaluation ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

The Impact

Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study

Symmetry ◽

10.3390/sym12071147 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1147 ◽

Cited By ~ 2

Author(s):

Abdullateef O. Balogun ◽

Shuib Basri ◽

Saipunidzam Mahamad ◽

Said J. Abdulkadir ◽

Malek A. Almomani ◽

...

Keyword(s):

Feature Selection ◽

Empirical Study ◽

Prediction Models ◽

Empirical Studies ◽

Experimental Results ◽

Defect Prediction ◽

Software Defect Prediction ◽

Search Methods ◽

Software Defect ◽

The Impact

Feature selection (FS) is a feasible solution for mitigating the high-dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). However, empirical studies on the impact and effectiveness of FS methods on SDP models often report contradictory experimental results and inconsistent findings. These contradictions can be attributed to limitations of the respective studies, such as small datasets, a limited range of FS search methods, and unsuitable prediction models. It is hence critical to conduct an extensive empirical study that addresses these contradictions, both to guide researchers and to strengthen the scientific rigor of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The resulting prediction models were evaluated based on accuracy and AUC values. The Scott–KnottESD and the novel Double Scott–KnottESD rank statistical methods were used to rank the studied FS methods. The experimental results showed that there is no single best FS method, as performance depends on the choice of classifier, performance evaluation metric, and dataset. However, we recommend the use of statistical-based, probability-based, and classifier-based filter feature ranking (FFR) methods, respectively, in SDP. For filter subset selection (FSS) methods, correlation-based feature selection (CFS) with metaheuristic search methods is recommended. For wrapper feature selection (WFS) methods, the IWSS-based WFS method is recommended, as it outperforms the conventional SFS- and LHS-based WFS methods.
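The recommendations above name CFS and IWSS without describing how they work. The sketch below assumes numeric features. CFS scores a subset of k features with the merit k·r_cf / sqrt(k + k(k-1)·r_ff). Here r_cf is the mean absolute feature-class correlation and r_ff the mean absolute feature-feature correlation; Pearson correlation is an assumption of this sketch. A greedy forward search stands in for the metaheuristic search methods the study recommends. The `iwss` function is a simplified incremental wrapper: it walks a filter ranking and keeps a feature only when a user-supplied wrapper score strictly improves. `evaluate` is a hypothetical callable, for example cross-validated AUC of a Naïve Bayes model on the candidate subset.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset using absolute Pearson correlations (assumption)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
    # High class correlation raises the merit; redundancy between features lowers it.
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y, tol=1e-9):
    """Greedy forward search on CFS merit (a stand-in for metaheuristic search)."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best + tol:              # stop when no candidate improves the merit
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected

def iwss(ranking, evaluate):
    """Simplified IWSS: keep a ranked feature only if the wrapper score improves."""
    selected, best = [], -np.inf
    for j in ranking:
        score = evaluate(selected + [j])
        if score > best:
            selected.append(j)
            best = score
    return selected
```

In the usage of the tests below, a feature that exactly duplicates an already-selected one does not raise the merit, so CFS keeps only one copy.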

A feature dependent Naive Bayes approach and its application to the software defect prediction problem

Applied Soft Computing ◽

10.1016/j.asoc.2017.05.043 ◽

2017 ◽

Vol 59 ◽

pp. 197-209 ◽

Cited By ~ 39

Author(s):

Ömer Faruk Arar ◽

Kürşat Ayan

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Defect Prediction ◽

Software Defect Prediction ◽

Prediction Problem ◽

Software Defect ◽

Bayes Approach