Impact of imbalanced data on the performance of software defect prediction classifiers

2019 ◽  
Vol 1345 ◽  
pp. 022026
Author(s):  
Lichao Wang ◽  
Wei Wang ◽  
Bingyou Liu ◽  
Shuqiao Geng
2017 ◽  
Vol 102 (2) ◽  
pp. 937-950 ◽  
Author(s):  
Lijuan Zhou ◽  
Ran Li ◽  
Shudong Zhang ◽  
Hua Wang

Author(s):  
Hongyan Wan ◽  
Guoqing Wu ◽  
Mali Yu ◽  
Mengting Yuan

Software defect prediction technology has been widely used to improve the quality of software systems. Most real software defect datasets contain far fewer defective modules than defect-free modules, and such highly class-imbalanced data typically make accurate prediction difficult: the imbalance makes the prediction model prone to classifying a defective module as a defect-free one. Because different software modules are similar to one another, a module can be represented by sparse coefficients over a pre-defined dictionary built from historical software defect datasets. In this study, we use dictionary learning to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, so that the extracted features (the sparse representations) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account by increasing the penalty on misclassified defective modules during dictionary learning, which inclines the classifier toward labeling a module as defective. We therefore propose a cost-sensitive software defect prediction method based on dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
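The following is a minimal sketch of the general idea behind dictionary-learning-based defect prediction, not the authors' CSDL implementation: it learns a dictionary from training features, encodes each module as a sparse code, and trains a cost-weighted classifier on those codes. The synthetic data, lasso coding (in place of the paper's elastic net), and the 10:1 cost ratio are illustrative assumptions.

```python
# Sketch only: dictionary learning + cost-weighted classification on sparse codes.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Assumed stand-in for a NASA-style defect dataset: 500 modules, 20 static-code
# metrics, ~10% defective (class 1).
X = rng.randn(500, 20)
y = (rng.rand(500) < 0.10).astype(int)
X[y == 1] += 0.75  # give defective modules a shifted distribution

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Learn dictionary atoms; the sparse codes of the modules act as features.
# transform_alpha plays the role of the sparsity penalty (the paper uses an
# elastic-net formulation; lasso coding is a simpler stand-in here).
dico = DictionaryLearning(n_components=15, transform_algorithm="lasso_lars",
                          transform_alpha=0.1, random_state=0)
Z_tr = dico.fit_transform(X_tr)
Z_te = dico.transform(X_te)

# Cost sensitivity: penalize missed defective modules more heavily via class
# weights (the 10:1 ratio is an assumption, not taken from the paper).
clf = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000)
clf.fit(Z_tr, y_tr)
print("recall on defective class:",
      (clf.predict(Z_te)[y_te == 1] == 1).mean())
```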


2019 ◽  
Vol 8 (3) ◽  
pp. 8683-8687

Software defect prediction is a heavily researched and important domain because of its cost-saving advantages in software development. Various classification methods based on static code attributes have been used to predict software defects. However, the number of defective instances is very small compared to the number of non-defective instances, which leads to imbalanced data in which the class ratio is unequal. Conventional machine learning techniques perform poorly on such data. Although different strategies exist to address this issue, the usual oversampling methods are variants of the SMOTE algorithm; these approaches rely on local information rather than on the complete distribution of the minority class. Here, GANs are used to approximate the true data distribution of the minority class for software defect prediction.
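For context, the sketch below shows the SMOTE-style oversampling baseline that GAN-based approaches aim to improve on; the GAN itself is not shown. The synthetic feature matrix and classifier choice are assumptions for illustration.

```python
# Sketch only: SMOTE baseline for an imbalanced defect dataset.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)                      # assumed static-code metrics
y = (rng.rand(1000) < 0.08).astype(int)      # ~8% defective: imbalanced
X[y == 1] += 0.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# SMOTE synthesizes minority samples by interpolating between a minority
# instance and its nearest minority neighbours -- purely local information,
# which is the limitation the abstract attributes to SMOTE-style methods.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```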


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 569
Author(s):  
Timing Li ◽  
Lei Yang ◽  
Kewen Li ◽  
Jiannan Zhai

Imbalanced data and feature redundancies are common problems in many fields, especially in software defect prediction, data mining, machine learning, and industrial big data applications. To resolve these problems, we propose an intelligent fusion algorithm, SMPSO-HS-AdaBoost, which combines particle swarm optimization based on subgroup migration with adaptive boosting based on hybrid sampling. In this paper, we apply the proposed intelligent fusion algorithm to software defect prediction to improve prediction efficiency and accuracy by addressing the issues caused by imbalanced data and feature redundancies. The results show that the proposed algorithm resolves the coexisting problems of imbalanced data and feature redundancies, and ensures the efficiency and accuracy of software defect prediction.
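Below is a minimal sketch, not the authors' SMPSO-HS-AdaBoost: it pairs a simple hybrid-sampling step (SMOTE oversampling plus random undersampling) with AdaBoost, and uses plain univariate feature selection in place of the particle-swarm search. The pipeline structure, parameter values, and synthetic data are assumptions.

```python
# Sketch only: feature selection + hybrid sampling + AdaBoost.
import numpy as np
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(800, 30)                      # assumed: 30 partly redundant metrics
y = (rng.rand(800) < 0.12).astype(int)      # ~12% defective modules
X[y == 1] += 0.4

pipe = Pipeline(steps=[
    ("select", SelectKBest(f_classif, k=15)),             # trims redundant features
    ("over", SMOTE(sampling_strategy=0.5, random_state=0)),
    ("under", RandomUnderSampler(sampling_strategy=0.8, random_state=0)),
    ("boost", AdaBoostClassifier(n_estimators=200, random_state=0)),
])

# F1 on the defective class reflects both imbalance handling and feature quality.
scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
print("mean F1 (defective class):", scores.mean().round(3))
```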

