Software defect prediction based on stacked sparse denoising autoencoders and enhanced extreme learning machine

IET Software ◽  
2021 ◽  
Author(s):  
Nana Zhang ◽  
Shi Ying ◽  
Kun Zhu ◽  
Dandan Zhu
2021 ◽  
Author(s):  
Yu Tang ◽  
Qi Dai ◽  
Mengyuan Yang ◽  
Lifang Chen

Abstract: In traditional ensemble learning algorithms for software defect prediction, the base predictor has too many parameters that are difficult to optimize, so the model cannot reach its best achievable performance. This paper proposes an ensemble learning algorithm for software defect prediction that uses an improved sparrow search algorithm to optimize the extreme learning machine. The approach has three parts. First, an improved sparrow search algorithm (ISSA) is proposed to improve optimization ability and convergence speed, and its performance is tested on eight benchmark test functions. Second, ISSA is used to optimize the extreme learning machine (ISSA-ELM) to improve its prediction ability. Finally, the optimized ELM is embedded in the Bagging algorithm to form the ensemble learning algorithm ISSA-ELM-Bagging, which improves the prediction performance of ELM on software defect datasets. Experiments are carried out on six groups of software defect datasets. The experimental results show that ISSA-ELM-Bagging is significantly better than the other four comparison algorithms on six evaluation indexes (Precision, Recall, F-measure, MCC, Accuracy and G-mean) and has better stability and generalization ability.
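The abstract does not give ISSA's update rules, so the following is only a minimal sketch, assuming NumPy, of the two standard components the method builds on: a basic ELM (random hidden layer with a sigmoid activation, output weights from the Moore-Penrose pseudo-inverse) and Bagging with majority voting. The names `ELM`, `bagging_elm` and `bagging_predict`, the hidden-layer size and the ±1 target encoding are illustrative assumptions. The input weights here are random rather than optimized by ISSA, so this is not the authors' ISSA-ELM-Bagging.

```python
import numpy as np


class ELM:
    """Basic single-hidden-layer ELM for binary labels in {0, 1} (illustrative sketch)."""

    def __init__(self, n_hidden=20, rng=None):
        self.n_hidden = n_hidden
        self.rng = rng if rng is not None else np.random.default_rng()

    def _hidden(self, X):
        # Hidden-layer output H = sigmoid(X W + b); W and b are never trained
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Random input weights and biases; an ISSA-style optimizer would presumably
        # search over these instead (assumption: the abstract does not say which)
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        T = np.where(y == 1, 1.0, -1.0)  # encode targets as +1 / -1
        # Output weights: minimum-norm least-squares solution beta = pinv(H) T
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


def bagging_elm(X, y, n_estimators=11, n_hidden=20, seed=0):
    """Fit one ELM per bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        models.append(ELM(n_hidden, rng).fit(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    """Majority vote of the base ELMs (an odd n_estimators avoids ties)."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)


# Toy usage on two synthetic Gaussian clusters (not a software defect dataset)
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, (100, 5)), data_rng.normal(2.0, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
models = bagging_elm(X, y)
accuracy = float((bagging_predict(models, X) == y).mean())
```

Because each base ELM sees a different bootstrap resample and its own random hidden layer, the vote averages out some of the variance of a single randomly initialized ELM.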


2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Shang Zheng ◽  
Jinjing Gai ◽  
Hualong Yu ◽  
Haitao Zou ◽  
Shang Gao

To identify software modules that are more likely to be defective, machine learning has been used to construct software defect prediction (SDP) models. However, several previous works have found that the imbalanced nature of software defect data can degrade model performance. In this paper, we discussed how to improve the imbalanced data distribution in the context of SDP, with the aim of finding better SDP methods. First, a relative density was introduced to reflect the significance of each instance within its class; because it is independent of the scale of the data distribution in feature space, it can be more robust than absolute distance information. Second, a strategy similar to K-nearest-neighbors-based probability density estimation (KNN-PDE) was used to calculate the relative density of each training instance. Furthermore, the fuzzy membership of each sample was designed based on its relative density in order to eliminate classification errors caused by noise and outlier samples. Finally, two algorithms were proposed to train software defect prediction models based on the weighted extreme learning machine. The proposed algorithms were compared with traditional SDP methods on benchmark data sets, and the results showed that they have much better overall performance in terms of G-mean, AUC, and Balance. The proposed algorithms are more robust and adaptive to different SDP data distribution types, can estimate the significance of each instance more accurately, and assign the same total fuzzy coefficient to both classes regardless of the data scale.
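The abstract names the ingredients but not the exact formulas. The following is a minimal pure-Python sketch under hypothetical choices: the density of an instance is approximated by the inverse distance to its k-th nearest neighbour in the same class (a KNN-PDE-like proxy), the relative density divides this by the class mean, and each class's memberships are normalised to sum to 1, so both classes receive the same total fuzzy coefficient. `fuzzy_memberships`, `kth_neighbour_distance`, `k` and `eps` are illustrative names, not the paper's, and the weighted-ELM training step that would consume these memberships is omitted.

```python
import math
from collections import defaultdict


def kth_neighbour_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbour in `others`."""
    return sorted(math.dist(point, q) for q in others)[k - 1]


def fuzzy_memberships(X, y, k=3, eps=1e-12):
    """Hypothetical KNN-based relative density -> per-instance fuzzy membership."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    memberships = [0.0] * len(X)
    for label, idx in by_class.items():
        if len(idx) <= k:
            raise ValueError(f"class {label!r} needs more than k={k} instances")
        # Density proxy (assumption): inverse distance to the k-th nearest same-class neighbour
        dens = {}
        for i in idx:
            others = [X[j] for j in idx if j != i]
            dens[i] = 1.0 / (kth_neighbour_distance(X[i], others, k) + eps)
        # Relative density: dividing by the class mean cancels a global rescaling of X
        mean_dens = sum(dens.values()) / len(idx)
        rel = {i: d / mean_dens for i, d in dens.items()}
        # Normalise so that every class receives the same total membership (1.0)
        total = sum(rel.values())
        for i in idx:
            memberships[i] = rel[i] / total
    return memberships


# Toy example: class 0 has one outlier at (3, 3)
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]
m = fuzzy_memberships(X, y, k=2)
```

Rescaling all features multiplies every distance by the same factor, so the relative densities and memberships stay the same (up to the tiny `eps`), which mirrors the scale-independence the abstract attributes to relative density. In the toy data, the outlier at (3, 3) receives the smallest membership in its class.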


Author(s):  
Khadijah Khadijah ◽  
Priyo Sidik Sasongko

Software testing is a crucial process in the software development life cycle and affects software quality. However, testing is tedious and resource-consuming. It can be conducted more efficiently by focusing on software modules that are prone to defects; therefore, automated software defect prediction is needed. This research implemented the Extreme Learning Machine (ELM) as the classification algorithm because of its simple training process and good generalization performance. Besides the classification algorithm, the most important problem to address is the imbalance between samples of the positive class (defect-prone) and the negative class, which can bias the performance of the classifier. Therefore, this research compared two approaches to handling the imbalance problem: SMOTE (a resampling method) and weighted-ELM (an algorithm-level method). The results of experiments using 10-fold cross-validation on the NASA MDP dataset show that handling the imbalance problem when building a software defect prediction model increases the specificity and g-mean of the model. When the imbalance ratio is not very small, SMOTE is better than weighted-ELM. Otherwise, weighted-ELM is better than SMOTE in terms of sensitivity and g-mean, but worse in terms of specificity and accuracy.
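To make the compared ingredients concrete, here is a minimal pure-Python sketch of SMOTE-style oversampling and of the sensitivity, specificity and G-mean metrics, with G-mean taken as the square root of sensitivity times specificity. The function names and toy data are hypothetical; the base sample is drawn uniformly at random (a common SMOTE variant) rather than cycling through every minority sample. This is not the study's experimental code, and weighted-ELM is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: interpolate between minority samples and their neighbours."""
    if len(minority) <= k:
        raise ValueError("need more than k minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # base sample chosen uniformly at random
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position on the segment x -> nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def sensitivity_specificity_gmean(y_true, y_pred, positive=1):
    """Sensitivity (TPR), specificity (TNR) and G-mean = sqrt(TPR * TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)


# Toy usage: six minority-class (defect-prone) samples as 2-D feature vectors
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
new_samples = smote(minority, n_new=10, k=3)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec, gmean = sensitivity_specificity_gmean(y_true, y_pred)
```

For the toy predictions, sensitivity is 0.75 (3 of 4 positives found), specificity is 5/6 and G-mean is about 0.791. Since G-mean drops when either class is poorly recognized, it exposes the trade-off the abstract reports between sensitivity and specificity.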

