Software Defect Prediction Using Heterogeneous Ensemble Classification Based on Segmented Patterns

2020 ◽  
Vol 10 (5) ◽  
pp. 1745 ◽  
Author(s):  
Hamad Alsawalqah ◽  
Neveen Hijazi ◽  
Mohammed Eshtay ◽  
Hossam Faris ◽  
Ahmed Al Radaideh ◽  
...  

Software defect prediction is a promising approach aiming to improve software quality and testing efficiency by providing timely identification of defect-prone software modules before the actual testing process begins. These prediction results help software developers to effectively allocate their limited resources to the modules that are more prone to defects. In this paper, a hybrid heterogeneous ensemble approach is proposed for the purpose of software defect prediction. Heterogeneous ensembles consist of a set of classifiers built with different base learning methods, each of which has its own strengths and weaknesses. The main idea of the proposed approach is to develop expert and robust heterogeneous classification models. Two versions of the proposed approach are developed and evaluated experimentally: the first is based on simple classifiers, and the second is based on ensemble ones. For evaluation, 21 publicly available benchmark datasets are selected to conduct the experiments and benchmark the proposed approach. The evaluation results show the superiority of the ensemble version over other well-regarded basic and ensemble classifiers.
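To make "heterogeneous" concrete, here is a minimal sketch of a generic heterogeneous ensemble: three base learners built on different learning principles (nearest neighbour, nearest centroid, and a one-feature decision stump) combined by simple majority vote. This is not the segmented-pattern approach proposed in the paper; the learners, the function names (fit_1nn, fit_nearest_centroid, fit_stump, majority_vote), and the toy module metrics are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Classifier = Callable[[Point], int]  # returns 1 (defective) or 0 (clean)


def fit_1nn(X: List[Point], y: List[int]) -> Classifier:
    """Instance-based learner: label of the closest training module."""
    def predict(x: Point) -> int:
        i = min(range(len(X)), key=lambda k: math.dist(X[k], x))
        return y[i]
    return predict


def fit_nearest_centroid(X: List[Point], y: List[int]) -> Classifier:
    """Prototype-based learner: label of the closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [X[k] for k in range(len(X)) if y[k] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(centroids[c], x))


def fit_stump(X: List[Point], y: List[int]) -> Classifier:
    """Rule-based learner: best single-feature threshold rule (x[f] > t -> 1)."""
    best: Tuple[int, int, float] = (-1, 0, 0.0)  # (correct, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            correct = sum((row[f] > t) == bool(lab) for row, lab in zip(X, y))
            if correct > best[0]:
                best = (correct, f, t)
    _, f, t = best
    return lambda x: int(x[f] > t)


def majority_vote(models: List[Classifier], x: Point) -> int:
    """Combine the heterogeneous base learners by simple majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical module metrics: (lines of code, cyclomatic complexity)
X = [(10, 1), (12, 2), (15, 1), (80, 9), (95, 8), (90, 10)]
y = [0, 0, 0, 1, 1, 1]
models = [fit(X, y) for fit in (fit_1nn, fit_nearest_centroid, fit_stump)]
print(majority_vote(models, (85, 9)))  # 1
print(majority_vote(models, (11, 2)))  # 0
```

With an odd number of voters and two classes a tie cannot occur; a weighted vote or a stacked meta-learner would be natural alternatives to plain majority voting.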

2022 ◽  
Vol 12 (1) ◽  
pp. 493
Author(s):  
Mahesha Pandit ◽  
Deepali Gupta ◽  
Divya Anand ◽  
Nitin Goyal ◽  
Hani Moaiteq Aljahdali ◽  
...  

Using artificial intelligence (AI) based software defect prediction (SDP) techniques in the software development process helps isolate defective software modules, count the number of software defects, and identify risky code changes. However, software development teams are unaware of SDP and do not have easy access to relevant models and techniques. The major reason for this problem seems to be the fragmentation between SDP research and SDP practice. To unify SDP research and practice, this article introduces a cloud-based, global, unified AI framework for SDP called DePaaS (Defects Prediction as a Service). The article describes the usage context, use cases and detailed architecture of DePaaS and presents the first responses of industry practitioners to DePaaS. In a first-of-its-kind survey, the article captures practitioners' beliefs about SDP and about the ability of DePaaS to solve some of the known challenges in the field of software defect prediction. This article also provides a novel process for SDP, a detailed description of the structure and behaviour of the DePaaS architecture components, the six best SDP models offered by DePaaS, a description of the algorithms that recommend SDP models, feature sets and tunable parameters, and a rich set of challenges in building, using and sustaining DePaaS. With the contributions of this article, SDP research and practice could be unified, enabling the building and use of more pragmatic defect prediction models and leading to an increase in the efficiency of software testing.


Author(s):  
Hongyan Wan ◽  
Guoqing Wu ◽  
Mali Yu ◽  
Mengting Yuan

Software defect prediction technology has been widely used to improve the quality of software systems. Most real software defect datasets contain fewer defective modules than defect-free modules, and such highly class-imbalanced data typically make accurate prediction difficult: the prediction model easily classifies a defective module as defect-free. Because different software modules are similar to one another, a module can be represented by sparse representation coefficients over a predefined dictionary built from historical software defect datasets. In this study, we use dictionary learning to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively to ensure that the extracted features (sparse representations) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, as well as the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account, increasing the penalty for misclassifying defective modules during dictionary learning so that the classifier leans towards labelling a module as defective. We thus propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
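The sparse-coding step above relies on the elastic net. As a hedged illustration of that step only (not CSDL itself, which also learns the dictionary and a cost-sensitive classifier jointly), the sketch below computes the elastic-net coefficients a of one module x over a fixed dictionary D by cyclic coordinate descent, minimizing 0.5*||x - D a||^2 + lam1*||a||_1 + 0.5*lam2*||a||^2. This scaling of the penalties, the function names, and the toy dictionary are assumptions. Each coordinate update has the closed form a_j = S(d_j . r_j, lam1) / (||d_j||^2 + lam2), where r_j is the residual with atom j left out and S is soft-thresholding.

```python
from typing import List

Vector = List[float]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|.|: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_code(x: Vector, atoms: List[Vector], lam1: float, lam2: float,
                     n_sweeps: int = 100, tol: float = 1e-10) -> Vector:
    """Cyclic coordinate descent for
    min_a 0.5*||x - sum_j a_j*atoms[j]||^2 + lam1*||a||_1 + 0.5*lam2*||a||^2.
    Each atom is one dictionary column of the same length as x."""
    a = [0.0] * len(atoms)
    residual = list(x)  # x - D a, with a = 0 initially
    for _ in range(n_sweeps):
        max_change = 0.0
        for j, d in enumerate(atoms):
            # Correlation of atom j with the residual that excludes atom j.
            rho = sum(di * (ri + di * a[j]) for di, ri in zip(d, residual))
            norm_sq = sum(di * di for di in d)
            new_aj = soft_threshold(rho, lam1) / (norm_sq + lam2)
            delta = new_aj - a[j]
            if delta != 0.0:
                # Keep residual = x - D a in sync with the updated coefficient.
                residual = [ri - di * delta for di, ri in zip(d, residual)]
                a[j] = new_aj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return a


# Toy example with an orthonormal (identity) dictionary.
print(elastic_net_code([3.0, 0.5], [[1.0, 0.0], [0.0, 1.0]], lam1=1.0, lam2=1.0))
# [1.0, 0.0]
```

With an orthonormal dictionary the solution reduces to soft-thresholding each coordinate and dividing by 1 + lam2, which is why the small second coefficient is driven exactly to zero while the large one is shrunk.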


Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1722
Author(s):  
Ruba Abu Khurma ◽  
Hamad Alsawalqah ◽  
Ibrahim Aljarah ◽  
Mohamed Abd Elaziz ◽  
Robertas Damaševičius

Software defect prediction (SDP) is crucial in the early stages of defect-free software development, before testing operations take place. Effective SDP can help test managers locate defects and defect-prone software modules, which allows limited software quality assurance resources to be allocated optimally and economically. Feature selection (FS) is a hard problem with non-polynomial time complexity: for a dataset with N features, the complete search space contains 2^N feature subsets, so an exhaustive algorithm needs exponential running time to traverse all of them. Swarm intelligence algorithms have shown impressive performance in tackling the FS problem and reducing the running time. The moth flame optimization (MFO) algorithm is a well-known swarm intelligence algorithm that has been widely used and has proven its capability in solving various optimization problems. This paper proposes an efficient binary variant of MFO (BMFO) based on an island model, called IsBMFO. IsBMFO divides the solutions in the population into a set of sub-populations named islands. Each island is treated independently using a variant of BMFO. To increase the diversification capability of the algorithm, a migration step is performed after a specific number of iterations to exchange solutions between islands. Twenty-one public software datasets are used to evaluate the proposed method. The experimental results show that FS using IsBMFO improves the classification results. IsBMFO followed by support vector machine (SVM) classification is the best model for the SDP problem among the compared models, with an average G-mean of 78%.
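The island scheme described above can be sketched as follows, under stated assumptions: solutions are binary feature masks (1 = feature kept), migration follows a ring topology in which each island's best mask replaces the worst mask of the next island, and the per-island update is a simple bit-flip hill climber standing in for BMFO (the moth-flame spiral update and its binary transfer function are not reproduced). The function names and the toy fitness are hypothetical; in a real FS setting the fitness would be a classifier's validation score, for example the G-mean of an SVM on the selected features.

```python
import random
from typing import Callable, List

Mask = List[int]                   # 1 = feature selected, 0 = feature dropped
Fitness = Callable[[Mask], float]  # higher is better


def local_step(island: List[Mask], fitness: Fitness, rng: random.Random) -> None:
    """Placeholder for one BMFO iteration (hypothetical): flip one random bit
    per solution and keep the change only if fitness does not decrease."""
    for idx, sol in enumerate(island):
        cand = sol[:]
        j = rng.randrange(len(cand))
        cand[j] = 1 - cand[j]
        if fitness(cand) >= fitness(sol):
            island[idx] = cand


def migrate(islands: List[List[Mask]], fitness: Fitness) -> None:
    """Ring migration: the best mask of island i replaces the worst of island i+1."""
    migrants = [max(isl, key=fitness)[:] for isl in islands]  # copy before overwriting
    for i, migrant in enumerate(migrants):
        target = islands[(i + 1) % len(islands)]
        worst = min(range(len(target)), key=lambda k: fitness(target[k]))
        target[worst] = migrant


def island_search(n_features: int, fitness: Fitness, n_islands: int = 4,
                  island_size: int = 5, iterations: int = 50,
                  migrate_every: int = 10, seed: int = 0) -> Mask:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_features)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for t in range(1, iterations + 1):
        for island in islands:          # islands evolve independently
            local_step(island, fitness, rng)
        if t % migrate_every == 0:      # periodic exchange between islands
            migrate(islands, fitness)
    return max((sol for island in islands for sol in island), key=fitness)


# Hypothetical fitness: agreement with an "ideal" subset of 8 features.
ideal = [1, 0, 1, 1, 0, 0, 1, 0]
agreement = lambda m: sum(a == b for a, b in zip(m, ideal))
best = island_search(len(ideal), agreement)
print(best, agreement(best))
```

Copying the migrants before any replacement ensures that a mask which has just arrived on an island is not immediately forwarded again within the same migration step.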


Author(s):  
YI PENG ◽  
GANG KOU ◽  
GUOXUN WANG ◽  
HONGGANG WANG ◽  
FRANZ I. S. KO

Software development involves plenty of risks, and errors that exist in software modules represent a major kind of risk. Software defect prediction techniques and tools that identify software errors play a crucial role in software risk management. Among software defect prediction techniques, classification is a commonly used approach, and various types of classifiers have been applied to software defect prediction in recent years. Selecting an adequate classifier (or set of classifiers) to identify error-prone software modules is an important task for software development organizations. There are many different measures for classifiers, and each measure is intended to assess a different aspect of a classifier. This paper develops a performance metric that combines various measures to evaluate the quality of classifiers for software defect prediction. The performance metric is analyzed experimentally using 13 classifiers on 11 public-domain software defect datasets. The results of the experiment indicate that support vector machines (SVM), the C4.5 algorithm, and the K-nearest-neighbor algorithm ranked as the top three classifiers.
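The idea of judging classifiers on several measures at once can be illustrated with a simple, assumed aggregation: compute several confusion-matrix measures for each classifier (including G-mean, the square root of sensitivity times specificity), rank the classifiers on each measure, and order them by average rank. This is a generic sketch for illustration only; the paper's own composite metric is not described here and may weight or combine the measures differently. The classifier names and confusion-matrix counts below are made up.

```python
import math
from typing import Dict, List, Tuple

# (TP, FN, FP, TN) with "defective" as the positive class
Confusion = Tuple[int, int, int, int]


def measures(cm: Confusion) -> Dict[str, float]:
    tp, fn, fp, tn = cm
    recall = tp / (tp + fn)            # sensitivity / probability of detection
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": recall,
        "specificity": specificity,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "g_mean": math.sqrt(recall * specificity),
    }


def average_rank(results: Dict[str, Confusion]) -> List[Tuple[str, float]]:
    """Rank classifiers on every measure (1 = best, tied values share the mean
    rank), then sort the classifiers by their average rank across measures."""
    scores = {name: measures(cm) for name, cm in results.items()}
    names = list(scores)
    metric_names = list(scores[names[0]])
    total = {name: 0.0 for name in names}
    for metric in metric_names:
        for n in names:
            v = scores[n][metric]
            better = sum(scores[m][metric] > v for m in names)
            ties = sum(scores[m][metric] == v for m in names) - 1
            total[n] += 1 + better + ties / 2
    avg = {n: total[n] / len(metric_names) for n in names}
    return sorted(avg.items(), key=lambda item: item[1])


results = {
    "SVM": (40, 10, 20, 130),
    "C4.5": (34, 16, 15, 135),
    "NaiveBayes": (45, 5, 60, 90),
}
for name, rank in average_rank(results):
    print(f"{name}: {rank:.2f}")
# SVM: 1.60
# C4.5: 1.80
# NaiveBayes: 2.60
```

Rank averaging sidesteps the different scales of the measures; a weighted sum of normalized scores would be an equally plausible design if some measures (for example recall on the defective class) matter more than others.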


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Divya Tomar ◽  
Sonali Agarwal

Software defect predictors are useful for maintaining the high quality of software products effectively. Early prediction of defective software modules can help software developers allocate the available resources to deliver high-quality software products. The objective of a software defect prediction system is to find as many defective software modules as possible without affecting the overall performance. The learning process of a software defect predictor is difficult due to the imbalanced distribution of software modules between the defective and nondefective classes. Misclassifying a defective software module generally incurs a much higher cost than misclassifying a nondefective one. Therefore, taking the misclassification cost into account, we have developed a software defect prediction system using the Weighted Least Squares Twin Support Vector Machine (WLSTSVM). This system assigns a higher misclassification cost to data samples of the defective class and a lower cost to data samples of the nondefective class. Experiments on eight software defect prediction datasets demonstrate the validity of the proposed defect prediction system. The significance of the results has been tested via statistical analysis using the nonparametric Wilcoxon signed-rank test.
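The cost-weighting idea can be made concrete with a linear least squares twin SVM in which each training sample carries a weight on its slack term. The sketch below is an assumption-laden illustration, not the authors' WLSTSVM: it uses the standard closed-form linear LSTSVM solutions, with E = [A 1] the augmented defective rows and F = [B 1] the augmented nondefective rows, z1 = -(F^T W_B F + E^T E / c1)^(-1) F^T W_B e and z2 = (E^T W_A E + F^T F / c2)^(-1) E^T W_A e, adds a tiny ridge term eps for numerical stability, and uses hypothetical default weights (defective = 5, nondefective = 1). Class +1 denotes defective modules.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve(M: Matrix, b: Vector) -> Vector:
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def weighted_gram(rows: Matrix, w: Vector) -> Matrix:
    """R^T diag(w) R for augmented rows R."""
    d = len(rows[0])
    return [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(d)]
            for i in range(d)]


def weighted_sum(rows: Matrix, w: Vector) -> Vector:
    """R^T diag(w) e, where e is a vector of ones."""
    return [sum(wi * r[i] for r, wi in zip(rows, w)) for i in range(len(rows[0]))]


def fit_wlstsvm(X: Sequence[Sequence[float]], y: Sequence[int],
                weights: Optional[Sequence[float]] = None, w_pos: float = 5.0,
                w_neg: float = 1.0, c1: float = 1.0, c2: float = 1.0,
                eps: float = 1e-8) -> Tuple[Vector, Vector]:
    """Return (z1, z2); each z = [w..., b] defines the plane w.x + b = 0."""
    if weights is None:
        weights = [w_pos if t == 1 else w_neg for t in y]
    E, F, wA, wB = [], [], [], []
    for x, t, w in zip(X, y, weights):
        if t == 1:
            E.append(list(x) + [1.0])
            wA.append(w)
        else:
            F.append(list(x) + [1.0])
            wB.append(w)
    d = len(E[0])

    def regularized(P: Matrix, Q: Matrix, scale: float) -> Matrix:
        """P + scale * Q + eps * I."""
        return [[P[i][j] + scale * Q[i][j] + (eps if i == j else 0.0)
                 for j in range(d)] for i in range(d)]

    # Plane 1 lies near defective rows; slack on nondefective rows
    # (potential false alarms) is weighted by wB.
    M1 = regularized(weighted_gram(F, wB), weighted_gram(E, [1.0] * len(E)), 1.0 / c1)
    z1 = [-v for v in solve(M1, weighted_sum(F, wB))]
    # Plane 2 lies near nondefective rows; slack on defective rows
    # (potential missed defects) is weighted by wA.
    M2 = regularized(weighted_gram(E, wA), weighted_gram(F, [1.0] * len(F)), 1.0 / c2)
    z2 = solve(M2, weighted_sum(E, wA))
    return z1, z2


def predict(planes: Tuple[Vector, Vector], x: Sequence[float]) -> int:
    """Assign x to the class whose plane is nearer (perpendicular distance)."""
    dists = []
    for z in planes:
        w, b = z[:-1], z[-1]
        norm = sum(v * v for v in w) ** 0.5
        dists.append(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm)
    return 1 if dists[0] <= dists[1] else -1


X = [(0, 0), (1, 0), (2, 0), (0, 3), (1, 3), (2, 3)]
y = [1, 1, 1, -1, -1, -1]  # +1 = defective, -1 = nondefective
planes = fit_wlstsvm(X, y)
print(predict(planes, (1, 0.5)), predict(planes, (1, 2.5)))  # 1 -1
```

When every sample of a class receives the same weight, the weighting is equivalent to rescaling c1 and c2; per-sample weights (the weights argument) allow finer-grained costs.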


2020 ◽  
Vol 245 ◽  
pp. 05041
Author(s):  
Elisabetta Ronchieri ◽  
Marco Canaparo ◽  
Mauro Belgiovine ◽  
Davide Salomoni ◽  
Barbara Martelli

Software defect prediction is an activity that aims at narrowing down the most likely defect-prone software modules and helping developers and testers prioritize inspection and testing. This activity can be addressed by applying Machine Learning techniques to software metrics datasets, which are usually unlabelled, i.e. they lack a classification of modules in terms of defectiveness. To overcome this limitation, in addition to the usual data pre-processing operations to manage missing values and/or remove inconsistencies, researchers have to adopt an approach to label their unlabelled software datasets. Extracting defectiveness data to label all the instances of a dataset is an extremely time- and effort-consuming operation. In the literature, many studies have introduced approaches to building defect prediction models on unlabelled datasets. In this paper, we describe the analysis of new unlabelled datasets from WLCG software, coming from HEP-related experiments and middleware, using Machine Learning techniques. Because of the heterogeneity of the software metrics distributions, we have experimented with new approaches to labelling the various modules. We discuss a number of lessons learned from conducting these activities: what has worked, what has not, and how our research can be improved.
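One simple way to assign provisional labels to unlabelled metric data, shown here only as an assumed illustration (the abstract does not say which labelling approaches the authors tried), is to impute missing values with each metric's median and then flag a module as likely defect-prone when more than half of its metrics exceed the corresponding median. The function names, the majority rule, and the toy metrics are hypothetical.

```python
from statistics import median
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing metric value


def impute_median(rows: List[Row]) -> List[List[float]]:
    """Replace missing values with the median of the observed values in that column."""
    meds = [median(v for v in col if v is not None) for col in zip(*rows)]
    return [[m if v is None else v for v, m in zip(row, meds)] for row in rows]


def median_threshold_labels(rows: List[Row]) -> List[int]:
    """Label 1 (likely defect-prone) if more than half of a module's metrics
    exceed the column median, else 0."""
    data = impute_median(rows)
    meds = [median(col) for col in zip(*data)]
    labels = []
    for row in data:
        above = sum(v > m for v, m in zip(row, meds))
        labels.append(int(above > len(meds) / 2))
    return labels


# Hypothetical metrics per module: (lines of code, cyclomatic complexity, churn)
rows = [
    [120, 4, 10],
    [900, 25, None],
    [60, 2, 3],
    [None, 18, 40],
]
print(median_threshold_labels(rows))  # [0, 1, 0, 1]
```

Medians are used rather than means because software metrics are typically heavily skewed, so a few very large modules would otherwise shift every threshold; labels produced this way are only a starting point and would still need validation against real defect data.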


Author(s):  
Khadijah Khadijah ◽  
Priyo Sidik Sasongko

Software testing is a crucial process in the software development life cycle that affects software quality. However, testing is tedious and resource-consuming. It can be conducted more efficiently by focusing on the software modules that are prone to defects, so automated software defect prediction is needed. This research implements the Extreme Learning Machine (ELM) as the classification algorithm because of its simple training process and good generalization performance. Besides the classification algorithm, the most important problem that needs to be addressed is the imbalance between samples of the positive class (prone to defects) and the negative class, which can bias the performance of the classifier. Therefore, this research compares two approaches to handling the imbalance problem: SMOTE (a resampling method) and weighted-ELM (an algorithm-level method). The results of experiments using 10-fold cross-validation on the NASA MDP dataset show that including imbalance handling when building a software defect prediction model increases the specificity and g-mean of the model. When the imbalance ratio is not very small, SMOTE is better than weighted-ELM. Otherwise, weighted-ELM is better than SMOTE in terms of sensitivity and g-mean, but worse in terms of specificity and accuracy.
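The resampling side of this comparison can be sketched as follows: a minimal, assumption-based version of SMOTE's core step, in which each synthetic minority (defect-prone) sample is created by interpolating between a minority sample and one of its k nearest minority neighbours. Here base samples are drawn at random (an assumption; the original formulation cycles through every minority sample), and the function name, defaults, and toy data are illustrative. Weighted-ELM is not shown.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 3,
          seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority samples (needs at least 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(p, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical defect-prone modules described by two metrics.
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
new_samples = smote(minority, n_new=10)
print(len(new_samples))  # 10
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE fills in the minority region rather than duplicating existing rows, which is what distinguishes it from plain random oversampling.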


Author(s):  
Rohit John Jacob ◽  
Rutuja J Kamat ◽  
N M Sahithya ◽  
Sharon Saji John ◽  
Sahana P. Shankar
