Software Defect Prediction Using Heterogeneous Ensemble Classification Based on Segmented Patterns

Software defect prediction is a promising approach that aims to improve software quality and testing efficiency by providing timely identification of defect-prone software modules before the actual testing process begins. These prediction results help software developers allocate their limited resources to the modules that are more prone to defects. In this paper, a hybrid heterogeneous ensemble approach is proposed for software defect prediction. Heterogeneous ensembles consist of a set of classifiers built with different base learning methods, each of which has its own strengths and weaknesses. The main idea of the proposed approach is to develop expert and robust heterogeneous classification models. Two versions of the proposed approach are developed and evaluated. The first is based on simple classifiers, and the second is based on ensemble ones. For evaluation, 21 publicly available benchmark datasets are selected to conduct the experiments and benchmark the proposed approach. The evaluation results show the superiority of the ensemble version over other well-regarded basic and ensemble classifiers.
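
As a rough illustration of what a heterogeneous ensemble looks like, the sketch below combines three deliberately different toy classifiers by majority vote. It is not the segmented-pattern method of the paper: the rule functions, the metric names (`loc`, `cyclomatic`, `churn`) and the thresholds are all hypothetical and chosen only to make the example runnable.

```python
from collections import Counter
from typing import Callable, Dict, List

Module = Dict[str, float]             # one software module described by its metrics
Classifier = Callable[[Module], int]  # returns 1 = defect-prone, 0 = clean


# Three hypothetical base classifiers, each looking at a different metric.
def loc_rule(m: Module) -> int:
    return 1 if m["loc"] > 300 else 0


def complexity_rule(m: Module) -> int:
    return 1 if m["cyclomatic"] > 10 else 0


def churn_rule(m: Module) -> int:
    return 1 if m["churn"] > 50 else 0


def majority_vote(classifiers: List[Classifier], m: Module) -> int:
    """Combine the base predictions; a tie is resolved toward defect-prone."""
    votes = Counter(clf(m) for clf in classifiers)
    return 1 if votes[1] >= votes[0] else 0


ENSEMBLE: List[Classifier] = [loc_rule, complexity_rule, churn_rule]


def predict(m: Module) -> int:
    return majority_vote(ENSEMBLE, m)
```

A module flagged by two of the three rules is predicted defect-prone, while a module flagged by only one is predicted clean. In practice the base learners would be trained models rather than fixed thresholds.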

Feature Selection and Software Defect Prediction by Different Ensemble Classifiers

10.1007/978-3-030-86472-9_28 ◽

2021 ◽

pp. 307-313

Author(s):

Natalya Shakhovska ◽

Vitaliy Yakovyna

Keyword(s):

Feature Selection ◽

Defect Prediction ◽

Software Defect Prediction ◽

Ensemble Classifiers ◽

Software Defect

Towards Design and Feasibility Analysis of DePaaS: AI Based Global Unified Software Defect Prediction Framework

Applied Sciences ◽

10.3390/app12010493 ◽

2022 ◽

Vol 12 (1) ◽

pp. 493

Author(s):

Mahesha Pandit ◽

Deepali Gupta ◽

Divya Anand ◽

Nitin Goyal ◽

Hani Moaiteq Aljahdali ◽

...

Keyword(s):

Software Development ◽

Prediction Models ◽

Easy Access ◽

Defect Prediction ◽

Software Defect Prediction ◽

Research And Practice ◽

Software Defect ◽

Software Modules ◽

Software Development Teams ◽

Defect Prediction Models

Using artificial intelligence (AI) based software defect prediction (SDP) techniques in the software development process helps isolate defective software modules, count the number of software defects, and identify risky code changes. However, software development teams are unaware of SDP and do not have easy access to relevant models and techniques. The major reason for this problem seems to be the fragmentation of SDP research and SDP practice. To unify SDP research and practice, this article introduces a cloud-based, global, unified AI framework for SDP called DePaaS (Defects Prediction as a Service). The article describes the usage context, use cases and detailed architecture of DePaaS and presents the first response of industry practitioners to DePaaS. In a first-of-its-kind survey, the article captures practitioners' belief in SDP and in the ability of DePaaS to solve some of the known challenges of the field of software defect prediction. This article also provides a novel process for SDP, a detailed description of the structure and behaviour of DePaaS architecture components, six best SDP models offered by DePaaS, a description of algorithms that recommend SDP models, feature sets and tunable parameters, and a rich set of challenges to build, use and sustain DePaaS. With the contributions of this article, SDP research and practice could be unified, enabling the building and use of more pragmatic defect prediction models and leading to an increase in the efficiency of software testing.

Software Defect Prediction Based on Cost-Sensitive Dictionary Learning

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194019500384 ◽

2019 ◽

Vol 29 (09) ◽

pp. 1219-1243 ◽

Cited By ~ 1

Author(s):

Hongyan Wan ◽

Guoqing Wu ◽

Mali Yu ◽

Mengting Yuan

Keyword(s):

Sparse Representation ◽

Dictionary Learning ◽

Class Imbalance ◽

Imbalanced Data ◽

Prediction Method ◽

Elastic Net ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Software Modules

Software defect prediction technology has been widely used to improve the quality of software systems. Most real software defect datasets tend to have fewer defective modules than defect-free modules. Highly class-imbalanced data typically make accurate predictions difficult. The imbalanced nature of software defect datasets makes it easy for a prediction model to classify a defective module as a defect-free one. Because different software modules are similar to one another, one module can be represented by sparse representation coefficients over a pre-defined dictionary that consists of historical software defect datasets. In this study, we use a dictionary learning method to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, to ensure that the extracted features (sparse representation) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because the misclassification of defective modules generally incurs a much higher cost than the misclassification of defect-free ones, we take the different misclassification costs into account, increasing the penalty for misclassifying defective modules in the dictionary learning procedure so that the classifier is inclined to classify a module as defective. Thus, we propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
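
The asymmetry between the two kinds of misclassification can be made concrete with a small cost calculation. The sketch below is not the CSDL method. It only shows how unequal costs change which predictor looks better, and the cost values (10 for a missed defective module, 1 for a false alarm) are hypothetical.

```python
from typing import Sequence


def total_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_fn: float = 10.0,  # hypothetical cost of missing a defective module
    cost_fp: float = 1.0,   # hypothetical cost of a false alarm on a clean module
) -> float:
    """Sum the misclassification costs; label 1 = defective, 0 = defect-free."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += cost_fn
        elif t == 0 and p == 1:
            cost += cost_fp
    return cost


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy data: 2 defective modules out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_clean = [0] * 10                     # misses both defective modules
cautious = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # catches both, raises 3 false alarms
```

On this toy data the always-clean predictor has the higher accuracy (0.8 versus 0.7) but also the higher total cost (20 versus 3), which is the reason a cost-sensitive method biases the classifier toward the defective class.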

An Enhanced Evolutionary Software Defect Prediction Method Using Island Moth Flame Optimization

Mathematics ◽

10.3390/math9151722 ◽

2021 ◽

Vol 9 (15) ◽

pp. 1722

Author(s):

Ruba Abu Khurma ◽

Hamad Alsawalqah ◽

Ibrahim Aljarah ◽

Mohamed Abd Elaziz ◽

Robertas Damaševičius

Keyword(s):

Swarm Intelligence ◽

Optimization Problems ◽

Prediction Method ◽

Search Space ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Running Time ◽

Software Defect ◽

Software Modules

Software defect prediction (SDP) is crucial in the early stages of defect-free software development before testing operations take place. Effective SDP can help test managers locate defects and defect-prone software modules. This facilitates the allocation of limited software quality assurance resources optimally and economically. Feature selection (FS) is a complicated problem with a polynomial time complexity. For a dataset with N features, the complete search space has 2^N feature subsets, which means that the algorithm needs an exponential running time to traverse all these feature subsets. Swarm intelligence algorithms have shown impressive performance in mitigating the FS problem and reducing the running time. The moth flame optimization (MFO) algorithm is a well-known swarm intelligence algorithm that has been used widely and proven its capability in solving various optimization problems. An efficient binary variant of MFO (BMFO) is proposed in this paper by using the island BMFO (IsBMFO) model. IsBMFO divides the solutions in the population into a set of sub-populations named islands. Each island is treated independently using a variant of BMFO. To increase the diversification capability of the algorithm, a migration step is performed after a specific number of iterations to exchange the solutions between islands. Twenty-one public software datasets are used for evaluating the proposed method. The results of the experiments show that FS using IsBMFO improves the classification results. IsBMFO followed by support vector machine (SVM) classification is the best model for the SDP problem over other compared models, with an average G-mean of 78%.
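
The size of the feature selection search space can be checked directly for a small N. The sketch below enumerates every subset of a hypothetical four-feature set (the feature names are illustrative and do not come from the paper) and shows why exhaustive search quickly becomes impractical. It does not implement MFO or IsBMFO.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def all_feature_subsets(features: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of the feature list, from the empty set to the full set."""
    for k in range(len(features) + 1):
        yield from combinations(features, k)


# Hypothetical software metrics used as features.
features = ["loc", "cyclomatic", "churn", "fan_in"]
subsets = list(all_feature_subsets(features))

# The number of subsets doubles with every added feature.
search_space_sizes = {n: 2 ** n for n in (4, 10, 20, 40)}
```

With four features there are 16 subsets; with 40 features there are more than 10^12, which is why heuristic searches such as swarm intelligence algorithms are used instead of full enumeration.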

A Feature Selection based Ensemble Classification Framework for Software Defect Prediction

International Journal of Modern Education and Computer Science ◽

10.5815/ijmecs.2019.09.06 ◽

2019 ◽

Vol 11 (9) ◽

pp. 54-64

Author(s):

Ahmed Iqbal ◽

Shabib Aftab ◽

Israr Ullah ◽

Muhammad Salman Bashir ◽

Muhammad Anwaar Saeed

Keyword(s):

Feature Selection ◽

Ensemble Classification ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Classification Framework

EMPIRICAL EVALUATION OF CLASSIFIERS FOR SOFTWARE RISK MANAGEMENT

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622009003715 ◽

2009 ◽

Vol 08 (04) ◽

pp. 749-767 ◽

Cited By ~ 37

Author(s):

YI PENG ◽

GANG KOU ◽

GUOXUN WANG ◽

HONGGANG WANG ◽

FRANZ I. S. KO

Keyword(s):

Risk Management ◽

Software Development ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Performance Metric ◽

Software Modules ◽

Software Risk Management ◽

Software Risk ◽

Prediction Techniques

Software development involves plenty of risks, and errors that exist in software modules represent a major kind of risk. Software defect prediction techniques and tools that identify software errors play a crucial role in software risk management. Among software defect prediction techniques, classification is a commonly used approach. Various types of classifiers have been applied to software defect prediction in recent years. How to select an adequate classifier (or set of classifiers) to identify error-prone software modules is an important task for software development organizations. There are many different measures for classifiers, and each measure is intended for assessing a different aspect of a classifier. This paper developed a performance metric that combines various measures to evaluate the quality of classifiers for software defect prediction. The performance metric is analyzed experimentally using 13 classifiers on 11 public domain software defect datasets. The results of the experiment indicate that support vector machines (SVM), the C4.5 algorithm, and the K-nearest-neighbor algorithm ranked as the top three classifiers.

Prediction of Defective Software Modules Using Class Imbalance Learning

Applied Computational Intelligence and Soft Computing ◽

10.1155/2016/7658207 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12 ◽

Cited By ~ 18

Author(s):

Divya Tomar ◽

Sonali Agarwal

Keyword(s):

Twin Support Vector Machine ◽

Defect Prediction ◽

Support Vector ◽

Prediction System ◽

Software Defect Prediction ◽

High Quality ◽

Misclassification Cost ◽

Software Products ◽

Software Defect ◽

Software Modules

Software defect predictors are useful for maintaining the high quality of software products effectively. The early prediction of defective software modules can help software developers allocate the available resources to deliver high quality software products. The objective of a software defect prediction system is to find as many defective software modules as possible without affecting the overall performance. The learning process of a software defect predictor is difficult due to the imbalanced distribution of software modules between defective and nondefective classes. Misclassifying defective software modules generally incurs a much higher cost than misclassifying nondefective ones. Therefore, considering the misclassification cost issue, we have developed a software defect prediction system using the Weighted Least Squares Twin Support Vector Machine (WLSTSVM). This system assigns a higher misclassification cost to the data samples of the defective class and a lower cost to the data samples of the nondefective class. The experiments on eight software defect prediction datasets have proved the validity of the proposed defect prediction system. The significance of the results has been tested via statistical analysis performed using the nonparametric Wilcoxon signed rank test.
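
One common way to give the minority (defective) class a higher weight is to make each class weight inversely proportional to its frequency. The sketch below uses the widely used "balanced" heuristic, weight = n_samples / (n_classes * class_count). This is an assumption made for illustration, since the abstract does not state how WLSTSVM sets its weights.

```python
from collections import Counter
from typing import Dict, List, Sequence


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Balanced heuristic: rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def sample_weights(labels: Sequence[int]) -> List[float]:
    """Expand the per-class weights into one weight per training sample."""
    weights = class_weights(labels)
    return [weights[y] for y in labels]


# Toy data: 2 defective (1) and 8 nondefective (0) modules.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
weights = class_weights(labels)
```

For this toy data the defective class receives weight 2.5 and the nondefective class 0.625, so each class contributes the same total weight to a weighted loss.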

Lessons Learned from the Assessment of Software Defect Prediction on WLCG Software: A Study with Unlabelled Datasets and Machine Learning Techniques

EPJ Web of Conferences ◽

10.1051/epjconf/202024505041 ◽

2020 ◽

Vol 245 ◽

pp. 05041

Author(s):

Elisabetta Ronchieri ◽

Marco Canaparo ◽

Mauro Belgiovine ◽

Davide Salomoni ◽

Barbara Martelli

Keyword(s):

Machine Learning ◽

Software Metrics ◽

Prediction Models ◽

Lessons Learned ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Software Modules

Software defect prediction is an activity that aims at narrowing down the most likely defect-prone software modules and helping developers and testers to prioritize inspection and testing. This activity can be addressed by using Machine Learning techniques applied to software metrics datasets that are usually unlabelled, i.e. they lack module classification in terms of defectiveness. To overcome this limitation, in addition to the usual data pre-processing operations to manage missing values and/or to remove inconsistencies, researchers have to adopt an approach to label their unlabelled software datasets. The extraction of defectiveness data to label all the instances of the datasets is an extremely time- and effort-consuming operation. In the literature, many studies have introduced approaches to build defect prediction models on unlabelled datasets. In this paper, we describe the analysis of new unlabelled datasets from WLCG software, coming from HEP-related experiments and middleware, by using Machine Learning techniques. We have experimented with new approaches to label the various modules due to the heterogeneity of software metrics distribution. We discuss a number of lessons learned from conducting these activities, what has worked, what has not and how our research can be improved.

The Comparison of Imbalanced Data Handling Method in Software Defect Prediction

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1049 ◽

2020 ◽

pp. 203-210

Author(s):

Khadijah Khadijah ◽

Priyo Sidik Sasongko

Keyword(s):

Software Testing ◽

Imbalanced Data ◽

Classification Algorithm ◽

Defect Prediction ◽

Software Defect Prediction ◽

Imbalance Problem ◽

Software Defect ◽

Software Modules ◽

Learning Machine ◽

Better Than

Software testing is a crucial process in the software development life cycle that affects software quality. However, testing is a tedious and resource-consuming task. Software testing can be conducted more efficiently by focusing this activity on software modules that are prone to defects. Therefore, automated software defect prediction is needed. This research implemented the Extreme Learning Machine (ELM) as the classification algorithm because of its simple training process and good generalization performance. Aside from the classification algorithm, the most important problem to be addressed is the imbalance between samples of the positive class (prone to defect) and the negative class. Such an imbalance could bias the performance of the classifier. Therefore, this research compared two approaches to handling the imbalance problem: SMOTE (a resampling method) and weighted-ELM (an algorithm-level method). The results of experiments using 10-fold cross validation on the NASA MDP dataset show that including imbalance handling when building a software defect prediction model increases the specificity and g-mean of the model. When the imbalance ratio is not very small, SMOTE is better than weighted-ELM. Otherwise, weighted-ELM is better than SMOTE in terms of sensitivity and g-mean, but worse in terms of specificity and accuracy.
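
The four measures named in this abstract can all be derived from a binary confusion matrix. The sketch below uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), g-mean = sqrt(sensitivity * specificity)). The label vectors are made-up toy data, not results from the study.

```python
from math import sqrt
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and g-mean (1 = defect-prone)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
    }


# Toy imbalanced data: 2 defect-prone modules out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
majority_only = evaluate(y_true, [0] * 10)
mixed = evaluate(y_true, [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
```

A predictor that always outputs the majority class reaches 0.8 accuracy here but a g-mean of 0, which illustrates why g-mean is reported alongside accuracy on imbalanced defect datasets.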

Voting Based Ensemble Classification for Software Defect Prediction

10.1109/mysurucon52639.2021.9641713 ◽

2021 ◽

Author(s):

Rohit John Jacob ◽

Rutuja J Kamat ◽

N M Sahithya ◽

Sharon Saji John ◽

Sahana P. Shankar

Keyword(s):

Ensemble Classification ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect
