Omni-Ensemble Learning (OEL): Utilizing Over-Bagging, Static and Dynamic Ensemble Selection Approaches for Software Defect Prediction

2018 ◽  
Vol 27 (06) ◽  
pp. 1850024 ◽  
Author(s):  
Reza Mousavi ◽  
Mahdi Eftekhari ◽  
Farhad Rahdari

Machine learning methods are becoming increasingly important in software engineering, as they can improve quality and testing efficiency by constructing models that predict defects in software modules. Existing datasets for software defect prediction suffer from imbalanced class distributions, which makes the learning problem harder. In this paper, we propose a novel approach that integrates Over-Bagging with static and dynamic ensemble selection strategies. Because the proposed method draws on most ensemble learning approaches, we call it Omni-Ensemble Learning (OEL). The approach exploits a new Over-Bagging method for class imbalance learning, in which the effect of three different ways of assigning weights to training samples is investigated. The method first selects the best classifiers, along with their combiner, for all test samples through a Genetic Algorithm as the static ensemble selection step. Then, a subset of the selected classifiers is chosen for each test sample as the dynamic ensemble selection step. Our experiments confirm that OEL provides better overall performance (in terms of G-mean, balance, and AUC) compared with six related works and six multiple classifier systems over seven NASA datasets. Based on these experimental results, we generally recommend OEL for improving the performance of software defect prediction and similar problems.
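
As an illustration of the Over-Bagging step described above, the following Python sketch balances each bootstrap bag by oversampling the minority (defective) class with replacement. The sample-weighting schemes, the GA-based static selection, and the per-sample dynamic selection from the paper are not reproduced here; the function names are illustrative, not taken from the paper.

```python
# Minimal Over-Bagging sketch: each bag bootstraps the majority class and
# oversamples the minority class until the bag is balanced, then a base
# classifier is trained on each bag and predictions are combined by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_balanced_bags(X, y, n_bags=10, rng=None):
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y == minority)
    bags = []
    for _ in range(n_bags):
        maj_sample = rng.choice(maj_idx, size=maj_idx.size, replace=True)  # usual bootstrap
        min_sample = rng.choice(min_idx, size=maj_idx.size, replace=True)  # oversample minority
        bags.append(np.concatenate([maj_sample, min_sample]))
    return bags

def fit_over_bagging(X, y, n_bags=10, rng=0):
    return [DecisionTreeClassifier().fit(X[idx], y[idx])
            for idx in build_balanced_bags(X, y, n_bags, rng)]

def predict_majority_vote(ensemble, X):
    votes = np.stack([clf.predict(X) for clf in ensemble])
    # majority vote over base classifiers, assuming integer class labels
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```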

2021 ◽  
Author(s):  
Yu Tang ◽  
Qi Dai ◽  
Mengyuan Yang ◽  
Lifang Chen

Abstract In traditional ensemble learning algorithms for software defect prediction, the base predictor has too many parameters that are difficult to optimize, so the model cannot reach its best performance. We propose an ensemble learning algorithm for software defect prediction that uses an improved sparrow search algorithm to optimize an extreme learning machine; the approach consists of three parts. First, the improved sparrow search algorithm (ISSA) is proposed to improve optimization ability and convergence speed, and its performance is evaluated on eight benchmark test functions. Second, ISSA is used to optimize the extreme learning machine (ISSA-ELM) to improve its prediction ability. Finally, the optimized learner is embedded in the Bagging algorithm (ISSA-ELM-Bagging), which improves the prediction performance of ELM on software defect datasets. Experiments are carried out on six groups of software defect datasets. The results show that the ISSA-ELM-Bagging ensemble learning algorithm is significantly better than the four comparison algorithms under six evaluation indexes (Precision, Recall, F-measure, MCC, Accuracy, and G-mean) and has better stability and generalization ability.
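
The sketch below gives the flavour of the base predictor in this approach: a minimal extreme learning machine whose output weights are solved in closed form, wrapped in a plain Bagging loop. The ISSA step that the paper uses to tune the hidden-layer parameters is omitted (the hidden layer is simply drawn at random), and the class and function names are assumptions, not the authors' implementation.

```python
# Minimal ELM + Bagging sketch (the ISSA optimization of the hidden layer is omitted).
import numpy as np

class SimpleELM:
    def __init__(self, n_hidden=50, rng=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(rng)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random input weights and biases: the parameters ISSA would optimize.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # hidden-layer activations
        self.beta = np.linalg.pinv(H) @ y         # output weights via pseudoinverse
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta > 0.5).astype(int)  # threshold for binary defect labels

def bagged_elm(X, y, n_estimators=10, rng=0):
    gen = np.random.default_rng(rng)
    models = []
    for _ in range(n_estimators):
        idx = gen.integers(0, len(X), size=len(X))            # bootstrap sample
        models.append(SimpleELM(rng=gen.integers(1 << 31)).fit(X[idx], y[idx]))
    return models
```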


Author(s):  
Kehan Gao ◽  
Taghi M. Khoshgoftaar ◽  
Amri Napolitano

Software defect prediction models that use software metrics such as code-level measurements and defect data to build classification models are useful tools for identifying potentially problematic program modules. The effectiveness of detecting such modules is affected by the software measurements used, making data preprocessing an important step during software quality prediction. Generally, two problems affect software measurement data: high dimensionality (where a training dataset has an extremely large number of independent attributes, or features) and class imbalance (where one class of a training dataset has many more members than the other). In this paper, we present a novel form of ensemble learning based on boosting that incorporates data sampling to alleviate class imbalance and feature (software metric) selection to address high dimensionality. As we adopt two different sampling methods (Random Undersampling (RUS) and the Synthetic Minority Oversampling Technique (SMOTE)), we have two forms of the new ensemble-based approach: selectRUSBoost and selectSMOTEBoost. To evaluate the effectiveness of these new techniques, we apply them to two groups of datasets from two real-world software systems. In the experiments, four learners and nine feature selection techniques are employed to build our models. We also consider versions of the technique that do not incorporate feature selection, and compare all four techniques (the two ensemble-based approaches that utilize feature selection and the two versions that use sampling only). The experimental results demonstrate that selectRUSBoost is generally more effective in improving defect prediction performance than selectSMOTEBoost, and that the techniques with feature selection achieve better prediction performance than the techniques without it.
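
A rough sketch of the selectRUSBoost idea, assuming scikit-learn's SelectKBest and imbalanced-learn's RUSBoostClassifier as stand-ins for the paper's nine feature selection techniques and four learners; the pipeline below is illustrative rather than the authors' implementation.

```python
# Feature selection followed by boosting with per-iteration random undersampling,
# in the spirit of selectRUSBoost.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from imblearn.ensemble import RUSBoostClassifier

select_rusboost = Pipeline([
    # keep the k software metrics most associated with the defect label
    ("feature_selection", SelectKBest(score_func=f_classif, k=10)),
    # boosting with random undersampling of the majority class at each round
    ("rusboost", RUSBoostClassifier(n_estimators=50, random_state=0)),
])

# usage (X: module metrics, y: defect labels):
# select_rusboost.fit(X_train, y_train)
# predictions = select_rusboost.predict(X_test)
```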

