Integrasi SMOTE pada Naive Bayes dan Logistic Regression Berbasis Particle Swarm Optimization untuk Prediksi Cacat Perangkat Lunak [Integration of SMOTE into Naive Bayes and Logistic Regression Based on Particle Swarm Optimization for Software Defect Prediction]

2021 ◽  
Vol 5 (1) ◽  
pp. 233
Author(s):  
Andre Hardoni ◽  
Dian Palupi Rini ◽  
Sukemi Sukemi

Software defects are one of the main contributors to information-technology waste and lead to rework, consuming a great deal of time and money. Software defect prediction aims at defect prevention by classifying modules as defective or not defective. Many researchers have worked on software defect prediction using the public NASA MDP datasets, but these datasets still have shortcomings such as class imbalance and noise attributes. The class imbalance problem can be addressed with SMOTE (Synthetic Minority Over-sampling Technique), and the noise-attribute problem can be mitigated by selecting features with Particle Swarm Optimization (PSO). In this research, the integration of SMOTE and PSO is therefore applied to the naïve Bayes and logistic regression machine-learning classifiers. Experiments on 8 NASA MDP datasets, each split into training and testing data, show that the SMOTE + PSO integration improves classification performance for both classifiers, with average AUC (Area Under Curve) values of 0.89 for logistic regression and 0.86 for naïve Bayes on the training data, in both cases better than either classifier without the combined preprocessing.
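The abstract does not include the authors' code, but the SMOTE step it relies on can be sketched in a few lines: each synthetic minority sample is an interpolation between a minority seed and one of its k nearest minority neighbours. The function name `smote`, the toy data, and k = 3 here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by interpolating each
    chosen seed toward one of its k nearest minority neighbours (SMOTE)."""
    rng = np.random.default_rng(rng)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from the chosen seed to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neigh = np.argsort(d)[1:k + 1]       # skip the seed itself
        j = rng.choice(neigh)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

# toy imbalanced data: 20 majority vs. 5 minority samples in 2-D
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(20, 2))
X_min = rng.normal(3.0, 0.5, size=(5, 2))
X_new = smote(X_min, n_new=15, k=3, rng=1)   # 5 + 15 = 20, now balanced
X_bal = np.vstack([X_maj, X_min, X_new])
print(X_bal.shape)  # (40, 2)
```

In the paper's pipeline this rebalanced training set would then be fed to a PSO feature-selection loop before training naïve Bayes or logistic regression; that loop is omitted here.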

2020 ◽  
Vol 10 (23) ◽  
pp. 8324
Author(s):  
Yumei Wu ◽  
Jingxiu Yao ◽  
Shuo Chang ◽  
Bin Liu

Software defect prediction (SDP) is an effective technique for lowering software module testing costs. However, imbalanced class distributions exist in almost all SDP datasets and restrict the accuracy of defect prediction. In order to balance the data distribution reasonably, we propose a novel resampling method, LIMCR, built on Naïve Bayes, to optimize and improve SDP performance. The main idea of LIMCR is to evaluate how informative each sample from the majority class is and then remove the less-informative majority samples to rebalance the data distribution. We employ 29 SDP datasets from the PROMISE and NASA repositories and divide them into two groups: small datasets (fewer than 1,100 samples) and large datasets (1,100 or more). We then conduct experiments comparing combinations of classifiers and imbalance-learning methods on the small and large datasets, respectively. The results show the effectiveness of LIMCR: LIMCR + GNB performs better than other methods on small datasets, but offers no clear advantage on large ones.
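The paper's LIMCR method is not reproduced here, but its core idea, scoring majority-class samples with a naive-Bayes model and dropping the ones the model is most certain about (the least informative), can be sketched as follows. The Gaussian likelihood, the `margin` score, and the stopping rule (keep exactly as many majority samples as minority samples) are simplifying assumptions for a binary 0/1-labelled problem, not the authors' exact criterion.

```python
import numpy as np

def gauss_loglik(X, mu, var):
    """Per-sample log-likelihood under an independent (naive) Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)

def undersample_less_informative(X, y, majority=0):
    """LIMCR-style sketch: remove the majority samples that a naive-Bayes
    model is most certain about, keeping only the least-certain ones."""
    X_maj, X_min = X[y == majority], X[y != majority]
    mu0, v0 = X_maj.mean(0), X_maj.var(0) + 1e-9   # majority-class Gaussian
    mu1, v1 = X_min.mean(0), X_min.var(0) + 1e-9   # minority-class Gaussian
    # margin: how much more likely the majority model finds each sample;
    # a large margin means the sample is easy, hence less informative
    margin = gauss_loglik(X_maj, mu0, v0) - gauss_loglik(X_maj, mu1, v1)
    keep = np.argsort(margin)[:len(X_min)]         # keep least-certain ones
    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.array([majority] * len(keep) + [1 - majority] * len(X_min))
    return X_bal, y_bal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (10, 2))])
y = np.array([0] * 30 + [1] * 10)
Xb, yb = undersample_less_informative(X, y)
print(np.bincount(yb))  # [10 10]
```

The samples kept near the decision boundary are exactly the ones a downstream classifier learns the most from, which is the intuition the abstract describes.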


Author(s):  
Andre Alvi Agustian ◽  
Achmad Bisri

Credit approval is a process carried out by a bank or credit-provider company on the basis of credit requests and proposals from borrowers. It is often difficult for banks or credit providers, given the number of requests and the classifications that must be made over the varied data submitted. This study aims to enable banks or credit-card issuers to carry out the credit approval process effectively and accurately when determining the status of each submission. The research uses data mining techniques on the Credit Approval dataset from the UCI Machine Learning Repository, which exhibits class imbalance; 14 attributes are used as system inputs. The C4.5 and Naive Bayes algorithms are optimized with Sample Bootstrapping and Particle Swarm Optimization (PSO) so that the results achieve good accuracy and fall into the good-classification range. With this optimization, the accuracy of C4.5 rises from 85.99% (AUC 0.904) to 94.44% (AUC 0.969), and that of Naive Bayes rises from 83.09% (AUC 0.916) to 90.10% (AUC 0.944).
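Sample Bootstrapping, as named in the abstract, is at heart resampling with replacement. A minimal sketch of one plausible reading, bootstrapping every class up to the size of the largest class so the training set is balanced before C4.5 or Naive Bayes is fit, might look like this (the function name and toy data are illustrative, not the authors' exact procedure):

```python
import numpy as np

def bootstrap_balance(X, y, rng=None):
    """Resample every class with replacement up to the size of the
    largest class, yielding a balanced bootstrapped training set."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=True)
        for c in classes
    ])
    return X[idx], y[idx]

# toy credit data: 25 approved (0) vs. 5 denied (1) applications, 3 features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (25, 3)), rng.normal(2, 1, (5, 3))])
y = np.array([0] * 25 + [1] * 5)
Xb, yb = bootstrap_balance(X, y, rng=1)
print(np.bincount(yb))  # [25 25]
```

The PSO step in the paper would then tune the classifier (e.g., its feature subset or weights) on this balanced sample; that search loop is not shown here.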


2020 ◽  
Vol 8 (5) ◽  
pp. 2605-2613

With the exponential growth of information technology, quality-driven software development is in high demand. An important factor to focus on during development is detecting software defects at an early stage: failure to detect hidden faults affects the effectiveness and quality of the software's use and maintenance. Traditional software defect prediction models involve projects with the same metrics in the prediction process. In recent years, an active topic has been Cross-Project Defect Prediction (CPDP), which predicts defects in one software project from the datasets of other projects. Still, traditional CPDP approaches also require common metrics between the two projects' datasets to construct the prediction technique; if cross-project datasets with different metrics must be used, these methods become infeasible. To overcome these issues in software defect prediction with heterogeneous cross-project datasets, this paper introduces Boosted Relief Feature Subset Selection (BRFSS) to handle two projects with heterogeneous feature sets. BRFSS employs a mapping approach to embed the data from the two domains into a comparable feature space of lower dimension; a similarity measure over the mapped domains then drives the prediction process. This work uses five software groups with six datasets to perform heterogeneous cross-project defect prediction using a firefly-based particle swarm optimization, in which the firefly algorithm is induced into particle swarm optimization to produce optimal predictions in the heterogeneous setting. The simulation results, compared with other standard models, demonstrate the efficiency of the firefly-enabled particle swarm optimization.
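The mapping step that embeds two projects with different metric sets into a comparable lower-dimensional space can be illustrated with a per-project SVD projection. This is only a sketch of the idea; the paper's actual BRFSS mapping and similarity measure are not specified in the abstract, so the `embed` function, the shared dimension k = 3, and the toy metric counts are all assumptions.

```python
import numpy as np

def embed(X, k):
    """Project a project's metric matrix onto its top-k principal
    directions, so two projects with different metric sets end up in
    the same k-dimensional space where they can be compared."""
    Xc = X - X.mean(0)                       # center each metric
    # rows of Vt are principal axes; keep the k strongest
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X_src = rng.normal(size=(40, 9))   # source project: 9 metrics
X_tgt = rng.normal(size=(30, 5))   # target project: 5 different metrics
Z_src, Z_tgt = embed(X_src, 3), embed(X_tgt, 3)
print(Z_src.shape, Z_tgt.shape)    # (40, 3) (30, 3)
```

Once both projects live in the shared 3-dimensional space, a similarity measure between the mapped domains (and, in the paper, the firefly-enabled PSO search) can operate on comparable representations.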

