Integrasi SMOTE pada Naive Bayes dan Logistic Regression Berbasis Particle Swarm Optimization untuk Prediksi Cacat Perangkat Lunak [Integration of SMOTE into Naive Bayes and Logistic Regression Based on Particle Swarm Optimization for Software Defect Prediction]

2021 ◽  
Vol 5 (1) ◽  
pp. 233
Author(s):  
Andre Hardoni ◽  
Dian Palupi Rini ◽  
Sukemi Sukemi

Software defects are one of the main contributors to information-technology waste and lead to rework, consuming a great deal of time and money. Software defect prediction aims at defect prevention by classifying modules as defective or not defective. Many researchers have worked on software defect prediction using the public NASA MDP datasets, but these datasets still have shortcomings such as class imbalance and noise attributes. The class imbalance problem can be addressed with SMOTE (Synthetic Minority Over-sampling Technique), and the noise-attribute problem can be mitigated by selecting features with Particle Swarm Optimization (PSO). In this research, the integration of SMOTE and PSO is therefore applied to the naïve Bayes and logistic regression machine-learning classifiers. Experiments on 8 NASA MDP datasets, each split into training and testing data, show that the SMOTE + PSO integration improves classification performance for both classifiers, with average AUC (Area Under Curve) values of 0.89 for logistic regression and 0.86 for naïve Bayes on the training data, in both cases better than either classifier without the combined preprocessing.
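The abstract does not include the authors' code, but the SMOTE step it relies on can be sketched in a few lines: each synthetic minority sample is an interpolation between a minority seed and one of its k nearest minority neighbours. The function name `smote`, the toy data, and k = 3 here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by interpolating each
    chosen seed toward one of its k nearest minority neighbours (SMOTE)."""
    rng = np.random.default_rng(rng)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from the chosen seed to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neigh = np.argsort(d)[1:k + 1]       # skip the seed itself
        j = rng.choice(neigh)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

# toy imbalanced data: 20 majority vs. 5 minority samples in 2-D
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(20, 2))
X_min = rng.normal(3.0, 0.5, size=(5, 2))
X_new = smote(X_min, n_new=15, k=3, rng=1)   # 5 + 15 = 20, now balanced
X_bal = np.vstack([X_maj, X_min, X_new])
print(X_bal.shape)  # (40, 2)
```

In the paper's pipeline this rebalanced training set would then be fed to a PSO feature-selection loop before training naïve Bayes or logistic regression; that loop is omitted here.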

2020 ◽  
Vol 10 (23) ◽  
pp. 8324
Author(s):  
Yumei Wu ◽  
Jingxiu Yao ◽  
Shuo Chang ◽  
Bin Liu

Software defect prediction (SDP) is an effective technique for lowering software module testing costs. However, imbalanced class distributions exist in almost all SDP datasets and restrict the accuracy of defect prediction. In order to balance the data distribution reasonably, we propose a novel resampling method, LIMCR, built on Naïve Bayes, to optimize and improve SDP performance. The main idea of LIMCR is to evaluate how informative each sample from the majority class is and then remove the less-informative majority samples to rebalance the data distribution. We employ 29 SDP datasets from the PROMISE and NASA repositories and divide them into two groups: small datasets (fewer than 1,100 samples) and large datasets (1,100 or more). We then conduct experiments comparing combinations of classifiers and imbalance-learning methods on the small and large datasets, respectively. The results show the effectiveness of LIMCR: LIMCR + GNB performs better than other methods on small datasets, but offers no clear advantage on large ones.
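The paper's LIMCR method is not reproduced here, but its core idea, scoring majority-class samples with a naive-Bayes model and dropping the ones the model is most certain about (the least informative), can be sketched as follows. The Gaussian likelihood, the `margin` score, and the stopping rule (keep exactly as many majority samples as minority samples) are simplifying assumptions for a binary 0/1-labelled problem, not the authors' exact criterion.

```python
import numpy as np

def gauss_loglik(X, mu, var):
    """Per-sample log-likelihood under an independent (naive) Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)

def undersample_less_informative(X, y, majority=0):
    """LIMCR-style sketch: remove the majority samples that a naive-Bayes
    model is most certain about, keeping only the least-certain ones."""
    X_maj, X_min = X[y == majority], X[y != majority]
    mu0, v0 = X_maj.mean(0), X_maj.var(0) + 1e-9   # majority-class Gaussian
    mu1, v1 = X_min.mean(0), X_min.var(0) + 1e-9   # minority-class Gaussian
    # margin: how much more likely the majority model finds each sample;
    # a large margin means the sample is easy, hence less informative
    margin = gauss_loglik(X_maj, mu0, v0) - gauss_loglik(X_maj, mu1, v1)
    keep = np.argsort(margin)[:len(X_min)]         # keep least-certain ones
    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.array([majority] * len(keep) + [1 - majority] * len(X_min))
    return X_bal, y_bal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (10, 2))])
y = np.array([0] * 30 + [1] * 10)
Xb, yb = undersample_less_informative(X, y)
print(np.bincount(yb))  # [10 10]
```

The samples kept near the decision boundary are exactly the ones a downstream classifier learns the most from, which is the intuition the abstract describes.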


Author(s):  
Andre Alvi Agustian ◽  
Achmad Bisri

Credit approval is a process carried out by a bank or credit-provider company on the basis of credit requests and proposals from borrowers. It is often difficult for banks or credit providers, given the number of requests and the classifications that must be made over the varied data submitted. This study aims to enable banks or credit-card issuers to carry out the credit approval process effectively and accurately when determining the status of each submission. The research uses data mining techniques on the Credit Approval dataset from the UCI Machine Learning Repository, which exhibits class imbalance; 14 attributes are used as system inputs. The C4.5 and Naive Bayes algorithms are optimized with Sample Bootstrapping and Particle Swarm Optimization (PSO) so that the results achieve good accuracy and fall into the good-classification range. With this optimization, the accuracy of C4.5 rises from 85.99% (AUC 0.904) to 94.44% (AUC 0.969), and that of Naive Bayes rises from 83.09% (AUC 0.916) to 90.10% (AUC 0.944).
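Sample Bootstrapping, as named in the abstract, is at heart resampling with replacement. A minimal sketch of one plausible reading, bootstrapping every class up to the size of the largest class so the training set is balanced before C4.5 or Naive Bayes is fit, might look like this (the function name and toy data are illustrative, not the authors' exact procedure):

```python
import numpy as np

def bootstrap_balance(X, y, rng=None):
    """Resample every class with replacement up to the size of the
    largest class, yielding a balanced bootstrapped training set."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=True)
        for c in classes
    ])
    return X[idx], y[idx]

# toy credit data: 25 approved (0) vs. 5 denied (1) applications, 3 features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (25, 3)), rng.normal(2, 1, (5, 3))])
y = np.array([0] * 25 + [1] * 5)
Xb, yb = bootstrap_balance(X, y, rng=1)
print(np.bincount(yb))  # [25 25]
```

The PSO step in the paper would then tune the classifier (e.g., its feature subset or weights) on this balanced sample; that search loop is not shown here.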


2020 ◽  
Vol 8 (5) ◽  
pp. 2605-2613

With the exponential growth of information technology, quality-driven software development is in high demand. An important factor to focus on during development is detecting software defects at an early stage: failure to detect hidden faults affects the effectiveness and quality of the software's use and maintenance. Traditional software defect prediction models involve projects with the same metrics in the prediction process. In recent years, an active topic has been Cross-Project Defect Prediction (CPDP), which predicts defects in one software project from the datasets of other projects. Still, traditional CPDP approaches also require common metrics between the two projects' datasets to construct the prediction technique; if cross-project datasets with different metrics must be used, these methods become infeasible. To overcome these issues in software defect prediction with heterogeneous cross-project datasets, this paper introduces Boosted Relief Feature Subset Selection (BRFSS) to handle two projects with heterogeneous feature sets. BRFSS employs a mapping approach to embed the data from the two domains into a comparable feature space of lower dimension; a similarity measure over the mapped domains then drives the prediction process. This work uses five software groups with six datasets to perform heterogeneous cross-project defect prediction using a firefly-based particle swarm optimization, in which the firefly algorithm is induced into particle swarm optimization to produce optimal predictions in the heterogeneous setting. The simulation results, compared with other standard models, demonstrate the efficiency of the firefly-enabled particle swarm optimization.
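The mapping step that embeds two projects with different metric sets into a comparable lower-dimensional space can be illustrated with a per-project SVD projection. This is only a sketch of the idea; the paper's actual BRFSS mapping and similarity measure are not specified in the abstract, so the `embed` function, the shared dimension k = 3, and the toy metric counts are all assumptions.

```python
import numpy as np

def embed(X, k):
    """Project a project's metric matrix onto its top-k principal
    directions, so two projects with different metric sets end up in
    the same k-dimensional space where they can be compared."""
    Xc = X - X.mean(0)                       # center each metric
    # rows of Vt are principal axes; keep the k strongest
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X_src = rng.normal(size=(40, 9))   # source project: 9 metrics
X_tgt = rng.normal(size=(30, 5))   # target project: 5 different metrics
Z_src, Z_tgt = embed(X_src, 3), embed(X_tgt, 3)
print(Z_src.shape, Z_tgt.shape)    # (40, 3) (30, 3)
```

Once both projects live in the shared 3-dimensional space, a similarity measure between the mapped domains (and, in the paper, the firefly-enabled PSO search) can operate on comparable representations.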

