A Comparative Study of Bug Classification Algorithms

The performance of ten classic algorithms for classifying software bugs across different bug repositories is compared. The algorithms included in the study are Naïve Bayes, Naïve Bayes Multinomial, Discriminative Multinomial Naïve Bayes (DMNB), J48, Support Vector Machine, Radial Basis Function (RBF) Neural Network, Classification using Clustering, Classification using Regression, Adaptive Boosting (AdaBoost) and Bagging. These algorithms are applied to four open-source bug repositories, namely Android, JBoss-Seam, Mozilla and MySQL. The classification is evaluated using 10-fold cross-validation. Accuracy and F-measure are compared for all of the algorithms. The concept of a software bug taxonomy hierarchy is also introduced, with eleven standard bug categories (classes). The comparative study also covers the effect of the number of categories on classifier performance in terms of accuracy and F-measure. The results are presented in tabular and graphical form.
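
The 10-fold cross-validation protocol mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the study's setup: the labels are made-up bug categories, the "classifier" is a hypothetical majority-class baseline, the folds are contiguous rather than shuffled or stratified, and the sketch assumes at least as many samples as folds.

```python
from collections import Counter

def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(labels, train_and_predict, k=10):
    """Return the mean accuracy over k folds.

    train_and_predict(train_idx, test_idx) must return one predicted
    label per index in test_idx.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predicted = train_and_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Toy data: hypothetical bug categories, not taken from any repository.
labels = ["gui", "logic", "gui", "gui", "network"] * 4  # 20 samples

def majority_baseline(train_idx, test_idx):
    # Placeholder "classifier": always predicts the most common training label.
    most_common = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [most_common] * len(test_idx)

mean_accuracy = cross_validate(labels, majority_baseline, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.2f}")
```

Each sample is used for testing exactly once and for training in the other nine folds, which is what makes the averaged accuracy less dependent on a single train/test split.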

COMPARATIVE STUDY OF CLASSIFICATION ALGORITHMS: HOLDOUTS AS ACCURACY ESTIMATION

CogITo Smart Journal ◽

10.31154/cogito.v1i1.2.13-23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Debby Erce Sondakh

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Decision Rules ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Accuracy Estimation ◽

F Measure

This study aims to measure and compare the performance of five machine-learning-based text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison focuses on the effectiveness of the algorithms, that is, their ability to classify documents into the correct category, using the holdout (percentage split) method. Effectiveness is measured by precision, recall, F-measure, and accuracy. The experimental results show that for naïve Bayes, the larger the percentage of training documents, the higher the accuracy of the resulting model. The highest accuracy was obtained at a 90/10 split for naïve Bayes, 80/20 for SVM, and 70/30 for decision tree. The results also show that naïve Bayes has the highest effectiveness among the five algorithms tested and the fastest classification model build time, 0.02 seconds. Decision tree classifies text documents with higher accuracy than SVM, but its model build time is slower. In terms of model build time, k-NN is the fastest, but its accuracy is lower.
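
The holdout (percentage split) procedure and the precision, recall and F-measure named in the abstract can be illustrated with a short sketch. The function names, the seed, the stand-in documents and the toy labels are all assumptions made for illustration; the study's own tooling and data are not described in the abstract.

```python
import random

def holdout_split(items, train_fraction, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Stand-ins for 100 text documents, split at the three ratios in the abstract.
documents = list(range(100))
split_sizes = {}
for fraction in (0.7, 0.8, 0.9):
    train, test = holdout_split(documents, fraction)
    split_sizes[fraction] = (len(train), len(test))

# Hypothetical true and predicted categories for five test documents.
true_labels = ["sports", "politics", "sports", "sports", "politics"]
predicted = ["sports", "sports", "sports", "politics", "politics"]
p, r, f1 = precision_recall_f1(true_labels, predicted, positive="sports")
```

Unlike cross-validation, a holdout estimate depends on one particular split, so results at 70/30, 80/20 and 90/10 can differ for reasons unrelated to the algorithm itself.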

Energy Management in Wireless Sensor Networks Based on Naive Bayes, MLP, and SVM Classifications: A Comparative Study

Journal of Sensors ◽

10.1155/2016/6250319 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Abdulaziz Y. Barnawi ◽

Ismail M. Keshta

Keyword(s):

Energy Efficiency ◽

Wireless Sensor Networks ◽

Sensor Networks ◽

Comparative Study ◽

Energy Management ◽

Naive Bayes ◽

Naïve Bayes ◽

Wireless Sensor ◽

Support Vector ◽

Multilayer Perceptrons

Maximizing wireless sensor networks (WSNs) lifetime is a primary objective in the design of these networks. Intelligent energy management models can assist designers to achieve this objective. These models aim to reduce the number of selected sensors to report environmental measurements and, hence, achieve higher energy efficiency while maintaining the desired level of accuracy in the reported measurement. In this paper, we present a comparative study of three intelligent models based on Naive Bayes, Multilayer Perceptrons (MLP), and Support Vector Machine (SVM) classifiers. Simulation results show that Linear-SVM selects sensors that produce higher energy efficiency compared to those selected by MLP and Naive Bayes for the same WSNs Lifetime Extension Factor.

Comparative Study of Support Vector Machine and Naïve Bayes Classification Algorithm on Amazon Data

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v67i12p106 ◽

2019 ◽

Vol 67 (12) ◽

pp. 24-27

Author(s):

Priyanka Tyagi ◽

Tripathi R.C

Keyword(s):

Support Vector Machine ◽

Comparative Study ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Algorithm ◽

Support Vector ◽

Naive Bayes Classification ◽

Naïve Bayes Classification

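Pemodelan Prediksi Status Keberlanjutan Polis Asuransi Kendaraan dengan Teknik Pemilihan Mayoritas Menggunakan Algoritma-Algoritma Klasifikasi Data Mining [Prediction Modeling of Vehicle Insurance Policy Continuation Status with Majority Voting Using Data Mining Classification Algorithms]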

Prosiding Seminar Nasional Teknoka ◽

10.22236/teknoka.v5i.391 ◽

2020 ◽

Vol 5 ◽

pp. 19-24

Author(s):

Dyah Retno Utari ◽

Arief Wibowo

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Majority Voting ◽

Support Vector ◽

F Measure

Motor vehicle insurance is a line of business that covers losses or the risk of damage arising from the various events that can befall a vehicle. Competition in the insurance business, particularly for motor vehicles, demands innovation and strategy to keep the business sustainable. One step a company can take is to predict the continuation status of vehicle insurance policies by analyzing customer profile and transaction data. Predicting policyholders' decisions is very important for the company because it can inform marketing strategies that influence customers' decisions to renew their policies. This study proposes a model for predicting the continuation status of vehicle insurance policies using majority voting over the classification results of data mining algorithms such as Naive Bayes, Support Vector Machine, and Decision Tree. Testing with a confusion matrix shows a best accuracy of 93.57%, with precision reaching 97.20%, recall 95.20%, and F-measure 95.30%. The best evaluation scores were obtained with the majority voting approach, which outperformed prediction models based on a single classifier.
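
Majority voting over several classifiers, as described in the abstract, can be sketched as below. The three prediction lists and the "renew"/"lapse" labels are invented for illustration; the abstract does not say how ties are broken, so this sketch simply takes the most common label per sample.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    predictions_per_model holds one list of predicted labels per model,
    all of the same length and in the same sample order.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three single classifiers on four policies.
naive_bayes = ["renew", "lapse", "renew", "renew"]
svm = ["renew", "renew", "lapse", "renew"]
decision_tree = ["lapse", "lapse", "renew", "renew"]

ensemble = majority_vote([naive_bayes, svm, decision_tree])
print(ensemble)
```

With an odd number of models and two classes, a tie cannot occur, which is one reason three base classifiers are a convenient choice for binary problems.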

Exploration of Lymph Node-Negative Breast Cancers by Support Vector Machines, Naïve Bayes, and Decision Trees: A Comparative Study

Handbook of Artificial Intelligence in Biomedical Engineering ◽

10.1201/9781003045564-23 ◽

2020 ◽

pp. 509-524

Author(s):

J. Satya Eswari ◽

Pradeep Singh

Keyword(s):

Lymph Node ◽

Support Vector Machines ◽

Comparative Study ◽

Decision Trees ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Breast Cancers ◽

Node Negative ◽

Vector Machines

Real Time Smartphone Data for Prediction of Nomophobia Severity using Supervised Machine Learning

10.21467/proceedings.114.11 ◽

2021 ◽

Author(s):

Anshika Arora ◽

Pinaki Chakraborty ◽

M.P.S. Bhatia

Keyword(s):

Machine Learning ◽

Real Time ◽

Undergraduate Students ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

F Measure

Excessive use of smartphones throughout the day, with dependency on them for social interaction, entertainment and information retrieval, may lead users to develop nomophobia, which makes them feel anxious when their smartphone is unavailable. This study describes the usefulness of real-time smartphone usage data for predicting nomophobia severity using machine learning. Data was collected from 141 undergraduate students, capturing their perception of their smartphone use with the Nomophobia Questionnaire (NMP-Q) and their real-time smartphone usage patterns with a purpose-built Android application. Supervised machine learning models including Random Forest, Decision Tree, Support Vector Machines, Naïve Bayes and K-Nearest Neighbor are trained using two feature sets: the first comprises only the NMP-Q features, and the second comprises real-time smartphone usage features along with the NMP-Q features. Performance of these models is evaluated using F-measure and area under the ROC curve, and all the models perform better when provided with smartphone usage features along with the NMP-Q features. Naïve Bayes outperforms the other models in predicting nomophobia, achieving an F-measure of 0.891 and an ROC area of 0.933.
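
The area under the ROC curve used as an evaluation measure above can be computed, for the binary case, as the fraction of positive/negative pairs that a classifier ranks correctly. The sketch below uses that pairwise definition with made-up scores; how the study aggregated ROC area across severity levels is not stated in the abstract, so the binary setting is an assumption.

```python
def roc_auc(scores, labels):
    """Binary ROC AUC via pairwise comparison.

    scores: classifier scores, higher meaning "more likely positive".
    labels: 1 for positive, 0 for negative.
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Hypothetical scores for five samples (three positive, two negative).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every positive is scored above every negative, while 0.5 corresponds to random ranking.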

A Comparative Study of Support Vector Machine and Naive Bayes Classifier for Sentiment Analysis on Amazon Product Reviews

2020 International Conference on Contemporary Computing and Applications (IC3A) ◽

10.1109/ic3a48958.2020.233300 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sanjay Dey ◽

Sarhan Wasif ◽

Dhiman Sikder Tonmoy ◽

Subrina Sultana ◽

Jayjeet Sarkar ◽

...

Keyword(s):

Support Vector Machine ◽

Comparative Study ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Product Reviews ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Multiclass Severity Classification for Software Bugs Using Support Vector Machine, K-Nearest Neighbor, Decision Tree and Naïve Bayes

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9348 ◽

2020 ◽

Vol 17 (11) ◽

pp. 5109-5112

Author(s):

Raj Kumar ◽

Sanjay Singla

Keyword(s):

Decision Tree ◽

Software Development ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Data Mining Algorithm ◽

K Nearest Neighbor ◽

Software Bugs

During software development, almost 30–35 percent of the cost is due to testing. This means that if a bug passes from one phase to succeeding phases without detection, it will increase the cost of software development, and software quality may be compromised as a result. The use of data mining algorithms for software bug classification is therefore valuable. Bug severity may be categorised into S1, S2, S3, S4 and S5, depending on the impact of the bug. In this paper, multiclass classification of bug severity is performed using SVM, KNN, Decision Tree and Naïve Bayes. A comparative analysis of these algorithms is carried out with respect to accuracy, precision, recall and execution time.
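
For a five-level severity problem such as S1 to S5, precision and recall are usually computed per class from a confusion matrix and then averaged. The sketch below uses macro averaging on invented labels; the abstract does not say which averaging scheme the paper used, so that choice is an assumption.

```python
from collections import defaultdict

SEVERITIES = ["S1", "S2", "S3", "S4", "S5"]

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs."""
    matrix = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[(t, p)] += 1
    return matrix

def macro_precision_recall(matrix, classes):
    """Average per-class precision and recall with equal class weight."""
    precisions, recalls = [], []
    for c in classes:
        tp = matrix[(c, c)]
        predicted_as_c = sum(matrix[(t, c)] for t in classes)
        actually_c = sum(matrix[(c, p)] for p in classes)
        precisions.append(tp / predicted_as_c if predicted_as_c else 0.0)
        recalls.append(tp / actually_c if actually_c else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)

# Hypothetical severities for eight bug reports.
true_sev = ["S1", "S1", "S2", "S3", "S3", "S4", "S5", "S5"]
pred_sev = ["S1", "S2", "S2", "S3", "S3", "S4", "S5", "S4"]

matrix = confusion_matrix(true_sev, pred_sev)
macro_p, macro_r = macro_precision_recall(matrix, SEVERITIES)
accuracy = sum(matrix[(c, c)] for c in SEVERITIES) / len(true_sev)
```

Macro averaging gives rare severities the same weight as common ones, which matters when critical bugs are a small share of the reports.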

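PENERAPAN METODE ENSEMBLE UNTUK MENINGKATKAN KINERJA ALGORITME KLASIFIKASI PADA IMBALANCED DATASET [Applying an Ensemble Method to Improve the Performance of Classification Algorithms on Imbalanced Datasets]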

Jurnal Teknoinfo ◽

10.33365/jti.v13i1.184 ◽

2019 ◽

Vol 13 (1) ◽

pp. 11

Author(s):

Yoga Pristyanto

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Imbalanced Dataset ◽

Adaptive Boosting

In data mining, researchers often pay no attention to the balance of the class distribution in a dataset. This can cause serious difficulties for classification algorithms: in theory, most classifiers assume a relatively balanced distribution, so an imbalanced one leads to suboptimal performance. This study therefore applies an ensemble method with adaptive boosting to address the problem. The tests carried out in this study show that the ensemble method with adaptive boosting can improve the performance of classification algorithms. Naive Bayes with Adaptive Boosting achieved an accuracy of 91.98%, sensitivity of 91.98%, specificity of 96.49%, and g-mean of 94.21%. Support Vector Machine with Adaptive Boosting achieved an accuracy of 91.52%, sensitivity of 91.52%, specificity of 96.29%, and g-mean of 93.88%. Decision Tree with Adaptive Boosting achieved an accuracy of 94.37%, sensitivity of 94.37%, specificity of 97.73%, and g-mean of 96.03%. This indicates that the ensemble method with Adaptive Boosting can be a solution for improving algorithm performance on imbalanced datasets.

Keywords: adaptive boosting, data mining, ensemble, class imbalance, classification.
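
The g-mean figures in the abstract are consistent with the common definition of g-mean as the geometric mean of sensitivity and specificity. The sketch below recomputes it from the reported sensitivity and specificity; that the paper used exactly this formula is an assumption, and small differences in the last digit are expected because the inputs are already rounded.

```python
import math

def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (same units as inputs)."""
    return math.sqrt(sensitivity * specificity)

# (sensitivity %, specificity %, reported g-mean %) as given in the abstract.
reported = {
    "Naive Bayes + AdaBoost": (91.98, 96.49, 94.21),
    "SVM + AdaBoost": (91.52, 96.29, 93.88),
    "Decision Tree + AdaBoost": (94.37, 97.73, 96.03),
}

recomputed = {
    name: g_mean(sens, spec) for name, (sens, spec, _) in reported.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.3f}%, reported {reported[name][2]}%")
```

Because g-mean is low whenever either sensitivity or specificity is low, it penalises classifiers that do well only on the majority class, which is why it is often reported for imbalanced datasets.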

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE [SMS Spam Classification Using Support Vector Machine]

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques

It is now common for a cellphone to receive spam messages. The large number of messages received makes it difficult for a human to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
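
Of the three techniques compared above, Multinomial Naïve Bayes is the simplest to show end to end. The following is a minimal sketch with add-one (Laplace) smoothing on four invented messages; the tokenisation by whitespace, the smoothing choice and the data are assumptions, and the paper's actual preprocessing and implementation are not described in the abstract.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(messages, labels):
    """Return (log priors, per-class word counts, vocabulary)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return log_priors, word_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log posterior for the message."""
    log_priors, word_counts, vocabulary = model
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total_words = sum(word_counts[label].values())
        score = log_prior
        for word in text.lower().split():
            if word in vocabulary:  # words unseen in training are ignored
                # Add-one smoothing keeps unseen class/word pairs non-zero.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages, not from the paper's dataset.
messages = [
    "win a free prize now",
    "free cash offer",
    "are we meeting today",
    "see you at lunch today",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_multinomial_nb(messages, labels)
first = predict(model, "free prize offer")
second = predict(model, "lunch meeting today")
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.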
