Evaluating the Performance of Supervised Classification Models: Decision Tree and Naïve Bayes Using KNIME

The classification task is to predict the value of a target variable from the values of the input variables. If the target is provided as part of the dataset, classification is a supervised task. It is important to analyze the performance of supervised classification models before using them for classification. In this research we propose a novel way to evaluate the performance of supervised classification models such as Decision Tree and Naïve Bayes using the KNIME Analytics Platform. Experiments were conducted on a multivariate dataset of 58,000 instances and 9 columns, intended specifically for classification and collected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/statlog+(shuttle)), and the performance of the two models was compared in terms of Classification Accuracy (CA) and error rate. Finally, both models were validated using the metrics precision, recall, and F-measure. We found that the Decision Tree achieves a CA of 99.465%, whereas Naïve Bayes attains a CA of 90.358%. The F-measure of the Decision Tree is 0.984, whereas that of Naïve Bayes is 0.7045.
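
The metrics named in this abstract can all be derived from the cells of a confusion matrix. The following is a minimal sketch, assuming a binary problem for simplicity (the Shuttle dataset itself is multi-class); the function name `classification_metrics` and the counts are illustrative assumptions, not values from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, error rate, precision, recall and F-measure
    from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only (hypothetical, not from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class dataset, the same quantities would typically be computed per class and then averaged.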

COMPARATIVE STUDY OF CLASSIFICATION ALGORITHMS: HOLDOUTS AS ACCURACY ESTIMATION

CogITo Smart Journal ◽

10.31154/cogito.v1i1.2.13-23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Debby Erce Sondakh

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Decision Rules ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Accuracy Estimation ◽

F Measure

This study aims to measure and compare the performance of five machine-learning-based text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison addresses the algorithms' effectiveness, that is, their ability to classify documents into the correct category, using the holdout (percentage split) method. The effectiveness measures used are precision, recall, F-measure, and accuracy. The experimental results show that, for naïve Bayes, the larger the percentage of training documents, the higher the accuracy of the resulting model. Naïve Bayes reaches its highest accuracy at a 90/10 split, SVM at 80/20, and decision tree at 70/30. The results also show that naïve Bayes has the highest effectiveness among the five algorithms tested and the fastest classification-model building time, 0.02 seconds. The decision tree classifies text documents with higher accuracy than SVM, but its model-building time is slower. In terms of model-building time, k-NN is the fastest, but its accuracy is lower.
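
The holdout (percentage split) method shuffles the documents once and sets aside a fixed share for testing. A minimal sketch of such a split follows; the helper name `holdout_split`, the fixed seed, and the placeholder document IDs are assumptions made for illustration and are not taken from the paper.

```python
import random


def holdout_split(items, train_percent, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists.

    `train_percent` is an integer such as 70, 80 or 90.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 1 and 99")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical document IDs standing in for a text corpus.
documents = [f"doc{i}" for i in range(100)]
for percent in (70, 80, 90):
    train, test = holdout_split(documents, percent)
    print(f"{percent}/{100 - percent}: {len(train)} train, {len(test)} test")
```

Each classifier would then be trained on `train` and scored on `test` for every split ratio being compared.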

A Machine Learning Approach for Improving the Movement of Humanoid NAO’s Gaits

Wireless Communications and Mobile Computing ◽

10.1155/2021/1496364 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Fatmah Abdulrahman Baothman

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Average Velocity ◽

Walking Speed ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Models ◽

Ann Model ◽

The Real ◽

Optimal Average

A humanoid robot’s development requires an incredible combination of interdisciplinary work from engineering to mathematics, software, and machine learning. NAO is a humanoid bipedal robot designed to participate in football competitions against humans by 2050, and speed is crucial in football. Therefore, the paper focuses on improving NAO’s speed and tests the hypothesis that the humanoid NAO’s walking speed can be improved without changing its physical configuration. The applied research method compares three classification techniques, artificial neural network (ANN), Naïve Bayes, and decision tree, to measure and predict NAO’s best walking speed, then selects the best method and enhances it to find the optimal average velocity. According to Aldebaran documentation, the real NAO robot’s default walking speed is 9.52 cm/s. The work began by studying the NAO hardware platform’s limitations and selecting 12 of NAO’s gait parameters to measure the accuracy metrics implemented in the design of the three classification models. Five experiments were designed to model and trace the changes in the 12 parameters. The preliminary open-source NAO walking datasets available on GitHub, the NAL, and RoboCup datasheets were used. All generated gait parameters for both legs and feet in the experiments were recorded using the Choregraphe software. This dataset was divided into 30% for training and 70% for testing each model. The recorded gait parameters were then fed to the three classification models to measure and predict NAO’s best walking speed. After 500 training cycles for Naïve Bayes, the decision tree, and the ANN, RapidMiner scored walking-speed metric rates of 48.20%, 49.87%, and 55.12%, respectively. Next, the emphasis was on enhancing the ANN model to reach the optimal average walking velocity for the real NAO. With 12 attributes, the maximum accuracy metric rate of 65.31% was reached with only four hidden layers in 500 training cycles with a 0.5 learning rate for the best walking learning process, and the ANN model predicted the optimal average velocity speed of 51.08% without stiffness: V1 = 22.62 cm/s, V2 = 40 cm/s, and V = 30 cm/s. Thus, the tested hypothesis holds, with the ANN model scoring the highest accuracy rate for predicting the NAO robot’s walking state speed by taking both legs to gauge the 12 joint parameter values.

Pemodelan Prediksi Status Keberlanjutan Polis Asuransi Kendaraan dengan Teknik Pemilihan Mayoritas Menggunakan Algoritma-Algoritma Klasifikasi Data Mining

Prosiding Seminar Nasional Teknoka ◽

10.22236/teknoka.v5i.391 ◽

2020 ◽

Vol 5 ◽

pp. 19-24

Author(s):

Dyah Retno Utari ◽

Arief Wibowo

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Majority Voting ◽

Support Vector ◽

F Measure

Motor vehicle insurance is a line of business that covers losses or the risk of damage arising from the various events that can befall a vehicle. Competition in the insurance business, particularly for motor vehicles, demands innovation and strategy so that business continuity remains assured. One step a company can take is to predict the continuation status of vehicle insurance policies by analyzing customer profile and transaction data. Predicting policyholders' decisions is very important for the company, because it can shape the marketing strategy that influences customers' decisions to renew their insurance policies. This study proposes a model for predicting the continuation status of vehicle insurance policies using majority voting over the classification results of data mining algorithms such as Naive Bayes, Support Vector Machine, and Decision Tree. Testing with a confusion matrix shows a best accuracy of 93.57%, with precision reaching 97.20%, recall 95.20%, and F-measure 95.30%. The best model evaluation scores were obtained with the majority voting approach, which outperformed prediction models based on a single classifier.
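
Majority voting assigns each sample the label predicted by most of the individual classifiers. A minimal sketch of hard voting over precomputed predictions is shown here; the function name `majority_vote`, the labels "renew" and "lapse", and the prediction lists are hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions into one label per sample.

    `predictions` is a list of equal-length label lists, one per classifier.
    """
    combined = []
    # zip(*predictions) groups the votes cast for the same sample.
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on four policies.
naive_bayes = ["renew", "lapse", "renew", "lapse"]
svm = ["renew", "renew", "lapse", "lapse"]
decision_tree = ["lapse", "renew", "renew", "lapse"]

print(majority_vote([naive_bayes, svm, decision_tree]))
```

With three classifiers and two classes a tie cannot occur; with other configurations a tie-breaking rule would have to be chosen explicitly.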

Algoritma Naïve Bayes Untuk Klasifikasi Penerima Bantuan Pangan Non Tunai (Studi Kasus Kelurahan Utama)

Techno Com ◽

10.33633/tc.v18i4.2587 ◽

2019 ◽

Vol 18 (4) ◽

pp. 321-331

Author(s):

Castaka Agus Sugianto ◽

Firdi Rizky Maulana

Keyword(s):

Data Mining ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

T Test ◽

Model Data ◽

F Measure

Kelurahan Utama is a government office in South Cimahi. It runs the government's Non-Cash Food Assistance (Bantuan Pangan Non Tunai) program. In running the program, many residents complain that they receive no assistance, while some residents considered well-off do receive it. Against this background, the authors processed the data with data mining to classify recipients and non-recipients of non-cash food assistance, using the Naïve Bayes algorithm with the Decision Tree algorithm as a comparison. The data produced by the data mining process is expected to serve as evaluation material for the government. The data mining model was built with RapidMiner, yielding a probability of 0.481 (rounded to 0.48) for the class "PENERIMA" (recipient) and 0.519 (rounded to 0.52) for the class "Bukan Penerima" (non-recipient). The Naïve Bayes algorithm achieved an accuracy of 58.29%, precision of 92.90%, recall of 21.84%, AUC of 0.765, and F-measure of 34.42%. The Decision Tree algorithm achieved an accuracy of 73.97%, precision of 85.04%, recall of 61.92%, AUC of 0.746, and F-measure of 71.17%. A t-test between the Naïve Bayes and Decision Tree algorithms gave alpha ≤ 0.000, so the difference between the two algorithms is concluded to be significant.

Depression Detection with DM v1

10.17504/protocols.io.bzm8p49w ◽

2021 ◽

Author(s):

Umme Marzia Haque

Keyword(s):

Random Forest ◽

Decision Tree ◽

Supervised Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Learning Models ◽

Target Variable ◽

Correlated Variables ◽

Low Correlation ◽

Depression Detection

The study used data from YMM. The Yes/No variables that had a low correlation with the target variable were removed. To extract the most relevant features, that is, the variables highly correlated with the target variable, the Boruta method was used in conjunction with a Random Forest (RF) classifier. To select suitable supervised learning models, the Tree-based Pipeline Optimization Tool (TPOTclassifier) was employed. RF, XGBoost (XGB), Decision Tree (DT), and Gaussian Naive Bayes (GaussianNB) were employed in the depression identification step.
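
The first step described here, dropping Yes/No variables that correlate weakly with the target, can be sketched with a plain Pearson correlation filter. This is only an illustration of that one step, not the Boruta procedure; the helper names, the 0.5 threshold, and the toy 0/1 data are assumptions rather than details from the protocol.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def drop_low_correlation(features, target, threshold):
    """Keep only the features whose |correlation| with the target meets the threshold."""
    return {
        name: values
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    }


# Hypothetical Yes/No answers encoded as 1/0, with a binary target.
features = {
    "q1": [1, 1, 0, 0, 1, 0],
    "q2": [1, 0, 1, 0, 1, 0],
    "q3": [0, 0, 1, 1, 0, 1],
}
target = [1, 1, 0, 0, 1, 0]

kept = drop_low_correlation(features, target, threshold=0.5)
print(sorted(kept))
```

The absolute value is used so that strongly negatively correlated variables are retained as well.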

Depression Detection Algorithm v1

10.17504/protocols.io.bzm6p49e ◽

2021 ◽

Author(s):

Umme Marzia Haque

Keyword(s):

Random Forest ◽

Decision Tree ◽

Supervised Learning ◽

Naive Bayes ◽

Detection Algorithm ◽

Naïve Bayes ◽

Learning Models ◽

Target Variable ◽

Correlated Variables ◽

Depression Detection

Diabetic Prediction using Classification Method

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9718.079220 ◽

2020 ◽

Vol 9 (2) ◽

pp. 264-267

Keyword(s):

Diabetes Mellitus ◽

Feature Extraction ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Classification Models ◽

Performance Parameters ◽

Prediction Analysis ◽

Input Dataset

Prediction analysis of diabetes mellitus is the main focus of this work. Prediction analysis involves three main tasks: dataset input, feature extraction, and classification. The earlier framework used SVM and naïve Bayes approaches to predict this disease. This study implements a voting classifier, an ensemble approach that combines three classification models: SVM, naïve Bayes, and decision tree. Both the existing and the new technique are implemented in Python. The approaches are compared in terms of different performance parameters. Compared with the other classification models, the proposed classification model performs better.

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. The great number of received messages makes it difficult for humans to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.

Identifying Key Fraud Indicators in the Automobile Insurance Industry Using SQL Server Analysis Services

Studia Universitatis Babeș-Bolyai Oeconomica ◽

10.2478/subboec-2019-0009 ◽

2019 ◽

Vol 64 (2) ◽

pp. 53-71

Author(s):

Botond Benedek ◽

Ede László

Keyword(s):

Neural Network ◽

Decision Tree ◽

Naive Bayes ◽

Insurance Industry ◽

Naïve Bayes ◽

Sql Server ◽

Categorical Variables ◽

Automobile Insurance ◽

Price Determination ◽

Mining Tool

Customer segmentation represents a true challenge in the automobile insurance industry, as datasets are large, multidimensional, and unbalanced, and it also requires a unique price determination based on the risk profile of the customer. Furthermore, the price determination of an insurance policy or the validity of a compensation claim must, in most cases, be an instant decision. Therefore, the purpose of this research is to identify an easily usable data mining tool that is capable of identifying key automobile insurance fraud indicators, facilitating the segmentation. In addition, the methods used by the tool should be based primarily on numerical and categorical variables, as there is no well-functioning text mining tool for Central Eastern European languages. Hence, we chose the SQL Server Analysis Services (SSAS) tool and compared the performance of the decision tree, neural network, and Naïve Bayes methods. The results suggest that decision tree and neural network are more suitable than Naïve Bayes; however, the best conclusion can be drawn if we use the decision tree and neural network together.

Impute, Select, Decision Tree and Naïve Bayes (ISE-DNC): An Ensemble Learning Approach to Classify the Lung Cancer

SSRN Electronic Journal ◽

10.2139/ssrn.3667438 ◽

2020 ◽

Author(s):

Bhanumathi S ◽

Dr. Chandrashekara S N

Keyword(s):

Lung Cancer ◽

Decision Tree ◽

Ensemble Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Learning Approach
