Perbandingan Kinerja Algoritma Klasifikasi  Naive Bayes, Support Vector Machine (SVM), dan Random Forest untuk Prediksi Ketidakhadiran di Tempat Kerja

Absence is a problem for the company. Absenteeism is defined as a task that is assigned to an individual, but the individual cannot complete the task when he is not present. Absence from work is influenced by many factors, including mismatched working hours, job demand and other factors such as serious accidents / illness, low morale, poor working conditions, boredom, lack of supervision, personal problems, insufficient nutrition, transportation problems, stress, workload, and dissatisfaction. The purpose of this study is to predict absenteeism at work based on the Absenteeism at work dataset obtained from the UCI Machine Learning repository site using the Weka 3.8 application and the Naïve Bayes algorithm, Support Vector Machine (SVM), and Random Forest. In the results of the study, the Random Forest algorithm obtained the highest accuracy, precision, and recall values compared to the Naïve Bayes and SVM algorithms, which resulted in an accuracy value of 99.38%, 99.42% precision and a recall of 99.39%.

Download Full-text

Algorithm Comparation of Naive Bayes and Support Vector Machine based on Particle Swarm Optimization in Sentiment Analysis of Freight Forwarding Services

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1840 ◽

2020 ◽

Vol 4 (2) ◽

pp. 362-369

Author(s):

Sharazita Dyah Anggita ◽

Ikmah

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

The Public ◽

Svm Algorithm ◽

Bayes Algorithm ◽

Freight Forwarding ◽

Improved Accuracy

The needs of the community for freight forwarding are now starting to increase with the marketplace. User opinion about freight forwarding services is currently carried out by the public through many things one of them is social media Twitter. By sentiment analysis, the tendency of an opinion will be able to be seen whether it has a positive or negative tendency. The methods that can be applied to sentiment analysis are the Naive Bayes Algorithm and Support Vector Machine (SVM). This research will implement the two algorithms that are optimized using the PSO algorithms in sentiment analysis. Testing will be done by setting parameters on the PSO in each classifier algorithm. The results of the research that have been done can produce an increase in the accreditation of 15.11% on the optimization of the PSO-based Naive Bayes algorithm. Improved accuracy on the PSO-based SVM algorithm worth 1.74% in the sigmoid kernel.

Download Full-text

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.

Download Full-text

ALGORITMOS DE APRENDIZAGEM DE MÁQUINA E VARIÁVEIS DE SENSORIAMENTO REMOTO PARA O MAPEAMENTO DA CAFEICULTURA

Boletim de Ciências Geodésicas ◽

10.1590/s1982-21702016000400043 ◽

2016 ◽

Vol 22 (4) ◽

pp. 751-773 ◽

Cited By ~ 1

Author(s):

Carolina Gusmão Souza ◽

Luis Carvalho ◽

Polyanne Aguiar ◽

Tássia Borges Arantes

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector

A cafeicultura é uma das principais culturas agrícolas do Brasil e realizar o mapeamento e monitoramento desta cultura é fundamental para conhecer sua distribuição espacial. Porém, mapear estas áreas utilizando imagens de Sensoriamento Remoto não é uma tarefa fácil. Sendo assim, este trabalho foi realizado com o objetivo de comparar o uso de diferentes variáveis e algoritmos de classificação para o mapeamento de áreas cafeeiras. O trabalho foi desenvolvido em três áreas diferentes, que são bastante significativas na produção de café. Foram utilizados 5 algoritmos de aprendizagem de máquinas e 7 combinações de variáveis: espectrais, texturais e geométricas, associadas ao processo de classificação. Um total de 105 classificações foram realizadas, 35 classificações para cada uma das áreas. As classificações que não usaram variáveis espectrais não resultaram em bons índices de acurácia. Nas três áreas, o algoritmo que apresentou as melhores acurácias foi o Support vector machine, com acurácia global de 85,33% em Araguari, 87% em Carmo de Minas e 88,33% em Três Pontas. Os piores resultados foram encontrados com o algoritmo Random Forest em Araguari, com acurácia global de 76,66% e com o Naive Bayes em Carmo de Minas e Três Pontas, com 76% e 82% de acerto. Nas três áreas, variáveis texturais, quando associadas às espectrais, melhoraram a acurácia da classificação. O SVM apresentou o melhor desempenho para as três áreas

Download Full-text

Thematic Textual Hadith Classification: An Experiment in Rapidminer using Support Vector Machine (SVM) and Naïve Bayes Algorithm

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/262942020 ◽

2020 ◽

Vol 9 (4) ◽

pp. 5967-5972

Author(s):

Norzihani Yusof

Keyword(s):

Support Vector Machine ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Bayes Algorithm

Download Full-text

Comparison of Naïve Bayes, Support Vector Machine, Decision Trees and Random Forest on Sentiment Analysis

Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management ◽

10.5220/0008364105250531 ◽

2019 ◽

Cited By ~ 1

Author(s):

Márcio Guia ◽

Rodrigo Silva ◽

Jorge Bernardino

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Sentiment Analysis ◽

Decision Trees ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector

Download Full-text

Perbandingan Klassifikasi SMS Berbasis Support Vector Machine, Naive Bayes Classifier, Random Forest dan Bagging Classifier

Jurnal Sisfokom (Sistem Informasi dan Komputer) ◽

10.32736/sisfokom.v10i3.1302 ◽

2021 ◽

Vol 10 (3) ◽

pp. 432-437

Author(s):

Devi Irawan ◽

Eza Budi Perkasa ◽

Yurindra Yurindra ◽

Delpiah Wahyuningsih ◽

Ellya Helmud

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Short Message ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Message Service

Short message service (SMS) adalah salah satu media komunikasi yang penting untuk mendukung kecepatan pengunaan ponsel oleh pengguna. Sistem hibrid klasifikasi SMS digunakan untuk mendeteksi sms yang dianggap sampah dan benar. Dalam penelitian ini yang diperlukan adalah mengumpulan dataset SMS, pemilihan fitur, prapemrosesan, pembuatan vektor, melakukan penyaringan dan pembaharuan sistem. Dua jenis klasifikasi SMS pada ponsel saat ini ada yang terdaftar sebagai daftar hitam (ditolak) dan daftar putih (diterima). Penelitian ini menggunakan beberapa algoritma seperti support vector machine, Naïve Bayes classifier, Random Forest dan Bagging Classifier. Tujuan dari penelitian ini adalah untuk menyelesaikan semua masalah SMS yang teridentifikasi spam yang banyak terjadi pada saat ini sehingga dapat memberikan masukan dalam perbandingan metode yang mampu menyaring dan memisahkan sms spam dan sms non spam. Pada penelitian ini menghasilkan bahwa Bagging classifier algorithm ini mendapatkan ferformance score tertinggi dari algoritma yang lain yang dapat dipergunakan sebagai sarana untuk memfiltrasi SMS yang masuk ke dalam inbox pengguna dan Bagging classifier algorithm dapat memberikan hasil filtrasi yang akurat untuk menyaring SMS yang masuk.

Download Full-text

Εξόρυξη γνώσης από αρχεία μεγάλου όγκου δεδομένων υγείας -Big Data- με χρήση υπολογιστικών αλγορίθμων ανάλυσης - Health Analytics

10.12681/eadd/50564 ◽

2021 ◽

Author(s):

Ιωάννης Μήνου

Keyword(s):

Support Vector Machine ◽

Big Data ◽

Random Forest ◽

Cross Validation ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Η μεγαλύτερη πρόκληση των σύγχρονων υπολογιστικών συστημάτων είναι αναμφισβήτητα η αποδοτική αποθήκευση και ανάκτηση πολύ μεγάλου όγκου δεδομένων. Η ανάγκη αυτή έκανε την εμφάνισή της τα τελευταία χρόνια λόγω της έκρηξης δεδομένων που παρατηρείται στο διαδίκτυο και αποκτά ολοένα και μεγαλύτερη σημασία λόγω του πολύ μεγάλου εύρους πληροφοριών που μπορούμε να αντλήσουμε. Ο τομέας της υγειονομικής περίθαλψης και των ιατρικών δεδομένων είναι συνεχώς και ταχέως εξελισσόμενος. Η αξιοποίηση των Big Data στο χώρο της υγείας προσφέρει πολύτιμη πληροφόρηση καθώς παρουσιάζουν απεριόριστες δυνατότητες για αποτελεσματική αποθήκευση, επεξεργασία, sql queries και ανάλυση ιατρικών δεδομένων.Σκοπός της παρούσας διατριβής είναι η μελέτη τεχνικών εξόρυξης γνώσης για δεδομένα μεγάλου όγκου, που αφορούν το πεδίο της Υγείας. Παράλληλα σκοπός της έρευνας είναι η μελέτη στατιστικών και υπολογιστικών αλγορίθμων ανάλυσης μεγάλου όγκου δεδομένων υγείας που έχουν ως αποτέλεσμα την παραγωγή νέας γνώσης καθώς και την εξαγωγή στατιστικά σημαντικής πληροφορίας για τους επαγγελματίες υγείας. Τέλος, η παρούσα διατριβή διερευνά τις γνώσεις των επιστημόνων της Πληροφορικής Υγείας και των επαγγελματιών υγείας σχετικά με τα Big Data.Στην παρούσα διδακτορική διατριβή έγινε βιβλιογραφική ανασκόπηση της έννοιας των Big Data. Η ανασκόπηση αυτή περιλαμβάνει τον ορισμό των Big Data ,τα χαρακτηριστικά τους, τα πλεονεκτήματα και τα μειονεκτήματά τους στο χώρο της υγείας. Στη συνέχεια γίνεται αναφορά στην υλοποίηση και στους μηχανισμούς αποθήκευσης των Big Data. Επιπλέον γίνεται αναφορά στα συστήματα ανάλυσης και επεξεργασίας μεγάλου όγκου δεδομένων, στις γλώσσες προγραμματισμού για Big Data, στην εξόρυξη γνώσης δεδομένων στο χώρο της υγείας. Ακόμη γίνεται αναφορά στη χρήση των Big Data στην Ευρώπη και στον κόσμο. Τέλος παρουσιάζονται οι βασικές αρχές του GDPR καθώς και το πώς σχετίζεται με τα Big Data στο χώρο της υγείας. Επίσης διεξήχθησαν δύο εμπειρικές μελέτες.Η πρώτη μελέτη είχε σαν στόχο την καταγραφή της άποψης των επιστημόνων της Πληροφορικής Υγείας σχετικά με την τεχνολογία των Big Data. Η συλλογή των δεδομένων έγινε με χρήση ερωτηματολογίου. Η στατιστική ανάλυση έδειξε τη θετική ανταπόκριση του δείγματος σχετικά με την τεχνολογία των Big Data.Η δεύτερη μελέτη είχε σαν στόχο την καταγραφή της άποψης των Επαγγελματιών Υγείας σχετικά με την τεχνολογία των Big Data. Η συλλογή των δεδομένων έγινε με χρήση ερωτηματολογίου. Η στατιστική ανάλυση δεν έδωσε επαρκείς απαντήσεις καθώς οι ερωτηθέντες έδειξαν θετική στάση απέναντι στα Big Data ενώ απάντησαν ότι δεν γνωρίζουν πολλά για τη συγκεκριμένη τεχνολογία.Το τελευταίο κομμάτι της διατριβής περιλαμβάνει την ανάπτυξη μεθόδων πρόβλεψης για την δυνατότητα διάγνωσης των ασθενών με καρδιαγγειακά νοσήματα. Οι μέθοδοι πρόβλεψης που χρησιμοποιήθηκαν είναι: Λογιστική Παλινδρόμηση, Naive Bayes Classifier, Δένδρα αποφάσεων, Αλγόριθμος Κ κοντινότερων γειτόνων, Αλγόριθμος SVM (Support Vector Machine) και Random Forest. Η ανάπτυξη περιλάμβανε όλα τα στάδια προεπεξεργασίας των δεδομένων ενώ χρησιμοποιήθηκαν συγκεκριμένες μετρικές για τη μέτρηση της απόδοσης των κατηγοριοποιητών. Τέλος έγιναν βελτιώσεις της απόδοσης των κατηγοριοποιητών χρησιμοποιώντας διασταυρωτική επαλήθευση με την μέθοδο cross-validation ενώ επιλύθηκε και το πρόβλημα της ανισορροπίας των κλάσεων χρησιμοποιώντας τη μέθοδο SMOTE.

Download Full-text

Prediction of Breast Cancer Using Machine Learning

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190617160834 ◽

2020 ◽

Vol 13 (5) ◽

pp. 901-908

Author(s):

Somil Jain ◽

Puneet Kumar

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Prediction Accuracy ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

Breast Cancer Dataset

Background:: Breast cancer is one of the diseases which cause number of deaths ever year across the globe, early detection and diagnosis of such type of disease is a challenging task in order to reduce the number of deaths. Now a days various techniques of machine learning and data mining are used for medical diagnosis which has proven there metal by which prediction can be done for the chronic diseases like cancer which can save the life’s of the patients suffering from such type of disease. The major concern of this study is to find the prediction accuracy of the classification algorithms like Support Vector Machine, J48, Naïve Bayes and Random Forest and to suggest the best algorithm. Objective:: The objective of this study is to assess the prediction accuracy of the classification algorithms in terms of efficiency and effectiveness. Methods: This paper provides a detailed analysis of the classification algorithms like Support Vector Machine, J48, Naïve Bayes and Random Forest in terms of their prediction accuracy by applying 10 fold cross validation technique on the Wisconsin Diagnostic Breast Cancer dataset using WEKA open source tool. Results:: The result of this study states that Support Vector Machine has achieved the highest prediction accuracy of 97.89 % with low error rate of 0.14%. Conclusion:: This paper provides a clear view over the performance of the classification algorithms in terms of their predicting ability which provides a helping hand to the medical practitioners to diagnose the chronic disease like breast cancer effectively.

Download Full-text