Sentiment Analysis of Tweets Using Naïve Bayes, KNN, and Decision Tree

Kadda Zerrouki; Reda Mohamed Hamou; Abdellatif Rahmoun

doi:10.4018/ijoci.2020100103

International Journal of Organizational and Collective Intelligence ◽

10.4018/ijoci.2020100103 ◽

2020 ◽

Vol 10 (4) ◽

pp. 35-49

Author(s):

Kadda Zerrouki ◽

Reda Mohamed Hamou ◽

Abdellatif Rahmoun

Keyword(s):

Decision Tree ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

K Nearest Neighbor ◽

Use Of Social Media ◽

The Masses

Using social media to analyze public perception of a product, event, or person has gained momentum in recent years. From the wide array of social networks, the authors chose Twitter because the opinions expressed there are concise and carry a distinct polarity. Sentiment analysis is an approach to analyzing data and retrieving the sentiment it embodies. The paper discusses three supervised machine learning algorithms (naïve Bayes, k-nearest neighbor (KNN), and decision tree) and compares their overall accuracy, precision, recall, F-measure, number of correctly and incorrectly classified tweets, and execution time.
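The evaluation measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `binary_metrics`, the label strings, and the toy predictions are assumptions made for illustration, and the metrics are computed for a single "positive" class.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall and F-measure for one target class.

    y_true and y_pred are equal-length lists of class labels.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "correct": correct,
        "incorrect": len(pairs) - correct,
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical gold labels and classifier output for five tweets.
gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
print(binary_metrics(gold, predicted))
```

Running the same function on the output of each classifier would give the side-by-side comparison the abstract describes; execution time would be measured separately.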

Sentiment Analysis about E-Commerce from Tweets Using Decision Tree, K-Nearest Neighbor, and Naïve Bayes

2018 International Conference on Orange Technologies (ICOT) ◽

10.1109/icot.2018.8705796 ◽

2018 ◽

Cited By ~ 2

Author(s):

Achmad Bayhaqy ◽

Sfenrianto Sfenrianto ◽

Kaman Nainggolan ◽

Emil R. Kaburuan

Keyword(s):

Decision Tree ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor

Sentiment Analysis About Indonesian Lawyers Club Television Program Using K-Nearest Neighbor, Naïve Bayes Classifier, And Decision Tree

International Journal of New Media Technology ◽

10.31937/ijnmt.v8i1.1965 ◽

2021 ◽

Vol 8 (1) ◽

pp. 50-56

Author(s):

Nico Nathanael Wilim ◽

Raymond Sunardi Oetama

Keyword(s):

Decision Tree ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Naïve Bayes Classifier ◽

Talk Show

Indonesia Lawyers Club (ILC) is a talk show on TVOne that discusses public phenomena, legal issues, crime, and similar topics. In 2018, ILC won the Panasonic Gobel Award for best news talk show program. In 2019, however, ILC lost the award to Mata Najwa, a talk show broadcast on Trans7. As an award-winning television show, ILC draws both favorable and unfavorable reactions from the public. This study applies sentiment analysis to examine public opinion on Twitter about Mata Najwa and ILC in 2018 and 2019, using the K-Nearest Neighbor, Naïve Bayes Classifier, and Decision Tree classification algorithms to validate the results. The contribution of this study is to show that public opinion on Twitter can be used to gauge community sentiment toward a TV talk show and to confirm the award winner. Index Terms: data mining; Decision Tree; K-NN; Naïve Bayes Classifier; sentiment analysis

Sentiment Analysis System for Myanmar News using K Nearest Neighbor and Naïve Bayes

Proceedings of 2020 the 10th International Workshop on Computer Science and Engineering ◽

10.18178/wcse.2020.02.001 ◽

2020 ◽

Keyword(s):

Sentiment Analysis ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Analysis System

Centroid Based Classifier With TF-IDF-ICF for Classification of Student’s Complaint at Application E-Complaint in Muhammadiyah University of Sidoarjo

JEEE-U (Journal of Electrical and Electronic Engineering-UMSIDA) ◽

10.21070/jeee-u.v1i1.23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 17 ◽

Cited By ~ 1

Author(s):

Mochamad Alfan Rosid ◽

Gunawan Gunawan ◽

Edwin Pramana

Keyword(s):

Text Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Base Classifier

Text mining refers to the process of extracting high-quality information from text. High-quality information is typically obtained by forecasting patterns and trends through means such as statistical pattern learning. One important text mining task is text classification, or categorization. Text categorization currently has a range of methods, including K-Nearest Neighbor, Naïve Bayes, the centroid-based classifier, and decision tree classification. In this study, student complaints are classified using a centroid-based classifier with TF-IDF-ICF features. Classification proceeds in five stages: complaint data collection; preprocessing, which prepares the unstructured data for subsequent processing; data splitting, which divides the data into training and test sets; training, which produces the classification model; and testing, which evaluates the trained model on the test data. The complaints used for testing are taken from the database of the e-complaint application of Universitas Muhammadiyah Sidoarjo. The experiments show that classifying complaints with the centroid-based classifier and TF-IDF-ICF features achieves a fairly high average accuracy of 79.5%. Accuracy increases as the amount of training data grows, while system efficiency decreases.
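A centroid-based classifier with TF-IDF-ICF weighting, as named in this abstract, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the smoothed forms log(1 + N/df) for IDF and log(1 + C/cf) for ICF, the cosine similarity, the function names, and the toy complaints are all chosen here for the example.

```python
import math
from collections import Counter, defaultdict


def train_centroids(docs, labels):
    """docs: list of token lists; labels: one class label per document."""
    n_docs = len(docs)
    classes = sorted(set(labels))
    doc_freq = Counter()             # number of documents containing a term
    term_classes = defaultdict(set)  # classes in which a term occurs
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            term_classes[term].add(label)

    # Assumed weighting: IDF * ICF, both in a smoothed logarithmic form.
    weight = {
        term: math.log(1 + n_docs / doc_freq[term])
        * math.log(1 + len(classes) / len(term_classes[term]))
        for term in doc_freq
    }

    def vectorize(tokens):
        """Length-normalized TF-IDF-ICF vector; unseen terms are ignored."""
        tf = Counter(t for t in tokens if t in weight)
        vec = {t: count * weight[t] for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    # Each centroid is the mean of the document vectors of its class.
    class_sizes = Counter(labels)
    centroids = {c: defaultdict(float) for c in classes}
    for tokens, label in zip(docs, labels):
        for term, value in vectorize(tokens).items():
            centroids[label][term] += value / class_sizes[label]
    return centroids, vectorize


def classify(tokens, centroids, vectorize):
    """Return the class whose centroid has the highest cosine similarity."""
    vec = vectorize(tokens)

    def cosine(centroid):
        dot = sum(v * centroid.get(t, 0.0) for t, v in vec.items())
        norm = math.sqrt(sum(x * x for x in centroid.values()))
        return dot / norm if norm else 0.0

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical, already tokenized complaints with made-up categories.
train_docs = [["wifi", "slow", "library"], ["wifi", "down"],
              ["tuition", "fee", "late"], ["fee", "receipt", "missing"]]
train_labels = ["facilities", "facilities", "finance", "finance"]
centroids, vectorize = train_centroids(train_docs, train_labels)
print(classify(["wifi", "library", "down"], centroids, vectorize))
```

Training cost grows with the number of training documents while classification only compares against one centroid per class, which is one reason the approach is often chosen for short texts such as complaints.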

Comparative analysis on bayesian classification for breast cancer problem

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v8i4.1628 ◽

2019 ◽

Vol 8 (4) ◽

Author(s):

Wan Nor Liyana Wan Hassan Ibeni ◽

Mohd Zaki Mohd Salikon ◽

Aida Mustapha ◽

Saiful Adli Daud ◽

Mohd Najib Mohd Salleh

Keyword(s):

Breast Cancer ◽

Bayesian Networks ◽

Nearest Neighbor ◽

Naive Bayes ◽

Likelihood Estimation ◽

Predictive Distribution ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor

Imbalanced class distributions and small datasets are common in certain fields, especially the medical domain. The classical Naive Bayes approach to handling uncertainty in medical datasets faces difficulty in selecting prior distributions, and parameter estimation methods such as maximum likelihood estimation (MLE) and maximum a posteriori (MAP) often hurt prediction accuracy. This paper presents a full Bayesian approach to assessing the predictive distribution of all classes using three classifiers, naïve Bayes (NB), Bayesian networks (BN), and tree-augmented naïve Bayes (TAN), on three datasets: Breast Cancer, Breast Cancer Wisconsin, and Breast Tissue. The prediction accuracies of the Bayesian approaches are then compared with three standard machine learning algorithms from the literature: K-nearest neighbor (K-NN), support vector machine (SVM), and decision tree (DT). The results show that the Bayesian network (BN) algorithm performed best, with an accuracy of 97.281%. The results are intended to serve as a baseline for further research on breast cancer detection. All experiments were conducted in the WEKA data mining tool.

PERBANDINGAN ALGORITMA K-NEAREST NEIGHBOR, DECISION TREE, DAN NAIVE BAYES UNTUK MENENTUKAN KELAYAKAN PEMBERIAN KREDIT

Infotech: Journal of Technology Information ◽

10.37365/jti.v7i1.104 ◽

2021 ◽

Vol 7 (1) ◽

pp. 35-40

Author(s):

Tupan Tri Muryono ◽

Ahmad Taufik ◽

Irwansyah Irwansyah

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Home Ownership ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Status Number ◽

Credit Analysis ◽

Credit Granting ◽

Loan Amount

Granting credit to customers is a routine banking activity with far-reaching effects. In practice, non-performing or bad loans often arise from poor credit analysis during the credit granting process or from unreliable customers. The purpose of this study is to compare the accuracy of the K-Nearest Neighbor (K-NN), Decision Tree, and Naive Bayes algorithms; the one with the best accuracy will be implemented to determine creditworthiness. The study uses 11 attributes: marital status, number of dependents, age, last education, occupation, monthly income, home ownership, collateral, loan amount, length of loan, and "information" as the result attribute. Evaluation and validation with 5-fold cross-validation in RapidMiner show that, of the three algorithms, the decision tree (C4.5) achieves the highest accuracy, 98%, in the third test.
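The fold-based validation mentioned in this abstract can be sketched as below, assuming it refers to standard 5-fold cross-validation. The helper name `k_fold_indices` is hypothetical and the sketch does not reproduce RapidMiner's own sampling; in practice the records would usually be shuffled (or stratified by class) before splitting.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    When n_samples is not divisible by k, the first folds get one extra item.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Each record is used for testing exactly once across the five folds.
for fold, (train, test) in enumerate(k_fold_indices(11, k=5), start=1):
    print(fold, len(train), len(test))
```

A classifier would be trained on `train` and scored on `test` in each round, and the per-fold accuracies compared or averaged.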

COMPARATIVE STUDY OF CLASSIFICATION ALGORITHMS: HOLDOUTS AS ACCURACY ESTIMATION

CogITo Smart Journal ◽

10.31154/cogito.v1i1.2.13-23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Debby Erce Sondakh

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Decision Rules ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Accuracy Estimation ◽

F Measure

This study measures and compares the performance of five machine-learning-based text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison concerns the algorithms' effectiveness, that is, their ability to assign documents to the correct category, using the holdout (percentage split) method. Effectiveness is measured by precision, recall, F-measure, and accuracy. The experiments show that for naïve Bayes, the larger the percentage of training documents, the higher the accuracy of the resulting model. Naïve Bayes reaches its highest accuracy at a 90/10 split, SVM at 80/20, and decision tree at 70/30. The experiments also show that naïve Bayes has the highest effectiveness of the five algorithms tested and the fastest model-building time, 0.02 seconds. The decision tree classifies text documents with higher accuracy than SVM, but its model-building time is slower. In terms of model-building time, k-NN is the fastest, but its accuracy is lower.
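The holdout (percentage split) method used in this study can be sketched as follows. The function name `holdout_split`, the fixed seed, and the 100-document example are assumptions for illustration; the splits 90/10, 80/20, and 70/30 are the ones named in the abstract.

```python
import random


def holdout_split(items, train_fraction, seed=0):
    """Shuffle items and split them into a training and a test portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical corpus of 100 document ids, split at the three ratios.
documents = list(range(100))
for fraction in (0.9, 0.8, 0.7):
    train, test = holdout_split(documents, fraction)
    print(fraction, len(train), len(test))
```

Each classifier would be trained on the training portion and scored on the held-out portion, so a larger training fraction leaves fewer documents for estimating accuracy.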

Analisis Komparatif Evaluasi Performa Algoritma Klasifikasi pada Readmisi Pasien Diabetes

Jurnal Buana Informatika ◽

10.24002/jbi.v7i4.770 ◽

2016 ◽

Vol 7 (4) ◽

Author(s):

Mochammad Yusa ◽

Ema Utami ◽

Emha T. Luthfi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Nearest Neighbor ◽

Naive Bayes ◽

Kappa Statistic ◽

Naïve Bayes ◽

Validation Dataset ◽

K Nearest Neighbor ◽

Fold Cross Validation

Abstract. Readmission is associated with quality-of-care measures for hospital patients. The many attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make calculating quality of care complicated. Data mining classification techniques can address this problem. This paper evaluates three classifiers, Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, under various parameter settings using 10-fold cross-validation. Performance is evaluated in terms of accuracy, Mean Absolute Error (MAE), and the Kappa statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and the Kappa statistic, while Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive bayes, diabetes, readmission

Abstrak (translated from Indonesian). Readmission is tied to measuring the quality of patient care in hospitals. The differing attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make the quality calculation complicated. Data mining classification techniques can offer a solution. Classification is one of the data mining techniques that has developed significantly. In this study, Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes classification models with various parameter settings are evaluated on accuracy, Mean Absolute Error (MAE), and the Kappa statistic using 10-fold cross-validation. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa statistic are obtained from the Naive Bayes model. Keywords: k-NN, naive bayes, diabetes, readmission
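The Kappa statistic used here alongside accuracy can be illustrated with a short sketch of Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The function name `cohen_kappa` and the toy labels are assumptions; MAE is left out because for classifiers it is typically computed from predicted class probabilities, which the abstract does not detail.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c]
                   for c in true_counts) / (n * n)
    if expected == 1:
        return 0.0  # kappa is undefined here; 0.0 is a choice made for this sketch
    return (observed - expected) / (1 - expected)


# Hypothetical readmission labels and predictions.
actual = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "yes", "no", "no", "no", "yes"]
print(cohen_kappa(actual, predicted))
```

Unlike raw accuracy, kappa stays near zero for a classifier that only reproduces the majority class, which matters on imbalanced medical data.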

Analisis Perbandingan Kinerja Algoritma Naïve Bayes, Decision Tree-J48 dan Lazy-IBK

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.3055 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1038

Author(s):

Indra Rukmana ◽

Arvin Rasheda ◽

Faiz Fathulhuda ◽

Muh Rizky Cahyadi ◽

Fitriyani Fitriyani

Keyword(s):

Breast Cancer ◽

Decision Tree ◽

Thoracic Surgery ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Breast Cancer Dataset ◽

Decision Tree Algorithm ◽

K Nearest Neighbor ◽

Cancer Dataset

This research evaluates the performance of three classification algorithms: Naïve Bayes, Decision Tree-J48, and K-Nearest Neighbor. Speed and accuracy percentage are the benchmarks for algorithm performance. The study uses the Breast Cancer and Thoracic Surgery datasets, downloaded from the UCI Machine Learning Repository website, and tests the classification algorithms with Weka version 3.8.5. The results show that the J48 Decision Tree algorithm has the best accuracy, namely 75.6% in cross-validation test mode on the Breast Cancer dataset and 84.5% on the Thoracic Surgery dataset.

KOMPARASI METODE KLASIFIKASI PADA ANALISIS SENTIMEN USAHA WARALABA BERDASARKAN DATA TWITTER

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.752 ◽

2019 ◽

Vol 15 (2) ◽

pp. 267-274

Author(s):

Tati Mardiana ◽

Hafiz Syahreva ◽

Tuslaela Tuslaela

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor

Franchise businesses in Indonesia are currently relatively attractive. However, many business owners also fail. Anyone looking to start a business needs to consider public sentiment toward franchises. Sentiment analysis is not easy, however, because Twitter conversations about franchises are numerous and unstructured. This study compares the accuracy of the Neural Network, K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, and Decision Tree methods in extracting attributes from documents or text containing comments, identifying the sentiment expressed, and classifying the comments as positive or negative. The study uses real-time data from tweets on Twitter, which is first cleaned of noise using Python. Testing with a confusion matrix yields accuracies of 83% for Neural Network, 52% for K-Nearest Neighbor, 83% for Support Vector Machine, and 81% for Decision Tree. The study shows that Support Vector Machine and Neural Network are the best methods for classifying positive and negative comments about franchise businesses.
