Sentiment Analysis using Naive Bayes Classifier and Information Gain Feature Selection over Twitter

Author(s):  
Manjit Singh ◽  
Swati Gupta
2020 ◽  
Vol 10 (2) ◽  
pp. 157-168
Author(s):  
Siti Khomsah

Feature extraction plays an important role in sentiment analysis, especially for text data. The Naive Bayes Classifier performs well on low-dimensional feature spaces, but the accuracy it provides is not optimal. To obtain an optimal machine learning model, methods such as information gain, evolutionary algorithms, and swarm intelligence algorithms are applied. The objective of this study is to determine how well Particle Swarm Optimization (PSO) optimizes the Naive Bayes Classifier. Words are vectorized using TF-IDF. To achieve high PSO performance, the PSO-NBC model is tested with several parameters: the number of particles (k = 3), the number of iterations, the inertia weight, the individual intelligence coefficient (c1 = 1), and the social intelligence coefficient (c2 = 2). The inertia weight is calculated as w = 0.5 + Rand([-1, 1]). In conclusion, PSO can be applied to the problem space of text-based sentiment analysis, optimizing the accuracy of Naive Bayes to values between 89% and 91.76%. PSO performance depends on the parameters used, especially the number of particles, the number of iterations, and the inertia weight. A larger number of particles combined with a higher inertia weight can increase accuracy, and optimal accuracy was reached with 20 to 30 particles.
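The velocity and position updates behind PSO can be sketched as below. This is a minimal, hypothetical illustration on a toy continuous objective (the sphere function), not the study's PSO-NBC implementation: it only reuses the values quoted in the abstract (c1 = 1, c2 = 2, and the random inertia weight w = 0.5 + Rand([-1, 1])), while the search bounds, swarm size, and iteration count are assumptions. In the study the quantity being optimized is the Naive Bayes accuracy, so a real fitness would presumably be something like 1 - accuracy.

```python
import random

def sphere(x):
    """Toy objective to minimize; a stand-in for a real fitness such as 1 - accuracy."""
    return sum(v * v for v in x)

def pso(objective, dim, n_particles=20, iterations=100, c1=1.0, c2=2.0, seed=0):
    """Minimal global-best PSO (minimization) with a random inertia weight."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # best position seen by the whole swarm

    for _ in range(iterations):
        for i in range(n_particles):
            # Random inertia weight as quoted in the abstract: w = 0.5 + Rand([-1, 1]).
            w = 0.5 + rng.uniform(-1, 1)
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # individual term
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social term
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso(sphere, dim=3)
print(best_val)
```

Because the swarm only ever replaces its best position with a strictly better one, the returned value can never be worse than the best initial particle; the random inertia weight mainly changes how aggressively particles keep their previous momentum.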


2019 ◽  
Vol 15 (2) ◽  
pp. 211-218
Author(s):  
Bobby Suryo Prakoso ◽  
Didi Rosiyadi ◽  
Dedi Aridarma ◽  
Heru Sukma Utama ◽  
Fariz Fauzi ◽  
...  

This research concerns news classification optimized through a combination of algorithms. The dataset was collected from online news sites. The algorithms used are the Naive Bayes Classifier and Random Forest, with Information Gain feature selection weighting. The dataset contains 615 records in 3 news categories (topics). Six modeling scenarios were compared to determine which scenario achieves the best score; according to the results, the best score was obtained by the model combining Remove Useless Attributes, Naive Bayes Classifier-Multinomial, and Random Forest-Feature Selection Information Gain. The evaluation yielded an accuracy of 85.67%, a recall of 85.67%, and a precision of 86.23%.
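Information gain, the feature selection criterion named in the title and in this abstract, scores a term by how much knowing whether it occurs reduces the entropy of the class label: IG(C; t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]. The sketch below is a minimal illustration assuming each document is already tokenized into a set of terms (binary presence rather than TF-IDF weights); the function names and the toy "sport"/"economy" data are hypothetical and not taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' with respect to the labels."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional

def select_top_k(docs, labels, k):
    """Rank the vocabulary by information gain and keep the k highest-scoring terms."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)[:k]

docs = [{"goal", "league"}, {"goal", "player"}, {"stock", "bank"}, {"bank", "rate"}]
labels = ["sport", "sport", "economy", "economy"]
print(select_top_k(docs, labels, 2))   # ['bank', 'goal']
```

Here "goal" and "bank" each split the two classes perfectly, so they receive the maximum gain of 1 bit, while terms that appear in a single document score about 0.31 bits.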


2014 ◽  
Vol 3 (3) ◽  
pp. 92 ◽  
Author(s):  
JUEN LING ◽  
I PUTU EKA N. KENCANA ◽  
TJOKORDA BAGUS OKA

Sentiment analysis is the computational study of opinions, sentiments, and emotions expressed in text. Its basic task is to classify the polarity of texts, whether documents, sentences, or opinions; polarity is meaningful when the text of a document, sentence, or opinion has a positive or negative aspect. In this study, polarity is classified with a machine learning technique, the Naïve Bayes classifier, whose text classification decision criteria are learned automatically from training data. Manual classification is still required because the training data come from manual labeling, in which a label (feature) is a description attached to each data item according to its category. During labeling, chi-square feature selection is applied to reduce noise in the classification. The results show that the frequency with which the expected features occur in the true category and in the false category plays an important role in chi-square feature selection. Classifying breaking news with the Naïve Bayes classifier yielded an accuracy of 83% and a harmonic mean of 90.713%.
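The chi-square feature selection described above compares, for a term and a category, the observed document counts with the counts expected if the term and the category were independent, summing (O - E)^2 / E over the 2 x 2 contingency table (term present or absent, document in the category or not). The following is a minimal sketch assuming set-of-tokens documents; the function name and the toy positive/negative data are illustrative, not from the study. Terms with larger scores depend more strongly on the category and would be the ones kept.

```python
def chi_square(docs, labels, term, category):
    """Chi-square statistic for the 2x2 table (term present/absent) x (in category/not)."""
    n = len(docs)
    # Observed counts: rows = term present / absent, columns = in category / not in category.
    observed = [[0, 0], [0, 0]]
    for d, y in zip(docs, labels):
        row = 0 if term in d else 1
        col = 0 if y == category else 1
        observed[row][col] += 1
    row_tot = [sum(r) for r in observed]
    col_tot = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if term and category were independent.
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

docs = [{"great", "movie"}, {"great", "plot"}, {"boring", "movie"}, {"awful", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
print(chi_square(docs, labels, "great", "pos"))   # 4.0
print(chi_square(docs, labels, "movie", "pos"))   # 0.0
```

"great" occurs only in positive documents, so its observed counts deviate from the expected counts in every cell; "movie" occurs equally in both classes, matches the expected counts exactly, and scores zero.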


2020 ◽  
Vol 16 (1) ◽  
pp. 123-128
Author(s):  
Dinda Ayu Muthia

The closure of the illegal movie streaming site IndoXXI was a trending topic on Twitter at the end of 2019, and netizens' reactions on Twitter showed both positive and negative sentiment. Many studies in sentiment analysis have used tweets from Twitter users as data, and many methods have been applied; Naïve Bayes is one of them because it is simple and efficient. The method has both advantages and disadvantages: Naïve Bayes is very sensitive to feature selection, and too many features not only increase computation time but also reduce classification accuracy. To address these drawbacks and improve its performance, the Naïve Bayes classifier is often combined with various feature selection methods. This research classifies tweets as positive or negative using the Naïve Bayes classifier combined with a Genetic Algorithm. Without feature selection, Naïve Bayes reached an accuracy of 79.55%; with Genetic Algorithm feature selection, accuracy increased to 88.64%, an improvement of 9.09 percentage points.
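Genetic Algorithm feature selection of this kind is commonly framed as a search over bitmasks, one bit per candidate feature, where the fitness of a mask is the classifier's score using only the selected features. The sketch below is a minimal, hypothetical GA (binary tournament selection, one-point crossover, bit-flip mutation, elitism); the operators, rates, population size, and the toy fitness function with its made-up USEFUL feature indices are all assumptions rather than the study's settings. In this setting the fitness would presumably be Naïve Bayes accuracy on held-out tweets.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=50,
                         crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Minimal genetic algorithm over bitmasks; mask[i] == 1 keeps feature i."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]                     # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:    # one-point crossover
                cut = rng.randint(1, n_features - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

USEFUL = {0, 3, 5}   # hypothetical indices of informative features

def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy: reward useful features, penalize the rest."""
    return sum(1.0 if i in USEFUL else -0.1 for i, g in enumerate(mask) if g)

best_mask = ga_feature_selection(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Elitism copies the best mask into each new generation, so the best fitness never decreases; the small penalty for extra features mirrors the abstract's point that too many features can hurt both speed and accuracy.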

