Twitter Sentiment Analysis Using an Ensemble Majority Vote Classifier

Twitter social media data generally uses ambiguous text that can cause difficulty in identifying positive or negative sentiments. There are more than one billion social media messages that need to be stored in a proper database and processed correctly to analyze them. In this paper, an ensemble majority vote classifier to enhance sentiment classification performance and accuracy is proposed. The proposed classification model is combined with four classifiers, using varying techniques—naive Bayes, decision trees, multilayer perceptron and logistic regression—to form a single ensemble classifier. In addition to these, a comparison is drawn among the four classifiers to evaluate the performance of the individual classifiers. The result shows that in terms of an individual classifier, the naive Bayes classifier is optimal as compared to the others. However, for comparing the proposed ensemble majority vote classifier with the four individual classifiers, the result illustrates that the performance of the proposed classifier is better than the independent one.

Download Full-text

Analisis Sentimen dan Pemodelan Topik Pariwisata Lombok Menggunakan Algoritma Naive Bayes dan Latent Dirichlet Allocation

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i1.2587 ◽

2021 ◽

Vol 5 (1) ◽

pp. 123-131

Author(s):

Ni Luh Putu Merawati Putu ◽

Ahmad Zuli Amrullah ◽

Ismarmiaty

Keyword(s):

Social Media ◽

Latent Dirichlet Allocation ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Bayes Method ◽

A Value ◽

Negative Class ◽

Naive Bayes Method ◽

Dirichlet Allocation

Lombok Island is one of the favorite tourist destinations. Various topics and comments about Lombok tourism experience through social media accounts are difficult to manually identify public sentiments and topics. The opinion expressed by tourists through social media is interesting for further research. This study aims to classify tourists' opinions into two classes, positive and negative, and topics modelling by using the Naive Bayes method and modeling the topic by using Latent Dirichlet Allocation (LDA). The stages of this research include data collection, data cleaning, data transformation, data classification. The results performance testing of the classification model using Naive Bayes method is shown with an accuracy value of 92%, precision of 100%, recall of 84% and specificity of 100%. The results of modeling topics using LDA in each positive and negative class from the coherence value shows the highest value for the positive class was obtained on the 8th topic with a value of 0.613 and for the negative class on the 12th topic with a value of 0.528. The use of the Naive Bayes and LDA algorithms is considered effective for analyzing the sentiment and topic modelling for Lombok tourism.

Download Full-text

Sentiment Analysis of Perpustakaan Nasional Republik Indonesia Through Social Media Twitter

MATICS ◽

10.18860/mat.v12i1.8973 ◽

2020 ◽

Vol 12 (1) ◽

pp. 90

Author(s):

Fakhris Khusnu Reza Mahfud

Keyword(s):

Social Media ◽

Public Opinion ◽

Sentiment Analysis ◽

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

User Needs ◽

Twitter Data ◽

Knn Classification ◽

National Libraries

The library is a gate of science and a heart of civilization. Indonesia already has a Perpustakaan Nasional consisted of 27 floors and is equipped with facilities that are adequate for user needs. Apart from that, we need to see opinions from the community as users. Public opinion about the library is critical for library managers to evaluate services and facilities from the library. One way to find out the views of the community is by using social media twitter. Twitter social media is often used in channelling opinions or expressing opinions about specific topics; besides social media, twitter is commonly used for digital campaign movements. Submission of views and even digital campaigns on Twitter social media greatly influence the opinions and even behaviour of society in various ways. This study analyzes tweets about national libraries by classifying, positive opinions, negative opinions and neutral opinions. In this study, twitter data will go through the preprocessing, weighting, and classification stages. TF-IDF and TF binary are used in weighting in this study. The classification used in this study is Naive Bayes and KNN. Accuracy, precision, and recall values were also used in this study to evaluate classification performance. The highest classification performance using KNN classification with TF-IDF weighting resulted in the value of accuracy, precision, and recall of 83.33%, 79.2%, and 83.3% respectively.

Download Full-text

Sentimen Analisis Komentar Toxic pada Grup Facebook Game Online Menggunakan Klasifikasi Naïve Bayes

Jurnal Informatika Universitas Pamulang ◽

10.32493/informatika.v5i3.6571 ◽

2020 ◽

Vol 5 (3) ◽

pp. 356

Author(s):

Renaldy Permana Sidiq ◽

Budi Arif Dermawan ◽

Yuyun Umaidah

Keyword(s):

Social Media ◽

Feature Selection ◽

Naive Bayes ◽

Information Gain ◽

Text Processing ◽

Confusion Matrix ◽

Naïve Bayes ◽

Classification Model ◽

Testing Data ◽

F Measure

Toxic comments are comments made by social media users that contain expressions of hatred, condescension, threatening, and insulting. Social media users who are on average still teenagers with a nature that still cannot be controlled completely becomes a matter of great concern when they comment, their comments can be studied as text processing. Sentiment analysis can be used as a solution to identifying toxic comments by dividing them into two classifications. Where the data used amounted to 1,500 taken from social media Facebook in the private group Arena of Valor community. The dataset is divided into 2 classes: toxic and non-toxic. This research uses Naive Bayes with TF-IDF transformation and Information Gain feature selection and use distribution ratio 80:20. It will be compared the results of the evaluation where Naive Bayes without transformation, using TF-IDF transformation, and TF-IDF using Information Gain feature selection. The results of the comparison of evaluations from confusion matrix that have been carried out obtained the best classification model is to use the ratio of training and testing data 80:20 with TF-IDF transformation resulting in an accuracy of 75%, precision of 63%, recall of 67%, and F-measure of 64%.

Download Full-text

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.

Download Full-text

Children’s Activity Classification for Domestic Risk Scenarios Using Environmental Sound and a Bayesian Network

Healthcare ◽

10.3390/healthcare9070884 ◽

2021 ◽

Vol 9 (7) ◽

pp. 884

Author(s):

Antonio García-Domínguez ◽

Carlos E. Galván-Tejada ◽

Ramón F. Brena ◽

Antonio A. Aguileta ◽

Jorge I. Galván-Tejada ◽

...

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Activity Classification ◽

Environmental Sound ◽

Non Invasive ◽

Akaike Criterion ◽

Data Source ◽

Feature Selection Techniques

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, since it has even been defined as a global health problem. Children’s activity classification generally uses sensors embedded in children’s clothing, which can lead to erroneous measurements for possible damage or mishandling. Having a non-invasive data source for a children’s activity classification model provides reliability to the monitoring system where it is applied. This work proposes the use of environmental sound as a data source for the generation of children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks, focused on the recognition of potentially triggering activities of domestic accidents, applicable in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Likewise, models were generated using three classifiers: naive Bayes, semi-naive Bayes and tree-augmented naive Bayes. The generated models, combining the methods of feature selection and the classifiers used, present accuracy of greater than 97% for most of them, with which we can conclude the efficiency of the proposal of the present work in the recognition of potentially detonating activities of domestic accidents.

Download Full-text

Structure Extension of Tree-Augmented Naive Bayes

Entropy ◽

10.3390/e21080721 ◽

2019 ◽

Vol 21 (8) ◽

pp. 721 ◽

Cited By ~ 1

Author(s):

YuGuang Long ◽

LiMin Wang ◽

MingHui Sun

Keyword(s):

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Training Data ◽

Conditional Probability Distribution ◽

Independence Assumption ◽

Bayesian Network Classifiers ◽

Leibler Divergence ◽

The Difference ◽

Structure Extension

Due to the simplicity and competitive classification performance of the naive Bayes (NB), researchers have proposed many approaches to improve NB by weakening its attribute independence assumption. Through the theoretical analysis of Kullback–Leibler divergence, the difference between NB and its variations lies in different orders of conditional mutual information represented by these augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from 1-dependence Bayesian network classifiers (BNC) to arbitrary k-dependence. Sub-models of TAN that are built to respectively represent specific conditional dependence relationships may “best match” the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.

Download Full-text

Multiple Naïve Bayes Classifiers Ensemble for Traffic Incident Detection

Mathematical Problems in Engineering ◽

10.1155/2014/383671 ◽

2014 ◽

Vol 2014 ◽

pp. 1-16 ◽

Cited By ~ 7

Author(s):

Qingchao Liu ◽

Jian Lu ◽

Shuyan Chen ◽

Kangjia Zhao

Keyword(s):

Decision Tree ◽

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Classifier Ensemble ◽

Optimal Threshold ◽

Incident Detection ◽

Bayes Classifier ◽

Traffic Incident ◽

Better Than

This study presents the applicability of the Naïve Bayes classifier ensemble for traffic incident detection. The standard Naive Bayes (NB) has been applied to traffic incident detection and has achieved good results. However, the detection result of the practically implemented NB depends on the choice of the optimal threshold, which is determined mathematically by using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters and, furthermore, to improve the limited classification performance of the NB and to enhance the detection performance, we propose an NB classifier ensemble for incident detection. In addition, we also propose to combine the Naïve Bayes and decision tree (NBTree) to detect incidents. In this paper, we discuss extensive experiments that were performed to evaluate the performances of three algorithms: standard NB, NB ensemble, and NBTree. The experimental results indicate that the performances of five rules of the NB classifier ensemble are significantly better than those of standard NB and slightly better than those of NBTree in terms of some indicators. More importantly, the performances of the NB classifier ensemble are very stable.

Download Full-text

The Sentiment Analysis Reviewing Indosat Services from Twitter Using the Naive Bayes Classifier

Journal of Applied Computer Science and Technology ◽

10.52158/jacost.v1i2.79 ◽

2020 ◽

Vol 1 (2) ◽

pp. 61-66

Author(s):

Febri Astiko ◽

Achmad Khodar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Naive Bayes ◽

Learning Model ◽

Naïve Bayes ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Machine Learning Model ◽

Bayes Algorithm

This study aims to design a machine learning model of sentiment analysis on Indosat Ooredoo service reviews on social media twitter using the Naive Bayes algorithm as a classifier of positive and negative labels. This sentiment analysis uses machine learning to get patterns an model that can be used again to predict new data.

Download Full-text

Sentiment Analysis Of Online Lecture Opinions On Twitter Social Media Using Naive Bayes Classifier

10.1109/icomitee53461.2021.9650135 ◽

2021 ◽

Author(s):

Devi Ajeng Damaratih

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Online Lecture

Download Full-text

Perbandingan Optimasi Feature Selection pada Naïve Bayes untuk Klasifikasi Kepuasan Airline Passenger

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i3.3086 ◽

2021 ◽

Vol 5 (3) ◽

pp. 527-533

Author(s):

Yoga Religia ◽

Amali Amali

Keyword(s):

Feature Selection ◽

Customer Satisfaction ◽

Naive Bayes ◽

Naïve Bayes ◽

Point Of View ◽

Classification Model ◽

Passenger Satisfaction ◽

Airline Passenger ◽

Bayes Algorithm

The quality of an airline's services cannot be measured from the company's point of view, but must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but currently independent assumptions are rarely discussed. Some literature suggests the use of attribute weighting to reduce independent assumptions, which can be done using particle swarm optimization (PSO) and genetic algorithm (GA) through feature selection. This study conducted a comparison of PSO and GA optimization on Naïve Bayes for the classification of Airline Passenger Satisfaction data taken from www.kaggle.com. After testing, the best performance is obtained from the model formed, namely the classification of Airline Passenger Satisfaction data using the Naïve Bayes algorithm with PSO optimization, where the accuracy value is 86.13%, the precision value is 87.90%, the recall value is 87.29%, and the value is AUC of 0.923.

Download Full-text