Chi-Square Feature Selection Effect On Naive Bayes Classifier Algorithm Performance For Sentiment Analysis Document

Author(s): Nurhayati, Armanda Eka Putra, Luh Kesuma Wardhani, Busman
2014, Vol 3 (3), pp. 92
Author(s): JUEN LING, I PUTU EKA N. KENCANA, TJOKORDA BAGUS OKA

Sentiment analysis is the computational study of opinions, sentiments, and emotions expressed in text. Its basic task is to classify the polarity of texts in documents, sentences, or opinions. Polarity indicates whether the text in a document, sentence, or opinion has a positive or negative aspect. In this study, polarity is classified with a machine learning technique, namely the Naïve Bayes classifier. Its criteria for text classification decisions are learned automatically from the training data. Manual classification is still required because the training data are derived from manual labeling. Here, labeling refers to adding a description to each data item according to its category. During labeling, chi-square feature selection is applied to reduce noise in the classification. The results show that the frequencies with which the expected features occur in the true category and in the false category play an important role in chi-square feature selection. Classifying breaking news with the Naïve Bayes classifier achieved an accuracy of 83% and a harmonic mean (F-measure) of 90.713%.
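The chi-square criterion used for feature selection is commonly computed per term and category from a 2x2 table of document counts. The sketch below illustrates that idea. It assumes whitespace tokenization and counts only whether a term is present in a document, not how often it appears. The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for one term/category pair.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # term or category is constant across the corpus
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, category, k):
    """Rank terms by chi-square against one category and keep the top k."""
    doc_sets = [set(d.lower().split()) for d in docs]  # hypothetical tokenizer
    in_cat = [lab == category for lab in labels]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        n11 = sum(1 for s, c in zip(doc_sets, in_cat) if term in s and c)
        n10 = sum(1 for s, c in zip(doc_sets, in_cat) if term in s and not c)
        n01 = sum(1 for s, c in zip(doc_sets, in_cat) if term not in s and c)
        n00 = len(docs) - n11 - n10 - n01
        scores[term] = chi_square(n11, n10, n01, n00)
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


docs = ["good great movie", "great acting", "bad boring movie", "bad plot"]
labels = ["pos", "pos", "neg", "neg"]
print(select_features(docs, labels, "pos", 2))  # ['bad', 'great']
```

Chi-square measures dependence in either direction. That is why "bad", which is strongly tied to the *other* class, ranks as high as "great". "movie" appears once in each class and scores zero.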


2019, Vol 4 (3), pp. 87
Author(s): Yono Cahyono, Saprudin Saprudin

At present, social media use in Indonesia is growing very rapidly. Indonesia has a variety of regional languages, one of which is Sundanese. Some people, especially those living in West Java, use Sundanese to express comments, opinions, suggestions, criticisms, and more on social media. This information can be valuable data for individuals or organizations in decision making, but its volume makes it impossible for humans to read and analyze it manually. Sentiment analysis is the process of classifying, analyzing, understanding, and evaluating opinions, emotions, and attitudes toward a particular entity, such as an individual, organization, product or service, topic, or event, in order to obtain information. The purpose of this research is to show that the Naïve Bayes Classifier (NBC) classification algorithm and the Chi-Squared Statistic feature selection method can be used for sentiment analysis of Sundanese-language tweets on Twitter. The tweets are classified into positive, negative, and neutral categories. The Chi-Square Statistic feature selection reduced irrelevant features in the Naïve Bayes classification of Sundanese-language tweets, which reached an accuracy of 78.48%.
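A multinomial Naive Bayes classifier with add-one (Laplace) smoothing is one common way to handle a three-class setup like positive, negative, and neutral. The sketch below shows that approach. The optional `vocabulary` argument is where a chi-square-selected term list could be plugged in. The class name, the whitespace tokenizer, and the English toy data are assumptions for illustration, not the authors' implementation or their Sundanese dataset.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical sketch)."""

    def __init__(self, vocabulary=None):
        # Optional set of allowed terms, e.g. the output of a chi-square filter.
        self.vocabulary = set(vocabulary) if vocabulary is not None else None

    def _tokens(self, text):
        toks = text.lower().split()  # assumed whitespace tokenizer
        if self.vocabulary is None:
            return toks
        return [t for t in toks if t in self.vocabulary]

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        class_docs = Counter(labels)
        # log P(c): fraction of training documents in class c
        self.log_prior = {c: math.log(class_docs[c] / len(labels)) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for doc, lab in zip(docs, labels):
            counts[lab].update(self._tokens(doc))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(counts[c])
        # log P(t | c) = log((count(t, c) + 1) / (tokens in c + |V|))
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(counts[c].values()) + len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((counts[c][t] + 1) / total) for t in self.vocab
            }
        return self

    def predict(self, text):
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for t in self._tokens(text):
                if t in self.vocab:  # terms never seen in training are ignored
                    s += self.log_likelihood[c][t]
            scores[c] = s
        return max(self.classes, key=lambda c: scores[c])


docs = ["good great", "great fun", "bad awful", "awful boring", "news today", "today report"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = MultinomialNB().fit(docs, labels)
print(model.predict("great good"))  # positive
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numeric underflow on longer texts.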


2020, Vol 16 (1), pp. 123-128
Author(s): Dinda Ayu Muthia

The closure of the illegal movie streaming site IndoXXI was a trending topic on Twitter at the end of 2019. Netizens' reactions on Twitter show both positive and negative sentiments. Many studies in sentiment analysis have used tweets from Twitter users as data. Many methods are used in sentiment analysis research, and Naïve Bayes is one of them because it is very simple and efficient. The method has advantages and disadvantages: Naïve Bayes is very sensitive to feature selection. Too many features not only increase computation time but also reduce classification accuracy. To address this weakness and improve the performance of the Naïve Bayes classifier, the method is often combined with various feature selection methods. This research aims to classify tweets as positive or negative using the Naïve Bayes classifier combined with a Genetic Algorithm. The accuracy of Naïve Bayes without feature selection reaches 79.55%. With the Genetic Algorithm as the feature selection method, accuracy increases to 88.64%, an improvement of 9.09 percentage points.
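The abstract does not describe the Genetic Algorithm configuration. The following is only a generic sketch of GA-based feature selection. Each individual is a bit mask over the features, where 1 means "keep". Fitness is supplied by the caller, and the population evolves through tournament selection, one-point crossover, bit-flip mutation, and elitism. In a setup like the one described, fitness would plausibly be Naive Bayes accuracy on a validation split using only the masked-in features. That pairing, the function name `ga_select`, and every parameter value are assumptions.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks; fitness(mask) -> float, higher is better.

    Requires n_features >= 2 and pop_size >= 2.
    Returns (best_mask, best_fitness_per_generation).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []

    def tournament(scored):
        # Pick the fitter of two randomly chosen individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda x: x[0], reverse=True)
        history.append(scored[0][0])
        next_pop = [scored[0][1][:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop

    scored = sorted(((fitness(m), m) for m in pop), key=lambda x: x[0], reverse=True)
    history.append(scored[0][0])
    return scored[0][1], history


# Toy fitness (hypothetical): number of bits matching a target mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]
best, history = ga_select(len(target), lambda m: sum(a == b for a, b in zip(m, target)))
```

Elitism guarantees that the best fitness never decreases from one generation to the next. This matters when each fitness evaluation, such as training and scoring a classifier, is expensive.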


CAUCHY, 2021, Vol 7 (1), pp. 28-39
Author(s): Adri Priadana, Ahmad Ashril Rizal

The COVID-19 pandemic has affected all industries in Indonesia and around the world, including the tourism industry. Researchers have a role in answering the needs of the tourism industry. This is especially true in developing tourism and business destination management programs and in carrying out activities oriented to the industry's needs. Meanwhile, the government has a role in making policies, especially the roadmap for developing the tourism industry. This study aims to track trending topics on Instagram since COVID-19 hit. The trending topics are then classified by sentiment analysis using a lexicon-based approach and a Naive Bayes classifier. Instagram data collected since January 2020 show the five highest topics in the tourism sector: health protocols, hotels, homes, streets, and beaches. Sentiment analysis of these five topics shows that beaches receive highly positive sentiment (80.87%), while hotels have the highest negative sentiment (57.89%). Evaluated with a confusion matrix, the sentiment results have an accuracy, precision, and recall of 82.53%, 86.99%, and 83.43%, respectively.
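Accuracy, precision, and recall are all derived from confusion-matrix counts. The sketch below computes them for a binary setup, treating one label as the positive class, and also reports F1, the harmonic mean of precision and recall. The function names and toy labels are hypothetical. Studies may instead average precision and recall across classes, which this sketch does not do.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Return (tp, fp, fn, tn) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive="positive"):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many are found
    # F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]
print(metrics(y_true, y_pred))
```

In this toy case, tp = 2, fp = 1, fn = 1, and tn = 1. That gives accuracy 3/5 = 0.6, with precision, recall, and F1 all equal to 2/3.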

