Topic features for machine learning-based sentiment analysis in Indonesian tweets

Author(s):  
Hendri Murfi ◽  
Furida Lusi Siagian ◽  
Yudi Satria

Purpose: This paper analyzes topics as alternative features for sentiment analysis of Indonesian tweets. Design/methodology/approach: Given a set of Indonesian tweets, sentiment analysis starts by extracting features, which are either words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify each tweet into its sentiment class. Findings: The authors evaluate accuracy on two-class and three-class sentiment analysis data sets, both of which concern sentiment toward candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracy than the topic features for two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for three-class sentiment analysis. Originality/value: The standard textual data representation for machine learning-based sentiment analysis is the bag of words and its extensions, which are mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis of Indonesian tweets.
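
As a rough illustration of the topic-extraction step named in this abstract, the sketch below factors a toy document-term matrix with non-negative matrix factorization using the classic multiplicative updates. The abstract does not state which solver or library the authors used. The function names (`nmf`, `frobenius_error`), the four-word vocabulary, the counts, and the iteration settings are all assumptions made for illustration. Each row of the returned `doc_topics` matrix could serve as the topic feature vector for one tweet; the support vector machine step is omitted.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms).

    Uses multiplicative updates, which keep every entry non-negative
    as long as V is non-negative and the initial factors are positive.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


# Hypothetical toy data: rows are tweets, columns are term counts.
vocab = ["bagus", "hebat", "buruk", "jelek"]
doc_term = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
]

# Each row of doc_topics is a 2-dimensional topic feature vector for one tweet.
doc_topics, topic_terms = nmf(doc_term, n_topics=2)
```

In a fuller pipeline, the topic vectors could be used alone or concatenated with the word features before classification, which matches the comparison the abstract describes at a high level.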

2018 ◽  
Vol 36 (4) ◽  
pp. 677-695 ◽  
Author(s):  
Shrawan Kumar Trivedi ◽  
Shubhamoy Dey ◽  
Anil Kumar

Purpose: Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data and capturing users’ sentiments. This research presents sentiment analysis of an Indian movie review corpus using natural language processing and various machine learning classifiers. Design/methodology/approach: A comparative study of three machine learning classifiers (Bayesian, naïve Bayesian and support vector machine [SVM]) was performed. All the classifiers were trained on words/features extracted from the corpus using five different feature selection algorithms (Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and a comparative study was performed between them. The classifiers and feature selection approaches were evaluated using different metrics (F-value, false-positive [FP] rate and training time). Findings: The results show that, for the maximum number of features, the RF feature selection approach was the best, with better F-values, a low FP rate and less time needed to train the classifiers, whereas for the least number of features, one-R was better than RF. Among the machine learning classifiers, SVM was found to be superior, although the Bayesian classifier was comparable with SVM. Originality/value: This is novel research in which Indian review data were collected and a classification model for sentiment polarity (positive/negative) was then constructed.
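
To make one of the five feature selection methods named above concrete, the following is a minimal sketch of Chi-square scoring for binary term presence against a two-class label. It is not the authors' implementation, which the abstract does not describe. The helper names (`chi_square`, `top_k_terms`), the toy reviews, and the label strings are hypothetical.

```python
def chi_square(docs, labels, term, positive="pos"):
    """Chi-square statistic of the 2x2 table (term present/absent x class)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        is_pos = label == positive
        if has_term and is_pos:
            a += 1          # term present, positive class
        elif has_term:
            b += 1          # term present, negative class
        elif is_pos:
            c += 1          # term absent, positive class
        else:
            d += 1          # term absent, negative class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero denominator means the term or the class never varies.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def top_k_terms(docs, labels, k, positive="pos"):
    """Return the k terms with the highest Chi-square score."""
    vocab = sorted(set().union(*docs))
    scored = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, positive), t))
    return scored[:k]


# Hypothetical toy corpus: each review is a set of tokens.
reviews = [
    {"superb", "film"},
    {"superb", "songs"},
    {"boring", "film"},
    {"boring", "songs"},
]
polarity = ["pos", "pos", "neg", "neg"]

selected = top_k_terms(reviews, polarity, k=2)
```

Terms that appear equally often in both classes score zero and are dropped first, which is the behavior a filter-style selector is meant to provide before the classifier is trained.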


Sentiment analysis is the study of individuals' opinions and feedback about an entity, which can be an item, service, movie, person or event. These opinions are mostly expressed as remarks or reviews. With social networks, forums and websites, such reviews have become a significant factor in a customer's decision whether or not to buy something. Today, large-scale computing environments provide sophisticated ways of carrying out data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One example is text classification, an effective method for predicting customer sentiment. In this paper, we focus our sentiment analysis on a movie review database. We examine sentiment expressions to classify the polarity of movie reviews on a scale of 0 (highly disliked) to 4 (highly liked), perform feature extraction and ranking, and use these features to train our multilabel classifier to assign each movie review to its correct rating. This paper combines sentiment analysis using feature-based opinion mining with supervised machine learning. The principal focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem domain. In our study, we used six machine learning algorithms: Naïve Bayes, Logistic Regression, SVM (Support Vector Machine), RF (Random Forest), KNN (K-Nearest Neighbors) and SoftMax Regression.


2021 ◽  
Vol 9 (2) ◽  
pp. 313-317
Author(s):  
Vanitha Kakollu et al.

Today we have large amounts of textual data to be processed, and the procedure involved in classifying text is called natural language processing. The basic goal is to identify whether the text is positive or negative. This process is also called opinion mining. In this paper, we consider three different data sets and perform sentiment analysis to find the test accuracy. We distinguish three cases: (1) if the text contains more positive data than negative data, the overall result leans towards positive; (2) if the text contains more negative data than positive data, the overall result leans towards negative; (3) if the amounts of positive and negative data are nearly equal, the output is neutral. Sentiment analysis involves several steps, such as term extraction, feature selection and sentiment classification. The key focus of this paper is on comparing the machine learning approach and the lexicon-based approach to sentiment analysis, together with their respective accuracy and loss graphs.
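
The three cases described in this abstract can be sketched as a simple lexicon count. This is only an illustration under stated assumptions: the word lists `POSITIVE` and `NEGATIVE`, the function name `lexicon_sentiment`, and the `margin` parameter (used here to interpret "nearly equal") are hypothetical and do not come from the paper.

```python
# Hypothetical toy lexicons; a real system would load a curated sentiment lexicon.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def lexicon_sentiment(text, margin=0):
    """Label text by comparing counts of positive and negative words.

    margin is an assumed tolerance: counts that differ by no more than
    margin are treated as "nearly equal" and yield a neutral label.
    """
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos - neg > margin:
        return "positive"   # case 1: more positive than negative words
    if neg - pos > margin:
        return "negative"   # case 2: more negative than positive words
    return "neutral"        # case 3: counts are nearly equal
```

For example, `lexicon_sentiment("Good cast but bad script")` finds one word from each list and falls into the neutral case.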


Author(s):  
Subhadip Chandra ◽  
Randrita Sarkar ◽  
Sayon Islam ◽  
Soham Nandi ◽  
Avishto Banerjee ◽  
...  

Sentiment analysis is the systematic identification, extraction, quantification, and study of affective states and subjective information using natural language processing, text analysis, computational linguistics, and biometrics. People frequently use Twitter, one of numerous popular social media platforms, to convey their thoughts and opinions about a business, a product, or a service. Analysis of tweet sentiment is particularly useful for detecting whether people hold a positive, negative, or neutral opinion. This study assesses public opinion about an individual, activity, commodity, or organization. The Twitter API is used to retrieve tweets directly from Twitter and build a sentiment classification for them. The paper applies two separate approaches to the Twitter data: lexicon-based and machine learning-based. The lexicon-based approach is further divided into corpus-based and dictionary-based methods. Machine learning approaches such as Support Vector Machine (SVM), Naïve Bayes and maximum entropy are used to analyse the Twitter data. Neural network (NN) and decision tree-based sentiment analysis are also covered, in order to determine which approach gives better accuracy across different data ranges. Graphs and confusion matrices are used to visualise the results of the analysis for positive, negative, and neutral opinions.


The main objective of this paper is to analyze social media big data containing reviews of e-commerce products. The analysis gives online shoppers useful information about product quality and gives businesses decision-making support by showing which products customers like and buy most. It covers all features or opinion words available in tweets, such as capitalized words, sequences of repeated letters, emoji, slang words, exclamatory words, intensifiers, modifiers, conjunctions and negation words. Existing work has considered only two or three features when performing sentiment analysis with machine learning and natural language processing (NLP). In the proposed work, familiar machine learning classification models, namely Multinomial Naïve Bayes, Support Vector Machine, Decision Tree Classifier and Random Forest Classifier, are used for sentiment classification. The sentiment classification serves as a decision support system for both customers and businesses.


2018 ◽  
Vol 42 (3) ◽  
pp. 343-354 ◽  
Author(s):  
Mike Thelwall

Purpose: This paper investigates whether machine learning induces gender biases, in the sense of results that are more accurate for male authors or for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis. Design/methodology/approach: This paper uses ratings-balanced sets of reviews of restaurants and hotels (three sets) to train algorithms with and without gender selection. Findings: Accuracy is higher on female-authored reviews than on male-authored reviews for all data sets, so applications of sentiment analysis using mixed-gender data sets will overrepresent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders. Practical implications: End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it and should apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with. Originality/value: This is the first demonstration of gender bias in machine learning sentiment analysis.


Sentiment analysis is an area of natural language processing (NLP) and machine learning in which text is categorized into predefined classes, i.e. positive and negative. As the internet and social media grow day by day, together they now generate far more customer feedback than before. Text generated through social media, blogs, posts, product reviews, etc. has become the best-suited source of consumer sentiment, providing a clear picture of opinion about a particular product. Features are an important input to the classification task: the better the features are optimized, the more accurate the results. Therefore, this research paper proposes a hybrid feature selection method that combines particle swarm optimization (PSO) and cuckoo search. Given the subjective nature of social media reviews, the hybrid feature selection technique outperforms the traditional technique. Performance measures such as F-measure, recall, precision, and accuracy are tested on a Twitter dataset using a Support Vector Machine (SVM) classifier and compared with a convolutional neural network. Experimental results on these measures show that the proposed work outperforms the existing work.


The process of discovering and analyzing customer feedback using Natural Language Processing (NLP) is known as sentiment analysis. With growing interest in rating-level sentiment analysis, sentiment is treated as an attribute of the aspects or features being expressed, and more attention is being given to the problem of analyzing customer reviews. Despite the wide use and popularity of some methods, a better technique for identifying the polarity of text data is hard to find. Machine learning has recently attracted attention as an approach to sentiment analysis. This work evaluates the performance of various Machine Learning (ML) classifiers, namely logistic regression, Naive Bayes, Support Vector Machine (SVM) and Neural Network (NN). To show their effectiveness in sentiment mining of customer product reviews, customer feedback was collected from the Grocery and Gourmet Food category. Nearly 90 thousand customer reviews, with fields including product ID, rating, review text, review time, reviewer ID and reviewer name, are used in this analysis. The performance of the classifiers is measured in terms of accuracy, specificity and sensitivity. Based on the experimental results, the best-performing machine learning classification algorithm is proposed for sentiment mining of online shopping customer review data.
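
The three evaluation measures named in this abstract follow directly from the binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example labels are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    total = tp + tn + fp + fn
    return {
        # Share of all reviews that were labeled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # True positive rate: positives that were found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # True negative rate: negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels: 1 = positive review, 0 = negative review.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
scores = binary_metrics(actual, predicted)
```

Reporting sensitivity and specificity alongside accuracy is useful when the classes are imbalanced, since accuracy alone can hide poor performance on the minority class.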


Author(s):  
Erick Omuya ◽  
George Okeyo ◽  
Michael Kimwele

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos, which they share on Twitter, Facebook and other social media. These updates therefore generate a large amount of data that is rich in sentiment. Sentiment analysis has been used to determine the opinions of clients, for instance about a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains and the size of the data used for training and testing. This research aims to develop a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on social media data. The model is tested using Naïve Bayes, Support Vector Machine and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.


2021 ◽  
Vol 17 (1) ◽  
pp. 45-53
Author(s):  
Le Hong Trang ◽  
Tran Duong Huy ◽  
Anh Ngoc Le

Purpose: Pricing on online booking systems is a difficult task for the host. The systems usually set prices that are lower than the premises and their quality would generally warrant, which benefits only the system by making it easier to attract customers to the service. The price of a new accommodation is often set based on location, the number of beds, the type of house and so on. The main problem is to predict the most reasonable price for the host. This paper studies the use of machine learning and sentiment analysis for predicting prices on online booking systems. Design/methodology/approach: An empirical study is first performed for some well-known classification models for the problem. The authors then propose applying k-means, a clustering technique, together with Gradient Boost and XGBoost models to improve the prediction performance. Experiments are conducted on real Airbnb data sets collected in London. Findings: Experimental results are given and compared to show that the authors’ method outperforms a recent method. Originality/value: The authors use k-means and sampling together with Gradient Boost and XGBoost models to improve the prediction performance.

