Tackling the Problem of Class Imbalance in Multi-class Sentiment Classification: An Experimental Study

2019 ◽  
Vol 44 (2) ◽  
pp. 151-178 ◽  
Author(s):  
Mateusz Lango

Abstract Sentiment classification is an important task that has gained extensive attention in both academia and industry. Many issues related to this task, such as the handling of negation or of sarcastic utterances, have been analyzed and addressed in previous works. However, the issue of class imbalance, which often compromises the prediction capabilities of learning algorithms, has scarcely been studied. In this work, we aim to bridge the gap between imbalanced learning and sentiment analysis. An experimental study including twelve imbalanced learning preprocessing methods, four feature representations, and a dozen datasets is carried out in order to analyze the usefulness of imbalanced learning methods for sentiment classification. Moreover, the data difficulty factors commonly studied in imbalanced learning are investigated on sentiment corpora to evaluate the impact of class imbalance.
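As an illustration of what an imbalanced learning preprocessing step can look like, the following is a minimal sketch of random oversampling for a multi-class sentiment corpus. The abstract does not name the twelve methods studied, so this technique is chosen here only as a familiar example; the function name `random_oversample` and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy corpus (hypothetical): four positive, one negative, one neutral review.
texts = ["great", "fine", "good", "nice", "awful", "meh"]
labels = ["pos", "pos", "pos", "pos", "neg", "neu"]
balanced_texts, balanced_labels = random_oversample(texts, labels)
class_counts = Counter(balanced_labels)
```

After the call, each of the three classes holds four examples. Oversampling of this kind would normally be applied to the training split only, so that duplicated examples do not leak into the evaluation data.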

Big Data ◽  
2016 ◽  
pp. 1917-1933
Author(s):  
Basant Agarwal ◽  
Namita Mittal

Opinion mining, or sentiment analysis, is the study that analyzes people's opinions or sentiments expressed in text towards entities such as products and services. It has always been important to know what other people think. With the rapid growth in the availability and popularity of online review sites, blogs, forums, and social networking sites, the need to analyse and understand these reviews has arisen. The main approaches to sentiment analysis can be categorized into semantic orientation-based approaches, knowledge-based approaches, and machine-learning algorithms. This chapter surveys the machine learning approaches applied to sentiment analysis-based applications. The main emphasis of this chapter is on the research involved in applying machine learning methods, mostly to sentiment classification at the document level. Machine learning-based approaches work in the following phases, which are discussed in detail in this chapter for sentiment classification: (1) feature extraction, (2) feature weighting schemes, (3) feature selection, and (4) machine-learning methods. This chapter also discusses the standard free benchmark datasets and evaluation methods for sentiment analysis. The authors conclude the chapter with a comparative study of some state-of-the-art methods for sentiment analysis and some possible future research directions in opinion mining and sentiment analysis.
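The first two phases listed above, feature extraction and feature weighting, can be sketched as follows. This is a minimal example assuming unigram features and a plain TF-IDF weighting; the chapter may cover other schemes, and the function name `tfidf` and the sample reviews are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dictionary per document.

    Feature extraction: lowercase whitespace tokens (unigrams).
    Feature weighting: relative term frequency times log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / df[term])
                for term, count in tf.items()
            }
        )
    return vectors


# Hypothetical product reviews.
docs = ["good phone good battery", "bad phone", "good service"]
vectors = tfidf(docs)
```

In the first review, "battery" receives a higher weight than "phone" because it occurs in only one document. Feature selection and the classifier itself would operate on vectors like these.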


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Duy Ngoc Nguyen ◽  
Tuoi Thi Phan ◽  
Phuc Do

Abstract Sentiment classification using deep learning algorithms has achieved good results when tested on popular datasets. However, it is challenging to build a corpus on new topics to train machine learning algorithms for sentiment classification with high confidence. This study proposes a method, called knowledge processing and representation based on ontology (KPRO), that processes embedding knowledge in the ontology of opinion datasets to represent the significant features of the dataset in the word embedding layer of deep learning algorithms for sentiment classification. Unlike methods that lexically encode or add information to the corpus, this method adds a representation of the raw data based on the expert’s knowledge in the ontology. Once the data has a rich knowledge of the topic, the efficiency of the machine learning algorithms is significantly enhanced. Thus, this method is applicable to embedding knowledge in datasets in other languages. The test results show that deep learning methods achieved considerably higher accuracy when trained with the KPRO method’s dataset than when trained with datasets not processed by this method. Therefore, this method is a novel approach to improving the accuracy of deep learning algorithms and increasing the reliability of new datasets, thus making them ready for mining.


2021 ◽  
Vol 26 (5) ◽  
pp. 501-506
Author(s):  
Anuj Kumar Singh ◽  
Sandeep Kumar ◽  
Shashi Bhushan ◽  
Pramod Kumar ◽  
Arun Vashishtha

When anyone looks to enroll in a freely available online course, the first and best-known name that comes up is MOOC courses. In this article, our focus is to collect the comments of users enrolled in a specified MOOC course and apply sentiment analysis to that data. The significance of our article is to introduce a proficient sentiment analysis algorithm with high predictive performance on MOOC courses, following the principle of combining various supervised learning methods, and to examine the performance of various supervised machine learning algorithms in performing sentiment analysis of MOOC data. Some research questions on sentiment analysis of MOOC data are addressed. For the assessment task, we investigated a large number of MOOC courses with different supervised learning methods and evaluated the results using measures such as precision, recall, and F1 score. From the results we can conclude that, when the bigram model was applied to logistic regression, the Multilayer Perceptron (MLP) surpassed the accuracy of other algorithms such as SVM and Naive Bayes, achieving an accuracy of 92.44 percent. To determine the sentiment polarity of a sentence, the suggested method uses term frequency (the number of positive and negative terms in the text). We use a logistic regression function to predict the sentiment class (positive or negative) of comments in the data.
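The term-frequency idea described above, counting positive and negative terms to decide polarity, can be sketched as follows. The two word lists are a hypothetical lexicon invented for this example, not the one used in the article, and the article's logistic regression step is not reproduced here.

```python
# Hypothetical lexicon, for illustration only.
POSITIVE_TERMS = {"good", "great", "clear", "helpful", "excellent"}
NEGATIVE_TERMS = {"bad", "boring", "confusing", "poor", "slow"}


def polarity(text):
    """Label a comment by comparing its counts of positive and negative terms."""
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = [token.strip(".,!?;:") for token in text.lower().split()]
    positive = sum(token in POSITIVE_TERMS for token in tokens)
    negative = sum(token in NEGATIVE_TERMS for token in tokens)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


# Hypothetical course comments.
first = polarity("Great course, clear and helpful!")
second = polarity("Boring and confusing lectures.")
third = polarity("The course runs weekly")
```

The two counts could also serve as input features to a classifier such as logistic regression instead of being compared directly.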


In this digitized world, the Internet has become a prominent source from which to glean various kinds of information. In today’s scenario, people prefer virtual reality over one-to-one communication. The majority of the population prefers social networking sites to voice themselves through posts, blogs, comments, likes, and dislikes. Their sentiments can be traced using opinion mining, or sentiment analysis. Sentiment analysis of social media text is a useful technique for identifying people’s positive, negative, or neutral emotions, sentiments, and opinions. Sentiment analysis has gained special attention from researchers over the last few years. Traditionally, many machine learning algorithms were used to implement it, such as Naive Bayes, Support Vector Machine, and many more. To overcome the drawbacks of ML in terms of complex classification algorithms, different deep learning-based algorithms have been introduced, such as CNN, RNN, and HNN. In this paper, we study different deep learning algorithms and propose a deep learning-based model to analyze the behavior of an individual using social media text. The results given by the proposed model can be utilized in a range of fields such as business, education, industry, politics, psychology, and security.


Author(s):  
Taynan Ferreira ◽  
Francisco Paiva ◽  
Roberto Silva ◽  
Angel Paula ◽  
Anna Costa ◽  
...  

Sentiment analysis (SA) is growing in importance due to the enormous amount of opinionated textual data available today. Most research has investigated different models, feature representations, and hyperparameters in SA classification tasks. However, few studies have evaluated the impact of these features on regression SA tasks. In this paper, we conduct such an assessment on a financial domain data set by investigating different feature representations and hyperparameters in two important models -- Support Vector Regression (SVR) and Convolutional Neural Networks (CNN). We conclude by presenting the most relevant feature representations and hyperparameters and how they impact outcomes on a regression SA task.


MATEMATIKA ◽  
2020 ◽  
Vol 36 (2) ◽  
pp. 99-111
Author(s):  
Kartika Fithriasari ◽  
Saidah Zahrotul Jannah ◽  
Zakya Reyhana

Social media is used as a tool by many people to express their opinions. Sentiment analysis for social media is very important, as it allows information to be obtained about public opinion on government performance. The goal of this research is to learn about the opinions of Surabaya citizens, using deep learning methods. The data are extracted from the official Twitter accounts of the Surabaya government and a private radio station in Surabaya. The data are grouped into two categories: positive and negative sentiments. This research is conducted in three steps: data pre-processing, sentiment classification, and visualization. Data pre-processing is required before modelling approaches are applied. It is used to transform the unstructured text data into structured data. The data pre-processing consists of case folding, tokenizing, and the removal of stop words. Deep learning methods are then applied to the data. A Backpropagation Neural Network (BNN) and a Convolutional Neural Network (CNN) are used to perform the sentiment classification. The BNN and CNN are compared using various metrics, such as precision, sensitivity, and area under the receiver operating characteristic curve (AUC). A word cloud is then used to visualize the data and find the most frequent words in each class. The results show that the sentiment classification with CNN is better than that with the BNN because the values for the precision, sensitivity and AUC are higher.
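The three pre-processing operations named above (case folding, tokenizing, and stop word removal) can be sketched as follows. The stop word list is a short English placeholder assumed for this example; the study's tweets and its actual stop word list are not given in the abstract.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "and", "in", "of", "to"}


def preprocess(text):
    """Turn raw text into a list of content tokens."""
    folded = text.lower()                       # case folding
    tokens = re.findall(r"[a-z0-9]+", folded)   # tokenizing on letters and digits
    return [token for token in tokens if token not in STOP_WORDS]  # stop word removal


tokens = preprocess("The traffic in Surabaya is TERRIBLE today!")
```

The resulting token list is the structured form that a classifier such as the BNN or CNN would then consume, typically after being mapped to numeric vectors.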


Author(s):  
Omar Alharbi

One crucial aspect of sentiment analysis is negation handling, where the occurrence of negation can flip the sentiment of a review and negatively affect machine learning-based sentiment classification. The role of negation in Arabic sentiment analysis has been explored only to a limited extent, especially for colloquial Arabic. In this paper, the authors address the negation problem in colloquial Arabic sentiment classification using the machine learning approach. To this end, they propose a simple rule-based algorithm for handling the problem that affects the performance of a machine learning classifier. The rules were crafted based on observing many cases of negation, simple linguistic knowledge, and a sentiment lexicon. They also examine the impact of the proposed algorithm on the performance of different machine learning algorithms. Furthermore, they compare the performance of the classifiers when their algorithm is used against three baselines. The experimental results show that there is a positive impact on the classifiers when the proposed algorithm is used compared to the baselines.
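To make the idea of rule-based negation handling concrete, here is a generic sketch that flips the lexicon polarity of words appearing shortly after a negation word. It is not the authors' algorithm: their rules target colloquial Arabic and are not listed in the abstract. The English negators, the small lexicon, and the two-token window are all assumptions made for illustration.

```python
# Hypothetical negation words and sentiment lexicon, for illustration only.
NEGATORS = {"not", "no", "never"}
LEXICON = {"good": 1, "clean": 1, "bad": -1, "dirty": -1}


def score_with_negation(text, window=2):
    """Sum lexicon polarities, flipping the sign of any sentiment word
    found within `window` tokens after a negation word."""
    score = 0
    remaining = 0  # how many upcoming tokens are still in negation scope
    for token in text.lower().split():
        if token in NEGATORS:
            remaining = window
            continue
        value = LEXICON.get(token, 0)
        if remaining > 0:
            value = -value
            remaining -= 1
        score += value
    return score


plain = score_with_negation("the room was good")
negated = score_with_negation("the room was not good")
double = score_with_negation("not very bad")
```

A fixed window is a crude approximation of negation scope. Rules built from linguistic knowledge, as described in the abstract, would decide the scope case by case.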


2020 ◽  
Vol 9 (1) ◽  
pp. 2254-2261

Sentiments are the emotions communicated among individuals. They are opinions given by people on any item, product, or service used or experienced online. This paper discusses the research area concerned with analyzing the sentiments people exchange online, and describes how sentiments and features are extracted from online tourist reviews using deep learning techniques. Tourist behavior can be judged from tourists' reviews of tourist places, hotels, and other services provided by the tourism industry. The proposed idea of the paper is to show the high efficiency of deep learning techniques such as CNN, RNN, and LSTM in extracting features through the use of extra hidden layers. Further, these techniques are compared with one another and with classical machine learning algorithms such as SVM, Naïve Bayes, KNN, and RF to show that deep learning methods are more efficient than classical machine learning algorithms. Accurately capturing tourists' attitudes towards tourist places, hotels, and other services of the tourism industry plays a most important role in enhancing the industry's business model. This can be done efficiently through sentiment analysis using deep learning methods. Polarity classification is done by extracting textual features using the CNN, RNN, and LSTM deep learning algorithms. The extracted features are fed to a deep learning classifier to classify each review as positive, negative, or neutral. After comparing various deep learning and classical machine learning techniques, it is concluded that LSTM and RNN give the best results in classifying reviews as positive or negative, compared with the classical SVM and KNN techniques.
In this way sentiment analysis has been carried out. The proposed idea of this research paper is a shift from classical machine learning algorithms to neural network deep learning methods, which in future will give better results in analyzing tourists' sentiments to find out their likes and dislikes regarding tourist places, hotels, and related tourism services. This will help the tourism industry address the gaps in its existing services and make the system more efficient. Such an improved tourism system will benefit tourists and users through better services, and it will help the tourism industry enhance its business in future.


Author(s):  
Siwi Cahyaningtyas ◽  
Dhomas Hatta Fudholi ◽  
Ahmad Fathan Hidayatullah

Tourism is one of the fastest-growing industries. Many travelers book hotels and share their experiences using travel e-commerce sites. To improve the quality of products and services, we can take advantage of their reviews by analyzing them. We can see the good and bad reviews for every aspect of the hotel. However, research on analyzing sentiment in every aspect using Indonesian hotel reviews is still relatively new. In this work, we propose to create an Aspect-based Sentiment Analysis (ABSA) using Indonesian hotel reviews to solve the problem. This research consists of four steps: collecting data, preprocessing, aspect classification, and sentiment classification. Our classification process compares eight deep learning methods (RNN, LSTM, GRU, BiLSTM, Attention BiLSTM, CNN, CNN-LSTM, and CNN-BiLSTM). In aspect classification, we have six aspect classes: harga (price), hotel, kamar (room), lokasi (location), pelayanan (service), and restoran (restaurant). In sentiment analysis, we compared two scenarios to classify sentiments as positive or negative. The first is to classify sentiment in all aspects, and the second is to classify sentiment in every aspect. The results showed that LSTM was the best model for aspect classification, with an accuracy of 0.926. For sentiment classification, our experiments showed that classifying sentiment in every aspect achieved a better result than classifying sentiment in all aspects. The CNN model achieved an average accuracy score of 0.904.

