A Generalized Method for Sentiment Analysis across Different Sources

Sentiment analysis is widely used in a variety of applications, such as online opinion gathering for policy directives in government, monitoring of customer and staff satisfaction in corporate bodies, and public tension monitoring in politics and security structures. In recent times, the field has met a new set of challenges, as new algorithms have to contend with highly unstructured sources of sentiment expression emanating from online social media fora. In this study, a rule- and lexical-based procedure is proposed together with unsupervised machine learning to implement sentiment analysis with an improved generalization ability across different sources. To deal with sources devoid of syntactic and grammatical structure, the approach incorporates a rule-based technique for emoticon detection, word contraction expansion, and noise removal, along with lexicon-based text preprocessing using lexical features such as part of speech (POS), stop words, and lemmatization for local context analysis. A text is broken into a number of tokens, each representing a sentence, and lexicon-dependent features are then extracted from each token. The features for a given text are merged using a combining function before being used to train a machine learning classifier. The proposed combining functions leverage averaging and information gain concepts. Experimental results with different machine learning classifiers indicate that improved performance, with a great deal of generalization capacity across both structured and nonstructured sources, can be realized. The findings show that carefully designed lexical features reinforce the learning process in unsupervised learning more than using word embeddings alone as features. Experimental results obtained from the movie review dataset (recall = 74.9%, precision = 70.9%, F1-score = 72.9%, and accuracy = 72.0%) and the Twitter samples dataset (recall = 93.4%, precision = 89.5%, F1-score = 91.4%, and accuracy = 91.1%) show the efficacy of the proposed approach in comparison with other state-of-the-art research studies.
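
As a quick consistency check on the figures above, the F1-score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below is not from the paper; the function name `f1_score` is an illustrative choice. Because the published precision and recall are themselves rounded to one decimal place, the recomputed movie review value (about 72.8) agrees with the reported 72.9 only to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
movie_f1 = f1_score(precision=70.9, recall=74.9)
twitter_f1 = f1_score(precision=89.5, recall=93.4)

print(f"movie reviews:   F1 = {movie_f1:.1f}")
print(f"twitter samples: F1 = {twitter_f1:.1f}")
```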

TEXT SENTIMENT ANALYSIS BASED ON CNNS AND SVM

International Journal of Research -GRANTHAALAYAH ◽

10.29121/granthaalayah.v7.i6.2019.761 ◽

2019 ◽

Vol 7 (6) ◽

pp. 77-83 ◽

Cited By ~ 1

Author(s):

Dr. C. Arunabala ◽

P. Jwalitha ◽

Soniya Nuthalapati

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Sentiment Analysis ◽

Expressive Power ◽

Sentiment Classification ◽

Experimental Results ◽

Analysis Method ◽

Mapping Functions ◽

Generalization Ability ◽

Text Sentiment Analysis

The traditional text sentiment analysis method is mainly based on machine learning. However, its dependence on emotion dictionary construction and on manually designed and extracted features limits its generalization ability. In contrast, deep models have greater expressive power and can better learn complex mapping functions from data to affective semantics. In this paper, a text sentiment analysis method that combines a Convolutional Neural Network (CNN) model with an SVM is proposed. The experimental results show that the proposed method effectively improves the accuracy of text sentiment classification compared with a traditional CNN, confirming the effectiveness of sentiment analysis based on CNNs and SVM.

Impact of Deep Learning on Semantic Sentiment Analysis

Examining the Impact of Deep Learning and IoT on Multi-Industry Applications - Advances in Web Technologies and Engineering ◽

10.4018/978-1-7998-7511-6.ch007 ◽

2021 ◽

pp. 97-117

Author(s):

Neha Gupta ◽

Rashmi Agrawal

Keyword(s):

Machine Learning ◽

Social Media ◽

Deep Learning ◽

Sentiment Analysis ◽

Semantic Technologies ◽

Sources Of Information ◽

Accurate Analysis ◽

Online Social Media ◽

Semantic Orientation ◽

Semantically Enhanced

Online social media (forums, blogs, and social networks) are growing explosively, and making use of these new sources of information has become important. Semantics plays a significant role in the accurate analysis of the context of an emotional expression. In this area, already advanced semantic technologies have been shown to increase the precision of the analysis. Deep learning has emerged as a prominent machine learning technique that learns multiple layers of data characteristics and delivers state-of-the-art results. In recent years, deep learning has been widely used in sentiment analysis, alongside its growth in many other fields. This chapter offers a description of deep learning and its application to sentiment analysis, with a focus on semantic orientation-based approaches. A semantically enhanced methodology for annotating sentiment polarity in Twitter/Facebook data is presented.

To use or not to use: Feature selection for sentiment analysis of highly imbalanced data

Natural Language Engineering ◽

10.1017/s1351324917000298 ◽

2017 ◽

Vol 24 (1) ◽

pp. 3-37 ◽

Cited By ~ 5

Author(s):

SANDRA KÜBLER ◽

CAN LIU ◽

ZEESHAN ALI SAYYED

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Information Gain ◽

Binary Classification ◽

Small Subset ◽

Large Set ◽

Learning Approaches ◽

Selection Methods ◽

Data Set

We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is common to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of extreme skewing in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 percentage points.
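
To illustrate what an inherently multi-class information gain score looks like, here is a minimal sketch using the textbook definition: the entropy of the rating distribution minus the entropy that remains after splitting documents on whether a term is present. The toy reviews, the rating labels, and the function names are made up for illustration and are not taken from the paper, whose exact formulation may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of a binary 'term present' feature over any number of classes."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy data: each review is a set of tokens, each label a rating.
docs = [{"bland", "dry"}, {"bland"}, {"ok", "fine"},
        {"ok"}, {"delicious", "great"}, {"delicious"}]
ratings = [1, 1, 3, 3, 5, 5]

for term in ("delicious", "ok", "salt"):
    print(term, round(information_gain(docs, ratings, term), 3))
```

A term that never occurs (here `salt`) leaves the class distribution unchanged and scores zero, while a term that isolates one rating class scores highest.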

Sentiment Analysis on Movie Reviews Using Twitter

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9326 ◽

2020 ◽

Vol 17 (7) ◽

pp. 2869-2875

Author(s):

Sajay Thomas Samuel ◽

Booma Poolan Marikannan

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Learning Algorithm ◽

Instant Messaging ◽

Machine Learning Algorithms ◽

Depth Information ◽

Implementation Phase ◽

Online Social Media ◽

Past Data

Machine learning can help people perform complex tasks and solve problems, as it uses historical data to learn patterns and make predictions. This research addresses movie reviews on social media, specifically Twitter: it gathers tweets about movies and displays a rating based on the sentiment of the tweets. Twitter is an online social media website where people from all walks of life communicate by tweeting short updates within a limit of 280 characters. Twitter is continuously growing as a business and has become one of the biggest platforms for communication and instant messaging. Due to the large number of users, voluminous amounts of data are available that can be used to gain more in-depth information and insights and to extract sentiment by analysing the tweets. Today, many applications use sentiment analysis in various fields, for example to get insights about a particular brand or product. Doing sentiment analysis in traditional ways can be time consuming and very complex. The aim of this research is to investigate the domain of sentiment analysis and incorporate a machine learning algorithm to create a system that is able to obtain and display the ratings of a particular movie. The machine learning algorithms used are the Naïve Bayes classifier and SVM. The algorithm with better accuracy will be chosen for the implementation phase.

On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis

Applied Computational Intelligence and Soft Computing ◽

10.1155/2018/1407817 ◽

2018 ◽

Vol 2018 ◽

pp. 1-5 ◽

Cited By ~ 14

Author(s):

Asriyanti Indah Pratiwi ◽

Adiwijaya

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Classification Scheme ◽

Information Gain ◽

Sentiment Classification ◽

Experimental Results ◽

Enormous Number

Sentiment analysis of movie reviews has become a need of today's lifestyle. Unfortunately, an enormous number of features makes sentiment analysis slow and less sensitive. Finding the optimal feature selection and classification methods is still a challenge. To handle an enormous number of features and provide better sentiment classification, an information gain-based feature selection and classification scheme is proposed. The proposed method reduces unnecessary features by more than 90%, while the proposed classification scheme achieves 96% sentiment classification accuracy. From the experimental results, it can be concluded that the combination of the proposed feature selection and classification achieves the best performance so far.

Clustering helps to improve price prediction in online booking systems

International Journal of Web Information Systems ◽

10.1108/ijwis-11-2020-0065 ◽

2021 ◽

Vol 17 (1) ◽

pp. 45-53

Author(s):

Le Hong Trang ◽

Tran Duong Huy ◽

Anh Ngoc Le

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Sentiment Analysis ◽

Design Methodology ◽

Prediction Performance ◽

Experimental Results ◽

Data Sets ◽

Classification Models ◽

Content Type ◽

Price Prediction

Purpose: Pricing on online booking systems is a difficult task for the host. The systems usually set prices that are lower than the general premises and quality, which benefits only the system by easily attracting customers to use the service. The price of a new accommodation is often set based on location, the number of beds, type of house and so on. The main problem is to predict the most reasonable price for the host. This paper aims to study the use of machine learning and sentiment analysis for predicting prices on online booking systems. Design/methodology/approach: An empirical study is first performed on some well-known classification models for the problem. The authors then propose to apply k-means, a clustering technique, together with Gradient Boost and XGBoost models to improve the prediction performance. Experiments are conducted on real Airbnb data sets collected in London. Findings: Experimental results are given and compared to show that the authors' method outperforms an updated method. Originality/value: The authors use k-means and sampling together with Gradient Boost and XGBoost models to improve the prediction performance.

Inverse local context analysis

The Electronic Library ◽

10.1108/el-12-2014-0211 ◽

2016 ◽

Vol 34 (3) ◽

pp. 405-418

Author(s):

Wei Lu ◽

Xinghu Yue ◽

Qikai Cheng ◽

Rui Meng

Keyword(s):

Design Methodology ◽

Experimental Results ◽

Data Sources ◽

Local Context ◽

Context Analysis ◽

Content Type ◽

Practical Applications ◽

Data Source

Purpose: The purpose of this paper is to explore the use of inverse local context analysis (ILCA) to obtain data from limited accessible data sources. Design/methodology/approach: The experimental results show that the method the authors proposed can obtain all retrieved documents from the limited accessible data source using the least number of queries. Findings: The experimental results show that the method we proposed can obtain all retrieved documents from the limited accessible data source using the least number of queries. Originality/value: To the best of the authors' knowledge, this paper provides the first attempt to gather all the retrieved documents from a limited accessible data source, and the efficiency and ease of implementation of the proposed solution make it feasible for practical applications. The method the authors proposed can also benefit the construction of web corpora.

Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis

Applied Computational Intelligence and Soft Computing ◽

10.1155/2018/8909357 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Monalisa Ghosh ◽

Goutam Sanyal

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Gini Index ◽

Feature Vector ◽

Information Gain ◽

Feature Subset ◽

Selection Methods ◽

Prominent Feature ◽

Chi Square

Sentiment classification, or sentiment analysis, has been acknowledged as an open research domain. In recent years, an enormous amount of research has been performed in this field using a variety of methodologies. Feature generation and selection are consequential for text mining, as a high-dimensional feature set can affect the performance of sentiment analysis. This paper investigates the shortcomings of the widely used feature selection methods (IG, Chi-square, and Gini Index) with unigram and bigram feature sets on four machine learning classification algorithms (MNB, SVM, KNN, and ME). The proposed methods are evaluated on three standard datasets, namely, IMDb movie reviews and electronics and kitchen product reviews. Initially, unigram and bigram features are extracted by applying the n-gram method. In addition, we generate a composite feature vector, CompUniBi (unigram + bigram), which is passed to the feature selection methods Information Gain (IG), Gini Index (GI), and Chi-square (CHI) to obtain an optimal feature subset by assigning a score to each feature. These methods rank the features by score, so a prominent feature vector (CompIG, CompGI, and CompCHI) can be generated easily for classification. Finally, the machine learning classifiers SVM, MNB, KNN, and ME use the prominent feature vector to classify each review document as either positive or negative. The performance of the algorithms is measured by evaluation metrics such as precision, recall, and F-measure. Experimental results show that the composite feature vector achieves better performance than unigram features, which is encouraging as well as comparable to related research. The best results, in terms of highest accuracy, were obtained from the combination of Information Gain with SVM.
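
The composite unigram + bigram idea and a score-based ranking can be sketched as follows. This is a minimal illustration using the standard 2x2 chi-square statistic for a term against one class; the toy reviews and the helper names `composite_features` and `chi_square` are assumptions for the example, not the authors' implementation.

```python
def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def composite_features(text):
    """Union of unigram and bigram features for one document."""
    tokens = text.lower().split()
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))


def chi_square(feature_sets, labels, term, positive_label):
    """Chi-square score of a term against one class (2x2 contingency table)."""
    a = b = c = d = 0
    for feats, y in zip(feature_sets, labels):
        has_term, is_pos = term in feats, y == positive_label
        if has_term and is_pos:
            a += 1
        elif has_term:
            b += 1
        elif is_pos:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical toy reviews.
reviews = ["not good at all", "very good blender",
           "not good value", "good and sturdy"]
labels = ["neg", "pos", "neg", "pos"]
features = [composite_features(r) for r in reviews]

for term in ("good", "not good"):
    print(term, chi_square(features, labels, term, "pos"))
```

In this toy set the unigram `good` appears in every review and scores 0, while the bigram `not good` separates the classes perfectly, which is the kind of signal a composite vector is meant to keep.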

Stance detection using diverse feature sets based on machine learning techniques

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202269 ◽

2021 ◽

pp. 1-20

Author(s):

Kashif Ayyub ◽

Saqib Iqbal ◽

Muhammad Wasif Nisar ◽

Saima Gulzar Ahmad ◽

Ehsan Ullah Munir

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Information Gain ◽

Real Life ◽

Machine Learning Techniques ◽

Base Line ◽

Feature Sets ◽

Part Of Speech ◽

Learning Techniques

Sentiment analysis is the field that analyzes the sentiments and opinions of people about entities such as products, businesses, and events. As opinions influence people's behavior, it has numerous real-life applications in marketing, politics, social media, etc. Stance detection is a sub-field of sentiment analysis. Stance classification aims to automatically identify from the source text whether the source is in favor of, neutral toward, or opposed to the target. This research study proposes a framework to explore the performance of conventional (NB, DT, SVM), ensemble learning (RF, AdaBoost), and deep learning-based (DBN, CNN-LSTM, and RNN) machine learning techniques. The proposed method is feature centric and extracts sentiment, content, tweet-specific, and part-of-speech features from both the SemEval2016 and SemEval2017 datasets. The study also explores the role of deep features such as GloVe and Word2Vec, which have not yet received attention for stance detection. Some baseline features such as bag of words, N-gram, and TF-IDF are also extracted from both datasets to compare against the proposed features and the deep features. The proposed features are ranked using feature ranking methods (information gain, gain ratio, and ReliefF). Further, the results are evaluated using standard performance evaluation measures for stance classification and compared with existing studies. The results show that the proposed feature sets (sentiment, part-of-speech, content, and tweet specific) are helpful for stance classification when applied with SVM, and that GloVe, a deep feature, gives the best results when applied with the deep learning method RNN.
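
Of the ranking methods mentioned, gain ratio is the easiest to show compactly: it divides information gain by the entropy of the feature's own value distribution, which penalizes features that split the data into many small groups. The sketch below uses the textbook definition with a made-up tweet feature (`has_hashtag`) and toy stance labels; none of it is taken from the study itself.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return 0.0 if split_info == 0 else info_gain / split_info


# Hypothetical toy data: one boolean feature per tweet and a stance label.
stance = ["favor", "favor", "against", "against"]
has_hashtag = [True, True, False, False]     # perfectly aligned with stance
has_mention = [True, False, True, False]     # unrelated to stance

print("has_hashtag:", gain_ratio(has_hashtag, stance))
print("has_mention:", gain_ratio(has_mention, stance))
```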

An Improved Sentiment Analysis Approach to Detect Radical Content on Twitter

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2021100103 ◽

2021 ◽

Vol 16 (4) ◽

pp. 52-73

Author(s):

Kamel Ahsene Djaballah ◽

Kamel Boukhalfa ◽

Omar Boussaid ◽

Yassine Ramdane

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Networks ◽

Fuzzy Logic ◽

Sentiment Analysis ◽

Experimental Results ◽

Data Mining Algorithms ◽

Terrorist Groups ◽

Radical Content ◽

Mining Algorithms

Social networks are used by terrorist groups and people who support them to propagate their ideas, ideologies, or doctrines and share their views on terrorism. Several studies in the literature have been proposed to analyze tweets related to terrorism. Some works rely on data mining algorithms; others use lexicon-based or machine learning sentiment analysis. Some recent works adopt methods that combine multiple techniques. This paper proposes an improved approach for sentiment analysis of radical content related to terrorist activity on Twitter. Unlike other solutions, the proposed approach focuses on using a dictionary of weighted terms, the Word2vec method, and trigrams, with a classification based on fuzzy logic. The authors have conducted experiments with 600 manually annotated tweets and 200,000 automatically collected tweets in English and Arabic to evaluate this approach. The experimental results revealed that the new technique provides between 75% and 78% precision for radicality detection and between 61% and 64% for detecting radicality degrees.
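
The abstract does not give the details of the weighted dictionary or the fuzzy classifier, so the following is only a generic sketch of how a weighted-term score can be mapped to fuzzy degrees with triangular membership functions. The dictionary entries, weights, membership breakpoints, and function names are all hypothetical placeholders, not the authors' values.

```python
def triangular(x, left, peak, right):
    """Triangular fuzzy membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical weighted dictionary with neutral placeholder terms.
TERM_WEIGHTS = {"alpha": 1.0, "beta": 0.5}


def weighted_score(tokens):
    """Mean dictionary weight over all tokens; unknown tokens count as 0."""
    if not tokens:
        return 0.0
    return sum(TERM_WEIGHTS.get(t, 0.0) for t in tokens) / len(tokens)


def fuzzy_degrees(score):
    """Membership of a score in [0, 1] in three assumed degree classes."""
    return {
        "low": triangular(score, -0.5, 0.0, 0.5),
        "medium": triangular(score, 0.0, 0.5, 1.0),
        "high": triangular(score, 0.5, 1.0, 1.5),
    }


tokens = ["alpha", "beta", "gamma", "delta"]
score = weighted_score(tokens)
degrees = fuzzy_degrees(score)
label = max(degrees, key=degrees.get)
print(score, degrees, label)
```

Unlike a hard threshold, the memberships overlap, so a borderline score belongs partly to two neighbouring degrees and the final label is simply the degree with the largest membership.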
