Sentiment Weighted Word Embedding for Big Text Data

International Journal of Web-Based Learning and Teaching Technologies ◽

10.4018/ijwltt.20211101.oa2 ◽

2021 ◽

Vol 16 (6) ◽

pp. 1-17

Author(s):

Jenish Dhanani ◽

Rupa Mehta ◽

Dipti Rana

Keyword(s):

Sentiment Analysis ◽

Semantic Feature ◽

Word Embedding ◽

Text Data ◽

Computing Power ◽

Feature Vectors ◽

Textual Data ◽

Performance Deterioration ◽

Sentiment Dictionary ◽

Sentiment Orientation

Sentiment analysis is the practice of eliciting the sentiment orientation (i.e., positive, negative, or neutral) of people's opinions toward a specific entity. Word embedding techniques such as Word2vec are an effective way to encode text data into real-valued semantic feature vectors. However, Word2vec fails to preserve sentiment information, which degrades performance in sentiment analysis. Additionally, big textual data, with its large vocabulary and associated feature vectors, demands substantial memory and computing power. To overcome these challenges, this research proposes a MapReduce-based Sentiment-weighted Word2Vec (MSW2V), which learns sentiment and semantic feature vectors from a sentiment dictionary and big textual data in a distributed MapReduce environment, where the memory and computing power of multiple computing nodes are pooled to meet the resource demand. Experimental results show that MSW2V outperforms existing distributed and non-distributed approaches.
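The abstract does not give the weighting formula, so the following is only a toy sketch of the general pattern it describes: a map step counts word and context pairs per text shard, a reduce step merges the partial counts, and a sentiment dictionary then reweights the merged counts. The dictionary entries, the `sentiment_weight` rule, and all function names are assumptions made for illustration, not the MSW2V algorithm.

```python
from collections import Counter
from functools import reduce

# Hypothetical sentiment dictionary: word -> polarity in [-1, 1].
SENTIMENT = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}

def map_shard(lines, window=1):
    """Map step: count (word, context) pairs inside one shard of text."""
    pairs = Counter()
    for line in lines:
        tokens = line.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs

def reduce_counts(a, b):
    """Reduce step: merge the partial counts from two shards."""
    return a + b

def sentiment_weight(word, context):
    """Assumed rule: boost pairs with the same polarity, damp opposite ones."""
    return 1.0 + SENTIMENT.get(word, 0.0) * SENTIMENT.get(context, 0.0)

# Each inner list stands in for the text held by one computing node.
shards = [
    ["the movie was good", "good great acting"],
    ["the movie was bad", "bad awful plot"],
]
merged = reduce(reduce_counts, (map_shard(s) for s in shards), Counter())
weighted = {pair: n * sentiment_weight(*pair) for pair, n in merged.items()}
```

In a real deployment the map calls would run on separate nodes and the weighted counts would feed an embedding trainer; here everything runs in one process to keep the pattern visible.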

Semisupervised sentiment analysis method for online text reviews

Journal of Information Science ◽

10.1177/0165551520910032 ◽

2020 ◽

pp. 016555152091003

Author(s):

Gyeong Taek Lee ◽

Chang Ouk Kim ◽

Min Song

Keyword(s):

Unsupervised Learning ◽

Sentiment Analysis ◽

Supervised Learning ◽

Model Space ◽

Training Dataset ◽

Learning Approach ◽

Learning Models ◽

Text Data ◽

Learning Techniques ◽

Sentiment Dictionary

Sentiment analysis plays an important role in understanding individual opinions expressed in websites such as social media and product review sites. The common approaches to sentiment analysis use the sentiments carried by words that express opinions and are based on either supervised or unsupervised learning techniques. The unsupervised learning approach builds a word-sentiment dictionary, but it requires lengthy time periods and high costs to build a reliable dictionary. The supervised learning approach uses machine learning models to learn the sentiment scores of words; however, training a classifier model requires large amounts of labelled text data to achieve a good performance. In this article, we propose a semisupervised approach that performs well despite having only small amounts of labelled data available for training. The proposed method builds a base sentiment dictionary from a small training dataset using a lasso-based ensemble model with minimal human effort. The scores of words not in the training dataset are estimated using an adaptive instance-based learning model. In a pretrained word2vec model space, the sentiment values of the words in the dictionary are propagated to the words that did not exist in the training dataset. Through two experiments, we demonstrate that the performance of the proposed method is comparable to that of supervised learning models trained on large datasets.
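The propagation step can be illustrated with a plain similarity-weighted nearest-neighbour estimate. This is a simplified stand-in for the adaptive instance-based model mentioned in the abstract, not the authors' method; the two-dimensional vectors, the scores, and the choice of `k` are made-up values for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def propagate(word_vec, labelled, k=2):
    """Estimate a sentiment score for an unlabelled word as the
    similarity-weighted mean of its k most similar labelled words."""
    neighbours = sorted(
        ((cosine(word_vec, vec), score) for vec, score in labelled.values()),
        reverse=True,
    )[:k]
    # Negative similarities are clipped to zero so they carry no weight.
    total = sum(max(sim, 0.0) for sim, _ in neighbours)
    if total == 0:
        return 0.0
    return sum(max(sim, 0.0) * score for sim, score in neighbours) / total

# Toy embedding space: word -> (vector, dictionary score). All values are assumed.
labelled = {
    "excellent": ((1.0, 0.1), 0.9),
    "good": ((0.9, 0.2), 0.7),
    "terrible": ((-1.0, 0.1), -0.9),
}
estimate = propagate((0.95, 0.15), labelled, k=2)
```

With these toy values the two nearest neighbours are "excellent" and "good", so the estimate falls between their scores.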

Weibo Text Sentiment Analysis Based on BERT and Deep Learning

Applied Sciences ◽

10.3390/app112210774 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10774

Author(s):

Hongchan Li ◽

Yu Ma ◽

Zishuai Ma ◽

Haodong Zhu

Keyword(s):

Deep Learning ◽

Public Opinion ◽

Sentiment Analysis ◽

Monitoring Network ◽

Feature Representation ◽

Vector Representation ◽

Text Data ◽

Proposed Model ◽

Sentiment Dictionary ◽

Text Sentiment Analysis

With the rapid increase of public opinion data, Weibo text sentiment analysis plays an increasingly significant role in monitoring online public opinion. Because text data are sparse and high-dimensional and natural-language semantics are complex, sentiment analysis tasks face considerable challenges. To address these problems, this paper proposes a new model based on BERT and deep learning for Weibo text sentiment analysis. Specifically, BERT first represents the text with dynamic word vectors, and a processed sentiment dictionary is used to enhance the sentiment features of the vectors. A BiLSTM then extracts the contextual features of the text, and the resulting vector representation is weighted by an attention mechanism. After weighting, a CNN extracts the important local sentiment features in the text, and finally the processed sentiment feature representation is classified. A comparative experiment was conducted on a Weibo text dataset collected during the COVID-19 epidemic; the results showed that the proposed model performed significantly better than other similar models.

Metoo Movement Analysis through the Lens of Social Media

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4432.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 1649-1651

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Movement Analysis ◽

Gender Classification ◽

Topic Modelling ◽

Classification Models ◽

Textual Data ◽

Sentiment Dictionary

Sentiment analysis is the task of analysing people's opinions derived from textual data, and it is useful for a range of NLP applications. The difficulty of this task is that documents contain a variety of sentiments, expressed in diverse ways. It is therefore hard to capture all sentiments using a commonly used dictionary. This work attempts to construct a domain sentiment dictionary by employing external textual data. In addition, various classification models can be used to classify documents according to their opinion. We have also implemented topic modelling, emoticon analysis and optimized gender classification in our proposed system. Many sectors have been identified where women are being abused. Clusters are formed for these sectors, and the most affected sector is identified.

Conducting Sentiment Analysis and Post-Sentiment Data Exploration through Automated Means

Social Media Data Extraction and Content Analysis - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-0648-5.ch008 ◽

2017 ◽

pp. 202-240

Author(s):

Shalin Hai-Jew

Keyword(s):

Sentiment Analysis ◽

Mixed Methods Research ◽

Analysis Tool ◽

Text Corpora ◽

Textual Data ◽

Sentiment Dictionary ◽

New Feature ◽

Data Visualizations ◽

Frequency Counts ◽

Proper Unit

One new feature in NVivo 11 Plus, a qualitative and mixed methods research suite, is its sentiment analysis tool; this enables the autocoding of unlabeled and unstructured text corpora against a built-in sentiment dictionary. The software labels selected texts into four categories: (1) very negative, (2) moderately negative, (3) moderately positive, and (4) very positive. After the initial coding for sentiment, there are many ways to augment that initial coding, including theme and subtheme extraction, word frequency counts, text searches, sociogram mapping, geolocational mapping, data visualizations, and others. This chapter provides a light overview of how the sentiment analysis feature in NVivo 11 Plus works, proposes some insights about the proper unit of analysis for sentiment analyses (sentence, paragraph, or cell) based on text dataset features, and identifies ways to further explore the textual data post-sentiment analysis—to create coherence and insight.

A Comparative Study of Sentiment Analysis on Mask-Wearing Practices during the COVID-19 Pandemic

Quaid-e-Awam University Research Journal of Engineering, Science & Technology ◽

10.52584/qrj.1802.17 ◽

2020 ◽

Vol 18 (02) ◽

pp. 116-126

Author(s):

Bishrul Haq ◽

Ghulam Mujtaba ◽

Zahid Hussain Khand ◽

Javed Ahmad ◽

Zafar Ali

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Sentiment Analysis ◽

Preventive Measure ◽

Feature Representation ◽

Word Embedding ◽

Decision Models ◽

Supervised Machine Learning ◽

Feature Vectors ◽

F Measure

COVID-19 has become one of the most widely discussed subjects of recent times. Countries have taken many actions, guided by international recommendations, to prevent the spread of the virus, which has led to many disputes about wearing a face mask as a preventive measure. This study aims to assess and compare the overall accuracy, macro precision, macro F-measure and macro recall of different decision models applied to COVID-19 mask-wearing practices via sentiment analysis. Tweets are labeled, and text pre-processing techniques such as stemming, normalization, tokenization, and stop-word removal are applied. Subsequently, the tweets are transformed into master feature vectors by applying various feature extraction, feature representation, feature selection and word embedding techniques, and five supervised machine learning decision models are used to predict mask-wearing practices from the tweets. The highest macro F-measure and macro precision are found with hybrid-grams for feature extraction, TF-IDF for feature representation, and the Chi-Squared test for feature selection; the highest macro recall is found with BOW for feature extraction, TF-IDF for feature representation, and the ANOVA F-value for feature selection. Hence, this study concludes that the Naive Bayes (NB) algorithm with master feature vectors outperforms the other decision models. It also outperforms word embedding techniques.
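For readers unfamiliar with the macro-averaged metrics named above, here is a minimal sketch of how they are commonly computed: precision, recall, and F-measure are calculated per class and then averaged with equal weight per class. The function name and the toy labels are assumptions, and the macro F-measure is taken as the mean of per-class F1 scores, which is one common convention and not necessarily the one used in the study.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy stance labels for four tweets (made-up data).
y_true = ["pro", "pro", "anti", "anti"]
y_pred = ["pro", "anti", "anti", "anti"]
macro_p, macro_r, macro_f = macro_scores(y_true, y_pred)
```

Because every class counts equally, a model that neglects a rare class is penalised more under macro averaging than under overall accuracy.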

Deep Learning for text in limited data settings

10.36227/techrxiv.12100692 ◽

2020 ◽

Author(s):

Pathikkumar Patel ◽

Bhargav Lad ◽

Jinan Fiaidhi

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Sentiment Analysis ◽

Transfer Learning ◽

Text Classification ◽

State Of The Art ◽

Time Series Forecasting ◽

Text Data ◽

Performance Levels

During the last few years, RNN models have been extensively used and they have proven to be better for sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence to sequence modelling and time series forecasting. In this article we will review different Machine Learning and Deep Learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application of sentiment analysis.

A Review on Sentiment Classification: Natural Language Understanding

Recent Patents on Engineering ◽

10.2174/1872212112666180731113353 ◽

2019 ◽

Vol 13 (1) ◽

pp. 20-27 ◽

Cited By ~ 1

Author(s):

Srishty Jindal ◽

Kamlesh Sharma

Keyword(s):

Natural Language ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Natural Language Understanding ◽

Business Analytics ◽

Language Understanding ◽

Text Data ◽

Data Set ◽

Market Positioning ◽

Illegal Activities

Background: With the tremendous increase in the use of social networking sites for sharing emotions, views, preferences, etc., a huge volume of data and text is available on the internet. This creates the need to understand the text and analyse the data to determine the intent behind it for the greater good. This process involves many analytical methods, several phases and multiple techniques. Efficient use of these techniques is important for an effective and relevant understanding of the text and data. Such analysis can in turn be helpful in e-commerce for audience targeting; in social media monitoring for anticipating harmful actors and taking proactive action to avoid unethical and illegal activities; and in business analytics, market positioning, etc. Method: The goal is to understand the basic steps involved in analysing text data, which can help determine the sentiments behind it. This review provides a detailed description of the steps involved in sentiment analysis, along with recent research. Patents related to sentiment analysis and classification are reviewed to shed light on the work done in the field. Results: Sentiment analysis determines the polarity behind the text data or review. This analysis helps in increasing business revenue, in e-health, and in determining the behaviour of a person. Conclusion: This study helps in understanding the basic steps involved in natural language understanding. At each step, multiple techniques can be applied to the data. Different classifiers provide variable accuracy depending on the data set and classification technique used.

GINS: A Global intensifier-based N-Gram sentiment dictionary

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202879 ◽

2021 ◽

pp. 1-14

Author(s):

Hamed Zargari ◽

Morteza Zahedi ◽

Marziea Rahimi

Keyword(s):

Sentiment Analysis ◽

Essential Elements ◽

Analysis Methods ◽

Sentiment Dictionary ◽

N Gram ◽

Heterogeneous Effect ◽

Sentiment Word

Words are among the most essential elements for expressing sentiment in context, although they are not the only ones: syntactic relationships between words, morphology, punctuation, and linguistic phenomena are also influential. Treating words merely as isolated phenomena causes many mistakes in sentiment analysis systems. So far, a large amount of research has been conducted on generating sentiment dictionaries containing only sentiment words. A number of these dictionaries have addressed the role of combinations of sentiment words, negators, and intensifiers, while almost none have considered the heterogeneous effect of multiple linguistic phenomena occurring in sentiment compounds. To address this weakness of existing sentiment dictionaries, specifically the heterogeneous effect of multiple intensifiers, this research presents a sentiment dictionary based on the analysis of sentiment compounds comprising sentiment words, negators, and intensifiers. It considers the position of multiple intensifiers relative to the sentiment word and assigns a location-based coefficient to each intensifier. This increases the number of sentiment phrases covered by the dictionary and improves the efficiency of the proposed dictionary-based sentiment analysis methods by up to 7% compared with the latest methods.
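The idea of a location-based coefficient can be sketched as follows. The abstract does not state the actual coefficients or combination rule, so the word lists, the intensifier strengths, and the geometric `decay` are all hypothetical choices; the sketch only shows how an intensifier's effect might shrink with its distance from the sentiment word.

```python
# Hypothetical lexicons; the values are illustrative, not taken from GINS.
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
SENTIMENT = {"good": 1.0, "bad": -1.0}

def score_compound(tokens, decay=0.5):
    """Score an n-gram whose last token is the sentiment word.

    Modifiers are processed from the one nearest the sentiment word outward.
    An intensifier at distance d has its effect scaled by decay ** d
    (the assumed location-based coefficient); a negator flips the sign.
    """
    *modifiers, head = tokens
    score = SENTIMENT.get(head, 0.0)
    for distance, modifier in enumerate(reversed(modifiers)):
        if modifier in INTENSIFIERS:
            coefficient = decay ** distance
            score *= 1.0 + (INTENSIFIERS[modifier] - 1.0) * coefficient
        elif modifier in NEGATORS:
            score = -score
    return score

single = score_compound(["very", "good"])            # one intensifier
double = score_compound(["very", "very", "good"])    # second one counts for less
negated = score_compound(["not", "very", "good"])    # negator flips the result
```

Under these assumed values, the second "very" adds less than the first, which is one way to model the heterogeneous effect of stacked intensifiers.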

MoSa: A Modeling and Sentiment Analysis System for Mobile Application Big Data

Symmetry ◽

10.3390/sym11010115 ◽

2019 ◽

Vol 11 (1) ◽

pp. 115 ◽

Cited By ~ 2

Author(s):

Yaocheng Zhang ◽

Wei Ren ◽

Tianqing Zhu ◽

Ehoche Faith

Keyword(s):

Sentiment Analysis ◽

Interaction Design ◽

Mobile Application ◽

Situational Awareness ◽

Mobile Internet ◽

The Public ◽

Public Sentiment ◽

High Altitude Area ◽

Sentiment Dictionary ◽

Analysis System

The development of mobile internet has led to a massive amount of data being generated from mobile devices daily, which has become a source for analyzing human behavior and trends in public sentiment. In this paper, we build a system called MoSa (Mobile Sentiment analysis) to analyze this data. In this system, sentiment analysis is used to analyze news comments on the THAAD (Terminal High Altitude Area Defense) event from Toutiao by employing algorithms to calculate the sentiment value of each comment. This paper is based on HowNet; after comparing different sentiment dictionaries, we find that the method proposed in this paper, which uses a mixed sentiment dictionary, has a higher accuracy rate in its analysis of comment sentiment tendency. We then statistically analyze the relevant attributes of the comments and their sentiment values and find that the standard deviation of the comments’ sentiment values can quickly reflect sentiment changes among the public. In addition, we derive some special models from the data that reflect specific characteristics. We find that the intrinsic characteristics of situational awareness have implicit symmetry. Using our system, people can obtain practical results to guide interaction design in applications including mobile Internet, social networks, and blockchain-based crowdsourcing.
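The standard-deviation signal mentioned in the abstract can be illustrated with a short sketch. Grouping comments by day, the function name, and the sample values are assumptions for illustration; the abstract does not say how MoSa windows its data.

```python
from collections import defaultdict
from statistics import pstdev

def sentiment_spread(comments):
    """Group (period, sentiment_value) pairs by period and return the
    population standard deviation of sentiment values for each period."""
    by_period = defaultdict(list)
    for period, value in comments:
        by_period[period].append(value)
    return {period: pstdev(values) for period, values in by_period.items()}

# Made-up sentiment values: day1 is uniform, day2 is polarised.
comments = [("day1", 0.5), ("day1", 0.5), ("day2", 1.0), ("day2", -1.0)]
spread = sentiment_spread(comments)
```

A jump in the spread from one period to the next indicates that opinions have become more divided, even if the average sentiment has barely moved.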

PWEBSA: Twitter sentiment analysis by combining Plutchik wheel of emotion and word embedding

International Journal of Information Technology ◽

10.1007/s41870-021-00767-y ◽

2022 ◽

Author(s):

Pravin Kumar ◽

Manu Vardhan

Keyword(s):

Sentiment Analysis ◽

Word Embedding
