Sentiment Weighted Word Embedding for Big Text Data

Author(s):  
Jenish Dhanani ◽  
Rupa Mehta ◽  
Dipti Rana

Sentiment analysis is the practice of eliciting the sentiment orientation of people's opinions (i.e. positive, negative or neutral) toward a specific entity. Word embedding techniques such as Word2vec are an effective way to encode text data into real-valued semantic feature vectors. However, they fail to preserve sentiment information, which degrades performance in sentiment analysis. Additionally, large textual data, with its large vocabulary and associated feature vectors, demands huge memory and computing power. To overcome these challenges, this research proposes a MapReduce-based Sentiment-weighted Word2Vec (MSW2V), which learns sentiment and semantic feature vectors using a sentiment dictionary and big textual data in a distributed MapReduce environment, where the memory and computing power of multiple computing nodes are combined to meet the huge resource demand. Experimental results show that MSW2V outperforms existing distributed and non-distributed approaches.
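
The abstract does not give the weighting scheme or the training procedure, so the following is only a minimal sketch of the general idea, not the MSW2V method itself. It assumes a hypothetical rule in which each word vector is weighted by 1 + |polarity| from a sentiment dictionary, and it shows how per-shard partial sums (map) can be combined (reduce) so that no single node needs the whole corpus. All names, vectors and polarity values are made up for illustration.

```python
from functools import reduce

# Hypothetical toy inputs: pretrained word vectors and a sentiment dictionary.
EMBEDDINGS = {"good": [0.2, 0.1], "bad": [0.1, -0.3], "movie": [0.5, 0.5]}
SENTIMENT = {"good": 1.0, "bad": -1.0}  # polarity in [-1, 1]
DIM = 2


def weight(word):
    # Assumed scheme: neutral words weigh 1, sentiment words weigh 1 + |polarity|.
    return 1.0 + abs(SENTIMENT.get(word, 0.0))


def map_shard(tokens):
    """Map step: weighted vector sum and total weight for one shard of tokens."""
    total = [0.0] * DIM
    weight_sum = 0.0
    for token in tokens:
        vec = EMBEDDINGS.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        w = weight(token)
        total = [a + w * b for a, b in zip(total, vec)]
        weight_sum += w
    return total, weight_sum


def reduce_partials(left, right):
    """Reduce step: add two partial results element-wise."""
    return [a + b for a, b in zip(left[0], right[0])], left[1] + right[1]


def weighted_vector(shards):
    """Sentiment-weighted mean vector over all shards."""
    total, weight_sum = reduce(
        reduce_partials, map(map_shard, shards), ([0.0] * DIM, 0.0)
    )
    return [x / weight_sum for x in total] if weight_sum else total
```

Because the reduce step is a plain sum, the result does not depend on how the tokens are split across shards, which is the property that makes this kind of aggregation suitable for a MapReduce setting.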

2020 ◽  
pp. 016555152091003
Author(s):  
Gyeong Taek Lee ◽  
Chang Ouk Kim ◽  
Min Song

Sentiment analysis plays an important role in understanding individual opinions expressed in websites such as social media and product review sites. The common approaches to sentiment analysis use the sentiments carried by words that express opinions and are based on either supervised or unsupervised learning techniques. The unsupervised learning approach builds a word-sentiment dictionary, but it requires lengthy time periods and high costs to build a reliable dictionary. The supervised learning approach uses machine learning models to learn the sentiment scores of words; however, training a classifier model requires large amounts of labelled text data to achieve a good performance. In this article, we propose a semisupervised approach that performs well despite having only small amounts of labelled data available for training. The proposed method builds a base sentiment dictionary from a small training dataset using a lasso-based ensemble model with minimal human effort. The scores of words not in the training dataset are estimated using an adaptive instance-based learning model. In a pretrained word2vec model space, the sentiment values of the words in the dictionary are propagated to the words that did not exist in the training dataset. Through two experiments, we demonstrate that the performance of the proposed method is comparable to that of supervised learning models trained on large datasets.
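
The propagation step described above can be sketched roughly as follows. This is a simplification under stated assumptions: it uses a plain similarity-weighted k-nearest-neighbour estimate rather than the authors' adaptive instance-based model, and the embeddings, seed scores and the function name `propagate_score` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_score(word, embeddings, seed_scores, k=2):
    """Estimate a sentiment score for `word` from its k most similar seed words."""
    target = embeddings[word]
    neighbours = sorted(
        (
            (cosine(target, embeddings[seed]), score)
            for seed, score in seed_scores.items()
            if seed in embeddings
        ),
        reverse=True,
    )[:k]
    # Keep only positively similar neighbours so the weights stay non-negative.
    positive = [(sim, score) for sim, score in neighbours if sim > 0]
    total = sum(sim for sim, _ in positive)
    return sum(sim * score for sim, score in positive) / total if total else 0.0


# Toy embedding space (made-up 2-d vectors) and a small seed dictionary.
embeddings = {
    "great": [1.0, 0.0],
    "awful": [-1.0, 0.0],
    "okay": [0.5, 0.5],
    "superb": [0.9, 0.1],
}
seed_scores = {"great": 1.0, "awful": -1.0, "okay": 0.2}
superb_score = propagate_score("superb", embeddings, seed_scores)
```

Here "superb" is not in the seed dictionary, so its score is interpolated from "great" and "okay", its two nearest seed words in the toy space.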


2021 ◽  
Vol 11 (22) ◽  
pp. 10774
Author(s):  
Hongchan Li ◽  
Yu Ma ◽  
Zishuai Ma ◽  
Haodong Zhu

With the rapid increase of public opinion data, Weibo text sentiment analysis plays an increasingly significant role in monitoring online public opinion. Due to the sparseness and high dimensionality of text data and the complex semantics of natural language, sentiment analysis tasks face tremendous challenges. To address these problems, this paper proposes a new model based on BERT and deep learning for Weibo text sentiment analysis. Specifically, the model first uses BERT to represent the text with dynamic word vectors and uses a processed sentiment dictionary to enhance the sentiment features of the vectors; a BiLSTM then extracts the contextual features of the text, and the resulting vector representation is weighted by an attention mechanism. After weighting, a CNN extracts the important local sentiment features in the text, and finally the processed sentiment feature representation is classified. A comparative experiment was conducted on a Weibo text dataset collected during the COVID-19 epidemic; the results showed that the performance of the proposed model was significantly improved compared with other similar models.
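
Of the stages listed above, the attention weighting is the easiest to illustrate in isolation. The sketch below is not the paper's model: it assumes a simple dot-product attention with a hypothetical query vector, and it uses hand-written lists in place of real BiLSTM outputs. It returns the reweighted sequence that a later convolutional stage would consume.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weight(hidden_states, query):
    """Scale each time step's hidden state by its attention weight.

    hidden_states: list of vectors, one per token (stand-in for BiLSTM output).
    query: hypothetical learned vector used to score each time step.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    weighted = [[w * h for h in state] for w, state in zip(weights, hidden_states)]
    return weights, weighted


# Two toy hidden states; the first aligns with the query and gets more weight.
weights, weighted = attention_weight([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```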


2019 ◽  
Vol 8 (3) ◽  
pp. 1649-1651

Sentiment analysis is a task that analyses people’s opinions derived from textual data, and it is useful for various NLP applications. The difficulty with this task is that a variety of sentiments occur within these documents, accompanied by diverse expressions. It is therefore hard to capture all sentiments using a commonly used dictionary. This work attempts to construct a domain sentiment dictionary by employing external textual data. In addition, various classification models could be used to classify the documents according to their opinion. We have also implemented topic modelling, emoticon analysis and optimized gender classification in our proposed system. Many sectors have been identified where women are being abused. Clusters are formed for these sectors and the most affected sector is also identified.


Author(s):  
Shalin Hai-Jew

One new feature in NVivo 11 Plus, a qualitative and mixed methods research suite, is its sentiment analysis tool; this enables the autocoding of unlabeled and unstructured text corpora against a built-in sentiment dictionary. The software labels selected texts into four categories: (1) very negative, (2) moderately negative, (3) moderately positive, and (4) very positive. After the initial coding for sentiment, there are many ways to augment that initial coding, including theme and subtheme extraction, word frequency counts, text searches, sociogram mapping, geolocational mapping, data visualizations, and others. This chapter provides a light overview of how the sentiment analysis feature in NVivo 11 Plus works, proposes some insights about the proper unit of analysis for sentiment analyses (sentence, paragraph, or cell) based on text dataset features, and identifies ways to further explore the textual data post-sentiment analysis—to create coherence and insight.


Author(s):  
Bishrul Haq ◽  
Ghulam Mujtaba ◽  
Zahid Hussain Khand ◽  
Javed Ahmad ◽  
Zafar Ali

COVID-19 has become one of the most widely discussed subjects these days. Guided by international recommendations, countries have taken many measures to prevent the spread of the virus, which has led to many disputes about wearing a face mask as a preventive measure. This study aims to assess and compare the overall accuracy, macro precision, macro F-measure and macro recall of different decision models applied to COVID-19 mask-wearing practices via sentiment analysis. Tweets are labeled, and text pre-processing techniques such as stemming, normalization, tokenization, and stop-word removal are applied. Subsequently, the tweets are transformed into master feature vectors by applying various feature extraction, feature representation, feature selection and word embedding techniques, and five supervised machine learning decision models are used to predict mask-wearing practices from the tweets. The highest macro F-measure and macro precision are found with hybrid-grams for feature extraction, TF-IDF for feature representation, and the Chi-Squared Test for feature selection; the highest macro recall is found with BOW for feature extraction, TF-IDF for feature representation, and the ANOVA F-value for feature selection. Hence, this study concludes that the Naive Bayes (NB) algorithm outperforms other decision models with master feature vectors applied. In addition, it also outperforms word embedding techniques.
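
For readers unfamiliar with the macro-averaged metrics named above, the following sketch shows one common way to compute them: precision, recall and F-measure are computed per class and then averaged with equal weight per class. The labels and predictions are made up, and the study may define macro F-measure differently (for example, from the macro precision and recall).

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_measures = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f": sum(f_measures) / n,
    }


# Hypothetical gold labels and predictions for four tweets.
scores = macro_scores(["pro", "pro", "anti", "anti"], ["pro", "anti", "anti", "anti"])
```

Macro averaging treats every class equally regardless of how many tweets it contains, which is why it is often reported alongside accuracy on imbalanced data.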


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been extensively used and they have proven to be better for sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence to sequence modelling and time series forecasting. In this article we will review different Machine Learning and Deep Learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application of sentiment analysis.


2019 ◽  
Vol 13 (1) ◽  
pp. 20-27 ◽  
Author(s):  
Srishty Jindal ◽  
Kamlesh Sharma

Background: With the tremendous increase in the use of social networking sites for sharing emotions, views, preferences, etc., a huge volume of data and text is available on the internet. This creates the need to understand the text and analyse the data in order to determine the exact intent behind it for a greater good. This process of understanding the text and data involves many analytical methods, several phases and multiple techniques. Efficient use of these techniques is important for an effective and relevant understanding of the text/data. This analysis can in turn be very helpful in e-commerce for targeting audiences, in social media monitoring for anticipating foul elements in society and taking proactive action to avoid unethical and illegal activities, and in business analytics, market positioning, etc. Method: The goal is to understand the basic steps involved in analysing text data, which can be helpful in determining the sentiments behind it. This review provides a detailed description of the steps involved in sentiment analysis along with recent research. Patents related to sentiment analysis and classification are reviewed to throw some light on the work done in the field. Results: Sentiment analysis determines the polarity behind the text data/review. This analysis helps in increasing business revenue, in e-health, or in determining the behaviour of a person. Conclusion: This study helps in understanding the basic steps involved in natural language understanding. At each step there are multiple techniques that can be applied to the data. Different classifiers provide variable accuracy depending upon the data set and classification technique used.


2021 ◽  
pp. 1-14
Author(s):  
Hamed Zargari ◽  
Morteza Zahedi ◽  
Marziea Rahimi

Words are one of the most essential elements for expressing sentiment in context, although they are not the only ones. Syntactic relationships between words, morphology, punctuation, and linguistic phenomena are also influential. Treating words merely as isolated phenomena causes many mistakes in sentiment analysis systems. So far, a large amount of research has been conducted on generating sentiment dictionaries containing only sentiment words. A number of these dictionaries have addressed the role of combinations of sentiment words, negators, and intensifiers, while almost none of them has considered the heterogeneous effect of the occurrence of multiple linguistic phenomena in sentiment compounds. To address the weakness of existing sentiment dictionaries in handling the heterogeneous effect of multiple intensifiers, this research presents a sentiment dictionary based on the analysis of sentiment compounds comprising sentiment words, negators, and intensifiers. It considers multiple intensifiers relative to the sentiment word and assigns a location-based coefficient to each intensifier, which increases the number of sentiment phrases covered by the dictionary and improves the efficiency of the proposed dictionary-based sentiment analysis methods by up to 7% compared to the latest methods.
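
The abstract does not state how the location-based coefficient is defined, so the sketch below only illustrates the general idea under a loudly assumed rule: an intensifier's effect is multiplied by `decay ** distance`, where distance is the number of tokens between it and the sentiment word. The word lists, strengths and the `compound_score` function are all hypothetical.

```python
# Hypothetical toy lexicons; values are illustrative only.
SENTIMENT = {"good": 0.6, "bad": -0.6}
INTENSIFIERS = {"very": 0.5, "really": 0.3, "slightly": -0.4}
NEGATORS = {"not", "never"}


def compound_score(tokens, decay=0.5):
    """Score a sentiment compound whose last token is the sentiment word.

    Assumed rule: an intensifier at `distance` tokens from the sentiment word
    contributes strength * decay ** distance to the boost; each negator flips
    the sign of the final score.
    """
    *modifiers, head = tokens
    boost = 1.0
    negated = False
    for distance, token in enumerate(reversed(modifiers)):
        if token in INTENSIFIERS:
            boost += INTENSIFIERS[token] * (decay ** distance)
        elif token in NEGATORS:
            negated = not negated
    score = SENTIMENT[head] * boost
    if negated:
        score = -score
    return max(-1.0, min(1.0, score))  # clamp to [-1, 1]
```

Under this assumed rule, `["really", "very", "good"]` and `["very", "really", "good"]` receive different scores, because the same two intensifiers sit at different distances from the sentiment word.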


Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 115 ◽  
Author(s):  
Yaocheng Zhang ◽  
Wei Ren ◽  
Tianqing Zhu ◽  
Ehoche Faith

The development of mobile internet has led to a massive amount of data being generated from mobile devices daily, which has become a source for analyzing human behavior and trends in public sentiment. In this paper, we build a system called MoSa (Mobile Sentiment analysis) to analyze this data. In this system, sentiment analysis is used to analyze news comments on the THAAD (Terminal High Altitude Area Defense) event from Toutiao by employing algorithms to calculate the sentiment value of the comment. This paper is based on HowNet; after the comparison of different sentiment dictionaries, we discover that the method proposed in this paper, which uses a mixed sentiment dictionary, has a higher accuracy rate in its analysis of comment sentiment tendency. We then statistically analyze the relevant attributes of the comments and their sentiment values and discover that the standard deviation of the comments’ sentiment value can quickly reflect sentiment changes among the public. Besides that, we also derive some special models from the data that can reflect some specific characteristics. We find that the intrinsic characteristics of situational awareness have implicit symmetry. By using our system, people can obtain some practical results to guide interaction design in applications including mobile Internet, social networks, and blockchain based crowdsourcing.
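
The standard-deviation observation above can be illustrated with a small sketch. It assumes a hypothetical scoring rule (a comment's sentiment value is the sum of the dictionary polarities of its tokens) and made-up data; the MoSa system's actual scoring algorithm and dictionaries are not described in the abstract.

```python
from statistics import pstdev


def comment_score(tokens, lexicon):
    """Sum the dictionary polarity of each token; unknown tokens count as 0."""
    return sum(lexicon.get(token, 0.0) for token in tokens)


def dispersion_by_day(comments, lexicon):
    """Population standard deviation of comment sentiment values per day.

    comments: iterable of (day, tokens) pairs.
    """
    by_day = {}
    for day, tokens in comments:
        by_day.setdefault(day, []).append(comment_score(tokens, lexicon))
    return {day: pstdev(values) for day, values in by_day.items()}


# Hypothetical lexicon and tokenized comments.
lexicon = {"support": 1.0, "oppose": -1.0}
comments = [
    ("day1", ["support"]),
    ("day1", ["support", "it"]),
    ("day2", ["support"]),
    ("day2", ["oppose"]),
]
spread = dispersion_by_day(comments, lexicon)
```

In this toy data the mean sentiment drops on the second day, but the standard deviation also jumps from 0 to 1, signalling that opinion has become divided rather than uniformly shifted.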

