Lexical data augmentation for sentiment analysis

Author(s):  
Rong Xiang ◽  
Emmanuele Chersoni ◽  
Qin Lu ◽  
Chu‐Ren Huang ◽  
Wenjie Li ◽  
...

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Huu-Thanh Duong ◽  
Tram-Anh Nguyen-Thi

Abstract: In the literature, machine learning-based studies of sentiment analysis are usually supervised and require pre-labeled datasets that are sufficiently large in specific domains. Building such datasets is tedious, expensive and time-consuming, and the resulting models struggle to handle unseen data. This paper takes a semi-supervised learning approach to Vietnamese sentiment analysis, where labeled datasets are limited. We summarize a range of preprocessing techniques applied to clean and normalize the data, together with negation handling and intensification handling, to improve performance. Moreover, data augmentation techniques, which generate new samples from the original data to enrich the training set without user intervention, are also presented. In the experiments, we evaluate various configurations and obtain competitive results that may motivate further work.
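As an illustration of the kind of lexical augmentation described above, the sketch below replaces random words with synonyms to generate extra labelled samples. It is a minimal example only: English WordNet stands in for whatever Vietnamese lexical resources the authors actually use, and the replacement rate is an arbitrary assumption.

```python
# Minimal sketch of lexical data augmentation by synonym replacement.
# English WordNet is a stand-in lexicon; the 0.2 replacement rate is assumed.
import random
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

def synonym_replace(sentence, replace_prob=0.2, seed=None):
    """Return a new sentence with some words swapped for WordNet synonyms."""
    rng = random.Random(seed)
    augmented = []
    for word in sentence.split():
        synsets = wordnet.synsets(word)
        lemmas = {l.name().replace("_", " ")
                  for s in synsets for l in s.lemmas()} - {word}
        if lemmas and rng.random() < replace_prob:
            augmented.append(rng.choice(sorted(lemmas)))
        else:
            augmented.append(word)
    return " ".join(augmented)

# Example: enrich the training data with augmented copies of each sample.
original = [("the battery life is great", "positive")]
extra = [(synonym_replace(text, seed=i), label)
         for i, (text, label) in enumerate(original)]
```

Each augmented sentence inherits the label of its source sentence, which is the usual assumption behind this family of techniques.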


Author(s):  
Peilian Zhao ◽  
Cunli Mao ◽  
Zhengtao Yu

Aspect-Based Sentiment Analysis (ABSA), a fine-grained opinion-mining task that aims to extract the sentiment toward a specific target from text, is important in many real-world applications, especially in the legal field. In this paper, we study two problems for End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) in the legal field: the limited amount of labeled training data and the neglect of in-domain knowledge representation. We propose a new deep learning method, named Semi-ETEKGs, which applies an E2E framework with knowledge graph (KG) embeddings in the legal domain after data augmentation (DA). Specifically, we pre-train BERT embeddings and in-domain KG embeddings on unlabeled data and on labeled data annotated with case elements after DA, and then feed the two embeddings into the E2E framework to classify the polarity of the target entity. Finally, we build a case-related dataset based on a popular ABSA benchmark to evaluate Semi-ETEKGs, and experiments on this dataset of microblog comments show that the proposed model significantly outperforms the compared methods.
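To make the embedding-fusion step more concrete, here is a hypothetical PyTorch sketch that feeds concatenated contextual and knowledge-graph embeddings into a token-level tagger, in the spirit of the E2E setup described above. The dimensions, the concatenation-based fusion, the tag set, and the class name FusedTagger are illustrative assumptions rather than the paper's actual implementation.

```python
# Hypothetical sketch: fusing BERT and in-domain KG embeddings for E2E-ABSA tagging.
import torch
import torch.nn as nn

class FusedTagger(nn.Module):
    def __init__(self, bert_dim=768, kg_dim=100, num_tags=7):
        super().__init__()
        # A BIO-style aspect-sentiment tag set, {B,I}-{POS,NEG,NEU} plus O (7 tags), is assumed.
        self.classifier = nn.Linear(bert_dim + kg_dim, num_tags)

    def forward(self, bert_emb, kg_emb):
        # bert_emb: (batch, seq_len, bert_dim) from a pre-trained BERT encoder
        # kg_emb:   (batch, seq_len, kg_dim) looked up from in-domain KG embeddings
        fused = torch.cat([bert_emb, kg_emb], dim=-1)
        return self.classifier(fused)  # per-token logits over aspect-sentiment tags

tagger = FusedTagger()
logits = tagger(torch.randn(2, 16, 768), torch.randn(2, 16, 100))  # dummy inputs
```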


2021 ◽  
Author(s):  
Ting-Wei Hsu ◽  
Chung-Chi Chen ◽  
Hen-Hsen Huang ◽  
Hsin-Hsi Chen

2021 ◽  
Vol 7 ◽  
pp. e816
Author(s):  
Heng-yang Lu ◽  
Jun Yang ◽  
Cong Hu ◽  
Wei Fang

Background: Fine-grained sentiment analysis is used to interpret consumers’ sentiments, from their written comments, towards specific entities on specific aspects. Previous researchers have introduced three main tasks in this field (ABSA, TABSA, MEABSA), covering a wide range of social media data (e.g., review-specific, question-and-answer, and community-based). In this paper, we identify and address two challenges common to these three tasks: the low-resource problem and sentiment polarity bias. Methods: We propose a unified model called PEA that integrates data augmentation methodology with a pre-trained language model and is suitable for the ABSA, TABSA and MEABSA tasks alike. Two data augmentation methods, entity replacement and dual noise injection, are introduced to address both challenges at the same time. An ensemble method is also introduced to combine the results of the basic RNN-based and BERT-based models. Results: PEA shows significant improvements on all three fine-grained sentiment analysis tasks when compared with state-of-the-art models. It also achieves results comparable to the baseline models while using only 20% of their training data, which demonstrates its strong performance under extreme low-resource conditions.
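The two augmentation ideas named above can be pictured with the rough Python sketch below. The entity pool, the noise rate, and the exact form of the "dual" noise injection are assumptions made for illustration; the paper's own procedure may differ.

```python
# Rough sketch of entity replacement and noise injection for augmentation.
# ENTITY_POOL and drop_prob are hypothetical; labels are kept unchanged.
import random

ENTITY_POOL = {"PHONE": ["iPhone 12", "Galaxy S21", "Pixel 6"]}  # hypothetical inventory

def entity_replacement(text, entity, entity_type, rng):
    """Swap the target entity for another entity of the same type."""
    candidates = [e for e in ENTITY_POOL[entity_type] if e != entity]
    new_entity = rng.choice(candidates)
    return text.replace(entity, new_entity), new_entity

def noise_injection(text, drop_prob=0.1, rng=None):
    """Randomly drop tokens so the model cannot over-rely on any single cue."""
    rng = rng or random.Random()
    kept = [tok for tok in text.split() if rng.random() > drop_prob]
    return " ".join(kept) if kept else text

rng = random.Random(0)
sent, ent = entity_replacement("The iPhone 12 camera is superb.", "iPhone 12", "PHONE", rng)
noisy = noise_injection(sent, rng=rng)
```

In practice, augmented samples produced this way would simply be appended to the original training set before fine-tuning.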


Author(s):  
Vincent Karas ◽  
Björn W. Schuller

Sentiment analysis is an important area of natural language processing that can help inform business decisions by extracting sentiment information from documents. The purpose of this chapter is to introduce the reader to selected concepts and methods of deep learning and show how deep models can be used to increase performance in sentiment analysis. It discusses the latest advances in the field and covers topics including traditional sentiment analysis approaches, the fundamentals of sentence modelling, popular neural network architectures, autoencoders, attention modelling, transformers, data augmentation methods, the benefits of transfer learning, the potential of adversarial networks, and perspectives on explainable AI. The authors' intent is that through this chapter, the reader can gain an understanding of recent developments in this area as well as current trends and potentials for future research.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1001
Author(s):  
Hu Ng ◽  
Glenn Jun Weng Chia ◽  
Timothy Tzen Vun Yap ◽  
Vik Tor Goh

Background: The proliferation of digital commerce has allowed merchants to reach a wider customer base, prompting the study of customer reviews to gauge service and product quality through sentiment analysis. Sentiment analysis can be enhanced through subjectivity and objectivity classification with attention mechanisms. Methods: This research uses input corpora with contrasting levels of subjectivity and objectivity from different databases to perform sentiment analysis on user reviews, incorporating attention mechanisms at the aspect level. Three large corpora are chosen as the subjectivity and objectivity datasets: the Shopee user review dataset (ShopeeRD) for subjectivity, and the Wikipedia English dataset (Wiki-en) and the Internet Movie Database (IMDb) for objectivity. Word embeddings are created using Word2Vec with Skip-Gram. A bidirectional LSTM with an attention layer (LSTM-ATT) is then applied to the word vectors. The performance of the model is evaluated and benchmarked against Logistic Regression (LR) and Linear SVC (L-SVC) classifiers. The three models are trained on subjectivity (70% of ShopeeRD) and objectivity (Wiki-en) embeddings with ten-fold cross-validation, and then evaluated on two datasets (IMDb and 20% of ShopeeRD). The experiments cover benchmark comparison, embedding comparison and model comparison with 70-10-20 train-validation-test splits. Data augmentation using AUG-BERT is performed, and selected models incorporating AUG-BERT are compared. Results: L-SVC scored the highest accuracy of 56.9% on objective embeddings (Wiki-en), while LSTM-ATT scored 69.0% on subjective embeddings (ShopeeRD). Improved performance was observed with AUG-BERT data augmentation: the LSTM-ATT+AUG-BERT model scored the highest accuracy at 60.0% for objective embeddings and 70.0% for subjective embeddings, compared with 57% (objective) and 69% (subjective) for L-SVC+AUG-BERT, and 56% (objective) and 68% (subjective) for L-SVC. Conclusions: Utilizing attention layers with notions of subjectivity and objectivity has been shown to improve the accuracy of sentiment analysis models.
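For readers unfamiliar with the pipeline, the following sketch shows Skip-Gram embeddings (via gensim) feeding a bidirectional LSTM with a simple additive attention layer in PyTorch. Hyperparameters, the tiny toy corpus, and the precise attention formulation are assumptions, not the study's exact configuration.

```python
# Sketch: Skip-Gram word vectors + bidirectional LSTM with an attention layer.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# 1) Train Skip-Gram embeddings (sg=1) on a tokenised corpus (toy data here).
corpus = [["fast", "delivery", "great", "quality"], ["poor", "packaging"]]
w2v = Word2Vec(corpus, vector_size=100, window=5, sg=1, min_count=1)

# 2) Bidirectional LSTM with a simple attention layer over the hidden states.
class BiLSTMAttention(nn.Module):
    def __init__(self, emb_dim=100, hidden=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                    # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1)     # attention-weighted sentence vector
        return self.out(context)

model = BiLSTMAttention()

# 3) Encode one tokenised review with the trained Skip-Gram vectors and classify it.
review = ["fast", "delivery", "great", "quality"]
vectors = torch.from_numpy(np.stack([w2v.wv[w] for w in review])).unsqueeze(0)
logits = model(vectors)                        # (1, num_classes)
```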

