scholarly journals Weibo Text Sentiment Analysis Based on BERT and Deep Learning

2021 ◽  
Vol 11 (22) ◽  
pp. 10774
Author(s):  
Hongchan Li ◽  
Yu Ma ◽  
Zishuai Ma ◽  
Haodong Zhu

With the rapid increase of public opinion data, the technology of Weibo text sentiment analysis plays a more and more significant role in monitoring network public opinion. Due to the sparseness and high-dimensionality of text data and the complex semantics of natural language, sentiment analysis tasks face tremendous challenges. To solve the above problems, this paper proposes a new model based on BERT and deep learning for Weibo text sentiment analysis. Specifically, first using BERT to represent the text with dynamic word vectors and using the processed sentiment dictionary to enhance the sentiment features of the vectors; then adopting the BiLSTM to extract the contextual features of the text, the processed vector representation is weighted by the attention mechanism. After weighting, using the CNN to extract the important local sentiment features in the text, finally the processed sentiment feature representation is classified. A comparative experiment was conducted on the Weibo text dataset collected during the COVID-19 epidemic; the results showed that the performance of the proposed model was significantly improved compared with other similar models.

2021 ◽  
Vol 5 (3) ◽  
pp. 1018
Author(s):  
Widi Widayat

The increasing number of internet users is directly in line with the increasing number of data on the internet that is available for analysis, especially data in text form. The availability of this text data encourages a lot of sentiment analysis research. However, it turns out that the availability of abundant text data is also one of the challenges in sentiment analysis research. Datasets that consist of long and complex text documents require a different approach. In this study, LSTM was chosen to be used as a sentiment classification method. This research uses a movie review dataset that consists of 25,000 review documents, with an average length per review is 233 words. The research uses CBOW and Skip-Gram methods on word2vec to form a vector representation of each word (word vector) in the corpus data. Several dimensions of the word vector was used in this research, there are 50, 60, 100, 150, 200, and 500, this tuning parameter is used to determine their effect on the resulting accuracy. The best accuracy around 88.17% is obtained at the word vector 100 dimension and the lowest accuracy is 85.86% at the word vector 500 dimension.


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been extensively used and they have proven to be better for sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence to sequence modelling and time series forecasting. In this article we will review different Machine Learning and Deep Learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application of sentiment analysis.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xinxin Lu ◽  
Hong Zhang

In order to solve the problems existing in the current method of emotional analysis of network text, such as long training time, complex calculation, and large space cost, this paper proposes an Internet text sentiment analysis method based on the improved AT-BiGRU model. Firstly, the textblob package is imported to correct spelling errors before text preprocessing. Secondly, pad_sequences are used to fill in the input layer with a fixed length, the two-way gated recurrent network is used to extract information, and the attention mechanism is used to highlight the key information of the word vector. Finally, the GNU memory unit is transformed, and an improved BiGRU that can adapt to the recursive network structure is constructed. The proposed model is experimentally demonstrated on the SemEval-2014 Task 4 and SemEval-2017 Task 4 datasets. Experimental results show that the proposed model can effectively avoid the text sentiment analysis bias caused by spelling errors and prove the effectiveness of the improved AT-BiGRU model in terms of accuracy, loss rate, and iteration time.


2021 ◽  
Vol 297 ◽  
pp. 01072
Author(s):  
Rajae Bensoltane ◽  
Taher Zaki

Aspect category detection (ACD) is a task of aspect-based sentiment analysis (ABSA) that aims to identify the discussed category in a given review or sentence from a predefined list of categories. ABSA tasks were widely studied in English; however, studies in other low-resource languages such as Arabic are still limited. Moreover, most of the existing Arabic ABSA work is based on rule-based or feature-based machine learning models, which require a tedious task of feature-engineering and the use of external resources like lexicons. Therefore, the aim of this paper is to overcome these shortcomings by handling the ACD task using a deep learning method based on a bidirectional gated recurrent unit model. Additionally, we examine the impact of using different vector representation models on the performance of the proposed model. The experimental results show that our model outperforms the baseline and related work models significantly by achieving an enhanced F1-score of more than 7%.


Vector representations for language have been shown to be useful in a number of Natural Language Processing tasks. In this paper, we aim to investigate the effectiveness of word vector representations for the problem of Sentiment Analysis. In particular, we target three sub-tasks namely sentiment words extraction, polarity of sentiment words detection, and text sentiment prediction. We investigate the effectiveness of vector representations over different text data and evaluate the quality of domain-dependent vectors. Vector representations has been used to compute various vector-based features and conduct systematically experiments to demonstrate their effectiveness. Using simple vector based features can achieve better results for text sentiment analysis of APP.


2020 ◽  
pp. 016555152091003
Author(s):  
Gyeong Taek Lee ◽  
Chang Ouk Kim ◽  
Min Song

Sentiment analysis plays an important role in understanding individual opinions expressed in websites such as social media and product review sites. The common approaches to sentiment analysis use the sentiments carried by words that express opinions and are based on either supervised or unsupervised learning techniques. The unsupervised learning approach builds a word-sentiment dictionary, but it requires lengthy time periods and high costs to build a reliable dictionary. The supervised learning approach uses machine learning models to learn the sentiment scores of words; however, training a classifier model requires large amounts of labelled text data to achieve a good performance. In this article, we propose a semisupervised approach that performs well despite having only small amounts of labelled data available for training. The proposed method builds a base sentiment dictionary from a small training dataset using a lasso-based ensemble model with minimal human effort. The scores of words not in the training dataset are estimated using an adaptive instance-based learning model. In a pretrained word2vec model space, the sentiment values of the words in the dictionary are propagated to the words that did not exist in the training dataset. Through two experiments, we demonstrate that the performance of the proposed method is comparable to that of supervised learning models trained on large datasets.


Sign in / Sign up

Export Citation Format

Share Document