Study Comparison Stemmer to Optimize Text Preprocessing In Sentiment Analysis Indonesian E-Commerce Reviews

During the COVID-19 pandemic, many studies have examined the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, plays a significant role as an indicator for forecasting Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy for BTC tweets when developing an accurate machine learning prediction model for Bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we find that splitting sentences, removing Twitter-specific tags, or combining the two generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices correlate well with sentiment scores only over shorter timespans. Selecting the optimum preprocessing strategy should help machine learning prediction models achieve better accuracy when compared with the actual prices.
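The two preprocessing functions singled out above, and the correlation step, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the regular expression for "Twitter-specific tags", the naive sentence splitter, and the function names are all assumptions, and the sentiment scorer itself is left out.

```python
import re
from math import sqrt

# Assumed definition of "Twitter-specific tags": URLs, @mentions, #hashtags.
# The paper's exact rules are not given in the abstract.
TAG_PATTERN = re.compile(r"https?://\S+|[@#]\w+")

def remove_twitter_tags(text):
    """Drop URLs, mentions and hashtags, then collapse whitespace."""
    return " ".join(TAG_PATTERN.sub(" ", text).split())

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In use, each tweet would be cleaned and split, scored by a sentiment model, aggregated per time window, and the resulting series passed to `pearson` together with the price series for the same windows.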


Choosing The Most Optimum Text Preprocessing Method for Sentiment Analysis: Case:iPhone Tweets

2019 Fourth International Conference on Informatics and Computing (ICIC) ◽

10.1109/icic47613.2019.8985943 ◽

2019 ◽

Author(s):

Fero Resyanto ◽

Yuliant Sibaroni ◽

Ade Romadhony

Keyword(s):

Sentiment Analysis ◽

Preprocessing Method ◽

Text Preprocessing


Assessing the Impact of Text Preprocessing in Sentiment Analysis of Short Social Network Messages in the Russian Language

2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI) ◽

10.1109/icdabi51230.2020.9325654 ◽

2020 ◽

Author(s):

Egor Araslanov ◽

Evgeniy Komotskiy ◽

Ebenezer Agbozo

Keyword(s):

Social Network ◽

Sentiment Analysis ◽

Russian Language ◽

Text Preprocessing ◽

The Impact ◽

The Russian Language


CAMPUS SENTIMENT ANALYSIS E-COMPLAINT USING PROBABILISTIC NEURAL NETWORK ALGORITHM

Kursor ◽

10.28961/kursor.v8i3.88 ◽

2017 ◽

Vol 8 (3) ◽

pp. 135

Author(s):

Mohammad Zoqi Sarwani

Keyword(s):

Neural Network ◽

Sentiment Analysis ◽

Probabilistic Neural Network ◽

Training Data ◽

Electronic Systems ◽

Testing Dataset ◽

Testing Data ◽

Neural Network Algorithm ◽

Last Stage ◽

Text Preprocessing

E-complaint is a technology used to collect feedback from customers, in the form of criticism and suggestions, through electronic systems. Some companies and agencies use e-complaint to provide better services to their customers. This study aims to perform sentiment analysis of an e-complaint service, taking Brawijaya University as the case. The proposed system has three main stages: text preprocessing, text weighting, and classification with a Probabilistic Neural Network (PNN). Tokenization, filtering, and stemming are performed during text preprocessing. The text resulting from the preprocessing stage is weighted using Term Frequency-Inverse Document Frequency (TF-IDF). In the last stage, the PNN classifies complaints as negative or positive. For the experiments, 70 items are used as training data and 20 items as testing data. The experimental results, based on combinations of the number of training and testing items, show an accuracy of up to 90%.
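The three stages described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' system: the stoplist is hypothetical, stemming is omitted because it is language specific, the weighting uses the common tf * log(N / df) form, and the PNN is the basic Gaussian-kernel variant with an assumed smoothing parameter `sigma`.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "is", "a", "and", "to"}  # hypothetical stoplist

def preprocess(text):
    """Tokenize and filter stop words; stemming is left out of this sketch."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) weighted as tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        term_freq = Counter(doc)
        vectors.append([term_freq[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors

def pnn_classify(x, train_vectors, train_labels, sigma=1.0):
    """Average a Gaussian kernel over each class's training vectors
    and return the class with the largest average."""
    sums = Counter()
    counts = Counter(train_labels)
    for vec, label in zip(train_vectors, train_labels):
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, vec))
        sums[label] += math.exp(-dist_sq / (2 * sigma ** 2))
    return max(counts, key=lambda label: sums[label] / counts[label])
```

A new complaint would need to be vectorized against the same vocabulary and document frequencies before being passed to `pnn_classify`; that step is not shown here.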


On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

10.18653/v1/w18-5406 ◽

2018 ◽

Cited By ~ 11

Author(s):

Jose Camacho-Collados ◽

Mohammad Taher Pilehvar

Keyword(s):

Neural Network ◽

Sentiment Analysis ◽

Text Categorization ◽

Evaluation Study ◽

Network Architectures ◽

Text Preprocessing ◽

Neural Network Architectures


Persian sentiment analysis of an online store independent of pre-processing using convolutional neural network with fastText embeddings

PeerJ Computer Science ◽

10.7717/peerj-cs.422 ◽

2021 ◽

Vol 7 ◽

pp. e422

Author(s):

Sajjad Shumaly ◽

Mohsen Yazdinejad ◽

Yanhui Guo

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sentiment Analysis ◽

Web Mining ◽

Word Embedding ◽

Processing Stage ◽

Online Store ◽

Persian Language ◽

Text Preprocessing ◽

Better Than

Sentiment analysis plays a key role in companies, especially stores, and increasing the accuracy with which customers’ opinions about products are determined helps them stay competitive. We analyze users’ opinions on the website of the largest online store in Iran, Digikala. However, Persian is an unstructured language, which makes the pre-processing stage very difficult; this is the main problem of sentiment analysis in Persian. The problem is exacerbated by the lack of available libraries for Persian pre-processing, since most libraries focus on English. To tackle this, approximately 3 million Persian reviews were gathered from the Digikala website using web-mining techniques, and the fastText method was used to create a word embedding. It was assumed that this would dramatically reduce the need for text pre-processing, because the skip-gram method considers the position of the words in the sentence and the words’ relations to each other. Another word embedding was created using TF-IDF, in parallel with fastText, to compare their performance. In addition, the results of the Convolutional Neural Network (CNN), BiLSTM, Logistic Regression, and Naïve Bayes models were compared. As a significant result, we obtained 0.996 AUC and 0.956 F-score using fastText and CNN. This article demonstrates not only to what extent it is possible to be independent of pre-processing, but also that the accuracy obtained is better than that of other research on Persian. Avoiding complex text preprocessing is also important for other languages, since most text preprocessing algorithms have been developed for English and cannot be used for other languages. Owing to its high accuracy and independence of pre-processing, the created word embedding has other applications in Persian besides sentiment analysis.
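The skip-gram idea mentioned above can be illustrated with a short sketch. It shows how (center, context) training pairs are generated from raw tokens, and how fastText-style character n-grams with `<` and `>` boundary markers are formed; the abstract does not mention the subword n-grams, but they are a known feature of fastText and part of why little normalization is needed. The function names, window size, and n-gram range are assumptions for illustration, and no embedding is actually trained here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours
    within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of the word wrapped in boundary markers."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams
```

Because both functions operate on the surface form of the text, they apply to Persian tokens without any language-specific stemmer or normalizer.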


Novel Text Preprocessing Framework for Sentiment Analysis

Smart Intelligent Computing and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-13-1927-3_33 ◽

2018 ◽

pp. 309-317 ◽

Cited By ~ 2

Author(s):

C. S. Pavan Kumar ◽

L. D. Dhinesh Babu

Keyword(s):

Sentiment Analysis ◽

Text Preprocessing


Toward Improving Arabic Text Preprocessing in Sentiment Analysis

Digital Technologies and Applications - Lecture Notes in Networks and Systems ◽

10.1007/978-3-030-73882-2_63 ◽

2021 ◽

pp. 695-705

Author(s):

Mohcine Maghfour ◽

Abdeljalil Elouardighi

Keyword(s):

Sentiment Analysis ◽

Arabic Text ◽

Text Preprocessing


Improving Arabic Sentiment Analysis Using CNN-Based Architectures and Text Preprocessing

Computational Intelligence and Neuroscience ◽

10.1155/2021/5538791 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Mustafa Mhamed ◽

Richard Sutcliffe ◽

Xia Sun ◽

Jun Feng ◽

Eiad Almekhlafi ◽

...

Keyword(s):

Natural Language ◽

Sentiment Analysis ◽

Dense Layer ◽

Close Attention ◽

Global Average ◽

Max Pooling ◽

Text Preprocessing ◽

Ablation Study ◽

Arabic Sentiment Analysis

Sentiment analysis is essential to many natural language applications. In this paper, we apply two models for Arabic sentiment analysis to the ASTD and ATDFS datasets, in both 2-class and multiclass forms. Model MC1 is a 2-layer CNN with global average pooling, followed by a dense layer. MC2 is a 2-layer CNN with max pooling, followed by a BiGRU and a dense layer. On the difficult ASTD 4-class task, we achieve 73.17%, compared to 65.58% reported by Attia et al., 2018. For the easier 2-class task, we achieve 90.06% with MC1, compared to 85.58% reported by Kwaik et al., 2019. We carry out experiments on various data splits to match those used by other researchers. We also pay close attention to Arabic preprocessing and include novel steps not reported in other works. In an ablation study, we investigate the effect of two steps in particular: the processing of emoticons and the use of a custom stoplist. On the 4-class task, these can make a difference of up to 4.27% and 5.48%, respectively. On the 2-class task, the maximum improvements are 2.95% and 3.87%.
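The two ablated preprocessing steps can be sketched as switchable options, so that each can be turned off independently in an ablation run. This is only an illustration: the emoticon mapping, the placeholder tokens, the three-word stoplist, and the whitespace tokenizer are all hypothetical and far smaller than anything a real Arabic pipeline would use.

```python
# Hypothetical emoticon map and stoplist, for illustration only.
EMOTICONS = {":)": "EMO_POS", ":(": "EMO_NEG"}
STOPLIST = {"في", "من", "على"}

def preprocess(text, map_emoticons=True, use_stoplist=True):
    """Whitespace-tokenize, optionally map emoticons to sentiment
    placeholders, and optionally drop stoplist words."""
    result = []
    for token in text.split():
        if map_emoticons and token in EMOTICONS:
            token = EMOTICONS[token]
        if use_stoplist and token in STOPLIST:
            continue
        result.append(token)
    return result
```

Running a classifier once per combination of the two flags, on the same data split, gives the kind of comparison an ablation study reports.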
