ASA: A framework for Arabic sentiment analysis

2019 ◽  
Vol 46 (4) ◽  
pp. 544-559 ◽  
Author(s):  
Ahmed Oussous ◽  
Fatima-Zahra Benjelloun ◽  
Ayoub Ait Lahcen ◽  
Samir Belfkih

Sentiment analysis (SA), also known as opinion mining, is a growing important research area. Generally, it helps to automatically determine if a text expresses a positive, negative or neutral sentiment. It enables to mine the huge increasing resources of shared opinions such as social networks, review sites and blogs. In fact, SA is used by many fields and for various languages such as English and Arabic. However, since Arabic is a highly inflectional and derivational language, it raises many challenges. In fact, SA of Arabic text should handle such complex morphology. To better handle these challenges, we decided to provide the research community and Arabic users with a new efficient framework for Arabic Sentiment Analysis (ASA). Our primary goal is to improve the performance of ASA by exploiting deep learning while varying the preprocessing techniques. For that, we implement and evaluate two deep learning models namely convolutional neural network (CNN) and long short-term memory (LSTM) models. The framework offers various preprocessing techniques for ASA (including stemming, normalisation, tokenization and stop words). As a result of this work, we first provide a new rich and publicly available Arabic corpus called Moroccan Sentiment Analysis Corpus (MSAC). Second, the proposed framework demonstrates improvement in ASA. In fact, the experimental results prove that deep learning models have a better performance for ASA than classical approaches (support vector machines, naive Bayes classifiers and maximum entropy). They also show the key role of morphological features in Arabic Natural Language Processing (NLP).

Author(s):  
Ali Bou Nassif ◽  
Abdollah Masoud Darya ◽  
Ashraf Elnagar

This work presents a detailed comparison of the performance of deep learning models such as convolutional neural networks, long short-term memory, gated recurrent units, their hybrids, and a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. Additionally, the comparison includes state-of-the-art models such as the transformer architecture and the araBERT pre-trained model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are some of the largest publicly available datasets for Arabic reviews. Results showed deep learning outperforming shallow learning for binary and multi-label classification, in contrast with the results of similar work reported in the literature. This discrepancy in outcome was caused by dataset size as we found it to be proportional to the performance of deep learning models. The performance of deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. The best performing shallow learning technique was Random Forest followed by Decision Tree, and AdaBoost. The deep learning models performed similarly using a default embedding layer, while the transformer model performed best when augmented with araBERT.


2020 ◽  
Vol 26 (6) ◽  
pp. 85-93
Author(s):  
Abdulhakeem Qusay Al-Bayati ◽  
Ahmed S. Al-Araji ◽  
Saman Hameed Ameen

Sentiment analysis is one of the major fields in natural language processing whose main task is to extract sentiments, opinions, attitudes, and emotions from a subjective text. And for its importance in decision making and in people's trust with reviews on web sites, there are many academic researches to address sentiment analysis problems. Deep Learning (DL) is a powerful Machine Learning (ML) technique that has emerged with its ability of feature representation and differentiating data, leading to state-of-the-art prediction results. In recent years, DL has been widely used in sentiment analysis, however, there is scarce in its implementation in the Arabic language field. Most of the previous researches address other languages like English. The proposed model tackles Arabic Sentiment Analysis (ASA) by using a DL approach. ASA is a challenging field where Arabic language has a rich morphological structure more than other languages. In this work, Long Short-Term Memory (LSTM) as a deep neural network has been used for training the model combined with word embedding as a first hidden layer for features extracting. The results show an accuracy of about 82% is achievable using DL method.


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 374
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

With the extensive availability of social media platforms, Twitter has become a significant tool for the acquisition of peoples’ views, opinions, attitudes, and emotions towards certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in the field of natural language processing. A variety of techniques have been devised for sentiment analysis, but there is still room for improvement where the accuracy and efficacy of the system are concerned. This study proposes a novel approach that exploits the advantages of the lexical dictionary, machine learning, and deep learning classifiers. We classified the tweets based on the sentiments extracted by TextBlob using a stacked ensemble of three long short-term memory (LSTM) as base classifiers and logistic regression (LR) as a meta classifier. The proposed model proved to be effective and time-saving since it does not require feature extraction, as LSTM extracts features without any human intervention. We also compared our proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest. We also included state-of-the-art deep learning models in comparison with the proposed model. Experiments were conducted on the sentiment140 dataset and were evaluated in terms of accuracy, precision, recall, and F1 Score. Empirical results showed that our proposed approach manifested state-of-the-art results by achieving an accuracy score of 99%.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Cach N. Dang ◽  
María N. Moreno-García ◽  
Fernando De la Prieta

Sentiment analysis on public opinion expressed in social networks, such as Twitter or Facebook, has been developed into a wide range of applications, but there are still many challenges to be addressed. Hybrid techniques have shown to be potential models for reducing sentiment errors on increasingly complex training data. This paper aims to test the reliability of several hybrid techniques on various datasets of different domains. Our research questions are aimed at determining whether it is possible to produce hybrid models that outperform single models with different domains and types of datasets. Hybrid deep sentiment analysis learning models that combine long short-term memory (LSTM) networks, convolutional neural networks (CNN), and support vector machines (SVM) are built and tested on eight textual tweets and review datasets of different domains. The hybrid models are compared against three single models, SVM, LSTM, and CNN. Both reliability and computation time were considered in the evaluation of each technique. The hybrid models increased the accuracy for sentiment analysis compared with single models on all types of datasets, especially the combination of deep learning models with SVM. The reliability of the latter was significantly higher.


Author(s):  
Youssra Zahidi ◽  
Yacine El Younoussi ◽  
Yassine Al-Amrani

Deep learning (DL) is a machine learning (ML) subdomain that involves algorithms taken from the brain function named artificial neural networks (ANNs). Recently, DL approaches have gained major accomplishments across various Arabic natural language processing (ANLP) tasks, especially in the domain of Arabic sentiment analysis (ASA). For working on Arabic SA, researchers can use various DL libraries in their projects, but without justifying their choice or they choose a group of libraries relying on their particular programming language familiarity. We are basing in this work on Java and Python programming languages because they have a large set of deep learning libraries that are very useful in the ASA domain. This paper focuses on a comparative analysis of different valuable Python and Java libraries to conclude the most relevant and robust DL libraries for ASA. Throw this comparative analysis, and we find that: TensorFlow, Theano, and Keras Python frameworks are very popular and very used in this research domain.


2019 ◽  
Vol 9 (13) ◽  
pp. 2760 ◽  
Author(s):  
Khai Tran ◽  
Thi Phan

Sentiment analysis is an active research area in natural language processing. The task aims at identifying, extracting, and classifying sentiments from user texts in post blogs, product reviews, or social networks. In this paper, the ensemble learning model of sentiment classification is presented, also known as CEM (classifier ensemble model). The model contains various data feature types, including language features, sentiment shifting, and statistical techniques. A deep learning model is adopted with word embedding representation to address explicit, implicit, and abstract sentiment factors in textual data. The experiments conducted based on different real datasets found that our sentiment classification system is better than traditional machine learning techniques, such as Support Vector Machines and other ensemble learning systems, as well as the deep learning model, Long Short-Term Memory network, which has shown state-of-the-art results for sentiment analysis in almost corpuses. Our model’s distinguishing point consists in its effective application to different languages and different domains.


Sentiment Analysis (SA) is a field of Natural Language Processing (NLP) whose goal is to extract the emotion, sentiment or more general opinion expressed in a human-written text. Opinions and emotions play a central role in human life. Therefore, there are many academic researches in this field for processing many languages like English However, there is scarce in its implementation with addressing Arabic Sentiment Analysis (ASA). It is a challenging field where Arabic language has a rich morphological structure and there are many other defies more than in other languages. For that, the proposed model tackles ASA by using a Deep Learning approach. In this work, one of word embedding methods, such as a first hidden layer for features extracting from the input dataset and Long Short-Term Memory (LSTM) as a deep neural network, has been used for training. The model combined with Softmax layer is applied to turn numeric outputs from LSTM layer into probabilities to classify the outputs to positive or negative. There are two datasets that are used for training the model separately with each one. The first one is ASTD dataset as a dialectal Arabic type about different tweets from internet, the results with this dataset is compared with another academic work that used the same one. The results from this work outperforms through accuracy about 14.95% and F-score about 15.14% more than what performed in the previous work. The second one is HTL dataset as a modern standard Arabic type about opinions of reviewers on different hotels from several countries. This dataset is bigger in size than the first one to show the size effect on the results of this model. So, the accuracy increased about 11% and F-score about 10.8% more than what performed with the first dataset.


2021 ◽  
pp. 016555152110065
Author(s):  
Rahma Alahmary ◽  
Hmood Al-Dossari

Sentiment analysis (SA) aims to extract users’ opinions automatically from their posts and comments. Almost all prior works have used machine learning algorithms. Recently, SA research has shown promising performance in using the deep learning approach. However, deep learning is greedy and requires large datasets to learn, so it takes more time for data annotation. In this research, we proposed a semiautomatic approach using Naïve Bayes (NB) to annotate a new dataset in order to reduce the human effort and time spent on the annotation process. We created a dataset for the purpose of training and testing the classifier by collecting Saudi dialect tweets. The dataset produced from the semiautomatic model was then used to train and test deep learning classifiers to perform Saudi dialect SA. The accuracy achieved by the NB classifier was 83%. The trained semiautomatic model was used to annotate the new dataset before it was fed into the deep learning classifiers. The three deep learning classifiers tested in this research were convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Support vector machine (SVM) was used as the baseline for comparison. Overall, the performance of the deep learning classifiers exceeded that of SVM. The results showed that CNN reported the highest performance. On one hand, the performance of Bi-LSTM was higher than that of LSTM and SVM, and, on the other hand, the performance of LSTM was higher than that of SVM. The proposed semiautomatic annotation approach is usable and promising to increase speed and save time and effort in the annotation process.


2018 ◽  
Vol 7 (2.14) ◽  
pp. 5726
Author(s):  
Oumaima Hourrane ◽  
El Habib Benlahmar ◽  
Ahmed Zellou

Sentiment analysis is one of the new absorbing parts appeared in natural language processing with the emergence of community sites on the web. Taking advantage of the amount of information now available, research and industry have been seeking ways to automatically analyze the sentiments expressed in texts. The challenge for this task is the human language ambiguity, and also the lack of labeled data. In order to solve this issue, sentiment analysis and deep learning have been merged as deep learning models are effective due to their automatic learning capability. In this paper, we provide a comparative study on IMDB movie review dataset, we compare word embeddings and further deep learning models on sentiment analysis and give broad empirical outcomes for those keen on taking advantage of deep learning for sentiment analysis in real-world settings.


Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 4 ◽  
Author(s):  
Jurgita Kapočiūtė-Dzikienė ◽  
Robertas Damaševičius ◽  
Marcin Woźniak

We describe the sentiment analysis experiments that were performed on the Lithuanian Internet comment dataset using traditional machine learning (Naïve Bayes Multinomial—NBM and Support Vector Machine—SVM) and deep learning (Long Short-Term Memory—LSTM and Convolutional Neural Network—CNN) approaches. The traditional machine learning techniques were used with the features based on the lexical, morphological, and character information. The deep learning approaches were applied on the top of two types of word embeddings (Vord2Vec continuous bag-of-words with negative sampling and FastText). Both traditional and deep learning approaches had to solve the positive/negative/neutral sentiment classification task on the balanced and full dataset versions. The best deep learning results (reaching 0.706 of accuracy) were achieved on the full dataset with CNN applied on top of the FastText embeddings, replaced emoticons, and eliminated diacritics. The traditional machine learning approaches demonstrated the best performance (0.735 of accuracy) on the full dataset with the NBM method, replaced emoticons, restored diacritics, and lemma unigrams as features. Although traditional machine learning approaches were superior when compared to the deep learning methods; deep learning demonstrated good results when applied on the small datasets.


Sign in / Sign up

Export Citation Format

Share Document