Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis

Arabic is a challenging language for automatic processing, for several intrinsic reasons: its many dialects, ambiguous syntax, syntactic flexibility, and diacritics. Machine learning and deep learning frameworks require large training datasets to ensure accurate predictions. This poses a further challenge for researchers working with Arabic text, as high-quality Arabic textual datasets are still scarce. In this paper, an intelligent framework for expanding, or augmenting, Arabic sentences is presented. The sentences were initially labelled by human annotators for sentiment analysis. The novel approach presented in this work relies on the rich morphology of Arabic, synonymy lists, syntactic or grammatical rules, and negation rules to generate new sentences, with their proper labels, from the seed sentences. Most augmentation techniques target image or video data. This study is the first work to target text augmentation for the Arabic language. Using this framework, we were able to increase the size of the initial seed datasets tenfold. Experiments that assess the impact of this augmentation on sentiment analysis showed a 42% average increase in accuracy, due to the reliability and the high quality of the rules used to build this framework.
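The abstract does not spell out the framework's rules, so the following is only a minimal sketch of the general idea: synonym substitution keeps the seed label, while inserting a negator flips it. The synonym list, the negator, and the English stand-in tokens are all hypothetical; a real system would use Arabic lexicons and morphological rules.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources (English stand-ins, not the paper's lexicons).
SYNONYMS: Dict[str, List[str]] = {"good": ["great", "fine"], "bad": ["poor"]}
NEGATOR = "not"
FLIP = {"positive": "negative", "negative": "positive"}

def augment(tokens: List[str], label: str) -> List[Tuple[List[str], str]]:
    """Generate labelled variants of one seed sentence."""
    variants = []
    for i, tok in enumerate(tokens):
        # Synonym rule: swapping in a synonym preserves the label.
        for syn in SYNONYMS.get(tok, []):
            variants.append((tokens[:i] + [syn] + tokens[i + 1:], label))
        # Negation rule: negating a sentiment word flips the label.
        if tok in SYNONYMS:
            variants.append((tokens[:i] + [NEGATOR] + tokens[i:], FLIP.get(label, label)))
    return variants

seed_tokens, seed_label = ["the", "food", "is", "good"], "positive"
for sentence, new_label in augment(seed_tokens, seed_label):
    print(" ".join(sentence), "->", new_label)
```

Each generated sentence carries its own label, which is what lets the augmented set be used directly for supervised training.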

Arabic Sentiment Analysis (ASA) Using Deep Learning Approach

Journal of Engineering ◽

10.31026/j.eng.2020.06.07 ◽

2020 ◽

Vol 26 (6) ◽

pp. 85-93

Author(s):

Abdulhakeem Qusay Al-Bayati ◽

Ahmed S. Al-Araji ◽

Saman Hameed Ameen

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Web Sites ◽

Short Term Memory ◽

Morphological Structure ◽

Arabic Language ◽

Feature Representation ◽

Main Task ◽

Arabic Sentiment Analysis

Sentiment analysis is one of the major fields in natural language processing; its main task is to extract sentiments, opinions, attitudes, and emotions from subjective text. Because of its importance in decision making and in people's trust in reviews on web sites, much academic research addresses sentiment analysis problems. Deep Learning (DL) is a powerful Machine Learning (ML) technique that has emerged thanks to its ability to represent features and differentiate data, leading to state-of-the-art prediction results. In recent years, DL has been widely used in sentiment analysis, but its application to the Arabic language remains scarce. Most previous research addresses other languages such as English. The proposed model tackles Arabic Sentiment Analysis (ASA) using a DL approach. ASA is a challenging field because Arabic has a richer morphological structure than other languages. In this work, a Long Short-Term Memory (LSTM) deep neural network is used to train the model, combined with a word embedding as the first hidden layer for feature extraction. The results show that an accuracy of about 82% is achievable using the DL method.

Sentiment Analysis of Arabic Documents

Advances in Business Information Systems and Analytics - Natural Language Processing for Global and Local Business ◽

10.4018/978-1-7998-4240-8.ch013 ◽

2021 ◽

pp. 307-331

Author(s):

Hichem Rahab ◽

Mahieddine Djoudi ◽

Abdelhafid Zitouni

Keyword(s):

Decision Making ◽

Sentiment Analysis ◽

Arabic Language ◽

Rule Based ◽

Arabic Speakers ◽

European Languages ◽

Internet Users ◽

Arabic Sentiment Analysis ◽

Considerable Work ◽

The Web

Today, consumers commonly look for others' feelings about their purchasing experience on the web before even a simple decision to buy a product or a service. Sentiment analysis aims to help people benefit from the opinionated texts available on the web in their decision making, and business is one of its challenging areas. Considerable work on sentiment analysis has been carried out for English and other Indo-European languages. Despite the large number of Arabic speakers and internet users, studies in Arabic sentiment analysis are still insufficient. This chapter aims to present the main challenges of Arabic sentiment analysis together with the solutions recently proposed in the literature. The chapter is organized in a novel manner: it draws the main challenges from the surveyed literature and then gives the proposed solutions for each challenge. The chapter finds that the future trend will be toward rule-based techniques and deep learning, which allow better handling of the inherent characteristics of the Arabic language.

THE IMPACT OF THE MORPHOLOGICAL AFFIXES IN THE LINGUISTIC ECONOMY “A COMPARATIVE STUDY BETWEEN HEBREW AND ARABIC LANGUAGE”

RIMAK International Journal of Humanities and Social Sciences ◽

10.47832/2717-8293.4-3.32 ◽

2021 ◽

Vol 3 (4) ◽

pp. 320-338

Author(s):

Ibtisam Jebur MNEHIL ◽

Ban Salih Mahdi AL KHAFAJI ◽

Rasheed Ghazwan MAJEED

Keyword(s):

Comparative Study ◽

Research Paper ◽

Arabic Language ◽

The Other ◽

Foreign Languages ◽

Great Similarity ◽

The Third ◽

Hebrew Language ◽

The Rich ◽

The Impact

This research paper focuses on morphological affixes in two languages, Arabic and Hebrew, and the impact of these affixes on linguistic economy. The study aims to identify the linguistic economy achieved by morphological affixes, which help convey rich meaning with little pronunciation, to compare the two languages in order to determine which is the more economical, and to investigate the reasons behind this economy. The research is divided into three sections. The first focuses on morphological prefixes, the second on internal affixation, and the third on morphological suffixes. The study concludes that there is a great similarity between Hebrew and Arabic in many of their morphological affixes, alongside minor differences between the two languages. One aspect of this difference is that Hebrew tends to borrow affixes from foreign languages more than Arabic does.

Negation Handling in Machine Learning-Based Sentiment Classification for Colloquial Arabic

International Journal of Operations Research and Information Systems ◽

10.4018/ijoris.2020100102 ◽

2020 ◽

Vol 11 (4) ◽

pp. 33-45

Author(s):

Omar Alharbi

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Positive Impact ◽

Machine Learning Algorithms ◽

Sentiment Classification ◽

Linguistic Knowledge ◽

Sentiment Lexicon ◽

Colloquial Arabic ◽

Arabic Sentiment Analysis ◽

The Impact

One crucial aspect of sentiment analysis is negation handling, where the occurrence of negation can flip the sentiment of a review and negatively affect machine learning-based sentiment classification. The role of negation in Arabic sentiment analysis has been explored only to a limited extent, especially for colloquial Arabic. In this paper, the authors address the negation problem in colloquial Arabic sentiment classification using a machine learning approach. To this end, they propose a simple rule-based algorithm for handling the problem, which affects the performance of a machine learning classifier. The rules were crafted based on observation of many cases of negation, simple linguistic knowledge, and a sentiment lexicon. They also examine the impact of the proposed algorithm on the performance of different machine learning algorithms. Furthermore, they compare the performance of the classifiers when their algorithm is used against three baselines. The experimental results show that the proposed algorithm has a positive impact on the classifiers compared to the baselines.
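The abstract does not give the authors' rules. As a rough illustration of rule-based negation handling in general, the sketch below marks the tokens that follow a negation particle so that a downstream classifier sees "negated" words as distinct features. The transliterated particles, the `NOT_` prefix, and the fixed two-token scope are assumptions made for this example, not the paper's algorithm.

```python
from typing import List

# Transliterated colloquial negation particles; illustrative assumptions only.
NEGATORS = {"mish", "mu", "ma"}
WINDOW = 2  # hypothetical negation scope, in tokens

def mark_negation(tokens: List[str], window: int = WINDOW) -> List[str]:
    """Prefix the `window` tokens after each negator with NOT_."""
    marked: List[str] = []
    remaining = 0  # how many upcoming tokens are still inside a negation scope
    for tok in tokens:
        if tok in NEGATORS:
            remaining = window
            marked.append(tok)
        elif remaining > 0:
            marked.append("NOT_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked

print(mark_negation(["el", "film", "mish", "helw", "abadan", "bas", "el", "musiqa", "helwa"]))
```

A fixed window is the simplest possible scope rule; a lexicon-aware variant would instead stop at the first sentiment word or at a clause boundary.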

A Hybrid Method of Linguistic and Statistical Features for Arabic Sentiment Analysis

Baghdad Science Journal ◽

10.21123/bsj.2020.17.1(suppl.).0385 ◽

2020 ◽

Vol 17 (1(Suppl.)) ◽

pp. 0385

Author(s):

Ahmed Sabah AL-Jumaili

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Hybrid Method ◽

Training Model ◽

Arabic Language ◽

Machine Learning Techniques ◽

Statistical Features ◽

Hybrid Features ◽

Pos Tagging ◽

Arabic Sentiment Analysis

Sentiment analysis refers to the task of identifying the polarity (positive or negative) of a text that expresses an opinion. The use of Arabic online has expanded dramatically in the last decade, especially with the emergence of social websites (e.g. Twitter, Facebook, etc.). Several studies have addressed sentiment analysis for Arabic using various techniques. According to the literature, the most efficient techniques are machine learning techniques, owing to their ability to build a trained model. Yet there are still issues facing Arabic sentiment analysis with machine learning techniques. These issues relate to employing robust features that can discriminate the polarity of sentiments. This paper proposes a hybrid method of linguistic and statistical features, along with classification methods, for Arabic sentiment analysis. The linguistic features comprise stemming and POS tagging, while the statistical features comprise TF-IDF. A benchmark dataset of Arabic tweets has been used in the experiments. In addition, three classifiers have been used: SVM, KNN, and ME. Results showed that SVM outperformed the other classifiers, obtaining an F-score of 72.15%. This indicates the usefulness of SVM with the proposed hybrid features.
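For readers unfamiliar with the statistical feature mentioned above, here is a small sketch of one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency). The abstract does not state which variant the paper uses, and the toy English documents are hypothetical stand-ins for stemmed Arabic tweets.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} map per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["good", "service"], ["bad", "service"], ["good", "good", "food"]]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, {t: round(v, 3) for t, v in w.items()})
```

Terms that appear in every document get weight zero under this variant, which is why rare, polarity-bearing words tend to dominate the feature vector.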

Ensemble Classifiers for Arabic Sentiment Analysis of Social Network (Twitter Data) towards COVID-19-Related Conspiracy Theories

Applied Computational Intelligence and Soft Computing ◽

10.1155/2022/6614730 ◽

2022 ◽

Vol 2022 ◽

pp. 1-10

Author(s):

Abdullah Al-Hashedi ◽

Belal Al-Fuhaidi ◽

Abdulqader M. Mohsen ◽

Yousef Ali ◽

Hasan Ali Gamal Al-Kaf ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

English Language ◽

Word Embedding ◽

Ensemble Classifiers ◽

Conspiracy Theories ◽

Machine Learning Model ◽

Arabic Sentiment Analysis ◽

The Impact

Sentiment analysis has recently become increasingly important with the massive increase in online content. It is associated with the analysis of textual data generated by social media, which can be easily accessed, obtained, and analyzed. With the emergence of COVID-19, most published studies related to COVID-19 conspiracy theories were surveys of people's sentiments and opinions that studied the impact of the pandemic on their lives. Only a few studies applied sentiment analysis to social media using a machine learning approach. These studies focused mainly on tweets in English and paid little attention to other languages such as Arabic. This study proposes a machine learning model to analyze Arabic tweets from Twitter. In this model, we apply Word2Vec for word embedding, which forms the main source of features. Two pretrained continuous bag-of-words (CBOW) models are investigated, and Naïve Bayes is used as a baseline classifier. Several single-based and ensemble-based machine learning classifiers are used with and without SMOTE (synthetic minority oversampling technique). The experimental results show that applying word embedding with an ensemble and SMOTE achieved a good improvement in average F1 score compared to the baseline classifier and to the other classifiers (single-based and ensemble-based) without SMOTE.
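SMOTE balances classes by creating synthetic minority samples along the line between a minority point and one of its nearest minority neighbours. The pure-Python sketch below illustrates that interpolation step only; the function name, parameters, and toy 2-D vectors are assumptions for illustration, and the study itself most likely relied on an existing library implementation.

```python
import random
from typing import List, Sequence

def smote(minority: List[Sequence[float]], n_new: int,
          k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples from at least two minority vectors."""
    rng = random.Random(seed)

    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic

minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
for point in smote(minority_vectors, n_new=4):
    print([round(v, 3) for v in point])
```

In the setting described above, the vectors would be tweet embeddings derived from Word2Vec rather than hand-written points.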

Different valuable tools for Arabic sentiment analysis: a comparative evaluation

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i1.pp753-762 ◽

2021 ◽

Vol 11 (1) ◽

pp. 753

Author(s):

Youssra Zahidi ◽

Yacine El Younoussi ◽

Yassine Al-Amrani

Keyword(s):

Sentiment Analysis ◽

Programming Languages ◽

Language Processing ◽

Comparative Evaluation ◽

Research Work ◽

Arabic Language ◽

Arabic Natural Language Processing ◽

Arabic Sentiment Analysis ◽

Python Programming ◽

Research Domain

Arabic natural language processing (ANLP) is a subfield of artificial intelligence (AI) that aims to build applications for the Arabic language, such as Arabic sentiment analysis (ASA), the task of classifying the feelings and emotions expressed in a text to determine the writer's attitude (neutral, negative, or positive). When working on ASA, researchers often use various tools without explaining why they chose them, or select a set of libraries based on their familiarity with a specific programming language. Because of the abundance of libraries available for ANLP, and especially for ASA, we rely on the Java and Python programming languages in our research work. This paper presents an in-depth comparative evaluation of Python and Java libraries to identify the most useful ones for ASA. Based on a wide range of influential works in the ASA domain, we deduce that NLTK, Gensim, and TextBlob are the most useful Python libraries for the ASA task. For Java, we conclude that Weka and CoreNLP are the most widely used tools and that they achieve strong results in this research domain.

Analyzing the Effect of Negation in Sentiment Polarity of Facebook Dialectal Arabic Text

Applied Sciences ◽

10.3390/app11114768 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4768

Author(s):

Sanaa Kaddoura ◽

Maher Itani ◽

Chris Roast

Keyword(s):

Social Networks ◽

Social Network ◽

Sentiment Analysis ◽

Arabic Language ◽

Network Data ◽

Arabic Text ◽

Social Network Data ◽

Dialectal Arabic ◽

The Impact ◽

Modern Standard

With the increase in the number of users on social networks, sentiment analysis has been gaining attention. Sentiment analysis aggregates users' opinions to inform researchers about attitudes towards products or topics. Social network data commonly contain authors’ opinions about specific subjects, such as people’s opinions towards steps taken to manage the COVID-19 pandemic. Usually, people use dialectal language in their posts on social networks. Dialectal language poses obstacles that make opinion analysis more challenging than working with standard language. For the Arabic language, Modern Standard Arabic (MSA) tools cannot be employed with social network data that contain dialectal language. Another challenge of dialectal Arabic is that the polarity of opinionated words is affected by inverters, such as negation, which tend to change a word’s polarity from positive to negative and vice versa. This work analyzes the effect of inverters on sentiment analysis of dialectal Arabic social network posts. It discusses the different reasons that hinder the trivial resolution of inverters. An experiment is conducted on a corpus of data collected from Facebook, although the same approach can be applied to posts from other social networks. The results show the impact that resolving negation can have on classification accuracy: the F1 score increases by 20% when negation is treated in the text.

A Method of Deep Learning Tackles Sentiment Analysis Problem in Arabic Texts

Iraqi Journal of Computer Communication Control and System Engineering ◽

10.33103/uot.ijccce.20.4.2 ◽

2020 ◽

pp. 9-20

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Short Term Memory ◽

Human Life ◽

Morphological Structure ◽

Arabic Language ◽

Written Text ◽

Hidden Layer ◽

Arabic Sentiment Analysis

Sentiment Analysis (SA) is a field of Natural Language Processing (NLP) whose goal is to extract the emotion, sentiment, or more general opinion expressed in a human-written text. Opinions and emotions play a central role in human life. Therefore, there is much academic research in this field for languages such as English. However, work addressing Arabic Sentiment Analysis (ASA) is scarce. ASA is a challenging field because Arabic has a rich morphological structure and poses many challenges beyond those of other languages. For that reason, the proposed model tackles ASA with a Deep Learning approach. In this work, a word embedding method serves as the first hidden layer, extracting features from the input dataset, and a Long Short-Term Memory (LSTM) deep neural network is used for training. A Softmax layer turns the numeric outputs of the LSTM layer into probabilities in order to classify the outputs as positive or negative. Two datasets are used, and the model is trained on each one separately. The first is the ASTD dataset, a dialectal Arabic collection of tweets; the results on this dataset are compared with another academic work that used the same dataset. This work outperforms the previous work by about 14.95% in accuracy and about 15.14% in F-score. The second is the HTL dataset, a Modern Standard Arabic collection of reviewers' opinions on hotels in several countries. This dataset is larger than the first, which shows the effect of dataset size on the model's results: accuracy increased by about 11% and F-score by about 10.8% compared with the first dataset.
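The softmax step mentioned above can be illustrated in a few lines. This is a generic, numerically stable softmax over two raw scores, not the authors' code; the score values and the label order are made up for the example.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Map raw scores to probabilities that sum to 1."""
    top = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ["negative", "positive"]  # hypothetical output order
raw_scores = [0.3, 1.7]            # stand-in for the LSTM layer's numeric outputs
probs = softmax(raw_scores)
prediction = LABELS[probs.index(max(probs))]
print([round(p, 3) for p in probs], prediction)
```

The predicted class is simply the label with the highest probability, so for a two-class problem the larger raw score always wins.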

Research on behavior recognition based on feature fusion of automatic coder and recurrent neural network

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189290 ◽

2020 ◽

Vol 39 (6) ◽

pp. 8927-8935

Author(s):

Bing Zheng ◽

Dawei Yun ◽

Yan Liang

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Behavior Pattern ◽

Rapid Development ◽

Video Data ◽

Support Vector ◽

Behavior Recognition ◽

Learning Methods ◽

The Impact ◽

Internet Of Things Technology

Under the impact of COVID-19, research on behavior recognition is highly needed. In this paper, we combine a self-adaptive coder with a recurrent neural network to carry out behavior pattern recognition. At present, most research on human behavior recognition focuses on video data, which is based on the video number. At the same time, due to the complexity of video image data, it is easy to violate personal privacy. Internet of Things technology has developed rapidly and has attracted the attention of a large number of experts and scholars. Researchers have tried many machine learning methods, such as random forest, support vector machine, and other shallow learning methods, which perform well in the laboratory environment but are still a long way from practical application. In this paper, a recurrent neural network algorithm based on long short-term memory (LSTM) is proposed to recognize behavior patterns and thereby improve the accuracy of human activity recognition.
