Arabic Natural Language Processing: Latest Research Papers

Simple Extensible Deep Learning Model for Automatic Arabic Diacritization

ACM Transactions on Asian and Low-Resource Language Information Processing

10.1145/3480938

2022

Vol 21 (2)

pp. 1-16

Author(s):

Hamza Abbad

Shengwu Xiong

Keyword(s):

Deep Learning

Language Processing

Error Rate

Short Term Memory

Confusion Matrix

Learning Model

Model Errors

Sequence Elements

Arabic Natural Language Processing

Deep Learning Model

Automatic diacritization is an Arabic natural language processing task framed as sequence labeling, in which the letters are the sequence elements and the diacritics are the labels. A letter can carry from zero to two diacritics. The dataset used was a subset of the preprocessed version of the Tashkeela corpus. We developed a deep learning model composed of a stack of four bidirectional long short-term memory hidden layers of the same size, with an output layer at every level. The levels correspond to the groups into which we classified the diacritics (short vowels, double case endings, Shadda, and Sukoon). Before training, the data were divided into input vectors containing letter indexes and output vectors containing the indexes of the diacritics within their groups. Both the input and the output vectors are concatenated, and then an overlapping sliding-window operation is performed to generate continuous, fixed-size data, which are used for both training and evaluation. Finally, we ran tests using the standard metrics in all of their variations and compared our results with two recent state-of-the-art works. Our model achieved a 3% diacritization error rate and an 8.99% word error rate when all letters are included. We also generated confusion matrices to show the performance of each output and analyzed the mismatches in the first 500 lines to classify the model's errors according to their linguistic nature.
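The concatenation and overlapping-window step can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes that "concatenated" means the per-sentence vectors are joined into one continuous stream, that each letter's output is a tuple of four class indexes (one per diacritic group), and that the final window is right-padded with index 0. The names `concatenate`, `sliding_windows`, `PAD_INDEX`, and `PAD_LABELS`, and the window and stride values in the example, are hypothetical.

```python
from itertools import chain
from typing import List, Sequence, Tuple

# Hypothetical per-letter label: one class index per diacritic group
# (short vowel, double case ending, Shadda, Sukoon); 0 means "none".
Labels = Tuple[int, int, int, int]
PAD_INDEX = 0                      # assumed padding index for letters
PAD_LABELS: Labels = (0, 0, 0, 0)  # assumed padding labels


def concatenate(sentences: Sequence[Sequence]) -> list:
    """Join per-sentence vectors into one continuous stream."""
    return list(chain.from_iterable(sentences))


def sliding_windows(
    letters: Sequence[int],
    labels: Sequence[Labels],
    window: int,
    stride: int,
) -> Tuple[List[List[int]], List[List[Labels]]]:
    """Cut aligned letter/label streams into overlapping fixed-size windows.

    Consecutive windows overlap by (window - stride) positions; the last
    window is right-padded so every window has exactly `window` elements.
    """
    if len(letters) != len(labels):
        raise ValueError("letters and labels must be aligned")
    if not 0 < stride <= window:
        raise ValueError("require 0 < stride <= window")
    xs: List[List[int]] = []
    ys: List[List[Labels]] = []
    if not letters:
        return xs, ys
    start = 0
    while True:
        x = list(letters[start:start + window])
        y = list(labels[start:start + window])
        missing = window - len(x)
        xs.append(x + [PAD_INDEX] * missing)
        ys.append(y + [PAD_LABELS] * missing)
        if start + window >= len(letters):  # this window reached the end
            break
        start += stride
    return xs, ys


if __name__ == "__main__":
    # Two toy "sentences" of letter indexes and their matching label tuples.
    letters = concatenate([[5, 6, 7], [8, 9, 10, 11]])
    labels = concatenate([[(1, 0, 0, 0)] * 3, [(2, 0, 1, 0)] * 4])
    xs, ys = sliding_windows(letters, labels, window=4, stride=2)
    for x in xs:
        print(x)
```

With `stride < window`, every letter away from the stream edges appears in more than one window, which is one way to obtain the overlap the abstract describes; how predictions from overlapping windows are merged at evaluation time is not stated here and is left open.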

Sentiment Analysis in Poems in Misurata Sub-dialect

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY

10.24297/ijct.v21i.9105

2021

Vol 21

pp. 103-114

Author(s):

Azza Abugharsa

Keyword(s):

Sentiment Analysis

Language Processing

Figurative Language

Arabic Language

Support Vector

Standard Arabic

Learning Techniques

Vector Machines

Arabic Natural Language Processing

Modern Standard

Over recent decades, resources for Arabic natural language processing have grown and developed significantly. This includes work on Arabic Language Sentiment Analysis (ALSA) of Arabic utterances in both Modern Standard Arabic (MSA) and various Arabic dialects. This study focuses on detecting sentiment in poems written in the Misurata Arabic sub-dialect spoken in Misurata, Libya. The tools used to detect sentiment in the dataset are Sklearn and the Mazajak sentiment tool. Logistic Regression, Random Forest, Naive Bayes (NB), and Support Vector Machine (SVM) classifiers are used with Sklearn, while a Convolutional Neural Network (CNN) is applied through Mazajak. The results show that the traditional classifiers achieve higher accuracy than Mazajak, which is built on an algorithm that includes deep learning techniques. Further research on Arabic sub-dialect poetry is suggested to investigate the aspects that contribute to sentiment in these multi-line texts, for example, the use of figurative language such as metaphor.
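As a rough illustration of one of the traditional classifiers named above, the following is a minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing in standard-library Python. It is a stand-in for, not a copy of, the Sklearn classifier used in the study; it assumes the poems have already been tokenized, and the toy English tokens and the class names `pos` and `neg` are hypothetical placeholders for real Misurata-dialect data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class MultinomialNB:
    """Multinomial Naive Bayes over bags of tokens, with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: Sequence[Sequence[str]], labels: Sequence[str]) -> "MultinomialNB":
        self.class_docs = Counter(labels)  # number of documents per class
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total token count per class, used in the smoothed denominator.
        self.class_words = {c: sum(self.word_counts[c].values()) for c in self.class_docs}
        return self

    def predict_one(self, tokens: Sequence[str]) -> str:
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_docs.items():
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(n_c / n_docs)
            denom = self.class_words[c] + self.alpha * v
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, docs: Sequence[Sequence[str]]) -> List[str]:
        return [self.predict_one(d) for d in docs]


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not gold:
        raise ValueError("gold labels must not be empty")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Hypothetical toy training data standing in for tokenized poem lines.
    docs = [["good", "happy"], ["happy", "joy"], ["sad", "bad"], ["bad", "pain"]]
    labels = ["pos", "pos", "neg", "neg"]
    model = MultinomialNB().fit(docs, labels)
    print(model.predict([["happy"], ["bad", "pain"]]))
```

Working in log space avoids numeric underflow when a line contains many tokens; the other classifiers in the study (Logistic Regression, Random Forest, SVM) would be compared against the same `accuracy` measure.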

Arabic Natural Language Processing for Qur’anic Research: A Systematic Review

10.36227/techrxiv.14381891.v1 (also listed as 10.36227/techrxiv.14381891)

2021

Author(s):

Muhammad Huzaifa Bashir

Aqil M. Azmi

Haq Nawaz

Wajdi Zaghouani

Mona Diab

...

Keyword(s):

Systematic Review

Natural Language Processing

Natural Language

Language Processing

Arabic Natural Language Processing

A powerful comparison of deep learning frameworks for Arabic sentiment analysis

International Journal of Electrical and Computer Engineering (IJECE)

10.11591/ijece.v11i1.pp745-752

2021

Vol 11 (1)

pp. 745-752

Author(s):

Youssra Zahidi

Yacine El Younoussi

Yassine Al-Amrani

Keyword(s):

Deep Learning

Comparative Analysis

Sentiment Analysis

Programming Languages

Language Processing

Large Set

Language Familiarity

Arabic Natural Language Processing

Arabic Sentiment Analysis

Python Programming

Deep learning (DL) is a machine learning (ML) subdomain built on algorithms inspired by the structure and function of the brain, called artificial neural networks (ANNs). Recently, DL approaches have achieved major successes across various Arabic natural language processing (ANLP) tasks, especially in Arabic sentiment analysis (ASA). Researchers working on ASA can use various DL libraries in their projects, but they often do so without justifying their choice, or they choose a group of libraries based on their familiarity with a particular programming language. In this work, we focus on the Java and Python programming languages because they offer a large set of deep learning libraries that are very useful in the ASA domain. This paper presents a comparative analysis of different valuable Python and Java libraries to identify the most relevant and robust DL libraries for ASA. Through this comparative analysis, we find that the TensorFlow, Theano, and Keras Python frameworks are very popular and widely used in this research domain.

Different valuable tools for Arabic sentiment analysis: a comparative evaluation

International Journal of Electrical and Computer Engineering (IJECE)

10.11591/ijece.v11i1.pp753-762

2021

Vol 11 (1)

pp. 753-762

Author(s):

Youssra Zahidi

Yacine El Younoussi

Yassine Al-Amrani

Keyword(s):

Sentiment Analysis

Programming Languages

Language Processing

Comparative Evaluation

Research Work

Arabic Language

Arabic Natural Language Processing

Arabic Sentiment Analysis

Python Programming

Research Domain

Arabic natural language processing (ANLP) is a subfield of artificial intelligence (AI) that aims to build applications for the Arabic language, such as Arabic sentiment analysis (ASA), which classifies the feelings and emotions expressed in a text to determine the writer's attitude (neutral, negative, or positive). To work on ASA, researchers can use various tools in their research projects without explaining why, or they choose a set of libraries according to their knowledge of a specific programming language. Because both languages offer an abundance of libraries for ANLP, especially for ASA, our research work relies on Java and Python. This paper presents an in-depth comparative evaluation of different valuable Python and Java libraries to deduce the most useful ones for ASA. Based on a wide variety of influential works in the ASA domain, we deduce that the NLTK, Gensim, and TextBlob libraries are the most useful for ASA in Python. As for Java ASA libraries, we conclude that the Weka and CoreNLP tools are the most used and achieve strong results in this research domain.

An Arabic Dataset for Disease Named Entity Recognition with Multi-Annotation Schemes

Data

10.3390/data5030060

2020

Vol 5 (3)

pp. 60

Author(s):

Nasser Alshammari

Saad Alanazi

Keyword(s):

Language Processing

Named Entity Recognition

Entity Recognition

Definite Article

Linguistic Features

Annotation Scheme

Named Entity

Part Of Speech

Annotation Process

Arabic Natural Language Processing

This article outlines a novel data descriptor that provides the Arabic natural language processing community with a dataset dedicated to named entity recognition tasks for diseases. The dataset comprises more than 60 thousand words, which were annotated manually by two independent annotators using the inside–outside (IO) annotation scheme. To ensure the reliability of the annotation process, the inter-annotator agreement rate was calculated; it scored 95.14%. Because little research in the literature has been dedicated to studying Arabic multi-annotation schemes, a distinguishing and novel aspect of this dataset is the inclusion of six additional annotation schemes that will bridge this gap by allowing researchers to explore and compare the effects of these schemes on the performance of Arabic named entity recognizers. These annotation schemes are IOE, IOB, BIES, IOBES, IE, and BI. Additionally, five linguistic features, namely part-of-speech tags, stopwords, gazetteers, lexical markers, and the presence of the definite article, are provided for each record in the dataset.
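Because the additional schemes are derived from IO tags, a conversion can be sketched in standard-library Python. This minimal sketch is an assumption about how such a derivation could work, not the dataset authors' procedure: it treats each run of consecutive identical `I-` tags as one entity and uses the common definitions of IOB (IOB2-style, B on the first token of every entity), IOE (E on the last token), and IOBES (S for single-token entities). The entity type `DIS` is hypothetical. Plain IO cannot mark the boundary between two adjacent entities of the same type, so such entities are merged; BIES, IE, and BI are omitted because their exact definitions are not given here.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (first token index, last token index, entity type)


def io_spans(tags: List[str]) -> List[Span]:
    """Group runs of identical I-<TYPE> tags into entity spans."""
    spans: List[Span] = []
    i = 0
    while i < len(tags):
        tag = tags[i]
        if tag == "O":
            i += 1
            continue
        if not tag.startswith("I-"):
            raise ValueError(f"not an IO tag: {tag!r}")
        j = i
        while j + 1 < len(tags) and tags[j + 1] == tag:
            j += 1
        spans.append((i, j, tag[2:]))
        i = j + 1
    return spans


def convert_io(tags: List[str], scheme: str) -> List[str]:
    """Re-encode IO tags as IOB, IOE or IOBES (common definitions assumed)."""
    if scheme not in ("IOB", "IOE", "IOBES"):
        raise ValueError(f"unsupported scheme: {scheme}")
    out = ["O"] * len(tags)
    for start, end, etype in io_spans(tags):
        for k in range(start, end + 1):
            out[k] = f"I-{etype}"
        if scheme == "IOB":
            out[start] = f"B-{etype}"
        elif scheme == "IOE":
            out[end] = f"E-{etype}"
        elif start == end:  # IOBES, single-token entity
            out[start] = f"S-{etype}"
        else:  # IOBES, multi-token entity
            out[start] = f"B-{etype}"
            out[end] = f"E-{etype}"
    return out


if __name__ == "__main__":
    io = ["O", "I-DIS", "I-DIS", "O", "I-DIS"]
    for scheme in ("IOB", "IOE", "IOBES"):
        print(scheme, convert_io(io, scheme))
```

Deriving every scheme from the same entity spans keeps the variants consistent with the single manual IO annotation, which is what makes a comparison of schemes on recognizer performance meaningful.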

ARBML: Democritizing Arabic Natural Language Processing Tools

10.18653/v1/2020.nlposs-1.2

2020

Author(s):

Zaid Alyafeai

Maged Al-Shaibani

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Arabic Natural Language Processing

Novel Prototype for Handling Arabic Natural Language Processing: Smart Morphological Analyser

2019 Second International Conference on Artificial Intelligence for Industries (AI4I)

10.1109/ai4i46381.2019.00010

2019

Author(s):

Mohammed M. Abu Shquier

Keyword(s):

Natural Language Processing

Natural Language

Language Processing

Arabic Natural Language Processing

ASA: A framework for Arabic sentiment analysis

Journal of Information Science

10.1177/0165551519849516

2019

Vol 46 (4)

pp. 544-559

Cited By ~ 4

Author(s):

Ahmed Oussous

Fatima-Zahra Benjelloun

Ayoub Ait Lahcen

Samir Belfkih

Keyword(s):

Deep Learning

Sentiment Analysis

Language Processing

Opinion Mining

Short Term Memory

Research Area

Support Vector

Learning Models

Arabic Natural Language Processing

Arabic Sentiment Analysis

Sentiment analysis (SA), also known as opinion mining, is an increasingly important research area. In general, it automatically determines whether a text expresses a positive, negative, or neutral sentiment, which makes it possible to mine the huge and growing volume of opinions shared on social networks, review sites, and blogs. SA is used in many fields and for various languages, such as English and Arabic. However, Arabic is a highly inflectional and derivational language, which raises many challenges: SA of Arabic text must handle this complex morphology. To better address these challenges, we provide the research community and Arabic users with a new, efficient framework for Arabic Sentiment Analysis (ASA). Our primary goal is to improve ASA performance by exploiting deep learning while varying the preprocessing techniques. To that end, we implement and evaluate two deep learning models: a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The framework offers various preprocessing techniques for ASA, including stemming, normalisation, tokenization, and stop-word removal. As a result of this work, we first provide a new, rich, and publicly available Arabic corpus called the Moroccan Sentiment Analysis Corpus (MSAC). Second, the proposed framework improves ASA: the experimental results show that the deep learning models perform better for ASA than classical approaches (support vector machines, naive Bayes classifiers, and maximum entropy). They also show the key role of morphological features in Arabic natural language processing (NLP).
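The normalisation, tokenization, and stop-word steps listed above can be sketched as a small standard-library Python pipeline. The normalisation rules below (removing diacritics and tatweel, unifying alef variants, mapping ta marbuta to ha and alef maqsura to ya) are common choices in Arabic NLP but are assumptions, not the framework's documented rules; the three-word `STOP_WORDS` set is a hypothetical sample, and stemming is left out.

```python
import re
from typing import Iterable, List

# U+064B..U+0652: tanween, short vowels, Shadda and Sukoon.
DIACRITICS = re.compile("[\u064B-\u0652]")
# Alef with madda (U+0622), hamza above (U+0623), hamza below (U+0625).
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
TATWEEL = "\u0640"
TOKEN = re.compile(r"\w+")  # \w is Unicode-aware for str patterns

# Hypothetical stop-word sample ("fi", "min", "ala"); a real list is much larger.
STOP_WORDS = {"\u0641\u064a", "\u0645\u0646", "\u0639\u0644\u0649"}


def normalize(text: str) -> str:
    """Apply common (assumed) Arabic orthographic normalisation rules."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")   # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")   # alef maqsura -> ya
    return text


def tokenize(text: str) -> List[str]:
    """Split on runs of Unicode word characters."""
    return TOKEN.findall(text)


def preprocess(text: str, stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Normalise, tokenize, and drop stop words (normalised the same way)."""
    stops = {normalize(w) for w in stop_words}
    return [t for t in tokenize(normalize(text)) if t not in stops]


if __name__ == "__main__":
    # "al-kitabu fi" with a damma on the last letter of the first word.
    print(preprocess("\u0627\u0644\u0643\u062a\u0627\u0628\u064f \u0641\u064a"))
```

The stop-word list is normalised with the same function as the input text; otherwise a rule such as mapping alef maqsura to ya could silently stop a listed word from matching.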
