Text Classification of Arabic Text: Deep Learning in ANLP

<p>A huge amount of Arabic text is available online, and it needs to be organized. As a result, many natural language processing (NLP) applications are concerned with text organization. One of them is text classification (TC). TC makes unorganized text easier to handle by assigning it to suitable classes or labels. This paper is a survey of Arabic text classification. It also compares different methods for classifying Arabic texts, where Arabic is regarded as a complex language because of its vocabulary. Arabic is one of the richest languages in the world and has many linguistic bases. Research on Arabic language processing is scarce compared with English. As a result, these problems pose challenges for the classification and organization of Arabic text. TC helps users access documents or information that has already been assigned to one or more classes or categories. In addition, classifying documents helps search engines reduce the number of documents to consider, which makes searching and matching against queries easier.</p>


A systematic review of text classification research based on deep learning models in Arabic language

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i6.pp6629-6643 ◽

2020 ◽

Vol 10 (6) ◽

pp. 6629

Author(s):

Ahlam Wahdan ◽

Sendeyah AL Hantoobi ◽

Said A. Salloum ◽

Khaled Shaalan

Keyword(s):

Neural Network ◽

Systematic Review ◽

Neural Networks ◽

Deep Learning ◽

Text Classification ◽

Arabic Language ◽

Machine Learning Techniques ◽

Learning Models ◽

Learning Techniques

Classifying or categorizing texts is the process by which documents are classified into groups by subject, title, author, etc. This paper undertakes a systematic review of the latest research in the field of the classification of Arabic texts. Several machine learning techniques can be used for text classification, but we have focused only on the recent trend of neural network algorithms. In this paper, the concept of classifying texts and the classification process are reviewed. Deep learning techniques for classification and their types are discussed as well. Neural networks of various types, namely RNN, CNN, FFNN, and LSTM, are identified as the subject of study. Through systematic study, 12 research papers related to the classification of Arabic texts using neural networks are obtained; for each paper, the methodology and the accuracy ratio for each type of neural network are determined. The evaluation criteria used for the different neural network types, and how they play a large role in the highly accurate classification of Arabic texts, are discussed. Our results provide some findings regarding how deep learning models can be used to improve text classification research in the Arabic language.


Technical Domain Classification of Bangla Text using BERT

Proceedings of Intelligent Computing and Technologies Conference ◽

10.21467/proceedings.115.16 ◽

2021 ◽

Author(s):

Koyel Ghosh ◽

Apurbalal Senapati

Keyword(s):

Deep Learning ◽

Computer Science ◽

Communication Technology ◽

Text Classification ◽

Language Model ◽

Learning Model ◽

Coarse Grained ◽

Deep Learning Model

Coarse-grained tasks are primarily based on text classification, one of the earliest problems in NLP, and are performed at the document and sentence levels. Here, our goal is to identify the technical domain of a given Bangla text. In coarse-grained technical domain classification, a piece of Bangla text provides information about a specific technical domain such as Biochemistry (bioche), Communication Technology (com-tech), Computer Science (cse), Management (mgmt), Physics (phy), etc. This paper uses a recent deep learning model, Bangla BERT (Bidirectional Encoder Representations from Transformers), to identify the domain of a given text. Bangla BERT (Bangla-Bert-Base) is a pretrained language model for the Bangla language. Later, we discuss the accuracy of Bangla BERT and compare it with other models that solve the same problem.
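As a rough illustration of the accuracy comparison the abstract mentions, the following is a minimal sketch and not the authors' evaluation code. The short domain labels come from the abstract; the function names, the two "models", and the toy predictions are hypothetical and invented only to show the calculation.

```python
from collections import Counter

# Short domain labels as listed in the abstract.
DOMAINS = ["bioche", "com-tech", "cse", "mgmt", "phy"]


def accuracy(gold, predicted):
    """Fraction of texts whose predicted domain equals the gold domain."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def per_domain_accuracy(gold, predicted):
    """Accuracy per domain, skipping domains absent from the gold labels."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {d: hits[d] / totals[d] for d in DOMAINS if totals[d]}


# Hypothetical toy data (assumption): gold labels and two models' outputs.
gold = ["cse", "phy", "mgmt", "cse", "bioche"]
model_a = ["cse", "phy", "cse", "cse", "bioche"]
model_b = ["cse", "mgmt", "cse", "phy", "bioche"]

if __name__ == "__main__":
    print("model_a accuracy:", accuracy(gold, model_a))
    print("model_b accuracy:", accuracy(gold, model_b))
    print("model_a per domain:", per_domain_accuracy(gold, model_a))
```

Reporting per-domain accuracy next to the overall figure can show whether a model's errors concentrate in a single domain, which a single overall number would hide.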
