Arabic Text
Recently Published Documents


TOTAL DOCUMENTS

1168
(FIVE YEARS 334)

H-INDEX

25
(FIVE YEARS 7)

Author(s):  
Sally Mohamed Ali El-Morsy ◽  
Mahmoud Hussein ◽  
Hamdy M. Mousa

Arabic is a Semitic language and one of the natural languages distinguished by the richness of its morphological inflection and derivation. This special and complex nature makes extracting information from Arabic text difficult and leaves constant room for improvement. Open information extraction (OIE) systems have emerged and been used for different languages, especially English, but they have rarely been applied to Arabic. Accordingly, this paper introduces an OIE system that extracts relation tuples from Arabic web text by exploiting Arabic dependency parsing and carefully considering all possible text relations. Treating clause-type propositions as extractable relations and using the constituents' grammatical functions, the identities of the corresponding clause types are established. The proposed system, named Arabic Open Information Extraction (AOIE), can extract highly scalable Arabic text relations while remaining domain independent. While many implementations handle the problem using supervised strategies, the proposed system relies on unsupervised extraction strategies. The system has also been applied in several domains to avoid restricting extraction to a specific field. The results show that the system achieves high efficiency in extracting clauses from large amounts of text.
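The core idea of tuple extraction from a dependency parse can be illustrated with a minimal sketch. This is not the authors' AOIE code: the token representation, the Universal Dependencies-style labels, and the example sentence (given here in English transliteration for readability) are all assumptions made for illustration.

```python
# Illustrative sketch of relation-tuple extraction from a dependency
# parse. Each token is (index, word, head_index, deprel); labels follow
# Universal Dependencies conventions. Hypothetical, simplified example.

def extract_tuple(parse):
    """Return a (subject, relation, object) tuple from a simple verbal clause."""
    root = next(t for t in parse if t[3] == "root")
    subj = next((t for t in parse if t[2] == root[0] and t[3] == "nsubj"), None)
    obj = next((t for t in parse if t[2] == root[0] and t[3] == "obj"), None)
    return (subj and subj[1], root[1], obj and obj[1])

# "Ahmad reads the book", rendered as an already-parsed token list.
parse = [
    (1, "Ahmad", 2, "nsubj"),
    (2, "reads", 0, "root"),
    (3, "the", 4, "det"),
    (4, "book", 2, "obj"),
]
print(extract_tuple(parse))  # -> ('Ahmad', 'reads', 'book')
```

A real system would, of course, obtain the parse from an Arabic dependency parser and handle many more clause types than this single verbal pattern.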


Author(s):  
Ali Fadel ◽  
Ibraheem Tuffaha ◽  
Mahmoud Al-Ayyoub

In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. the Feed-Forward Neural Network (FFNN) and the Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Fields (CRF), and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset, and the results show that our models match or outperform other models, including those that, unlike ours, require hand-crafted, language-dependent post-processing steps. Moreover, we show how diacritics in Arabic can be used to enhance models for downstream NLP tasks such as Machine Translation (MT) and Sentiment Analysis (SA) by proposing novel Translation over Diacritization (ToD) and Sentiment over Diacritization (SoD) approaches.
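Diacritization is commonly framed as sequence labeling: each base letter receives a diacritic label, which is what FFNN/RNN models of this kind predict. A minimal sketch of that framing (not the authors' code; the example word is chosen for illustration):

```python
# Sketch of the diacritization task framing: split a diacritized word
# into (base_letter, attached_diacritics) pairs, i.e. the per-character
# labels a sequence model would be trained to predict. Hypothetical code.

DIACRITICS = set("\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652")

def to_examples(diacritized):
    """Return [(base_char, diacritics_attached_to_it), ...]."""
    examples = []
    for ch in diacritized:
        if ch in DIACRITICS and examples:
            base, marks = examples[-1]
            examples[-1] = (base, marks + ch)
        else:
            examples.append((ch, ""))
    return examples

word = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, "he wrote"
pairs = to_examples(word)
print(pairs)  # three base letters, each labeled with a fatha
```

Stripping the labels from such pairs yields the undiacritized input; the model's job is to recover them.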


2022 ◽  
Vol 2022 ◽  
pp. 1-14
Author(s):  
Y.M. Wazery ◽  
Marwa E. Saleh ◽  
Abdullah Alharbi ◽  
Abdelmgeid A. Ali

Text summarization (TS) is considered one of the most difficult tasks in natural language processing (NLP), and it remains a challenge for modern computer systems despite their many recent improvements. Many papers and research studies address this task in the literature, but most of them concern extractive summarization; few address abstractive summarization, especially for the Arabic language because of its complexity. In this paper, an abstractive Arabic text summarization system based on a sequence-to-sequence model is proposed. This model works through two components, an encoder and a decoder. Our aim is to develop the sequence-to-sequence model using several deep artificial neural networks to investigate which of them achieves the best performance. Different layers of Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM) have been used to build the encoder and the decoder. In addition, the global attention mechanism has been used because it provides better results than the local attention mechanism. Furthermore, AraBERT preprocessing has been applied in the data preprocessing stage, which helps the model understand Arabic words and achieve state-of-the-art results. Moreover, a comparison between the skip-gram and continuous bag-of-words (CBOW) word2vec word embedding models has been made. We built these models using the Keras library and ran them on Google Colab Jupyter notebooks. Finally, the proposed system is evaluated with the ROUGE-1, ROUGE-2, ROUGE-L, and BLEU metrics. The experimental results show that three layers of BiLSTM hidden states at the encoder achieve the best performance, and our proposed system outperforms the other latest research studies. The results also show that abstractive summarization models using the skip-gram word2vec model outperform those using the CBOW word2vec model.
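ROUGE-1, one of the metrics the study reports, measures unigram overlap between a candidate summary and a reference. A hedged, simplified sketch (whitespace tokenization, no stemming or stopword handling, unlike full ROUGE implementations):

```python
# Simplified ROUGE-1 F1: unigram overlap between candidate and reference,
# with clipped counts. Illustrative only; real evaluations use a full
# ROUGE toolkit.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat", "the cat sat on the mat"))  # -> 0.666...
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern over bigrams and longest common subsequences, respectively.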


2022 ◽  
Vol 12 (1) ◽  
pp. 130-142
Author(s):  
Leen Al-Khalafat ◽  
Ahmad S. Haider

Translation is defined as transferring meaning and style from one language to another, taking the text producer's intended purpose and the audience's culture into account. This paper uses a 256,000-word Arabic-English parallel corpus of the speeches of King Abdullah II of Jordan from 1999 to 2015 to examine how some culture-bound expressions were translated from Arabic into English. To do so, two software packages were used, namely WordSmith 6 and Sketch Engine. Comparing the size of the Arabic corpus with its English counterpart using the wordlist tool of WordSmith 6, the researchers found that the number of words (tokens) in the English translation is greater than in the Arabic source text. However, the results showed that the Arabic text has more unique words, which means it has greater lexical density than its English counterpart. The researchers carried out a keyword analysis, comparing the Arabic corpus with the arTenTen corpus to identify the words that King Abdullah II saliently used in his speeches. Most of the keywords were culture-bound and related to the Jordanian context, which might make them challenging to render. Using the parallel concordance tool to compare the Arabic text with its English translation showed that the translator(s) mainly resorted to the strategies of deletion, addition, substitution, and transliteration. The researchers recommend that further studies be conducted using the same approach but on larger corpora of other genres, such as legal, religious, press, and scientific texts.
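The token/type comparison underlying the wordlist analysis can be sketched in a few lines. The two toy sentences below are invented for illustration and are not from the corpus; they merely mimic the reported pattern (more tokens in English, a higher type/token ratio in Arabic):

```python
# Sketch of a wordlist-style comparison: token count vs unique-word
# (type) count, with the type/token ratio as a rough lexical-density
# indicator. Toy sentences, hypothetical example.

def wordlist_stats(text):
    tokens = text.split()
    return len(tokens), len(set(tokens))

arabic_tokens, arabic_types = wordlist_stats(
    "ذهب الملك الى الشعب و خاطب الشعب")
english_tokens, english_types = wordlist_stats(
    "the king went to the people and he addressed the people")

# Arabic's fused morphology tends to pack more meaning per token,
# so its type/token ratio comes out higher here.
print(arabic_types / arabic_tokens, english_types / english_tokens)
```

Corpus tools like WordSmith apply the same idea at scale, with proper tokenization rather than whitespace splitting.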


2022 ◽  
Vol 40 (2) ◽  
pp. 421-440
Author(s):  
Suliman A. Alsuhibany ◽  
Meznah Alquraishi

2022 ◽  
Author(s):  
Ahmed H. Aliwy ◽  
Kadhim B. S. Aljanabi ◽  
Huda A. Alameen

2022 ◽  
Vol 31 (1) ◽  
pp. 523-537
Author(s):  
Suliman A. Alsuhibany ◽  
Hessah Abdulaziz Alhodathi
Keyword(s):  

2021 ◽  
Author(s):  
Khalil Boukthir ◽  
Abdulrahman M. Qahtani ◽  
Omar Almutiry ◽  
Habib Dhahri ◽  
Adel Alimi

- A novel approach is presented to reduce annotation effort, based on Deep Active Learning, for Arabic text detection in natural scene images.
- A new Arabic text image dataset (7k images), named TSVD, collected using the Google Street View service.
- A new semi-automatic method for generating natural scene text images from the streets.
- Training samples are reduced to 1/5 of the original training size on average.
- Much less training data is needed to achieve a better Dice index: 0.84.




Author(s):  
Houda Gaddour ◽  
Slim Kanoun ◽  
Nicole Vincent

Text in scene images can provide useful and vital information for content-based image analysis; therefore, text detection and script identification in images are important tasks. In this paper, we propose a new method for text detection in natural scene images, particularly for Arabic text, based on a bottom-up approach in which four principal steps can be highlighted. First, extremely stable and homogeneous regions of interest (ROIs) are detected using the proposed Color Stability and Homogeneity Regions (CSHR) technique. These regions are then labeled as textual or non-textual ROIs using a structural approach. Next, the textual ROIs are grouped into zones according to the spatial relations between them. Finally, the textual or non-textual nature of the constituted zones is refined, based on handcrafted features and on features learned by a Convolutional Neural Network (CNN). The proposed method was evaluated on databases used for text detection in natural scene images: those of the competitions organized in the 2017 edition of the International Conference on Document Analysis and Recognition (ICDAR2017), the Urdu-text database, and our Natural Scene Image Database for Arabic Text detection (NSIDAT). The experimental results obtained appear promising.

