A survey of automatic Arabic diacritization techniques

2013 ◽  
Vol 21 (3) ◽  
pp. 477-495 ◽  
Author(s):  
AQIL M. AZMI ◽  
REHAM S. ALMAJED

Abstract: Modern Standard Arabic texts are typically written without diacritical markings. The diacritics are important for clarifying the sense and meaning of words, and their absence may lead to ambiguity even for native speakers. Native speakers can often disambiguate the meaning from context; however, many Arabic applications, such as machine translation, text-to-speech, and information retrieval, are vulnerable to the lack of diacritics. The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. In this paper we discuss the properties of the Arabic language and the issues related to the lack of diacritical marking. This is followed by a survey of recent algorithms developed to solve the diacritization problem. We also look into future trends for researchers working in this area.
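As a small illustration of the ambiguity described above, the following Python sketch removes the Arabic short-vowel marks (Unicode U+064B to U+0652) from two differently diacritized words and shows that they collapse to the same written form. The function name and the choice of example words (kataba, "he wrote", and kutub, "books") are assumptions made for illustration; they are not taken from the survey itself.

```python
import re

# Arabic diacritics (harakat): tanwin forms, fatha, damma, kasra, shadda, sukun.
HARAKAT = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Return the text with Arabic short-vowel marks removed (hypothetical helper)."""
    return HARAKAT.sub("", text)


# kataba ("he wrote"): kaf + fatha, ta + fatha, ba + fatha
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
# kutub ("books"): kaf + damma, ta + damma, ba
kutub = "\u0643\u064F\u062A\u064F\u0628"

# Both words share one undiacritized spelling, which is the ambiguity
# a diacritization system has to resolve from context.
print(strip_diacritics(kataba) == strip_diacritics(kutub))
```

Diacritization is the inverse of this mapping: given the bare form, a system must choose among the possible diacritized readings.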

2021 ◽  
Author(s):  
El Moatez Billah Nagoudi ◽  
AbdelRahim Elmadany ◽  
Muhammad Abdul-Mageed

2020 ◽  
Vol 10 (2) ◽  
pp. 93-100 ◽  
Author(s):  
Chamseddine Lamri ◽  
Amira Cherifi

Linguistic interference is a phenomenon that occurs when a learner's knowledge of their first language, or mother tongue, interferes with their knowledge of the language being learnt. This problem is recurrent among foreign language learners; a case in point is Algeria, where Modern Standard Arabic interferes with English in students' oral and written productions. Hence, learners produce stylistic errors because their knowledge of the foreign language is established incorrectly. Accordingly, this paper explores the types of linguistic interference errors made by pupils in their English writing at Bouazza Miloud high school in Tlemcen, Algeria. The quantitative and qualitative analysis of pupils' productions revealed the existence of syntactic, lexical and semantic errors. These findings underline the need for a detailed analysis to propose pedagogical solutions, such as using authentic materials and focusing on reading in order to write correctly and coherently.

Keywords: Modern Standard Arabic; language interference; negative transfer; interlingual errors; writing.


2020 ◽  
Vol 7 (4) ◽  
Author(s):  
Rashid Yahiaoui ◽  
Basema Alqumboz ◽  
Ashraf Fattah ◽  
Amer Al Adwan

Monsters Inc., an animated feature film produced by Pixar Animation Studios in 2001, received significant recognition worldwide. The film was nominated in 2002 for the ASCAP Film and Television Music Awards by the Box Office Films. Two dubbed versions of the film were later released with Arabic translations, one using Egyptian Vernacular, a spoken dialect, and the other using Modern Standard Arabic, which is used primarily in formal, written communication. This study examines humor in translation, and in particular irony as humor, which is a common technique in "Pixar plotting". The research investigates the strategies, types, and categories of irony as humor within the translations and how successfully those translations transmit the humorous meaning. To explore the problems of translating irony across languages and cultures, the research examines the shifts in translation between the two Arabic versions using an interdisciplinary theoretical approach encompassing humor studies, audiovisual translation studies, and descriptive translation studies. Furthermore, the research adopts Muecke's (1978) classification of irony markers to categorize and identify the strategies used in translating irony as humor. The study finds that the two Arabic versions use similar strategies at times and divergent ones at others, such as explication, substitution, omission or addition, with each succeeding or failing to varying degrees in transmitting meaning. The research suggests that translators' creativity, or lack thereof, and the language variant used are primarily responsible for the success or failure of transmitting irony as humor when dubbing into Arabic.


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6509
Author(s):  
Laith H. Baniata ◽  
Isaac K. E. Ampomah ◽  
Seyoung Park

Languages that allow free word order, such as Arabic dialects, pose significant difficulty for neural machine translation (NMT) because they contain many rare words that NMT systems translate poorly. Because NMT systems operate with a fixed-size vocabulary, out-of-vocabulary words are represented by Unknown Word (UNK) tokens. Rare words are encoded entirely as sequences of subword pieces using the Word-Piece Model. This paper introduces the first Transformer-based neural machine translation model for Arabic vernaculars that employs subword units. The proposed solution builds on the recently introduced Transformer model. The use of subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) improves the behavior of the encoder's multi-head attention sublayers by capturing the overall dependencies between the words of the input sentence in the Arabic vernacular. Experiments are carried out on Levantine Arabic vernacular (LEV) to Modern Standard Arabic (MSA), Maghrebi Arabic vernacular (MAG) to MSA, Gulf–MSA, Nile–MSA, and Iraqi Arabic (IRQ) to MSA translation tasks. Extensive experiments confirm that the proposed model adequately addresses the unknown-word issue and improves the quality of translation from Arabic vernaculars to MSA.
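To make the subword idea concrete, here is a minimal Python sketch of greedy longest-match-first word-piece splitting against a fixed vocabulary. It is not the authors' implementation: the function name, the toy English vocabulary, and the "##" continuation marker (a convention borrowed from BERT-style tokenizers) are all assumptions chosen for illustration.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces by greedy longest-match-first lookup.

    Pieces that continue a word carry a "##" prefix (assumed convention).
    If some position has no matching piece, the whole word becomes `unk`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real system learns tens of thousands of pieces.
toy_vocab = {"trans", "##form", "##er", "##s"}

# "transformers" is not in the vocabulary as a whole word,
# yet it can still be represented without an UNK token.
print(wordpiece_tokenize("transformers", toy_vocab))
# A word with no matching pieces still falls back to UNK.
print(wordpiece_tokenize("xyz", toy_vocab))
```

With a vocabulary shared between source and target, the same pieces can appear on both sides, which is one plausible reason a shared subword vocabulary helps when the dialect and MSA overlap lexically.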


2011 ◽  
Vol 2011 ◽  
pp. 1-15 ◽  
Author(s):  
Mourad Gridach ◽  
Noureddine Chenfour

We present an XML approach for producing an Arabic morphological database to be used in the morphological analysis of Modern Standard Arabic (MSA). Optimizing the production, maintenance, and extension of a morphological database is one of the crucial aspects impacting natural language processing (NLP). For Arabic, producing a morphological database is not an easy task, because the language has particularities such as agglutination and a high degree of morphological ambiguity. The method presented can be exploited by NLP applications such as syntactic analysis, semantic analysis, information retrieval, and orthographical correction.


1970 ◽  
Vol 15 ◽  
pp. 1-17
Author(s):  
Manuel Sartori

Drawing on surveys of the Internet, newspapers and novels written in Modern Standard Arabic, this article shows the existence of forms of negation in the future other than lan + subjunctive. It demonstrates that the so-called MSA grammar books are, once again, descriptively inadequate when confronted with the reality of the texts. While arguing for a renewal of the teaching of MSA grammar, this article shows that these forms are actually much older than they appear and proposes hypotheses about the conditions for their emergence. More specifically, the article proposes, following Larcher and on the basis of non-synonymy, to see in the coexistence of several forms of negation in the future a probable reorganization of the negation system in which, on logical and pragmatic grounds, a distinction is made between a descriptive negation on the one hand and a modal negation (= denial) on the other.

Keywords: Arabic linguistics, corpus, denial, descriptive negation, didactic, future tense, Ibn al-Ḥājib, Ibn Hišām al-ʾAnṣārī, Modern Standard Arabic (MSA), MSA grammar books, modal negation, Raḍī al-Dīn al-ʾAstarābāḏī, Zamaḫšarī

