A survey of automatic Arabic diacritization techniques

Abstract: Modern Standard Arabic texts are typically written without diacritical marks. Diacritics clarify the sense and meaning of words, and their absence can create ambiguity even for native speakers. Native speakers usually resolve the ambiguity from context; however, many Arabic applications, such as machine translation, text-to-speech, and information retrieval, suffer from the lack of diacritics. The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. In this paper we discuss the properties of the Arabic language and the issues related to missing diacritical marks, followed by a survey of recent algorithms developed to solve the diacritization problem. We also look at future trends for researchers working in this area.
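The ambiguity described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the surveyed paper: the function name `strip_diacritics` and the two example words are assumptions chosen for illustration, and only the common marks in the Unicode range U+064B to U+0652 (tanwin, short vowels, shadda, sukun) are handled. Removing the marks maps two different words onto the same undiacritized string, which is the information a diacritization system has to restore.

```python
import re

# Assumption: only the common Arabic marks U+064B..U+0652 are treated as diacritics.
DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritical marks, leaving only the base letters."""
    return DIACRITICS.sub("", text)

# Two illustrative words built from the letters kaf, teh, beh.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba", he wrote (fatha on each letter)
kutub = "\u0643\u064F\u062A\u064F\u0628"         # "kutub", books (damma on the first two letters)

# Both collapse to the same three-letter string once the marks are removed.
print(strip_diacritics(kataba) == strip_diacritics(kutub))
```

Diacritization is the inverse problem: given only the bare letters and the surrounding context, choose which marked form was intended.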


Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation

10.18653/v1/2021.calcs-1.8 ◽

2021 ◽

Author(s):

El Moatez Billah Nagoudi ◽

AbdelRahim Elmadany ◽

Muhammad Abdul-Mageed

Keyword(s):

Machine Translation ◽

Modern Standard Arabic ◽

Standard Arabic ◽

Modern Standard


Towards including prosody in a text-to-speech system for modern standard Arabic

Computer Speech & Language ◽

10.1016/j.csl.2007.06.004 ◽

2008 ◽

Vol 22 (1) ◽

pp. 84-103 ◽

Cited By ~ 7

Author(s):

Allan Ramsay ◽

Hanady Mansour

Keyword(s):

Text To Speech ◽

Modern Standard Arabic ◽

Standard Arabic ◽

Modern Standard


Modern standard Arabic interference in Algerian English as a foreign language students’ writings

Global Journal of Foreign Language Teaching ◽

10.18844/gjflt.v10i2.4664 ◽

2020 ◽

Vol 10 (2) ◽

pp. 93-100 ◽

Cited By ~ 1

Author(s):

Chamseddine Lamri ◽

Amira Cherifi

Keyword(s):

Foreign Language ◽

Mother Tongue ◽

Arabic Language ◽

Modern Standard Arabic ◽

Authentic Materials ◽

Standard Arabic ◽

Foreign Language Learners ◽

Linguistic Interference ◽

Semantic Errors ◽

Modern Standard

Linguistic interference is a phenomenon that occurs when a learner’s knowledge of their first language, or mother tongue, interferes with knowledge of the language being learnt. This problem is recurrent among foreign language learners. A case in point is Algeria, where Modern Standard Arabic interferes with English in students’ oral and written productions. Hence, stylistic errors are produced by the learners because their knowledge of the foreign language is established incorrectly. Accordingly, this paper explores the types of linguistic interference errors made by pupils in their English writing at Bouazza Miloud high school in Tlemcen, Algeria. The quantitative and qualitative analysis of pupils’ productions revealed the existence of syntactic, lexical and semantic errors. These findings underline the need for a detailed analysis to propose pedagogical solutions, such as using authentic materials and focusing on reading, so that learners write correctly and coherently. Keywords: Modern Standard Arabic; language interference; negative transfer; interlingual errors; writing.


Borrowing as one of the Efficient Ways to Widen the Vocabulary of Military Terminology of Modern Standard Arabic Language

The Oriental Studies ◽

10.15407/skhodoznavstvo2013.62-63.142 ◽

2013 ◽

Vol 2013 (62-63) ◽

pp. 142-150

Author(s):

R.V. Statsyuk

Keyword(s):

Arabic Language ◽

Modern Standard Arabic ◽

Standard Arabic ◽

Modern Standard


Translating irony into Arabic – who’s having the last laugh? Dubbing Monsters Inc.: Egyptian vernacular vs. modern standard Arabic

European Journal of Humour Research ◽

10.7592/ejhr2019.7.4.yahiaoui ◽

2020 ◽

Vol 7 (4) ◽

Author(s):

Rashid Yahiaoui ◽

Basema Alqumboz ◽

Ashraf Fattah ◽

Amer Al Adwan

Keyword(s):

Translation Studies ◽

Arabic Language ◽

Modern Standard Arabic ◽

Feature Film ◽

Standard Arabic ◽

Film And Television ◽

Box Office ◽

Humor Studies ◽

Common Technique ◽

Modern Standard

Monsters Inc., an animated feature film produced by Pixar Animation Studios in 2001, received significant recognition worldwide. The film was nominated in 2002 for the ASCAP Film and Television Music Awards by the Box Office Films. Two dubbed versions of the film were later released with Arabic translations using Egyptian Vernacular, a spoken dialect, and Modern Standard Arabic, used primarily in formal, written communications. This study examines humor in translation and irony as humor which represents a common technique in “Pixar plotting”. The research investigates the strategies, types, and categories of irony as humor within the translations and the success of those translations at accurately transmitting the humorous meaning. Towards exploring the problems of translating irony across languages and cultures, this research examines the shifts in translations between the two Arabic language versions using an interdisciplinary theoretical approach encompassing humor studies, audiovisual translation studies, and descriptive translation studies. Furthermore, the research adopts Muecke’s (1978) classification of irony markers to categorize and identify the strategies used in translating irony as humor. The study finds that the two different versions of Arabic utilize similar strategies at times and divergent ones at others, such as explication, substitution, omission or addition, in translating irony as humor with each succeeding/failing at varied levels of meaning transmission. The research suggests translators’ creativity, or lack thereof, and the language variant used are primarily responsible for the success or failure of transmitting irony as humor for dubbing into Arabic.


A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units

Sensors ◽

10.3390/s21196509 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6509

Author(s):

Laith H. Baniata ◽

Isaac K. E. Ampomah ◽

Seyoung Park

Keyword(s):

Machine Translation ◽

Target Language ◽

Modern Standard Arabic ◽

Neural Machine Translation ◽

Unknown Word ◽

Translation Model ◽

Standard Arabic ◽

Arabic Dialects ◽

Transformer Model ◽

Modern Standard

Languages that allow free word order, such as Arabic dialects, pose significant difficulty for neural machine translation (NMT) because they contain many rare words that NMT systems translate poorly. Unknown Word (UNK) tokens represent out-of-vocabulary words, because NMT systems operate with a fixed-size vocabulary. Rare words are encoded entirely as sequences of subword pieces using the Word-Piece Model. This research paper introduces the first Transformer-based neural machine translation model for Arabic vernaculars that employs subword units. The proposed solution is based on the recently introduced Transformer model. The use of subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) enhances the behavior of the encoder’s multi-head attention sublayers by capturing the overall dependencies between words of the input sentence in the Arabic vernacular. Experiments are carried out on Levantine Arabic vernacular (LEV) to Modern Standard Arabic (MSA), Maghrebi Arabic vernacular (MAG) to MSA, Gulf–MSA, Nile–MSA, and Iraqi Arabic (IRQ) to MSA translation tasks. Extensive experiments confirm that the proposed model adequately addresses the unknown word issue and improves the quality of translation from Arabic vernaculars to MSA.
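The idea of encoding rare words as sequences of subword pieces can be sketched in Python as greedy longest-match segmentation, the inference procedure commonly associated with WordPiece-style vocabularies. This is an illustrative sketch, not the model from the abstract: the function name `wordpiece_segment`, the toy Latin-script vocabulary, the `##` continuation prefix, and the `[UNK]` fallback token are all assumptions.

```python
def wordpiece_segment(word, vocab, unk="[UNK]"):
    """Greedily split a word into the longest vocabulary pieces, left to right.

    Pieces that do not start the word carry a '##' prefix (an assumed convention).
    If some position cannot be matched at all, the whole word maps to `unk`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical toy vocabulary; a real one would be learned from training data.
toy_vocab = {"trans", "##form", "##er", "##s"}

print(wordpiece_segment("transformers", toy_vocab))
print(wordpiece_segment("xyz", toy_vocab))
```

With a fixed-size vocabulary of whole words, "transformers" would become a single unknown token if it were not listed. With subword pieces, it is represented by pieces the vocabulary does contain, and only strings with no matching pieces fall back to the unknown token.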


Problems in Translation of Military-Technical Terminology of Modern Standard Arabic Language

The Oriental Studies ◽

10.15407/skhodoznavstvo2014.68.141 ◽

2014 ◽

Vol 2014 (68) ◽

pp. 141-152

Author(s):

R.V. Statsyuk

Keyword(s):

Arabic Language ◽

Modern Standard Arabic ◽

Standard Arabic ◽

Modern Standard


Neural Machine Translation from Jordanian Dialect to Modern Standard Arabic

2020 11th International Conference on Information and Communication Systems (ICICS) ◽

10.1109/icics49469.2020.239505 ◽

2020 ◽

Author(s):

Roqayah Al-Ibrahim ◽

Rehab M. Duwairi

Keyword(s):

Machine Translation ◽

Modern Standard Arabic ◽

Neural Machine Translation ◽

Standard Arabic ◽

Modern Standard


An XML Approach of Coding a Morphological Database for Arabic Language

Advances in Human-Computer Interaction ◽

10.1155/2011/629305 ◽

2011 ◽

Vol 2011 ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Mourad Gridach ◽

Noureddine Chenfour

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Morphological Analysis ◽

Semantic Analysis ◽

Arabic Language ◽

Syntactic Analysis ◽

Standard Arabic ◽

Modern Standard

We present an XML approach for producing a morphological database for the Arabic language, to be used in morphological analysis of Modern Standard Arabic (MSA). Optimizing the production, maintenance, and extension of a morphological database is one of the crucial aspects impacting natural language processing (NLP). For Arabic, producing a morphological database is not an easy task, because the language has particularities such as agglutination and a high degree of morphological ambiguity. The method presented can be exploited by NLP applications such as syntactic analysis, semantic analysis, information retrieval, and orthographical correction.


"Sawfa lā/lan yafʿal-" et "lā/lan sawfa yafʿal-": Étude de cas sur corpus pour une grammaire didactique et renouvelée de l’arabe moderne

Journal of Arabic and Islamic Studies ◽

10.5617/jais.4655 ◽

1970 ◽

Vol 15 ◽

pp. 1-17

Author(s):

Manuel Sartori

Keyword(s):

The Internet ◽

Future Tense ◽

Modern Standard Arabic ◽

Standard Arabic ◽

Arabic Linguistics ◽

Etude De Cas ◽

The Future ◽

Étude De Cas ◽

The Difference ◽

Modern Standard

From surveys made on the Internet, in newspapers and in novels written in Modern Standard Arabic, this article shows the existence of forms of negation in the future other than lan + subjunctive. It demonstrates that the so-called MSA grammar books are, once again, descriptively inadequate when facing the reality of the texts. While arguing for a renewal of the teaching of MSA grammar, this article shows that these forms are actually much older than they appear and proposes hypotheses to analyze the conditions for their emergence. More specifically, the article proposes, following Larcher and on the basis of non-synonymy, to see in the coexistence of several forms of negation in the future a probable reorganization of the negation system in which, on logical and pragmatic bases, a distinction would be made between a descriptive negation on the one hand and a modal negation (= denial) on the other. Keywords: Arabic linguistics, corpus, denial, descriptive negation, didactic, future tense, Ibn al-Ḥājib, Ibn Hišām al-ʾAnṣārī, Modern Standard Arabic (MSA), MSA grammar books, modal negation, Raḍī al-Dīn al-ʾAstarābāḏī, Zamaḫšarī
