A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units

Sensors, 2021, Vol. 21 (19), pp. 6509
Author(s): Laith H. Baniata, Isaac K. E. Ampomah, Seyoung Park

Languages that allow free word order, such as Arabic dialects, are particularly difficult for neural machine translation (NMT) because they contain many rare words that NMT systems handle poorly. Since NMT systems operate with a fixed-size vocabulary, out-of-vocabulary words are represented by Unknown Word (UNK) tokens. In this work, rare words are encoded entirely as sequences of subword pieces using the Word-Piece Model. This paper introduces the first Transformer-based neural machine translation model for Arabic dialects that employs subword units. The proposed solution builds on the recently introduced Transformer model. Using subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) improves the behavior of the encoder's multi-head attention sublayers, which capture the dependencies between the words of the input dialect sentence. Experiments are carried out on Levantine Arabic (LEV) to Modern Standard Arabic (MSA), Maghrebi Arabic (MAG) to MSA, Gulf Arabic to MSA, Nile Arabic to MSA, and Iraqi Arabic (IRQ) to MSA translation tasks. Extensive experiments confirm that the suggested model adequately addresses the unknown-word problem and improves translation quality from Arabic dialects to MSA.
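As a concrete illustration of the subword idea, the sketch below trains a WordPiece vocabulary on a single file holding both dialect and MSA text, so the two sides share one vocabulary. The file name, vocabulary size, and special tokens are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of WordPiece segmentation with a shared source/target
# vocabulary, using the HuggingFace "tokenizers" library.
from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordPieceTrainer

tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Training on the concatenation of dialect (source) and MSA (target) text
# yields one vocabulary shared by both sides.
trainer = WordPieceTrainer(vocab_size=8000,
                           special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
tokenizer.train(files=["dialect_plus_msa_corpus.txt"], trainer=trainer)  # hypothetical file

# A word missing from the vocabulary is no longer a single UNK token:
# it is decomposed into known subword pieces instead.
print(tokenizer.encode("example sentence").tokens)
```

With one shared vocabulary, identical subword pieces on the source and target side map to the same IDs, which is what allows the embedding tables of encoder and decoder to be tied.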

2018, Vol. 8 (12), pp. 2502
Author(s): Laith H. Baniata, Seyoung Park, Seong-Bae Park

Statistical machine translation for the Arabic language integrates external linguistic resources such as part-of-speech (POS) tags. This research presents a Bidirectional Long Short-Term Memory (Bi-LSTM) Conditional Random Field (CRF) segment-level POS tagger for Arabic dialects, which is integrated into a multitask neural machine translation (NMT) model. The proposed NMT solution builds on the recently introduced recurrent neural network encoder-decoder model. The study proposes and develops a unified multitask NMT model that shares an encoder between two tasks: the Arabic Dialect (AD) to Modern Standard Arabic (MSA) translation task and the segment-level POS tagging task. A shared layer and an invariant layer are also shared between the translation tasks. By training the translation and POS tagging tasks alternately, the proposed model can leverage the characteristic information of each task and improve translation quality from Arabic dialects to MSA. Experiments are conducted on Levantine Arabic (LA) to MSA and Maghrebi Arabic (MA) to MSA translation tasks, with segment-level POS tags for Arabic dialects exploited as an additional linguistic resource. The results suggest that the multitask learning approach improves both translation quality and POS tagger performance.
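A minimal PyTorch sketch of the shared-encoder setup follows. For brevity a per-token softmax head stands in for the paper's CRF layer, a single projection stands in for the full attentional decoder, and all dimensions are placeholders.

```python
# One Bi-LSTM encoder shared between translation and segment-level POS
# tagging, trained by alternating the two tasks batch by batch.
import torch
import torch.nn as nn

class SharedBiLSTMEncoder(nn.Module):
    def __init__(self, vocab, emb=128, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)

    def forward(self, src):                      # src: (batch, time)
        out, _ = self.bilstm(self.embed(src))
        return out                               # (batch, time, 2*hid)

class MultiTaskNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, n_tags, hid=128):
        super().__init__()
        self.encoder = SharedBiLSTMEncoder(src_vocab, hid=hid)
        self.pos_head = nn.Linear(2 * hid, n_tags)    # POS tagging task
        self.mt_head = nn.Linear(2 * hid, tgt_vocab)  # translation task (simplified)

    def forward(self, src, task):
        h = self.encoder(src)
        return self.pos_head(h) if task == "pos" else self.mt_head(h)

model = MultiTaskNMT(src_vocab=8000, tgt_vocab=8000, n_tags=24)
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

# Alternate the tasks so each gradient step specializes one head while
# every step updates the shared encoder.
for step, task in enumerate(["mt", "pos"] * 2):
    src = torch.randint(0, 8000, (4, 10))        # dummy batch
    gold = torch.randint(0, 8000 if task == "mt" else 24, (4, 10))
    logits = model(src, task)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), gold.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```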


2018, Vol. 2018, pp. 1-10
Author(s): Laith H. Baniata, Seyoung Park, Seong-Bae Park

In this research article, we study the problem of translating Arabic dialects to Modern Standard Arabic with a neural machine translation model. The proposed solution is inspired by the recently proposed recurrent neural network-based encoder-decoder model, which casts machine translation as a sequence learning problem. We propose a multitask learning (MTL) model that shares one decoder among all language pairs, while every source language has a separate encoder. The proposed model can be applied to limited volumes of data as well as extensive amounts of data. Experiments show that the proposed MTL model achieves higher translation quality than individually learned models.
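The layout described here, one encoder per source dialect feeding a decoder shared across language pairs, can be sketched as below. The dialect keys, the GRU encoders, and the reduction of the decoder to a projection are illustrative simplifications, not the paper's exact configuration.

```python
# Many encoders, one decoder: every source dialect gets its own encoder
# while the output side is reused across all language pairs.
import torch
import torch.nn as nn

class MultiSourceMTL(nn.Module):
    def __init__(self, vocab, dialects=("LEV", "MAG"), emb=128, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoders = nn.ModuleDict({
            d: nn.GRU(emb, hid, batch_first=True) for d in dialects
        })
        self.shared_decoder = nn.Linear(hid, vocab)   # one decoder for all pairs

    def forward(self, src, dialect):
        out, _ = self.encoders[dialect](self.embed(src))  # route by source language
        return self.shared_decoder(out)

model = MultiSourceMTL(vocab=8000)
logits = model(torch.randint(0, 8000, (2, 7)), dialect="LEV")
print(logits.shape)  # torch.Size([2, 7, 8000])
```

Because the decoder always produces MSA, sharing it lets the low-resource pairs benefit from target-side statistics learned on every pair.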


2021
Author(s): El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed

Author(s): Long Zhou, Jiajun Zhang, Chengqing Zong

Existing approaches to neural machine translation (NMT) generate the target-language sequence token by token, from left to right. This unidirectional decoding framework cannot make full use of the target-side future context that a right-to-left decoding pass would produce, and it therefore suffers from unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) model that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. We then present an interactive decoding model in which left-to-right (right-to-left) generation depends not only on its own previously generated outputs but also on the future context predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on the large-scale NIST Chinese-English, WMT14 English-German, and WMT18 Russian-English translation tasks. Experimental results demonstrate that our model improves over the strong Transformer baseline by 3.92, 1.49, and 1.04 BLEU points, respectively, and achieves state-of-the-art performance on the Chinese-English and English-German translation tasks.
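The interactive idea can be sketched as follows: both directions advance in lockstep, and each step attends to its own history and to what the opposite direction has generated so far. The module layout and the weight sharing between directions are an illustrative reading, not the paper's exact architecture.

```python
# Toy sketch of synchronous bidirectional decoding: two passes generate
# simultaneously, each peeking at the other's partial output.
import torch
import torch.nn as nn

class InteractiveStep(nn.Module):
    def __init__(self, d=64, heads=4, vocab=1000):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross_dir_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, own_hist, other_hist):
        h, _ = self.self_attn(own_hist, own_hist, own_hist)    # own history
        f, _ = self.cross_dir_attn(h, other_hist, other_hist)  # "future" info from the other pass
        return self.out((h + f)[:, -1])                        # next-token logits

d, vocab = 64, 1000
embed = nn.Embedding(vocab, d)
step = InteractiveStep(d, vocab=vocab)                 # weights shared across directions here
l2r = [torch.zeros(1, dtype=torch.long)]               # left-to-right prefix
r2l = [torch.zeros(1, dtype=torch.long)]               # right-to-left prefix

for _ in range(5):                                     # both directions advance in lockstep
    hl = embed(torch.stack(l2r, dim=1))
    hr = embed(torch.stack(r2l, dim=1))
    l2r.append(step(hl, hr).argmax(-1))
    r2l.append(step(hr, hl).argmax(-1))
```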


2013, Vol. 21 (3), pp. 477-495
Author(s): Aqil M. Azmi, Reham S. Almajed

In Modern Standard Arabic, texts are typically written without diacritical markings. The diacritics are important for clarifying the sense and meaning of words, and their absence may lead to ambiguity even for native speakers. Native readers usually disambiguate meaning through context; however, many Arabic applications, such as machine translation, text-to-speech, and information retrieval, are vulnerable to the lack of diacritics. The process of automatically restoring diacritical marks is called diacritization, or diacritic restoration. In this paper we discuss the properties of the Arabic language and the issues related to the missing diacritical markings, followed by a survey of recent algorithms developed to solve the diacritization problem. We also look at future trends for researchers working in this area.
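A small worked example makes the ambiguity concrete. The candidate readings below are a well-known textbook case, hand-picked for illustration rather than drawn from this survey's data.

```python
# Why missing diacritics are ambiguous: one bare form maps to several
# fully diacritized readings, and only context can decide between them.
candidates = {
    "علم": ["عِلْم (knowledge)", "عَلَم (flag)", "عَلِمَ (he knew)", "عَلَّمَ (he taught)"],
}

for reading in candidates["علم"]:
    print(reading)

# A diacritization system must pick one reading per word, typically by
# scoring each candidate against the surrounding sentence context.
```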


Author(s): Isaac Kojo Essel Ampomah, Sally McClean, Glenn Hawe

Self-attention-based encoder-decoder frameworks have drawn increasing attention in recent years. The self-attention mechanism generates contextual representations by attending to all tokens in the sentence. Despite improvements in performance, recent research argues that self-attention tends to concentrate on the global context and places less emphasis on the contextual information available within the local neighbourhood of each token. This work presents the Dual Contextual (DC) module, an extension of the conventional self-attention unit, which effectively leverages both local and global contextual information. The goal is to further improve the sentence-representation ability of the encoder and decoder subnetworks, thus enhancing the overall performance of the translation model. Experimental results on the WMT'14 English-German (En→De) and eight IWSLT translation tasks show that the DC module can further improve the translation performance of the Transformer model.
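Without reproducing the DC module's exact design, one plausible sketch of fusing local and global context is shown below: full self-attention supplies the global view, a banded mask restricts a second attention to a small token window, and a learned gate mixes the two. The gating scheme and window size are assumptions of this sketch.

```python
# Global attention plus window-masked local attention, fused by a gate.
import torch
import torch.nn as nn

def local_band_mask(T, w):
    idx = torch.arange(T)
    return (idx[None, :] - idx[:, None]).abs() > w   # True = position blocked

class DualContextualAttention(nn.Module):
    def __init__(self, d=256, heads=4, window=3):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.gate = nn.Linear(2 * d, d)
        self.window = window

    def forward(self, x):                            # x: (batch, time, d)
        g, _ = self.global_attn(x, x, x)             # global context
        mask = local_band_mask(x.size(1), self.window).to(x.device)
        l, _ = self.local_attn(x, x, x, attn_mask=mask)  # local neighbourhood only
        z = torch.sigmoid(self.gate(torch.cat([g, l], dim=-1)))
        return z * g + (1 - z) * l                   # gated fusion of the two views

x = torch.randn(2, 9, 256)
print(DualContextualAttention()(x).shape)            # torch.Size([2, 9, 256])
```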


2020, Vol. 2020, pp. 1-10
Author(s): Thien Nguyen, Hoai Le, Van-Huy Pham

End-to-end neural machine translation does not require specialized knowledge of the investigated language pair to build an effective system. On the other hand, feature engineering has proven vital in other artificial intelligence fields, such as speech recognition and computer vision. Inspired by work in those fields, in this paper we propose a novel feature-based translation model that modifies the state-of-the-art Transformer model. Specifically, the encoder of the modified Transformer takes as input combinations of linguistic features, comprising the lemma, dependency label, part-of-speech tag, and morphological label, instead of source words. Experimental results on the Russian-Vietnamese language pair show that the proposed feature-based Transformer improves over the strongest baseline Transformer translation model by an impressive 4.83 BLEU. In addition, analysis of the experiments reveals that human judgments of the translation results strongly confirm the automatic evaluation. Our model could be useful for building systems that translate from a highly inflectional language into a non-inflectional language.
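The feature-based input can be sketched as embeddings over the four feature vocabularies combined into a single encoder input. Summation is one reasonable combination used in this sketch, and all vocabulary sizes are placeholders; the paper's exact combination scheme may differ.

```python
# Encoder input built from linguistic features instead of surface words:
# each token contributes a lemma, POS tag, dependency label, and
# morphological label, embedded separately and summed.
import torch
import torch.nn as nn

class LinguisticFeatureEmbedding(nn.Module):
    def __init__(self, n_lemma, n_pos, n_dep, n_morph, d=512):
        super().__init__()
        self.lemma = nn.Embedding(n_lemma, d)
        self.pos = nn.Embedding(n_pos, d)
        self.dep = nn.Embedding(n_dep, d)
        self.morph = nn.Embedding(n_morph, d)

    def forward(self, lemma_ids, pos_ids, dep_ids, morph_ids):
        return (self.lemma(lemma_ids) + self.pos(pos_ids)
                + self.dep(dep_ids) + self.morph(morph_ids))

feat = LinguisticFeatureEmbedding(n_lemma=20000, n_pos=20, n_dep=40, n_morph=300)
ids = lambda n: torch.randint(0, n, (2, 6))          # dummy feature IDs
x = feat(ids(20000), ids(20), ids(40), ids(300))     # (2, 6, 512)
print(x.shape)                                       # feeds a standard Transformer encoder
```

Replacing inflected surface forms with lemma-plus-morphology features shrinks the effective source vocabulary, which is why the approach helps most when translating out of a highly inflectional language.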

