Hybrid Arabic-English Machine Translation to Solve Reordering and Ambiguity Problems

2015
Vol 1 (4)
pp. 413
Author(s):  
Khalid Shaker Alubaidi

The problem in Arabic-to-English rule-based machine translation is that the rule-based lexical analyser leaves some residual ambiguity; a statistical approach is therefore used to resolve it. Rule-Based Machine Translation (RBMT) generally uses linguistic rules between two languages that are built manually by humans, whereas Statistical Machine Translation (SMT) uses statistics on word occurrences in parallel corpora. In this paper, these two approaches are combined into an Arabic-English Hybrid Machine Translation (HMT) system to exploit both kinds of information. First, the Arabic text is fed into the RBMT component to solve the reordering problem. Then the output is edited by the SMT component to solve the ambiguity problem and generate the final English translation. SMT can do this because, during training, it uses the RBMT output (English) as the source side and the reference translations (English) as the target side. The results show that the translation quality of the HMT system is better than that of the SMT system.
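The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the single VSO-to-SVO reordering rule, and the tiny probability lexicon are all invented for the example.

```python
# Toy two-stage hybrid pipeline: rule-based reordering, then
# statistical disambiguation (all names and data are illustrative).

def rbmt_reorder(arabic_tokens):
    """Toy rule-based stage: apply a VSO -> SVO reordering rule.

    Arabic main clauses are often verb-initial; swap verb and subject."""
    if len(arabic_tokens) >= 2:
        return [arabic_tokens[1], arabic_tokens[0]] + arabic_tokens[2:]
    return arabic_tokens

def smt_disambiguate(tokens, lexicon):
    """Toy statistical stage: pick the most probable English candidate
    for each (possibly ambiguous) intermediate token."""
    return [max(lexicon[t], key=lexicon[t].get) if t in lexicon else t
            for t in tokens]

# Tiny illustrative lexicon: token -> {candidate translation: probability}
lexicon = {
    "ktb": {"wrote": 0.7, "books": 0.3},
    "alwld": {"the-boy": 0.9, "the-child": 0.1},
}

intermediate = rbmt_reorder(["ktb", "alwld", "rsalt"])   # reordering stage
translation = smt_disambiguate(intermediate, lexicon)    # ambiguity stage
```

The point of the sketch is the division of labour: word order is fixed by an explicit rule before any statistics are consulted, and lexical choice is left to corpus-derived probabilities.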

2000
Vol 5 (2)
pp. 199-230
Author(s):  
Oliver Streiter
Leonid L. Iomdin

The research described in this paper is rooted in efforts to combine the advantages of corpus-based and rule-based MT approaches in order to improve the performance of MT systems, most importantly the quality of translation. The authors review ongoing activities in the field and present a case study showing how translation knowledge can be drawn from parallel corpora and compiled into the lexicon of a rule-based MT system. These data are obtained with the help of three procedures: (1) identification of hitherto unknown one-word translations, (2) statistical rating of the known one-word translations, and (3) extraction of new translations of multiword expressions (MWEs), followed by compilation steps that create new rules for the MT engine. As a result, the lexicon is enriched with translation equivalents attested in different subject domains, which facilitates the tuning of the MT system to a specific subject domain and improves the quality and adequacy of translation.
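Procedure (2), statistically rating candidate one-word translations, can be illustrated with a co-occurrence measure over a sentence-aligned corpus. The Dice coefficient used here is one common choice for this task, not necessarily the measure the authors used, and the toy German-English corpus is invented.

```python
# Rate a candidate (source word, target word) translation pair by how
# often the two words co-occur in aligned sentence pairs (Dice coefficient).

def dice_score(src_word, tgt_word, aligned_pairs):
    # Count sentence pairs containing the source word, the target word,
    # and both together.
    src_n = sum(src_word in s for s, _ in aligned_pairs)
    tgt_n = sum(tgt_word in t for _, t in aligned_pairs)
    both = sum(src_word in s and tgt_word in t for s, t in aligned_pairs)
    return 2 * both / (src_n + tgt_n) if src_n + tgt_n else 0.0

# Tiny sentence-aligned corpus: (source tokens, target tokens) pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "auto"], ["the", "car"]),
    (["ein", "haus"], ["a", "house"]),
]
```

A pair like ("haus", "house") that always co-occurs scores 1.0, while an unrelated pair scores near 0; thresholding such scores is one way to decide which equivalents to compile into the rule-based lexicon.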


Author(s):  
Raj Dabre
Atsushi Fujita

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers, leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer six times, despite having significantly fewer parameters, approaches that of a model that stacks six different layers. We also show how our method can benefit from a prevalent technique for improving NMT: extending the training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attention of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing, where we share even the parameters between the encoder and decoder in addition to recurrently stacking layers.
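The parameter-saving effect of recurrent stacking can be made concrete with a pure-Python toy: one weight matrix reused at every depth versus six independently parameterised matrices. A real NMT model would use a deep-learning framework and far richer layers; only the sharing pattern is faithful to the idea.

```python
# Compare parameter counts: a single shared layer applied 6 times
# versus 6 independent layers (toy dense layers, invented for illustration).

def make_layer(dim):
    # A "layer" here is just a dim x dim weight matrix (list of lists).
    return [[0.1] * dim for _ in range(dim)]

def apply_layer(weights, x):
    # Plain matrix-vector product followed by a ReLU-style clamp.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def recurrently_stacked(x, dim=4, depth=6):
    shared = make_layer(dim)            # ONE set of parameters...
    for _ in range(depth):              # ...applied at every depth
        x = apply_layer(shared, x)
    return x, dim * dim                 # output, parameter count

def conventionally_stacked(x, dim=4, depth=6):
    layers = [make_layer(dim) for _ in range(depth)]  # depth parameter sets
    for w in layers:
        x = apply_layer(w, x)
    return x, depth * dim * dim

_, shared_params = recurrently_stacked([1.0, 0.0, 0.0, 0.0])
_, full_params = conventionally_stacked([1.0, 0.0, 0.0, 0.0])
```

With depth 6 the conventional stack holds six times the parameters of the shared stack, which is the trade-off the abstract's quality comparison is about.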


2018
Vol 34 (4)
pp. 752-771
Author(s):  
Chen-li Kuo

Abstract Statistical approaches have become the mainstream in machine translation (MT), for their potential in producing less rigid and more natural translations than rule-based approaches. However, on closer examination, the uses of function words between statistical machine-translated Chinese and the original Chinese are different, and such differences may be associated with translationese as discussed in translation studies. This article examines the distribution of Chinese function words in a comparable corpus consisting of MTs and the original Chinese texts extracted from Wikipedia. An attribute selection technique is used to investigate which types of function words are significant in discriminating between statistical machine-translated Chinese and the original texts. The results show that statistical MT overuses the most frequent function words, even when alternatives exist. To improve the quality of the end product, developers of MT should pay close attention to modelling Chinese conjunctions and adverbial function words. The results also suggest that machine-translated Chinese shares some characteristics with human-translated texts, including normalization and being influenced by the source language; however, machine-translated texts do not exhibit other characteristics of translationese such as explicitation.
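The attribute-selection step can be approximated by ranking function words by how strongly their relative frequencies differ between the machine-translated and original corpora. The scoring method below (absolute difference of relative frequencies) and the counts are invented for illustration; the article's actual attribute-selection technique is not specified here.

```python
# Rank function words by how well they discriminate machine-translated
# from original Chinese (toy counts; the scoring rule is illustrative).

def discriminative_rank(mt_counts, orig_counts):
    mt_total = sum(mt_counts.values())
    orig_total = sum(orig_counts.values())
    words = set(mt_counts) | set(orig_counts)
    # Score each word by the gap between its relative frequencies.
    scores = {
        w: abs(mt_counts.get(w, 0) / mt_total
               - orig_counts.get(w, 0) / orig_total)
        for w in words
    }
    return sorted(scores, key=scores.get, reverse=True)

# Invented function-word counts in each corpus.
mt_counts = {"的": 900, "和": 300, "了": 100}
orig_counts = {"的": 700, "和": 200, "了": 400}

ranking = discriminative_rank(mt_counts, orig_counts)
```

Words at the top of such a ranking are the ones whose distribution most betrays machine-translated text, which is the kind of signal the article uses to point developers at conjunctions and adverbial function words.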


Literator
2021
Vol 42 (1) ◽  
Author(s):  
Nomsa J. Skosana
Respect Mlambo

The scarcity of adequate resources for South African languages poses a huge challenge to their functional development in specialised fields such as science and technology. The study examines the Autshumato Machine Translation (MT) Web Service, created by the Centre for Text Technology at North-West University. This software supports both formal and informal translation as a machine-aided human translation tool. We investigate the system's advantages and limitations and suggest possible solutions for South African languages. The results show that the system is valuable: it offers high-speed translation, operates as an open-source platform, and translates sentences, documents and web pages. However, some South African languages are included whilst others are excluded, which we consider a limitation of the system. We also find that the system was trained on a limited amount of data, which adversely affects the quality of its output. The study suggests that adding specialised parallel corpora from various contemporary fields for all official languages, and involving language experts in the pre-editing of training data, would be a major step towards improving the quality of the system's output. The study also recommends that developers consider integrating the system with other natural language processing applications. Finally, the initiatives discussed in this study will help improve this MT system into a more effective translation tool for all the official languages of South Africa.


Author(s):  
Arwa Hatem Alqudsi
Nazlia Omar
Rabha W. Ibrahim

It is practically impossible for a pure machine translation approach to handle all translation problems; Rule-Based Machine Translation and Statistical Machine Translation (RBMT and SMT) use different architectures to perform the translation task. Lexical and syntactic analysis are handled by rules, and the remaining ambiguity is left to be resolved by the Expectation-Maximization (EM) algorithm, an iterative statistical algorithm for finding maximum-likelihood estimates. In this paper we propose an integrated Hybrid Machine Translation (HMT) system. The goal is to combine the best properties of each approach. Initially, Arabic text is keyed into the RBMT component; the output is then edited by the EM algorithm to generate the final English translation. As previous work on the performance and enhancement of the EM algorithm has shown, the key to its performance is the ability to accurately transfer frequency information from one language to another. The results, as measured by the BLEU metric, show that the proposed method can substantially outperform a standard rule-based approach and the EM algorithm alone in terms of frequency and accuracy. The HMT system scored higher than the SMT system in all cases: when the two approaches were combined, HMT outperformed SMT in BLEU score.
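The iterative EM idea invoked above is classically instantiated in SMT by IBM Model 1, which alternates between estimating expected word alignments (E-step) and re-normalising translation probabilities (M-step). The sketch below is that textbook formulation, offered only as an illustration of the EM loop; the paper does not specify its exact model, and the two-sentence corpus is invented.

```python
# Minimal IBM Model 1 style EM loop for word translation probabilities.
from collections import defaultdict

def em_model1(corpus, iterations=10):
    # corpus: list of (source tokens, target tokens) sentence pairs.
    tgt_vocab = {e for _, tgt in corpus for e in tgt}
    t = {}  # t[(e, f)] ~ P(target word e | source word f), uniform start
    for src, tgt in corpus:
        for f in src:
            for e in tgt:
                t[(e, f)] = 1.0 / len(tgt_vocab)
    for _ in range(iterations):
        count = defaultdict(float)      # E-step: expected alignment counts
        total = defaultdict(float)
        for src, tgt in corpus:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    count[(e, f)] += t[(e, f)] / norm
                    total[f] += t[(e, f)] / norm
        for e, f in t:                  # M-step: re-estimate probabilities
            t[(e, f)] = count[(e, f)] / total[f]
    return t

# Two toy sentence pairs; "ktab"/"kbyr" are placeholder source tokens.
corpus = [(["ktab"], ["book"]), (["ktab", "kbyr"], ["big", "book"])]
t = em_model1(corpus)
```

Even on two sentences, the loop correctly concentrates probability mass so that "ktab" prefers "book" over "big", which is exactly the frequency-transfer behaviour the abstract credits EM with.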


2019
Vol 26 (2)
pp. 137-161
Author(s):  
Eirini Chatzikoumi

Abstract This article presents the most up-to-date, influential automated, semiautomated and human metrics used to evaluate the quality of machine translation (MT) output and provides the necessary background for MT evaluation projects. Evaluation is, as repeatedly admitted, highly relevant for the improvement of MT. This article is divided into three parts: the first one is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges posed by neural machine translation (NMT) regarding its evaluation. The first part includes reference translation–based metrics; confidence or quality estimation (QE) metrics, which are used as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to the criterion of whether human judges directly express a so-called subjective evaluation judgment, such as ‘good’ or ‘better than’, or not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ); therefore, they are called ‘DEJ-based evaluation methods’, while the latter are called ‘non-DEJ-based evaluation methods’. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and postediting are detailed, with definitions and guidelines, thus rendering this article a useful guide for evaluation projects. Following the detailed presentation of the previously mentioned metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective to provide useful methodology tools for interdisciplinary research groups that evaluate MT systems.
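The core ingredient of the reference-based automated metrics surveyed above is clipped n-gram precision, the building block of BLEU. The sketch below computes it for a single sentence pair; real BLEU combines several n-gram orders geometrically and adds a brevity penalty, which this toy omits.

```python
# Clipped (modified) n-gram precision: candidate n-gram counts are
# capped at their count in the reference, so repeating a correct word
# cannot inflate the score.
from collections import Counter

def clipped_ngram_precision(candidate, reference, n=1):
    cand_ngrams = Counter(tuple(candidate[i:i + n])
                          for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n])
                         for i in range(len(reference) - n + 1))
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return clipped / total if total else 0.0

# Degenerate candidate that repeats "the": clipping caps its credit.
cand = ["the", "the", "the", "cat"]
ref = ["the", "cat", "sat"]
```

Without clipping the candidate above would score 4/4 on unigrams; with clipping it scores 2/4, which is why the modification matters for evaluation.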


2016
Vol 4
pp. 11-27
Author(s):  
Lene Antonsen
Trond Trosterud
Francis M. Tyers

The paper describes a rule-based machine translation (MT) system from North to South Saami. The system is designed for a workflow where North Saami functions as pivot language in translation from Norwegian or Swedish. We envisage manual translation from Norwegian or Swedish to North Saami, and thereafter MT to South Saami. The system was aimed at a single domain, that of texts for use in school administration. We evaluated the system in terms of the quality of translations for postediting. Two out of three of the Norwegian to South Saami professional translators found the output of the system to be useful. The evaluation shows that it is possible to make a functioning rule-based system with a small transfer lexicon and a small number of rules and achieve results that are useful for a restricted domain, even if there are substantial differences between the languages.


2013
Vol 39 (4)
pp. 999-1023
Author(s):  
Gennadi Lembersky
Noam Ordan
Shuly Wintner

Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the “wrong” direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the “right” and the “wrong” directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
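The first adaptation technique, a mixture model over phrase tables, amounts to a linear interpolation of the translation probabilities from the "right"-direction and "wrong"-direction tables. The sketch below fixes the interpolation weight for illustration; in the paper the weights are tuned by minimising perplexity on held-out data, and the French-English entries are invented.

```python
# Linear interpolation of two phrase tables keyed by (source, target)
# phrase pairs; lam weights the "right"-direction table.

def interpolate_phrase_tables(right, wrong, lam=0.7):
    phrases = set(right) | set(wrong)
    return {p: lam * right.get(p, 0.0) + (1 - lam) * wrong.get(p, 0.0)
            for p in phrases}

# Toy tables: P(target phrase | source phrase) in each direction.
right_table = {("maison", "house"): 0.8, ("maison", "home"): 0.2}
wrong_table = {("maison", "house"): 0.4, ("maison", "home"): 0.6}

mixed = interpolate_phrase_tables(right_table, wrong_table, lam=0.7)
```

Because each input table is a proper distribution over targets for "maison", the interpolated table is too, so it can be dropped into an SMT decoder in place of either original table.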

