Pre-Editing of Google Neural Machine Translation

2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Alvin Taufik

Even with the new Machine Translation (MT) platform available in Google today (neural, as compared to the statistical one used in previous years), the output is not always satisfactory. This is even more obvious in specific contexts and situations. Research has shown that implementing rules for the processes before and after text is entered into an MT system (often referred to as pre-editing and post-editing) is fruitful (Gerlach et al., 2013; Shei, 2002). However, to the best knowledge of the researcher, no research on pre-editing rules for Indonesian input into MT has been conducted. This research is significant because it might increase the efficiency and effectiveness of MT, especially for the Indonesian-English language pair. For that reason, this research intends to identify the pre-editing rules required to create a solid basis for translating an Indonesian Source Text (ST) into an English Target Text (TT). This research adopts a product-oriented approach. The results show that in the pre-editing process, the length of the sentence, the conjunctions (subordinative and correlative), and the inappropriate ST words should be the focus of attention.
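
To make the pre-editing focus concrete, the following is a minimal sketch of a rule checker that flags long sentences and Indonesian subordinative or correlative conjunctions before the text is sent to an MT system. The conjunction list and the length threshold are illustrative assumptions, not rules taken from the study.

```python
import re

# Hypothetical pre-editing checks inspired by the findings above: flag long
# sentences and common Indonesian subordinative/correlative conjunctions for
# review before sending the text to MT. Word list and threshold are assumptions.
MAX_WORDS = 20
SUBORDINATIVE = {"karena", "sehingga", "meskipun", "walaupun", "agar", "jika"}
CORRELATIVE = {"baik", "maupun", "tidak hanya", "tetapi juga"}

def pre_edit_report(sentence: str) -> list[str]:
    """Return a list of warnings for a single Indonesian source sentence."""
    warnings = []
    words = sentence.split()
    if len(words) > MAX_WORDS:
        warnings.append(f"sentence is long ({len(words)} words); consider splitting")
    lowered = sentence.lower()
    for conj in SUBORDINATIVE | CORRELATIVE:
        if re.search(rf"\b{re.escape(conj)}\b", lowered):
            warnings.append(f"contains conjunction '{conj}'; check clause structure")
    return warnings

if __name__ == "__main__":
    st = "Meskipun hujan deras, mereka tetap berangkat ke kantor karena ada rapat penting."
    for w in pre_edit_report(st):
        print("-", w)
```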

2021 ◽  
pp. 1-10
Author(s):  
Zhiqiang Yu ◽  
Yuxin Huang ◽  
Junjun Guo

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions. Thai-Lao is a typical low-resource language pair with only a tiny parallel corpus, which leads to suboptimal NMT performance. However, Thai and Lao share considerable similarities in linguistic morphology, and a bilingual lexicon for them is relatively easy to obtain. To exploit this property, we first build a bilingual similarity lexicon composed of pairs of similar words. Then we propose a novel NMT architecture to leverage the similarity between Thai and Lao. Specifically, besides the prevailing sentence encoder, we introduce an extra similarity lexicon encoder into the conventional encoder-decoder architecture, by which the semantic information carried by the similarity lexicon can be represented. We further provide a simple mechanism in the decoder to balance the information representations delivered from the input sentence and the similarity lexicon. Our approach can fully exploit the linguistic similarity carried by the similarity lexicon to improve translation quality. Experimental results demonstrate that our approach achieves significant improvements over the state-of-the-art Transformer baseline system and previous similar works.
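
The balancing mechanism described above can be pictured with a small PyTorch-style sketch. This is not the authors' exact architecture: the module name, the gating formulation, and the dimensions are illustrative assumptions, showing only how a decoder might weigh context from a sentence encoder against context from a similarity lexicon encoder.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's exact design): a learned gate in the
# decoder mixes the context vector from the sentence encoder with the context
# vector from the similarity lexicon encoder.
class GatedDualContext(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, sent_ctx: torch.Tensor, lex_ctx: torch.Tensor) -> torch.Tensor:
        # sent_ctx: context from the sentence encoder  (batch, d_model)
        # lex_ctx:  context from the lexicon encoder   (batch, d_model)
        g = torch.sigmoid(self.gate(torch.cat([sent_ctx, lex_ctx], dim=-1)))
        return g * sent_ctx + (1.0 - g) * lex_ctx   # element-wise balance

if __name__ == "__main__":
    fuse = GatedDualContext(d_model=512)
    sent = torch.randn(2, 512)
    lex = torch.randn(2, 512)
    print(fuse(sent, lex).shape)   # torch.Size([2, 512])
```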


2020 ◽  
Vol 30 (01) ◽  
pp. 2050002
Author(s):  
Taichi Aida ◽  
Kazuhide Yamamoto

Current methods of neural machine translation may generate sentences with different levels of quality. Methods for automatically evaluating translation output from machine translation can be broadly classified into two types: a method that uses human post-edited translations to train an evaluation model, and a method that uses a reference translation that serves as the correct answer during evaluation. On the one hand, it is difficult to prepare post-edited translations because each word must be tagged in comparison with the original translated sentence. On the other hand, users who actually employ a machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without human post-edited sentences and, at test time, estimates the quality of output sentences without reference translations. We define several indices and predict the quality of translations with a regression model. For the quality of the translated sentences, we employ the BLEU score calculated from the number of word n-gram matches between the translated sentence and the reference translation. After that, we compute the correlation between the quality scores predicted by our method and BLEU actually computed from references. According to the experimental results, the correlation with BLEU is highest when XGBoost uses all the indices. Moreover, looking at each index, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important in BLEU estimation.
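
The overall pipeline can be sketched as follows, under simplifying assumptions: the reference-free indices (here random stand-ins for sentence log-likelihood and model uncertainty) feed an XGBoost regressor that predicts sentence-level BLEU, and the predictions are then correlated with BLEU computed from references on held-out data. The feature values and targets below are placeholders, not data from the paper.

```python
import numpy as np
from xgboost import XGBRegressor
from scipy.stats import pearsonr

# Placeholder features: [sentence log-likelihood, model uncertainty] per sentence.
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 2))
bleu = rng.uniform(0.0, 1.0, size=n)        # placeholder sentence-level BLEU targets

# Train a regression model on the indices, then correlate predictions with BLEU.
model = XGBRegressor(n_estimators=100, max_depth=3)
model.fit(features[:150], bleu[:150])

pred = model.predict(features[150:])
corr, _ = pearsonr(pred, bleu[150:])
print(f"Pearson correlation with BLEU on held-out data: {corr:.3f}")
```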


2019 ◽  
Vol 9 (1) ◽  
pp. 268-278 ◽  
Author(s):  
Benyamin Ahmadnia ◽  
Bonnie J. Dorr

Abstract The quality of Neural Machine Translation (NMT), as a data-driven approach, depends heavily on the quantity, quality, and relevance of the training dataset. Such approaches have achieved promising results for bilingually high-resource scenarios but are inadequate for low-resource conditions. Generally, NMT systems learn from millions of words of bilingual training data. However, the human labeling process is very costly and time-consuming. In this paper, we describe a round-trip training approach to bilingual low-resource NMT that takes advantage of monolingual datasets to address the training-data bottleneck, thus augmenting translation quality. We conduct detailed experiments on English-Spanish as a high-resource language pair as well as Persian-Spanish as a low-resource language pair. Experimental results show that this competitive approach outperforms the baseline systems and improves translation quality.
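
A high-level sketch of the round-trip idea, under simplifying assumptions, is shown below: monolingual source sentences are translated forward, translated back, and the reconstruction of the original source serves as a training signal. The translate() and train_step() callables are placeholders for an NMT toolkit, not calls to a specific library, and the loop may differ from the authors' exact procedure.

```python
# Sketch of one round-trip update with monolingual data; translate() and
# train_step() are stand-ins supplied by the caller, not a real toolkit API.
def round_trip_update(model_src2tgt, model_tgt2src, mono_src_batch,
                      translate, train_step):
    """One round-trip update on a batch of monolingual source sentences."""
    # 1. Translate monolingual source sentences into the target language.
    pseudo_tgt = [translate(model_src2tgt, s) for s in mono_src_batch]
    # 2. Train the reverse model to reconstruct the original source sentences.
    loss_back = train_step(model_tgt2src, src=pseudo_tgt, tgt=mono_src_batch)
    # 3. Use the synthetic pairs to update the forward model as well.
    loss_fwd = train_step(model_src2tgt, src=mono_src_batch, tgt=pseudo_tgt)
    return loss_fwd, loss_back

if __name__ == "__main__":
    dummy_translate = lambda model, s: s[::-1]   # stand-in "translation"
    dummy_train = lambda model, src, tgt: 0.0    # stand-in training step
    print(round_trip_update(None, None, ["hola mundo"], dummy_translate, dummy_train))
```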


Author(s):  
Tetiana Korolova ◽  
Natalya Zhmayeva ◽  
Yulia Kolchah

The modern translation services industry distinguishes two translation quality levels that can be reached as a result of machine translation (MT) post-editing: "good enough" quality, which renders the main information of the source message while admitting stylistic, syntactic, and morphological flaws, and quality similar or equal to human translation, which is a fully polished version of a post-edited text, ready to be published. An overview of MT systems enables us to consider Google Neural Machine Translation (GNMT), which is based on the most modern training methods aimed at maximum improvement, the most powerful one. When analyzing texts translated by means of Google Translate, the following problems were identified: distortion of the referential meaning of the source message, incorrect choice of variant equivalents, lack of term harmonization, failure to render abbreviations, disagreement of linguistic units in person, number, and case, incorrect choice of functional correspondences when rendering absolute, gerund, and participial constructions, literal translation of phrases, and lack of transformations of the grammatical structure of the source message (additions, rearrangements). Taking into account the classified issues of machine translation as well as the levels of post-editing quality, post-editing of the texts translated by means of MT is carried out, and demands and recommendations applicable to post-editing MT output within the language pair under analysis are provided, with respect to the peculiarities of the specific MT system and the type of translated texts.


Author(s):  
Candy Lalrempuii ◽  
Badal Soni ◽  
Partha Pakray

Machine translation is an effort to bridge language barriers and reduce misinterpretations, making communication more convenient through the automatic translation of languages. The quality of translations produced by corpus-based approaches predominantly depends on the availability of a large parallel corpus. Although machine translation of many Indian languages has progressively gained attention, there is very limited research on machine translation and the challenges of using various machine translation techniques for a low-resource language such as Mizo. In this article, we have implemented and compared statistical approaches with modern neural approaches for the English–Mizo language pair. We have experimented with different tokenization methods, architectures, and configurations. The performance of translations predicted by the trained models has been evaluated using automatic and human evaluation measures. Furthermore, we have analyzed the prediction errors of the models and the quality of predictions based on variations in sentence length, and compared the model performance with the existing baselines.
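
The automatic-evaluation step mentioned above could look like the following minimal sketch, which scores a system's predictions against references with corpus-level BLEU and chrF via sacrebleu. The sentences are placeholders rather than data from the study, and the study's own metric set may differ.

```python
import sacrebleu

# Score placeholder hypotheses against placeholder references with corpus-level
# BLEU and chrF; these sentences are illustrative only.
hypotheses = ["the farmers planted rice in the valley",
              "children are playing near the river"]
references = [["the farmers planted rice in the valley",
               "the children play beside the river"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```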


2017 ◽  
Vol 6 (2) ◽  
pp. 291-309 ◽  
Author(s):  
Mikel L. Forcada

Abstract The last few years have witnessed a surge of interest in a new machine translation paradigm: neural machine translation (NMT). Neural machine translation is starting to displace its corpus-based predecessor, statistical machine translation (SMT). In this paper, I introduce NMT and explain in detail, without the mathematical complexity, how neural machine translation systems work, how they are trained, and their main differences from SMT systems. The paper tries to decipher NMT jargon such as “distributed representations”, “deep learning”, “word embeddings”, “vectors”, “layers”, “weights”, “encoder”, “decoder”, and “attention”, and builds upon these concepts, so that individual translators and professionals working for the translation industry, as well as students and academics in translation studies, can make sense of this new technology and know what to expect from it. Aspects such as how NMT output differs from that of SMT, and the hardware and software requirements of NMT for the translation industry, both at training time and at run time, are also discussed.
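
As a tiny numerical illustration of the “attention” concept the paper unpacks, the sketch below computes scaled dot-product attention with NumPy: a decoder state (query) assigns weights to the encoder states (keys) and takes a weighted average of their values. The shapes and numbers are arbitrary and purely illustrative.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # Similarity of each query to each key, scaled by the key dimension.
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax turns similarities into attention weights that sum to 1.
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(5, 8))    # 5 source positions, dimension 8
dec_state = rng.normal(size=(1, 8))     # one decoder step
context, attn = scaled_dot_product_attention(dec_state, enc_states, enc_states)
print(attn.round(2))                    # attention weights over the 5 source positions
```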


2013 ◽  
Vol 100 (1) ◽  
pp. 83-89 ◽  
Author(s):  
Konstantinos Chatzitheodorou

Abstract A hotly debated topic in machine translation is human evaluation. On the one hand, it is extremely costly and time-consuming; on the other, it is an important and unfortunately inevitable part of any system. This paper describes the COSTA MT Evaluation Tool, an open stand-alone tool for human evaluation of machine translation. It is a Java program that can be used to manually evaluate the quality of machine translation output. It is simple to use and designed to allow potential machine translation users and developers to analyze their systems in a friendly environment. It enables ranking the quality of machine translation output segment-by-segment for a particular language pair. The benefits of this tool are multiple. Firstly, it is a rich repository of commonly used industry criteria (fluency, adequacy, and translation error classification). Secondly, it is freely available to anyone and provides results that can be further analyzed. Thirdly, it estimates the time needed for each evaluated sentence. Finally, it gives suggestions about the fuzzy matching of the candidate translations.
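
COSTA itself is a Java program; the short Python sketch below only illustrates the fuzzy-matching idea it offers, i.e. scoring how close one candidate translation is to another so evaluators can spot near-duplicates. It is not the tool's actual implementation, and the sentences are made up.

```python
from difflib import SequenceMatcher

def fuzzy_ratio(a: str, b: str) -> float:
    # Word-level similarity ratio between two candidate translations.
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

cand_1 = "The committee approved the new budget yesterday."
cand_2 = "The committee has approved the new budget yesterday."
print(f"fuzzy match: {fuzzy_ratio(cand_1, cand_2):.2f}")
```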


2021 ◽  
Vol 11 (7) ◽  
pp. 2948
Author(s):  
Lucia Benkova ◽  
Dasa Munkova ◽  
Ľubomír Benko ◽  
Michal Munk

This study is focused on the comparison of phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems using automatic metrics of translation quality for the English-Slovak language pair. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would generate results of better quality. An experiment was performed using residuals to compare the automatic accuracy metric scores (BLEU_n) of the statistical machine translation with those of the neural machine translation. The results confirmed the assumption of better neural machine translation quality regardless of the system used. There were statistically significant differences between the SMT and NMT in favor of the NMT based on all BLEU_n scores. The neural machine translation achieved a better quality of translation of journalistic texts from English into Slovak, regardless of whether it was a system trained on general texts, such as Google Translate, or on a specific domain, such as the European Commission's (EC's) tool.
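
The comparison idea can be sketched as follows, under simplifying assumptions: per-segment BLEU scores for the SMT and NMT outputs of the same source segments are compared with a paired significance test. The scores below are random placeholders, and the study's exact residual-based procedure may differ from this simple paired test.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder per-segment BLEU scores for SMT and NMT on the same 100 segments.
rng = np.random.default_rng(1)
smt_bleu = rng.uniform(0.2, 0.6, size=100)
nmt_bleu = smt_bleu + rng.normal(0.05, 0.05, size=100)   # NMT slightly better

# Paired Wilcoxon signed-rank test on the per-segment differences.
stat, p_value = wilcoxon(nmt_bleu, smt_bleu)
print(f"mean SMT BLEU: {smt_bleu.mean():.3f}  mean NMT BLEU: {nmt_bleu.mean():.3f}")
print(f"Wilcoxon signed-rank p-value: {p_value:.4f}")
```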

