Pre-Editing of Google Neural Machine Translation

2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Alvin Taufik

Even with the new Machine Translation (MT) platform available in Google today (neural, as compared to the statistical one used in previous years), the output is not always satisfactory. This is even more obvious in specific contexts and situations. Research has shown that implementing rules for the processes before and after text is entered into an MT system (often referred to as pre-editing and post-editing) is fruitful (Gerlach et al., 2013; Shei, 2002). However, to the best knowledge of the researcher, no research on pre-editing rules for Indonesian input into MT has been conducted. This research is significant because it might increase the efficiency and effectiveness of MT, especially for the Indonesian-English language pair. For that reason, this research intends to identify the pre-editing rules required to create a solid basis for translating an Indonesian Source Text (ST) into an English Target Text (TT). This research adopts a product-oriented approach. The results show that in the pre-editing process, the length of the sentence, the conjunctions (subordinative and correlative), and the inappropriate ST words should be the focus of attention.
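
To make the pre-editing focus concrete, the following is a minimal sketch of a rule checker that flags long sentences and Indonesian subordinative or correlative conjunctions before the text is sent to an MT system. The conjunction list and the length threshold are illustrative assumptions, not rules taken from the study.

```python
import re

# Hypothetical pre-editing checks inspired by the findings above: flag long
# sentences and common Indonesian subordinative/correlative conjunctions for
# review before sending the text to MT. Word list and threshold are assumptions.
MAX_WORDS = 20
SUBORDINATIVE = {"karena", "sehingga", "meskipun", "walaupun", "agar", "jika"}
CORRELATIVE = {"baik", "maupun", "tidak hanya", "tetapi juga"}

def pre_edit_report(sentence: str) -> list[str]:
    """Return a list of warnings for a single Indonesian source sentence."""
    warnings = []
    words = sentence.split()
    if len(words) > MAX_WORDS:
        warnings.append(f"sentence is long ({len(words)} words); consider splitting")
    lowered = sentence.lower()
    for conj in SUBORDINATIVE | CORRELATIVE:
        if re.search(rf"\b{re.escape(conj)}\b", lowered):
            warnings.append(f"contains conjunction '{conj}'; check clause structure")
    return warnings

if __name__ == "__main__":
    st = "Meskipun hujan deras, mereka tetap berangkat ke kantor karena ada rapat penting."
    for w in pre_edit_report(st):
        print("-", w)
```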

2021 ◽  
pp. 1-10
Author(s):  
Zhiqiang Yu ◽  
Yuxin Huang ◽  
Junjun Guo

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions. Thai-Lao is a typical low-resource language pair with only a tiny parallel corpus, which leads to suboptimal NMT performance. However, Thai and Lao share considerable similarities in linguistic morphology, and a bilingual lexicon for them is relatively easy to obtain. To exploit this property, we first build a bilingual similarity lexicon composed of pairs of similar words. Then we propose a novel NMT architecture to leverage the similarity between Thai and Lao. Specifically, besides the prevailing sentence encoder, we introduce an extra similarity lexicon encoder into the conventional encoder-decoder architecture, by which the semantic information carried by the similarity lexicon can be represented. We further provide a simple mechanism in the decoder to balance the information representations delivered from the input sentence and the similarity lexicon. Our approach can fully exploit the linguistic similarity carried by the similarity lexicon to improve translation quality. Experimental results demonstrate that our approach achieves significant improvements over the state-of-the-art Transformer baseline system and previous similar works.
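
The balancing mechanism described above can be pictured with a small PyTorch-style sketch. This is not the authors' exact architecture: the module name, the gating formulation, and the dimensions are illustrative assumptions, showing only how a decoder might weigh context from a sentence encoder against context from a similarity lexicon encoder.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's exact design): a learned gate in the
# decoder mixes the context vector from the sentence encoder with the context
# vector from the similarity lexicon encoder.
class GatedDualContext(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, sent_ctx: torch.Tensor, lex_ctx: torch.Tensor) -> torch.Tensor:
        # sent_ctx: context from the sentence encoder  (batch, d_model)
        # lex_ctx:  context from the lexicon encoder   (batch, d_model)
        g = torch.sigmoid(self.gate(torch.cat([sent_ctx, lex_ctx], dim=-1)))
        return g * sent_ctx + (1.0 - g) * lex_ctx   # element-wise balance

if __name__ == "__main__":
    fuse = GatedDualContext(d_model=512)
    sent = torch.randn(2, 512)
    lex = torch.randn(2, 512)
    print(fuse(sent, lex).shape)   # torch.Size([2, 512])
```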


2020 ◽  
Vol 30 (01) ◽  
pp. 2050002
Author(s):  
Taichi Aida ◽  
Kazuhide Yamamoto

Current methods of neural machine translation may generate sentences with different levels of quality. Methods for automatically evaluating translation output from machine translation can be broadly classified into two types: a method that uses human post-edited translations to train an evaluation model, and a method that uses a reference translation that serves as the correct answer during evaluation. On the one hand, it is difficult to prepare post-edited translations because each word must be tagged in comparison with the original translated sentence. On the other hand, users who actually employ a machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without human post-edited sentences and, at test time, estimates the quality of output sentences without reference translations. We define several indices and predict the quality of translations with a regression model. For the quality of the translated sentences, we employ the BLEU score calculated from the number of word n-gram matches between the translated sentence and the reference translation. After that, we compute the correlation between the quality scores predicted by our method and BLEU actually computed from references. According to the experimental results, the correlation with BLEU is highest when XGBoost uses all the indices. Moreover, looking at each index, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important in BLEU estimation.
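
The overall pipeline can be sketched as follows, under simplifying assumptions: the reference-free indices (here random stand-ins for sentence log-likelihood and model uncertainty) feed an XGBoost regressor that predicts sentence-level BLEU, and the predictions are then correlated with BLEU computed from references on held-out data. The feature values and targets below are placeholders, not data from the paper.

```python
import numpy as np
from xgboost import XGBRegressor
from scipy.stats import pearsonr

# Placeholder features: [sentence log-likelihood, model uncertainty] per sentence.
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 2))
bleu = rng.uniform(0.0, 1.0, size=n)        # placeholder sentence-level BLEU targets

# Train a regression model on the indices, then correlate predictions with BLEU.
model = XGBRegressor(n_estimators=100, max_depth=3)
model.fit(features[:150], bleu[:150])

pred = model.predict(features[150:])
corr, _ = pearsonr(pred, bleu[150:])
print(f"Pearson correlation with BLEU on held-out data: {corr:.3f}")
```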


2019 ◽  
Vol 9 (1) ◽  
pp. 268-278 ◽  
Author(s):  
Benyamin Ahmadnia ◽  
Bonnie J. Dorr

Abstract The quality of Neural Machine Translation (NMT), as a data-driven approach, depends heavily on the quantity, quality, and relevance of the training dataset. Such approaches have achieved promising results for bilingually high-resource scenarios but are inadequate for low-resource conditions. Generally, NMT systems learn from millions of words of bilingual training data. However, the human labeling process is very costly and time-consuming. In this paper, we describe a round-trip training approach to bilingual low-resource NMT that takes advantage of monolingual datasets to address the training-data bottleneck, thus augmenting translation quality. We conduct detailed experiments on English-Spanish as a high-resource language pair as well as Persian-Spanish as a low-resource language pair. Experimental results show that this competitive approach outperforms the baseline systems and improves translation quality.
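
A high-level sketch of the round-trip idea, under simplifying assumptions, is shown below: monolingual source sentences are translated forward, translated back, and the reconstruction of the original source serves as a training signal. The translate() and train_step() callables are placeholders for an NMT toolkit, not calls to a specific library, and the loop may differ from the authors' exact procedure.

```python
# Sketch of one round-trip update with monolingual data; translate() and
# train_step() are stand-ins supplied by the caller, not a real toolkit API.
def round_trip_update(model_src2tgt, model_tgt2src, mono_src_batch,
                      translate, train_step):
    """One round-trip update on a batch of monolingual source sentences."""
    # 1. Translate monolingual source sentences into the target language.
    pseudo_tgt = [translate(model_src2tgt, s) for s in mono_src_batch]
    # 2. Train the reverse model to reconstruct the original source sentences.
    loss_back = train_step(model_tgt2src, src=pseudo_tgt, tgt=mono_src_batch)
    # 3. Use the synthetic pairs to update the forward model as well.
    loss_fwd = train_step(model_src2tgt, src=mono_src_batch, tgt=pseudo_tgt)
    return loss_fwd, loss_back

if __name__ == "__main__":
    dummy_translate = lambda model, s: s[::-1]   # stand-in "translation"
    dummy_train = lambda model, src, tgt: 0.0    # stand-in training step
    print(round_trip_update(None, None, ["hola mundo"], dummy_translate, dummy_train))
```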


Author(s):  
Tetiana Korolova ◽  
Natalya Zhmayeva ◽  
Yulia Kolchah

The modern translation services industry distinguishes two translation quality levels that can be reached as a result of machine translation (MT) post-editing: "good enough" quality, which renders the main information of the source message while admitting stylistic, syntactic, and morphological flaws, and quality similar or equal to human translation, which is a fully polished version of a post-edited text, ready to be published. An overview of MT systems enables us to consider Google Neural Machine Translation (GNMT), which is based on the most modern training methods aimed at maximum improvement, the most powerful one. When analyzing texts translated by means of Google Translate, the following problems were identified: distortion of the referential meaning of the source message, incorrect choice of variant equivalents, lack of term harmonization, failure to render abbreviations, disagreement of linguistic units in person, number, and case, incorrect choice of functional correspondences when rendering absolute, gerund, and participial constructions, literal translation of phrases, and lack of transformations of the grammatical structure of the source message (additions, rearrangements). Taking into account the classified issues of machine translation as well as the levels of post-editing quality, post-editing of the texts translated by means of MT is carried out, and demands and recommendations applicable to post-editing MT output within the language pair under analysis are provided, with respect to the peculiarities of the specific MT system and the type of translated texts.


Author(s):  
Candy Lalrempuii ◽  
Badal Soni ◽  
Partha Pakray

Machine translation is an effort to bridge language barriers and reduce misinterpretations, making communication more convenient through the automatic translation of languages. The quality of translations produced by corpus-based approaches predominantly depends on the availability of a large parallel corpus. Although machine translation of many Indian languages has progressively gained attention, there is very limited research on machine translation and the challenges of using various machine translation techniques for a low-resource language such as Mizo. In this article, we have implemented and compared statistical approaches with modern neural approaches for the English–Mizo language pair. We have experimented with different tokenization methods, architectures, and configurations. The performance of translations predicted by the trained models has been evaluated using automatic and human evaluation measures. Furthermore, we have analyzed the prediction errors of the models and the quality of predictions based on variations in sentence length, and compared the model performance with the existing baselines.
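
The automatic-evaluation step mentioned above could look like the following minimal sketch, which scores a system's predictions against references with corpus-level BLEU and chrF via sacrebleu. The sentences are placeholders rather than data from the study, and the study's own metric set may differ.

```python
import sacrebleu

# Score placeholder hypotheses against placeholder references with corpus-level
# BLEU and chrF; these sentences are illustrative only.
hypotheses = ["the farmers planted rice in the valley",
              "children are playing near the river"]
references = [["the farmers planted rice in the valley",
               "the children play beside the river"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```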


2017 ◽  
Vol 6 (2) ◽  
pp. 291-309 ◽  
Author(s):  
Mikel L. Forcada

Abstract The last few years have witnessed a surge of interest in a new machine translation paradigm: neural machine translation (NMT). Neural machine translation is starting to displace its corpus-based predecessor, statistical machine translation (SMT). In this paper, I introduce NMT and explain in detail, without the mathematical complexity, how neural machine translation systems work, how they are trained, and their main differences from SMT systems. The paper tries to decipher NMT jargon such as “distributed representations”, “deep learning”, “word embeddings”, “vectors”, “layers”, “weights”, “encoder”, “decoder”, and “attention”, and builds upon these concepts, so that individual translators and professionals working for the translation industry, as well as students and academics in translation studies, can make sense of this new technology and know what to expect from it. Aspects such as how NMT output differs from that of SMT, and the hardware and software requirements of NMT for the translation industry, both at training time and at run time, are also discussed.
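
As a tiny numerical illustration of the “attention” concept the paper unpacks, the sketch below computes scaled dot-product attention with NumPy: a decoder state (query) assigns weights to the encoder states (keys) and takes a weighted average of their values. The shapes and numbers are arbitrary and purely illustrative.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # Similarity of each query to each key, scaled by the key dimension.
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax turns similarities into attention weights that sum to 1.
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(5, 8))    # 5 source positions, dimension 8
dec_state = rng.normal(size=(1, 8))     # one decoder step
context, attn = scaled_dot_product_attention(dec_state, enc_states, enc_states)
print(attn.round(2))                    # attention weights over the 5 source positions
```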


2013 ◽  
Vol 100 (1) ◽  
pp. 83-89 ◽  
Author(s):  
Konstantinos Chatzitheodorou

Abstract A hotly debated topic in machine translation is human evaluation. On the one hand, it is extremely costly and time-consuming; on the other, it is an important and unfortunately inevitable part of any system. This paper describes the COSTA MT Evaluation Tool, an open stand-alone tool for human evaluation of machine translation. It is a Java program that can be used to manually evaluate the quality of machine translation output. It is simple to use and designed to allow potential machine translation users and developers to analyze their systems in a friendly environment. It enables ranking the quality of machine translation output segment-by-segment for a particular language pair. The benefits of this tool are multiple. Firstly, it is a rich repository of commonly used industry criteria (fluency, adequacy, and translation error classification). Secondly, it is freely available to anyone and provides results that can be further analyzed. Thirdly, it estimates the time needed for each evaluated sentence. Finally, it gives suggestions about the fuzzy matching of the candidate translations.
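
COSTA itself is a Java program; the short Python sketch below only illustrates the fuzzy-matching idea it offers, i.e. scoring how close one candidate translation is to another so evaluators can spot near-duplicates. It is not the tool's actual implementation, and the sentences are made up.

```python
from difflib import SequenceMatcher

def fuzzy_ratio(a: str, b: str) -> float:
    # Word-level similarity ratio between two candidate translations.
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

cand_1 = "The committee approved the new budget yesterday."
cand_2 = "The committee has approved the new budget yesterday."
print(f"fuzzy match: {fuzzy_ratio(cand_1, cand_2):.2f}")
```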


2021 ◽  
Vol 11 (7) ◽  
pp. 2948
Author(s):  
Lucia Benkova ◽  
Dasa Munkova ◽  
Ľubomír Benko ◽  
Michal Munk

This study is focused on the comparison of phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems using automatic metrics of translation quality for the English-Slovak language pair. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would generate results of better quality. An experiment was performed using residuals to compare the automatic accuracy metric scores (BLEU_n) of the statistical machine translation with those of the neural machine translation. The results confirmed the assumption of better neural machine translation quality regardless of the system used. There were statistically significant differences between the SMT and NMT in favor of the NMT based on all BLEU_n scores. The neural machine translation achieved a better quality of translation of journalistic texts from English into Slovak, regardless of whether it was a system trained on general texts, such as Google Translate, or on a specific domain, such as the European Commission's (EC's) tool.
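
The comparison idea can be sketched as follows, under simplifying assumptions: per-segment BLEU scores for the SMT and NMT outputs of the same source segments are compared with a paired significance test. The scores below are random placeholders, and the study's exact residual-based procedure may differ from this simple paired test.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder per-segment BLEU scores for SMT and NMT on the same 100 segments.
rng = np.random.default_rng(1)
smt_bleu = rng.uniform(0.2, 0.6, size=100)
nmt_bleu = smt_bleu + rng.normal(0.05, 0.05, size=100)   # NMT slightly better

# Paired Wilcoxon signed-rank test on the per-segment differences.
stat, p_value = wilcoxon(nmt_bleu, smt_bleu)
print(f"mean SMT BLEU: {smt_bleu.mean():.3f}  mean NMT BLEU: {nmt_bleu.mean():.3f}")
print(f"Wilcoxon signed-rank p-value: {p_value:.4f}")
```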

