Translators’ perceptions of literary post-editing using statistical and neural machine translation

Abstract In the context of recent improvements in the quality of machine translation (MT) output and new use cases being found for that output, this article reports on an experiment using statistical and neural MT systems to translate literature. Six professional translators with experience of literary translation produced English-to-Catalan translations under three conditions: translation from scratch, neural MT post-editing, and statistical MT post-editing. They provided feedback before and after the translation via questionnaires and interviews. While all participants prefer to translate from scratch, mostly due to the freedom to be creative without the constraints of segment-level segmentation, those with less experience find the MT suggestions useful.

Download Full-text

Is Neural Machine Translation the New State of the Art?

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0013 ◽

2017 ◽

Vol 108 (1) ◽

pp. 109-120 ◽

Cited By ~ 37

Author(s):

Sheila Castilho ◽

Joss Moorkens ◽

Federico Gaspari ◽

Iacer Calixto ◽

John Tinsley ◽

...

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Evaluation Methods ◽

Automatic Evaluation ◽

New Paradigm ◽

Neural Machine Translation ◽

Human Evaluation ◽

Statistical Mt

Abstract This paper discusses neural machine translation (NMT), a new paradigm in the MT field, comparing the quality of NMT systems with statistical MT by describing three studies using automatic and human evaluation methods. Automatic evaluation results presented for NMT are very promising, however human evaluations show mixed results. We report increases in fluency but inconsistent results for adequacy and post-editing effort. NMT undoubtedly represents a step forward for the MT field, but one that the community should be careful not to oversell.

Download Full-text

Gutenberg Goes Neural: Comparing Features of Dutch Human Translations with Raw Neural Machine Translation Outputs in a Corpus of English Literary Classics

Informatics ◽

10.3390/informatics7030032 ◽

2020 ◽

Vol 7 (3) ◽

pp. 32

Author(s):

Rebecca Webster ◽

Margot Fonteyne ◽

Arda Tezcan ◽

Lieve Macken ◽

Joke Daems

Keyword(s):

Machine Translation ◽

Syntactic Structure ◽

Literary Translation ◽

Neural Machine Translation ◽

Lexical Richness ◽

Source Sentence ◽

Literary Classics ◽

Local Cohesion ◽

Insight Into

Due to the growing success of neural machine translation (NMT), many have started to question its applicability within the field of literary translation. In order to grasp the possibilities of NMT, we studied the output of the neural machine system of Google Translate (GNMT) and DeepL when applied to four classic novels translated from English into Dutch. The quality of the NMT systems is discussed by focusing on manual annotations, and we also employed various metrics in order to get an insight into lexical richness, local cohesion, syntactic, and stylistic difference. Firstly, we discovered that a large proportion of the translated sentences contained errors. We also observed a lower level of lexical richness and local cohesion in the NMTs compared to the human translations. In addition, NMTs are more likely to follow the syntactic structure of a source sentence, whereas human translations can differ. Lastly, the human translations deviate from the machine translations in style.

Download Full-text

Recurrent Stacking of Layers for Compact Neural Machine Translation Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016292 ◽

2019 ◽

Vol 33 ◽

pp. 6292-6299 ◽

Cited By ~ 2

Author(s):

Raj Dabre ◽

Atsushi Fujita

Keyword(s):

Machine Translation ◽

Single Layer ◽

Training Data ◽

Neural Machine Translation ◽

Parallel Corpora ◽

Translation Quality ◽

Sequence Generation ◽

Sequence Modeling ◽

Back Translation

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single-layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way for improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.

Download Full-text

Knowledge Graphs Enhanced Neural Machine Translation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/559 ◽

2020 ◽

Author(s):

Yang Zhao ◽

Jiajun Zhang ◽

Yu Zhou ◽

Chengqing Zong

Keyword(s):

Machine Translation ◽

Semantic Space ◽

Neural Machine Translation ◽

Translation Quality ◽

Structured Information ◽

Knowledge Graphs ◽

Japanese Translation

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KGs enhanced NMT method. Specifically, we first induce the new translation results of these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo parallel sentence pairs that contain these induced entity pairs. Finally, NMT model is jointly trained by the original and pseudo sentence pairs. The extensive experiments on Chinese-to-English and Englishto-Japanese translation tasks demonstrate that our method significantly outperforms the strong baseline models in translation quality, especially in handling the induced entities.

Download Full-text

MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0016 ◽

2019 ◽

Vol 28 (3) ◽

pp. 447-453 ◽

Cited By ~ 5

Author(s):

Sainik Kumar Mahata ◽

Dipankar Das ◽

Sivaji Bandyopadhyay

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Language Model ◽

Target Language ◽

Data Sets ◽

Shared Task ◽

Automatic Translation ◽

External Data ◽

Statistical Mt

Abstract Machine translation (MT) is the automatic translation of the source language to its target language by a computer system. In the current paper, we propose an approach of using recurrent neural networks (RNNs) over traditional statistical MT (SMT). We compare the performance of the phrase table of SMT to the performance of the proposed RNN and in turn improve the quality of the MT output. This work has been done as a part of the shared task problem provided by the MTIL2017. We have constructed the traditional MT model using Moses toolkit and have additionally enriched the language model using external data sets. Thereafter, we have ranked the phrase tables using an RNN encoder-decoder module created originally as a part of the GroundHog project of LISA lab.

Download Full-text

Function words in statistical machine-translated Chinese and original Chinese: A study into the translationese of machine translation systems

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy050 ◽

2018 ◽

Vol 34 (4) ◽

pp. 752-771

Author(s):

Chen-li Kuo

Keyword(s):

Machine Translation ◽

Attribute Selection ◽

Close Attention ◽

Function Words ◽

Rule Based ◽

Source Language ◽

Statistical Mt ◽

Chinese Texts ◽

Translation Systems

Abstract Statistical approaches have become the mainstream in machine translation (MT), for their potential in producing less rigid and more natural translations than rule-based approaches. However, on closer examination, the uses of function words between statistical machine-translated Chinese and the original Chinese are different, and such differences may be associated with translationese as discussed in translation studies. This article examines the distribution of Chinese function words in a comparable corpus consisting of MTs and the original Chinese texts extracted from Wikipedia. An attribute selection technique is used to investigate which types of function words are significant in discriminating between statistical machine-translated Chinese and the original texts. The results show that statistical MT overuses the most frequent function words, even when alternatives exist. To improve the quality of the end product, developers of MT should pay close attention to modelling Chinese conjunctions and adverbial function words. The results also suggest that machine-translated Chinese shares some characteristics with human-translated texts, including normalization and being influenced by the source language; however, machine-translated texts do not exhibit other characteristics of translationese such as explicitation.

Download Full-text

Estimating Machine Translation Quality of Any Input Sentence

International Journal of Asian Language Processing ◽

10.1142/s2717554520500022 ◽

2020 ◽

Vol 30 (01) ◽

pp. 2050002

Author(s):

Taichi Aida ◽

Kazuhide Yamamoto

Keyword(s):

Machine Translation ◽

Evaluation Model ◽

Joint Probability ◽

Translation System ◽

Neural Machine Translation ◽

Translation Quality ◽

Machine Translation System ◽

Input Sentence ◽

The One

Current methods of neural machine translation may generate sentences with different levels of quality. Methods for automatically evaluating translation output from machine translation can be broadly classified into two types: a method that uses human post-edited translations for training an evaluation model, and a method that uses a reference translation that is the correct answer during evaluation. On the one hand, it is difficult to prepare post-edited translations because it is necessary to tag each word in comparison with the original translated sentences. On the other hand, users who actually employ the machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without using human post-edited sentences and in the test set, estimates the quality of output sentences without using reference translations. We define some indices and predict the quality of translations with a regression model. For the quality of the translated sentences, we employ the BLEU score calculated from the number of word [Formula: see text]-gram matches between the translated sentence and the reference translation. After that, we compute the correlation between quality scores predicted by our method and BLEU actually computed from references. According to the experimental results, the correlation with BLEU is the highest when XGBoost uses all the indices. Moreover, looking at each index, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important in BLEU estimation.

Download Full-text

Comparing the Quality of Neural Machine Translation and Professional Post-Editing

2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX) ◽

10.1109/qomex.2019.8743218 ◽

2019 ◽

Author(s):

Jennifer Vardaro ◽

Moritz Schaeffer ◽

Silvia Hansen-Schirra

Keyword(s):

Machine Translation ◽

Neural Machine Translation

Download Full-text

Unsupervised Quality Estimation for Neural Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00330 ◽

2020 ◽

Vol 8 ◽

pp. 539-555

Author(s):

Marina Fomicheva ◽

Shuo Sun ◽

Lisa Yankovskaya ◽

Frédéric Blain ◽

Francisco Guzmán ◽

...

Keyword(s):

Machine Translation ◽

Real World ◽

State Of The Art ◽

Black Box ◽

Test Time ◽

Quality Estimation ◽

Neural Machine Translation ◽

Real World Applications ◽

Unsupervised Approach

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time. Existing approaches require large amounts of expert annotated data, computation, and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required. Different from most of the current work that treats the MT system as a black box, we explore useful information that can be extracted from the MT system as a by-product of translation. By utilizing methods for uncertainty quantification, we achieve very good correlation with human judgments of quality, rivaling state-of-the-art supervised QE models. To evaluate our approach we collect the first dataset that enables work on both black-box and glass-box approaches to QE.

Download Full-text

An Improved English-to-Mizo Neural Machine Translation

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3445974 ◽

2021 ◽

Vol 20 (4) ◽

pp. 1-21

Author(s):

Candy Lalrempuii ◽

Badal Soni ◽

Partha Pakray

Keyword(s):

Machine Translation ◽

Model Performance ◽

Sentence Length ◽

Prediction Errors ◽

Indian Languages ◽

Neural Machine Translation ◽

Automatic Translation ◽

Translation Techniques ◽

Language Pair

Machine Translation is an effort to bridge language barriers and misinterpretations, making communication more convenient through the automatic translation of languages. The quality of translations produced by corpus-based approaches predominantly depends on the availability of a large parallel corpus. Although machine translation of many Indian languages has progressively gained attention, there is very limited research on machine translation and the challenges of using various machine translation techniques for a low-resource language such as Mizo. In this article, we have implemented and compared statistical-based approaches with modern neural-based approaches for the English–Mizo language pair. We have experimented with different tokenization methods, architectures, and configurations. The performance of translations predicted by the trained models has been evaluated using automatic and human evaluation measures. Furthermore, we have analyzed the prediction errors of the models and the quality of predictions based on variations in sentence length and compared the model performance with the existing baselines.

Download Full-text