scholarly journals Lost in Machine Translation: Contextual Linguistic Uncertainty

Author(s):  
Anton Sukhoverkhov ◽  
Dorothy DeWitt ◽  
Ioannis Manasidi ◽  
Keiko Nitta ◽  
Vladimir Krstić

The article considers the issues related to the semantic, grammatical, stylistic and technical difficulties currently present in machine translation and compares its four main approaches: Rule-based (RBMT), Corpora-based (CBMT), Neural (NMT), and Hybrid (HMT). It also examines some "open systems", which allow the correction or augmentation of content by the users themselves ("crowdsourced translation"). The authors of the article, native speakers presenting different countries (Russia, Greece, Malaysia, Japan and Serbia), tested the translation quality of the most representative phrases from the English, Russian, Greek, Malay and Japanese languages by using different machine translation systems: PROMT (RBMT), Yandex. Translate (HMT) and Google Translate (NMT). The test results presented by the authors show low "comprehension level" of semantic, linguistic and pragmatic contexts of translated texts, mistranslations of rare and culture-specific words,unnecessary translation of proper names, as well as a low rate of idiomatic phrase and metaphor recognition. It is argued that the development of machine translation requires incorporation of literal, conceptual, and content-and-contextual forms of meaning processing into text translation expansion of metaphor corpora and contextological dictionaries, and implementation of different types and styles of translation, which take into account gender peculiarities, specific dialects and idiolects of users. The problem of untranslatability ('linguistic relativity') of the concepts, unique to a particular culture, has been reviewed from the perspective of machine translation. It has also been shown, that the translation of booming Internet slang, where national languages merge with English, is almost impossible without human correction.

Author(s):  
A.V. Kozina ◽  
Yu.S. Belov

Automatically assessing the quality of machine translation is an important yet challenging task for machine translation research. Translation quality assessment is understood as predicting translation quality without reference to the source text. Translation quality depends on the specific machine translation system and often requires post-editing. Manual editing is a long and expensive process. Since the need to quickly determine the quality of translation increases, its automation is required. In this paper, we propose a quality assessment method based on ensemble supervised machine learning methods. The bilingual corpus WMT 2019 for the EnglishRussian language pair was used as data. The text data volume is 17089 sentences, 85% of the data was used for training, and 15% for testing the model. Linguistic functions extracted from the text in the source and target languages were used as features for training the system, since it is these characteristics that can most accurately characterize the translation in terms of quality. The following tools were used for feature extraction: a free language modeling tool based on SRILM and a Stanford POS Tagger parts of speech tagger. Before training the system, the text was preprocessed. The model was trained using three regression methods: Bagging, Extra Tree, and Random Forest. The algorithms were implemented in the Python programming language using the Scikit learn library. The parameters of the random forest method have been optimized using a grid search. The performance of the model was assessed by the mean absolute error MAE and the root mean square error RMSE, as well as by the Pearsоn coefficient, which determines the correlation with human judgment. Testing was carried out using three machine translation systems: Google and Bing neural systems, Mouses statistical machine translation systems based on phrases and based on syntax. Based on the results of the work, the method of additional trees showed itself best. In addition, for all categories of indicators under consideration, the best results are achieved using the Google machine translation system. The developed method showed good results close to human judgment. The system can be used for further research in the task of assessing the quality of translation.


2021 ◽  
Vol 284 ◽  
pp. 08001
Author(s):  
Ilya Ulitkin ◽  
Irina Filippova ◽  
Natalia Ivanova ◽  
Alexey Poroykov

We report on various approaches to automatic evaluation of machine translation quality and describe three widely used methods. These methods, i.e. methods based on string matching and n-gram models, make it possible to compare the quality of machine translation to reference translation. We employ modern metrics for automatic evaluation of machine translation quality such as BLEU, F-measure, and TER to compare translations made by Google and PROMT neural machine translation systems with translations obtained 5 years ago, when statistical machine translation and rule-based machine translation algorithms were employed by Google and PROMT, respectively, as the main translation algorithms [6]. The evaluation of the translation quality of candidate texts generated by Google and PROMT with reference translation using an automatic translation evaluation program reveal significant qualitative changes as compared with the results obtained 5 years ago, which indicate a dramatic improvement in the work of the above-mentioned online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern systems of automatic evaluation of translation quality allow errors made by machine translation systems to be identified and systematized, which will enable the improvement of the quality of translation by these systems in the future.


2002 ◽  
Vol 01 (02) ◽  
pp. 349-366 ◽  
Author(s):  
FUJI REN ◽  
HONGCHI SHI

One of the most difficult problems in dialogue machine translation is to correctly translate irregular expressions in natural conversations such as ungrammatical, incomplete, or ill-formed sentences. However, most existing machine translation systems reject utterances including irregular expressions. In this paper, we present a dialogue machine translation approach based on a cooperative distributed natural language processing model to attack the complex machine translation problem. In this approach, different types of translation processors are used in the analysis of the original language and the generation of the target language. The idea of combining multiple machine translation engines provides a new effective way to increase the success rate and quality of dialogue machine translation. A dialogue machine translation using multiple processors (DMTMP) system has been built using the following machine translation processors: (i) Robust Parser based Translation Processor, (ii) Example based Translation Processor, (iii) Family Modal based Translation Processor, and (iv) Super Function based Translation Processor. DMTMP is used in a practical machine translation environment called SWKJC. Experiments show that the approach presented in this paper is effective in implementation of robust dialogue machine translation systems.


Author(s):  
Yana Fedorko ◽  
Tetiana Yablonskaya

The article is focused on peculiarities of English and Chinese political discourse translation into Ukrainian. The advantages and disadvantages of machine translation are described on the basis of linguistic analysis of online Google Translate and M-Translate systems. The reasons of errors in translation are identified and the need of post-correction to improve the quality of translation is wanted. Key words: political discourse, automatic translation, online machine translation systems, machine translation quality assessment.


Author(s):  
Raj Dabre ◽  
Atsushi Fujita

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single-layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way for improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.


2018 ◽  
Vol 8 (6) ◽  
pp. 3512-3514
Author(s):  
D. Chopra ◽  
N. Joshi ◽  
I. Mathur

Machine translation (MT) has been a topic of great research during the last sixty years, but, improving its quality is still considered an open problem. In the current paper, we will discuss improvements in MT quality by the use of the ensemble approach. We performed MT from English to Hindi using 6 MT different engines described in this paper. We found that the quality of MT is improved by using a combination of various approaches as compared to the simple baseline approach for performing MT from source to target text.


Author(s):  
Yang Zhao ◽  
Jiajun Zhang ◽  
Yu Zhou ◽  
Chengqing Zong

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KGs enhanced NMT method. Specifically, we first induce the new translation results of these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo parallel sentence pairs that contain these induced entity pairs. Finally, NMT model is jointly trained by the original and pseudo sentence pairs. The extensive experiments on Chinese-to-English and Englishto-Japanese translation tasks demonstrate that our method significantly outperforms the strong baseline models in translation quality, especially in handling the induced entities.


2018 ◽  
Vol 34 (4) ◽  
pp. 752-771
Author(s):  
Chen-li Kuo

Abstract Statistical approaches have become the mainstream in machine translation (MT), for their potential in producing less rigid and more natural translations than rule-based approaches. However, on closer examination, the uses of function words between statistical machine-translated Chinese and the original Chinese are different, and such differences may be associated with translationese as discussed in translation studies. This article examines the distribution of Chinese function words in a comparable corpus consisting of MTs and the original Chinese texts extracted from Wikipedia. An attribute selection technique is used to investigate which types of function words are significant in discriminating between statistical machine-translated Chinese and the original texts. The results show that statistical MT overuses the most frequent function words, even when alternatives exist. To improve the quality of the end product, developers of MT should pay close attention to modelling Chinese conjunctions and adverbial function words. The results also suggest that machine-translated Chinese shares some characteristics with human-translated texts, including normalization and being influenced by the source language; however, machine-translated texts do not exhibit other characteristics of translationese such as explicitation.


2020 ◽  
Vol 30 (01) ◽  
pp. 2050002
Author(s):  
Taichi Aida ◽  
Kazuhide Yamamoto

Current methods of neural machine translation may generate sentences with different levels of quality. Methods for automatically evaluating translation output from machine translation can be broadly classified into two types: a method that uses human post-edited translations for training an evaluation model, and a method that uses a reference translation that is the correct answer during evaluation. On the one hand, it is difficult to prepare post-edited translations because it is necessary to tag each word in comparison with the original translated sentences. On the other hand, users who actually employ the machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without using human post-edited sentences and in the test set, estimates the quality of output sentences without using reference translations. We define some indices and predict the quality of translations with a regression model. For the quality of the translated sentences, we employ the BLEU score calculated from the number of word [Formula: see text]-gram matches between the translated sentence and the reference translation. After that, we compute the correlation between quality scores predicted by our method and BLEU actually computed from references. According to the experimental results, the correlation with BLEU is the highest when XGBoost uses all the indices. Moreover, looking at each index, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important in BLEU estimation.


2018 ◽  
Vol 6 ◽  
pp. 145-157 ◽  
Author(s):  
Zaixiang Zheng ◽  
Hao Zhou ◽  
Shujian Huang ◽  
Lili Mou ◽  
Xinyu Dai ◽  
...  

Existing neural machine translation systems do not explicitly model what has been translated and what has not during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts: translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, which provides Neural Machine Translation (NMT) systems with the knowledge of translated and untranslated contents. Experimental results show that the proposed approach significantly improves the performance in Chinese-English, German-English, and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in terms of both the translation quality and the alignment error rate.


Sign in / Sign up

Export Citation Format

Share Document