Experimental study of the impact of using a neural machine translation engine on the quality of translation of texts in the field of pharmacognosy

The article examines how using the neural machine translation system Google Translate affects the quality of translation of texts in the field of pharmacognosy. Today a translator's work is hard to imagine without information and communication technologies, among which machine translation holds an important place. Neural machine translation systems are considered to translate at a fairly high level, so their use by a human translator could be expected to have a positive effect. The aim of the study was therefore to determine experimentally how a neural machine translation system affects the quality of translation of pharmacognosy texts, measured by the number of errors and the correctness of translated terminology. The article formulates a research hypothesis, describes the text chosen for the study and the neural machine translation system selected, sets out the procedure for counting errors in the translations and calculating the percentage of correctly translated terminology, and presents quantitative experimental data, with the results illustrated in tables and figures. The experiment was conducted in the first semester of the 2020/2021 academic year (September) on an excerpt from a pharmacognosy text, which was translated by Google Translate and by a bachelor's-level translation student. Both translations were checked for the number and types of errors and for the correctness of domain-specific terminology.
The results refuted our hypothesis: the translation produced by Google Translate was worse than the student's, both in the number of errors and in the percentage of correctly translated terminology.
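
The percentage of correctly translated terminology mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the simplest reading: correctly translated terms divided by all terms in the excerpt. The function name and the counts are hypothetical and are not data from the study.

```python
def terminology_correctness(total_terms: int, correct_terms: int) -> float:
    """Return the share of correctly translated terms as a percentage.

    Assumed formula (not stated in the abstract): 100 * correct / total.
    """
    if total_terms <= 0:
        raise ValueError("total_terms must be positive")
    if not 0 <= correct_terms <= total_terms:
        raise ValueError("correct_terms must lie between 0 and total_terms")
    return 100.0 * correct_terms / total_terms


# Hypothetical counts for illustration only; they are NOT the study's results.
student_score = terminology_correctness(total_terms=40, correct_terms=34)
machine_score = terminology_correctness(total_terms=40, correct_terms=30)

print(f"student: {student_score:.1f}%")
print(f"machine: {machine_score:.1f}%")
```

The same percentage can be computed for each translation and compared alongside the raw error counts.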

2020 ◽  
Vol 30 (01) ◽  
pp. 2050002
Author(s):  
Taichi Aida ◽  
Kazuhide Yamamoto

Current methods of neural machine translation may generate sentences of varying quality. Methods for automatically evaluating machine translation output can be broadly classified into two types: those that use human post-edited translations to train an evaluation model, and those that use a correct reference translation at evaluation time. On the one hand, post-edited translations are difficult to prepare because each word must be tagged by comparison with the original translated sentences. On the other hand, users who actually employ a machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without human post-edited sentences and, on the test set, estimates the quality of output sentences without reference translations. We define several indices and predict translation quality with a regression model. As the measure of quality for the translated sentences, we employ the BLEU score calculated from the number of word n-gram matches between the translated sentence and the reference translation. We then compute the correlation between the quality scores predicted by our method and the BLEU scores actually computed from references. According to the experimental results, the correlation with BLEU is highest when XGBoost uses all the indices. Moreover, looking at each index, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important in BLEU estimation.
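
The two measurements named in this abstract, word n-gram matching and the correlation between predicted and reference-based scores, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it computes only the clipped n-gram precision that BLEU builds on (full BLEU also combines several n-gram orders and applies a brevity penalty), it assumes whitespace tokenization, and it assumes Pearson correlation, since the abstract does not say which correlation was used. The function names and sample values are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against one reference.

    Each hypothesis n-gram is credited at most as many times as it
    occurs in the reference. Tokenization is a plain whitespace split.
    """
    hyp_counts = ngrams(hypothesis.split(), n)
    ref_counts = ngrams(reference.split(), n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matches / total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical sentences and scores, for illustration only.
hyp = "the cat sat on the mat"
ref = "the cat is on the mat"
print(clipped_precision(hyp, ref, 1))  # unigram precision
print(clipped_precision(hyp, ref, 2))  # bigram precision

predicted_quality = [0.21, 0.35, 0.50, 0.42]  # made-up model predictions
reference_bleu = [0.18, 0.40, 0.55, 0.39]     # made-up reference-based scores
print(pearson(predicted_quality, reference_bleu))
```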


2019 ◽  
Vol 252 ◽  
pp. 03006
Author(s):  
Ualsher Tukeyev ◽  
Aidana Karibayeva ◽  
Balzhan Abduali

Large parallel corpora are lacking for the Kazakh language. This problem seriously impairs the quality of machine translation from and into Kazakh. This article considers neural machine translation of the Kazakh language on the basis of synthetic corpora. Kazakh belongs to the Turkic languages, which are characterised by rich morphology, and neural machine translation of natural languages requires large amounts of training data. The article presents a model for creating synthetic corpora, namely the generation of sentences based on complete suffixes for the Kazakh language. The novelty of this approach is that sentences are generated on the basis of the complete system of suffixes of the Kazakh language. By using the generated synthetic corpora, we improve translation quality in neural machine translation for the Kazakh-English and Kazakh-Russian pairs.


Author(s):  
A.V. Kozina ◽  
Yu.S. Belov

Automatically assessing the quality of machine translation is an important yet challenging task for machine translation research. Translation quality assessment is understood here as predicting translation quality without a reference translation. Translation quality depends on the specific machine translation system, and the output often requires post-editing. Manual editing is a long and expensive process. As the need to determine translation quality quickly grows, the assessment must be automated. In this paper, we propose a quality assessment method based on ensemble supervised machine learning methods. The WMT 2019 bilingual corpus for the English-Russian language pair was used as data. The text data comprises 17089 sentences; 85% of the data was used for training and 15% for testing the model. Linguistic features extracted from the text in the source and target languages were used to train the system, since these characteristics most accurately describe the translation in terms of quality. The following tools were used for feature extraction: a free language modeling tool based on SRILM and the Stanford POS Tagger. Before training the system, the text was preprocessed. The model was trained using three regression methods: Bagging, Extra Trees, and Random Forest. The algorithms were implemented in the Python programming language using the scikit-learn library. The parameters of the Random Forest method were optimized using a grid search. The performance of the model was assessed by the mean absolute error (MAE) and the root mean square error (RMSE), as well as by the Pearson coefficient, which measures the correlation with human judgment. Testing was carried out using three machine translation systems: the Google and Bing neural systems and the Moses statistical machine translation system in its phrase-based and syntax-based variants. Of the three regression methods, Extra Trees performed best.
In addition, for all categories of indicators under consideration, the best results were achieved with the Google machine translation system. The developed method showed good results, close to human judgment. The system can be used for further research on translation quality assessment.
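
The two error metrics named in this abstract follow their standard definitions, sketched below in plain Python. The abstract states that the authors worked with scikit-learn; this stdlib-only version is an assumption made to keep the example self-contained, and the function names and sample scores are hypothetical.

```python
from math import sqrt


def mean_absolute_error(predicted, actual):
    """MAE: average absolute difference between predictions and targets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_squared_error(predicted, actual):
    """RMSE: square root of the average squared difference."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))


# Hypothetical quality scores, for illustration only.
model_scores = [3, 5, 2, 4]  # made-up predictions of the regression model
human_scores = [2, 5, 4, 4]  # made-up human judgments

print(mean_absolute_error(model_scores, human_scores))
print(root_mean_squared_error(model_scores, human_scores))
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both.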


2016 ◽  
Vol 5 (4) ◽  
pp. 51-66 ◽  
Author(s):  
Krzysztof Wolk ◽  
Krzysztof P. Marasek

The quality of machine translation is rapidly evolving. Today one can find several machine translation systems on the web that provide reasonable translations, although the systems are not perfect. In some specific domains, the quality may decrease. A recently proposed approach to this problem is neural machine translation. It aims at building a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training neural-network-based and statistical translation systems. Comparing these systems and implementing a medical translator are the main focus of our experiments.


2019 ◽  
Author(s):  
Xinze Guo ◽  
Chang Liu ◽  
Xiaolong Li ◽  
Yiran Wang ◽  
Guoliang Li ◽  
...  

2019 ◽  
Author(s):  
Miguel Domingo ◽  
Mercedes García-Martínez ◽  
Amando Estela Pastor ◽  
Laurent Bié ◽  
Alexander Helle ◽  
...  
