Function words in statistical machine-translated Chinese and original Chinese: A study into the translationese of machine translation systems

2018
Vol 34 (4)
pp. 752-771
Author(s):  
Chen-li Kuo

Abstract Statistical approaches have become the mainstream in machine translation (MT) owing to their potential to produce less rigid and more natural translations than rule-based approaches. On closer examination, however, the use of function words differs between statistical machine-translated Chinese and original Chinese, and such differences may be associated with translationese as discussed in translation studies. This article examines the distribution of Chinese function words in a comparable corpus consisting of MT output and original Chinese texts extracted from Wikipedia. An attribute selection technique is used to investigate which types of function words are significant in discriminating between statistical machine-translated Chinese and the original texts. The results show that statistical MT overuses the most frequent function words, even when alternatives exist. To improve the quality of the end product, MT developers should pay close attention to modelling Chinese conjunctions and adverbial function words. The results also suggest that machine-translated Chinese shares some characteristics with human-translated texts, including normalization and influence from the source language; however, machine-translated texts do not exhibit other characteristics of translationese, such as explicitation.

2018
Vol 7 (4.36)
pp. 542
Author(s):  
T. K. Bijimol
John T. Abraham

Malayalam is an Indian language that is highly agglutinative and morphologically rich. These linguistic characteristics of Malayalam affect the quality of every kind of Malayalam machine translation system. Translations of causative sentences from Malayalam to English and from English to Malayalam were analysed using Google's translation system, and causative sentence translation between these languages was found to be inadequate. This paper discusses the concept and method of causative sentence handling in Malayalam-to-English and English-to-Malayalam machine translation systems. A rule-based system is proposed here to handle causative sentences in both languages.


2019
Vol 28 (3)
pp. 447-453
Author(s):  
Sainik Kumar Mahata
Dipankar Das
Sivaji Bandyopadhyay

Abstract Machine translation (MT) is the automatic translation of text from a source language into a target language by a computer system. In this paper, we propose an approach that uses recurrent neural networks (RNNs) in combination with traditional statistical MT (SMT). We compare the performance of the SMT phrase table with that of the proposed RNN and, in turn, improve the quality of the MT output. This work was carried out as part of the MTIL2017 shared task. We constructed the traditional MT model using the Moses toolkit and additionally enriched the language model with external data sets. We then ranked the phrase tables using an RNN encoder-decoder module originally created as part of the GroundHog project of the LISA lab.


Author(s):  
A.V. Kozina
Yu.S. Belov

Automatically assessing the quality of machine translation is an important yet challenging task in machine translation research. Translation quality assessment is understood here as predicting translation quality without access to reference translations. Translation quality depends on the specific machine translation system, and the output often requires post-editing. Manual editing is a long and expensive process, and as the need to determine translation quality quickly grows, automation is required. In this paper, we propose a quality assessment method based on ensemble supervised machine learning methods. The WMT 2019 bilingual corpus for the English-Russian language pair was used as data. The dataset contains 17,089 sentences; 85% of the data was used for training and 15% for testing the model. Linguistic features extracted from the source- and target-language texts were used to train the system, since these characteristics can most accurately characterize a translation in terms of quality. The following tools were used for feature extraction: a free language modelling tool based on SRILM and the Stanford POS Tagger. The text was preprocessed before training. The model was trained using three regression methods: Bagging, Extra Trees, and Random Forest. The algorithms were implemented in Python using the scikit-learn library. The parameters of the random forest method were optimized using a grid search. The performance of the model was assessed by the mean absolute error (MAE) and the root mean square error (RMSE), as well as by the Pearson correlation coefficient, which measures the correlation with human judgment. Testing was carried out using three machine translation systems: the Google and Bing neural systems, and the Moses statistical machine translation system in its phrase-based and syntax-based variants. Of the three regression methods, Extra Trees performed best.
In addition, for all the metrics considered, the best results were achieved with the Google machine translation system. The developed method produced results close to human judgment, and the system can be used in further research on translation quality assessment.
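The three evaluation measures named above can be made concrete with a short sketch. This is a minimal illustration in plain Python, not the authors' code (the abstract says they used scikit-learn); the helper name `qe_metrics` and the sample scores are hypothetical, and the sketch assumes predicted and human quality scores arrive as paired lists of numbers.

```python
import math


def qe_metrics(predicted, human):
    """Return (MAE, RMSE, Pearson r) for paired quality scores.

    Hypothetical helper: `predicted` are model outputs and `human` are
    human judgments for the same sentences (illustrative data only).
    """
    if len(predicted) != len(human) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    n = len(predicted)

    # Signed errors between prediction and human score
    errors = [p - h for p, h in zip(predicted, human)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson r: covariance divided by the product of standard deviations
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("Pearson r is undefined for constant scores")
    r = cov / math.sqrt(var_p * var_h)
    return mae, rmse, r
```

For instance, `qe_metrics([1, 2, 3], [2, 4, 6])` yields an MAE of 2.0 and a Pearson r of 1.0: the predictions are perfectly correlated with the human scores yet consistently off, which is why error measures and correlation are reported side by side.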


2020
Author(s):  
Adrián Fuentes-Luque
Alexandra Santamaría Urbieta

Computer-assisted translation tools are increasingly supplemented by machine translation (MT) in different areas and working environments, from technical translation to translation in international organizations. MT is also present in the translation of tourism texts, from brochures to food menus, websites and tourist guides. Whether it is needed or suitable for such use is the subject of growing debate. This article presents a comparative analysis of tourist guides translated by a human translator and by three machine translation systems. The aims are to make an initial assessment of the quality of machine translation of tourist texts and to establish whether some tourist texts can be translated using machine translation alone or whether human participation is necessary, either for the complete translation of the text or only for post-editing.


2012
Vol 5
Author(s):  
Manny Rayner
Pierrette Bouillon
Paula Estrella
Yukie Nakao
Gwen Christian

We describe a series of experiments in which we start with English-to-French and English-to-Japanese versions of a rule-based speech translation system for a medical domain and bootstrap corresponding statistical systems. Comparative evaluation reveals that the statistical systems are still slightly inferior to the rule-based ones, despite considerable effort having been invested in tuning both the recognition and translation components; however, a hybrid system is able to deliver a small but significant improvement in performance. In conclusion, we suggest that the hybrid architecture we describe potentially allows the construction of limited-domain speech translation systems that combine substantial source-language coverage with high-precision translation.


2017
Vol 108 (1)
pp. 109-120
Author(s):  
Sheila Castilho
Joss Moorkens
Federico Gaspari
Iacer Calixto
John Tinsley
...  

Abstract This paper discusses neural machine translation (NMT), a new paradigm in the MT field, comparing the quality of NMT systems with that of statistical MT systems by describing three studies using automatic and human evaluation methods. Automatic evaluation results presented for NMT are very promising; however, human evaluations show mixed results. We report increases in fluency but inconsistent results for adequacy and post-editing effort. NMT undoubtedly represents a step forward for the MT field, but one that the community should be careful not to oversell.


Author(s):  
Hidayatul Khoiriyah

The development of technology has had a great impact on human life. Machine translation is one result of technological advances that aim to help people translate from one language into another. This research examines the quality of Google Translate in terms of vocabulary accuracy, clarity, and reasonableness of meaning. The mufradāt (vocabulary) data were taken from several Arabic translation dictionaries, and the text was taken from Dr. Aidh Qorni's acclaimed book Lā Tahzan. The method used in this research is translation criticism.

The results show that, in terms of the accuracy of vocabulary and terms, Google Translate has good translation quality. In terms of clarity and reasonableness of meaning, however, Google Translate has not been able to convey the ideas of the source language well in the target language. Furthermore, in terms of grammar, Google Translate's output lacks a sound grammatical structure and does not conform to the rules of the target language, Indonesian.

The data show that Google Translate should not be used as the basis for translating Arabic texts into Indonesian, especially when translating verses of the Qur'ān and Hadīts. Beginner translators should rely on a dictionary rather than Google Translate in order to develop their translation ability.

Key Words: Translation, Google Translate, Arabic


2021
Vol 284
pp. 08001
Author(s):  
Ilya Ulitkin
Irina Filippova
Natalia Ivanova
Alexey Poroykov

We report on various approaches to the automatic evaluation of machine translation quality and describe three widely used methods. These methods, which are based on string matching and n-gram models, make it possible to compare machine translation output with a reference translation. We employ modern metrics for the automatic evaluation of machine translation quality, such as BLEU, F-measure, and TER, to compare translations produced by the Google and PROMT neural machine translation systems with translations obtained 5 years ago, when Google and PROMT employed statistical and rule-based machine translation algorithms, respectively, as their main translation algorithms [6]. Evaluating the candidate texts generated by Google and PROMT against a reference translation with an automatic evaluation program reveals significant qualitative changes compared with the results obtained 5 years ago, indicating a dramatic improvement in these online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern automatic systems for evaluating translation quality make it possible to identify and systematize the errors made by machine translation systems, which will help improve the quality of their translations in the future.
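The n-gram matching on which BLEU rests can be sketched as follows. This is a simplified, hypothetical single-reference, sentence-level variant without smoothing, written for illustration only; it is not the evaluation program used in the study, and published BLEU scores are normally computed at corpus level with a standard tool. The names `ngrams` and `sentence_bleu` are assumptions of this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list (empty if too short)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU of one candidate against one reference.

    Combines clipped n-gram precisions for n = 1..max_n by geometric
    mean and multiplies by a brevity penalty. No smoothing: if any
    precision is zero (or undefined), the score is 0.0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))

    geo_mean = math.exp(sum(log_precisions) / max_n)
    # Brevity penalty: candidates shorter than the reference are penalized
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean
```

With these definitions, `sentence_bleu("the cat sat on the mat", "the cat sat on a mat")` comes out at about 0.54, while a two-word candidate scores 0.0 because it contains no 3-grams or 4-grams; this lack of smoothing is why sentence-level BLEU is fragile on short outputs.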


2021
Vol 7 (Extra-C)
pp. 714-721
Author(s):  
Zulfiya Akhatovna Usmanova
Ekaterina Nikolayevna Zudilova
Pavel Alekseevich Arkatov
Nataliaya Grigorievna Vitkovskaya
Ekaterina Vladimirovna Kravets

A defining feature of the modern translation market is the need to translate large volumes of technical texts and business documents in the shortest possible time. The purpose of the study is to conduct an experiment on the impact of machine translation systems (specifically, the use of term bases) on the efficiency of future translators. The study provides a literature review of the problem and presents the advantages of computer-assisted translation tools in translation practice. Based on the experiment, the influence of computer-assisted translation tools on the quality of student translators' written translations was analysed.

