AN ONLINE REPOSITORY OF PYTHON RESOURCES FOR TEACHING MACHINE TRANSLATION TO TRANSLATION STUDENTS

2021 ◽  
Vol 8 ◽  
pp. 4-30
Author(s):  
Ralph Krüger

This paper presents an online repository of Python resources aimed at teaching the technical dimension of machine translation to students of translation studies programmes. The Python resources provided in this repository are Jupyter notebooks. These are web-based computational environments in which students can run commented blocks of code in order to perform MT-related tasks such as exploring word embeddings, preparing MT training data, training open-source machine translation systems or calculating automatic MT quality metrics such as BLEU, METEOR, BERTScore or COMET. The notebooks are prepared in such a way that students can interact with them even if they have had little to no prior exposure to the Python programming language. The notebooks are provided as open-source resources under the MIT License and can be used and modified by translator training institutions which intend to make their students familiar with the more technical aspects of modern machine translation technology. Institutions who would like to contribute their own Python-based teaching resources to the repository are welcome to do so.
Keywords: translation technology, machine translation, natural language processing, translation didactics, Jupyter notebooks, Python programming
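To give a flavour of the metric calculations such notebooks walk students through, here is a minimal sentence-level BLEU sketch. This is a simplified, self-contained illustration (no brevity-penalty clipping subtleties or smoothing), not the notebooks' actual code:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    """Plain BLEU: geometric mean of modified n-gram precisions x brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty n-gram match zeroes the geometric mean
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_avg)

# A perfect hypothesis scores 1.0; a truncated one scores lower.
print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

In classroom use, a library implementation (e.g. sacreBLEU) would replace this sketch, but the hand-rolled version makes the modified-precision and brevity-penalty components visible to students.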



Author(s):  
Oleg Kuzmin

The modern world is moving towards global digitalization and accelerated software development, with a clear tendency to replace human resources with digital services or programs that imitate the performance of similar tasks. There is no doubt that, long term, the use of such technologies has economic benefits for enterprises and companies. Despite this, however, the quality of the final result is often less than satisfactory, and machine translation systems are no exception, as editing texts translated by online translation services is still a demanding task. At the moment, producing high-quality translations using only machine translation systems remains impossible for multiple reasons, chief among them the complexities of natural language: the existence of sublanguages, abstract words, polysemy, etc. Since improving the quality of machine translation systems is one of the priorities of natural language processing (NLP), this article describes current trends in developing modern machine translation systems as well as the latest advances in the field, and gives suggestions about software innovations that would minimize the number of errors. Even though recent years have seen a significant breakthrough in the speed of information analysis, this will in all probability not be a priority issue in the future. The main criteria for evaluating the quality of translated texts will be their semantic coherence and the semantic accuracy of the lexical material used. To improve machine translation systems, we should introduce elements of data differentiation and personalization of information for individual users and their tasks, employing thematic modeling to determine the subject area of a particular text. Currently, there are algorithms based on deep learning that are able to perform these tasks.
However, the process of identifying unique lexical units requires a more detailed linguistic description of their semantic features. The parsing methods used in analyzing texts should also provide for the possibility of clustering by sublanguage. Creating automated electronic dictionaries for specific fields of professional knowledge will help improve the quality of machine translation systems. Notably, to date there have been no successful projects to create dictionaries for machine translation systems for specific sublanguages. Thus, there is a need to develop such dictionaries and to integrate them into existing online translation systems.
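The subject-area identification step the authors propose can be sketched very simply: score a text against per-domain term lists and pick the best match. The domain lexicons below are invented for illustration; a real system would use learned topic models over large corpora rather than hand-picked keywords:

```python
# Hypothetical mini-lexicons standing in for learned sublanguage vocabularies.
DOMAIN_TERMS = {
    "medical": {"patient", "dose", "clinical", "therapy", "symptom"},
    "legal": {"contract", "clause", "liability", "court", "statute"},
}

def guess_domain(text):
    """Score each domain by overlap between its term set and the text's tokens."""
    tokens = {t.strip(".,;:!?").lower() for t in text.split()}
    scores = {d: len(tokens & terms) for d, terms in DOMAIN_TERMS.items()}
    return max(scores, key=scores.get)

print(guess_domain("The patient received a reduced dose during therapy."))  # medical
```

Routing a text to a domain-specific dictionary or engine based on such a classification is the kind of personalization the abstract argues for.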


2014 ◽  
Vol 102 (1) ◽  
pp. 69-80 ◽  
Author(s):  
Torregrosa Daniel ◽  
Forcada Mikel L. ◽  
Pérez-Ortiz Juan Antonio

Abstract We present a web-based open-source tool for interactive translation prediction (ITP) and describe its underlying architecture. ITP systems assist human translators by making context-based, computer-generated suggestions as they type. Most ITP systems in the literature are strongly coupled with a statistical machine translation system that is conveniently adapted to provide the suggestions. Our system, however, follows a resource-agnostic approach: suggestions are obtained from any unmodified black-box bilingual resource. This paper reviews our ITP method and describes the architecture of Forecat, a web tool, partly based on the recent technology of web components, that eases the use of our ITP approach in any web application requiring this kind of translation assistance. We also evaluate the performance of our method when using an unmodified Moses-based statistical machine translation system as the bilingual resource.
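The resource-agnostic idea can be sketched as follows: look up translations of subsegments of the source sentence in any black-box bilingual resource, then offer as suggestions those compatible with what the translator has typed so far. The lookup table and matching rule here are invented for illustration and are much simpler than Forecat's actual method:

```python
# A toy stand-in for an "unmodified black-box bilingual resource": any
# source-segment -> target-segment lookup would do; these entries are invented.
RESOURCE = {
    "machine translation": "traduction automatique",
    "translation": "traduction",
    "systems": "systèmes",
}

def suggestions(source, typed):
    """Propose continuations: translations of source subsegments whose start
    matches the partial word the translator is currently typing."""
    partial = typed.split()[-1] if typed and not typed.endswith(" ") else ""
    candidates = {RESOURCE[seg] for seg in RESOURCE if seg in source}
    return sorted(c for c in candidates if c.startswith(partial) and c != partial)

print(suggestions("machine translation systems", "les trad"))
# ['traduction', 'traduction automatique']
```

Because the resource is only queried, never modified, it can be swapped for a dictionary, a translation memory, or a full SMT engine, which is what the paper's Moses-based evaluation does.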


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Michael Adjeisah ◽  
Guohua Liu ◽  
Douglas Omwenga Nyabuga ◽  
Richard Nuetey Nortey ◽  
Jinling Song

Scaling natural language processing (NLP) to low-resource languages to improve machine translation (MT) performance remains challenging. This research contributes to the domain with a low-resource English-Twi translation study based on filtered synthetic parallel corpora. It is often difficult to learn what a good-quality corpus looks like in low-resource conditions, especially where the target corpus is the only sample text of the parallel language. To improve MT performance for such low-resource language pairs, we propose to expand the training data by injecting a synthetic parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we performed unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we extensively use three different sentence-level similarity metrics after round-trip translation. Experimental results on a diverse range of available parallel corpora demonstrate that injecting a pseudo-parallel corpus and extensive filtering with sentence-level similarity metrics significantly improve the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach exhibits substantial gains in BLEU and TER scores.
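The squared-Mahalanobis filtering step can be sketched as an outlier filter over sentence-pair feature vectors: pairs far from the bulk of the distribution (after whitening by the inverse covariance) are discarded. The random features and 80th-percentile cutoff below are illustrative stand-ins; the paper's actual features come from sentence representations of the candidate pairs:

```python
import numpy as np

def squared_mahalanobis(x, mean, cov_inv):
    """Squared Mahalanobis distance of x from the sample mean."""
    d = x - mean
    return float(d @ cov_inv @ d)

# Toy "sentence-pair features"; real systems would derive these from
# sentence embeddings of each candidate source-target pair.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 4))          # 100 candidate pairs, 4-dim features
mean = features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(features, rowvar=False))

scores = np.array([squared_mahalanobis(f, mean, cov_inv) for f in features])
threshold = np.percentile(scores, 80)         # keep the 80% most typical pairs
kept = features[scores <= threshold]
print(len(kept))  # 80
```

Unlike a supervised parallelism classifier, this filter needs no labelled pairs, which is exactly why it suits the low-resource setting the abstract describes.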


2020 ◽  
Vol 184 ◽  
pp. 01061
Author(s):  
Anusha Anugu ◽  
Gajula Ramesh

Machine translation has gradually developed in past 1940’s.It has gained more and more attention because of effective and efficient nature. As it makes the translation automatically without the involvement of human efforts. The distinct models of machine translation along with “Neural Machine Translation (NMT)” is summarized in this paper. Researchers have previously done lots of work on Machine Translation techniques and their evaluation techniques. Thus, we want to demonstrate an analysis of the existing techniques for machine translation including Neural Machine translation, their differences and the translation tools associated with them. Now-a-days the combination of two Machine Translation systems has the full advantage of using features from both the systems which attracts in the domain of natural language processing. So, the paper also includes the literature survey of the Hybrid Machine Translation (HMT).


2017 ◽  
Vol 109 (1) ◽  
pp. 39-50 ◽  
Author(s):  
Matīss Rikters ◽  
Mark Fishel ◽  
Ondřej Bojar

Abstract In this article, we describe a tool for visualizing the output and attention weights of neural machine translation systems and for estimating confidence about the output based on the attention. Our aim is to help researchers and developers better understand the behaviour of their NMT systems without the need for any reference translations. Our tool includes command line and web-based interfaces that allow users to systematically evaluate translation outputs from various engines and experiments. We also present a web demo of our tool with examples of good and bad translations: http://ej.uz/nmt-attention.
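One way to turn attention weights into a reference-free confidence signal is to measure how concentrated each target token's attention distribution is: sharp rows suggest a confident alignment, flat rows suggest uncertainty. The entropy-based scoring rule below is our own illustration of the idea, not the exact formula the tool uses:

```python
import numpy as np

def attention_confidence(attn):
    """Heuristic confidence in [0, 1] from an attention matrix.
    attn[i, j] = weight on source token j when producing target token i
    (rows assumed to sum to 1). 1.0 = every row fully peaked."""
    entropy = -np.sum(attn * np.log(attn + 1e-12), axis=1)   # per-row entropy
    max_entropy = np.log(attn.shape[1])                      # entropy of a uniform row
    return float(1.0 - (entropy / max_entropy).mean())

sharp = np.array([[0.97, 0.01, 0.02],
                  [0.02, 0.96, 0.02]])      # peaked attention -> high confidence
flat = np.full((2, 3), 1 / 3)               # dispersed attention -> low confidence
print(attention_confidence(sharp) > attention_confidence(flat))  # True
```

Scores like this can flag suspicious output segments for inspection without any reference translation, matching the tool's stated aim.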


Literator ◽  
2021 ◽  
Vol 42 (1) ◽  
Author(s):  
Nomsa J. Skosana ◽  
Respect Mlambo

The scarcity of adequate resources for South African languages poses a huge challenge for their functional development in specialised fields such as science and technology. The study examines the Autshumato Machine Translation (MT) Web Service, created by the Centre for Text Technology at the North-West University. This software supports both formal and informal translations as a machine-aided human translation tool. We investigate the system in terms of its advantages and limitations and suggest possible solutions for South African languages. The results show that the system is essential as it offers high-speed translation and operates as an open-source platform. It also provides multiple translations from sentences, documents and web pages. Some South African languages were included whilst others were excluded and we find this to be a limitation of the system. We also find that the system was trained with a limited amount of data, and this has an adverse effect on the quality of the output. The study suggests that adding specialised parallel corpora from various contemporary fields for all official languages and involving language experts in the pre-editing of training data can be a major step towards improving the quality of the system’s output. The study also outlines that developers should consider integrating the system with other natural language processing applications. Finally, the initiatives discussed in this study will help to improve this MT system to be a more effective translation tool for all the official languages of South Africa.


2016 ◽  
Vol 22 (4) ◽  
pp. 549-573 ◽  
Author(s):  
SANJIKA HEWAVITHARANA ◽  
STEPHAN VOGEL

Abstract Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other natural language processing applications. In this paper, we address the task of detecting parallel phrase pairs embedded in comparable sentence pairs. We present a novel phrase alignment approach that is designed to align only the parallel sections of a sentence, bypassing the non-parallel sections. We compare the proposed approach with two other alignment methods: (1) the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment, and (2) a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate the accuracy of these approaches using a manually aligned data set, and show that the proposed approach outperforms the other two. Finally, we demonstrate the effectiveness of the extracted phrase pairs by using them in Arabic–English and Urdu–English translation systems, which resulted in improvements of up to 1.2 BLEU over the baseline. The main contributions of this paper are two-fold: (1) novel phrase alignment algorithms to extract parallel phrase pairs from comparable sentences, and (2) an evaluation of the utility of the extracted phrases by using them directly in the MT decoder.
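The classifier-baseline flavour of this task can be sketched by scoring candidate phrase pairs with a lexical-coverage feature: the fraction of source words that have a dictionary link into the target phrase. The lexicon and threshold below are invented for illustration; the paper's classifier uses richer features and learned weights:

```python
# Toy translation lexicon standing in for a real probabilistic dictionary.
LEXICON = {("house", "casa"), ("white", "blanca"), ("the", "la")}

def pair_score(src_phrase, tgt_phrase):
    """Fraction of source words with a lexicon link into the target phrase --
    a crude stand-in for the classifier features described in the paper."""
    src, tgt = src_phrase.split(), tgt_phrase.split()
    linked = sum(1 for s in src if any((s, t) in LEXICON for t in tgt))
    return linked / len(src)

candidates = [("the white house", "la casa blanca"),
              ("the white house", "perro verde")]
parallel = [(s, t) for s, t in candidates if pair_score(s, t) >= 0.5]
print(parallel)  # [('the white house', 'la casa blanca')]
```

The paper's novel alignment approach improves on such candidate scoring by aligning only the parallel sections within a comparable sentence pair instead of judging whole candidates.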


2016 ◽  
Vol 106 (1) ◽  
pp. 159-168 ◽  
Author(s):  
Julian Hitschler ◽  
Laura Jehl ◽  
Sariya Karimova ◽  
Mayumi Ohta ◽  
Benjamin Körner ◽  
...  

Abstract We present Otedama, a fast, open-source tool for rule-based syntactic pre-ordering, a well-established technique in statistical machine translation. Otedama implements both a learner for pre-ordering rules and a component for applying these rules to parsed sentences. Our system is compatible with several external parsers and capable of accommodating many source and all target languages in any machine translation paradigm that uses parallel training data. We demonstrate improvements on a patent translation task over a state-of-the-art English-Japanese hierarchical phrase-based machine translation system. We compare Otedama with an existing syntax-based pre-ordering system, showing comparable translation performance at a runtime speedup by a factor of 4.5–10.
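The rule-application component can be sketched as follows: a rule matches a node by its label and the labels of its children, and prescribes a new child order; leaves are then read off left to right. The tree encoding and rule format below are our own illustration, not Otedama's actual formats:

```python
def label(node):
    """A node is either a leaf word (str) or a (label, children) pair."""
    return node if isinstance(node, str) else node[0]

# Toy rule: under a VP with children V and NP, emit the object before the
# verb (SVO -> SOV), the kind of reordering useful for English-Japanese.
RULES = {("VP", ("V", "NP")): (1, 0)}

def preorder(node):
    """Recursively reorder children by matching rules, then collect leaf words."""
    if isinstance(node, str):
        return [node]
    lab, children = node
    order = RULES.get((lab, tuple(label(c) for c in children)))
    if order is not None:
        children = [children[i] for i in order]
    return [word for child in children for word in preorder(child)]

tree = ("S", [("NP", ["she"]), ("VP", [("V", ["reads"]), ("NP", ["books"])])])
print(" ".join(preorder(tree)))  # she books reads
```

Learning such rules from parallel data, rather than writing them by hand, is the other half of what Otedama provides.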

