Integration of a Multilingual Preordering Component into a Commercial SMT Platform

2017 ◽  
Vol 108 (1) ◽  
pp. 61-72
Author(s):  
Anita Ramm ◽  
Riccardo Superbo ◽  
Dimitar Shterionov ◽  
Tony O’Dowd ◽  
Alexander Fraser

Abstract We present a multilingual preordering component tailored for a commercial statistical machine translation platform. In commercial settings, issues such as processing speed and the ability to adapt models to customers' needs play a significant role and strongly influence which approaches are added to the custom pipeline to deal with specific problems such as long-range reordering. We developed a fast and customisable preordering component, also available as an open-source tool, which comes with a generic implementation that is restricted neither to the translation platform nor to the machine translation paradigm. We test preordering on three language pairs: English→Japanese/German/Chinese, for both statistical machine translation (SMT) and neural machine translation (NMT). Our experiments confirm previously reported improvements in SMT output when the models are trained on preordered data, but they also show that preordering does not improve NMT.
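Although the abstract does not give the component's rules, the kind of long-range reordering it targets can be illustrated with a toy, hand-written rule for English→Japanese (an SOV language). Everything here, including the rule itself, is a hypothetical sketch, not the authors' implementation:

```python
# Toy preordering sketch for English -> Japanese (SOV word order).
# Hypothetical rule: move every verb to the end of the sentence so that the
# source order better matches the target. Real preordering components derive
# such rules from parse trees; this is only a minimal illustration.

def preorder_sov(tagged_tokens):
    """Reorder (word, pos) pairs: verbs are moved to sentence-final position."""
    verbs = [w for w, pos in tagged_tokens if pos.startswith("V")]
    rest = [w for w, pos in tagged_tokens if not pos.startswith("V")]
    return rest + verbs

sent = [("John", "NNP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN")]
print(preorder_sov(sent))  # ['John', 'an', 'apple', 'ate']
```

Training an SMT system on source text reordered this way shortens the reorderings the decoder itself has to model.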

Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach which performs much better than statistical machine translation (SMT) models when there is an abundance of parallel data. However, vanilla NMT primarily operates at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
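One standard subword technique of the kind the abstract refers to is byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent symbol pair so that frequent strings become single vocabulary units and rare words decompose into known pieces. A minimal sketch of the merge-learning loop on a toy vocabulary (not the paper's data or exact tooling):

```python
import collections
import re

def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(3):
    stats = get_stats(vocab)
    best = max(stats, key=stats.get)   # most frequent pair becomes one symbol
    vocab = merge_vocab(best, vocab)
```

After three merges the frequent suffix "est</w>" has become a single symbol, so an unseen word like "lowest" can still be segmented into known units.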


2014 ◽  
Vol 102 (1) ◽  
pp. 69-80 ◽  
Author(s):  
Daniel Torregrosa ◽  
Mikel L. Forcada ◽  
Juan Antonio Pérez-Ortiz

Abstract We present a web-based open-source tool for interactive translation prediction (ITP) and describe its underlying architecture. ITP systems assist human translators by making context-based, computer-generated suggestions as they type. Most ITP systems in the literature are strongly coupled with a statistical machine translation system that is conveniently adapted to provide the suggestions. Our system, however, follows a resource-agnostic approach: suggestions are obtained from any unmodified black-box bilingual resource. This paper reviews our ITP method and describes the architecture of Forecat, a web tool, partly based on the recent technology of web components, that eases the use of our ITP approach in any web application requiring this kind of translation assistance. We also evaluate the performance of our method when using an unmodified Moses-based statistical machine translation system as the bilingual resource.
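The resource-agnostic idea can be illustrated roughly as follows (this is not Forecat's actual API; the phrase table, the `lookup` callable, and all names are invented): query a black-box bilingual resource for translations of source segments, then keep only the candidates that extend what the translator has already typed.

```python
# Hypothetical sketch of resource-agnostic interactive translation prediction:
# the bilingual resource is an opaque lookup function; suggestions are the
# continuations of candidates compatible with the typed target prefix.

def suggest(source_segments, typed_prefix, lookup):
    """Return candidate continuations compatible with the typed prefix."""
    out = []
    for seg in source_segments:
        for cand in lookup(seg):          # black-box query, never modified
            if cand.startswith(typed_prefix) and cand != typed_prefix:
                out.append(cand[len(typed_prefix):])
    return out

phrase_table = {"maison": ["house", "home"], "la maison": ["the house", "the home"]}
lookup = lambda s: phrase_table.get(s, [])
print(suggest(["la maison", "maison"], "the h", lookup))  # ['ouse', 'ome']
```

Because the resource is only queried, any dictionary, translation memory, or MT system can be plugged in without adaptation.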


Author(s):  
Guillaume Klein ◽  
Yoon Kim ◽  
Yuntian Deng ◽  
Jean Senellart ◽  
Alexander Rush

Author(s):  
Yu Chen ◽  
Andreas Eisele ◽  
Christian Federmann ◽  
Eva Hasler ◽  
Michael Jellinghaus ◽  
...  

2016 ◽  
Vol 5 (4) ◽  
pp. 51-66 ◽  
Author(s):  
Krzysztof Wolk ◽  
Krzysztof P. Marasek

The quality of machine translation is evolving rapidly. Today one can find several machine translation systems on the web that provide reasonable translations, although the systems are not perfect, and in some specific domains the quality may decrease. A recently proposed approach to this problem is neural machine translation. It aims at building a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training neural and statistical machine translation systems. The comparison and implementation of a medical-domain translator is the main focus of our experiments.
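The fixed-length-vector property of the encoder-decoder family mentioned above can be sketched with a toy recurrence. The dimensions, the embeddings, and the update rule here are invented for illustration; real encoders learn their weights:

```python
import numpy as np

# Toy encoder: fold a sentence of any length into one fixed-length vector,
# which is the representation a decoder would then expand into a translation.

rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(4) for w in ["the", "patient", "takes", "aspirin"]}

def encode(tokens):
    """Compress a token sequence into a single fixed-length state vector."""
    h = np.zeros(4)
    for tok in tokens:
        h = np.tanh(h + embed[tok])   # toy recurrence, no learned weights
    return h

v_short = encode(["the", "patient"])
v_long = encode(["the", "patient", "takes", "aspirin"])
assert v_short.shape == v_long.shape == (4,)  # same size regardless of length
```

The bottleneck is visible here: a two-word and a four-word sentence are squeezed into vectors of identical size, which is why later attention-based models let the decoder look back at all encoder states instead.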


2015 ◽  
Vol 104 (1) ◽  
pp. 5-16 ◽  
Author(s):  
Matt Post ◽  
Yuan Cao ◽  
Gaurav Kumar

Abstract We describe the version six release of Joshua, an open-source statistical machine translation toolkit. The main difference from release five is the introduction of a simple, unlexicalized, phrase-based stack decoder. This phrase-based decoder shares a hypergraph format with the syntax-based systems, permitting a tight coupling with the existing codebase of feature functions and hypergraph tools. Joshua 6 also includes a number of large-scale discriminative tuners and a simplified sparse feature function interface with reflection-based loading, which allows new features to be used by writing a single function. Finally, Joshua includes a number of simplifications and improvements focused on usability for both researchers and end-users, including the release of language packs — precompiled models that can be run as black boxes.
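A phrase-based stack decoder of the kind introduced in Joshua 6 groups partial hypotheses into stacks by the number of source words they cover. A heavily simplified, monotone sketch of that search (the phrase table and scores are invented; this is not Joshua's implementation):

```python
import math

# Toy phrase table: source span -> [(target phrase, log-probability)].
phrases = {
    ("das",): [("the", -0.1), ("that", -0.7)],
    ("haus",): [("house", -0.2)],
    ("das", "haus"): [("the house", -0.25)],
}

def decode(source):
    """Monotone phrase-based stack decoding: stack i holds hypotheses
    covering exactly i source words; return the best full translation."""
    n = len(source)
    stacks = [dict() for _ in range(n + 1)]   # output string -> best score
    stacks[0][""] = 0.0
    for covered in range(n):
        for out, score in stacks[covered].items():
            for span_len in range(1, n - covered + 1):
                span = tuple(source[covered:covered + span_len])
                for tgt, lp in phrases.get(span, []):
                    new = (out + " " + tgt).strip()
                    stack = stacks[covered + span_len]
                    if score + lp > stack.get(new, -math.inf):
                        stack[new] = score + lp   # keep best score (recombination)
    return max(stacks[n], key=stacks[n].get)

print(decode(["das", "haus"]))  # 'the house'
```

A real decoder adds non-monotone reordering, pruning per stack, and feature functions scored over a hypergraph, which is exactly what sharing the hypergraph format with the syntax-based systems enables.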


2018 ◽  
Vol 13 (3) ◽  
pp. 486-508
Author(s):  
Federico M. Federici ◽  
Khetam Al Sharou

Abstract Training translators to react to sudden emergencies is a challenge. This article presents the results of a training experiment testing the speed of acquisition of the skills necessary to operate the open-source Moses statistical machine translation (SMT) system. A task-based approach was used with trainee translators who had no experience working with MT technology. The experiment is a feasibility study to ascertain whether training on Moses SMT could be considered for long-lasting crisis scenarios. The article reports its findings in four sections. The first section discusses the research context in which ‘crisis translation’ is defined; the second section illustrates the rationale of the experiment; the third section looks at the results of the training experiment; and the fourth at the trainees’ perceptions of their learning processes. The conclusion reflects on the viability of using Moses and on the next phases needed to refine the findings of this first experiment.


2021 ◽  
Vol 22 (1) ◽  
pp. 100-123
Author(s):  
Xiangling Wang ◽  
Tingting Wang ◽  
Ricardo Muñoz Martín ◽  
Yanfang Jia

Abstract This is a report on an empirical study of the usability of neural machine translation systems for translation trainees when post-editing (MTPE). Sixty Chinese translation trainees completed a questionnaire on their perceptions of MTPE's usability. Fifty of them later performed both a post-editing task and a regular translation task, designed to examine MTPE's usability by comparing their performance in terms of text processing speed, effort, and translation quality. Contrasting data collected through the questionnaire, keylogging, eye tracking, and retrospective reports, we found that, compared with regular, unaided translation, MTPE's usefulness in performance was remarkable: (1) it increased translation trainees' text processing speed and also improved their translation quality; (2) MTPE's ease of use in performance was partly confirmed in that it significantly reduced informants' effort as measured by (a) fixation duration and fixation counts, (b) total task time, and (c) the number of insertion keystrokes and total keystrokes. However, (3) translation trainees generally perceived MTPE as useful for increasing productivity, but they were skeptical about its use for improving quality, and they were neutral towards MTPE's ease of use.


Language barriers are a common issue for people who move from one community or group to another. Statistical machine translation has enabled us to address this issue to a certain extent by formulating models that translate text from one language to another. Statistical machine translation has come a long way, but it is limited when translating words from a context that is not available in the training dataset. This has paved the way for neural machine translation (NMT), a deep learning approach to sequence-to-sequence translation. Khasi is a language widely spoken in Meghalaya, a north-eastern state of India, yet it remains largely unexplored computationally. In this paper, we discuss the modelling and analysis of a baseline NMT model and an NMT model using the attention mechanism for English-to-Khasi translation.
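The attention mechanism mentioned above lets the decoder form a context vector as a weighted sum of all encoder states, weighted by similarity to the current decoder query, instead of relying on one fixed vector. A minimal numpy sketch of the scaled dot-product form (dimensions and vectors invented for illustration):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, enc_states):
    """Return the context vector and attention weights for one decoder step."""
    scores = enc_states @ query / np.sqrt(query.size)  # scaled dot products
    weights = softmax(scores)
    return weights @ enc_states, weights

enc = np.eye(3)                         # three toy encoder states
ctx, w = attend(np.array([0.0, 5.0, 0.0]), enc)
assert w.argmax() == 1                  # attention focuses on the matching state
```

Because the weights are recomputed at every decoding step, each target word can draw on a different part of the source sentence, which is what relieves the fixed-vector bottleneck of the baseline model.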

