Integration of a Multilingual Preordering Component into a Commercial SMT Platform

2017 ◽  
Vol 108 (1) ◽  
pp. 61-72
Author(s):  
Anita Ramm ◽  
Riccardo Superbo ◽  
Dimitar Shterionov ◽  
Tony O’Dowd ◽  
Alexander Fraser

Abstract We present a multilingual preordering component tailored for a commercial statistical machine translation platform. In commercial settings, issues such as processing speed and the ability to adapt models to customers' needs play a significant role and strongly influence which approaches are added to the custom pipeline to deal with specific problems such as long-range reordering. We developed a fast and customisable preordering component, also available as an open-source tool, which comes with a generic implementation that is restricted neither to the translation platform nor to the machine translation paradigm. We test preordering on three language pairs: English→Japanese/German/Chinese, for both statistical machine translation (SMT) and neural machine translation (NMT). Our experiments confirm previously reported improvements in SMT output when the models are trained on preordered data, but they also show that preordering does not improve NMT.
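Although the abstract does not give the component's rules, the kind of long-range reordering it targets can be illustrated with a toy, hand-written rule for English→Japanese (an SOV language). Everything here, including the rule itself, is a hypothetical sketch, not the authors' implementation:

```python
# Toy preordering sketch for English -> Japanese (SOV word order).
# Hypothetical rule: move every verb to the end of the sentence so that the
# source order better matches the target. Real preordering components derive
# such rules from parse trees; this is only a minimal illustration.

def preorder_sov(tagged_tokens):
    """Reorder (word, pos) pairs: verbs are moved to sentence-final position."""
    verbs = [w for w, pos in tagged_tokens if pos.startswith("V")]
    rest = [w for w, pos in tagged_tokens if not pos.startswith("V")]
    return rest + verbs

sent = [("John", "NNP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN")]
print(preorder_sov(sent))  # ['John', 'an', 'apple', 'ate']
```

Training an SMT system on source text reordered this way shortens the reorderings the decoder itself has to model.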

Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach which performs much better than statistical machine translation (SMT) models when there is an abundance of parallel data. However, vanilla NMT primarily operates at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
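One standard subword technique of the kind the abstract refers to is byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent symbol pair so that frequent strings become single vocabulary units and rare words decompose into known pieces. A minimal sketch of the merge-learning loop on a toy vocabulary (not the paper's data or exact tooling):

```python
import collections
import re

def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(3):
    stats = get_stats(vocab)
    best = max(stats, key=stats.get)   # most frequent pair becomes one symbol
    vocab = merge_vocab(best, vocab)
```

After three merges the frequent suffix "est</w>" has become a single symbol, so an unseen word like "lowest" can still be segmented into known units.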


2014 ◽  
Vol 102 (1) ◽  
pp. 69-80 ◽  
Author(s):  
Daniel Torregrosa ◽  
Mikel L. Forcada ◽  
Juan Antonio Pérez-Ortiz

Abstract We present a web-based open-source tool for interactive translation prediction (ITP) and describe its underlying architecture. ITP systems assist human translators by making context-based, computer-generated suggestions as they type. Most ITP systems in the literature are strongly coupled with a statistical machine translation system that is conveniently adapted to provide the suggestions. Our system, however, follows a resource-agnostic approach: suggestions are obtained from any unmodified black-box bilingual resource. This paper reviews our ITP method and describes the architecture of Forecat, a web tool, partly based on the recent technology of web components, that eases the use of our ITP approach in any web application requiring this kind of translation assistance. We also evaluate the performance of our method when using an unmodified Moses-based statistical machine translation system as the bilingual resource.
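The resource-agnostic idea can be illustrated roughly as follows (this is not Forecat's actual API; the phrase table, the `lookup` callable, and all names are invented): query a black-box bilingual resource for translations of source segments, then keep only the candidates that extend what the translator has already typed.

```python
# Hypothetical sketch of resource-agnostic interactive translation prediction:
# the bilingual resource is an opaque lookup function; suggestions are the
# continuations of candidates compatible with the typed target prefix.

def suggest(source_segments, typed_prefix, lookup):
    """Return candidate continuations compatible with the typed prefix."""
    out = []
    for seg in source_segments:
        for cand in lookup(seg):          # black-box query, never modified
            if cand.startswith(typed_prefix) and cand != typed_prefix:
                out.append(cand[len(typed_prefix):])
    return out

phrase_table = {"maison": ["house", "home"], "la maison": ["the house", "the home"]}
lookup = lambda s: phrase_table.get(s, [])
print(suggest(["la maison", "maison"], "the h", lookup))  # ['ouse', 'ome']
```

Because the resource is only queried, any dictionary, translation memory, or MT system can be plugged in without adaptation.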


Author(s):  
Guillaume Klein ◽  
Yoon Kim ◽  
Yuntian Deng ◽  
Jean Senellart ◽  
Alexander Rush

Author(s):  
Yu Chen ◽  
Andreas Eisele ◽  
Christian Federmann ◽  
Eva Hasler ◽  
Michael Jellinghaus ◽  
...  

2016 ◽  
Vol 5 (4) ◽  
pp. 51-66 ◽  
Author(s):  
Krzysztof Wolk ◽  
Krzysztof P. Marasek

The quality of machine translation is evolving rapidly. Today one can find several machine translation systems on the web that provide reasonable translations, although the systems are not perfect, and in some specific domains the quality may decrease. A recently proposed approach to this problem is neural machine translation. It aims at building a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training neural and statistical machine translation systems. The comparison and implementation of a medical-domain translator is the main focus of our experiments.
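The fixed-length-vector property of the encoder-decoder family mentioned above can be sketched with a toy recurrence. The dimensions, the embeddings, and the update rule here are invented for illustration; real encoders learn their weights:

```python
import numpy as np

# Toy encoder: fold a sentence of any length into one fixed-length vector,
# which is the representation a decoder would then expand into a translation.

rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(4) for w in ["the", "patient", "takes", "aspirin"]}

def encode(tokens):
    """Compress a token sequence into a single fixed-length state vector."""
    h = np.zeros(4)
    for tok in tokens:
        h = np.tanh(h + embed[tok])   # toy recurrence, no learned weights
    return h

v_short = encode(["the", "patient"])
v_long = encode(["the", "patient", "takes", "aspirin"])
assert v_short.shape == v_long.shape == (4,)  # same size regardless of length
```

The bottleneck is visible here: a two-word and a four-word sentence are squeezed into vectors of identical size, which is why later attention-based models let the decoder look back at all encoder states instead.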


2015 ◽  
Vol 104 (1) ◽  
pp. 5-16 ◽  
Author(s):  
Matt Post ◽  
Yuan Cao ◽  
Gaurav Kumar

Abstract We describe the version six release of Joshua, an open-source statistical machine translation toolkit. The main difference from release five is the introduction of a simple, unlexicalized, phrase-based stack decoder. This phrase-based decoder shares a hypergraph format with the syntax-based systems, permitting a tight coupling with the existing codebase of feature functions and hypergraph tools. Joshua 6 also includes a number of large-scale discriminative tuners and a simplified sparse feature function interface with reflection-based loading, which allows new features to be used by writing a single function. Finally, Joshua includes a number of simplifications and improvements focused on usability for both researchers and end-users, including the release of language packs — precompiled models that can be run as black boxes.
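A phrase-based stack decoder of the kind introduced in Joshua 6 groups partial hypotheses into stacks by the number of source words they cover. A heavily simplified, monotone sketch of that search (the phrase table and scores are invented; this is not Joshua's implementation):

```python
import math

# Toy phrase table: source span -> [(target phrase, log-probability)].
phrases = {
    ("das",): [("the", -0.1), ("that", -0.7)],
    ("haus",): [("house", -0.2)],
    ("das", "haus"): [("the house", -0.25)],
}

def decode(source):
    """Monotone phrase-based stack decoding: stack i holds hypotheses
    covering exactly i source words; return the best full translation."""
    n = len(source)
    stacks = [dict() for _ in range(n + 1)]   # output string -> best score
    stacks[0][""] = 0.0
    for covered in range(n):
        for out, score in stacks[covered].items():
            for span_len in range(1, n - covered + 1):
                span = tuple(source[covered:covered + span_len])
                for tgt, lp in phrases.get(span, []):
                    new = (out + " " + tgt).strip()
                    stack = stacks[covered + span_len]
                    if score + lp > stack.get(new, -math.inf):
                        stack[new] = score + lp   # keep best score (recombination)
    return max(stacks[n], key=stacks[n].get)

print(decode(["das", "haus"]))  # 'the house'
```

A real decoder adds non-monotone reordering, pruning per stack, and feature functions scored over a hypergraph, which is exactly what sharing the hypergraph format with the syntax-based systems enables.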


2018 ◽  
Vol 13 (3) ◽  
pp. 486-508
Author(s):  
Federico M. Federici ◽  
Khetam Al Sharou

Abstract Training translators to react to sudden emergencies is a challenge. This article presents the results of a training experiment testing the speed of acquisition of the skills necessary to operate the open-source Moses statistical machine translation (SMT) system. A task-based approach was used with trainee translators who had no experience working with MT technology. The experiment is a feasibility study to ascertain whether training on Moses SMT could be considered for long-lasting crisis scenarios. The article reports its findings in four sections. The first section discusses the research context in which ‘crisis translation’ is defined; the second section illustrates the rationale of the experiment; the third section looks at the results of the training experiment; and the fourth at the trainees’ perceptions of their learning processes. The conclusion reflects on the viability of using Moses and on the next phases needed to refine the findings of this first experiment.


2021 ◽  
Vol 22 (1) ◽  
pp. 100-123
Author(s):  
Xiangling Wang ◽  
Tingting Wang ◽  
Ricardo Muñoz Martín ◽  
Yanfang Jia

Abstract This is a report on an empirical study of the usability of neural machine translation systems for translation trainees when post-editing (MTPE). Sixty Chinese translation trainees completed a questionnaire on their perceptions of MTPE's usability. Fifty of them later performed both a post-editing task and a regular translation task, designed to examine MTPE's usability by comparing their performance in terms of text processing speed, effort, and translation quality. Contrasting data collected through the questionnaire, keylogging, eye tracking, and retrospective reports, we found that, compared with regular, unaided translation, MTPE's usefulness in performance was remarkable: (1) it increased translation trainees' text processing speed and also improved their translation quality; (2) MTPE's ease of use in performance was partly confirmed in that it significantly reduced informants' effort as measured by (a) fixation duration and fixation counts, (b) total task time, and (c) the number of insertion keystrokes and total keystrokes. However, (3) translation trainees generally perceived MTPE as useful for increasing productivity, but they were skeptical about its use for improving quality, and they were neutral towards MTPE's ease of use.


Language barriers are a common issue for people who move from one community or group to another. Statistical machine translation has enabled us to address this issue to a certain extent by formulating models that translate text from one language to another. Statistical machine translation has come a long way, but it is limited when translating words from a context that is not available in the training dataset. This has paved the way for neural machine translation (NMT), a deep learning approach to sequence-to-sequence translation. Khasi is a language widely spoken in Meghalaya, a north-eastern state of India, yet it remains largely unexplored computationally. In this paper, we discuss the modelling and analysis of a baseline NMT model and an NMT model using the attention mechanism for English-to-Khasi translation.
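The attention mechanism mentioned above lets the decoder form a context vector as a weighted sum of all encoder states, weighted by similarity to the current decoder query, instead of relying on one fixed vector. A minimal numpy sketch of the scaled dot-product form (dimensions and vectors invented for illustration):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, enc_states):
    """Return the context vector and attention weights for one decoder step."""
    scores = enc_states @ query / np.sqrt(query.size)  # scaled dot products
    weights = softmax(scores)
    return weights @ enc_states, weights

enc = np.eye(3)                         # three toy encoder states
ctx, w = attend(np.array([0.0, 5.0, 0.0]), enc)
assert w.argmax() == 1                  # attention focuses on the matching state
```

Because the weights are recomputed at every decoding step, each target word can draw on a different part of the source sentence, which is what relieves the fixed-vector bottleneck of the baseline model.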

