Bridging the “gApp”: improving neural machine translation systems for multiword expression detection

AbstractThe present research introduces the tool gApp, a Python-based text preprocessing system for the automatic identification and conversion of discontinuous multiword expressions (MWEs) into their continuous form in order to enhance neural machine translation (NMT). To this end, an experiment with semi-fixed verb–noun idiomatic combinations (VNICs) will be carried out in order to evaluate to what extent gApp can optimise the performance of the two main free open-source NMT systems —Google Translate and DeepL— under the challenge of MWE discontinuity in the Spanish into English directionality. In the light of our promising results, the study concludes with suggestions on how to further optimise MWE-aware NMT systems.

Download Full-text

Google Translate vs. DeepL

MonTi Monografías de Traducción e Interpretación ◽

10.6035/monti.2020.ne6.5 ◽

2021 ◽

pp. 154-177

Author(s):

Carlos Manuel Hidalgo-Ternero

Keyword(s):

Open Source ◽

Machine Translation ◽

Automatic Detection ◽

Neural Machine Translation ◽

Free Open Source

The present research analyses the performance of two free open-source neural machine translation (NMT) systems —Google Translate and DeepL— in the (ES>EN) translation of somatisms such as tomar el pelo and meter la pata, their nominal variants (tomadura/tomada de pelo and metedura/metida de pata), and other lower-frequency variants such as meter la pata hasta el corvejón, meter la gamba and metedura/metida de gamba. The machine translation outcomes will be contrasted and classified depending on whether these idioms are presented in their continuous or discontinuous form (Anastasiou 2010), i.e., whether different n-grams split the idiomatic sequence (or not), which may pose some difficulties for their automatic detection and translation. Overall, the insights gained from this study will prove useful in determining for which of the different scenarios either Google Translate or DeepL delivers a better performance under the challenge of phraseological variation and discontinuity.

Download Full-text

Free/Open-Source Resources in the Apertium Platform for Machine Translation Research and Development

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-010-0015-5 ◽

2010 ◽

Vol 93 (1) ◽

pp. 67-76 ◽

Cited By ~ 8

Author(s):

Francis Tyers ◽

Felipe Sánchez-Martínez ◽

Sergio Ortiz-Rojas ◽

Mikel Forcada

Keyword(s):

Research And Development ◽

Open Source ◽

Machine Translation ◽

Morphological Analysis ◽

Translation Research ◽

Part Of Speech ◽

Open Source Framework ◽

Finite State ◽

Translation Systems ◽

Free Open Source

Free/Open-Source Resources in the Apertium Platform for Machine Translation Research and DevelopmentThis paper describes the resources available in the Apertium platform, a free/open-source framework for creating rule-based machine translation systems. Resources within the platform take the form of finite-state morphologies for morphological analysis and generation, bilingual transfer lexica, probabilistic part-of-speech taggers and transfer rule files, all in standardised formats. These resources are described and some examples are given of their reuse and recycling in combination with other machine translation systems.

Download Full-text

OpenNMT: Open-Source Toolkit for Neural Machine Translation

10.18653/v1/p17-4012 ◽

2017 ◽

Cited By ~ 139

Author(s):

Guillaume Klein ◽

Yoon Kim ◽

Yuntian Deng ◽

Jean Senellart ◽

Alexander Rush

Keyword(s):

Open Source ◽

Machine Translation ◽

Neural Machine Translation

Download Full-text

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Machine Translation ◽

10.1007/s10590-021-09260-6 ◽

2021 ◽

Author(s):

Tanmai Khanna ◽

Jonathan N. Washington ◽

Francis M. Tyers ◽

Sevilay Bayatlı ◽

Daniel G. Swanson ◽

...

Keyword(s):

Open Source ◽

Machine Translation ◽

Lexical Selection ◽

Rule Based ◽

Low Resource ◽

Language Technology ◽

Language Data ◽

Recursive Structures ◽

Platform Translation ◽

Free Open Source

AbstractThis paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.

Download Full-text

Is automation changing the translation profession?

International Journal of the Sociology of Language ◽

10.1515/ijsl-2020-0015 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Anthony Pym ◽

Ester Torres-Simón

Keyword(s):

Machine Translation ◽

Communication Skills ◽

Service Providers ◽

Time Data ◽

Market Segments ◽

Interactive Communication ◽

Neural Machine Translation ◽

History Of ◽

Translation Services ◽

Translation Systems

Abstract As a language-intensive profession, translation is of frontline interest in the era of language automation. In particular, the development of neural machine translation systems since 2016 has brought with it fears that soon there will be no more human translators. When considered in terms of the history of automation, however, any such direct effect is far from obvious: the translation industry is still growing and machine translation is only one instance of automation. At the same time, data on remuneration indicate structural wage dispersion in professional translation services, with some signs that this dispersion may increase in certain market segments as automated workflows and translation technologies are adopted more by large language-service providers more than by smaller companies and individual freelancers. An analysis of recent changes in discourses on and in the translation profession further indicates conceptual adjustments in the profession that may be attributed to growing automation, particularly with respect to expanding skills set associated with translation, the tendency to combine translation with other forms of communication, and the use of interactive communication skills to authorize and humanize the results of automation.

Download Full-text

Checkpoint Reranking: An Approach to Select Better Hypothesis for Neural Machine Translation Systems

10.18653/v1/2020.acl-srw.38 ◽

2020 ◽

Author(s):

Vinay Pandramish ◽

Dipti Misra Sharma

Keyword(s):

Machine Translation ◽

Neural Machine Translation ◽

Translation Systems

Download Full-text

Tsinghua University Neural Machine Translation Systems for CCMT 2020

Communications in Computer and Information Science - Machine Translation ◽

10.1007/978-981-33-6162-1_9 ◽

2020 ◽

pp. 98-104

Author(s):

Gang Chen ◽

Shuo Wang ◽

Xuancheng Huang ◽

Zhixing Tan ◽

Maosong Sun ◽

...

Keyword(s):

Machine Translation ◽

Neural Machine Translation ◽

Tsinghua University ◽

Translation Systems

Download Full-text

A Survey on Hybrid Machine Translation

E3S Web of Conferences ◽

10.1051/e3sconf/202018401061 ◽

2020 ◽

Vol 184 ◽

pp. 01061

Author(s):

Anusha Anugu ◽

Gajula Ramesh

Keyword(s):

Machine Translation ◽

Language Processing ◽

Literature Survey ◽

Neural Machine Translation ◽

Translation Tools ◽

Translation Techniques ◽

Hybrid Machine ◽

Hybrid Machine Translation ◽

Translation Systems ◽

Evaluation Techniques

Machine translation has gradually developed in past 1940’s.It has gained more and more attention because of effective and efficient nature. As it makes the translation automatically without the involvement of human efforts. The distinct models of machine translation along with “Neural Machine Translation (NMT)” is summarized in this paper. Researchers have previously done lots of work on Machine Translation techniques and their evaluation techniques. Thus, we want to demonstrate an analysis of the existing techniques for machine translation including Neural Machine translation, their differences and the translation tools associated with them. Now-a-days the combination of two Machine Translation systems has the full advantage of using features from both the systems which attracts in the domain of natural language processing. So, the paper also includes the literature survey of the Hybrid Machine Translation (HMT).

Download Full-text

Integration of a Multilingual Preordering Component into a Commercial SMT Platform

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0009 ◽

2017 ◽

Vol 108 (1) ◽

pp. 61-72

Author(s):

Anita Ramm ◽

Riccardo Superbo ◽

Dimitar Shterionov ◽

Tony O’Dowd ◽

Alexander Fraser

Keyword(s):

Open Source ◽

Machine Translation ◽

Long Range ◽

Significant Role ◽

Processing Speed ◽

Statistical Machine Translation ◽

Neural Machine Translation ◽

Open Source Tool

AbstractWe present a multilingual preordering component tailored for a commercial Statistical Machine translation platform. In commercial settings, issues such as processing speed as well as the ability to adapt models to the customers’ needs play a significant role and have a big impact on the choice of approaches that are added to the custom pipeline to deal with specific problems such as long-range reorderings.We developed a fast and customisable preordering component, also available as an open-source tool, which comes along with a generic implementation that is restricted neither to the translation platform nor to the Machine Translation paradigm. We test preordering on three language pairs: English →Japanese/German/Chinese for both Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Our experiments confirm previously reported improvements in the SMT output when the models are trained on preordered data, but they also show that preordering does not improve NMT.

Download Full-text

Translation of Medical Texts using Neural Networks

International Journal of Reliable and Quality E-Healthcare ◽

10.4018/ijrqeh.2016100104 ◽

2016 ◽

Vol 5 (4) ◽

pp. 51-66 ◽

Cited By ~ 5

Author(s):

Krzysztof Wolk ◽

Krzysztof P. Marasek

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

European Medicines Agency ◽

Translation System ◽

Training Methods ◽

Neural Machine Translation ◽

Machine Translation System ◽

Source Sentence ◽

Parallel Text ◽

Translation Systems

The quality of machine translation is rapidly evolving. Today one can find several machine translation systems on the web that provide reasonable translations, although the systems are not perfect. In some specific domains, the quality may decrease. A recently proposed approach to this domain is neural machine translation. It aims at building a jointly-tuned single neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family in which a source sentence is encoded into a fixed length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English Machine Translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training of neural and statistical network-based translation systems. A comparison and implementation of a medical translator is the main focus of our experiments.

Download Full-text