Free/open-source machine translation: preface

2011 ◽  
Vol 25 (2) ◽  
pp. 83-86 ◽  
Author(s):  
Felipe Sánchez-Martínez ◽  
Mikel L. Forcada
Author(s):  
Tanmai Khanna ◽  
Jonathan N. Washington ◽  
Francis M. Tyers ◽  
Sevilay Bayatlı ◽  
Daniel G. Swanson ◽  
...  

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.


2011 ◽  
Vol 25 (2) ◽  
pp. 127-144 ◽  
Author(s):  
Mikel L. Forcada ◽  
Mireia Ginestí-Rosell ◽  
Jacob Nordfalk ◽  
Jim O’Regan ◽  
Sergio Ortiz-Rojas ◽  
...  

2020 ◽  
Vol 11 (1) ◽  
pp. 61-80
Author(s):  
Carlos Manuel Hidalgo-Ternero ◽  
Gloria Corpas Pastor

The present research introduces the tool gApp, a Python-based text preprocessing system for the automatic identification and conversion of discontinuous multiword expressions (MWEs) into their continuous form in order to enhance neural machine translation (NMT). To this end, an experiment with semi-fixed verb–noun idiomatic combinations (VNICs) will be carried out in order to evaluate to what extent gApp can optimise the performance of the two main free open-source NMT systems (Google Translate and DeepL) under the challenge of MWE discontinuity in the Spanish-to-English direction. In the light of our promising results, the study concludes with suggestions on how to further optimise MWE-aware NMT systems.


2009 ◽  
Vol 34 ◽  
pp. 605-635 ◽  
Author(s):  
F. Sánchez-Martínez ◽  
M. L. Forcada

This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions which are derived from the bilingual dictionary of the MT system and control their application as transfer rules. The experiments conducted using three different language pairs in the free/open-source MT platform Apertium show that translation quality is improved as compared to word-for-word translation (when no transfer rules are used), and that the resulting translation quality is close to that obtained using hand-coded transfer rules. The method we present is entirely unsupervised and benefits from information in the rest of the modules of the MT system in which the inferred rules are applied.


Author(s):  
Sandipan Dandapat ◽  
Mikel L. Forcada ◽  
Declan Groves ◽  
Sergio Penkale ◽  
John Tinsley ◽  
...  

2018 ◽  
Vol 2 (2) ◽  
pp. 32
Author(s):  
Kanaan Mikael Kaka-Khan

In this paper we present a machine translation system developed to translate simple English sentences to Kurdish. The system is based on the free/open-source Apertium engine, which provides the environment and the required tools to develop a machine translation system. The developed system is used to translate inputs such as simple sentences, compound sentences, phrases and idioms from English to Kurdish. The resulting translations are then evaluated manually for accuracy and completeness and compared to the output of the popular inKurdish English-to-Kurdish machine translation system. The results show that our system is more accurate than the inKurdish system. This paper contributes to the ongoing effort to achieve full machine-based translation in general and English-to-Kurdish machine translation in particular.


Author(s):  
Carlos Manuel Hidalgo-Ternero

The present research analyses the performance of two free open-source neural machine translation (NMT) systems (Google Translate and DeepL) in the (ES>EN) translation of somatisms such as tomar el pelo and meter la pata, their nominal variants (tomadura/tomada de pelo and metedura/metida de pata), and other lower-frequency variants such as meter la pata hasta el corvejón, meter la gamba and metedura/metida de gamba. The machine translation outcomes will be contrasted and classified depending on whether these idioms are presented in their continuous or discontinuous form (Anastasiou 2010), i.e., whether different n-grams split the idiomatic sequence (or not), which may pose some difficulties for their automatic detection and translation. Overall, the insights gained from this study will prove useful in determining for which of the different scenarios either Google Translate or DeepL delivers a better performance under the challenge of phraseological variation and discontinuity.


2010 ◽  
Vol 93 (1) ◽  
pp. 67-76 ◽  
Author(s):  
Francis Tyers ◽  
Felipe Sánchez-Martínez ◽  
Sergio Ortiz-Rojas ◽  
Mikel Forcada

Free/Open-Source Resources in the Apertium Platform for Machine Translation Research and Development

This paper describes the resources available in the Apertium platform, a free/open-source framework for creating rule-based machine translation systems. Resources within the platform take the form of finite-state morphologies for morphological analysis and generation, bilingual transfer lexica, probabilistic part-of-speech taggers and transfer rule files, all in standardised formats. These resources are described and some examples are given of their reuse and recycling in combination with other machine translation systems.

