NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning

2018 · Vol. 111 (1) · pp. 113-124
Author(s): Álvaro Peris, Francisco Casacuberta

We present NMT-Keras, a flexible toolkit for training deep learning models that places particular emphasis on advanced applications of neural machine translation systems, such as interactive-predictive translation protocols and long-term adaptation of the translation system via continuous learning. NMT-Keras is based on an extended version of the popular Keras library, and it runs on Theano and TensorFlow. State-of-the-art neural machine translation models are deployed and used following the high-level framework provided by Keras. Thanks to its high modularity and flexibility, it has also been extended to tackle other problems, such as image and video captioning, sentence classification and visual question answering.
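The encoder-decoder models that such a toolkit deploys can be illustrated with a minimal sketch in plain Keras. This is not the NMT-Keras API itself; the layer sizes, vocabulary sizes and tensor names below are assumptions chosen purely for illustration.

```python
# Minimal encoder-decoder sketch in plain Keras (illustrative sizes; not the NMT-Keras API).
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

SRC_VOCAB, TGT_VOCAB, EMB, HID = 32000, 32000, 256, 512  # assumed sizes

# Encoder: embed the source sentence and keep the final LSTM states.
enc_in = Input(shape=(None,), name="source_tokens")
enc_emb = Embedding(SRC_VOCAB, EMB, mask_zero=True)(enc_in)
_, state_h, state_c = LSTM(HID, return_state=True)(enc_emb)

# Decoder: conditioned on the encoder states, trained with teacher forcing.
dec_in = Input(shape=(None,), name="shifted_target_tokens")
dec_emb = Embedding(TGT_VOCAB, EMB, mask_zero=True)(dec_in)
dec_out, _, _ = LSTM(HID, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = Dense(TGT_VOCAB, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

Interactive-predictive protocols and online learning then sit on top of a model of this kind, constraining and updating it as the user validates prefixes of the translation.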

Author(s): Rashmini Naranpanawa, Ravinga Perera, Thilakshi Fonseka, Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when an abundant parallel corpus is available. However, vanilla NMT primarily operates at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are heavily affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the Transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies, combined with state-of-the-art NMT, can perform remarkably well when translating English sentences into a morphologically rich language even without a large parallel corpus.
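As a concrete illustration of the kind of subword segmentation compared in such work, the sketch below trains a BPE model with the SentencePiece library; the file names, vocabulary size and example output are assumptions, not details taken from the paper.

```python
# Sketch of subword segmentation with SentencePiece (BPE); paths and vocab size are assumed.
import sentencepiece as spm

# Train a BPE model on the (hypothetical) English side of the parallel corpus.
spm.SentencePieceTrainer.train(
    input="train.en", model_prefix="bpe_en",
    vocab_size=8000, model_type="bpe", character_coverage=1.0)

sp = spm.SentencePieceProcessor(model_file="bpe_en.model")

# Rare words fall back to known subword units instead of <unk>.
print(sp.encode("morphologically rich languages", out_type=str))
# e.g. ['▁morpho', 'logical', 'ly', '▁rich', '▁language', 's']  (actual split depends on training data)

# Pieces can be decoded back to the original surface string.
pieces = sp.encode("translation", out_type=str)
print(sp.decode(pieces))
```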


2016 · Vol. 5 (4) · pp. 51-66
Author(s): Krzysztof Wolk, Krzysztof P. Marasek

The quality of machine translation is evolving rapidly. Today one can find several machine translation systems on the web that provide reasonable translations, although the systems are not perfect. In some specific domains, the quality may decrease. A recently proposed approach to this problem is neural machine translation. It aims at building a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training the neural network-based and statistical translation systems. The comparison and implementation of a medical-domain translator are the main focus of our experiments.
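The comparison described here ultimately comes down to scoring each system's output against the same references; a minimal sketch with the sacreBLEU library follows, where the file names are hypothetical and BLEU stands in for whichever metrics were actually used.

```python
# Minimal scoring sketch with sacreBLEU; file names are hypothetical.
import sacrebleu

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

refs = read_lines("test.en")              # reference translations
nmt_hyp = read_lines("nmt_output.en")     # neural system output
smt_hyp = read_lines("smt_output.en")     # statistical system output

for name, hyp in [("NMT", nmt_hyp), ("SMT", smt_hyp)]:
    bleu = sacrebleu.corpus_bleu(hyp, [refs])
    print(f"{name}: BLEU = {bleu.score:.2f}")
```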


Author(s): Anna Fernández Torné, Anna Matamala

This article aims to compare three machine translation systems with a focus on human evaluation. The systems under analysis are a domain-adapted statistical machine translation system, a domain-adapted neural machine translation system and a generic machine translation system. The comparison is carried out on translation from Spanish into German of industrial documentation on machine tool components and processes. The focus is on the human evaluation of the machine translation output, specifically on: fluency, adequacy and ranking at the segment level; fluency, adequacy, need for post-editing, ease of post-editing, and mental effort required in post-editing at the document level; and productivity (post-editing speed and post-editing effort) and attitudes. Emphasis is placed on human factors in the evaluation process.
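Once such ratings are collected, aggregating them per system is straightforward bookkeeping; the sketch below assumes a simple in-memory record format with invented numbers and is purely illustrative, not a reflection of the study's actual data.

```python
# Sketch of aggregating human evaluation ratings per system; data layout and values are assumed.
from statistics import mean
from collections import defaultdict

# Each record: (system, segment_id, fluency 1-5, adequacy 1-5, post-editing seconds, words)
ratings = [
    ("SMT-domain", 1, 4, 4, 35, 18),
    ("NMT-domain", 1, 5, 4, 22, 18),
    ("Generic",    1, 3, 3, 51, 18),
    # ... one row per evaluated segment and system
]

by_system = defaultdict(list)
for system, _, flu, ade, secs, words in ratings:
    by_system[system].append((flu, ade, words / (secs / 60)))  # words per minute

for system, rows in by_system.items():
    flu, ade, wpm = (mean(col) for col in zip(*rows))
    print(f"{system}: fluency={flu:.2f} adequacy={ade:.2f} PE speed={wpm:.1f} wpm")
```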


Author(s): Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, ...

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables multilingual NMT systems with a single model. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT'14 and WMT'15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and also show some interesting examples when mixing languages.
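The artificial token is the whole mechanism: each source sentence is prefixed with a tag naming the desired output language, both during training and at inference time. A minimal sketch follows; the tag format and example sentences are illustrative, not taken from the paper.

```python
# Sketch of the artificial-token scheme for multilingual NMT; the tag format is illustrative.
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prefix the source with a token telling the model which language to produce."""
    return f"<2{target_lang}> {source_sentence}"

# Training pairs from several directions share one model and one wordpiece vocabulary.
training_pairs = [
    (add_target_token("How are you?", "de"), "Wie geht es dir?"),
    (add_target_token("How are you?", "fr"), "Comment vas-tu ?"),
    (add_target_token("Wie geht es dir?", "en"), "How are you?"),
]

# Zero-shot: a direction never seen in training (e.g. German->French) is requested the same way.
print(add_target_token("Wie geht es dir?", "fr"))
# -> "<2fr> Wie geht es dir?"
```

Because the tag is the only signal for the output language, requesting a direction never observed in training requires no change to the model itself.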


2021 · Vol. 2 (4)
Author(s): Bharathi Raja Chakravarthi, Priya Rani, Mihael Arcan, John P. McCrae

Machine translation is one of the applications of natural language processing which has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for these machine translation systems is the linguistic difference and variation in orthographic conventions, which causes many issues for traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve the machine translation system. This article offers a survey of research regarding orthography's influence on machine translation of under-resourced languages. It introduces under-resourced languages in terms of machine translation and shows how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made, and showing how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts using cognate information at different levels of machine translation and the lessons that can be drawn from this. Additionally, multilingual neural machine translation of closely related languages is given a particular focus in this survey. The article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction.
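One simple way orthographic information enters such pipelines is through string similarity between candidate cognates across closely related languages; the sketch below computes a normalized edit-distance similarity and is only an illustration of that general idea, with made-up word pairs.

```python
# Sketch: normalized edit-distance similarity between candidate cognates; examples are made up.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def cognate_similarity(a: str, b: str) -> float:
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)

# Higher similarity suggests a cognate pair worth exploiting in translation or lexicon induction.
print(cognate_similarity("noite", "noche"))  # Portuguese / Spanish "night"
print(cognate_similarity("noite", "mesa"))
```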


Author(s): Pavlo P. Maslianko, Yevhenii P. Sielskyi

Background. There are not many machine translation companies on the market whose products are in demand. Examples include free and commercial products such as "GoogleTranslate", "DeepLTranslator", "ModernMT", "Apertium" and "Trident", to name a few. To implement a more efficient and productive process for developing high-quality neural machine translation systems (NMTS), appropriate, scientifically grounded methods of NMTS engineering are needed in order to obtain a high-quality and competitive product as quickly as possible. Objective. The purpose of this article is to apply the Eriksson-Penker business profile to the development and formalization of a method for the system engineering of NMTS. Methods. The idea behind the neural machine translation system engineering method is to apply the Eriksson-Penker system engineering methodology and business profile to formalize an ordered way of developing NMT systems. Results. The method of developing NMT systems based on system engineering techniques consists of three main stages. At the first stage, the structure of the NMT system is modelled in the form of an Eriksson-Penker business profile. At the second stage, a set of processes specific to the class of Data Science systems is determined, following the international CRISP-DM standard. At the third stage, verification and validation of the developed NMTS are carried out. Conclusions. The article proposes a method of system engineering of NMTS based on a modified Eriksson-Penker business profile representation of the system at the meta-level, as well as on international process standards for Data Science and Data Mining. The effectiveness of this method was studied on the example of developing a bidirectional English-Ukrainian NMTS, EUMT (English-Ukrainian Machine Translator), and it was found that the English-Ukrainian translation quality of EUMT is at least as good as that of the popular Google Translate. The full version of the EUMT system code is published on the GitHub platform and is available at: https://github.com/EugeneSel/EUMT.


2013 · Vol. 48 · pp. 733-782
Author(s): T. Xiao, J. Zhu

This article presents a probabilistic sub-tree alignment model and its application to tree-to-tree machine translation. Unlike previous work, we do not resort to surface heuristics or expensive annotated data, but instead derive an unsupervised model to infer the syntactic correspondence between two languages. More importantly, the developed model is syntactically motivated and does not rely on word alignments. As a by-product, our model outputs a sub-tree alignment matrix encoding a large number of diverse alignments between syntactic structures, from which machine translation systems can efficiently extract translation rules that are often filtered out due to errors in the 1-best alignment. Experimental results show that the proposed approach outperforms three state-of-the-art baseline approaches in both alignment accuracy and grammar quality. When applied to machine translation, our approach yields a +1.0 BLEU improvement and a -0.9 TER reduction on the NIST machine translation evaluation corpora. With tree binarization and fuzzy decoding, it even outperforms a state-of-the-art hierarchical phrase-based system.
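Downstream use of the alignment matrix can be as simple as keeping every source/target sub-tree pair whose posterior clears a threshold, instead of committing to the 1-best alignment; the toy sketch below illustrates that selection step with invented sub-trees and posterior values.

```python
# Toy sketch: selecting sub-tree alignment links above a posterior threshold; values are invented.
import numpy as np

src_subtrees = ["NP(the cat)", "VP(sat on the mat)", "PP(on the mat)"]
tgt_subtrees = ["NP(le chat)", "VP(s'est assis sur le tapis)", "PP(sur le tapis)"]

# posterior[i, j] = model's belief that source sub-tree i aligns with target sub-tree j
posterior = np.array([[0.92, 0.03, 0.01],
                      [0.02, 0.81, 0.35],
                      [0.01, 0.28, 0.77]])

THRESHOLD = 0.25  # keeping several plausible links preserves rules a 1-best alignment would drop
links = [(src_subtrees[i], tgt_subtrees[j], posterior[i, j])
         for i in range(posterior.shape[0])
         for j in range(posterior.shape[1])
         if posterior[i, j] >= THRESHOLD]

for src, tgt, p in links:
    print(f"{src} <-> {tgt}  (p={p:.2f})")
```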


Author(s): Anthony Pym, Ester Torres-Simón

As a language-intensive profession, translation is of frontline interest in the era of language automation. In particular, the development of neural machine translation systems since 2016 has brought with it fears that soon there will be no more human translators. When considered in terms of the history of automation, however, any such direct effect is far from obvious: the translation industry is still growing and machine translation is only one instance of automation. At the same time, data on remuneration indicate structural wage dispersion in professional translation services, with some signs that this dispersion may increase in certain market segments as automated workflows and translation technologies are adopted more by large language-service providers than by smaller companies and individual freelancers. An analysis of recent changes in discourses on and in the translation profession further indicates conceptual adjustments in the profession that may be attributed to growing automation, particularly with respect to the expanding skill set associated with translation, the tendency to combine translation with other forms of communication, and the use of interactive communication skills to authorize and humanize the results of automation.

