Philipp Koehn: Neural Machine Translation

Machine Translation ◽

10.1007/s10590-021-09277-x ◽

2021 ◽

Author(s):

Wandri Jooste ◽

Rejwanul Haque ◽

Andy Way

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Statistical Machine Translation ◽

New Techniques ◽

Neural Machine Translation ◽

Postgraduate Students ◽

Learning Techniques ◽

Computational Aspects ◽

To Come ◽

Dominant Paradigm

Neural machine translation (NMT) is an approach to machine translation (MT) that uses deep learning techniques, a broad area of machine learning based on deep artificial neural networks (NNs). The book Neural Machine Translation by Philipp Koehn targets a broad range of readers including researchers, scientists, academics, advanced undergraduate or postgraduate students, and users of MT, covering wider topics including fundamental and advanced neural network-based learning techniques and methodologies used to develop NMT systems. The book demonstrates different linguistic and computational aspects in terms of NMT with the latest practices and standards and investigates problems relating to NMT. Having read this book, the reader should be able to formulate, design, implement, critically assess and evaluate some of the fundamental and advanced deep learning techniques and methods used for MT. Koehn himself notes that he was somewhat overtaken by events, as originally this book was envisaged only as a chapter in a revised, extended version of his 2009 book Statistical Machine Translation. However, in the interim, NMT completely overtook this previously dominant paradigm, and this new book is likely to serve as the reference of note for the field for some time to come, despite the fact that new techniques are coming onstream all the time.

Analyses and Modeling of Neural Machine Translation for English-to-Khasi

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3175.079220 ◽

2020 ◽

Vol 9 (2) ◽

pp. 115-118

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Statistical Machine Translation ◽

Language Barrier ◽

Attention Mechanism ◽

Training Dataset ◽

Learning Approach ◽

Neural Machine Translation ◽

Community Or ◽

North East

The language barrier is a common issue faced by people who move from one community or group to another. Statistical machine translation has enabled us to address this issue to a certain extent by formulating models that translate text from one language to another. Statistical machine translation has come a long way, but it has limitations in translating words that belong to an entirely different context that is not available in the training dataset. This has paved the way for neural machine translation (NMT), a deep learning approach to sequence-to-sequence translation. Khasi is a language popularly spoken in Meghalaya, a north-eastern state of India. It is wide and unexplored. In this paper we discuss the modeling and analysis of a baseline NMT model and an NMT model using an attention mechanism for English-to-Khasi translation.

Optical Character Recognition and Neural Machine Translation Using Deep Learning Techniques

Innovations in Computer Science and Engineering - Lecture Notes in Networks and Systems ◽

10.1007/978-981-33-4543-0_30 ◽

2021 ◽

pp. 277-283

Author(s):

K. Chandra Shekar ◽

Maria Anisha Cross ◽

Vignesh Vasudevan

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Character Recognition ◽

Optical Character Recognition ◽

Neural Machine Translation ◽

Optical Character ◽

Learning Techniques

Comparing Statistical and Neural Machine Translation Performance on Hindi-to-Tamil and English-to-Tamil

10.20944/preprints202012.0580.v1 ◽

2020 ◽

Author(s):

Akshai Ramesh ◽

Venkatesh Balavadhani Parthasarathy ◽

Andy Way ◽

Rejwanul Haque

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Learning Approaches ◽

Neural Models ◽

Neural Machine Translation ◽

Low Resource ◽

Domain Specific ◽

Evaluation Scheme ◽

Media Platform ◽

Dominant Paradigm

Statistical machine translation (SMT), which was the dominant paradigm in machine translation (MT) research for nearly three decades, has recently been superseded by end-to-end deep learning approaches to MT. Although deep neural models produce state-of-the-art results in many translation tasks, they are found to under-perform in resource-poor scenarios. Despite some success, none of the present-day benchmarks that have tried to overcome this problem can be regarded as a universal solution to the problem of translation for many low-resource languages. In this work, we investigate the performance of phrase-based SMT (PB-SMT) and NMT on two rarely tested low-resource language pairs, English-to-Tamil and Hindi-to-Tamil, taking a specialised data domain (software localisation) into consideration. This paper presents our findings, including the identification of several issues with current neural approaches to low-resource domain-specific text translation, and rankings of our MT systems produced via a social media platform-based human evaluation scheme.

Comparing Statistical and Neural Machine Translation Performance on Hindi-To-Tamil and English-To-Tamil

Digital ◽

10.3390/digital1020007 ◽

2021 ◽

Vol 1 (2) ◽

pp. 86-102

Author(s):

Akshai Ramesh ◽

Venkatesh Balavadhani Parthasarathy ◽

Rejwanul Haque ◽

Andy Way

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Training Data ◽

Research Translation ◽

Neural Machine Translation ◽

Low Resource ◽

Evaluation Scheme ◽

Dominant Paradigm ◽

Target Side

Phrase-based statistical machine translation (PB-SMT) has been the dominant paradigm in machine translation (MT) research for more than two decades. Deep neural MT models have been producing state-of-the-art performance across many translation tasks for four to five years. To put it another way, neural MT (NMT) took the place of PB-SMT a few years back and currently represents the state-of-the-art in MT research. Translation to or from under-resourced languages has been historically seen as a challenging task. Despite producing state-of-the-art results in many translation tasks, NMT still poses many problems such as performing poorly for many low-resource language pairs mainly because of its learning task’s data-demanding nature. MT researchers have been trying to address this problem via various techniques, e.g., exploiting source- and/or target-side monolingual data for training, augmenting bilingual training data, and transfer learning. Despite some success, none of the present-day benchmarks have entirely overcome the problem of translation in low-resource scenarios for many languages. In this work, we investigate the performance of PB-SMT and NMT on two rarely tested under-resourced language pairs, English-to-Tamil and Hindi-to-Tamil, taking a specialised data domain into consideration. This paper demonstrates our findings and presents results showing the rankings of our MT systems produced via a social media-based human evaluation scheme.

A Review of Neural Machine Translation based on Deep learning techniques

10.1109/upcon52273.2021.9667560 ◽

2021 ◽

Author(s):

Sonali Sharma ◽

Manoj Diwakar ◽

Prabhishek Singh ◽

Amrendra Tripathi ◽

Chandrakala Arya ◽

...

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Neural Machine Translation ◽

Learning Techniques

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel corpora are abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies, combined with state-of-the-art NMT, can perform remarkably well when translating English sentences into a morphologically rich language, regardless of whether a large parallel corpus is available.
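
As an illustration of how subword segmentation can sidestep the OOV problem mentioned in this abstract, the following is a minimal sketch of byte-pair encoding (BPE), one common subword technique. It is not taken from the paper, and the abstract does not say which subword methods the authors used. The function names (`learn_bpe`, `merge_pair`, `segment`), the toy word list, and the `</w>` end-of-word marker are assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of training words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy training data (hypothetical): "lowly" never appears here.
merges = learn_bpe(["low", "low", "lower"], num_merges=10)
print(segment("low", merges))    # a word seen in training
print(segment("lowly", merges))  # an unseen word, split into known pieces
```

Because an unseen word is broken into smaller units that were seen during training, the model never needs a dedicated unknown-word token for it, which is the sense in which subword methods enable open-vocabulary translation.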

Neural Machine Translation for Turkish to English Using Deep Learning

Digital Interaction and Machine Intelligence - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-74728-2_1 ◽

2021 ◽

pp. 3-9

Author(s):

Fatih Balki ◽

Hilmi Demirhan ◽

Salih Sarp

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Neural Machine Translation

Integration of a Multilingual Preordering Component into a Commercial SMT Platform

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0009 ◽

2017 ◽

Vol 108 (1) ◽

pp. 61-72

Author(s):

Anita Ramm ◽

Riccardo Superbo ◽

Dimitar Shterionov ◽

Tony O’Dowd ◽

Alexander Fraser

Keyword(s):

Open Source ◽

Machine Translation ◽

Long Range ◽

Significant Role ◽

Processing Speed ◽

Statistical Machine Translation ◽

Neural Machine Translation ◽

Open Source Tool

We present a multilingual preordering component tailored for a commercial Statistical Machine Translation platform. In commercial settings, issues such as processing speed as well as the ability to adapt models to the customers’ needs play a significant role and have a big impact on the choice of approaches that are added to the custom pipeline to deal with specific problems such as long-range reorderings. We developed a fast and customisable preordering component, also available as an open-source tool, which comes along with a generic implementation that is restricted neither to the translation platform nor to the Machine Translation paradigm. We test preordering on three language pairs: English→Japanese/German/Chinese for both Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). Our experiments confirm previously reported improvements in the SMT output when the models are trained on preordered data, but they also show that preordering does not improve NMT.

Translation of Medical Texts using Neural Networks

International Journal of Reliable and Quality E-Healthcare ◽

10.4018/ijrqeh.2016100104 ◽

2016 ◽

Vol 5 (4) ◽

pp. 51-66 ◽

Cited By ~ 5

Author(s):

Krzysztof Wolk ◽

Krzysztof P. Marasek

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

European Medicines Agency ◽

Translation System ◽

Training Methods ◽

Neural Machine Translation ◽

Machine Translation System ◽

Source Sentence ◽

Parallel Text ◽

Translation Systems

The quality of machine translation is rapidly evolving. Today one can find several machine translation systems on the web that provide reasonable translations, although the systems are not perfect. In some specific domains, the quality may decrease. A recently proposed approach to this domain is neural machine translation. It aims at building a jointly-tuned single neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family in which a source sentence is encoded into a fixed length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English Machine Translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training of neural and statistical network-based translation systems. A comparison and implementation of a medical translator is the main focus of our experiments.

Neural Machine Translation Advised by Statistical Machine Translation: The Case of Farsi-Spanish Bilingually Low-Resource Scenario

2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) ◽

10.1109/icmla.2018.00196 ◽

2018 ◽

Cited By ~ 2

Author(s):

Benyamin Ahmadnia ◽

Parisa Kordjamshidi ◽

Gholamreza Haffari

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Neural Machine Translation ◽

Low Resource
