Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

2017, Vol. 108(1), pp. 171-182
Author(s): Jinhua Du, Andy Way

Abstract Pre-reordering, a preprocessing step that makes source-side word order closer to that of the target side, has proven very helpful in improving translation quality for statistical machine translation (SMT). However, is this also the case for neural machine translation (NMT)? In this paper, we first investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate the features used by the SMT pre-reordering model as input factors in NMT (factored NMT). These features, namely part-of-speech (POS) tags, word classes and reordered indices, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge to NMT. Pre-reordering experiments conducted on Japanese↔English and Chinese↔English show that pre-reordering the source-side data for NMT is redundant and that NMT models trained on pre-reordered data suffer degraded translation performance. However, factored NMT using SMT-based pre-reordering features is beneficial on Japanese→English and Chinese→English, improving over the baseline NMT system by 4.48 and 5.89 relative BLEU points, respectively.
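
To make the factored-input idea concrete, here is a minimal PyTorch sketch of how factor embeddings can be concatenated to word embeddings before the encoder. The vocabulary sizes, factor inventories and dimensions are illustrative assumptions, not the settings used in the paper.

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Concatenates factor embeddings (POS, word class, reordered index)
    to the word embedding, as in factored NMT input."""
    def __init__(self, vocab=32000, n_pos=50, n_class=100, n_reorder=200,
                 d_word=480, d_factor=32):
        super().__init__()
        self.word = nn.Embedding(vocab, d_word)
        self.pos = nn.Embedding(n_pos, d_factor)        # part-of-speech factor
        self.cls = nn.Embedding(n_class, d_factor)      # word-class factor
        self.reord = nn.Embedding(n_reorder, d_factor)  # reordered-index factor

    def forward(self, words, pos, cls, reord):
        # Extra knowledge is appended to each word vector before the encoder.
        return torch.cat([self.word(words), self.pos(pos),
                          self.cls(cls), self.reord(reord)], dim=-1)

emb = FactoredEmbedding()
x = emb(torch.tensor([[5, 17]]), torch.tensor([[3, 7]]),
        torch.tensor([[12, 4]]), torch.tensor([[0, 1]]))
print(x.shape)  # torch.Size([1, 2, 576]) = 480 word dims + 3 * 32 factor dims
```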

Author(s): Yingce Xia, Tianyu He, Xu Tan, Fei Tian, Di He, ...

Sharing source- and target-side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English-to-French or English-to-German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the entire encoder and decoder of an NMT model. We share the encoder and decoder of the Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named the Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German-to-English translation, 28.98/29.89 BLEU scores on WMT 2014 English-to-German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German-to-English translation.
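
As a rough illustration, the PyTorch snippet below shows the word-level sharing that motivates the paper: one embedding matrix serves both source and target vocabularies and is also tied to the output projection. The Tied Transformer goes further and shares the encoder and decoder stacks themselves, which this toy sketch does not reproduce; the vocabulary size is an assumption.

```python
import torch.nn as nn

d_model, joint_vocab = 512, 37000          # assumed joint sub-word vocabulary

# One embedding matrix is shared by the source and target sides.
shared_embedding = nn.Embedding(joint_vocab, d_model)
src_embed = shared_embedding
tgt_embed = shared_embedding

# The output projection reuses the same weight matrix (three-way tying).
generator = nn.Linear(d_model, joint_vocab, bias=False)
generator.weight = shared_embedding.weight
```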


2014, Vol. 50, pp. 1-30
Author(s): M. Zhang, X. Xiao, D. Xiong, Q. Liu

Translation rule selection is the task of selecting appropriate translation rules for an ambiguous source-language segment. As translation ambiguities are pervasive in statistical machine translation, we introduce two topic-based models for translation rule selection that incorporate global topic information into translation disambiguation. We associate each synchronous translation rule with source- and target-side topic distributions. With these topic distributions, we propose a topic dissimilarity model that selects desirable (less dissimilar) rules by penalising rules whose topic distributions are highly dissimilar to those of the given documents. In order to encourage the use of non-topic-specific translation rules, we also present a topic sensitivity model to balance translation rule selection between generic rules and topic-specific rules. Furthermore, we project target-side topic distributions onto the source-side topic model space so that we can benefit from topic information in both the source and target languages. We integrate the proposed topic dissimilarity and sensitivity models into hierarchical phrase-based machine translation for synchronous translation rule selection. Experiments show that our topic-based translation rule selection model can substantially improve translation quality.
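
A small sketch of how a dissimilarity penalty between topic distributions might be computed is given below. The Hellinger distance is used here as one common choice of dissimilarity measure and stands in for whatever measure the paper actually adopts; the distributions are toy values.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete topic distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

doc_topics  = [0.70, 0.20, 0.10]   # topic distribution of the document
rule_topics = [0.10, 0.30, 0.60]   # topic distribution of a candidate rule

# A larger dissimilarity means a larger penalty, so the rule is less likely
# to be selected for this document.
print(f"dissimilarity penalty: {hellinger(doc_topics, rule_topics):.3f}")
```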


Author(s): Rupjyoti Baruah, Rajesh Kumar Mundotiya, Anil Kumar Singh

Machine translation (MT) systems have been built using numerous different techniques for bridging language barriers. These techniques are broadly categorized into approaches such as Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). End-to-end NMT systems significantly outperform SMT in translation quality on many language pairs, especially those with adequate parallel corpora. We report comparative experiments on baseline MT systems for Assamese and other Indo-Aryan languages (in both translation directions) using traditional phrase-based SMT as well as some of the more successful NMT architectures, namely a basic sequence-to-sequence model with attention, the Transformer, and a fine-tuned Transformer. The results are evaluated using the most prominent and popular standard automatic metric, BLEU (BiLingual Evaluation Understudy), as well as other well-known metrics, to explore the performance of the different baseline MT systems, since this is the first such work involving Assamese. The evaluation scores of the SMT and NMT models are compared to assess their effectiveness for bidirectional language pairs involving Assamese and other Indo-Aryan languages (Bangla, Gujarati, Hindi, Marathi, Odia, Sinhalese, and Urdu). The highest BLEU scores obtained are for Assamese to Sinhalese with SMT (35.63) and for Assamese to Bangla with the NMT systems (seq2seq: 50.92, Transformer: 50.01, fine-tuned Transformer: 50.19). We also try to relate the results to language characteristics, distances, family trees, domains, data sizes, and sentence lengths. We find that the domain is the most important factor affecting the results for the given data domains and sizes. We compare our results with the only existing MT system for Assamese (Bing Translator) and also with pairs involving Hindi.


2020, Vol. 14(01), pp. 137-151
Author(s): Prabhakar Gupta, Mayank Sharma

We demonstrate the potential of using aligned bilingual word embeddings to develop an unsupervised method for evaluating machine translations without the need for a parallel translation corpus or reference corpus. We explain different aspects of digital entertainment content subtitles. We share our experimental results for four language pairs, English to French, German, Portuguese, and Spanish, and present findings on the shortcomings of neural machine translation for subtitles. We propose several improvements over the system designed by Gupta et al. [P. Gupta, S. Shekhawat and K. Kumar, Unsupervised quality estimation without reference corpus for subtitle machine translation using word embeddings, IEEE 13th Int. Conf. Semantic Computing, 2019, pp. 32–38.] by incorporating a custom embedding model curated for subtitles, compound-word splitting and punctuation inclusion. We show a massive run-time improvement of the order of [Formula: see text] by considering three types of edits, removing the Proximity Intensity Index (PII), and changing the post-edit score calculation relative to their system.
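
The sketch below illustrates the general idea of reference-free quality estimation with aligned bilingual word embeddings: each source word is matched to its most similar target word in a shared embedding space and the similarities are averaged. The greedy scoring and the toy vectors are illustrative assumptions, not the exact scoring used by Gupta et al.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def qe_score(src_tokens, tgt_tokens, src_emb, tgt_emb):
    """Average best cross-lingual similarity of each known source word."""
    scores = []
    for s in src_tokens:
        if s not in src_emb:
            continue
        best = max((cosine(src_emb[s], tgt_emb[t])
                    for t in tgt_tokens if t in tgt_emb), default=0.0)
        scores.append(best)
    return sum(scores) / max(len(scores), 1)

# Toy aligned embeddings (hypothetical): "dog" and "chien" are close in space.
src_emb = {"the": np.array([0.1, 0.9]), "dog": np.array([0.8, 0.2])}
tgt_emb = {"le": np.array([0.1, 0.8]), "chien": np.array([0.75, 0.25])}
print(qe_score(["the", "dog"], ["le", "chien"], src_emb, tgt_emb))
```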


2017, Vol. 6(2), pp. 291-309
Author(s): Mikel L. Forcada

Abstract The last few years have witnessed a surge of interest in a new machine translation paradigm: neural machine translation (NMT). Neural machine translation is starting to displace its corpus-based predecessor, statistical machine translation (SMT). In this paper, I introduce NMT and explain in detail, without the mathematical complexity, how neural machine translation systems work, how they are trained, and their main differences from SMT systems. The paper tries to decipher NMT jargon such as “distributed representations”, “deep learning”, “word embeddings”, “vectors”, “layers”, “weights”, “encoder”, “decoder”, and “attention”, and builds upon these concepts, so that individual translators and professionals working for the translation industry, as well as students and academics in translation studies, can make sense of this new technology and know what to expect from it. Aspects such as how NMT output differs from that of SMT, and the hardware and software requirements that NMT places on the translation industry, both at training time and at run time, are also discussed.


2021, Vol. 11(7), pp. 2948
Author(s): Lucia Benkova, Dasa Munkova, Ľubomír Benko, Michal Munk

This study focuses on the comparison of phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems, using automatic metrics for translation quality evaluation, for the English-Slovak language pair. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would generate results of better quality. An experiment was performed using residuals to compare the automatic accuracy metric scores (BLEU_n) of the statistical machine translation with those of the neural machine translation. The results confirmed the assumption of better neural machine translation quality regardless of the system used. There were statistically significant differences between the SMT and NMT in favor of the NMT based on all BLEU_n scores. The neural machine translation achieved a better quality of translation of journalistic texts from English into Slovak, regardless of whether it was a system trained on general texts, such as Google Translate, or a specific one, such as the European Commission’s (EC’s) tool, which was trained on a specific domain.
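
As an illustration of comparing paired segment-level scores from two systems, the sketch below applies a Wilcoxon signed-rank test to hypothetical BLEU_n-style values; this test stands in for the study's residual-based analysis, and the numbers are invented.

```python
from scipy.stats import wilcoxon

# Paired segment-level scores for the same source segments (invented values).
smt_scores = [0.31, 0.28, 0.40, 0.22, 0.35, 0.27]
nmt_scores = [0.36, 0.33, 0.41, 0.30, 0.39, 0.31]

stat, p_value = wilcoxon(smt_scores, nmt_scores)
print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.4f}")
# A small p-value indicates a statistically significant difference between
# the paired SMT and NMT scores.
```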


2021, Vol. 284, pp. 08001
Author(s): Ilya Ulitkin, Irina Filippova, Natalia Ivanova, Alexey Poroykov

We report on various approaches to automatic evaluation of machine translation quality and describe three widely used methods. These methods, based on string matching and n-gram models, make it possible to compare the quality of machine translation output to a reference translation. We employ modern metrics for automatic evaluation of machine translation quality, such as BLEU, F-measure, and TER, to compare translations made by the Google and PROMT neural machine translation systems with translations obtained 5 years ago, when statistical machine translation and rule-based machine translation algorithms were employed by Google and PROMT, respectively, as the main translation algorithms [6]. The evaluation of the translation quality of candidate texts generated by Google and PROMT against the reference translation using an automatic translation evaluation program reveals significant qualitative changes compared with the results obtained 5 years ago, which indicates a dramatic improvement in the above-mentioned online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern systems for automatic evaluation of translation quality allow errors made by machine translation systems to be identified and systematized, which will enable the translation quality of these systems to be improved in the future.
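
For readers who want to reproduce this kind of evaluation, the sketch below computes BLEU and TER for a few toy sentence pairs, assuming the sacrebleu package (2.x), in which corpus_bleu and corpus_ter are available; the sentences are not from the study.

```python
import sacrebleu

hypotheses = ["the cat sat on the mat",
              "machine translation has improved"]
references = [["the cat is sitting on the mat",
               "machine translation improved a lot"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}, TER = {ter.score:.2f}")
```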


2020, Vol. 34(01), pp. 198-205
Author(s): Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, Jie Zhou

Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup by generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot properly model the target-side sequential dependency, leading to its weak correlation with translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be calculated efficiently; it encourages NAT to capture the target-side sequential dependency and correlates well with translation quality. We validate our approach on three translation tasks and show that it outperforms the NAT baseline by about 5.0 BLEU points on WMT14 En↔De and about 2.5 BLEU points on WMT16 En↔Ro.
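
The following sketch computes a bag-of-n-grams difference on discrete tokens, simply to show what the objective measures; in the paper the same quantity is computed in expectation over the NAT output distribution so that it becomes differentiable. The token examples are invented.

```python
from collections import Counter

def bag_of_ngrams(tokens, n=2):
    """Multiset of n-grams occurring in the token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_difference(candidate, reference, n=2):
    """L1 distance between the two bags of n-grams."""
    c, r = bag_of_ngrams(candidate, n), bag_of_ngrams(reference, n)
    return sum(abs(c[g] - r[g]) for g in set(c) | set(r))

ref = "we propose to train nat models".split()
hyp = "we propose to to train models".split()   # repetition and omission
print(bon_difference(hyp, ref))  # larger value = larger BoN mismatch
```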


Author(s): Zakaria El Maazouzi, Badr Eddine EL Mohajir, Mohammed Al Achhab

Achieving high accuracy in automatic translation tasks has been one of the challenging goals for machine translation researchers for decades. Thus, researchers in the field have always been eager to explore new ways to improve machine translation. As a key application in the natural language processing domain, automatic translation has seen the development of many approaches, namely statistical machine translation and, more recently, neural machine translation, which have greatly improved translation quality, especially for Latin languages. They have even made it possible for the translation of some language pairs to approach human translation quality. In this paper, we present a survey of the state of the art of statistical translation, in which we describe the different existing methodologies and review recent research studies, pointing out the main strengths and limitations of the different approaches.


Informatics, 2021, Vol. 8(1), pp. 7
Author(s): Arda Tezcan, Bram Bulté, Bram Vanroy

We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.
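
A minimal sketch of the retrieval step behind this kind of fuzzy-match augmentation is shown below: the closest translation-memory entry is found with a token-level similarity score and its target side is appended to the NMT input behind a separator token. The scoring function, the threshold and the "@@@" separator are illustrative assumptions, not NFR's exact configuration.

```python
import difflib

def fuzzy_score(a, b):
    """Token-level similarity in [0, 1] based on difflib's SequenceMatcher."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

def augment_with_fuzzy_match(src, translation_memory, threshold=0.5):
    best_src, best_tgt = max(translation_memory,
                             key=lambda pair: fuzzy_score(src, pair[0]))
    if fuzzy_score(src, best_src) < threshold:
        return src                        # no useful match: plain NMT input
    return f"{src} @@@ {best_tgt}"        # source + retrieved target segment

tm = [("press the start button", "druk op de startknop"),
      ("close the cover", "sluit het deksel")]
print(augment_with_fuzzy_match("press the stop button", tm))
```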

