Transductive Ensemble Learning for Neural Machine Translation

2020 · Vol 34 (04) · pp. 6291-6298
Author(s): Yiren Wang, Lijun Wu, Yingce Xia, Tao Qin, ChengXiang Zhai, ...

Ensemble learning, which aggregates multiple diverse models for inference, is a common practice for improving the accuracy of machine learning tasks. However, it has been observed that conventional ensemble methods bring only marginal improvement for neural machine translation (NMT) when the individual models are strong or numerous. In this paper, we study how to effectively aggregate multiple NMT models under the transductive setting, where the source sentences of the test set are known. We propose a simple yet effective approach named transductive ensemble learning (TEL), in which we use all individual models to translate the source test set into the target language space and then fine-tune a strong model on the translated synthetic corpus. We conduct extensive experiments on different settings (with/without monolingual data) and different language pairs (English↔{German, Finnish}). The results show that our approach significantly boosts strong individual models and benefits considerably from adding more individual models. In particular, we achieve state-of-the-art performance on the WMT2016-2018 English↔German translation tasks.
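
The TEL procedure is simple enough to summarize in a few lines. Below is a minimal sketch under assumed `translate` and `finetune` interfaces (hypothetical names, not the authors' code):

```python
def transductive_ensemble(models, strong_model, source_test):
    """Transductive ensemble learning over a known source-side test set."""
    # Step 1: every individual model translates the source test sentences,
    # producing a synthetic parallel corpus in the target language space.
    synthetic = [(src, m.translate(src))        # .translate is an assumed API
                 for m in models
                 for src in source_test]

    # Step 2: fine-tune the strongest single model on the synthetic corpus.
    strong_model.finetune(synthetic)            # .finetune is an assumed API

    # Step 3: the fine-tuned model alone produces the final translations.
    return [strong_model.translate(src) for src in source_test]
```

Fine-tuning on the union of all models' outputs lets the strong model absorb the ensemble's diversity without paying the cost of ensembling at inference time.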

Author(s): Long Zhou, Jiajun Zhang, Chengqing Zong

Existing approaches to neural machine translation (NMT) generate the target language sequence token by token, from left to right. This unidirectional decoding framework cannot make full use of the target-side future contexts that a right-to-left decoding direction would produce, and thus suffers from unbalanced outputs. In this paper, we introduce synchronous bidirectional neural machine translation (SB-NMT), which predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. Then, we present an interactive decoding model in which left-to-right (right-to-left) generation depends not only on its previously generated outputs, but also on the future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on the large-scale NIST Chinese-English, WMT14 English-German, and WMT18 Russian-English translation tasks. Experimental results demonstrate that our model achieves significant improvements over the strong Transformer baseline, by 3.92, 1.49, and 1.04 BLEU points respectively, and obtains state-of-the-art performance on the Chinese-English and English-German translation tasks.
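
A toy sketch of the synchronous bidirectional decoding loop, assuming hypothetical per-direction step functions (the actual SB-NMT fuses both directions inside a single Transformer):

```python
def sb_decode(step_l2r, step_r2l, max_len, bos, eos):
    """Generate left-to-right and right-to-left hypotheses in lockstep.
    `step_*` map (own_history, other_history) -> next token (assumed API)."""
    l2r, r2l = [bos], [bos]
    for _ in range(max_len):
        # Each direction conditions on its own prefix *and* the other
        # direction's prefix, so future context informs every step.
        nxt_l2r = step_l2r(l2r, r2l)
        nxt_r2l = step_r2l(r2l, l2r)
        l2r.append(nxt_l2r)
        r2l.append(nxt_r2l)
        if nxt_l2r == eos and nxt_r2l == eos:
            break
    return l2r, list(reversed(r2l))  # r2l hypothesis read left-to-right
```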


Author(s): Ahrii Kim, Yunju Bak, Jimin Sun, Sungwon Lyu, Changmin Lee

With the advent of Neural Machine Translation, the more human-machine parity is claimed at WMT, the more we come to ask whether its evaluation environment can be trusted. In this paper, we argue that the low quality of the source test set of the news track at WMT may lead to an overrated human parity claim. First, we report nine types of so-called technical contaminants in the data set, originating from the absence of meticulous inspection after web crawling. Our empirical findings show that when these are corrected, about 5% of the segments for which human parity was previously claimed turn out to be statistically invalid. This tendency becomes even clearer when only the contaminated sentences are considered. To the best of our knowledge, this is the first attempt to question the "source" side of the test set as a potential cause of the human parity overclaim. We provide evidence for this phenomenon: according to sentence-level TER scores, these trivial errors change a good part of the system translations. We conclude that overlooking them would be a mistake, especially when it comes to NMT evaluation.
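
As a rough illustration of the sentence-level TER comparison: one can measure the word-level edit distance (ignoring TER's shift operation) between a system output produced from the raw source and one produced from the cleaned source. This is an illustrative proxy, not the authors' evaluation script, and the example sentences below are invented:

```python
def edit_distance(a, b):
    """Levenshtein distance over token lists (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,       # deletion
                                     dp[j - 1] + 1,   # insertion
                                     prev + (x != y)) # substitution/match
    return dp[-1]

# Translation of the raw source vs. translation of the cleaned source:
before = "the stock rose 5 % on monday".split()
after = "the stock rose 5% on Monday".split()
print(edit_distance(before, after) / len(after))  # normalized, TER-style
```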


Author(s): Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, ...

We propose a simple solution for using a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture of a standard NMT system; instead, it introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables multilingual NMT with a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and present some interesting examples of mixed-language behavior.
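
The mechanism reduces to a one-line preprocessing step; a minimal sketch (the `<2xx>` token format follows the paper):

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Tag a source sentence with the desired target language."""
    return f"<2{target_lang}> {source_sentence}"

# The same model translates to different languages purely from the tag:
print(add_target_token("How are you?", "es"))  # <2es> How are you?
print(add_target_token("How are you?", "de"))  # <2de> How are you?
```

Because the architecture is untouched, zero-shot behavior falls out for free: the tag for an unseen pair (e.g., Portuguese→Spanish) can still be supplied at inference time.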


Author(s): Rashmini Naranpanawa, Ravinga Perera, Thilakshi Fonseka, Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when there is an abundance of parallel data. However, vanilla NMT primarily operates at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the Transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
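
For readers unfamiliar with the subword techniques compared here, the sketch below learns a handful of byte-pair encoding (BPE) merges from toy data; production systems learn tens of thousands of merges from a corpus with tools such as subword-nmt or SentencePiece:

```python
from collections import Counter

def merge_pair(sym, pair):
    """Apply one merge rule to a symbol sequence."""
    out, i = [], 0
    while i < len(sym):
        if i + 1 < len(sym) and (sym[i], sym[i + 1]) == pair:
            out.append(sym[i] + sym[i + 1]); i += 2
        else:
            out.append(sym[i]); i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn merge rules, starting from character-level segmentation."""
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for sym, freq in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = Counter({merge_pair(s, best): f for s, f in vocab.items()})
    return merges

print(learn_bpe(["low", "lower", "lowest", "low"], 3))
# [('l', 'o'), ('lo', 'w'), ('low', '</w>')]
```

Rare and unseen words then decompose into known subword units instead of mapping to a single OOV token, which is what makes open-vocabulary translation possible.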


2020 · Vol 34 (05) · pp. 8830-8837
Author(s): Xin Sheng, Linli Xu, Junliang Guo, Jingchang Liu, Ruoyu Zhao, ...

We propose a novel introspective model for variational neural machine translation (IntroVNMT), inspired by the recent successful application of the introspective variational autoencoder (IntroVAE) to high-quality image synthesis. Unlike the vanilla variational NMT model, IntroVNMT is capable of improving itself introspectively by evaluating the quality of the generated target sentences according to the high-level latent variables of the real and generated target sentences. As a consequence of introspective training, the proposed model is able to discriminate between generated and real sentences of the target language via the latent variables produced by its encoder. In this way, IntroVNMT is able to generate more realistic target sentences in practice. At the same time, IntroVNMT inherits the advantages of variational autoencoders (VAEs), and its training process is more stable than that of generative adversarial network (GAN) based models. Experimental results on different translation tasks demonstrate that the proposed model achieves significant improvements over the vanilla variational NMT model.
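
A schematic of the introspective objective, adapted from the IntroVAE formulation; the shapes and the margin value are illustrative assumptions, and this is a sketch of the training signal rather than the paper's implementation:

```python
import torch
import torch.nn.functional as F

def kl(mu, logvar):
    """KL divergence of a diagonal Gaussian posterior from N(0, I)."""
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=-1)

def encoder_loss(mu_real, lv_real, mu_fake, lv_fake, margin=5.0):
    # The encoder acts as the discriminator: it pulls latents of real
    # target sentences toward the prior while pushing latents of
    # generated sentences away, up to a hinge margin.
    return kl(mu_real, lv_real).mean() + F.relu(margin - kl(mu_fake, lv_fake)).mean()

def generator_loss(mu_fake, lv_fake):
    # The generator tries to fool the encoder: latents of its generated
    # sentences should look like those of real target sentences.
    return kl(mu_fake, lv_fake).mean()
```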


2020 · Vol 34 (05) · pp. 8568-8575
Author(s): Xing Niu, Marine Carpuat

This work aims to produce translations that convey source language content at a formality level appropriate for a particular audience. Framing this problem as a neural sequence-to-sequence task ideally requires training triplets consisting of a bilingual sentence pair labeled with target-language formality. In practice, however, available training examples are limited to English sentence pairs of different styles and bilingual parallel sentences of unknown formality. We introduce a novel training scheme for multi-task models that automatically generates synthetic training triplets by inferring the missing element on the fly, thus enabling end-to-end training. Comprehensive automatic and human assessments show that our best model outperforms existing models by producing translations that better match desired formality levels while preserving the source meaning.
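
A minimal sketch of the on-the-fly triplet synthesis, with `translate` and `classify_formality` as stand-ins for the multi-task model's own components (hypothetical names and data layout):

```python
def complete_triplet(kind, data, translate, classify_formality):
    """Return synthetic (source, formality, target) triplets by
    inferring whichever element the training example is missing."""
    if kind == "style_pair":               # same-language informal/formal pair
        informal, formal = data
        src = translate(formal)            # infer the missing source side
        return [(src, "formal", formal), (src, "informal", informal)]
    else:                                  # "bitext": parallel pair, no label
        src, tgt = data
        return [(src, classify_formality(tgt), tgt)]
```

Every example thus becomes a full triplet at training time, which is what allows the sequence-to-sequence model to be trained end to end despite the absence of natively labeled triplets.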


Author(s): Xiaomian Kang, Yang Zhao, Jiajun Zhang, Chengqing Zong

Document-level neural machine translation (DocNMT) has yielded attractive improvements. In this article, we systematically analyze discourse phenomena in Chinese-to-English translation and focus on the most obvious one, lexical translation consistency. To alleviate lexical inconsistency, we propose an effective approach that is aware of the words that need to be translated consistently and constrains the model to produce more consistent translations. Specifically, we first introduce a global context extractor to extract the document context and the consistency context, respectively. The two types of global context are then integrated into an encoder enhancer and a decoder enhancer to improve lexical translation consistency. We create a test set to evaluate lexical consistency automatically. Experiments demonstrate that our approach can significantly alleviate lexical translation inconsistency. In addition, it also substantially improves translation quality compared to a sentence-level Transformer.
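
In the spirit of the paper's automatic consistency test set, a simple check is whether a source word that recurs within a document keeps receiving the same translation. The sketch below assumes word alignments are given and may differ from the authors' actual metric:

```python
from collections import defaultdict

def consistency_rate(doc_alignments):
    """doc_alignments: list of (source_word, translated_word) pairs
    collected over one document."""
    options = defaultdict(list)
    for src_word, tgt_word in doc_alignments:
        options[src_word].append(tgt_word)
    # Only words that occur more than once can be (in)consistent.
    repeated = {w: ts for w, ts in options.items() if len(ts) > 1}
    if not repeated:
        return 1.0
    consistent = sum(len(set(ts)) == 1 for ts in repeated.values())
    return consistent / len(repeated)

print(consistency_rate([("银行", "bank"), ("银行", "bank"), ("苹果", "apple")]))
# 1.0 -- the one repeated source word is translated consistently
```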


Author(s): Yingce Xia, Tianyu He, Xu Tan, Fei Tian, Di He, ...

Sharing source- and target-side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English-to-French or English-to-German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the entire encoder and decoder of an NMT model. We share the encoder and decoder of the Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named the Tied Transformer. Experimental results demonstrate that this simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German-to-English translation, 28.98/29.89 BLEU scores on WMT 2014 English-to-German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German-to-English translation.
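
A simplified PyTorch stand-in for model-level sharing: one layer stack is reused for both encoding and decoding. The paper ties the corresponding sublayers of a full Transformer; here a single decoder stack plays both roles, and positional encodings are omitted for brevity:

```python
import torch
import torch.nn as nn

class TiedSeq2Seq(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # shared vocabulary
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # A single stack serves as both encoder and decoder, so the two
        # halves of the model literally share their parameters.
        self.stack = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)
        memory = self.stack(src, src)                    # "encode" the source
        t = tgt_ids.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        hidden = self.stack(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.out(hidden)                          # next-token logits

logits = TiedSeq2Seq(vocab_size=1000)(torch.randint(0, 1000, (2, 9)),
                                      torch.randint(0, 1000, (2, 7)))
print(logits.shape)  # torch.Size([2, 7, 1000])
```

The payoff is compactness: the shared stack roughly halves the parameter count relative to separate encoder and decoder stacks of the same depth.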


Author(s): Mehreen Alam, Sibt ul Hussain

Attention-based encoder-decoder models have superseded conventional techniques due to their unmatched performance on many neural machine translation problems. Usually, the encoder and decoder are two recurrent neural networks, where the decoder is directed to focus on relevant parts of the source language using an attention mechanism. This data-driven approach leads to generic and scalable solutions with no reliance on manual, hand-crafted features. To the best of our knowledge, none of the modern machine translation approaches has been applied to the research problem of Urdu machine transliteration. Ours is the first attempt to apply a deep neural network-based encoder-decoder with an attention mechanism to this problem using a Roman-Urdu and Urdu parallel corpus. To this end, we present (i) the first ever Roman-Urdu to Urdu parallel corpus of 1.1 million sentences, (ii) three state-of-the-art encoder-decoder models, and (iii) a detailed empirical analysis of these three models on the Roman-Urdu to Urdu parallel corpus. Overall, the attention-based model gives state-of-the-art performance with a benchmark BLEU score of 70. Our qualitative evaluation shows that our models generate coherent transliterations that are grammatically and logically correct.
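
The attention step at the heart of such encoder-decoder models fits in a few lines; the dot-product scoring and the tensor shapes below are illustrative choices, not necessarily those of the paper's models:

```python
import torch
import torch.nn.functional as F

def dot_attention(decoder_state, encoder_states):
    """decoder_state: (batch, d); encoder_states: (batch, src_len, d)."""
    # Score each source position against the current decoder state.
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(-1)).squeeze(-1)
    weights = F.softmax(scores, dim=-1)           # focus over source positions
    # Weighted sum of encoder states = context vector fed to the decoder.
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights

ctx, w = dot_attention(torch.randn(2, 256), torch.randn(2, 7, 256))
print(ctx.shape, w.shape)  # torch.Size([2, 256]) torch.Size([2, 7])
```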


Entropy · 2019 · Vol 21 (12) · pp. 1213
Author(s): Guanghao Xu, Youngjoong Ko, Jungyun Seo

Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because synthetic data is often generated by back-translating monolingual data from the target language into the source language, it potentially contains a lot of noise, such as weakly paired sentences or translation errors. In this paper, we propose a novel approach to filtering this noise from synthetic data. For each sentence pair of the synthetic data, we compute a semantic similarity score using bilingual word embeddings. By selecting sentence pairs according to these scores, we obtain better synthetic parallel data. Experimental results on the IWSLT 2017 Korean→English translation task show that, despite using much less data, our method outperforms the baseline NMT system with back-translation by up to 0.72 and 0.62 BLEU points for tst2016 and tst2017, respectively.
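
A simplified version of the filtering idea: score each back-translated pair by the cosine similarity of averaged bilingual word embeddings (assumed to live in a shared cross-lingual space; the 300-dimension fallback is an assumption) and keep only the highest-scoring pairs:

```python
import numpy as np

def pair_score(src_tokens, tgt_tokens, emb):
    """emb: dict mapping tokens of both languages to same-space vectors."""
    def avg(tokens):
        vecs = [emb[t] for t in tokens if t in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(300)
    s, t = avg(src_tokens), avg(tgt_tokens)
    denom = np.linalg.norm(s) * np.linalg.norm(t)
    return float(s @ t / denom) if denom else 0.0

def filter_synthetic(pairs, emb, keep_ratio=0.8):
    """Keep the highest-scoring fraction of synthetic sentence pairs."""
    ranked = sorted(pairs, key=lambda p: pair_score(p[0], p[1], emb),
                    reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]
```

Weakly paired or mistranslated pairs tend to score low under this measure, so the surviving subset is smaller but cleaner, matching the paper's observation that less (but better) synthetic data can outperform the full back-translated corpus.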

