Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015466 ◽

2019 ◽

Vol 33 ◽

pp. 5466-5473 ◽

Cited By ~ 2

Author(s):

Yingce Xia ◽

Tianyu He ◽

Xu Tan ◽

Fei Tian ◽

Di He ◽

...

Keyword(s):

Machine Translation ◽

English Translation ◽

State Of The Art ◽

Compact Model ◽

Word Embeddings ◽

Simple Method ◽

Neural Machine Translation ◽

German Translation ◽

One Step ◽

Target Side

Sharing source- and target-side vocabularies and word embeddings has been a popular practice in neural machine translation (NMT) for similar languages (e.g., English to French or German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the entire encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised NMT and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German to English translation, 28.98/29.89 BLEU scores on WMT 2014 English to German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German to English translation.
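
The idea of model-level sharing can be illustrated with a toy sketch. This is not the authors' implementation: the `Layer` class is a hypothetical stand-in for a full Transformer layer, and the dimensions are arbitrary. It only shows that when the encoder and decoder reference the same layer objects, the number of distinct parameters is halved relative to an untied model of the same depth.

```python
import random


class Layer:
    """Toy linear layer standing in for a Transformer layer (hypothetical)."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(dim)]

    def forward(self, x):
        # Plain matrix-vector product.
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]

    def num_params(self):
        return len(self.weight) * len(self.weight[0])


def count_unique_params(layers):
    """Sum parameters, counting each distinct layer object only once."""
    unique = {id(layer): layer for layer in layers}
    return sum(layer.num_params() for layer in unique.values())


rng = random.Random(0)
dim, depth = 8, 2  # arbitrary toy sizes

# Untied baseline: encoder and decoder own separate layer stacks.
untied_encoder = [Layer(dim, rng) for _ in range(depth)]
untied_decoder = [Layer(dim, rng) for _ in range(depth)]

# Tied variant: both roles reference the very same layer objects.
shared_stack = [Layer(dim, rng) for _ in range(depth)]
tied_encoder = shared_stack
tied_decoder = shared_stack

untied_total = count_unique_params(untied_encoder + untied_decoder)
tied_total = count_unique_params(tied_encoder + tied_decoder)
print(untied_total, tied_total)  # prints "256 128"
```

In a real Transformer the decoder also contains cross-attention, so how the shared layers are applied in each role is a design decision the sketch does not cover.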

Synchronous Bidirectional Neural Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00256 ◽

2019 ◽

Vol 7 ◽

pp. 91-105 ◽

Cited By ~ 8

Author(s):

Long Zhou ◽

Jiajun Zhang ◽

Chengqing Zong

Keyword(s):

Machine Translation ◽

Large Scale ◽

State Of The Art ◽

Target Language ◽

Single Model ◽

Neural Machine Translation ◽

German Translation ◽

Transformer Model ◽

Target Side ◽

Future Information

Existing approaches to neural machine translation (NMT) generate the target language sequence token-by-token from left to right. However, this kind of unidirectional decoding framework cannot make full use of the target-side future contexts which can be produced in a right-to-left decoding direction, and thus suffers from the issue of unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) model that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. Then, we present an interactive decoding model in which left-to-right (right-to-left) generation depends not only on its previously generated outputs, but also on future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on large-scale NIST Chinese-English, WMT14 English-German, and WMT18 Russian-English translation tasks. Experimental results demonstrate that our model achieves significant improvements over the strong Transformer model by 3.92, 1.49, and 1.04 BLEU points, respectively, and obtains state-of-the-art performance on Chinese-English and English-German translation tasks.

Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0018 ◽

2017 ◽

Vol 108 (1) ◽

pp. 171-182 ◽

Cited By ~ 5

Author(s):

Jinhua Du ◽

Andy Way

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Class ◽

Word Embeddings ◽

Neural Machine Translation ◽

Parts Of Speech ◽

Translation Quality ◽

The Impact ◽

Japanese English ◽

Target Side

Pre-reordering, a preprocessing step that makes the source-side word order close to that of the target side, has been proven very helpful for statistical machine translation (SMT) in improving translation quality. However, is this also the case in neural machine translation (NMT)? In this paper, we first investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate features for the pre-reordering model in SMT as input factors into NMT (factored NMT). The features, namely parts-of-speech (POS), word class and reordered index, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge for NMT. Pre-reordering experiments conducted on Japanese↔English and Chinese↔English show that pre-reordering the source-side data for NMT is redundant and NMT models trained on pre-reordered data deteriorate translation performance. However, factored NMT using SMT-based pre-reordering features on Japanese→English and Chinese→English is beneficial and can further improve by 4.48 and 5.89 relative BLEU points, respectively, compared to the baseline NMT system.
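
The concatenation of feature vectors to word embeddings described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the vocabulary, tag sets, word-class labels, embedding sizes, and the helper names `make_table` and `factored_embedding` are all hypothetical, and the embedding values are random rather than learned.

```python
import random


def make_table(symbols, dim, rng):
    """Build a toy embedding table: one random vector per symbol."""
    return {s: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for s in symbols}


rng = random.Random(0)

# Hypothetical factor vocabularies and embedding sizes.
tables = {
    "word": make_table(["the", "cat", "sleeps"], 6, rng),
    "pos": make_table(["DET", "NOUN", "VERB"], 2, rng),
    "word_class": make_table(["c1", "c2"], 2, rng),
    "reordered_index": make_table([0, 1, 2], 2, rng),
}


def factored_embedding(word, pos, word_class, reordered_index):
    """Concatenate the word embedding with one vector per input factor."""
    return (tables["word"][word]
            + tables["pos"][pos]
            + tables["word_class"][word_class]
            + tables["reordered_index"][reordered_index])


# Each source token carries its word plus three factors (toy annotations).
sentence = [
    ("the", "DET", "c1", 0),
    ("cat", "NOUN", "c2", 1),
    ("sleeps", "VERB", "c2", 2),
]
inputs = [factored_embedding(*token) for token in sentence]
print(len(inputs), len(inputs[0]))  # prints "3 12"
```

Concatenation keeps each factor in its own slice of the input vector, so the encoder input width is the sum of the individual embedding sizes (6 + 2 + 2 + 2 here).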

Neural Machine Translation with Joint Representation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6344 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8285-8292

Author(s):

Yanyang Li ◽

Qiang Wang ◽

Tong Xiao ◽

Tongran Liu ◽

Jingbo Zhu

Keyword(s):

Machine Translation ◽

English Translation ◽

Large Scale ◽

State Of The Art ◽

Statistical Machine Translation ◽

The State ◽

Small Scale ◽

Neural Machine Translation ◽

Joint Representation

Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, the recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes the interaction for efficiency. In this paper, we employ Joint Representation that fully accounts for each possible interaction. We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation. The resulting Reformer models offer a new Sequence-to-Sequence modelling paradigm besides the Encoder-Decoder framework and outperform the Transformer baseline on both the small-scale IWSLT14 German-English, English-German and IWSLT15 Vietnamese-English tasks and the large-scale NIST12 Chinese-English translation task by about 1 BLEU point. We also propose a systematic model scaling approach, allowing the Reformer model to beat the state-of-the-art Transformer in IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer parameters. The code is publicly available at https://github.com/lyy1994/reformer.

Explicit Sentence Compression for Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6347 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8311-8318

Author(s):

Zuchao Li ◽

Rui Wang ◽

Kehai Chen ◽

Masao Utiyama ◽

Eiichiro Sumita ◽

...

Keyword(s):

Machine Translation ◽

State Of The Art ◽

General Information ◽

Compression Method ◽

Neural Machine Translation ◽

Sentence Compression ◽

French And English ◽

Source Sentence ◽

Empirical Tests ◽

Target Side

State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework, in which the source sentence representation is produced by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of a sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression goal is used to learn the backbone information in a sentence. We propose three ways, including backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves the translation performance over strong baselines.

ME-MD: An Effective Framework for Neural Machine Translation with Multiple Encoders and Decoders

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/474 ◽

2017 ◽

Cited By ~ 1

Author(s):

Jinchao Zhang ◽

Qun Liu ◽

Jie Zhou

Keyword(s):

Machine Translation ◽

English Translation ◽

State Of The Art ◽

The State ◽

Neural Machine Translation ◽

Source Sentence ◽

Target Words

The encoder-decoder neural framework is widely employed for Neural Machine Translation (NMT) with a single encoder to represent the source sentence and a single decoder to generate target words. The translation performance heavily relies on the representation ability of the encoder and the generation ability of the decoder. To further enhance NMT, we propose to extend the original encoder-decoder framework to a novel one, which has multiple encoders and decoders (ME-MD). In this way, multiple encoders extract more diverse features to represent the source sequence and multiple decoders capture more complicated translation knowledge. Our proposed ME-MD framework makes it convenient to integrate heterogeneous encoders and decoders with multiple depths and multiple types. Experiments on a Chinese-English translation task show that our ME-MD system surpasses the state-of-the-art NMT system by 2.1 BLEU points and surpasses the phrase-based Moses by 7.38 BLEU points. Our framework is general and can be applied to other sequence-to-sequence tasks.

Comparing Statistical and Neural Machine Translation Performance on Hindi-To-Tamil and English-To-Tamil

Digital ◽

10.3390/digital1020007 ◽

2021 ◽

Vol 1 (2) ◽

pp. 86-102

Author(s):

Akshai Ramesh ◽

Venkatesh Balavadhani Parthasarathy ◽

Rejwanul Haque ◽

Andy Way

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Training Data ◽

Research Translation ◽

Neural Machine Translation ◽

Low Resource ◽

Evaluation Scheme ◽

Dominant Paradigm ◽

Target Side

Phrase-based statistical machine translation (PB-SMT) has been the dominant paradigm in machine translation (MT) research for more than two decades. Deep neural MT models have been producing state-of-the-art performance across many translation tasks for four to five years. To put it another way, neural MT (NMT) took the place of PB-SMT a few years back and currently represents the state-of-the-art in MT research. Translation to or from under-resourced languages has been historically seen as a challenging task. Despite producing state-of-the-art results in many translation tasks, NMT still poses many problems such as performing poorly for many low-resource language pairs mainly because of its learning task’s data-demanding nature. MT researchers have been trying to address this problem via various techniques, e.g., exploiting source- and/or target-side monolingual data for training, augmenting bilingual training data, and transfer learning. Despite some success, none of the present-day benchmarks have entirely overcome the problem of translation in low-resource scenarios for many languages. In this work, we investigate the performance of PB-SMT and NMT on two rarely tested under-resourced language pairs, English-to-Tamil and Hindi-to-Tamil, taking a specialised data domain into consideration. This paper demonstrates our findings and presents results showing the rankings of our MT systems produced via a social media-based human evaluation scheme.

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013723 ◽

2019 ◽

Vol 33 ◽

pp. 3723-3730 ◽

Cited By ~ 5

Author(s):

Junliang Guo ◽

Xu Tan ◽

Di He ◽

Tao Qin ◽

Linli Xu ◽

...

Keyword(s):

Machine Translation ◽

Experimental Results ◽

Word Embeddings ◽

Model Accuracy ◽

Neural Machine Translation ◽

Word Level ◽

Sentence Level ◽

The Cost ◽

Target Side

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the inputs of the decoder is important and largely impacts the model accuracy. In this paper, we propose two methods to enhance the decoder inputs so as to improve NAT models. The first one directly leverages a phrase table generated by conventional SMT approaches to translate source tokens to target tokens, which are then fed into the decoder as inputs. The second one transforms source-side word embeddings to target-side word embeddings through sentence-level alignment and word-level adversary learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show our method largely outperforms the NAT baseline (Gu et al. 2017) by 5.11 BLEU points on the WMT14 English-German task and 4.72 BLEU points on the WMT16 English-Romanian task.
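
The first method, translating source tokens through a phrase table to obtain decoder inputs, can be sketched roughly as below. The abstract does not specify the lookup procedure, so the greedy longest-match strategy, the copy-through fallback for unknown tokens, the function name `translate_with_phrase_table`, and the toy German-to-English table are all assumptions made for illustration.

```python
def translate_with_phrase_table(source_tokens, phrase_table, max_len=3):
    """Greedy longest-match lookup (an assumed strategy, not from the paper).

    Unknown tokens are copied through unchanged.
    """
    output = []
    i = 0
    while i < len(source_tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(source_tokens) - i), 0, -1):
            phrase = tuple(source_tokens[i:i + n])
            if phrase in phrase_table:
                output.extend(phrase_table[phrase])
                i += n
                break
        else:
            # No phrase matched at this position: copy the source token.
            output.append(source_tokens[i])
            i += 1
    return output


# Toy phrase table (hypothetical entries).
phrase_table = {
    ("guten", "morgen"): ["good", "morning"],
    ("welt",): ["world"],
}

decoder_inputs = translate_with_phrase_table(
    ["guten", "morgen", "welt", "!"], phrase_table
)
print(decoder_inputs)  # prints ['good', 'morning', 'world', '!']
```

In the NAT setting these tokens would then be embedded and fed to the decoder in place of copied source embeddings; that step is omitted here.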

An Evaluation of Neural Machine Translation and Pre-trained Word Embeddings in Multilingual Neural Sentiment Analysis

2020 IEEE International Conference on Progress in Informatics and Computing (PIC) ◽

10.1109/pic50277.2020.9350849 ◽

2020 ◽

Author(s):

George Manias ◽

Argyro Mavrogiorgou ◽

Athanasios Kiourtis ◽

Dimosthenis Kyriazis

Keyword(s):

Sentiment Analysis ◽

Machine Translation ◽

Word Embeddings ◽

Neural Machine Translation

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach which performs much better than statistical machine translation (SMT) models when parallel data is abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are heavily affected by the out-of-vocabulary (OOV) and rare word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the Transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies along with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, regardless of whether a large parallel corpus is available.

Effective Use of Target-side Context for Neural Machine Translation

Journal of Natural Language Processing ◽

10.5715/jnlp.28.731 ◽

2021 ◽

Vol 28 (2) ◽

pp. 731-735

Author(s):

Hideya Mino

Keyword(s):

Machine Translation ◽

Neural Machine Translation ◽

Effective Use ◽

Target Side
