Improved Neural Machine Translation with Source Syntax

Author(s):  
Shuangzhi Wu ◽  
Ming Zhou ◽  
Dongdong Zhang

Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently achieved state-of-the-art performance. Researchers have shown that extending word-level attention to phrase-level attention by incorporating source-side phrase structure can enhance the attention model and yield promising improvements. However, the word dependencies that are crucial for correctly understanding a source sentence are not always consecutive (i.e., captured by phrase structure); they can also span long distances. Phrase structures are therefore not the best way to explicitly model long-distance dependencies. In this paper we propose a simple but effective method to incorporate source-side long-distance dependencies into NMT. Our method, based on dependency trees, enriches each source state with global dependency structure, which better captures the inherent syntactic structure of source sentences. Experiments on Chinese-English and English-Japanese translation tasks show that our proposed method outperforms state-of-the-art SMT and NMT baselines.
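
As a rough illustration of the idea of enriching source states with dependency information, the sketch below (not the authors' code; function names, shapes, and the concatenation scheme are assumptions) concatenates each encoder state with the state of its dependency head, so that long-distance head-dependent relations become directly visible to the attention model.

```python
# Hypothetical sketch: enriching RNN encoder states with dependency-head
# information, one way to expose long-distance source dependencies.
import torch

def enrich_with_dependency_heads(encoder_states, head_indices):
    """encoder_states: (seq_len, hidden) RNN outputs for one source sentence.
    head_indices: list of length seq_len; head_indices[i] is the position of
    token i's syntactic head (i itself for the root).  Returns states of size
    (seq_len, 2 * hidden), each concatenated with its head's state."""
    heads = torch.tensor(head_indices, dtype=torch.long)
    head_states = encoder_states.index_select(0, heads)   # gather head representations
    return torch.cat([encoder_states, head_states], dim=-1)

# Toy usage: 5 source tokens, hidden size 4, heads from a dependency parse.
states = torch.randn(5, 4)
enriched = enrich_with_dependency_heads(states, [1, 1, 3, 1, 3])
print(enriched.shape)  # torch.Size([5, 8])
```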

Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when an abundant parallel corpus is available. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are heavily affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the Transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies, combined with state-of-the-art NMT, can perform remarkably well when translating English sentences into a morphologically rich language even without a large parallel corpus.
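
By way of illustration, the sketch below shows one standard subword technique, byte-pair encoding (BPE), applied with a learned merge table; this is a minimal pure-Python sketch, not the system described in the paper, which would typically rely on tools such as subword-nmt or SentencePiece.

```python
# Minimal sketch of applying learned BPE merges to a word (illustrative only).
def apply_bpe(word, merges):
    """word: a string; merges: ordered list of (left, right) symbol pairs
    learned on the training corpus.  Returns the subword segmentation."""
    symbols = list(word) + ["</w>"]          # character symbols plus end-of-word marker
    for left, right in merges:               # apply merges in learned order
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]
            else:
                i += 1
    return symbols

# Toy merge table; a real table would be learned from the parallel corpus.
merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
print(apply_bpe("lower", merges))  # ['lower', '</w>']
```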


2019 ◽  
Vol 45 (2) ◽  
pp. 267-292 ◽  
Author(s):  
Akiko Eriguchi ◽  
Kazuma Hashimoto ◽  
Yoshimasa Tsuruoka

Neural machine translation (NMT) has shown great success as a new alternative to the traditional Statistical Machine Translation model in multiple languages. Early NMT models are based on sequence-to-sequence learning that encodes a sequence of source words into a vector space and generates another sequence of target words from the vector. In those NMT models, sentences are simply treated as sequences of words without any internal structure. In this article, we focus on the role of the syntactic structure of source sentences and propose a novel end-to-end syntactic NMT model, which we call a tree-to-sequence NMT model, extending a sequence-to-sequence model with the source-side phrase structure. Our proposed model has an attention mechanism that enables the decoder to generate a translated word while softly aligning it with phrases as well as words of the source sentence. We have empirically compared the proposed model with sequence-to-sequence models in various settings on Chinese-to-Japanese and English-to-Japanese translation tasks. Our experimental results suggest that the use of syntactic structure can be beneficial when the training data set is small, but is not as effective as using a bi-directional encoder. As the size of the training data set increases, the benefits of using a syntactic tree tend to diminish.
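
A minimal sketch of the attention idea described above is given below, assuming the phrase (tree-node) representations are already computed; the dot-product scoring and tensor shapes are illustrative choices, not the authors' implementation.

```python
# Hedged sketch of tree-to-sequence attention: the decoder attends over
# word states and phrase (tree-node) states together.
import torch
import torch.nn.functional as F

def tree_attention(decoder_state, word_states, phrase_states):
    """decoder_state: (hidden,); word_states: (n_words, hidden);
    phrase_states: (n_phrases, hidden).  Returns a context vector computed
    over the concatenation of word and phrase representations."""
    memory = torch.cat([word_states, phrase_states], dim=0)   # (n_words + n_phrases, hidden)
    scores = memory @ decoder_state                            # dot-product scores
    weights = F.softmax(scores, dim=0)                         # soft alignment over words and phrases
    return weights @ memory                                    # context vector, shape (hidden,)

context = tree_attention(torch.randn(8), torch.randn(6, 8), torch.randn(4, 8))
print(context.shape)  # torch.Size([8])
```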


Informatics ◽  
2020 ◽  
Vol 7 (3) ◽  
pp. 32
Author(s):  
Rebecca Webster ◽  
Margot Fonteyne ◽  
Arda Tezcan ◽  
Lieve Macken ◽  
Joke Daems

Due to the growing success of neural machine translation (NMT), many have started to question its applicability within the field of literary translation. In order to grasp the possibilities of NMT, we studied the output of the neural machine translation systems Google Translate (GNMT) and DeepL when applied to four classic novels translated from English into Dutch. The quality of the NMT systems is discussed on the basis of manual annotations, and we also employed various metrics to gain insight into lexical richness, local cohesion, and syntactic and stylistic differences. Firstly, we discovered that a large proportion of the translated sentences contained errors. We also observed a lower level of lexical richness and local cohesion in the NMT output compared to the human translations. In addition, the NMT output is more likely to follow the syntactic structure of the source sentence, whereas human translations can differ. Lastly, the human translations deviate from the machine translations in style.
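
As an example of the kind of surface metric such a study can compute, the sketch below calculates a simple lexical-richness measure (type-token ratio) for a translation; the paper's actual metric set and tooling may differ.

```python
# Illustrative lexical-richness measure of the kind used when comparing
# NMT output with human translations.
def type_token_ratio(tokens):
    """Ratio of distinct word types to total tokens; lower values suggest
    a less lexically rich (more repetitive) translation."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

machine = "the man saw the man near the house".split()
human = "the gentleman noticed a stranger near his cottage".split()
print(type_token_ratio(machine), type_token_ratio(human))  # 0.625 vs 1.0
```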


2020 ◽  
Vol 34 (05) ◽  
pp. 8311-8318
Author(s):  
Zuchao Li ◽  
Rui Wang ◽  
Kehai Chen ◽  
Masao Utiyama ◽  
Eiichiro Sumita ◽  
...  

State-of-the-art Transformer-based neural machine translation (NMT) systems still follow the standard encoder-decoder framework, in which source sentence representation is handled well by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of a sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression goal is used to learn the backbone information in a sentence. We propose three ways, namely backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines.
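
The sketch below illustrates one plausible form of the source-side fusion step, gating each source token representation against a pooled vector from the compressed sentence; the gate, layer names, and pooling are assumptions rather than the paper's exact architecture.

```python
# Hypothetical sketch of source-side fusion: a gate mixes each source token
# representation with a "backbone" vector from the compressed sentence.
import torch
import torch.nn as nn

class BackboneFusion(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, source_states, backbone):
        """source_states: (seq_len, hidden); backbone: (hidden,) pooled from
        the compressed sentence's encoder.  Returns fused source states."""
        expanded = backbone.unsqueeze(0).expand_as(source_states)
        g = torch.sigmoid(self.gate(torch.cat([source_states, expanded], dim=-1)))
        return g * source_states + (1 - g) * expanded   # gated mixture

fusion = BackboneFusion(16)
print(fusion(torch.randn(7, 16), torch.randn(16)).shape)  # torch.Size([7, 16])
```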


Author(s):  
Jinchao Zhang ◽  
Qun Liu ◽  
Jie Zhou

The encoder-decoder neural framework is widely employed for Neural Machine Translation (NMT), with a single encoder to represent the source sentence and a single decoder to generate target words. The translation performance heavily relies on the representation ability of the encoder and the generation ability of the decoder. To further enhance NMT, we propose to extend the original encoder-decoder framework to a novel one with multiple encoders and decoders (ME-MD). In this way, multiple encoders extract more diverse features to represent the source sequence and multiple decoders capture more complicated translation knowledge. Our proposed ME-MD framework makes it convenient to integrate heterogeneous encoders and decoders of multiple depths and multiple types. Experiments on a Chinese-English translation task show that our ME-MD system surpasses the state-of-the-art NMT system by 2.1 BLEU points and surpasses the phrase-based Moses by 7.38 BLEU points. Our framework is general and can be applied to other sequence-to-sequence tasks.
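
As a toy illustration of the multi-encoder side, the sketch below merges the outputs of several encoders into a single source memory by simple averaging; the real ME-MD combination is more elaborate, so treat this only as a schematic.

```python
# Sketch of the combination step only: averaging the outputs of several
# heterogeneous encoders into one source memory for the decoders.
import torch

def combine_encoders(encoder_outputs):
    """encoder_outputs: list of tensors, each (seq_len, hidden), one per encoder.
    Returns a single (seq_len, hidden) memory by simple averaging."""
    return torch.stack(encoder_outputs, dim=0).mean(dim=0)

memory = combine_encoders([torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8)])
print(memory.shape)  # torch.Size([5, 8])
```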


Author(s):  
Xiangpeng Wei ◽  
Yue Hu ◽  
Luxi Xing ◽  
Yipeng Wang ◽  
Li Gao

The dominant neural machine translation (NMT) models based on the encoder-decoder architecture have recently achieved state-of-the-art performance. Traditionally, NMT models depend only on the representations learned during training for mapping a source sentence into the target domain. However, the learned representations often suffer from implicit and inadequately informed properties. In this paper, we propose a novel bilingual topic enhanced NMT (BLT-NMT) model to improve translation performance by incorporating bilingual topic knowledge into NMT. Specifically, the bilingual topic knowledge is incorporated into the hidden states of both the encoder and decoder, as well as the attention mechanism. With this new setting, the proposed BLT-NMT has access to the background knowledge implied in bilingual topics, which goes beyond the sequential context, and enables the attention mechanism to exploit topic-level information when generating target words during translation. Experimental results show that the proposed model consistently outperforms the traditional RNNsearch and previous topic-informed NMT on Chinese-English and English-German translation tasks. We also introduce the bilingual topic knowledge into the newly emerged Transformer base model on English-German translation and achieve a notable improvement.
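
A minimal sketch of injecting a bilingual topic distribution into a hidden state is shown below; the projection and additive fusion are illustrative assumptions, not the BLT-NMT equations.

```python
# Minimal sketch of enriching an encoder/decoder hidden state with a
# bilingual topic vector.
import torch
import torch.nn as nn

class TopicEnhancedState(nn.Module):
    def __init__(self, hidden, n_topics):
        super().__init__()
        self.topic_proj = nn.Linear(n_topics, hidden)

    def forward(self, hidden_state, topic_dist):
        """hidden_state: (hidden,); topic_dist: (n_topics,) bilingual topic
        distribution for the sentence.  Adds projected topic knowledge."""
        return torch.tanh(hidden_state + self.topic_proj(topic_dist))

layer = TopicEnhancedState(hidden=32, n_topics=50)
print(layer(torch.randn(32), torch.softmax(torch.randn(50), dim=0)).shape)  # torch.Size([32])
```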


Author(s):  
Xiang Kong ◽  
Zhaopeng Tu ◽  
Shuming Shi ◽  
Eduard Hovy ◽  
Tong Zhang

Although Neural Machine Translation (NMT) models have advanced the state of the art in machine translation, they face problems such as inadequate translation. We attribute this to the fact that standard Maximum Likelihood Estimation (MLE) cannot judge real translation quality due to several limitations. In this work, we propose an adequacy-oriented learning mechanism for NMT by casting translation as a stochastic policy in Reinforcement Learning (RL), where the reward is estimated by explicitly measuring translation adequacy. Benefiting from the sequence-level training of the RL strategy and a more accurate reward designed specifically for translation, our model outperforms multiple strong baselines, including (1) standard and coverage-augmented attention models with MLE-based training, and (2) advanced reinforcement and adversarial training strategies with rewards based on both word-level BLEU and character-level CHRF3. Quantitative and qualitative analyses on different language pairs and NMT architectures demonstrate the effectiveness and universality of the proposed approach.
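
The sketch below shows the generic policy-gradient (REINFORCE) form of such sequence-level training, with the adequacy reward left as an abstract scalar; it is a schematic of the training signal, not the paper's adequacy estimator.

```python
# Sketch of the policy-gradient idea: scale the log-likelihood of a sampled
# translation by a reward (here an abstract adequacy score).
import torch

def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """token_log_probs: (target_len,) log-probabilities of a sampled translation;
    reward: scalar adequacy score for that sample; baseline: variance reducer.
    Returns a loss whose gradient follows the REINFORCE estimator."""
    return -(reward - baseline) * token_log_probs.sum()

# Toy usage with random logits standing in for decoder outputs.
logits = torch.randn(6, 1000, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=-1).max(dim=-1).values
loss = reinforce_loss(log_probs, reward=0.8, baseline=0.5)
loss.backward()  # gradients flow back into the (toy) logits
```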


Author(s):  
Fandong Meng ◽  
Zhaopeng Tu ◽  
Yong Cheng ◽  
Haiyang Wu ◽  
Junjie Zhai ◽  
...  

Although attention-based Neural Machine Translation (NMT) has achieved remarkable progress in recent years, it still suffers from issues of repeating and dropping translations. To alleviate these issues, we propose a novel key-value memory-augmented attention model for NMT, called KVMEMATT. Specifically, we maintain a timely updated key-memory to keep track of attention history and a fixed value-memory to store the representation of the source sentence throughout the whole translation process. Via nontrivial transformations and iterative interactions between the two memories, the decoder focuses on more appropriate source word(s) for predicting the next target word at each decoding step, and can therefore improve the adequacy of translations. Experimental results on Chinese-English and WMT17 German-English translation tasks demonstrate the superiority of the proposed model.
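
The sketch below illustrates the key/value memory idea at one decoding step, with a fixed value memory and a key memory that absorbs attention history; the toy update rule is an assumption, not the KVMEMATT transformation.

```python
# Hedged sketch of key-value memory attention at a single decoding step.
import torch
import torch.nn.functional as F

def kv_memory_step(decoder_state, key_memory, value_memory):
    """decoder_state: (hidden,); key_memory, value_memory: (src_len, hidden).
    Returns (context, updated_key_memory)."""
    scores = key_memory @ decoder_state
    weights = F.softmax(scores, dim=0)                      # attention over source positions
    context = weights @ value_memory                        # read from the fixed value memory
    # Write attention history into the keys (toy update: subtract attended mass).
    updated_keys = key_memory - weights.unsqueeze(-1) * decoder_state.unsqueeze(0)
    return context, updated_keys

keys = values = torch.randn(5, 8)
ctx, keys = kv_memory_step(torch.randn(8), keys, values)
print(ctx.shape, keys.shape)  # torch.Size([8]) torch.Size([5, 8])
```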


2020 ◽  
Vol 34 (05) ◽  
pp. 7554-7561
Author(s):  
Pengxiang Cheng ◽  
Katrin Erk

Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning, where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting a new state of the art, while containing only a tiny fraction of the parameters of GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions for applying similar techniques to other problems.
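
In the spirit of the supervised self-attention recipe cited above, the sketch below adds an auxiliary cross-entropy loss that pushes one attention head toward gold antecedent positions; the loss form and head selection are assumptions for illustration.

```python
# Sketch of supervising one self-attention head with coreference links.
import torch
import torch.nn.functional as F

def coref_attention_loss(attention_head, antecedent_index):
    """attention_head: (seq_len, seq_len) attention weights of one head;
    antecedent_index: (seq_len,) gold antecedent position for each token
    (its own position if it has none).  Cross-entropy pushes the head to
    attend to the antecedent."""
    return F.nll_loss(torch.log(attention_head + 1e-9), antecedent_index)

attn = torch.softmax(torch.randn(6, 6), dim=-1)     # toy attention weights
gold = torch.tensor([0, 1, 0, 3, 1, 5])             # toy antecedent positions
print(coref_attention_loss(attn, gold))
```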


Author(s):  
Jie Zhou ◽  
Ying Cao ◽  
Xuguang Wang ◽  
Peng Li ◽  
Wei Xu

Neural machine translation (NMT) aims at solving machine translation (MT) problems using neural networks and has exhibited promising results in recent years. However, most of the existing NMT models are shallow and there is still a performance gap between a single NMT model and the best conventional MT system. In this work, we introduce a new type of linear connections, named fast-forward connections, based on deep Long Short-Term Memory (LSTM) networks, and an interleaved bi-directional architecture for stacking the LSTM layers. Fast-forward connections play an essential role in propagating the gradients and building a deep topology of depth 16. On the WMT’14 English-to-French task, we achieve BLEU=37.7 with a single attention model, which outperforms the corresponding single shallow model by 6.2 BLEU points. This is the first time that a single NMT model achieves state-of-the-art performance and outperforms the best conventional model by 0.7 BLEU points. We can still achieve BLEU=36.3 even without using an attention mechanism. After special handling of unknown words and model ensembling, we obtain the best score reported to date on this task with BLEU=40.4. Our models are also validated on the more difficult WMT’14 English-to-German task.
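
A simplified sketch of stacking LSTM layers with linear fast-forward shortcuts is shown below; the exact wiring, interleaved bi-directionality, and depth-16 topology of the paper are not reproduced here.

```python
# Simplified sketch of stacked LSTM layers with linear "fast-forward"
# shortcut connections so gradients can bypass the recurrent path.
import torch
import torch.nn as nn

class FastForwardStack(nn.Module):
    def __init__(self, hidden, depth):
        super().__init__()
        self.layers = nn.ModuleList([nn.LSTM(hidden, hidden, batch_first=True)
                                     for _ in range(depth)])
        self.ff = nn.ModuleList([nn.Linear(hidden, hidden, bias=False)
                                 for _ in range(depth)])

    def forward(self, x):
        """x: (batch, seq_len, hidden).  Each layer's input is the previous
        LSTM output plus a linear fast-forward path from its own input."""
        for lstm, ff in zip(self.layers, self.ff):
            out, _ = lstm(x)
            x = out + ff(x)       # linear shortcut carries the gradient directly
        return x

stack = FastForwardStack(hidden=16, depth=4)
print(stack(torch.randn(2, 7, 16)).shape)  # torch.Size([2, 7, 16])
```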

