Translating with Bilingual Topic Knowledge for Neural Machine Translation

Author(s):  
Xiangpeng Wei ◽  
Yue Hu ◽  
Luxi Xing ◽  
Yipeng Wang ◽  
Li Gao

The dominant neural machine translation (NMT) models, based on the encoder-decoder architecture, have recently achieved state-of-the-art performance. Traditionally, NMT models depend only on the representations learned during training to map a source sentence into the target domain. However, the learned representations often suffer from being implicit and inadequately informed. In this paper, we propose a novel bilingual topic-enhanced NMT (BLT-NMT) model to improve translation performance by incorporating bilingual topic knowledge into NMT. Specifically, the bilingual topic knowledge is incorporated into the hidden states of both the encoder and the decoder, as well as into the attention mechanism. With this new setting, the proposed BLT-NMT has access to the background knowledge implied by bilingual topics, which goes beyond the sequential context, and the attention mechanism can attend to topic-level information when generating target words during translation. Experimental results show that the proposed model consistently outperforms the traditional RNNsearch and previous topic-informed NMT on Chinese-English and English-German translation tasks. We also introduce bilingual topic knowledge into the newly emerged Transformer base model on English-German translation and achieve a notable improvement.
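As a rough illustration of the idea, the sketch below fuses a bilingual topic vector into additive attention so that topic knowledge contributes to the attention energy alongside the encoder and decoder states. Module names, shapes, and the fusion choice are assumptions for illustration, not the authors' implementation.

```python
# Sketch: bilingual topic knowledge fused into additive attention.
# Shapes and names are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class TopicAwareAttention(nn.Module):
    def __init__(self, hid_dim, topic_dim):
        super().__init__()
        self.w_enc = nn.Linear(hid_dim, hid_dim, bias=False)      # encoder states
        self.w_dec = nn.Linear(hid_dim, hid_dim, bias=False)      # decoder state
        self.w_topic = nn.Linear(topic_dim, hid_dim, bias=False)  # topic knowledge
        self.v = nn.Linear(hid_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, topic_vec):
        # enc_states: (batch, src_len, hid); dec_state: (batch, hid); topic_vec: (batch, topic_dim)
        energy = torch.tanh(self.w_enc(enc_states)
                            + self.w_dec(dec_state).unsqueeze(1)
                            + self.w_topic(topic_vec).unsqueeze(1))
        weights = torch.softmax(self.v(energy).squeeze(-1), dim=-1)   # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights
```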

2019 ◽  
Vol 29 (11n12) ◽  
pp. 1727-1740 ◽  
Author(s):  
Hongming Zhu ◽  
Yi Luo ◽  
Qin Liu ◽  
Hongfei Fan ◽  
Tianyou Song ◽  
...  

Multistep flow prediction is an essential task for car-sharing systems. An accurate flow prediction model can help system operators pre-allocate cars to meet user demand. However, this task is challenging due to the complex spatial and temporal relations among stations. Existing works consider only temporal relations (e.g. using LSTMs) or spatial relations (e.g. using CNNs) independently. In this paper, we propose an attention-based multi-graph convolutional sequence-to-sequence model (AMGC-Seq2Seq), a novel deep learning model for multistep flow prediction. The proposed model uses the encoder-decoder architecture: in the encoder, spatial and temporal relations are encoded simultaneously, and the encoded information is then passed to the decoder to generate multistep outputs. In this work, multiple graphs are constructed to reflect spatial relations from different aspects, and we model them with the proposed multi-graph convolution. An attention mechanism is also used to capture the important relations in previous information. Experiments on a large-scale real-world car-sharing dataset demonstrate the effectiveness of our approach over state-of-the-art methods.
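A minimal sketch of what a multi-graph convolution could look like: each graph view (for instance a distance graph or a demand-correlation graph) gets its own adjacency matrix and projection, and the per-graph outputs are summed. The class name, graph choices, and aggregation by summation are illustrative assumptions rather than the authors' exact formulation.

```python
# Sketch of a multi-graph convolution layer over station features.
# Each graph view has its own normalized adjacency and weight matrix.
import torch
import torch.nn as nn

class MultiGraphConv(nn.Module):
    def __init__(self, num_graphs, in_dim, out_dim):
        super().__init__()
        self.weights = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False)
                                      for _ in range(num_graphs)])

    def forward(self, x, adjs):
        # x: (num_stations, in_dim); adjs: list of (num_stations, num_stations) adjacencies
        out = sum(adj @ w(x) for adj, w in zip(adjs, self.weights))
        return torch.relu(out)    # (num_stations, out_dim) fused spatial features
```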


Author(s):  
Mehreen Alam ◽  
Sibt ul Hussain

Attention-based encoder-decoder models have superseded conventional techniques due to their unmatched performance on many neural machine translation problems. Usually, the encoder and decoder are two recurrent neural networks, where the decoder is directed to focus on relevant parts of the source language through an attention mechanism. This data-driven approach leads to generic and scalable solutions with no reliance on manual hand-crafted features. To the best of our knowledge, none of the modern machine translation approaches has been applied to the research problem of Urdu machine transliteration. Ours is the first attempt to apply a deep neural network-based encoder-decoder with an attention mechanism to this problem, using a Roman-Urdu and Urdu parallel corpus. To this end, we present (i) the first-ever Roman-Urdu to Urdu parallel corpus of 1.1 million sentences, (ii) three state-of-the-art encoder-decoder models, and (iii) a detailed empirical analysis of these three models on the Roman-Urdu to Urdu parallel corpus. Overall, the attention-based model gives state-of-the-art performance with a benchmark BLEU score of 70. Our qualitative experimental evaluation shows that our models generate coherent transliterations that are grammatically and logically correct.
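For concreteness, a minimal sketch of the encoder side of such a character-level attention seq2seq for Roman-Urdu input; the vocabulary handling, sizes, and names are illustrative assumptions rather than the paper's models.

```python
# Sketch: character-level bidirectional GRU encoder for Roman-Urdu input whose
# annotations an attention-based decoder would read. Sizes are assumptions.
import torch.nn as nn

class CharEncoder(nn.Module):
    def __init__(self, num_chars, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_chars, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, char_ids):
        # char_ids: (batch, src_len) Roman-Urdu character ids
        annotations, _ = self.rnn(self.embed(char_ids))   # (batch, src_len, 2 * hid_dim)
        return annotations                                # source annotations for attention
```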


2020 ◽  
Vol 34 (05) ◽  
pp. 7594-7601
Author(s):  
Pierre Colombo ◽  
Emile Chapuis ◽  
Matteo Manica ◽  
Emmanuel Vignon ◽  
Giovanna Varni ◽  
...  

The task of predicting dialog acts (DAs) from conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches, widely adopted in Neural Machine Translation (NMT), to improve the modelling of tag sequentiality. Seq2seq models are known to learn complex global dependencies, while currently proposed approaches using linear conditional random fields (CRFs) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification that uses a hierarchical encoder, a novel guided attention mechanism, and beam search applied to both training and inference. Compared to the state of the art, our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on MRDA.
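A minimal sketch of a hierarchical conversation encoder of the kind described: a word-level GRU summarizes each utterance, and an utterance-level GRU encodes the sequence of utterance summaries that a DA tag decoder would attend over. Names and shapes are illustrative assumptions, not the authors' code.

```python
# Sketch of a hierarchical encoder for dialog act classification.
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.utt_rnn = nn.GRU(hid_dim, hid_dim, batch_first=True)

    def forward(self, dialog):
        # dialog: (num_utterances, max_words) token ids for one conversation
        _, h = self.word_rnn(self.embed(dialog))    # h: (1, num_utterances, hid_dim)
        utts = h.squeeze(0).unsqueeze(0)            # (1, num_utterances, hid_dim)
        ctx, _ = self.utt_rnn(utts)                 # one state per utterance
        return ctx                                  # what the DA tag decoder attends over
```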


2019 ◽  
Vol 45 (2) ◽  
pp. 267-292 ◽  
Author(s):  
Akiko Eriguchi ◽  
Kazuma Hashimoto ◽  
Yoshimasa Tsuruoka

Neural machine translation (NMT) has shown great success as a new alternative to the traditional statistical machine translation model in multiple languages. Early NMT models are based on sequence-to-sequence learning, which encodes a sequence of source words into a vector space and generates another sequence of target words from that vector. In those NMT models, sentences are simply treated as sequences of words without any internal structure. In this article, we focus on the role of the syntactic structure of source sentences and propose a novel end-to-end syntactic NMT model, which we call a tree-to-sequence NMT model, extending a sequence-to-sequence model with the source-side phrase structure. Our proposed model has an attention mechanism that enables the decoder to generate a translated word while softly aligning it with phrases as well as words of the source sentence. We have empirically compared the proposed model with sequence-to-sequence models in various settings on Chinese-to-Japanese and English-to-Japanese translation tasks. Our experimental results suggest that the use of syntactic structure can be beneficial when the training data set is small, but is not as effective as using a bi-directional encoder. As the size of the training data set increases, the benefits of using a syntactic tree tend to diminish.
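A toy sketch of the attention idea: the decoder scores word states and phrase (tree-node) states jointly, so it can softly align to phrases as well as words. Dot-product scoring and the variable names are simplifying assumptions; the paper's model is more elaborate.

```python
# Sketch: attention computed jointly over word states and phrase (tree node) states.
import torch

def tree_attention(word_states, phrase_states, dec_state):
    # word_states: (src_len, hid); phrase_states: (num_nodes, hid); dec_state: (hid,)
    states = torch.cat([word_states, phrase_states], dim=0)   # attend over the union
    weights = torch.softmax(states @ dec_state, dim=0)        # soft alignment to words and phrases
    return weights @ states                                    # context vector fed to the decoder
```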


Author(s):  
Zi-Yi Dou ◽  
Zhaopeng Tu ◽  
Xing Wang ◽  
Longyue Wang ◽  
Shuming Shi ◽  
...  

With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation. However, most previous methods combine layers in a static fashion, in that their aggregation strategy is independent of specific hidden states. Inspired by recent progress on capsule networks, in this paper we propose to use routing-by-agreement strategies to aggregate layers dynamically. Specifically, the algorithm learns the probability of a part (individual layer representations) being assigned to a whole (aggregated representations) in an iterative way and combines parts accordingly. We implement our algorithm on top of the state-of-the-art neural machine translation model TRANSFORMER and conduct experiments on the widely used WMT14 English⇒German and WMT17 Chinese⇒English translation datasets. Experimental results across language pairs show that the proposed approach consistently outperforms the strong baseline model and a representative static aggregation model.
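A simplified sketch of routing-by-agreement over layer outputs: assignment probabilities of each layer ("part") to the aggregated representation ("whole") are refined iteratively according to their agreement. This is a capsule-style simplification under assumed shapes, not the authors' exact algorithm.

```python
# Sketch: dynamic routing to aggregate per-layer representations.
import torch

def route_layers(layer_states, num_iters=3):
    # layer_states: (num_layers, batch, hid) -- one representation per encoder layer
    logits = torch.zeros(layer_states.shape[:2])                 # routing logits per (layer, example)
    for _ in range(num_iters):
        assign = torch.softmax(logits, dim=0)                    # parts-to-whole assignment
        whole = (assign.unsqueeze(-1) * layer_states).sum(0)     # (batch, hid) aggregated state
        logits = logits + (layer_states * whole.unsqueeze(0)).sum(-1)  # agreement update
    return whole
```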


Author(s):  
Shuangzhi Wu ◽  
Ming Zhou ◽  
Dongdong Zhang

Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently achieved state-of-the-art performance. Researchers have shown that extending word-level attention to phrase-level attention by incorporating source-side phrase structure can enhance the attention model and achieve promising improvements. However, the word dependencies that are crucial to correctly understanding a source sentence are not always consecutive (i.e., within a phrase structure); sometimes they span a long distance. Phrase structures are not the best way to explicitly model such long-distance dependencies. In this paper we propose a simple but effective method to incorporate source-side long-distance dependencies into NMT. Our method, based on dependency trees, enriches each source state with global dependency structures, which can better capture the inherent syntactic structure of source sentences. Experiments on Chinese-English and English-Japanese translation tasks show that our proposed method outperforms state-of-the-art SMT and NMT baselines.
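One plausible way to enrich each source state with dependency structure, sketched under assumptions: every word state is fused with the state of its dependency head and the sum of its children's states. The fusion layer and naming are hypothetical, not the authors' construction.

```python
# Sketch: fusing each source state with its dependency head and children.
import torch
import torch.nn as nn

class DependencyEnricher(nn.Module):
    def __init__(self, hid_dim):
        super().__init__()
        self.fuse = nn.Linear(3 * hid_dim, hid_dim)

    def forward(self, states, heads):
        # states: (src_len, hid); heads: (src_len,) long tensor, heads[i] = index of
        # word i's dependency head (the root may point to itself in this sketch)
        head_repr = states[heads]                                # each word's head state
        child_repr = torch.zeros_like(states)
        child_repr.index_add_(0, heads, states)                  # sum of each word's children
        enriched = torch.cat([states, head_repr, child_repr], dim=-1)
        return torch.tanh(self.fuse(enriched))                   # (src_len, hid)
```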


2020 ◽  
Vol 34 (05) ◽  
pp. 8311-8318
Author(s):  
Zuchao Li ◽  
Rui Wang ◽  
Kehai Chen ◽  
Masao Utiyama ◽  
Eiichiro Sumita ◽  
...  

State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework, in which the source sentence representation is produced by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of a sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression objective is used to learn the backbone information in a sentence. We propose three ways, namely backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines.
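A minimal sketch of what backbone source-side fusion might look like: a gate mixes each position of the full-sentence encoding with a pooled encoding of the compressed "backbone" sentence. Gating is an assumed fusion choice for illustration; the paper's integration may differ.

```python
# Sketch: gated source-side fusion of full and compressed-sentence encodings.
import torch
import torch.nn as nn

class BackboneFusion(nn.Module):
    def __init__(self, hid_dim):
        super().__init__()
        self.gate = nn.Linear(2 * hid_dim, hid_dim)

    def forward(self, full_enc, backbone_enc):
        # full_enc: (src_len, hid); backbone_enc: (hid,) pooled compressed-sentence encoding
        expanded = backbone_enc.expand_as(full_enc)
        g = torch.sigmoid(self.gate(torch.cat([full_enc, expanded], dim=-1)))
        return g * full_enc + (1 - g) * expanded      # per-position fused representation
```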


Author(s):  
Jinchao Zhang ◽  
Qun Liu ◽  
Jie Zhou

The encoder-decoder neural framework is widely employed for Neural Machine Translation (NMT), with a single encoder to represent the source sentence and a single decoder to generate target words. The translation performance heavily relies on the representation ability of the encoder and the generation ability of the decoder. To further enhance NMT, we propose to extend the original encoder-decoder framework to a novel one with multiple encoders and decoders (ME-MD). In this way, multiple encoders extract more diverse features to represent the source sequence, and multiple decoders capture more complicated translation knowledge. Our proposed ME-MD framework can conveniently integrate heterogeneous encoders and decoders of multiple depths and multiple types. Experiments on a Chinese-English translation task show that our ME-MD system surpasses the state-of-the-art NMT system by 2.1 BLEU points and surpasses the phrase-based Moses by 7.38 BLEU points. Our framework is general and can be applied to other sequence-to-sequence tasks.
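A rough sketch of the multi-encoder/multi-decoder idea under simple assumptions: heterogeneous encoder outputs are projected into one source representation, and the decoders' output distributions are averaged. The combination functions shown are illustrative, not the ME-MD design itself.

```python
# Sketch: combining multiple encoders and multiple decoders.
import torch
import torch.nn as nn

class MultiEncoderCombiner(nn.Module):
    def __init__(self, num_encoders, hid_dim):
        super().__init__()
        self.proj = nn.Linear(num_encoders * hid_dim, hid_dim)

    def forward(self, encoder_outputs):
        # encoder_outputs: list of (src_len, hid) tensors, one per encoder
        return torch.tanh(self.proj(torch.cat(encoder_outputs, dim=-1)))

def combine_decoders(decoder_logits):
    # decoder_logits: list of (tgt_len, vocab) tensors; average the predicted distributions
    probs = [torch.softmax(logits, dim=-1) for logits in decoder_logits]
    return torch.stack(probs).mean(0)
```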


Author(s):  
Naihan Li ◽  
Shujie Liu ◽  
Yanqing Liu ◽  
Sheng Zhao ◽  
Ming Liu

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) have been proposed and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency during training and inference; 2) difficulty modeling long-range dependencies using current recurrent neural networks (RNNs). Inspired by the success of the Transformer network in neural machine translation (NMT), in this paper we introduce and adapt the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in Tacotron2. With the help of multi-head self-attention, the hidden states in the encoder and decoder are constructed in parallel, which improves training efficiency. Meanwhile, any two inputs at different times are connected directly by a self-attention mechanism, which effectively solves the long-range dependency problem. Using phoneme sequences as input, our Transformer TTS network generates mel spectrograms, followed by a WaveNet vocoder to output the final audio results. Experiments are conducted to test the efficiency and performance of our new network. In terms of efficiency, our Transformer TTS network speeds up training by about 4.25 times compared with Tacotron2. In terms of performance, rigorous human tests show that our proposed model achieves state-of-the-art performance (outperforming Tacotron2 with a gap of 0.048) and is very close to human quality (4.39 vs. 4.44 in MOS).
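A small sketch of why multi-head self-attention removes the recurrence bottleneck: all positions of a phoneme sequence are processed in parallel, and any two positions attend to each other directly. It uses PyTorch's built-in nn.MultiheadAttention with illustrative sizes; it is not the Transformer TTS implementation.

```python
# Sketch: parallel multi-head self-attention over a phoneme-state sequence.
import torch
import torch.nn as nn

seq_len, d_model, n_heads = 128, 256, 4
phoneme_states = torch.randn(seq_len, 1, d_model)        # (seq, batch, dim)

self_attn = nn.MultiheadAttention(d_model, n_heads)
# Every position attends to every other position directly, so long-range
# dependencies need no recurrence and all positions are computed in parallel.
out, weights = self_attn(phoneme_states, phoneme_states, phoneme_states)
print(out.shape, weights.shape)                          # (128, 1, 256), (1, 128, 128)
```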


Author(s):  
Fandong Meng ◽  
Zhaopeng Tu ◽  
Yong Cheng ◽  
Haiyang Wu ◽  
Junjie Zhai ◽  
...  

Although attention-based Neural Machine Translation (NMT) has achieved remarkable progress in recent years, it still suffers from issues of repeating and dropping translations. To alleviate these issues, we propose a novel key-value memory-augmented attention model for NMT, called KVMEMATT. Specifically, we maintain a continually updated key-memory to keep track of the attention history and a fixed value-memory to store the representation of the source sentence throughout the whole translation process. Via nontrivial transformations and iterative interactions between the two memories, the decoder focuses on more appropriate source word(s) for predicting the next target word at each decoding step, and can therefore improve the adequacy of translations. Experimental results on Chinese-English and WMT17 German-English translation tasks demonstrate the superiority of the proposed model.
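A toy sketch of the key-value memory attention idea: a fixed value-memory holds the source representations, while a key-memory is used for addressing and is updated after each read to record attention history. The subtraction-style update is a hypothetical simplification, not the paper's nontrivial transformations.

```python
# Sketch: one decoding step of key-value memory-augmented attention.
import torch

def kv_mem_attention(key_mem, value_mem, dec_state):
    # key_mem, value_mem: (src_len, hid); dec_state: (hid,)
    scores = torch.softmax(key_mem @ dec_state, dim=0)        # address with the key-memory
    context = scores @ value_mem                               # read from the fixed value-memory
    key_mem = key_mem - scores.unsqueeze(-1) * dec_state       # hypothetical history update
    return context, key_mem
```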

