Dual-Source Transformer Model for Neural Machine Translation with Linguistic Knowledge

Incorporating source-side linguistic knowledge into the neural machine translation (NMT) model has recently achieved impressive performance on machine translation tasks. One popular method is to generalize the word embedding layer of the encoder to encode each word and its linguistic features. The other method is to change the architecture of the encoder to encode syntactic information. However, the former cannot explicitly balance the contribution from the word and its linguistic features. The latter cannot flexibly utilize various types of linguistic information. Focusing on the above issues, this paper proposes a novel NMT approach that models the words in parallel to the linguistic knowledge by using two separate encoders. Compared with the single encoder based NMT model, the proposed approach additionally employs the knowledge-based encoder to specially encode linguistic features. Moreover, it shares parameters across encoders to enhance the model representation ability of the source-side language. Extensive experiments show that the approach achieves significant improvements of up to 2.4 and 1.1 BLEU points on Turkish→English and English→Turkish machine translation tasks, respectively, which indicates that it is capable of better utilizing the external linguistic knowledge and effective improving the machine translation quality.

Download Full-text

Replacing Linguists with Dummies: A Serious Need for Trivial Baselines in Multi-Task Neural Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2019-0005 ◽

2019 ◽

Vol 113 (1) ◽

pp. 31-40

Author(s):

Daniel Kondratyuk ◽

Ronald Cardenas ◽

Ondřej Bojar

Keyword(s):

Machine Translation ◽

Deep Neural Networks ◽

Neural Machine Translation ◽

Translation Quality ◽

Syntactic Information ◽

Multiple Tasks ◽

Recent Developments ◽

Resource Conditions ◽

Main Message ◽

Source Word

Abstract Recent developments in machine translation experiment with the idea that a model can improve the translation quality by performing multiple tasks, e.g., translating from source to target and also labeling each source word with syntactic information. The intuition is that the network would generalize knowledge over the multiple tasks, improving the translation performance, especially in low resource conditions. We devised an experiment that casts doubt on this intuition. We perform similar experiments in both multi-decoder and interleaving setups that label each target word either with a syntactic tag or a completely random tag. Surprisingly, we show that the model performs nearly as well on uncorrelated random tags as on true syntactic tags. We hint some possible explanations of this behavior. The main message from our article is that experimental results with deep neural networks should always be complemented with trivial baselines to document that the observed gain is not due to some unrelated properties of the system or training effects. True confidence in where the gains come from will probably remain problematic anyway.

Download Full-text

Source-Word Decomposition for Neural Machine Translation

Mathematical Problems in Engineering ◽

10.1155/2020/4795187 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Thien Nguyen ◽

Hoai Le ◽

Van-Huy Pham

Keyword(s):

Machine Translation ◽

Linguistic Features ◽

Specialized Knowledge ◽

Neural Machine Translation ◽

Translation Model ◽

Part Of Speech ◽

Effective System ◽

Feature Based ◽

Transformer Model ◽

Translation Systems

End-to-end neural machine translation does not require us to have specialized knowledge of investigated language pairs in building an effective system. On the other hand, feature engineering proves to be vital in other artificial intelligence fields, such as speech recognition and computer vision. Inspired by works in those fields, in this paper, we propose a novel feature-based translation model by modifying the state-of-the-art transformer model. Specifically, the encoder of the modified transformer model takes input combinations of linguistic features comprising of lemma, dependency label, part-of-speech tag, and morphological label instead of source words. The experiment results for the Russian-Vietnamese language pair show that the proposed feature-based transformer model improves over the strongest baseline transformer translation model by impressive 4.83 BLEU. In addition, experiment analysis reveals that human judgment on the translation results strongly confirms machine judgment. Our model could be useful in building translation systems translating from a highly inflectional language into a noninflectional language.

Download Full-text

Linguistic knowledge-based vocabularies for Neural Machine Translation

Natural Language Engineering ◽

10.1017/s1351324920000364 ◽

2020 ◽

pp. 1-22

Author(s):

Noe Casas ◽

Marta R. Costa-jussà ◽

José A. R. Fonollosa ◽

Juan A. Alonso ◽

Ramón Fanlo

Keyword(s):

Neural Networks ◽

Machine Translation ◽

Word Formation ◽

Linguistic Knowledge ◽

Learning Approaches ◽

Neural Machine Translation ◽

Knowledge Based ◽

Word Level ◽

The Neural Network ◽

Cross Lingual

Abstract Neural Networks applied to Machine Translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically-discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, they trade this for the inability to apply word-level token associations, which limits their use in semantically-rich areas and prevents some transfer learning approaches e.g. cross-lingual pretrained embeddings, and reduces their interpretability. In this work, we propose new hybrid linguistically-grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from the derived benefits. We test the proposed approaches in both morphologically rich and poor languages, showing that, for the former, the quality in the translation of out-of-domain texts is improved with respect to a strong subword baseline.

Download Full-text

DTMT: A Novel Deep Transition Architecture for Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301224 ◽

2019 ◽

Vol 33 ◽

pp. 224-231 ◽

Cited By ~ 3

Author(s):

Fandong Meng ◽

Jinchao Zhang

Keyword(s):

Machine Translation ◽

Linear Transformation ◽

Linear Transformations ◽

Transformation Mechanism ◽

Neural Machine Translation ◽

Training Techniques ◽

Translation Quality ◽

Transition Depth ◽

Transformer Model ◽

Hidden States

Past years have witnessed rapid developments in Neural Machine Translation (NMT). Most recently, with advanced modeling and training techniques, the RNN-based NMT (RNMT) has shown its potential strength, even compared with the well-known Transformer (self-attentional) model. Although the RNMT model can possess very deep architectures through stacking layers, the transition depth between consecutive hidden states along the sequential axis is still shallow. In this paper, we further enhance the RNN-based NMT through increasing the transition depth between consecutive hidden states and build a novel Deep Transition RNN-based Architecture for Neural Machine Translation, named DTMT. This model enhances the hidden-to-hidden transition with multiple non-linear transformations, as well as maintains a linear transformation path throughout this deep transition by the well-designed linear transformation mechanism to alleviate the gradient vanishing problem. Experiments show that with the specially designed deep transition modules, our DTMT can achieve remarkable improvements on translation quality. Experimental results on Chinese⇒English translation task show that DTMT can outperform the Transformer model by +2.09 BLEU points and achieve the best results ever reported in the same dataset. On WMT14 English⇒German and English⇒French translation tasks, DTMT shows superior quality to the state-of-the-art NMT systems, including the Transformer and the RNMT+.

Download Full-text

Semantic and syntactic information for neural machine translation

Machine Translation ◽

10.1007/s10590-021-09264-2 ◽

2021 ◽

Author(s):

Jordi Armengol-Estapé ◽

Marta R. Costa-jussà

Keyword(s):

Machine Translation ◽

Linked Data ◽

Semantic Information ◽

Semantic Features ◽

Linguistic Features ◽

Neural Machine Translation ◽

Low Resource ◽

Syntactic Information ◽

Attentional System

AbstractIntroducing factors such as linguistic features has long been proposed in machine translation to improve the quality of translations. More recently, factored machine translation has proven to still be useful in the case of sequence-to-sequence systems. In this work, we investigate whether this gains hold in the case of the state-of-the-art architecture in neural machine translation, the Transformer, instead of recurrent architectures. We propose a new model, the Factored Transformer, to introduce an arbitrary number of word features in the source sequence in an attentional system. Specifically, we suggest two variants depending on the level at which the features are injected. Moreover, we suggest two combination mechanisms for the word features and words themselves. We experiment both with classical linguistic features and semantic features extracted from a linked data database, and with two low-resource datasets. With the best-found configuration, we show improvements of 0.8 BLEU over the baseline Transformer in the IWSLT German-to-English task. Moreover, we experiment with the more challenging FLoRes English-to-Nepali benchmark, which includes both low-resource and very distant languages, and obtain an improvement of 1.2 BLEU. These improvements are achieved with linguistic and not with semantic information.

Download Full-text

Context-Aware Neural Machine Translation for Korean Honorific Expressions

Electronics ◽

10.3390/electronics10131589 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1589

Author(s):

Yongkeun Hwang ◽

Yanghoon Kim ◽

Kyomin Jung

Keyword(s):

Machine Translation ◽

Deep Neural Networks ◽

Contextual Information ◽

Context Aware ◽

Neural Machine Translation ◽

Translation Quality ◽

Sentence Level ◽

Proposed Model ◽

The Given ◽

The Relationship

Neural machine translation (NMT) is one of the text generation tasks which has achieved significant improvement with the rise of deep neural networks. However, language-specific problems such as handling the translation of honorifics received little attention. In this paper, we propose a context-aware NMT to promote translation improvements of Korean honorifics. By exploiting the information such as the relationship between speakers from the surrounding sentences, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data is required. Thus, we also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and honorific translations.

Download Full-text

Improving thai-lao neural machine translation with similarity lexicon

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-212236 ◽

2021 ◽

pp. 1-10

Author(s):

Zhiqiang Yu ◽

Yuxin Huang ◽

Junjun Guo

Keyword(s):

Machine Translation ◽

Semantic Information ◽

Neural Machine Translation ◽

Low Resource ◽

Translation Quality ◽

Decoder Architecture ◽

Baseline System ◽

Input Sentence ◽

Resource Conditions ◽

Language Pair

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions. Thai-Lao is a typical low-resource language pair of tiny parallel corpus, leading to suboptimal NMT performance on it. However, Thai and Lao have considerable similarities in linguistic morphology and have bilingual lexicon which is relatively easy to obtain. To use this feature, we first build a bilingual similarity lexicon composed of pairs of similar words. Then we propose a novel NMT architecture to leverage the similarity between Thai and Lao. Specifically, besides the prevailing sentence encoder, we introduce an extra similarity lexicon encoder into the conventional encoder-decoder architecture, by which the semantic information carried by the similarity lexicon can be represented. We further provide a simple mechanism in the decoder to balance the information representations delivered from the input sentence and the similarity lexicon. Our approach can fully exploit linguistic similarity carried by the similarity lexicon to improve translation quality. Experimental results demonstrate that our approach achieves significant improvements over the state-of-the-art Transformer baseline system and previous similar works.

Download Full-text

Training Tips for the Transformer Model

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2018-0002 ◽

2018 ◽

Vol 110 (1) ◽

pp. 43-70 ◽

Cited By ~ 15

Author(s):

Martin Popel ◽

Ondřej Bojar

Keyword(s):

Sentence Length ◽

Batch Size ◽

Critical Parameters ◽

Neural Machine Translation ◽

Training Time ◽

Multiple Gpus ◽

Translation Quality ◽

Transformer Model ◽

Data Constraints ◽

And Training

Abstract This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers. In addition to confirming the general mantra “more data and larger models”, we address scaling to multiple GPUs and provide practical tips for improved training regarding batch size, learning rate, warmup steps, maximum sentence length and checkpoint averaging. We hope that our observations will allow others to get better results given their particular hardware and data constraints.

Download Full-text

Recurrent Stacking of Layers for Compact Neural Machine Translation Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016292 ◽

2019 ◽

Vol 33 ◽

pp. 6292-6299 ◽

Cited By ~ 2

Author(s):

Raj Dabre ◽

Atsushi Fujita

Keyword(s):

Machine Translation ◽

Single Layer ◽

Training Data ◽

Neural Machine Translation ◽

Parallel Corpora ◽

Translation Quality ◽

Sequence Generation ◽

Sequence Modeling ◽

Back Translation

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single-layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way for improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.

Download Full-text

Knowledge Graphs Enhanced Neural Machine Translation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/559 ◽

2020 ◽

Author(s):

Yang Zhao ◽

Jiajun Zhang ◽

Yu Zhou ◽

Chengqing Zong

Keyword(s):

Machine Translation ◽

Semantic Space ◽

Neural Machine Translation ◽

Translation Quality ◽

Structured Information ◽

Knowledge Graphs ◽

Japanese Translation

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KGs enhanced NMT method. Specifically, we first induce the new translation results of these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo parallel sentence pairs that contain these induced entity pairs. Finally, NMT model is jointly trained by the original and pseudo sentence pairs. The extensive experiments on Chinese-to-English and Englishto-Japanese translation tasks demonstrate that our method significantly outperforms the strong baseline models in translation quality, especially in handling the induced entities.

Download Full-text