A Systematic Study of Inner-Attention-Based Sentence Representations                     in Multilingual Neural Machine Translation

Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.

Download Full-text

Recurrent Stacking of Layers for Compact Neural Machine Translation Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016292 ◽

2019 ◽

Vol 33 ◽

pp. 6292-6299 ◽

Cited By ~ 2

Author(s):

Raj Dabre ◽

Atsushi Fujita

Keyword(s):

Machine Translation ◽

Single Layer ◽

Training Data ◽

Neural Machine Translation ◽

Parallel Corpora ◽

Translation Quality ◽

Sequence Generation ◽

Sequence Modeling ◽

Back Translation

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single-layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way for improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.

Download Full-text

Knowledge Graphs Enhanced Neural Machine Translation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/559 ◽

2020 ◽

Author(s):

Yang Zhao ◽

Jiajun Zhang ◽

Yu Zhou ◽

Chengqing Zong

Keyword(s):

Machine Translation ◽

Semantic Space ◽

Neural Machine Translation ◽

Translation Quality ◽

Structured Information ◽

Knowledge Graphs ◽

Japanese Translation

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KGs enhanced NMT method. Specifically, we first induce the new translation results of these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo parallel sentence pairs that contain these induced entity pairs. Finally, NMT model is jointly trained by the original and pseudo sentence pairs. The extensive experiments on Chinese-to-English and Englishto-Japanese translation tasks demonstrate that our method significantly outperforms the strong baseline models in translation quality, especially in handling the induced entities.

Download Full-text

Estimating Machine Translation Quality of Any Input Sentence

International Journal of Asian Language Processing ◽

10.1142/s2717554520500022 ◽

2020 ◽

Vol 30 (01) ◽

pp. 2050002

Author(s):

Taichi Aida ◽

Kazuhide Yamamoto

Keyword(s):

Machine Translation ◽

Evaluation Model ◽

Joint Probability ◽

Translation System ◽

Neural Machine Translation ◽

Translation Quality ◽

Machine Translation System ◽

Input Sentence ◽

The One

Current methods of neural machine translation may generate sentences with different levels of quality. Methods for automatically evaluating translation output from machine translation can be broadly classified into two types: a method that uses human post-edited translations for training an evaluation model, and a method that uses a reference translation that is the correct answer during evaluation. On the one hand, it is difficult to prepare post-edited translations because it is necessary to tag each word in comparison with the original translated sentences. On the other hand, users who actually employ the machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without using human post-edited sentences and in the test set, estimates the quality of output sentences without using reference translations. We define some indices and predict the quality of translations with a regression model. For the quality of the translated sentences, we employ the BLEU score calculated from the number of word [Formula: see text]-gram matches between the translated sentence and the reference translation. After that, we compute the correlation between quality scores predicted by our method and BLEU actually computed from references. According to the experimental results, the correlation with BLEU is the highest when XGBoost uses all the indices. Moreover, looking at each index, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important in BLEU estimation.

Download Full-text

Neural machine translation system for the Kazakh language based on synthetic corpora

MATEC Web of Conferences ◽

10.1051/matecconf/201925203006 ◽

2019 ◽

Vol 252 ◽

pp. 03006

Author(s):

Ualsher Tukeyev ◽

Aidana Karibayeva ◽

Balzhan Abduali

Keyword(s):

Machine Translation ◽

Training Data ◽

Translation System ◽

Natural Languages ◽

Neural Machine Translation ◽

Translation Quality ◽

Parallel Data ◽

Machine Translation System ◽

Turkic Languages

The lack of big parallel data is present for the Kazakh language. This problem seriously impairs the quality of machine translation from and into Kazakh. This article considers the neural machine translation of the Kazakh language on the basis of synthetic corpora. The Kazakh language belongs to the Turkic languages, which are characterised by rich morphology. Neural machine translation of natural languages requires large training data. The article will show the model for the creation of synthetic corpora, namely the generation of sentences based on complete suffixes for the Kazakh language. The novelty of this approach of the synthetic corpora generation for the Kazakh language is the generation of sentences on the basis of the complete system of suffixes of the Kazakh language. By using generated synthetic corpora we are improving the translation quality in neural machine translation of Kazakh-English and Kazakh-Russian pairs.

Download Full-text

Evaluation of English–Slovak Neural and Statistical Machine Translation

Applied Sciences ◽

10.3390/app11072948 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2948

Author(s):

Lucia Benkova ◽

Dasa Munkova ◽

Ľubomír Benko ◽

Michal Munk

Keyword(s):

Machine Translation ◽

Statistical Approach ◽

Statistical Machine Translation ◽

Specific Domain ◽

Neural Network Approach ◽

Neural Machine Translation ◽

Translation Quality ◽

The Neural Network ◽

Language Pair

This study is focused on the comparison of phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems using automatic metrics for translation quality evaluation for the language pair of English and Slovak. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would generate results with a better quality. An experiment was performed using residuals to compare the scores of automatic metrics of the accuracy (BLEU_n) of the statistical machine translation with those of the neural machine translation. The results showed that the assumption of better neural machine translation quality regardless of the system used was confirmed. There were statistically significant differences between the SMT and NMT in favor of the NMT based on all BLEU_n scores. The neural machine translation achieved a better quality of translation of journalistic texts from English into Slovak, regardless of if it was a system trained on general texts, such as Google Translate, or specific ones, such as the European Commission’s (EC’s) tool, which was trained on a specific-domain.

Download Full-text

The impact of some linguistic features on the quality of neural machine translation

Journal of Applied Linguistics and Lexicography ◽

10.33910/2687-0215-2019-1-2-365-370 ◽

2019 ◽

Vol 1 (2) ◽

pp. 365-370

Author(s):

Elena A. Shukshina

Keyword(s):

Machine Translation ◽

Linguistic Features ◽

Neural Machine Translation ◽

The Impact

Download Full-text

Automatic evaluation of the quality of machine translation of a scientific text: the results of a five-year-long experiment

E3S Web of Conferences ◽

10.1051/e3sconf/202128408001 ◽

2021 ◽

Vol 284 ◽

pp. 08001

Author(s):

Ilya Ulitkin ◽

Irina Filippova ◽

Natalia Ivanova ◽

Alexey Poroykov

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Dramatic Improvement ◽

Automatic Evaluation ◽

Neural Machine Translation ◽

Translation Quality ◽

Automatic Translation ◽

Translation Systems ◽

Qualitative Changes

We report on various approaches to automatic evaluation of machine translation quality and describe three widely used methods. These methods, i.e. methods based on string matching and n-gram models, make it possible to compare the quality of machine translation to reference translation. We employ modern metrics for automatic evaluation of machine translation quality such as BLEU, F-measure, and TER to compare translations made by Google and PROMT neural machine translation systems with translations obtained 5 years ago, when statistical machine translation and rule-based machine translation algorithms were employed by Google and PROMT, respectively, as the main translation algorithms [6]. The evaluation of the translation quality of candidate texts generated by Google and PROMT with reference translation using an automatic translation evaluation program reveal significant qualitative changes as compared with the results obtained 5 years ago, which indicate a dramatic improvement in the work of the above-mentioned online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern systems of automatic evaluation of translation quality allow errors made by machine translation systems to be identified and systematized, which will enable the improvement of the quality of translation by these systems in the future.

Download Full-text

Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation

Informatics ◽

10.3390/informatics8010007 ◽

2021 ◽

Vol 8 (1) ◽

pp. 7

Author(s):

Arda Tezcan ◽

Bram Bulté ◽

Bram Vanroy

Keyword(s):

Machine Translation ◽

Data Augmentation ◽

Sentence Length ◽

Added Value ◽

Neural Machine Translation ◽

Combination Technique ◽

Translation Quality ◽

Fuzzy Match ◽

The Impact ◽

Matching Techniques

We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.

Download Full-text

Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0018 ◽

2017 ◽

Vol 108 (1) ◽

pp. 171-182 ◽

Cited By ~ 5

Author(s):

Jinhua Du ◽

Andy Way

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Class ◽

Word Embeddings ◽

Neural Machine Translation ◽

Parts Of Speech ◽

Translation Quality ◽

The Impact ◽

Japanese English ◽

Target Side

AbstractPre-reordering, a preprocessing to make the source-side word orders close to those of the target side, has been proven very helpful for statistical machine translation (SMT) in improving translation quality. However, is it the case in neural machine translation (NMT)? In this paper, we firstly investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate features for the pre-reordering model in SMT as input factors into NMT (factored NMT). The features, namely parts-of-speech (POS), word class and reordered index, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge for NMT. Pre-reordering experiments conducted on Japanese↔English and Chinese↔English show that pre-reordering the source-side data for NMT is redundant and NMT models trained on pre-reordered data deteriorate translation performance. However, factored NMT using SMT-based pre-reordering features on Japanese→English and Chinese→English is beneficial and can further improve by 4.48 and 5.89 relative BLEU points, respectively, compared to the baseline NMT system.

Download Full-text

Generating Alignments Using Target Foresight in Attention-Based Neural Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0006 ◽

2017 ◽

Vol 108 (1) ◽

pp. 27-36

Author(s):

Jan-Thorsten Peter ◽

Arne Nix ◽

Hermann Ney

Keyword(s):

Target Word ◽

Machine Translation ◽

Explicit Knowledge ◽

Attention Mechanism ◽

Target Information ◽

High Quality ◽

Neural Machine Translation ◽

Translation Quality ◽

Successful Approach

AbstractNeural machine translation (NMT) has shown large improvements in recent years. The currently most successful approach in this area relies on the attention mechanism, which is often interpreted as an alignment, even though it is computed without explicit knowledge of the target word. This limitation is the most likely reason that the quality of attention-based alignments is inferior to the quality of traditional alignment methods. Guided alignment training has shown that alignments are still capable of improving translation quality. In this work, we propose an extension of the attention-based NMT model that introduces target information into the attention mechanism to produce high-quality alignments. In comparison to the conventional attention-based alignments, our model halves the Aer with an absolute improvement of 19.1% Aer. Compared to GIZA++ it shows an absolute improvement of 2.0% Aer.

Download Full-text