Context-Aware Neural Machine Translation for Korean Honorific Expressions

Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1589
Author(s):  
Yongkeun Hwang ◽  
Yanghoon Kim ◽  
Kyomin Jung

Neural machine translation (NMT) is one of the text generation tasks that have achieved significant improvements with the rise of deep neural networks. However, language-specific problems such as the translation of honorifics have received little attention. In this paper, we propose a context-aware NMT model to improve the translation of Korean honorifics. By exploiting information from the surrounding sentences, such as the relationship between speakers, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data is required. Thus, we also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and in honorific translations.
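The labeling heuristic itself is not detailed in the abstract. As a rough illustration of how such a rule-based labeler might work, the sketch below classifies a Korean sentence by its sentence-final ending; the ending list, function name, and labels are assumptions for illustration, not the authors' actual heuristic.

```python
import re

# Illustrative sentence-final endings associated with honorific speech styles
# (e.g. hasipsio-che and haeyo-che). The list is an assumption, not the authors' rule set.
HONORIFIC_ENDINGS = ("니다", "니까", "세요", "셔요", "십시오", "시죠", "요")

def label_honorific(sentence: str) -> str:
    """Label a Korean sentence as 'honorific' or 'non-honorific' by its final ending."""
    s = re.sub(r"[\s.!?\u2026]+$", "", sentence)  # strip trailing punctuation and whitespace
    return "honorific" if s.endswith(HONORIFIC_ENDINGS) else "non-honorific"

if __name__ == "__main__":
    print(label_honorific("안녕하세요."))   # honorific
    print(label_honorific("밥 먹었어?"))    # non-honorific
```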

Author(s):  
Hongfei Xu ◽  
Deyi Xiong ◽  
Josef van Genabith ◽  
Qiuhui Liu

Existing Neural Machine Translation (NMT) systems are generally trained on a large amount of sentence-level parallel data, and during prediction, sentences are translated independently, ignoring cross-sentence contextual information. This leads to inconsistency between translated sentences. To address this issue, context-aware models have been proposed. However, document-level parallel data constitutes only a small part of the parallel data available, and many approaches build context-aware models on top of a pre-trained, frozen sentence-level translation model in a two-step training manner. The computational cost of these approaches is usually high. In this paper, we propose to make the most of layers pre-trained on sentence-level data for contextual representation learning, reusing representations from the sentence-level Transformer and significantly reducing the cost of incorporating contexts in translation. We find that representations from shallow layers of a pre-trained sentence-level encoder play a vital role in source context encoding, and propose to perform source context encoding on weighted combinations of the pre-trained encoder layers' outputs. Instead of performing source context and input encoding separately, we propose to iteratively and jointly encode the source input and its contexts, generating input-aware context representations with a cross-attention layer and a gating mechanism that resets irrelevant information in context encoding. Our context-aware Transformer model outperforms the recent CADec [Voita et al., 2019c] on the English-Russian subtitle data and is about twice as fast in training and decoding.
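As an illustration of the two ideas highlighted above, a weighted combination of pre-trained encoder layers and input-aware gating of the context, here is a minimal PyTorch sketch; the module name, dimensions, and single-pass structure are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LayerWeightedContextEncoder(nn.Module):
    """Sketch: encode the source context on a learned weighted combination of the
    pre-trained sentence-level encoder's layer outputs, then fuse it into the input
    representation with cross-attention and a reset-style gate."""

    def __init__(self, d_model: int, num_layers: int, num_heads: int = 8):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))  # softmax-normalized mixing weights
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, input_states, context_layer_outputs):
        # context_layer_outputs: list of [batch, ctx_len, d_model], one per pre-trained layer
        stacked = torch.stack(context_layer_outputs, dim=0)            # [L, B, T, D]
        w = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        context = (w * stacked).sum(dim=0)                             # weighted layer combination
        # input-aware context: the source input attends over the combined context
        attended, _ = self.cross_attn(input_states, context, context)
        # reset-style gate: down-weight context information irrelevant to the input
        g = torch.sigmoid(self.gate(torch.cat([input_states, attended], dim=-1)))
        return input_states + g * attended

if __name__ == "__main__":
    enc = LayerWeightedContextEncoder(d_model=512, num_layers=6)
    x = torch.randn(2, 10, 512)
    ctx = [torch.randn(2, 20, 512) for _ in range(6)]
    print(enc(x, ctx).shape)  # torch.Size([2, 10, 512])
```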


2020 ◽  
Vol 34 (05) ◽  
pp. 9498-9506 ◽  
Author(s):  
Hyeongu Yun ◽  
Yongkeun Hwang ◽  
Kyomin Jung

Fully attentional networks (FANs) such as the Transformer (Vaswani et al. 2017) have shown superior results on Neural Machine Translation (NMT) tasks and have become a solid baseline for translation tasks. More recent studies have also reported that additional contextual sentences improve the translation quality of NMT models (Voita et al. 2018; Müller et al. 2018; Zhang et al. 2018). However, those studies exploit multiple context sentences as a single long concatenated sentence, which may cause the models to suffer from high computational complexity and long-range dependency issues. In this paper, we propose the Hierarchical Context Encoder (HCE), which is able to exploit multiple context sentences separately using a hierarchical FAN structure. Our proposed encoder first abstracts sentence-level information from preceding sentences in a self-attentive way, and then hierarchically encodes context-level information. Through extensive experiments, we observe that our HCE records the best performance measured in BLEU score on English-German, English-Turkish, and English-Korean corpora. In addition, we observe that our HCE records the best performance on a crowd-sourced test set designed to evaluate how well an encoder can exploit contextual information. Finally, evaluation on an English-Korean pronoun resolution test suite also shows that our HCE can properly exploit contextual information.
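A minimal sketch of the hierarchical idea, token-level encoding per context sentence followed by sentence-level encoding across the pooled vectors, is given below; the layer counts, mean pooling, and class name are illustrative assumptions rather than the exact HCE.

```python
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    """Sketch: each context sentence is encoded and pooled into a single vector,
    and a second encoder then models interactions across those sentence vectors."""

    def __init__(self, d_model: int = 512, nhead: int = 8):
        super().__init__()
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        ctx_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sentence_encoder = nn.TransformerEncoder(sent_layer, num_layers=2)
        self.context_encoder = nn.TransformerEncoder(ctx_layer, num_layers=2)

    def forward(self, context_sentences):
        # context_sentences: list of [batch, sent_len, d_model] embedded context sentences
        pooled = []
        for sent in context_sentences:
            h = self.sentence_encoder(sent)          # token-level self-attention per sentence
            pooled.append(h.mean(dim=1))             # pool each sentence to one vector (assumption)
        ctx = torch.stack(pooled, dim=1)             # [batch, num_context_sentences, d_model]
        return self.context_encoder(ctx)             # sentence-level interactions across the context

if __name__ == "__main__":
    hce = HierarchicalContextEncoder()
    sents = [torch.randn(2, 15, 512) for _ in range(3)]
    print(hce(sents).shape)  # torch.Size([2, 3, 512])
```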


Author(s):  
Xiaomian Kang ◽  
Yang Zhao ◽  
Jiajun Zhang ◽  
Chengqing Zong

Document-level neural machine translation (DocNMT) has yielded attractive improvements. In this article, we systematically analyze the discourse phenomena in Chinese-to-English translation and focus on one of the most obvious, namely lexical translation consistency. To alleviate lexical inconsistency, we propose an effective approach that is aware of the words that need to be translated consistently and constrains the model to produce more consistent translations. Specifically, we first introduce a global context extractor to extract the document context and the consistency context. Then, the two types of global context are integrated into an encoder enhancer and a decoder enhancer to improve lexical translation consistency. We create a test set to evaluate lexical consistency automatically. Experiments demonstrate that our approach can significantly alleviate lexical translation inconsistency. In addition, our approach can also substantially improve translation quality compared to a sentence-level Transformer.
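The automatic consistency evaluation is not specified in the abstract. The sketch below shows one plausible way to score lexical translation consistency from word-aligned document translations; the metric, input format, and example are assumptions for illustration only.

```python
from collections import defaultdict

def lexical_consistency(doc_alignments):
    """Illustrative consistency score (an assumption, not the paper's metric):
    for every source word translated in more than one sentence of a document,
    count how often its most frequent target translation is used.

    doc_alignments: list of sentence-level dicts mapping source word -> target word,
    e.g. produced by an external word aligner.
    """
    translations = defaultdict(list)
    for sent in doc_alignments:
        for src, tgt in sent.items():
            translations[src].append(tgt)
    repeated = {s: t for s, t in translations.items() if len(t) > 1}
    if not repeated:
        return 1.0
    consistent = sum(max(t.count(x) for x in set(t)) for t in repeated.values())
    total = sum(len(t) for t in repeated.values())
    return consistent / total

if __name__ == "__main__":
    doc = [{"合同": "contract", "签署": "sign"},
           {"合同": "agreement"},
           {"合同": "contract"}]
    print(round(lexical_consistency(doc), 3))  # 0.667
```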


2018 ◽  
Vol 6 ◽  
pp. 145-157 ◽  
Author(s):  
Zaixiang Zheng ◽  
Hao Zhou ◽  
Shujian Huang ◽  
Lili Mou ◽  
Xinyu Dai ◽  
...  

Existing neural machine translation (NMT) systems do not explicitly model what has and has not been translated during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts: translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, providing NMT systems with knowledge of the translated and untranslated contents. Experimental results show that the proposed approach significantly improves performance on Chinese-English, German-English, and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in terms of both translation quality and alignment error rate.
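A simplified PyTorch sketch of the Past/Future idea follows: two extra recurrent layers are updated with the attended source content at each decoding step, one accumulating what has been translated and one tracking what remains. The GRU-based update and the names are illustrative simplifications, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class PastFutureTracker(nn.Module):
    """Sketch of the Past/Future mechanism: track translated (Past) and
    untranslated (Future) source content with two recurrent layers."""

    def __init__(self, d_model: int):
        super().__init__()
        self.past_rnn = nn.GRUCell(d_model, d_model)     # accumulates attended source content
        self.future_rnn = nn.GRUCell(d_model, d_model)   # 'removes' attended content from what is left

    def init_states(self, source_summary):
        past = torch.zeros_like(source_summary)           # nothing translated yet
        future = source_summary                           # everything still to translate
        return past, future

    def step(self, attn_context, past, future):
        # attn_context: source content attended to when emitting the current target word
        past = self.past_rnn(attn_context, past)
        future = self.future_rnn(attn_context, future)
        return past, future

if __name__ == "__main__":
    tracker = PastFutureTracker(d_model=256)
    src_summary = torch.randn(4, 256)
    past, future = tracker.init_states(src_summary)
    for _ in range(5):                                    # decoding steps
        ctx = torch.randn(4, 256)
        past, future = tracker.step(ctx, past, future)
    print(past.shape, future.shape)
```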


Author(s):  
Rupjyoti Baruah ◽  
Rajesh Kumar Mundotiya ◽  
Anil Kumar Singh

Machine translation (MT) systems have been built using numerous different techniques for bridging language barriers. These techniques are broadly categorized into approaches such as Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). End-to-end NMT systems significantly outperform SMT in translation quality on many language pairs, especially those with adequate parallel corpora. We report comparative experiments on baseline MT systems for Assamese to other Indo-Aryan languages (in both translation directions) using traditional phrase-based SMT as well as some more successful NMT architectures, namely a basic sequence-to-sequence model with attention, the Transformer, and a fine-tuned Transformer. The results are evaluated using the standard automatic metric BLEU (BiLingual Evaluation Understudy), as well as other well-known metrics, to explore the performance of different baseline MT systems, since this is the first such work involving Assamese. The evaluation scores are compared across SMT and NMT models for the effectiveness of bi-directional language pairs involving Assamese and other Indo-Aryan languages (Bangla, Gujarati, Hindi, Marathi, Odia, Sinhalese, and Urdu). The highest BLEU scores obtained are for Assamese to Sinhalese with SMT (35.63) and for Assamese to Bangla with the NMT systems (seq2seq: 50.92, Transformer: 50.01, fine-tuned Transformer: 50.19). We also try to relate the results to language characteristics, distances, family trees, domains, data sizes, and sentence lengths. We find that the domain is the most important factor affecting the results for the given data domains and sizes. We compare our results with the only existing MT system for Assamese (Bing Translator) and also with pairs involving Hindi.
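BLEU scores such as those reported above are commonly computed with a toolkit like sacrebleu; the short sketch below shows the general pattern on toy data, as an assumption about tooling rather than a statement of what the authors used.

```python
import sacrebleu

# Toy hypotheses and references; a real evaluation would use the full test sets.
hyps = ["the cat sat on the mat", "he reads a book"]
refs = ["the cat is on the mat", "he is reading a book"]

bleu = sacrebleu.corpus_bleu(hyps, [refs])   # one reference stream
print(f"BLEU = {bleu.score:.2f}")
```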


This submission describes a study of linguistically motivated features for estimating translated sentence quality at the sentence level on the English-Hindi language pair. Several classification algorithms are employed to build the Quality Estimation (QE) models using the extracted features. We use the source-language text and the MT output to extract these features. Experiments show that our proposed approach is robust and produces competitive results for the DT-based QE model on a neural machine translation system.
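The classifiers and features are not enumerated in the abstract; as a rough illustration of feature-based sentence-level QE, the sketch below trains a decision-tree classifier on placeholder features with scikit-learn. The feature values and labels are random stand-ins, not the study's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Illustrative feature matrix: each row would hold linguistically motivated features
# extracted from a source sentence and its MT output (e.g. length ratio,
# punctuation mismatch, POS-tag overlap); values here are random placeholders.
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = rng.integers(0, 2, size=500)        # 1 = acceptable translation, 0 = not (assumed labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
print("F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```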


2020 ◽  
Vol 34 (07) ◽  
pp. 12168-12175 ◽  
Author(s):  
Jingwen Wang ◽  
Lin Ma ◽  
Wenhao Jiang

The task of temporally grounding language queries in videos is to localize the video segment that best matches a given language query (sentence). It requires models to perform visual and linguistic understanding simultaneously. Previous work predominantly ignores the precision of segment localization. Sliding-window-based methods use predefined search window sizes and suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are then selected at test time based on both the anchor and boundary predictions. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets.
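The exact fusion of anchor and boundary predictions is not given in the abstract; the sketch below illustrates one simple way such a test-time selection could combine the two signals. The scoring rule and shapes are assumptions, not CBP's actual procedure.

```python
import numpy as np

def select_segments(anchor_segs, anchor_scores, boundary_scores, top_k=3):
    """Illustrative fusion: rescore each anchor segment by the semantic-boundary
    confidence at its start and end positions, then keep the top-k segments.

    anchor_segs:     array of (start, end) indices, shape [N, 2]
    anchor_scores:   per-segment confidences, shape [N]
    boundary_scores: per-position boundary probabilities, shape [T]
    """
    starts, ends = anchor_segs[:, 0], anchor_segs[:, 1]
    fused = anchor_scores * boundary_scores[starts] * boundary_scores[ends]
    order = np.argsort(-fused)[:top_k]
    return anchor_segs[order], fused[order]

if __name__ == "__main__":
    segs = np.array([[2, 10], [5, 14], [8, 20]])
    a_scores = np.array([0.7, 0.9, 0.6])
    b_scores = np.random.default_rng(0).random(32)
    print(select_segments(segs, a_scores, b_scores, top_k=2))
```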


2019 ◽  
Vol 113 (1) ◽  
pp. 31-40
Author(s):  
Daniel Kondratyuk ◽  
Ronald Cardenas ◽  
Ondřej Bojar

Recent developments in machine translation experiment with the idea that a model can improve translation quality by performing multiple tasks, e.g., translating from source to target and also labeling each source word with syntactic information. The intuition is that the network will generalize knowledge over the multiple tasks, improving translation performance, especially in low-resource conditions. We devised an experiment that casts doubt on this intuition. We perform similar experiments in both multi-decoder and interleaving setups that label each target word either with a syntactic tag or a completely random tag. Surprisingly, we show that the model performs nearly as well on uncorrelated random tags as on true syntactic tags. We hint at some possible explanations of this behavior. The main message from our article is that experimental results with deep neural networks should always be complemented with trivial baselines to document that the observed gain is not due to some unrelated properties of the system or training effects. True confidence in where the gains come from will probably remain problematic anyway.
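For concreteness, the sketch below shows one way the interleaving setup with random tags could be constructed, pairing each target token with a tag drawn uniformly from a tag vocabulary; the format and vocabulary size are assumptions, not the authors' exact setup.

```python
import random

# Illustrative interleaving: each target token is followed by a tag token, drawn
# either from a syntactic tagger or uniformly at random from a tag vocabulary.
TAG_VOCAB = [f"<tag{i}>" for i in range(17)]   # size chosen arbitrarily for this sketch

def interleave_with_random_tags(target_tokens, seed=0):
    rng = random.Random(seed)
    out = []
    for tok in target_tokens:
        out.append(tok)
        out.append(rng.choice(TAG_VOCAB))      # random tag, uncorrelated with syntax
    return out

if __name__ == "__main__":
    print(interleave_with_random_tags(["the", "cat", "sat"]))
```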


2020 ◽  
Vol 34 (05) ◽  
pp. 9660-9667
Author(s):  
Tianxiang Zhao ◽  
Lemao Liu ◽  
Guoping Huang ◽  
Huayang Li ◽  
Yingling Liu ◽  
...  

Conventional interactive machine translation typically requires a human translator to validate every generated target word, even though most of them are correct in the advanced neural machine translation (NMT) scenario. Previous studies have exploited confidence-based approaches to address the issue of intensive human involvement, requesting human guidance only for a small number of words with low confidence. However, such approaches do not take the history of human involvement into account, and optimize the models only for translation quality while ignoring the cost of human involvement. In response to these pitfalls, we propose a novel interactive NMT model, which explicitly accounts for the history of human involvement and is optimized towards two objectives corresponding to the translation quality and the cost of human involvement, respectively. Specifically, the model jointly predicts a target word and a decision on whether to request human guidance, based on both the partial translation and the history of human involvement. Since there are no explicit signals on the decisions to request human guidance in the bilingual corpus, we optimize the model with reinforcement learning, which enables our model to accurately predict when to request human guidance. Simulated and real experiments show that the proposed model can achieve higher translation quality with similar or less human involvement than the confidence-based baseline.
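A minimal sketch of the joint prediction described above is given below: from the decoder state and a summary of past human involvement, the model outputs both a word distribution and a probability of asking for guidance. The head structure and names are illustrative assumptions, not the paper's exact model, and the reinforcement-learning training loop is omitted.

```python
import torch
import torch.nn as nn

class AskOrTranslateHead(nn.Module):
    """Sketch: jointly predict the next target word and a binary decision on
    whether to request human guidance."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.word_head = nn.Linear(d_model, vocab_size)
        self.ask_head = nn.Linear(2 * d_model, 1)   # decoder state + involvement-history summary

    def forward(self, decoder_state, involvement_summary):
        word_logits = self.word_head(decoder_state)
        ask_prob = torch.sigmoid(
            self.ask_head(torch.cat([decoder_state, involvement_summary], dim=-1))
        )
        return word_logits, ask_prob

if __name__ == "__main__":
    head = AskOrTranslateHead(d_model=512, vocab_size=32000)
    logits, ask = head(torch.randn(2, 512), torch.randn(2, 512))
    print(logits.shape, ask.shape)  # torch.Size([2, 32000]) torch.Size([2, 1])
```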


2020 ◽  
Vol 34 (05) ◽  
pp. 8846-8853 ◽  
Author(s):  
Raphael Shu ◽  
Jason Lee ◽  
Hideki Nakayama ◽  
Kyunghyun Cho

Although neural machine translation models have reached high translation quality, their autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose LaNMT, a latent-variable non-autoregressive model with continuous latent variables and a deterministic inference procedure. In contrast to existing approaches, we use a deterministic inference algorithm to find the target sequence that maximizes the lower bound on the log-probability. During inference, the translation length adapts automatically. Our experiments show that the lower bound can be greatly increased by running the inference algorithm, resulting in significantly improved translation quality. Our proposed model closes the performance gap between non-autoregressive and autoregressive approaches on the ASPEC Ja-En dataset with 8.6x faster decoding. On the WMT'14 En-De dataset, our model narrows the gap with the autoregressive baseline to 2.0 BLEU points with a 12.5x speedup. By decoding multiple initial latent variables in parallel and rescoring with a teacher model, the proposed model further brings the gap down to 1.0 BLEU point on the WMT'14 En-De task with a 6.8x speedup.
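The final step described above, decoding several initial latent variables in parallel and keeping the candidate the teacher scores highest, can be sketched as follows; the two callables are hypothetical stand-ins for the non-autoregressive decoder and the autoregressive teacher, not LaNMT's implementation.

```python
import torch

def parallel_decode_and_rescore(latents, decode_from_latent, teacher_logprob):
    """Sketch of 'decode several latents, rescore with a teacher':
    decode each initial latent (can run in parallel), score each candidate
    with the teacher, and return the best-scoring translation."""
    candidates = [decode_from_latent(z) for z in latents]              # non-autoregressive decoding
    scores = torch.tensor([teacher_logprob(c) for c in candidates])    # teacher log-probabilities
    best = int(scores.argmax())
    return candidates[best], float(scores[best])

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end.
    decode = lambda z: [int(t) for t in (z > 0).nonzero().flatten()[:5]]
    score = lambda tokens: -float(len(tokens))        # pretend shorter outputs score higher
    zs = [torch.randn(16) for _ in range(8)]
    print(parallel_decode_and_rescore(zs, decode, score))
```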

