Joint Training for Pivot-based Neural Machine Translation

Author(s):  
Yong Cheng ◽  
Qian Yang ◽  
Yang Liu ◽  
Maosong Sun ◽  
Wei Xu

While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.
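
The abstract does not spell out the paper's three connection methods, so the sketch below shows just one plausible flavor of joint training as an assumption: the source-to-pivot and pivot-to-target models share the pivot-language embedding matrix, so one backward pass over the summed losses updates common parameters. `TinyNMT` and its `nll` interface are toy stand-ins, not the authors' architecture.

```python
import torch
import torch.nn as nn

pivot_vocab, dim = 1000, 64
shared_pivot_emb = nn.Embedding(pivot_vocab, dim)  # shared across both models

class TinyNMT(nn.Module):
    """Toy stand-in for an encoder-decoder; only the loss interface matters."""
    def __init__(self, src_vocab, tgt_emb):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = tgt_emb                       # possibly shared
        self.proj = nn.Linear(dim, tgt_emb.num_embeddings)

    def nll(self, src, tgt):
        h = self.src_emb(src).mean(dim=1, keepdim=True)  # toy "encoder"
        logits = self.proj(h + self.tgt_emb(tgt))        # toy "decoder"
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))

model_sp = TinyNMT(src_vocab=800, tgt_emb=shared_pivot_emb)   # source->pivot
model_pt = TinyNMT(src_vocab=pivot_vocab, tgt_emb=nn.Embedding(1200, dim))
model_pt.src_emb = shared_pivot_emb   # pivot side tied across the two models

src = torch.randint(0, 800, (4, 7))
piv = torch.randint(0, pivot_vocab, (4, 7))
tgt = torch.randint(0, 1200, (4, 7))

loss = model_sp.nll(src, piv) + model_pt.nll(piv, tgt)   # joint objective
loss.backward()   # gradients from both losses reach the shared embeddings
```

The point of the joint objective is exactly this coupling: under independent training the two `backward()` calls would never touch a common parameter.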

Author(s):  
Rongxiang Weng ◽  
Hao Zhou ◽  
Shujian Huang ◽  
Lei Li ◽  
Yifan Xia ◽  
...  

State-of-the-art machine translation models are still not on a par with human translators. Previous work incorporates human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model translation errors are equal: some are critical while others are minor. Meanwhile, the same translation mistakes occur repeatedly in similar contexts. To address both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our method works with critical revision instructions, allowing humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on their context, alleviating the issue of repeated mistakes. Experiments in both ideal and real interactive translation settings demonstrate that CAMIT significantly improves machine translation results while requiring fewer revision instructions from humans than previous methods.
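
CAMIT's revision memory is learned and soft; the hard lookup table below is only a structural illustration of the idea, assuming the "context" of a revision is just the preceding token. All names are hypothetical.

```python
from collections import defaultdict

class RevisionMemory:
    """Record human corrections keyed by the preceding token (the 'context')."""
    def __init__(self):
        self.fixes = defaultdict(dict)   # context token -> {wrong: corrected}

    def record(self, context: str, wrong: str, corrected: str):
        self.fixes[context][wrong] = corrected

    def apply(self, tokens):
        out = []
        for tok in tokens:
            ctx = out[-1] if out else "<s>"
            # Replay a remembered fix when the same mistake recurs in a
            # similar context; otherwise keep the model's token.
            out.append(self.fixes.get(ctx, {}).get(tok, tok))
        return out

mem = RevisionMemory()
mem.record("bank", "river", "financial")        # one observed human revision
print(mem.apply(["bank", "river", "loan"]))     # ['bank', 'financial', 'loan']
```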


Author(s):  
Rui Wang ◽  
Xu Tan ◽  
Renqian Luo ◽  
Tao Qin ◽  
Tie-Yan Liu

Neural approaches have achieved state-of-the-art accuracy in machine translation but suffer from the high cost of collecting large-scale parallel data. Thus, much research has been conducted on neural machine translation (NMT) with very limited parallel data, i.e., the low-resource setting. In this paper, we provide a survey of low-resource NMT and classify related works into three categories according to the auxiliary data they use: (1) exploiting monolingual data of the source and/or target languages, (2) exploiting data from auxiliary languages, and (3) exploiting multi-modal data. We hope that our survey helps researchers better understand this field and inspires them to design better algorithms, and helps industry practitioners choose appropriate algorithms for their applications.


Author(s):  
Zaixiang Zheng ◽  
Xiang Yue ◽  
Shujian Huang ◽  
Jiajun Chen ◽  
Alexandra Birch

Document-level machine translation manages to outperform sentence-level models by a small margin but has failed to be widely adopted. We argue that previous research did not make clear use of the global context, and propose a new document-level NMT framework that deliberately models the local context of each sentence with awareness of the global context of the document in both source and target languages. We specifically design the model to handle documents containing any number of sentences, including single sentences. This unified approach allows our model to be trained elegantly on standard datasets without training on sentence-level and document-level data separately. Experimental results demonstrate that our model outperforms Transformer baselines and previous document-level NMT models by margins of up to 2.1 BLEU points. We also provide analyses showing that the benefit of context extends far beyond the neighboring two or three sentences that previous studies have typically incorporated.
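
A minimal sketch of the general local-then-global idea, not the paper's exact architecture: each sentence is encoded on its own, then a document-level encoder lets every sentence representation attend to all others, injecting global context. It handles any number of sentences, including one. Layer sizes are placeholders.

```python
import torch
import torch.nn as nn

dim, heads = 64, 4
local_enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
    num_layers=2)                     # sees one sentence at a time
global_enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
    num_layers=1)                     # sees the whole document

def encode_document(sent_embeddings):
    # sent_embeddings: one (tokens, dim) tensor per sentence
    local = [local_enc(s.unsqueeze(0)).mean(dim=1) for s in sent_embeddings]
    doc = torch.cat(local, dim=0).unsqueeze(0)   # (1, n_sentences, dim)
    return global_enc(doc)                       # globally contextualized

doc = [torch.randn(12, dim), torch.randn(8, dim), torch.randn(5, dim)]
print(encode_document(doc).shape)                # torch.Size([1, 3, 64])
```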


Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when there is an abundance of parallel corpora. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the Transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
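
For concreteness, here is how one of the standard subword techniques the paper compares can be trained and applied with the SentencePiece library; file names and the vocabulary size are placeholders.

```python
import sentencepiece as spm

# Train a BPE subword model on a (placeholder) corpus file.
spm.SentencePieceTrainer.train(
    input="parallel.en",          # one sentence per line
    model_prefix="en_bpe",
    vocab_size=8000,
    model_type="bpe")             # "unigram" gives unigram-LM segmentation

sp = spm.SentencePieceProcessor(model_file="en_bpe.model")
pieces = sp.encode("internationalization", out_type=str)
print(pieces)   # e.g. ['▁intern', 'ation', 'al', 'ization'] (splits vary)
```

Because any unseen word decomposes into known subword pieces, the fixed-vocabulary OOV problem largely disappears, which is precisely what makes these techniques attractive for morphologically rich, low-resource languages.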


Author(s):  
Yingce Xia ◽  
Tianyu He ◽  
Xu Tan ◽  
Fei Tian ◽  
Di He ◽  
...  

Sharing source- and target-side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English-to-French or English-to-German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the entire encoder and decoder of an NMT model. We share the encoder and decoder of the Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German-to-English translation, 28.98/29.89 BLEU scores on WMT 2014 English-to-German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German-to-English translation.
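
A rough illustration of the sharing idea in PyTorch, not the paper's implementation: one embedding matrix and one layer stack serve both the source and target sides, roughly halving the parameter count relative to separate encoder and decoder stacks.

```python
import torch
import torch.nn as nn

dim, vocab = 64, 1000
shared_emb = nn.Embedding(vocab, dim)    # word-level sharing (the older idea)
stack = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=4)                        # one stack reused in both roles

def encode(src_ids):
    return stack(shared_emb(src_ids))

def decode_step(tgt_ids, memory):
    # Reusing the same stack here is the "model-level sharing" idea; a real
    # decoder would also need causal masking and cross-attention to memory.
    return stack(shared_emb(tgt_ids)) + memory.mean(dim=1, keepdim=True)

src = torch.randint(0, vocab, (2, 9))
tgt = torch.randint(0, vocab, (2, 5))
print(decode_step(tgt, encode(src)).shape)   # torch.Size([2, 5, 64])
```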


Author(s):  
Mehreen Alam ◽  
Sibt ul Hussain

Attention-based encoder-decoder models have superseded conventional techniques due to their unmatched performance on many neural machine translation problems. Usually, the encoder and decoder are two recurrent neural networks, where the decoder is directed to focus on relevant parts of the source language using an attention mechanism. This data-driven approach leads to generic and scalable solutions with no reliance on hand-crafted features. To the best of our knowledge, none of the modern machine translation approaches has been applied to the research problem of Urdu machine transliteration. Ours is the first attempt to apply a deep neural network-based encoder-decoder with an attention mechanism to this problem using a Roman-Urdu and Urdu parallel corpus. To this end, we present (i) the first ever Roman-Urdu to Urdu parallel corpus of 1.1 million sentences, (ii) three state-of-the-art encoder-decoder models, and (iii) a detailed empirical analysis of these three models on the Roman-Urdu to Urdu parallel corpus. Overall, the attention-based model gives state-of-the-art performance with a benchmark of 70 BLEU. Our qualitative evaluation shows that our models generate coherent transliterations that are grammatically and logically correct.
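
The core mechanism these models rely on is a single attention step; below is a minimal Luong-style dot-product version. Real systems wrap this in learned projections, masking, and recurrence, so treat it as the bare idea only.

```python
import torch

def attention(decoder_state, encoder_states):
    # decoder_state: (batch, dim); encoder_states: (batch, src_len, dim)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(-1)).squeeze(-1)
    weights = torch.softmax(scores, dim=-1)          # (batch, src_len)
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights    # weighted summary of the source, per step

dec = torch.randn(2, 64)
enc = torch.randn(2, 10, 64)
ctx, w = attention(dec, enc)
print(ctx.shape, w.shape)      # torch.Size([2, 64]) torch.Size([2, 10])
```

At every output position the decoder recomputes these weights, which is what lets it "focus on relevant parts of the source" as the abstract describes.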


Author(s):  
Long Zhou ◽  
Jiajun Zhang ◽  
Chengqing Zong

Existing approaches to neural machine translation (NMT) generate the target-language sequence token by token, from left to right. However, this unidirectional decoding framework cannot make full use of the target-side future contexts that a right-to-left decoding direction can produce, and thus suffers from unbalanced outputs. In this paper, we introduce synchronous bidirectional neural machine translation (SB-NMT), which predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. Then, we present an interactive decoding model in which left-to-right (right-to-left) generation depends not only on its previously generated outputs but also on future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on the large-scale NIST Chinese-English, WMT14 English-German, and WMT18 Russian-English translation tasks. Experimental results demonstrate that our model achieves significant improvements of 3.92, 1.49, and 1.04 BLEU points over the strong Transformer model, respectively, and obtains state-of-the-art performance on the Chinese-English and English-German translation tasks.
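
A toy sketch of the decoding schedule only; the paper realizes this inside a single Transformer, whereas here the two hypothetical step functions are opaque callables. The point is the lock-step loop: each direction's next token can condition on the other direction's output so far.

```python
def sb_decode(step_l2r, step_r2l, max_len):
    left, right = [], []    # l2r prefix, and r2l suffix stored reversed
    for _ in range(max_len):
        # Each step sees its own history *and* the other direction's
        # history, which is the "interactive" part of SB-NMT.
        left.append(step_l2r(left, right))
        right.append(step_r2l(right, left))
    return left, list(reversed(right))

# Dummy step functions just to show the data flow:
l2r = lambda own, other: f"L{len(own)}"
r2l = lambda own, other: f"R{len(own)}"
print(sb_decode(l2r, r2l, 3))   # (['L0', 'L1', 'L2'], ['R2', 'R1', 'R0'])
```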


2017 ◽  
Vol 108 (1) ◽  
pp. 13-25 ◽  
Author(s):  
Parnia Bahar ◽  
Tamer Alkhouli ◽  
Jan-Thorsten Peter ◽  
Christopher Jan-Steffen Brix ◽  
Hermann Ney

Training neural networks is a non-convex, high-dimensional optimization problem. In this paper, we provide a comparative study of the most popular stochastic optimization techniques used to train neural networks. We evaluate the methods in terms of convergence speed, translation quality, and training stability. In addition, we investigate combinations that seek to improve optimization in terms of these aspects. We train state-of-the-art attention-based models and apply them to perform neural machine translation. We demonstrate our results on two tasks: WMT 2016 En→Ro and WMT 2015 De→En.
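
The sketch below mirrors the kind of controlled comparison the paper performs, but on a toy regression task rather than an NMT system: the same initialization and data for every run, with only the update rule changed. Hyperparameters are arbitrary placeholders.

```python
import torch
import torch.nn as nn

def run(opt_name, steps=200):
    torch.manual_seed(0)                 # identical init and data per run
    model = nn.Linear(10, 1)
    x, y = torch.randn(256, 10), torch.randn(256, 1)
    opt = {"sgd":     torch.optim.SGD(model.parameters(), lr=0.1),
           "adam":    torch.optim.Adam(model.parameters(), lr=1e-3),
           "adagrad": torch.optim.Adagrad(model.parameters(), lr=0.1)}[opt_name]
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for name in ("sgd", "adam", "adagrad"):
    print(name, run(name))   # final loss differs: convergence speed varies
```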


2019 ◽  
Author(s):  
Negacy D. Hailu ◽  
Michael Bada ◽  
Asmelash Teka Hadgu ◽  
Lawrence E. Hunter

Background: The automated identification of mentions of ontological concepts in natural language texts is a central task in biomedical information extraction. Despite more than a decade of effort, performance in this task remains below the level necessary for many applications.

Results: Recently, applications of deep learning in natural language processing have demonstrated striking improvements over previous state-of-the-art performance in many related natural language processing tasks. Here we demonstrate similarly striking performance improvements in recognizing biomedical ontology concepts in full-text journal articles using deep learning techniques originally developed for machine translation. For example, our best-performing system improves on the previous state of the art in recognizing terms in the Gene Ontology Biological Process hierarchy, from a previous best F1 score of 0.40 to an F1 of 0.70, nearly halving the error rate. Nearly all other ontologies show similar performance improvements.

Conclusions: A two-stage concept recognition system, consisting of a conditional random field model for span detection followed by a deep neural sequence model for normalization, improves the state-of-the-art performance for biomedical concept recognition. Treating biomedical concept normalization as a sequence-to-sequence mapping task similar to neural machine translation improves performance.
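
A structural sketch of the two-stage pipeline described above; both stage implementations here are dummy stand-ins for the real trained CRF tagger and seq2seq normalizer, and the ontology ID is illustrative.

```python
def recognize_concepts(text, crf_tagger, seq2seq_normalizer):
    # Stage 1: span detection - a CRF labels which token spans mention concepts.
    spans = crf_tagger(text)               # e.g. [("cell division", 13, 26)]
    # Stage 2: normalization - each detected span is "translated" into an
    # ontology identifier, treated as sequence-to-sequence mapping as in NMT.
    return [(mention, seq2seq_normalizer(mention)) for mention, _, _ in spans]

# Dummy stand-ins, just to show the data flow between the two stages:
tagger = lambda t: [("cell division", t.find("cell division"),
                     t.find("cell division") + len("cell division"))]
normalizer = lambda m: "GO:0051301"        # illustrative GO identifier
print(recognize_concepts("... regulates cell division ...", tagger, normalizer))
```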


2019 ◽  
Vol 35 (2) ◽  
pp. 147-166 ◽  
Author(s):  
Hong-Hai Phan-Vu ◽  
Viet Trung Tran ◽  
Van Nam Nguyen ◽  
Hoang Vu Dang ◽  
Phan Thuan Do

Machine translation is shifting to an end-to-end approach based on deep neural networks. The state of the art achieves impressive results for popular language pairs such as English-French or English-Chinese. However, for English-Vietnamese, the shortage of parallel corpora and expensive hyper-parameter search present practical challenges to neural approaches. This paper highlights our efforts to improve English-Vietnamese translation in two directions: (1) building the largest open Vietnamese-English corpus to date, and (2) extensive experiments with the latest neural models to achieve the highest BLEU scores. Our experiments provide practical examples of effectively employing different neural machine translation models with low-resource language pairs.
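
For reference, BLEU comparisons of the kind reported here are conventionally computed with the sacreBLEU library; the sentences below are placeholders.

```python
import sacrebleu

hyps = ["xin chào thế giới"]         # system translations (placeholder)
refs = [["xin chào thế giới !"]]     # one reference stream, aligned to hyps
print(sacrebleu.corpus_bleu(hyps, refs).score)
```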

