Linguistic knowledge-based vocabularies for Neural Machine Translation

2020 ◽  
pp. 1-22
Author(s):  
Noe Casas ◽  
Marta R. Costa-jussà ◽  
José A. R. Fonollosa ◽  
Juan A. Alonso ◽  
Ramón Fanlo

Abstract Neural networks applied to machine translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, they trade this for the inability to apply word-level token associations, which limits their use in semantically rich areas, prevents some transfer learning approaches such as cross-lingual pretrained embeddings, and reduces their interpretability. In this work, we propose new hybrid linguistically grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from the derived benefits. We test the proposed approaches on both morphologically rich and morphologically poor languages, showing that, for the former, the translation quality of out-of-domain texts improves with respect to a strong subword baseline.
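The statistically discovered subword units mentioned here are typically learned with byte-pair encoding (BPE); a minimal sketch of the merge-learning loop, run on an invented word-frequency dictionary:

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from {word: frequency}; words become tuples of symbols."""
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # count every adjacent symbol pair, weighted by word frequency
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair becomes a merge
        merges.append(best)
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges, vocab

merges, vocab = learn_bpe({"lower": 5, "lowest": 2, "newer": 6, "wider": 3}, 4)
```

On this toy corpus the first learned merge joins "e" and "r", the most frequent adjacent pair; repeated merges build up the subword inventory the abstract describes.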

Author(s):  
Yirong Pan ◽  
Xiao Li ◽  
Yating Yang ◽  
Rui Dong

Incorporating source-side linguistic knowledge into the neural machine translation (NMT) model has recently achieved impressive performance on machine translation tasks. One popular method is to generalize the word embedding layer of the encoder to encode each word and its linguistic features. Another is to change the architecture of the encoder to encode syntactic information. However, the former cannot explicitly balance the contributions of a word and its linguistic features, while the latter cannot flexibly utilize various types of linguistic information. Focusing on these issues, this paper proposes a novel NMT approach that models the words in parallel to the linguistic knowledge by using two separate encoders. Compared with the single-encoder NMT model, the proposed approach additionally employs a knowledge-based encoder to specifically encode linguistic features. Moreover, it shares parameters across encoders to enhance the model's representation of the source-side language. Extensive experiments show that the approach achieves significant improvements of up to 2.4 and 1.1 BLEU points on Turkish→English and English→Turkish machine translation tasks, respectively, which indicates that it is capable of better utilizing external linguistic knowledge and effectively improving machine translation quality.
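As a rough, hypothetical illustration of the two-encoder idea (the tokens, feature tags, dimensions, and the simple shared projection below are all invented for the sketch and are not the paper's actual architecture):

```python
import random

random.seed(0)
DIM = 4
# separate embedding tables: one for (sub)words, one for linguistic features
word_emb = {w: [random.uniform(-1, 1) for _ in range(DIM)]
            for w in ["ev", "git", "##ti"]}
feat_emb = {f: [random.uniform(-1, 1) for _ in range(DIM)]
            for f in ["NOUN", "VERB", "PAST"]}
# a single projection shared by both encoders, mimicking parameter sharing
shared_proj = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

def encode(seq, table):
    """A stand-in 'encoder': embed each token, apply the shared projection."""
    states = []
    for tok in seq:
        v = table[tok]
        states.append([sum(shared_proj[i][j] * v[j] for j in range(DIM))
                       for i in range(DIM)])
    return states

word_states = encode(["ev", "git", "##ti"], word_emb)      # word encoder
feat_states = encode(["NOUN", "VERB", "PAST"], feat_emb)   # knowledge-based encoder
# source representation: concatenate the two encoders' states per position
source = [w + f for w, f in zip(word_states, feat_states)]
```

The point of the sketch is only the division of labor: words and linguistic features are encoded separately but through tied parameters, then combined position by position.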


2020 ◽  
Vol 34 (05) ◽  
pp. 7756-7763 ◽  
Author(s):  
Zuohui Fu ◽  
Yikun Xian ◽  
Shijie Geng ◽  
Yingqiang Ge ◽  
Yuting Wang ◽  
...  

A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data. The experiments show that our method outperforms several technically more powerful approaches, especially under challenging low-resource circumstances. The source code is available from https://github.com/zuohuif/ABSent along with relevant datasets.
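ABSent learns a mapping between source- and target-language sentence embeddings from limited parallel data. As a deliberately simplified stand-in (plain least-squares gradient descent on a 2-D linear map, omitting the paper's adversarial and bi-directional objectives; all vectors are toy values), the core idea can be sketched as:

```python
# Toy "parallel" sentence-embedding pairs (source vector, target vector).
# The true mapping here is a 90-degree rotation, which SGD can recover.
pairs = [([1.0, 0.0], [0.0, 1.0]),
         ([0.0, 1.0], [-1.0, 0.0]),
         ([1.0, 1.0], [-1.0, 1.0])]

W = [[0.0, 0.0], [0.0, 0.0]]  # the 2x2 mapping to learn

def apply_map(W, x):
    return [W[0][0] * x[0] + W[0][1] * x[1],
            W[1][0] * x[0] + W[1][1] * x[1]]

def loss(W):
    return sum((a - b) ** 2
               for x, y in pairs
               for a, b in zip(apply_map(W, x), y))

lr = 0.1
for _ in range(200):                 # SGD over the few parallel pairs
    for x, y in pairs:
        pred = apply_map(W, x)
        for i in range(2):
            err = pred[i] - y[i]
            for j in range(2):
                W[i][j] -= lr * 2.0 * err * x[j]
```

Even this supervised toy version shows why limited parallel data can suffice for a linear map; the adversarial component in ABSent is what removes the need for strong supervision.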


Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach which performs much better than statistical machine translation (SMT) models when there is an abundance of parallel corpora. However, vanilla NMT primarily operates at the word level with a fixed vocabulary. Therefore, low-resource morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English–Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
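Subword segmentation resolves OOV words by falling back to smaller known units. A simple greedy longest-match segmenter over an invented subword vocabulary (not one of the specific techniques benchmarked in the paper) illustrates the mechanism:

```python
def segment(word, subwords):
    """Greedily split `word` into the longest known subwords, left to right,
    falling back to single characters so no word is ever out of vocabulary."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in subwords or j == i + 1:  # j == i + 1: character fallback
                out.append(piece)
                i = j
                break
    return out

vocab = {"trans", "late", "form", "er", "un", "ing"}
print(segment("transformer", vocab))  # → ['trans', 'form', 'er']
```

A word with no known subwords degrades gracefully to characters (e.g. "zing" → ['z', 'ing']), which is exactly the open-vocabulary property the abstract credits subword techniques with.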


Author(s):  
Shuo Ren ◽  
Zhirui Zhang ◽  
Shujie Liu ◽  
Ming Zhou ◽  
Shuai Ma

Without a real bilingual corpus available, unsupervised neural machine translation (NMT) typically requires pseudo-parallel data generated with the back-translation method for model training. However, due to weak supervision, the pseudo data inevitably contain noise and errors that are accumulated and reinforced in the subsequent training process, leading to bad translation performance. To address this issue, we introduce phrase-based statistical machine translation (SMT) models, which are robust to noisy data, as posterior regularizations to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. Then the SMT and NMT models are optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect caused by errors in the iterative back-translation process can be alleviated in a timely manner by the SMT filtering noise out of its phrase tables; meanwhile, (2) the NMT model can compensate for the deficiency in fluency inherent to SMT. Experiments conducted on en-fr and en-de translation tasks show that our method outperforms strong baselines and achieves new state-of-the-art unsupervised machine translation performance.
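The role the SMT phrase tables play as a noise filter can be illustrated with a much-simplified, word-level sketch (the pairs, threshold, and helper function below are invented for illustration):

```python
from collections import Counter, defaultdict

def build_filtered_table(pseudo_pairs, min_prob=0.3):
    """Estimate p(target | source) from noisy word-aligned pseudo-parallel
    pairs and drop low-probability entries, mimicking the noise filtering
    the phrase tables perform (simplified here to single words)."""
    counts = defaultdict(Counter)
    for src, tgt in pseudo_pairs:
        counts[src][tgt] += 1
    table = {}
    for src, tgts in counts.items():
        total = sum(tgts.values())
        table[src] = {t: c / total
                      for t, c in tgts.items() if c / total >= min_prob}
    return table

# Back-translation occasionally emits wrong pairs ("maison" -> "mouse");
# frequency-based filtering removes them. The data below are invented.
noisy = ([("maison", "house")] * 3 + [("maison", "mouse")]
         + [("chat", "cat")] * 2)
table = build_filtered_table(noisy)
```

Because an occasional back-translation error stays rare relative to correct translations, probability thresholding keeps "house" (p = 0.75) and discards "mouse" (p = 0.25), preventing the error from being reinforced in the next training round.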


2017 ◽  
Vol 108 (1) ◽  
pp. 13-25 ◽  
Author(s):  
Parnia Bahar ◽  
Tamer Alkhouli ◽  
Jan-Thorsten Peter ◽  
Christopher Jan-Steffen Brix ◽  
Hermann Ney

Abstract Training neural networks is a non-convex, high-dimensional optimization problem. In this paper, we provide a comparative study of the most popular stochastic optimization techniques used to train neural networks. We evaluate the methods in terms of convergence speed, translation quality, and training stability. In addition, we investigate combinations that seek to improve optimization in terms of these aspects. We train state-of-the-art attention-based models and apply them to perform neural machine translation. We demonstrate our results on two tasks: WMT 2016 En→Ro and WMT 2015 De→En.
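Such comparative studies typically cover plain SGD and adaptive methods such as Adam; their update rules can be sketched on a toy one-dimensional objective f(x) = (x − 3)² (the objective, step counts, and learning rates are illustrative):

```python
import math

def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def sgd(x, lr=0.1, steps=200):
    """Plain stochastic gradient descent: step against the raw gradient."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(x, lr=0.1, steps=1000, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected running moments rescale each step."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g          # first-moment estimate
        v = b2 * v + (1 - b2) * g * g      # second-moment estimate
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

On this quadratic, SGD contracts the error geometrically, while Adam takes near-constant steps early on regardless of gradient magnitude; it is exactly such differences in convergence speed and stability that the paper measures on real NMT training.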


2019 ◽  
Vol 35 (2) ◽  
pp. 147-166 ◽  
Author(s):  
Hong-Hai Phan-Vu ◽  
Viet Trung Tran ◽  
Van Nam Nguyen ◽  
Hoang Vu Dang ◽  
Phan Thuan Do

Machine translation is shifting to an end-to-end approach based on deep neural networks. The state of the art achieves impressive results for popular language pairs such as English–French or English–Chinese. However, for English–Vietnamese, the shortage of parallel corpora and expensive hyper-parameter search present practical challenges to neural approaches. This paper highlights our efforts to improve English–Vietnamese translation in two directions: (1) building the largest open Vietnamese–English corpus to date, and (2) extensive experiments with the latest neural models to achieve the highest BLEU scores. Our experiments provide practical examples of effectively employing different neural machine translation models with low-resource language pairs.
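The BLEU score used to compare these systems is worth pinning down. A minimal single-reference, sentence-level implementation (without the smoothing and tokenization details real evaluations such as the WMT setups use) looks like:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: brevity penalty times the geometric mean of
    modified n-gram precisions (no smoothing, single reference)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0.0:
        return 0.0          # any empty n-gram match zeroes unsmoothed BLEU
    bp = (1.0 if len(candidate) > len(reference)
          else math.exp(1.0 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat sat on the mat".split()
print(round(bleu(hyp, hyp), 2))  # → 1.0
```

The brevity penalty is what keeps very short candidates from gaming the precision terms, which matters when comparing systems of very different fluency on a low-resource pair.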


1996 ◽  
Vol 118 (2) ◽  
pp. 237-246 ◽  
Author(s):  
S. Yoshimura ◽  
A. S. Jovanovic

This paper describes analyses of case studies on failure of structural components in power plants using hierarchical (multilayer) neural networks. Using selected test data about case studies stored in the structural failure database of a knowledge-based system, the network is trained: either to predict possible failure mechanisms such as creep, overheating (OH), or overstressing (OS)-induced failure (network of Type A), or to classify the root failure cause of each case study as either a primary or a secondary cause (network of Type B). In the present study, the primary root cause is defined as “manufacturing-, material- or design-induced causes,” while the secondary one is defined as “causes not induced by manufacturing, material or design, e.g., failures due to operation or mal-operation.” An ordinary three-layer neural network employing the back-propagation algorithm with the momentum method is utilized in this study. The results clearly show that the neural network is a powerful tool for analyzing case studies of failure in structural components. For example, the trained network of Type A predicts creep-induced failure in unknown case studies with an accuracy of 86 percent, while the network of Type B classifies root failure causes of unknown case studies with an accuracy of 88 percent. It should be noted that, due to a shortage of available case studies, an appropriate selection of case studies and input parameters to be used for network training was necessary in order to attain high accuracy. Collecting more case studies should, however, resolve this problem and improve the accuracy of the analyses. An analysis module for case studies using the neural network has also been developed and successfully implemented in a knowledge-based system.
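The "ordinary three-layer neural network employing the back-propagation algorithm with the momentum method" can be sketched in miniature. Everything below (the toy two-feature data standing in for the failure-case inputs, the layer sizes, rates, and seed) is illustrative, not the paper's actual configuration:

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# toy binary task standing in for primary/secondary failure-cause labels
data = [([0.0, 0.0], 0.0), ([1.0, 1.0], 1.0),
        ([0.2, 0.3], 0.0), ([0.9, 0.8], 1.0)]

H = 3                                                    # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
v_w1 = [[0.0, 0.0] for _ in range(H)]                    # momentum buffers
v_b1 = [0.0] * H
v_w2 = [0.0] * H
v_b2 = 0.0
lr, mom = 0.3, 0.8                                       # rate and momentum

def forward(x):
    h = [sigmoid(sum(w1[i][j] * x[j] for j in range(2)) + b1[i])
         for i in range(H)]
    o = sigmoid(sum(w2[i] * h[i] for i in range(H)) + b2)
    return h, o

def total_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

loss_before = total_loss()
for _ in range(500):
    for x, y in data:
        h, o = forward(x)
        d_o = (o - y) * o * (1 - o)                      # output error signal
        for i in range(H):
            d_h = d_o * w2[i] * h[i] * (1 - h[i])        # hidden error signal
            v_w2[i] = mom * v_w2[i] - lr * d_o * h[i]    # momentum update
            w2[i] += v_w2[i]
            for j in range(2):
                v_w1[i][j] = mom * v_w1[i][j] - lr * d_h * x[j]
                w1[i][j] += v_w1[i][j]
            v_b1[i] = mom * v_b1[i] - lr * d_h
            b1[i] += v_b1[i]
        v_b2 = mom * v_b2 - lr * d_o
        b2 += v_b2
```

The momentum term blends each gradient step with the previous step's direction, which is what the "momentum method" adds to plain back-propagation: faster, more stable descent on noisy, small datasets like a failure-case database.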


2013 ◽  
Vol 416-417 ◽  
pp. 1552-1557
Author(s):  
Xiao Xu Hu

Hypothesis combination is a main method for improving the performance of machine translation (MT) systems. State-of-the-art strategies include sentence-level and word-level methods, each of which has its own advantages and disadvantages. Moreover, current strategies mainly depend on statistical methods, with little guidance from rich linguistic knowledge. This paper proposes a hybrid framework to combine the strengths of the sentence-level and word-level methods. In the word-level stage, the method selects well-translated words according to each word's part-of-speech and the translation ability, for that part-of-speech, of the MT system which generated the word. Experimental results with different MT systems prove the effectiveness of this approach.
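A hypothetical sketch of the word-level selection stage: for each position, choose the word from the system whose estimated translation ability for that part-of-speech is highest. The systems, tags, and confidence scores below are invented for illustration:

```python
def combine(hypotheses, pos_tags, pos_confidence):
    """hypotheses: {system: [words]} with aligned, equal-length outputs;
    pos_tags: one tag per position; pos_confidence: {system: {tag: score}}.
    Pick each word from the system most reliable for that word's POS."""
    combined = []
    for i, tag in enumerate(pos_tags):
        best = max(hypotheses, key=lambda s: pos_confidence[s].get(tag, 0.0))
        combined.append(hypotheses[best][i])
    return combined

hyps = {"sysA": ["he", "eats", "apple"],
        "sysB": ["he", "eat", "apples"]}
tags = ["PRON", "VERB", "NOUN"]
conf = {"sysA": {"PRON": 0.9, "VERB": 0.8, "NOUN": 0.4},
        "sysB": {"PRON": 0.7, "VERB": 0.5, "NOUN": 0.9}}
print(combine(hyps, tags, conf))  # → ['he', 'eats', 'apples']
```

The sketch assumes the hypotheses are already word-aligned; a real system would need alignment first, and a sentence-level stage would then rescore the combined output.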

