An Evaluation of Neural Machine Translation and Pre-trained Word Embeddings in Multilingual Neural Sentiment Analysis

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015466 ◽

2019 ◽

Vol 33 ◽

pp. 5466-5473 ◽

Cited By ~ 2

Author(s):

Yingce Xia ◽

Tianyu He ◽

Xu Tan ◽

Fei Tian ◽

Di He ◽

...

Keyword(s):

Machine Translation ◽

English Translation ◽

State Of The Art ◽

Compact Model ◽

Word Embeddings ◽

Simple Method ◽

Neural Machine Translation ◽

German Translation ◽

One Step ◽

Target Side

Sharing source and target side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English to French or German translation). The success of such wordlevel sharing motivates us to move one step further: we consider model-level sharing and tie the whole parts of the encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised NMT and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German to English translation, 28.98/29.89 BLEU scores on WMT 2014 English to German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German to English translation.

Get full-text (via PubEx)

Morphological Word Embeddings for Arabic Neural Machine Translation in Low-Resource Settings

10.18653/v1/w18-1201 ◽

2018 ◽

Cited By ~ 3

Author(s):

Pamela Shapiro ◽

Kevin Duh

Keyword(s):

Machine Translation ◽

Word Embeddings ◽

Neural Machine Translation ◽

Low Resource Settings ◽

Low Resource

Get full-text (via PubEx)

Equalizing Gender Bias in Neural Machine Translation with Word Embeddings Techniques

10.18653/v1/w19-3821 ◽

2019 ◽

Author(s):

Joel Escudé Font ◽

Marta R. Costa-jussà

Keyword(s):

Machine Translation ◽

Gender Bias ◽

Word Embeddings ◽

Neural Machine Translation

Get full-text (via PubEx)

Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles

International Journal of Semantic Computing ◽

10.1142/s1793351x20500026 ◽

2020 ◽

Vol 14 (01) ◽

pp. 137-151

Author(s):

Prabhakar Gupta ◽

Mayank Sharma

Keyword(s):

Machine Translation ◽

Experimental Results ◽

Parallel Translation ◽

Word Embeddings ◽

Quality Estimation ◽

Neural Machine Translation ◽

Intensity Index ◽

Translation Quality ◽

Digital Entertainment ◽

Reference Corpus

We demonstrate the potential for using aligned bilingual word embeddings in developing an unsupervised method to evaluate machine translations without a need for parallel translation corpus or reference corpus. We explain different aspects of digital entertainment content subtitles. We share our experimental results for four languages pairs English to French, German, Portuguese, Spanish, and present findings on the shortcomings of Neural Machine Translation for subtitles. We propose several improvements over the system designed by Gupta et al. [P. Gupta, S. Shekhawat and K. Kumar, Unsupervised quality estimation without reference corpus for subtitle machine translation using word embeddings, IEEE 13th Int. Conf. Semantic Computing, 2019, pp. 32–38.] by incorporating custom embedding model curated to subtitles, compound word splits and punctuation inclusion. We show a massive run time improvement of the order of [Formula: see text] by considering three types of edits, removing Proximity Intensity Index (PII) and changing post-edit score calculation from their system.

Get full-text (via PubEx)

Making sense of neural machine translation

Translation Spaces ◽

10.1075/ts.6.2.06for ◽

2017 ◽

Vol 6 (2) ◽

pp. 291-309 ◽

Cited By ~ 11

Author(s):

Mikel L. Forcada

Keyword(s):

Machine Translation ◽

New Technology ◽

Statistical Machine Translation ◽

Software Requirements ◽

Word Embeddings ◽

Neural Machine Translation ◽

Training Time ◽

Making Sense ◽

Translation Systems ◽

New Machine

Abstract The last few years have witnessed a surge in the interest of a new machine translation paradigm: neural machine translation (NMT). Neural machine translation is starting to displace its corpus-based predecessor, statistical machine translation (SMT). In this paper, I introduce NMT, and explain in detail, without the mathematical complexity, how neural machine translation systems work, how they are trained, and their main differences with SMT systems. The paper will try to decipher NMT jargon such as “distributed representations”, “deep learning”, “word embeddings”, “vectors”, “layers”, “weights”, “encoder”, “decoder”, and “attention”, and build upon these concepts, so that individual translators and professionals working for the translation industry as well as students and academics in translation studies can make sense of this new technology and know what to expect from it. Aspects such as how NMT output differs from SMT, and the hardware and software requirements of NMT, both at training time and at run time, on the translation industry, will be discussed.

Get full-text (via PubEx)

When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?

10.18653/v1/n18-2084 ◽

2018 ◽

Cited By ~ 13

Author(s):

Ye Qi ◽

Devendra Sachan ◽

Matthieu Felix ◽

Sarguna Padmanabhan ◽

Graham Neubig

Keyword(s):

Machine Translation ◽

Word Embeddings ◽

Neural Machine Translation

Get full-text (via PubEx)

Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0018 ◽

2017 ◽

Vol 108 (1) ◽

pp. 171-182 ◽

Cited By ~ 5

Author(s):

Jinhua Du ◽

Andy Way

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Class ◽

Word Embeddings ◽

Neural Machine Translation ◽

Parts Of Speech ◽

Translation Quality ◽

The Impact ◽

Japanese English ◽

Target Side

AbstractPre-reordering, a preprocessing to make the source-side word orders close to those of the target side, has been proven very helpful for statistical machine translation (SMT) in improving translation quality. However, is it the case in neural machine translation (NMT)? In this paper, we firstly investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate features for the pre-reordering model in SMT as input factors into NMT (factored NMT). The features, namely parts-of-speech (POS), word class and reordered index, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge for NMT. Pre-reordering experiments conducted on Japanese↔English and Chinese↔English show that pre-reordering the source-side data for NMT is redundant and NMT models trained on pre-reordered data deteriorate translation performance. However, factored NMT using SMT-based pre-reordering features on Japanese→English and Chinese→English is beneficial and can further improve by 4.48 and 5.89 relative BLEU points, respectively, compared to the baseline NMT system.

Get full-text (via PubEx)

Tackling neural machine translation in low-resource settings: a Portuguese case study

10.5753/stil.2021.17807 ◽

2021 ◽

Author(s):

Arthur T. Estrella ◽

João B. O. Souza Filho

Keyword(s):

Machine Translation ◽

Data Augmentation ◽

Word Embeddings ◽

Effective Solution ◽

Computational Power ◽

Limited Data ◽

Neural Machine Translation ◽

Low Resource

Neural machine translation (NMT) nowadays requires an increasing amount of data and computational power, so succeeding in this task with limited data and using a single GPU might be challenging. Strategies such as the use of pre-trained word embeddings, subword embeddings, and data augmentation solutions can potentially address some issues faced in low-resource experimental settings, but their impact on the quality of translations is unclear. This work evaluates some of these strategies on two low-resource experiments beyond just reporting BLEU: errors are categorized on the Portuguese-English pair with the help of a translator, considering semantic and syntactic aspects. The BPE subword approach has shown to be the most effective solution, allowing a BLEU increase of 59% p.p. compared to the standard Transformer.

Get full-text (via PubEx)