Lexical Chains meet Word Embeddings in Document-level Statistical Machine Translation

Abstract We integrate new mechanisms into a document-level machine translation decoder to improve the lexical consistency of document translations. First, we develop a document-level feature designed to score the lexical consistency of a translation. This feature, which applies to words that have been translated into different forms within the document, uses word embeddings to measure the adequacy of each word translation given its context. Second, we extend the decoder with a new stochastic mechanism that, at translation time, introduces changes into the translation aimed at improving its lexical consistency. We evaluate our system on English–Spanish document translation, and we conduct automatic and manual assessments of its quality. The automatic evaluation metrics, applied mainly at sentence level, do not reflect significant variations. In contrast, the manual evaluation shows that the system dealing with lexical consistency is preferred over both a standard sentence-level and a standard document-level phrase-based MT system.
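The idea of scoring competing word translations against their context with word embeddings can be illustrated with a minimal sketch. It is not the authors' feature: the toy three-dimensional vectors, the averaged-context representation, and the names `cosine`, `mean_vector`, `rank_candidates` and `TOY_EMBEDDINGS` are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(candidates, context_words, embeddings):
    """Rank candidate translations by similarity to the averaged context vector.

    Words missing from the embedding table are skipped; an empty context
    yields an empty ranking.
    """
    context_vectors = [embeddings[w] for w in context_words if w in embeddings]
    if not context_vectors:
        return []
    context = mean_vector(context_vectors)
    scored = [(cosine(embeddings[c], context), c) for c in candidates if c in embeddings]
    return sorted(scored, reverse=True)


# Hypothetical toy embeddings: two Spanish renderings of English "bank"
# plus two context words from a financial passage.
TOY_EMBEDDINGS = {
    "banco": [0.9, 0.1, 0.0],
    "orilla": [0.0, 0.2, 0.9],
    "dinero": [0.8, 0.3, 0.1],
    "préstamo": [0.9, 0.2, 0.0],
}

ranking = rank_candidates(["banco", "orilla"], ["dinero", "préstamo"], TOY_EMBEDDINGS)
best = ranking[0][1]  # the candidate whose vector is closest to the context
```

In a real system the vectors would come from trained embeddings, and the resulting score would be only one feature among those the decoder weighs.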

Making sense of neural machine translation

Translation Spaces ◽

10.1075/ts.6.2.06for ◽

2017 ◽

Vol 6 (2) ◽

pp. 291-309 ◽

Cited By ~ 11

Author(s):

Mikel L. Forcada

Keyword(s):

Machine Translation ◽

New Technology ◽

Statistical Machine Translation ◽

Software Requirements ◽

Word Embeddings ◽

Neural Machine Translation ◽

Training Time ◽

Making Sense ◽

Translation Systems ◽

New Machine

Abstract The last few years have witnessed a surge of interest in a new machine translation paradigm: neural machine translation (NMT). Neural machine translation is starting to displace its corpus-based predecessor, statistical machine translation (SMT). In this paper, I introduce NMT and explain in detail, without the mathematical complexity, how neural machine translation systems work, how they are trained, and how they differ from SMT systems. The paper tries to decipher NMT jargon such as “distributed representations”, “deep learning”, “word embeddings”, “vectors”, “layers”, “weights”, “encoder”, “decoder”, and “attention”, and builds upon these concepts, so that individual translators and professionals working for the translation industry, as well as students and academics in translation studies, can make sense of this new technology and know what to expect from it. The paper also discusses how NMT output differs from SMT output, and the impact on the translation industry of the hardware and software requirements of NMT, both at training time and at run time.

Human evaluation of three machine translation systems: from quality to attitudes by professional translators

Vigo International Journal of Applied Linguistics ◽

10.35869/vial.v0i18.3366 ◽

2021 ◽

pp. 123-148

Author(s):

Anna Fernández Torné ◽

Anna Matamala

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Evaluation Process ◽

Translation System ◽

Neural Machine Translation ◽

System A ◽

Human Evaluation ◽

Machine Translation System ◽

Translation Systems ◽

Document Level

This article aims to compare three machine translation systems with a focus on human evaluation. The systems under analysis are a domain-adapted statistical machine translation system, a domain-adapted neural machine translation system and a generic machine translation system. The comparison is carried out on translation from Spanish into German with industrial documentation of machine tool components and processes. The focus is on the human evaluation of the machine translation output, specifically on: fluency, adequacy and ranking at the segment level; fluency, adequacy, need for post-editing, ease of post-editing, and mental effort required in post-editing at the document level; productivity (post-editing speed and post-editing effort) and attitudes. Emphasis is placed on human factors in the evaluation process.

Providing Morphological Information for SMT Using Neural Networks

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0026 ◽

2017 ◽

Vol 108 (1) ◽

pp. 271-282 ◽

Cited By ~ 1

Author(s):

Peyman Passban ◽

Qun Liu ◽

Andy Way

Keyword(s):

Neural Networks ◽

Machine Translation ◽

Statistical Machine Translation ◽

Language Model ◽

Language Modeling ◽

Word Embeddings ◽

Surface Form ◽

Complex Word ◽

Complex Words

Abstract Treating morphologically complex words (MCWs) as atomic units in translation does not yield desirable results. Such words are complicated constituents with meaningful subunits. A complex word in a morphologically rich language (MRL) can correspond to several words or even a full sentence in a simpler language, which means the surface form of complex words should be accompanied by auxiliary morphological information in order to provide a precise translation and a better alignment. In this paper we follow this idea and propose two different methods to convey such information for statistical machine translation (SMT) models. In the first model we enrich factored SMT engines by introducing a new morphological factor which relies on subword-aware word embeddings. In the second model we focus on the language-modeling component. We explore a subword-level neural language model (NLM) to capture sequence-, word- and subword-level dependencies. Our NLM is able to approximate better scores for conditional word probabilities, so the decoder generates more fluent translations. We studied two languages, Farsi and German, in our experiments and observed significant improvements for both of them.

Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0018 ◽

2017 ◽

Vol 108 (1) ◽

pp. 171-182 ◽

Cited By ~ 5

Author(s):

Jinhua Du ◽

Andy Way

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Class ◽

Word Embeddings ◽

Neural Machine Translation ◽

Parts Of Speech ◽

Translation Quality ◽

The Impact ◽

Japanese English ◽

Target Side

Abstract Pre-reordering, a preprocessing step that brings the source-side word order close to that of the target side, has proven very helpful for improving translation quality in statistical machine translation (SMT). However, is this also the case in neural machine translation (NMT)? In this paper, we first investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate features for the pre-reordering model in SMT as input factors into NMT (factored NMT). The features, namely parts-of-speech (POS), word class and reordered index, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge for NMT. Pre-reordering experiments conducted on Japanese↔English and Chinese↔English show that pre-reordering the source-side data for NMT is redundant and that NMT models trained on pre-reordered data deteriorate translation performance. However, factored NMT using SMT-based pre-reordering features on Japanese→English and Chinese→English is beneficial and can further improve by 4.48 and 5.89 relative BLEU points, respectively, compared to the baseline NMT system.
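The factored input described in this abstract, where feature vectors are concatenated to the word embedding, can be sketched as follows. This is only an illustration under stated assumptions: the lookup tables, their dimensions, the romanized token `neko`, the class label `c17`, and the function name `factored_input` are hypothetical and do not come from the paper.

```python
def factored_input(word, pos, word_class, reorder_index, tables):
    """Concatenate a word embedding with its POS, word-class and reordered-index vectors."""
    return (
        tables["word"][word]
        + tables["pos"][pos]
        + tables["class"][word_class]
        + tables["index"][reorder_index]
    )


# Hypothetical lookup tables; a trained system would learn these vectors.
TABLES = {
    "word": {"neko": [0.2, -0.1, 0.5, 0.3]},  # 4-dimensional word embedding
    "pos": {"NOUN": [1.0, 0.0]},              # 2-dimensional POS vector
    "class": {"c17": [0.3, 0.7]},             # 2-dimensional word-class vector
    "index": {2: [0.5]},                      # 1-dimensional reordered-index vector
}

vec = factored_input("neko", "NOUN", "c17", 2, TABLES)
# The encoder input has 4 + 2 + 2 + 1 = 9 dimensions.
```

Concatenation keeps each factor in its own slice of the input vector, so the network can learn how much weight to give each source of information.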

Novel Document Level Features for Statistical Machine Translation

10.18653/v1/w15-2520 ◽

2015 ◽

Author(s):

Rong Zhang ◽

Abraham Ittycheriah

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Document Level

Using Word Embeddings for Improving Statistical Machine Translation of Phrasal Verbs

10.18653/v1/w16-1808 ◽

2016 ◽

Cited By ~ 1

Author(s):

Kostadin Cholakov ◽

Valia Kordoni

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Embeddings ◽

Phrasal Verbs

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine translation automatically translates sentences from one language into another. This paper aims to test the effectiveness of a newer model of machine translation, factored machine translation. We compare an unfactored baseline system with the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The toolkit we use is MOSES, which is free to download and use. We found that the unfactored model scored over 24 BLEU and outperformed the factored models, which scored below 24 BLEU in all cases. In terms of words being translated, however, all of the factored models outperformed the unfactored model.

Proceedings of the Workshop on Statistical Machine Translation - StatMT '06

10.3115/1654650 ◽

2006 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation

Proceedings of the Second Workshop on Statistical Machine Translation - StatMT '07

10.3115/1626355 ◽

2007 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation
