An overview of the phrase-based statistical machine translation techniques

Abstract: This work provides a general overview of statistical machine translation (SMT), a subfield of machine translation (MT). Specifically, the paper focuses on one of the most popular SMT approaches, the phrase-based system. Phrase-based translation units are typically extracted using statistical criteria and weighted using different models. These models are log-linearly combined in the decoder, which is in charge of choosing the most probable translation. Translation quality has improved significantly since the original phrase-based SMT systems. The main challenges include reordering, domain adaptation and evaluation.
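
The log-linear combination mentioned in the abstract can be illustrated with a minimal sketch. The feature names, weights and probabilities below are invented for illustration and do not come from the paper; a real decoder searches a much larger hypothesis space and tunes the weights on held-out data.

```python
import math

def log_linear_score(features, weights):
    """Return sum_i lambda_i * log h_i for one candidate translation."""
    return sum(weights[name] * math.log(value) for name, value in features.items())

def best_translation(candidates, weights):
    """Pick the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))

# Hypothetical model weights (lambda_i) and feature values (h_i).
weights = {"translation_model": 1.0, "language_model": 0.5}
candidates = [
    {"text": "the house is small",
     "features": {"translation_model": 0.4, "language_model": 0.3}},
    {"text": "small is the house",
     "features": {"translation_model": 0.4, "language_model": 0.05}},
]

best = best_translation(candidates, weights)
print(best["text"])
```

Here both candidates share the same translation-model probability, so the language-model feature decides between them.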

A survey of domain adaptation for statistical machine translation

Machine Translation ◽ 10.1007/s10590-018-9216-8 ◽ 2017 ◽ Vol 31 (4) ◽ pp. 187-224

Author(s): Hoang Cuong ◽ Khalil Sima’an

Keyword(s): Machine Translation ◽ Domain Adaptation ◽ Statistical Machine Translation

Domain adaptation for statistical machine translation with monolingual resources

10.3115/1626431.1626468 ◽ 2009 ◽ Cited By ~ 23

Author(s): Nicola Bertoldi ◽ Marcello Federico

Keyword(s): Machine Translation ◽ Domain Adaptation ◽ Statistical Machine Translation

Neural Networks Classifier for Data Selection in Statistical Machine Translation

Prague Bulletin of Mathematical Linguistics ◽ 10.1515/pralin-2017-0027 ◽ 2017 ◽ Vol 108 (1) ◽ pp. 283-294 ◽ Cited By ~ 1

Author(s): Álvaro Peris ◽ Mara Chinea-Ríos ◽ Francisco Casacuberta

Keyword(s): Neural Networks ◽ Machine Translation ◽ Domain Adaptation ◽ Statistical Machine Translation ◽ Data Selection ◽ Target Domain ◽ Translation Quality ◽ Bilingual Corpora ◽ Proper Estimation ◽ Adaptation Field

Abstract: Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of domain adaptation that aims to extract, from an out-of-domain corpus, the sentences that are most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that can deal with both monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results are consistent across different language pairs, demonstrating the robustness of our proposal.
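
As a rough illustration of the cross-entropy baseline named in the abstract (not the neural classifier the paper proposes), the sketch below ranks candidate sentences by the difference between their cross-entropy under an in-domain model and under an out-of-domain model. It assumes the baseline is a cross-entropy difference criterion; the add-one-smoothed unigram models and the toy sentences are simplifications invented for this example.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Count word frequencies; returns (counts, total number of tokens)."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())

def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy (in nats) under an add-one-smoothed unigram model."""
    counts, total = model
    words = sentence.split()
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)
    return -log_prob / len(words)

def rank_by_ce_difference(pool, in_domain, out_domain):
    """Sort pool sentences by H_in(s) - H_out(s); lower means more in-domain-like."""
    vocab = {w for s in in_domain + out_domain + pool for w in s.split()}
    m_in, m_out = train_unigram(in_domain), train_unigram(out_domain)
    v = len(vocab)
    return sorted(
        pool,
        key=lambda s: cross_entropy(s, m_in, v) - cross_entropy(s, m_out, v),
    )

# Toy data, invented for illustration only.
in_domain = ["the patient takes the medicine", "the doctor treats the patient"]
out_domain = ["the market fell today", "the bank raised rates today"]
pool = ["the bank fell", "the doctor takes medicine"]

ranked = rank_by_ce_difference(pool, in_domain, out_domain)
print(ranked[0])
```

In practice the top-ranked portion of the pool would be added to the training data; how large that portion should be is a tuning decision not covered here.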
