An overview of the phrase-based statistical machine translation techniques

2012 ◽  
Vol 27 (4) ◽  
pp. 413-431 ◽  
Author(s):  
Marta Ruiz Costa-jussà

AbstractThis work provides a general overview of the statistical machine translation (SMT) scientific field, which is a subfield of machine translation (MT). Specifically, this paper focuses on one of the most popular SMT approaches, that is, the phrase-based system.The phrase-based translation units are typically extracted using statistical criteria, and they are weighted using different models. These models are log-linearly combined in the decoding, which is in charge of choosing the most probable translation. Significant quality improvements have been produced from original phrase-based SMT systems. Among others, the main challenges are reordering, domain adaptation and evaluation.

2017 ◽  
Vol 108 (1) ◽  
pp. 283-294 ◽  
Author(s):  
Álvaro Peris ◽  
Mara Chinea-Ríos ◽  
Francisco Casacuberta

AbstractCorpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of the domain adaptation field, aimed to extract those sentences from an out-of-domain corpus that are the most useful to translate a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, able to deal with monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality, compared to a state-of-the-art method (cross-entropy), requiring substantially less data. Moreover, the results obtained are coherent across different language pairs, demonstrating the robustness of our proposal.


Sign in / Sign up

Export Citation Format

Share Document