Statistical Machine Translation

Author(s):  
Lucia Specia

Statistical Machine Translation (SMT) is an approach to automatic text translation based on the use of statistical models and examples of translations. SMT is the current dominant research paradigm for machine translation and has been attracting significant commercial interest in recent years. In this chapter, the authors introduce the rationale behind SMT, describe the currently leading approach (phrase-based SMT), and present a number of emerging approaches (tree-based SMT, discriminative SMT). They also present popular metrics to evaluate the performance of SMT systems and discuss promising research directions in the field.
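
As a brief illustration of the statistical rationale, the classic noisy-channel formulation of SMT can be written as below. This is the standard textbook form and an assumption about what the chapter covers, not a quotation from it. Here f denotes a source sentence, e a candidate translation, P(f | e) a translation model estimated from example translations, and P(e) a language model of the target language.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)
```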

2014 ◽  
pp. 897-931


2020 ◽  
Vol 30 (02) ◽  
pp. 2050008
Author(s):  
Akihiro Katsuta ◽  
Kazuhide Yamamoto

In recent years, simplified Japanese has been attracting attention as a means of conveying information to foreigners. Automatic text simplification aims to reduce the complexity of vocabulary and expressions in a sentence while retaining its original meaning. This paper focuses on lexical simplification, aiming to compress the vocabulary. Since constructing or expanding a simplification corpus is very costly, we build a simplification model with Unsupervised Statistical Machine Translation, which does not require a parallel corpus for simplification. Based on a predetermined vocabulary, a pseudo-corpus for simplification is constructed from a web corpus, and the simplification model is learned from this pseudo-corpus. Only a vocabulary and a plain text corpus are needed to train the model. Moreover, we propose cleaning the phrase table with WordNet, which improves performance on the BLEU and SARI metrics. Suppressing distant paraphrases with WordNet makes it easier to select the correct paraphrase candidate.
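
The phrase-table cleaning idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the `synonyms` dictionary is a hypothetical stand-in for a WordNet lookup, the entries are toy English words, only single-word phrases are handled, and the only detail taken from real tooling is the `|||` field separator used in Moses-style phrase tables.

```python
def parse_entry(line):
    """Split a Moses-style phrase-table line into source, target, and scores."""
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(s) for s in fields[2].split()]
    return source, target, scores


def filter_phrase_table(lines, synonyms):
    """Keep entries whose target equals the source or is a listed synonym.

    `synonyms` maps a word to a set of acceptable paraphrases; it is a
    hypothetical stand-in for a WordNet-based lookup.
    """
    kept = []
    for line in lines:
        source, target, _ = parse_entry(line)
        if target == source or target in synonyms.get(source, set()):
            kept.append(line)
    return kept


# Toy data, invented for illustration only.
toy_table = [
    "purchase ||| buy ||| 0.6",
    "purchase ||| sell ||| 0.3",
    "purchase ||| purchase ||| 0.1",
]
toy_synonyms = {"purchase": {"buy", "acquire"}}

for entry in filter_phrase_table(toy_table, toy_synonyms):
    print(entry)
```

In this toy run the distant candidate "sell" is dropped, while "buy" and the identity entry are kept.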


Author(s):  
Andy Way

Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless, and this may come as some surprise to the PB-SMT community, most translators and, somewhat more surprisingly perhaps, many experienced MT protagonists find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was very conciliatory and inclusive. Over the next five years, things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring along the rest of the MT community with it, and in our view, this has largely pertained to this day. Having discussed these issues, we discuss three additional issues: the role of automatic MT evaluation metrics when describing PB-SMT systems; the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers who have prior experience in fields other than statistical models of translation; and the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.


2018 ◽  
Vol 5 (1) ◽  
pp. 37-45
Author(s):  
Darryl Yunus Sulistyan

Machine translation automatically translates sentences from one language into another. This paper tests the effectiveness of a newer model of machine translation, factored machine translation. We compare an unfactored system, our baseline, with the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The tool we use, Moses, is freely available to download and use. We found that the unfactored model scored over 24 BLEU and outperformed the factored models, which scored below 24 BLEU in all cases. In terms of the number of words translated, however, all of the factored models outperformed the unfactored model.
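
For readers unfamiliar with the BLEU score used in this comparison, the following is a minimal sketch of corpus-level BLEU. It assumes one reference per hypothesis, pre-tokenized input, and no smoothing; it is not the scoring script used in the paper, and the function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), single reference, no smoothing."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


hypothesis = "the cat sat on a mat".split()
reference = "the cat sat on the mat".split()
print(round(corpus_bleu([hypothesis], [reference]), 2))
```

A score such as "over 24" in the abstract is on this 0 to 100 scale, although published figures also depend on tokenization and the number of references.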


2009 ◽  
Vol 35 (10) ◽  
pp. 1317-1326
Author(s):  
Hong-Fei JIANG ◽  
Sheng LI ◽  
Min ZHANG ◽  
Tie-Jun ZHAO ◽  
Mu-Yun YANG

Author(s):  
Herry Sujaini

Extended Word Similarity Based (EWSB) Clustering is a word clustering algorithm based on word similarity values computed from a corpus. One benefit of clustering with this algorithm is that it can improve the output of a statistical machine translation system. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, where the algorithm was applied to Indonesian as the target language. This paper discusses the results of applying the EWSB algorithm to an Indonesian-to-Minang statistical machine translator, where the algorithm is applied to Minang as the target language. The results show that the EWSB algorithm is quite effective when Minang is the target language, improving translation accuracy by 6.36%.


2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

The objective of this paper is to analyze translation using an English-Dogri parallel corpus. Machine translation is the automatic translation of text from one language into another, and it is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.

