Statistical Machine Translation

Computational Linguistics ◽

10.4018/978-1-4666-6042-7.ch043 ◽

2014 ◽

pp. 897-931

Author(s):

Lucia Specia

Keyword(s):

Machine Translation ◽

Statistical Models ◽

Statistical Machine Translation ◽

Research Paradigm ◽

Research Directions ◽

Commercial Interest ◽

Automatic Text

Statistical Machine Translation (SMT) is an approach to automatic text translation based on the use of statistical models and examples of translations. SMT is the current dominant research paradigm for machine translation and has been attracting significant commercial interest in recent years. In this chapter, the authors introduce the rationale behind SMT, describe the currently leading approach (phrase-based SMT), and present a number of emerging approaches (tree-based SMT, discriminative SMT). They also present popular metrics to evaluate the performance of SMT systems and discuss promising research directions in the field.
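
The rationale mentioned in this abstract is often summarized by the noisy-channel formulation found in the early SMT literature. The equation below is that standard textbook form, offered as an assumption about what "rationale" covers and not as the chapter's own notation. Here f denotes a source-language sentence and e a candidate translation; both symbols are introduced only for this illustration.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

In this reading, P(f | e) is a translation model estimated from examples of translations, and P(e) is a language model estimated from target-language text. The denominator P(f) from Bayes' rule is dropped because it does not depend on e.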

Lexical Simplification by Unsupervised Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500083 ◽

2020 ◽

Vol 30 (02) ◽

pp. 2050008

Author(s):

Akihiro Katsuta ◽

Kazuhide Yamamoto

Keyword(s):

Information Transmission ◽

Unsupervised Learning ◽

Machine Translation ◽

Statistical Machine Translation ◽

Text Corpus ◽

Parallel Corpus ◽

Plain Text ◽

Original Meaning ◽

Text Simplification ◽

Automatic Text

In recent years, simple Japanese has been attracting attention as a means of conveying information to foreigners. Automatic text simplification aims to reduce the complexity of vocabulary and expressions in a sentence while retaining its original meaning. This paper aims at compressing vocabulary, focusing on lexical simplification. Since the construction or expansion of a simplification corpus is very costly, we construct a simplification model with Unsupervised Statistical Machine Translation, which does not require a parallel corpus for simplification. Based on a predetermined vocabulary, a pseudo-corpus for simplification is constructed from a web corpus, and the simplification model is trained on this pseudo-corpus. We only need a vocabulary and a plain text corpus to train the simplification model. Moreover, we propose to clean the phrase table using WordNet, which improves performance on the BLEU and SARI metrics. Suppressing distant paraphrases with WordNet makes it easier to select the correct paraphrase candidate.
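
The phrase-table cleaning step can be pictured with a minimal sketch. The abstract does not state the exact filtering criterion, so the rule below (keep a pair only if the target is the source word itself or one of its listed relatives) is an assumption. The `related_words` dictionary is a hypothetical stand-in for a WordNet lookup, and the English toy entries and scores are invented for illustration only.

```python
def clean_phrase_table(phrase_table, related_words):
    """Keep only pairs whose target is the source itself or a listed relative.

    phrase_table: list of (source, target, score) tuples (hypothetical layout).
    related_words: dict mapping a word to a set of related words; this stands
    in for a WordNet lookup and is an assumption of this sketch.
    """
    cleaned = []
    for source, target, score in phrase_table:
        allowed = related_words.get(source, set()) | {source}
        if target in allowed:
            cleaned.append((source, target, score))
    return cleaned


# Hypothetical toy data, not taken from the paper.
related_words = {
    "purchase": {"buy", "acquire"},
    "commence": {"begin", "start"},
}
phrase_table = [
    ("purchase", "buy", 0.6),
    ("purchase", "sell", 0.3),    # distant paraphrase, dropped
    ("commence", "start", 0.7),
    ("commence", "finish", 0.1),  # distant paraphrase, dropped
]

cleaned = clean_phrase_table(phrase_table, related_words)
```

With fewer distant candidates left in the table, the remaining entries for each source word are closer in meaning, which is the effect the abstract attributes to the WordNet-based cleaning.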

A Critique of Statistical Machine Translation

Linguistica Antverpiensia, New Series – Themes in Translation Studies ◽

10.52034/lanstts.v8i.243 ◽

2021 ◽

Vol 8 ◽

Author(s):

Andy Way

Keyword(s):

Machine Translation ◽

Statistical Models ◽

Prior Experience ◽

Statistical Machine Translation ◽

Rule Based ◽

Basic Model ◽

Mt Evaluation ◽

The Relationship ◽

Do So

Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless (and this may come as some surprise to the PB-SMT community), most translators and, somewhat more surprisingly perhaps, many experienced MT protagonists find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was very conciliatory and inclusive. Over the next five years, things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring the rest of the MT community along with it, and in our view this has largely remained the case to this day. We then turn to three additional issues: the role of automatic MT evaluation metrics when describing PB-SMT systems; the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers who have prior experience in fields other than statistical models of translation; and the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine translation is the automatic translation of sentences from one language into another. This paper tests the effectiveness of a new model of machine translation, factored machine translation. We compare the unfactored system, which serves as our baseline, with the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The tool we use is MOSES, which is free to download and use. We found, however, that the unfactored model scored over 24 in BLEU and outperformed the factored model, which scored below 24 in BLEU in all cases. In terms of the number of words translated, however, all of the factored models outperformed the unfactored model.
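
To make the BLEU comparison concrete, here is a minimal sketch of the metric for a single sentence pair: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It is a simplification and an assumption about the setup. Reported scores such as "over 24" are normally corpus-level BLEU on a 0 to 100 scale, computed by an evaluation script, whereas this unsmoothed version works on one sentence, uses one reference, and returns a value in [0, 1]. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU for one sentence pair, in [0, 1]."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example sentences.
reference = "the cat sat on the mat"
score_exact = sentence_bleu("the cat sat on the mat", reference)
score_close = sentence_bleu("the cat sat on a mat", reference)
score_none = sentence_bleu("dogs bark loudly at night", reference)
```

Because BLEU only counts n-gram overlap with a reference, a system can translate more words and still score lower, which is consistent with the two different rankings the abstract reports.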

Proceedings of the Workshop on Statistical Machine Translation - StatMT '06

10.3115/1654650 ◽

2006 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation

Proceedings of the Second Workshop on Statistical Machine Translation - StatMT '07

10.3115/1626355 ◽

2007 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation

Improve Statistical Machine Translation with Context-Sensitive Bilingual Semantic Embedding Model

10.3115/v1/d14-1015 ◽

2014 ◽

Cited By ~ 3

Author(s):

Haiyang Wu ◽

Daxiang Dong ◽

Xiaoguang Hu ◽

Dianhai Yu ◽

Wei He ◽

...

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Context Sensitive ◽

Semantic Embedding

Synchronous Tree Sequence Substitution Grammar for Statistical Machine Translation

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2009.01317 ◽

2009 ◽

Vol 35 (10) ◽

pp. 1317-1326

Author(s):

Hong-Fei JIANG ◽

Sheng LI ◽

Min ZHANG ◽

Tie-Jun ZHAO ◽

Mu-Yun YANG

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Sequence Substitution

Analysis Accuracy of Similar Word Based Clustering (EWSB) Algorithm on Machine Translator Bahasa Indonesia-Minang

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v3i3.241 ◽

2018 ◽

Vol 3 (3) ◽

Author(s):

Herry Sujaini

Keyword(s):

Machine Translation ◽

Clustering Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Word Similarity ◽

Similar Word ◽

Word Clustering ◽

Translation Accuracy ◽

Bahasa Indonesia

Extended Word Similarity Based (EWSB) Clustering is a word clustering algorithm based on word similarity values computed from a corpus. One of the benefits of clustering with this algorithm is that it can improve the output of a statistical machine translation system. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, where the algorithm was applied to the Indonesian language as the target language. This paper discusses the results of research using the EWSB algorithm on an Indonesian-to-Minang statistical machine translator, where the algorithm is applied to the Minang language as the target language. The results show that the EWSB algorithm is quite effective when the Minang language is the target language, and indicate that it can improve translation accuracy by 6.36%.

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective of this paper is to analyze translation using an English-Dogri parallel corpus. Machine translation is the translation from one language into another language. Machine translation is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
