A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

Word reordering is one of the most difficult aspects of statistical machine translation (SMT) and an important factor in both its quality and its efficiency. Despite the vast amount of research published to date, the community's interest in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be driven mostly by empirical trials. To orient the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed both as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature on advanced reordering modeling. We then ask why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, drawing on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful for anticipating the reordering characteristics of a language pair and for selecting the SMT framework that best suits them.
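The abstract contrasts measuring the *amount* of reordering with understanding its *kinds*. One simple way to quantify the amount, shown here as an illustration rather than as the survey's own metric, is the fraction of word-alignment link pairs whose order is swapped between source and target (a normalized Kendall tau distance). The helper name `crossing_ratio` and the hand-made German-English alignment are hypothetical, and the sketch assumes one-to-one alignment links given as index pairs.

```python
from itertools import combinations

def crossing_ratio(alignment):
    """Fraction of alignment-link pairs whose relative order is swapped.

    `alignment` is a list of (source_index, target_index) links, assumed
    one-to-one. 0.0 means fully monotone; 1.0 means fully inverted.
    """
    pairs = list(combinations(alignment, 2))
    if not pairs:
        return 0.0
    # Two links cross when their source order and target order disagree.
    crossings = sum(
        1 for (s1, t1), (s2, t2) in pairs if (s1 - s2) * (t1 - t2) < 0
    )
    return crossings / len(pairs)

# "ich habe das Buch gelesen" -> "I have read the book" (hypothetical alignment)
de_en = [(0, 0), (1, 1), (2, 3), (3, 4), (4, 2)]
print(crossing_ratio(de_en))  # 0.2
```

The single number says how much reordering occurs (here, the sentence-final participle "gelesen" moving before its object) but not which kind, which is precisely the gap the survey argues must be filled with linguistic analysis.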

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine translation is the automatic translation of sentences from one language into another. This paper tests the effectiveness of a new model of machine translation: factored machine translation. We compare an unfactored baseline system with the factored model in terms of BLEU score, testing on the German-English language pair with the Europarl corpus. The toolkit we use, Moses, is freely available to download and use. We found that the unfactored model scored over 24 BLEU and outperformed the factored models, which scored below 24 BLEU in all cases. In terms of words translated, however, all of the factored models outperformed the unfactored model.
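The comparison above rests on BLEU. As a reminder of what the metric computes, here is a minimal, unsmoothed sentence-level sketch of the standard formulation: the geometric mean of clipped n-gram precisions (n = 1 to 4) times a brevity penalty. The function names are hypothetical. Scores such as "24" in the abstract are conventionally corpus-level (n-gram counts accumulated over the whole test set) and scaled by 100, so they correspond to 0.24 on the 0-1 scale used here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU on a 0-1 scale.

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty. Returns 0.0 if any precision is zero.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)

print(bleu("the cat sat on the mat".split(), ["the cat sat on the mat".split()]))  # 1.0
```

Without smoothing, a single missing 4-gram match drives a short sentence's score to zero, which is one reason BLEU is usually reported at the corpus level.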

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective of this paper is to analyze English-Dogri parallel corpus translation. Machine translation is the translation of text from one language into another, and it is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.

Phrase-Based Statistical Machine Translation for a Low-Density Language Pair

Advances in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13059-5_27 ◽

2010 ◽

pp. 273-277 ◽

Cited By ~ 1

Author(s):

Maxim Roy ◽

Fred Popowich

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Low Density ◽

Language Pair

Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair

Language Resources and Evaluation ◽

10.1007/s10579-011-9137-0 ◽

2011 ◽

Vol 45 (2) ◽

pp. 181-208 ◽

Cited By ~ 8

Author(s):

Mireia Farrús ◽

Marta R. Costa-jussà ◽

José B. Mariño ◽

Marc Poch ◽

Adolfo Hernández ◽

...

Keyword(s):

Error Analysis ◽

Machine Translation ◽

Statistical Machine Translation ◽

Spanish Language ◽

Language Pair

Evaluation of English–Slovak Neural and Statistical Machine Translation

Applied Sciences ◽

10.3390/app11072948 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2948

Author(s):

Lucia Benkova ◽

Dasa Munkova ◽

Ľubomír Benko ◽

Michal Munk

Keyword(s):

Machine Translation ◽

Statistical Approach ◽

Statistical Machine Translation ◽

Specific Domain ◽

Neural Network Approach ◽

Neural Machine Translation ◽

Translation Quality ◽

The Neural Network ◽

Language Pair

This study compares phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems for the English-Slovak language pair, using automatic metrics of translation quality. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would produce translations of better quality. An experiment was performed using residuals to compare the automatic accuracy scores (BLEU_n) of statistical machine translation with those of neural machine translation. The results confirmed the assumption that neural machine translation yields better quality regardless of the system used. There were statistically significant differences between SMT and NMT in favor of NMT on all BLEU_n scores. Neural machine translation achieved better quality when translating journalistic texts from English into Slovak, regardless of whether the system was trained on general texts, such as Google Translate, or on a specific domain, such as the European Commission's (EC's) tool.

Evaluating Indirect Strategies for Chinese-Spanish Statistical Machine Translation

Journal of Artificial Intelligence Research ◽

10.1613/jair.3786 ◽

2012 ◽

Vol 45 ◽

pp. 761-780 ◽

Cited By ~ 1

Author(s):

M. R. Costa-jussà ◽

C. A. Henríquez ◽

R. E. Banchs

Keyword(s):

Machine Translation ◽

Experimental Work ◽

State Of The Art ◽

Statistical Machine Translation ◽

Research Community ◽

Translation Strategy ◽

The World ◽

System Output ◽

Demographic Impact ◽

Language Pair

Although Chinese and Spanish are two of the most widely spoken languages in the world, little research has been done on machine translation for this language pair. This paper investigates the state of the art in Chinese-to-Spanish statistical machine translation (SMT), currently one of the most popular approaches to machine translation. For this purpose, we report details of the available parallel corpora: the Basic Traveller Expressions Corpus (BTEC), the Holy Bible, and the United Nations (UN) corpus. Additionally, we conduct experiments with the largest of these three corpora to explore alternative SMT strategies that use a pivot language. Three pivoting alternatives are considered: cascading, pseudo-corpus, and triangulation. As the pivot language, we use English, Arabic, or French. Results show that, for a phrase-based SMT system, English is the best pivot language between Chinese and Spanish. We propose a system output combination of the pivot strategies that outperforms the direct translation strategy. The main objective of this work is to motivate and involve the research community to work on this important language pair, given its demographic impact.

Statistical machine translation of subtitles for highly inflected language pair

Pattern Recognition Letters ◽

10.1016/j.patrec.2014.05.012 ◽

2014 ◽

Vol 46 ◽

pp. 96-103 ◽

Cited By ~ 4

Author(s):

Mirjam Sepesy Maučec ◽

Zdravko Kačič ◽

Darinka Verdonik

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Language Pair

A tree does not make a well-formed sentence: Improving syntactic string-to-tree statistical machine translation with more linguistic knowledge

Computer Speech & Language ◽

10.1016/j.csl.2014.09.002 ◽

2015 ◽

Vol 32 (1) ◽

pp. 27-45 ◽

Cited By ~ 3

Author(s):

Rico Sennrich ◽

Philip Williams ◽

Matthias Huck

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Linguistic Knowledge

Hindi Chhattisgarhi Machine Translation System Using Statistical Approach

Webology ◽

10.14704/web/v18si02/web18067 ◽

2021 ◽

Vol 18 (Special Issue 02) ◽

pp. 208-222

Author(s):

Vikas Pandey ◽

Dr.M.V. Padmavati ◽

Dr. Ramesh Kumar

Keyword(s):

Machine Translation ◽

Language Processing ◽

Statistical Approach ◽

Statistical Machine Translation ◽

Target Language ◽

Translation System ◽

Parallel Corpus ◽

Machine Translation System ◽

Unknown Words ◽

Language Pair

Machine translation is a subfield of Natural Language Processing (NLP) that translates a source language into a target language. In this paper, an attempt has been made to build a Hindi-Chhattisgarhi machine translation system based on a statistical approach. In the state of Chhattisgarh there is a long-awaited need for a Hindi-to-Chhattisgarhi machine translation system, especially for people who do not speak Chhattisgarhi. To develop the system, the open-source software Moses is used: a statistical machine translation system that automatically trains the translation model from a parallel corpus for the Hindi-Chhattisgarhi language pair. (A corpus is a collection of structured text used to study linguistic properties.) The system works on a parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences extracted from various domains such as stories, novels, textbooks, and newspapers. To overcome translation problems related to proper nouns and unknown words, a transliteration system is also embedded in it. The system was tested on 1,000 sentences to check their grammatical correctness, and an accuracy of 75% was achieved.

Improving statistical machine translation using shallow linguistic knowledge

Computer Speech & Language ◽

10.1016/j.csl.2006.06.007 ◽

2007 ◽

Vol 21 (2) ◽

pp. 350-372 ◽

Cited By ~ 5

Author(s):

Young-Sook Hwang ◽

Andrew Finch ◽

Yutaka Sasaki

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Linguistic Knowledge