Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation

Christoph Tillmann; Hermann Ney

doi:10.1162/089120103321337458

Computational Linguistics ◽

10.1162/089120103321337458 ◽

2003 ◽

Vol 29 (1) ◽

pp. 97-133 ◽

Cited By ~ 50

Author(s):

Christoph Tillmann ◽

Hermann Ney

Keyword(s):

Dynamic Programming ◽

Machine Translation ◽

Search Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Search Procedure ◽

Beam Search ◽

Translation Model ◽

Novel Technique ◽

The Traveling Salesman Problem

In this article, we describe an efficient beam search algorithm for statistical machine translation based on dynamic programming (DP). The search algorithm uses the translation model presented in Brown et al. (1993). Starting from a DP-based solution to the traveling-salesman problem, we present a novel technique to restrict the possible word reorderings between source and target language in order to achieve an efficient search algorithm. Word reordering restrictions especially useful for the translation direction German to English are presented. The restrictions are generalized, and a set of four parameters to control the word reordering is introduced, which can then easily be adapted to new translation directions. The beam search procedure has been successfully tested on the Verbmobil task (German to English, 8,000-word vocabulary) and on the Canadian Hansards task (French to English, 100,000-word vocabulary). For the medium-sized Verbmobil task, a sentence can be translated in a few seconds, only a small number of search errors occur, and there is no performance degradation as measured by the word error criterion used in this article.
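The abstract names a DP-based solution to the traveling-salesman problem as its starting point. As a rough illustration only, the sketch below implements the textbook Held-Karp recurrence, in which a state is a pair (set of visited cities, last city). It is not the article's translation search: the function name `held_karp` and the distance matrix are assumptions made for this example, and the analogy to translation (visited cities loosely playing the role of covered source positions) is only suggestive.

```python
from itertools import combinations


def held_karp(dist):
    """Return the cost of the cheapest tour that starts and ends at city 0.

    dist[i][j] is the cost of travelling from city i to city j.
    """
    n = len(dist)
    if n < 2:
        return 0  # a single city needs no travel

    # best[(visited, last)] is the cheapest cost of a path that starts at
    # city 0, visits exactly the cities in the bitmask `visited`
    # (city 0 excluded), and ends at city `last`.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}

    # Grow the visited set one city at a time.
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            visited = sum(1 << j for j in subset)
            for last in subset:
                prev_visited = visited & ~(1 << last)
                best[(visited, last)] = min(
                    best[(prev_visited, k)] + dist[k][last]
                    for k in subset
                    if k != last
                )

    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))
```

The table holds on the order of n * 2^n states, which is why the exact recurrence is only practical for small n and why a search built on it would need restrictions and pruning of the kind the abstract describes.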

The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics ◽

10.1162/0891201042544884 ◽

2004 ◽

Vol 30 (4) ◽

pp. 417-449 ◽

Cited By ~ 212

Author(s):

Franz Josef Och ◽

Hermann Ney

Keyword(s):

Machine Translation ◽

Word Order ◽

Search Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Linear Modeling ◽

Translation Model ◽

Log Linear ◽

Translation Systems ◽

English Canadian

A phrase-based statistical machine translation approach, the alignment template approach, is described. This translation approach allows for general many-to-many relations between words. As a result, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often-used source-channel approach. This makes the model easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation, it yields statistically significantly better NIST scores than all competing research and commercial translation systems.
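To make the log-linear idea concrete, here is a minimal sketch, assuming a candidate translation is scored by a weighted sum of feature values. The names `log_linear_score`, `best_candidate`, `lm`, and `tm` are hypothetical and chosen for this example; they are not taken from the paper. With exactly two features, the language-model log-probability and the translation-model log-probability, and both weights set to 1, the score reduces to the source-channel product in log space.

```python
import math


def log_linear_score(features, weights):
    """Weighted sum of feature values for one candidate translation."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Pick the candidate with the highest log-linear score.

    The normalisation term of the log-linear model is the same for every
    candidate, so it can be ignored when taking the argmax.
    """
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))


# Illustrative numbers only: "lm" stands for log p(e), "tm" for log p(f | e).
candidates = [
    {"text": "A", "features": {"lm": math.log(0.2), "tm": math.log(0.5)}},
    {"text": "B", "features": {"lm": math.log(0.4), "tm": math.log(0.3)}},
]

source_channel = {"lm": 1.0, "tm": 1.0}  # special case: log(p(e) * p(f | e))
reweighted = {"lm": 0.1, "tm": 1.0}      # language model down-weighted
```

With the `source_channel` weights, candidate B wins (0.4 * 0.3 exceeds 0.2 * 0.5); with the `reweighted` weights, candidate A wins. Adding a further feature only requires a new key in `features` and `weights`, which is the sense in which such a model is easy to extend.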

Sulis: An Open Source Transfer Decoder for Deep Syntactic Statistical Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-010-0005-7 ◽

2010 ◽

Vol 93 (1) ◽

pp. 17-26 ◽

Cited By ~ 1

Author(s):

Yvette Graham

Keyword(s):

Open Source ◽

Linear Combination ◽

Machine Translation ◽

Statistical Machine Translation ◽

Language Model ◽

Beam Search ◽

Translation Model ◽

Transfer Rules ◽

Log Linear

In this paper, we describe an open source transfer decoder for Deep Syntactic Transfer-Based Statistical Machine Translation. Transfer decoding involves the application of transfer rules to a source-language (SL) structure. The N-best target-language (TL) structures are found via a beam search of TL hypothesis structures, which are ranked via a log-linear combination of feature scores, such as translation model and dependency-based language model.

Query Rewriting Using Monolingual Statistical Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00010 ◽

2010 ◽

Vol 36 (3) ◽

pp. 569-582 ◽

Cited By ~ 23

Author(s):

Stefan Riezler ◽

Yi Liu

Keyword(s):

Machine Translation ◽

Query Expansion ◽

Web Search ◽

Query Language ◽

State Of The Art ◽

Statistical Machine Translation ◽

Target Language ◽

Translation Model ◽

User Query ◽

Query Logs

Long queries often suffer from low recall in Web search due to conjunctive term matching. The chances of matching words in relevant documents can be increased by rewriting query terms into new terms with similar statistical properties. We present a comparison of approaches that deploy user query logs to learn rewrites of query terms into terms from the document space. We show that the best results are achieved by adopting the perspective of bridging the “lexical chasm” between queries and documents by translating from a source language of user queries into a target language of Web documents. We train a state-of-the-art statistical machine translation model on query-snippet pairs from user query logs, and extract expansion terms from the query rewrites produced by the monolingual translation system. We show in an extrinsic evaluation in a real-world Web search task that the combination of a query-to-snippet translation model with a query language model achieves improved contextual query expansion compared to a state-of-the-art query expansion model that is trained on the same query log data.

Source-Side Discontinuous Phrases for Machine Translation: A Comparative Study on Phrase Extraction and Search

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2013-0002 ◽

2013 ◽

Vol 99 (1) ◽

pp. 17-38

Author(s):

Matthias Huck ◽

Erik Scharwächter ◽

Hermann Ney

Keyword(s):

Machine Translation ◽

Large Scale ◽

Search Algorithm ◽

Statistical Machine Translation ◽

Empirical Evaluation ◽

Training Data ◽

Beam Search ◽

Phrase Extraction ◽

System Configurations ◽

Translation Systems

Standard phrase-based statistical machine translation systems generate translations based on an inventory of continuous bilingual phrases. In this work, we extend a phrase-based decoder with the ability to make use of phrases that are discontinuous in the source part. Our dynamic programming beam search algorithm supports separate pruning of coverage hypotheses per cardinality and of lexical hypotheses per coverage, as well as coverage constraints that impose restrictions on the possible reorderings. In addition to investigating these aspects, which are related to the decoding procedure, we also concentrate our attention on the question of how to obtain source-side discontinuous phrases from parallel training data. Two approaches (hierarchical and discontinuous extraction) are presented and compared. On a large-scale Chinese→English translation task, we conduct a thorough empirical evaluation in order to study a number of system configurations with source-side discontinuous phrases, and to compare them to setups which employ continuous phrases only.
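The pruning of coverage hypotheses per cardinality mentioned above can be pictured with a simplified sketch. It assumes each hypothesis is a (coverage set, score) pair with higher scores being better, groups hypotheses by the number of covered source positions, and applies threshold pruning followed by histogram pruning inside each group. The function name `prune_by_cardinality` and the parameters `beam_size` and `threshold` are assumptions for this example; the second pruning level described in the abstract (lexical hypotheses per coverage) is not modelled here.

```python
from collections import defaultdict


def prune_by_cardinality(hypotheses, beam_size, threshold):
    """Prune hypotheses separately for each coverage cardinality.

    hypotheses: iterable of (coverage, score) pairs, where coverage is a
    frozenset of covered source positions and a higher score is better.
    """
    groups = defaultdict(list)
    for coverage, score in hypotheses:
        groups[len(coverage)].append((coverage, score))

    kept = []
    for cardinality in sorted(groups):
        group = sorted(groups[cardinality], key=lambda h: h[1], reverse=True)
        best_score = group[0][1]
        # Threshold pruning: drop hypotheses too far below the group's best.
        survivors = [h for h in group if h[1] >= best_score - threshold]
        # Histogram pruning: keep at most beam_size hypotheses per group.
        kept.extend(survivors[:beam_size])
    return kept
```

Grouping by cardinality means hypotheses compete only against others that have translated the same number of source words, which keeps short, cheap partial translations from crowding out longer ones.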

Analysis Accuracy of Similar Word Based Clustering (EWSB) Algorithm on Machine Translator Bahasa Indonesia-Minang

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v3i3.241 ◽

2018 ◽

Vol 3 (3) ◽

Author(s):

Herry Sujaini

Keyword(s):

Machine Translation ◽

Clustering Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Word Similarity ◽

Similar Word ◽

Word Clustering ◽

Translation Accuracy ◽

Bahasa Indonesia

Extended Word Similarity Based (EWSB) Clustering is a word clustering algorithm based on word similarity values computed from a corpus. One of the benefits of clustering with this algorithm is that it can improve the translation quality of a statistical machine translation system. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, where the algorithm was applied to Indonesian as the target language. This paper discusses the results of research using the EWSB algorithm on an Indonesian to Minang statistical machine translator, where the algorithm is applied to Minang as the target language. The results show that the EWSB algorithm is quite effective when Minang is the target language, and indicate that it can improve translation accuracy by 6.36%.

A Topic-Triggered Translation Model for Statistical Machine Translation

Chinese Journal of Electronics ◽

10.1049/cje.2016.10.007 ◽

2017 ◽

Vol 26 (1) ◽

pp. 65-72 ◽

Cited By ~ 2

Author(s):

Jinsong Su ◽

Zhihao Wang ◽

Qingqiang Wu ◽

Junfeng Yao ◽

Fei Long ◽

...

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation Model

Optimizing Tokenization Choice for Machine Translation across Multiple Target Languages

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0025 ◽

2017 ◽

Vol 108 (1) ◽

pp. 257-269 ◽

Cited By ~ 4

Author(s):

Nasser Zalmout ◽

Nizar Habash

Keyword(s):

Machine Translation ◽

Performance Enhancement ◽

Statistical Machine Translation ◽

Target Language ◽

Source Language ◽

Context Variable ◽

Significant Performance ◽

Morphologically Rich Languages ◽

Target Languages ◽

Language Text

Tokenization is very helpful for Statistical Machine Translation (SMT), especially when translating from morphologically rich languages. Typically, a single tokenization scheme is applied to the entire source-language text, regardless of the target language. In this paper, we evaluate the hypothesis that SMT performance may benefit from different tokenization schemes for different words within the same text, and also for different target languages. We apply this approach to Arabic as a source language, with five target languages of varying morphological complexity: English, French, Spanish, Russian and Chinese. Our results show that different target languages indeed require different source-language schemes, and that a context-variable tokenization scheme can outperform a context-constant scheme with a statistically significant performance enhancement of about 1.4 BLEU points.

A DP based search algorithm for statistical machine translation

10.3115/980691.980727 ◽

1998 ◽

Cited By ~ 1

Author(s):

S. Nießen ◽

S. Vogel ◽

H. Ney ◽

C. Tillmann

Keyword(s):

Machine Translation ◽

Search Algorithm ◽

Statistical Machine Translation

MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0016 ◽

2019 ◽

Vol 28 (3) ◽

pp. 447-453 ◽

Cited By ~ 5

Author(s):

Sainik Kumar Mahata ◽

Dipankar Das ◽

Sivaji Bandyopadhyay

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Language Model ◽

Target Language ◽

Data Sets ◽

Shared Task ◽

Automatic Translation ◽

External Data ◽

Statistical Mt

Machine translation (MT) is the automatic translation of text from a source language into a target language by a computer system. In the current paper, we propose an approach that uses recurrent neural networks (RNNs) on top of traditional statistical MT (SMT). We compare the performance of the SMT phrase table with that of the proposed RNN and in turn improve the quality of the MT output. This work was done as part of the shared task provided by MTIL2017. We constructed the traditional MT model using the Moses toolkit and additionally enriched the language model using external data sets. Thereafter, we ranked the phrase tables using an RNN encoder-decoder module originally created as part of the GroundHog project of the LISA lab.

Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models

Machine Translation: From Real Users to Research - Lecture Notes in Computer Science ◽

10.1007/978-3-540-30194-3_13 ◽

2004 ◽

pp. 115-124 ◽

Cited By ~ 78

Author(s):

Philipp Koehn

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Beam Search
