Refining Kazakh Word Alignment Using Simulation Modeling Methods for Statistical Machine Translation

Does GIZA++ Make Search Errors?

Computational Linguistics ◽

10.1162/coli_a_00008 ◽

2010 ◽

Vol 36 (3) ◽

pp. 295-302 ◽

Cited By ~ 2

Author(s):

Sujith Ravi ◽

Kevin Knight

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Alignment ◽

Alignment Algorithm

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) have provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.

Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation

IEICE Transactions on Information and Systems ◽

10.1587/transinf.e96.d.1536 ◽

2013 ◽

Vol E96.D (7) ◽

pp. 1536-1543

Author(s):

Zezhong LI ◽

Hideto IKEDA ◽

Junichi FUKUMOTO

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Alignment

Parallel Treebanking Spanish-Quechua

Linguistic Issues in Language Technology ◽

10.33011/lilt.v7i.1285 ◽

2012 ◽

Vol 7 ◽

Author(s):

Annette Rios ◽

Anne Göhring ◽

Martin Volk

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Segmentation ◽

Word Alignment ◽

Alignment Quality ◽

Important Prerequisite ◽

Bilingual Lexicon ◽

Preliminary Work ◽

First Impression ◽

Agglutinative Language

Parallel treebanking is greatly facilitated by automatic word alignment. We work on building a trilingual treebank for German, Spanish and Quechua. We ran different alignment experiments on parallel Spanish-Quechua texts, measured the alignment quality, and compared these results to the figures we obtained aligning a comparable corpus of Spanish-German texts. This preliminary work has shown us the best word segmentation to use for the agglutinative language Quechua with respect to alignment. We also acquired a first impression about how well Quechua can be aligned to Spanish, an important prerequisite for bilingual lexicon extraction, parallel treebanking or statistical machine translation.

Extracting parallel phrases from comparable data for machine translation

Natural Language Engineering ◽

10.1017/s1351324916000139 ◽

2016 ◽

Vol 22 (4) ◽

pp. 549-573 ◽

Cited By ~ 3

Author(s):

SANJIKA HEWAVITHARANA ◽

STEPHAN VOGEL

Keyword(s):

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Word Alignment ◽

Data Set ◽

Comparable Corpora ◽

Alignment Algorithms ◽

Extraction Algorithm ◽

Phrase Alignment ◽

Translation Systems

Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other natural language processing applications. In this paper, we address the task of detecting parallel phrase pairs embedded in comparable sentence pairs. We present a novel phrase alignment approach that is designed to align only the parallel sections, bypassing non-parallel sections of the sentence. We compare the proposed approach with two other alignment methods: (1) the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment, and (2) a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate the accuracy of these approaches using a manually aligned data set, and show that the proposed approach outperforms the other two. Finally, we demonstrate the effectiveness of the extracted phrase pairs by using them in Arabic–English and Urdu–English translation systems, which resulted in improvements of up to 1.2 BLEU over the baseline. The main contributions of this paper are two-fold: (1) novel phrase alignment algorithms to extract parallel phrase pairs from comparable sentences, and (2) an evaluation of the utility of the extracted phrases by using them directly in the MT decoder.

SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation

Security and Intelligent Information Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-25261-7_30 ◽

2012 ◽

pp. 379-390 ◽

Cited By ~ 6

Author(s):

Marcin Junczys-Dowmunt ◽

Arkadiusz Szał

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Alignment ◽

Word Alignment Models

Improving Statistical Machine Translation Using Bayesian Word Alignment and Gibbs Sampling

IEEE Transactions on Audio Speech and Language Processing ◽

10.1109/tasl.2013.2244087 ◽

2013 ◽

Vol 21 (5) ◽

pp. 1090-1101 ◽

Cited By ~ 4

Author(s):

Coşkun Mermer ◽

Murat Saraclar ◽

Ruhi Sarikaya

Keyword(s):

Machine Translation ◽

Gibbs Sampling ◽

Statistical Machine Translation ◽

Word Alignment

What types of word alignment improve statistical machine translation?

Machine Translation ◽

10.1007/s10590-012-9123-3 ◽

2012 ◽

Vol 26 (4) ◽

pp. 289-323 ◽

Cited By ~ 2

Author(s):

Patrik Lambert ◽

Simon Petitrenaud ◽

Yanjun Ma ◽

Andy Way

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Alignment

A weighted finite state transducer translation template model for statistical machine translation

Natural Language Engineering ◽

10.1017/s1351324905003815 ◽

2005 ◽

Vol 12 (1) ◽

pp. 35-75 ◽

Cited By ~ 21

Author(s):

SHANKAR KUMAR ◽

YONGGANG DENG ◽

WILLIAM BYRNE

Keyword(s):

Machine Translation ◽

Error Rate ◽

Channel Model ◽

Statistical Machine Translation ◽

Word Alignment ◽

Alignment Error ◽

Template Model ◽

Translation Template ◽

Finite State ◽

Finite State Transducer

We present a Weighted Finite State Transducer Translation Template Model for statistical machine translation. This is a source-channel model of translation inspired by the Alignment Template translation model. The model attempts to overcome the deficiencies of word-to-word translation models by considering phrases rather than words as units of translation. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard finite state machine operations involving these transducers. One of the benefits of using this framework is that it avoids the need to develop specialized search procedures, even for the generation of lattices or N-Best lists of bitext word alignments and translation hypotheses. We report and analyze bitext word alignment and translation performance on the Hansards French-English task and the FBIS Chinese-English task under the Alignment Error Rate, BLEU, NIST and Word Error-Rate metrics. These experiments identify the contribution of each of the model components to different aspects of alignment and translation performance. We finally discuss translation performance with large bitext training sets on the NIST 2004 Chinese-English and Arabic-English MT tasks.
