Joint Phrase Alignment and Extraction for Statistical Machine Translation

2012 ◽  
Vol 20 (2) ◽  
pp. 512-523
Author(s):  
Graham Neubig ◽  
Taro Watanabe ◽  
Eiichiro Sumita ◽  
Shinsuke Mori ◽  
Tatsuya Kawahara
2016 ◽  
Vol 22 (4) ◽  
pp. 549-573 ◽  
Author(s):  
SANJIKA HEWAVITHARANA ◽  
STEPHAN VOGEL

Abstract: Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other natural language processing applications. In this paper, we address the task of detecting parallel phrase pairs embedded in comparable sentence pairs. We present a novel phrase alignment approach designed to align only the parallel sections of a sentence, bypassing the non-parallel sections. We compare the proposed approach with two other alignment methods: (1) the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment, and (2) a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate the accuracy of these approaches on a manually aligned data set and show that the proposed approach outperforms the other two. Finally, we demonstrate the effectiveness of the extracted phrase pairs by using them in Arabic–English and Urdu–English translation systems, which yielded improvements of up to 1.2 BLEU over the baseline. The main contributions of this paper are two-fold: (1) novel phrase alignment algorithms that extract parallel phrase pairs from comparable sentences, and (2) an evaluation of the utility of the extracted phrases by using them directly in the MT decoder.
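The "standard phrase extraction algorithm" used as a baseline above can be sketched as follows: given a word-aligned sentence pair, extract every phrase pair whose alignment links stay inside the pair's bounding box. This is a minimal illustration of the general Och/Koehn-style consistency criterion, not the authors' implementation; the function name and the `max_len` parameter are illustrative assumptions, and the common extension that grows phrases over unaligned boundary words is omitted.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs (src_phrase, tgt_phrase) consistent with the
    word alignment: no link may cross the phrase-pair boundary, and each
    pair must cover at least one alignment link.

    alignment is a set of (source_index, target_index) links.
    """
    pairs = set()
    n = len(src)
    for i1 in range(n):
        for i2 in range(i1, min(n, i1 + max_len)):
            # target positions linked to the source span [i1, i2]
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue  # unaligned source span: no evidence for a pair
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # consistency check: no link from outside the source span may
            # point into the target span
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]),
                       " ".join(tgt[j1:j2 + 1])))
    return pairs
```

For example, with `src=["das", "haus"]`, `tgt=["the", "house"]`, and links `{(0, 0), (1, 1)}`, the extractor yields the two single-word pairs plus `("das haus", "the house")`.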


2012 ◽  
Vol 97 (1) ◽  
pp. 43-53
Author(s):  
Patrik Lambert ◽  
Rafael Banchs

BIA: a Discriminative Phrase Alignment Toolkit

In most statistical machine translation systems, bilingual segments are extracted via word alignment. However, word alignment is performed independently of the requirements of the machine translation task. Furthermore, although phrase-based translation models replaced word-based translation models nearly ten years ago, word-based models are still widely used for word alignment. In this paper we present the BIA (BIlingual Aligner) toolkit, a suite consisting of a discriminative phrase-based word alignment decoder based on linear alignment models, along with training and tuning tools. In the training phase, relative link probabilities are calculated from an initial alignment. The model weights may be tuned directly against machine translation metrics. We give implementation details and report results of experiments conducted on the Spanish-English Europarl task (with three corpus sizes), the Chinese-English FBIS task, and the Chinese-English BTEC task. The BLEU score obtained with BIA alignment is always as good as or better than the one obtained with the initial alignment used to train the BIA models. In addition, in four of the five tasks, the BIA toolkit yields the best BLEU score among a collection of ten alignment systems. Finally, usage guidelines are presented.
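The linear alignment models the abstract mentions can be illustrated schematically: each candidate link is scored as a weighted sum of feature values (e.g. a relative link probability estimated from the initial alignment), and the weights are what tuning adjusts against MT metrics. The feature names and the thresholding decision rule below are illustrative assumptions, not BIA's actual decoder.

```python
def score_link(features, weights):
    """Linear model: dot product of feature values and tuned weights."""
    return sum(weights[name] * value for name, value in features.items())

def align(candidate_links, weights, threshold=0.0):
    """Keep every candidate link whose linear score exceeds the threshold.

    candidate_links is a list of (i, j, features) tuples, where features
    is a dict of hypothetical feature values for the link (i, j).
    """
    return [(i, j) for (i, j, feats) in candidate_links
            if score_link(feats, weights) > threshold]
```

In this sketch, tuning would mean searching for the `weights` (and `threshold`) that maximize a downstream MT metric such as BLEU, rather than a word-alignment objective.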


2007 ◽  
Vol 38 (6) ◽  
pp. 70-79 ◽  
Author(s):  
Taro Watanabe ◽  
Kenji Imamura ◽  
Eiichiro Sumita ◽  
Hiroshi G. Okuno

2010 ◽  
Author(s):  
Sankaranarayanan Ananthakrishnan ◽  
Rohit Prasad ◽  
Prem Natarajan

2018 ◽  
Vol 5 (1) ◽  
pp. 37-45
Author(s):  
Darryl Yunus Sulistyan

Machine translation automatically translates sentences from one language into another. This paper tests the effectiveness of a newer approach to machine translation, factored machine translation. We compare the performance of an unfactored baseline system against the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The toolkit we use is MOSES, which is freely available. We find that the unfactored model scores above 24 BLEU and outperforms the factored model, which scores below 24 BLEU in all cases. In terms of the number of words translated, however, all of the factored models outperform the unfactored model.
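BLEU, the metric used in both of the comparisons above, can be sketched for a single sentence pair as the geometric mean of modified n-gram precisions multiplied by a brevity penalty. This is a simplified single-reference version for illustration; production scores (such as those MOSES reports) are computed at corpus level with clipped counts aggregated over all sentences.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # clipped counts: a candidate n-gram is credited at most as many
        # times as it occurs in the reference
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0  # zero precision at any order zeroes the product
        log_prec += math.log(overlap / total)
    # brevity penalty: punish candidates shorter than the reference
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_prec / max_n)
```

A perfect match scores 1.0 (often reported as 100, or "24" for 0.24, on the percentage scale used in the abstracts above), while any candidate with no 4-gram overlap against the reference scores 0 under this simplified version.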

