BIA: a Discriminative Phrase Alignment Toolkit

2012 ◽  
Vol 97 (1) ◽  
pp. 43-53
Author(s):  
Patrik Lambert ◽  
Rafael Banchs

In most statistical machine translation systems, bilingual segments are extracted via word alignment. However, word alignment is performed independently of the requirements of the machine translation task. Furthermore, although phrase-based translation models replaced word-based translation models nearly ten years ago, word-based models are still widely used for word alignment. In this paper we present the BIA (BIlingual Aligner) toolkit, a suite consisting of a discriminative phrase-based word alignment decoder based on linear alignment models, along with training and tuning tools. In the training phase, relative link probabilities are calculated from an initial alignment. The model weights may be tuned directly according to machine translation metrics. We give implementation details and report results of experiments conducted on the Spanish-English Europarl task (with three corpus sizes), on the Chinese-English FBIS task, and on the Chinese-English BTEC task. The BLEU score obtained with BIA alignment is always as good as or better than that obtained with the initial alignment used to train the BIA models. In addition, in four of the five tasks, the BIA toolkit yields the best BLEU score among a collection of ten alignment systems. Finally, usage guidelines are presented.
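The abstract states that training computes relative link probabilities from an initial alignment but does not give the formula. The sketch below assumes one plausible, word-level reading (BIA itself is phrase-based): for each source/target word pair, the number of times the initial alignment links the two words divided by the number of times they co-occur in a sentence pair. The function name `link_probabilities` and the `(src_tokens, tgt_tokens, links)` input layout are hypothetical, not BIA's interface.

```python
from collections import Counter


def link_probabilities(corpus):
    """Relative link probability per word pair (hypothetical reading, not BIA code).

    corpus: iterable of (src_tokens, tgt_tokens, links), where links is a set of
    (i, j) pairs meaning src_tokens[i] is aligned to tgt_tokens[j] in the
    initial alignment.
    Returns {(src_word, tgt_word): linked_count / cooccurrence_count}
    for every pair that is linked at least once.
    """
    linked = Counter()  # times the word pair is linked in the initial alignment
    cooc = Counter()    # times the word pair co-occurs (once per position pair)
    for src, tgt, links in corpus:
        for s in src:
            for t in tgt:
                cooc[(s, t)] += 1
        for i, j in links:
            linked[(src[i], tgt[j])] += 1
    # Each link is one distinct position pair, so every ratio lies in (0, 1].
    return {pair: n / cooc[pair] for pair, n in linked.items()}


corpus = [
    (["la", "la"], ["the"], {(0, 0)}),  # only one of the two "la" tokens is linked
    (["la", "casa"], ["the", "house"], {(0, 0), (1, 1)}),
]
probs = link_probabilities(corpus)
# probs[("la", "the")] is 2/3: linked twice out of three co-occurrences.
```

Counting co-occurrences per position pair keeps the denominator comparable with the link count, so repeated words lower the probability instead of pushing it above 1.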

2016 ◽  
Vol 22 (4) ◽  
pp. 549-573
Author(s):  
SANJIKA HEWAVITHARANA ◽  
STEPHAN VOGEL

Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other natural language processing applications. In this paper, we address the task of detecting parallel phrase pairs embedded in comparable sentence pairs. We present a novel phrase alignment approach that is designed to align only the parallel sections of a sentence, bypassing the non-parallel sections. We compare the proposed approach with two other alignment methods: (1) the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment, and (2) a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate the accuracy of these approaches on a manually aligned data set and show that the proposed approach outperforms the other two. Finally, we demonstrate the effectiveness of the extracted phrase pairs by using them in Arabic–English and Urdu–English translation systems, yielding improvements of up to 1.2 BLEU over the baseline. The main contributions of this paper are twofold: (1) novel phrase alignment algorithms that extract parallel phrase pairs from comparable sentences, and (2) an evaluation of the utility of the extracted phrases, using them directly in the MT decoder.
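The first baseline above, "the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment", is commonly implemented with a consistency criterion: a source span and a target span form a phrase pair if they share at least one link and no link connects a word inside the pair to a word outside it. The sketch below illustrates that baseline only, not the paper's proposed approach; the function name `extract_phrases` and the `max_len` limit are assumptions, and the usual extension over unaligned boundary words is omitted for brevity.

```python
def extract_phrases(src, tgt, links, max_len=3):
    """Phrase pairs consistent with a word alignment (illustrative sketch).

    src, tgt: token lists; links: set of (i, j) with i indexing src, j indexing tgt.
    Unaligned boundary words are not extended here.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span src[i1..i2].
            tpos = [j for (i, j) in links if i1 <= i <= i2]
            if not tpos:
                continue  # a phrase pair must contain at least one link
            j1, j2 = min(tpos), max(tpos)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if a target word inside [j1, j2] links outside the source span.
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in links):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


src = ["la", "casa", "verde"]
tgt = ["the", "green", "house"]
links = {(0, 0), (1, 2), (2, 1)}
phrases = extract_phrases(src, tgt, links)
# "la casa" is rejected: "green", inside its target span, is linked to "verde" outside it.
```

Because every extracted pair must be consistent with the whole alignment, a single spurious link in a non-parallel region of a comparable sentence pair can block or distort nearby phrases, which is the weakness the abstract's proposed approach targets.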


2010 ◽  
Vol 36 (3) ◽  
pp. 295-302
Author(s):  
Sujith Ravi ◽  
Kevin Knight

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) have provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney, 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.
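The Viterbi alignment is the single alignment a that maximises P(a, f | e) under a trained model. For IBM Model 1 this maximisation decomposes over target positions and can be computed exactly, as the sketch below shows; the search-error question raised in the article concerns the higher IBM models, for which GIZA++ relies on approximate (hill-climbing) search. The function name `model1_viterbi`, the dictionary layout of the translation table t(f | e), and the toy probabilities are assumptions for illustration; this is not GIZA++ code.

```python
def model1_viterbi(e_words, f_words, t):
    """Viterbi alignment of f_words to e_words under IBM Model 1 (sketch).

    t maps (f_word, e_word) -> t(f | e); missing entries count as 0.
    Returns a list a with a[j] = i, where i = 0 denotes the NULL word and
    i >= 1 denotes e_words[i - 1].
    """
    e = ["NULL"] + list(e_words)
    alignment = []
    for f in f_words:
        # Under Model 1, P(a, f | e) is a constant times the product of
        # t(f_j | e_{a_j}), so each a_j can be maximised independently.
        # max() returns the first maximum, so ties go to the lowest position.
        best = max(range(len(e)), key=lambda i: t.get((f, e[i]), 0.0))
        alignment.append(best)
    return alignment


# Hypothetical toy translation table.
t = {
    ("la", "the"): 0.7, ("la", "house"): 0.1, ("la", "NULL"): 0.2,
    ("maison", "house"): 0.8, ("maison", "the"): 0.05,
}
a = model1_viterbi(["the", "house"], ["la", "maison"], t)
# a == [1, 2]: "la" -> "the", "maison" -> "house"
```

Models with fertility and distortion (such as IBM Models 3 and 4) couple the choices for different target positions, so this per-word argmax no longer applies and exact search becomes expensive.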


2018 ◽  
Author(s):  
Benjamin Marie ◽  
Rui Wang ◽  
Atsushi Fujita ◽  
Masao Utiyama ◽  
Eiichiro Sumita

2016 ◽  
Vol 5 (4) ◽  
pp. 51-66
Author(s):  
Krzysztof Wolk ◽  
Krzysztof P. Marasek

The quality of machine translation is rapidly evolving. Today one can find several machine translation systems on the web that provide reasonable translations, although these systems are not perfect, and their quality may decrease in specific domains. A recently proposed approach in this field is neural machine translation. It aims at building a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training both neural network-based and statistical translation systems. The main focus of our experiments is the implementation and comparison of a medical translator.


2012 ◽  
Vol 20 (2) ◽  
pp. 512-523
Author(s):  
Graham Neubig ◽  
Taro Watanabe ◽  
Eiichiro Sumita ◽  
Shinsuke Mori ◽  
Tatsuya Kawahara
