A hybrid word alignment approach to build bilingual lexicons for English-Arabic machine translation

Does GIZA++ Make Search Errors?

Computational Linguistics, Vol 36 (3), pp. 295-302, 2010. DOI: 10.1162/coli_a_00008

Cited By ~ 2

Author(s): Sujith Ravi, Kevin Knight

Keyword(s): Machine Translation, Statistical Machine Translation, Word Alignment, Alignment Algorithm

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney, 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are suboptimal according to a trained model.
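To make "search error" concrete, here is a minimal sketch under toy assumptions; it is not GIZA++ code and not the model GIZA++ actually optimizes. A hypothetical score combines Model 1 style lexical probabilities t(f | e) with an illustrative fertility penalty. Hill-climbing starts from the Model 1 Viterbi alignment and explores move and swap neighbors, the neighborhood described by Brown et al. (1993); exhaustive search then gives the exact optimum. A search error is any case where the exact optimum scores strictly higher than the hill-climbing result. The table `t`, `FLOOR`, `fert_penalty`, and all function names are assumptions made for illustration.

```python
import itertools
import math

NULL = "NULL"  # source position 0 is the empty word, as in the IBM models
FLOOR = 1e-9   # toy assumption: probability given to unseen (f, e) pairs


def log_score(a, src, tgt, t, fert_penalty):
    """Toy log-score of alignment a, where a[j] is the source index linked to tgt[j].

    Sums log t(f | e) over links, then subtracts fert_penalty for every extra
    target word attached to the same non-NULL source word (a crude, hypothetical
    stand-in for the fertility model of the higher IBM models).
    """
    s = sum(math.log(t.get((f, src[i]), FLOOR)) for f, i in zip(tgt, a))
    fertility = [0] * len(src)
    for i in a:
        fertility[i] += 1
    return s - fert_penalty * sum(max(0, phi - 1) for phi in fertility[1:])


def model1_viterbi(src, tgt, t):
    """Model 1 Viterbi alignment: each target word independently picks its best source word."""
    return tuple(
        max(range(len(src)), key=lambda i: t.get((f, src[i]), FLOOR)) for f in tgt
    )


def neighbors(a, num_src):
    """Alignments one move (relink one target word) or one swap (exchange two links) away."""
    for j in range(len(a)):
        for i in range(num_src):
            if i != a[j]:
                yield a[:j] + (i,) + a[j + 1:]
    for j1, j2 in itertools.combinations(range(len(a)), 2):
        if a[j1] != a[j2]:
            b = list(a)
            b[j1], b[j2] = b[j2], b[j1]
            yield tuple(b)


def hill_climb(src, tgt, t, fert_penalty):
    """Steepest-ascent search from the Model 1 Viterbi alignment; stops at a local optimum."""
    current = model1_viterbi(src, tgt, t)
    best = log_score(current, src, tgt, t, fert_penalty)
    while True:
        scored = [(log_score(b, src, tgt, t, fert_penalty), b)
                  for b in neighbors(current, len(src))]
        if not scored:
            return current, best
        s, b = max(scored)
        if s <= best:
            return current, best
        current, best = b, s


def exhaustive_viterbi(src, tgt, t, fert_penalty):
    """Exact argmax over all len(src) ** len(tgt) alignments; feasible only for tiny inputs."""
    best = max(itertools.product(range(len(src)), repeat=len(tgt)),
               key=lambda a: log_score(a, src, tgt, t, fert_penalty))
    return best, log_score(best, src, tgt, t, fert_penalty)


def has_search_error(src, tgt, t, fert_penalty, tol=1e-9):
    """True if the exact optimum scores strictly higher than the hill-climbing result."""
    _, hc_score = hill_climb(src, tgt, t, fert_penalty)
    _, exact_score = exhaustive_viterbi(src, tgt, t, fert_penalty)
    return exact_score > hc_score + tol


if __name__ == "__main__":
    src = [NULL, "house", "the"]
    tgt = ["la", "maison"]
    t = {("la", "the"): 0.7, ("maison", "house"): 0.8,
         ("la", "house"): 0.1, ("maison", "the"): 0.1}
    print(model1_viterbi(src, tgt, t))                   # (2, 1)
    print(has_search_error(src, tgt, t, fert_penalty=2.0))  # False
```

On this tiny example the local search already finds the optimum. Exhaustive search is exponential in sentence length, so a check of this kind only scales to short sentences or to methods that compute the exact optimum more cleverly.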


Refining Kazakh Word Alignment Using Simulation Modeling Methods for Statistical Machine Translation

Natural Language Processing and Chinese Computing - Lecture Notes in Computer Science, pp. 421-427, 2015. DOI: 10.1007/978-3-319-25207-0_38

Author(s): Amandyk Kartbayev

Keyword(s): Machine Translation, Simulation Modeling, Statistical Machine Translation, Word Alignment, Modeling Methods


Enhancing Machine Translation by Integrating Linguistic Knowledge in the Word Alignment Module

2020 International Conference on Intelligent Systems and Computer Vision (ISCV), 2020. DOI: 10.1109/iscv49265.2020.9204328

Author(s): Safae Berrichi, Azzeddine Mazroui

Keyword(s): Machine Translation, Linguistic Knowledge, Word Alignment


Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation

IEICE Transactions on Information and Systems, Vol E96.D (7), pp. 1536-1543, 2013. DOI: 10.1587/transinf.e96.d.1536

Author(s): Zezhong Li, Hideto Ikeda, Junichi Fukumoto

Keyword(s): Machine Translation, Statistical Machine Translation, Word Alignment


CASMACAT: An Open Source Workbench for Advanced Computer Aided Translation

Prague Bulletin of Mathematical Linguistics, Vol 100 (1), pp. 101-112, 2013. DOI: 10.2478/pralin-2013-0016

Cited By ~ 19

Author(s): Vicent Alabau, Ragnar Bonk, Christian Buck, Michael Carl, Francisco Casacuberta, ...

Keyword(s): Open Source, Machine Translation, Word Alignment, Computer Aided, Advanced Computer

We describe an open source workbench that offers advanced computer aided translation (CAT) functionality: post-editing of machine translation (MT) output, interactive translation prediction (ITP), visualization of word alignment, extensive logging with replay mode, and integration with eye trackers and e-pens.


Parallel Treebanking Spanish-Quechua

Linguistic Issues in Language Technology, Vol 7, 2012. DOI: 10.33011/lilt.v7i.1285

Author(s): Annette Rios, Anne Göhring, Martin Volk

Keyword(s): Machine Translation, Statistical Machine Translation, Word Segmentation, Word Alignment, Alignment Quality, Important Prerequisite, Bilingual Lexicon, Preliminary Work, First Impression, Agglutinative Language

Parallel treebanking is greatly facilitated by automatic word alignment. We are building a trilingual treebank for German, Spanish, and Quechua. We ran several alignment experiments on parallel Spanish-Quechua texts, measured the alignment quality, and compared the results with the figures we obtained by aligning a comparable corpus of Spanish-German texts. This preliminary work has shown us the best word segmentation to use for the agglutinative language Quechua with respect to alignment. We also gained a first impression of how well Quechua can be aligned to Spanish, an important prerequisite for bilingual lexicon extraction, parallel treebanking, or statistical machine translation.
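The abstract does not name the metric used to measure alignment quality. A common choice in this literature is the alignment error rate (AER) of Och and Ney (2003), computed against a gold standard of sure links S and possible links P (with S a subset of P): AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where A is the system alignment. The sketch below assumes that metric; it is not necessarily the measure the authors used, and the function name and toy links are hypothetical.

```python
from typing import Dict, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)


def alignment_scores(predicted: Set[Link], sure: Set[Link],
                     possible: Set[Link]) -> Dict[str, float]:
    """Precision, recall and AER against sure (S) and possible (P) gold links.

    Uses AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), treating S as a subset of P.
    """
    possible = possible | sure  # every sure link also counts as a possible link
    a_s = len(predicted & sure)
    a_p = len(predicted & possible)
    precision = a_p / len(predicted) if predicted else 0.0
    recall = a_s / len(sure) if sure else 0.0
    denom = len(predicted) + len(sure)
    aer = 1.0 - (a_s + a_p) / denom if denom else 0.0  # empty vs. empty: report 0.0
    return {"precision": precision, "recall": recall, "aer": aer}


# Made-up links for illustration only, not data from the paper.
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 1)}
system = {(0, 0), (1, 2)}
print(alignment_scores(system, gold_sure, gold_possible))
# {'precision': 0.5, 'recall': 0.5, 'aer': 0.5}
```

Because possible links only ever help precision, a system is not penalized for leaving an ambiguous (possible-only) link out, which is why AER is usually reported alongside precision and recall rather than instead of them.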


Improving statistical word alignment with a rule-based machine translation system

2004. DOI: 10.3115/1220355.1220360

Cited By ~ 1

Author(s): Wu Hua, Wang Haifeng

Keyword(s): Machine Translation, Translation System, Word Alignment, Rule Based, Machine Translation System


Extracting parallel phrases from comparable data for machine translation

Natural Language Engineering, Vol 22 (4), pp. 549-573, 2016. DOI: 10.1017/s1351324916000139

Cited By ~ 3

Author(s): Sanjika Hewavitharana, Stephan Vogel

Keyword(s): Machine Translation, Language Processing, Statistical Machine Translation, Word Alignment, Data Set, Comparable Corpora, Alignment Algorithms, Extraction Algorithm, Phrase Alignment, Translation Systems

Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other natural language processing applications. In this paper, we address the task of detecting parallel phrase pairs embedded in comparable sentence pairs. We present a novel phrase alignment approach that is designed to align only parallel sections, bypassing non-parallel sections of the sentence. We compare the proposed approach with two other alignment methods: (1) the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment, and (2) a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate the accuracy of these approaches using a manually aligned data set and show that the proposed approach outperforms the other two. Finally, we demonstrate the effectiveness of the extracted phrase pairs by using them in Arabic–English and Urdu–English translation systems, which yielded improvements of up to 1.2 BLEU over the baseline. The main contributions of this paper are twofold: (1) novel phrase alignment algorithms to extract parallel phrase pairs from comparable sentences, and (2) an evaluation of the utility of the extracted phrases when used directly in the MT decoder.
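The first baseline, standard phrase extraction from the Viterbi word alignment, keeps every phrase pair that is consistent with the alignment: at least one link falls inside the pair, and no word inside the pair is linked to a word outside it. The sketch below implements only that consistency rule; it omits the usual extension over unaligned boundary words, and the function name and `max_len` limit are illustrative choices rather than the authors' implementation.

```python
from typing import List, Set, Tuple


def extract_phrases(src: List[str], tgt: List[str],
                    links: Set[Tuple[int, int]],
                    max_len: int = 4) -> List[Tuple[str, str]]:
    """Phrase pairs consistent with a word alignment (no unaligned-word extension)."""
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            # Target positions linked to the source span [s1, s2].
            linked = [t for (s, t) in links if s1 <= s <= s2]
            if not linked:
                continue  # a phrase pair needs at least one alignment link
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] may be linked outside [s1, s2].
            if any(t1 <= t <= t2 and not (s1 <= s <= s2) for (s, t) in links):
                continue
            pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


src = ["the", "green", "house"]
tgt = ["la", "maison", "verte"]
links = {(0, 0), (1, 2), (2, 1)}
for pair in extract_phrases(src, tgt, links):
    print(pair)
```

With the reordered links above, the extractor returns ("the", "la"), ("the green house", "la maison verte"), ("green", "verte"), ("green house", "maison verte"), and ("house", "maison"). The span "the green" yields nothing, because its target span "la maison verte" contains "maison", which is linked to "house" outside the source span. In comparable (rather than parallel) sentence pairs, such Viterbi links into non-parallel material are exactly what makes this baseline fragile.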


SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation

Security and Intelligent Information Systems - Lecture Notes in Computer Science, pp. 379-390, 2012. DOI: 10.1007/978-3-642-25261-7_30

Cited By ~ 6

Author(s): Marcin Junczys-Dowmunt, Arkadiusz Szał

Keyword(s): Machine Translation, Statistical Machine Translation, Word Alignment, Word Alignment Models


Corpus Augmentation for Neural Machine Translation with Chinese-Japanese Parallel Corpora

Applied Sciences, Vol 9 (10), pp. 2036, 2019. DOI: 10.3390/app9102036

Author(s): Jinyi Zhang, Tadahiro Matsumoto

Keyword(s): Machine Translation, Scientific Paper, Training Data, Word Alignment, Sentence Pair, Neural Machine Translation, Parallel Corpora, Translation Quality, Parallel Data, Source Sentence

The translation quality of Neural Machine Translation (NMT) systems depends strongly on the size of the training data. However, sufficient amounts of parallel data are not available for many language pairs. This paper presents a corpus augmentation method with two variations: one applicable to all language pairs, and the other specific to the Chinese-Japanese language pair. The method uses both the source and target sentences of an existing parallel corpus and generates multiple pseudo-parallel sentence pairs from a long parallel sentence pair containing punctuation marks, as follows: (1) split the sentence pair into parallel partial sentences; (2) back-translate the target partial sentences; and (3) replace each partial sentence in the source sentence with the back-translated target partial sentence to generate pseudo-source sentences. The word alignment information used to determine the split points is modified with “shared Chinese character rates” in segments of the sentence pairs. Experimental results on Japanese-Chinese and Chinese-Japanese translation with ASPEC-JC (Asian Scientific Paper Excerpt Corpus, Japanese-Chinese) show that the method substantially improves translation performance. We also supply code (see Supplementary Materials) that reproduces the proposed method.
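The abstract says word alignment determines where a long sentence pair can be split into parallel partial sentences, but it does not give the exact rule. One natural reading, assumed here and not taken from the paper, is that a pair of punctuation positions is a valid split point when no alignment link crosses it. The sketch below implements only that check; the "shared Chinese character rate" adjustment is not reproduced, and `SPLIT_PUNCT`, `valid_split_points`, and `split_pair` are hypothetical names.

```python
from typing import List, Set, Tuple

# Hypothetical set of punctuation tokens at which a sentence pair may be split.
SPLIT_PUNCT = {"，", "、", ","}


def valid_split_points(src: List[str], tgt: List[str],
                       links: Set[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (i, j) where src[i] and tgt[j] are split punctuation and no link crosses the split.

    A link (s, t) is allowed only if it lies entirely before the split
    (s <= i and t <= j) or entirely after it (s > i and t > j).
    """
    points = []
    for i, sw in enumerate(src):
        if sw not in SPLIT_PUNCT:
            continue
        for j, tw in enumerate(tgt):
            if tw in SPLIT_PUNCT and all((s <= i) == (t <= j) for (s, t) in links):
                points.append((i, j))
    return points


def split_pair(src: List[str], tgt: List[str], i: int, j: int):
    """Cut both sides just after the chosen punctuation tokens."""
    return (src[:i + 1], tgt[:j + 1]), (src[i + 1:], tgt[j + 1:])


src = ["s0", "s1", "，", "s3"]  # placeholder Chinese tokens
tgt = ["t0", "、", "t2", "t3"]  # placeholder Japanese tokens
links = {(0, 0), (1, 0), (2, 1), (3, 2), (3, 3)}
print(valid_split_points(src, tgt, links))  # [(2, 1)]
print(split_pair(src, tgt, 2, 1))
```

Adding a single crossing link such as (0, 3) removes the only candidate, so a noisy alignment can block an otherwise sensible split; that sensitivity is one plausible motivation for adjusting the alignment with character-overlap information, as the abstract describes.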
