Word Alignment Quality in the IBM 2 Mixture Model

Parallel treebanking is greatly facilitated by automatic word alignment. We are building a trilingual treebank for German, Spanish and Quechua. We ran different alignment experiments on parallel Spanish-Quechua texts, measured the alignment quality, and compared these results to the figures we obtained when aligning a comparable corpus of Spanish-German texts. This preliminary work identified which word segmentation of the agglutinative language Quechua works best for alignment. We also gained a first impression of how well Quechua can be aligned to Spanish, an important prerequisite for bilingual lexicon extraction, parallel treebanking and statistical machine translation.

Improving word alignment quality using morpho-syntactic information

10.3115/1220355.1220400 ◽

2004 ◽

Cited By ~ 4

Author(s):

Maja Popović ◽

Hermann Ney

Keyword(s):

Word Alignment ◽

Alignment Quality ◽

Syntactic Information

Large-scale Word Alignment Using Soft Dependency Cohesion Constraints

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00228 ◽

2013 ◽

Vol 1 ◽

pp. 291-300 ◽

Cited By ~ 1

Author(s):

Zhiguo Wang ◽

Chengqing Zong

Keyword(s):

Large Scale ◽

Target Language ◽

Model Parameters ◽

Word Alignment ◽

Soft Constraint ◽

Alignment Quality ◽

Source Language ◽

Discriminative Models ◽

Translation Quality ◽

Gibbs Sampling Algorithm

Dependency cohesion refers to the observation that phrases dominated by disjoint dependency subtrees in the source language generally do not overlap in the target language. It has been verified to be a useful constraint for word alignment. However, previous work either treats this as a hard constraint or uses it as a feature in discriminative models, which is ineffective for large-scale tasks. In this paper, we take dependency cohesion as a soft constraint, and integrate it into a generative model for large-scale word alignment experiments. We also propose an approximate EM algorithm and a Gibbs sampling algorithm to estimate model parameters in an unsupervised manner. Experiments on large-scale Chinese-English translation tasks demonstrate that our model achieves improvements in both alignment quality and translation quality.
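Dependency cohesion can be checked mechanically for a single sentence pair. The Python sketch below counts cohesion violations: under each head, the head word and each of its modifier subtrees are disjoint source units, and a violation is any pair of them whose target spans overlap (head-modifier and modifier-modifier crossings). It only illustrates the constraint; it is not the authors' generative model or their soft-constraint probabilities, and the input encoding (a `heads` array with -1 for the root, links as `(source, target)` pairs) is our assumption.

```python
from itertools import combinations


def cohesion_violations(heads, alignment):
    """Count overlapping target spans among disjoint source units.

    heads[i] is the index of word i's head; -1 marks the root (assumed encoding).
    alignment is a set of (source_index, target_index) links.
    """
    children = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)

    def subtree(root):
        # Every source position dominated by `root`, including root itself.
        stack, nodes = [root], set()
        while stack:
            node = stack.pop()
            nodes.add(node)
            stack.extend(children.get(node, []))
        return nodes

    def target_span(nodes):
        # Smallest target interval covering all links of `nodes`; None if unaligned.
        tgt = [t for s, t in alignment if s in nodes]
        return (min(tgt), max(tgt)) if tgt else None

    violations = 0
    for head, kids in children.items():
        if head == -1:
            continue  # -1 is not a real word
        # The head word alone plus each modifier subtree: pairwise disjoint units.
        spans = [target_span({head})] + [target_span(subtree(k)) for k in kids]
        spans = [s for s in spans if s is not None]
        for (a0, a1), (b0, b1) in combinations(spans, 2):
            if a0 <= b1 and b0 <= a1:  # closed intervals intersect
                violations += 1
    return violations


if __name__ == "__main__":
    heads = [1, -1, 1]  # source "a b c" with "b" as root
    print(cohesion_violations(heads, {(0, 0), (1, 1), (2, 2)}))
    print(cohesion_violations(heads, {(0, 0), (0, 2), (1, 1), (2, 3)}))
```

The monotone alignment yields no violations. In the second alignment, linking `a` to target positions 0 and 2 makes its span enclose the target word of its head `b`, which counts as one violation. A soft-constraint model would penalize such configurations rather than forbid them.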

Discriminative Word Alignment by Linear Modeling

Computational Linguistics ◽

10.1162/coli_a_00001 ◽

2010 ◽

Vol 36 (3) ◽

pp. 303-339 ◽

Cited By ~ 16

Author(s):

Yang Liu ◽

Qun Liu ◽

Shouxun Lin

Keyword(s):

State Of The Art ◽

Statistical Machine Translation ◽

Generative Models ◽

Target Language ◽

Word Alignment ◽

Alignment Quality ◽

Linear Modeling ◽

Parallel Text ◽

Bilingual Corpora ◽

Translation Systems

Word alignment plays an important role in many NLP tasks as it indicates the correspondence between words in a parallel text. Although widely used to align large bilingual corpora, generative models are hard to extend to incorporate arbitrary useful linguistic information. This article presents a discriminative framework for word alignment based on a linear model. Within this framework, all knowledge sources are treated as feature functions, which depend on a source language sentence, a target language sentence, and the alignment between them. We describe a number of features that could produce symmetric alignments. Our model is easy to extend and can be optimized with respect to evaluation metrics directly. The model achieves state-of-the-art alignment quality on three word alignment shared tasks for five language pairs with varying divergence and richness of resources. We further show that our approach improves translation performance for various statistical machine translation systems.
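In a linear model of this kind, an alignment a between a source sentence f and a target sentence e is scored as score(f, e, a) = sum over m of lambda_m * h_m(f, e, a), where each h_m is a feature function and lambda_m its weight. The Python sketch below shows only that scoring rule. The three features, the toy lexical table and the weights are hypothetical, not those of the article, and the best alignment is picked by brute force over a few hand-written candidates rather than by the search procedure a real aligner would need.

```python
import math

# Hypothetical lexical translation probabilities t(e | f); invented for illustration.
LEX = {("das", "the"): 0.7, ("haus", "house"): 0.8, ("das", "house"): 0.05}


def h_lex(f, e, a):
    """Sum of log lexical probabilities over links (unknown pairs get a floor)."""
    return sum(math.log(LEX.get((f[i], e[j]), 1e-4)) for i, j in a)


def h_links(f, e, a):
    """Number of links; a positive weight rewards denser alignments."""
    return float(len(a))


def h_unaligned(f, e, a):
    """Number of source and target words left without any link."""
    src = {i for i, _ in a}
    tgt = {j for _, j in a}
    return float((len(f) - len(src)) + (len(e) - len(tgt)))


FEATURES = [h_lex, h_links, h_unaligned]


def score(f, e, a, weights):
    """Linear model: weighted sum of feature values."""
    return sum(w * h(f, e, a) for w, h in zip(weights, FEATURES))


def best_alignment(f, e, candidates, weights):
    """Brute-force argmax over an explicit candidate list."""
    return max(candidates, key=lambda a: score(f, e, a, weights))


if __name__ == "__main__":
    f, e = ["das", "haus"], ["the", "house"]
    candidates = [frozenset({(0, 0), (1, 1)}), frozenset({(0, 1), (1, 1)}), frozenset()]
    weights = [1.0, 0.5, -1.0]  # hypothetical; would normally be tuned
    print(sorted(best_alignment(f, e, candidates, weights)))
```

With these weights the monotone alignment wins: its lexical feature is highest and it leaves no word unaligned. Adding a knowledge source means adding one more function to `FEATURES` and one more weight, which is the extensibility the abstract emphasizes; the weights themselves would be tuned, for instance against an alignment metric.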

Improving Word Alignment Quality by Relearning Translation Models

Journal of Natural Language Processing ◽

10.5715/jnlp.12.2_175 ◽

2005 ◽

Vol 12 (2) ◽

pp. 175-188

Author(s):

Setsuo Yamada ◽

Masaaki Nagata ◽

Kenji Yamada

Keyword(s):

Word Alignment ◽

Alignment Quality

Measuring Word Alignment Quality for Statistical Machine Translation

Computational Linguistics ◽

10.1162/coli.2007.33.3.293 ◽

2007 ◽

Vol 33 (3) ◽

pp. 293-303 ◽

Cited By ~ 39

Author(s):

Alexander Fraser ◽

Daniel Marcu

Keyword(s):

Machine Translation ◽

Recent Literature ◽

Statistical Machine Translation ◽

Critical Role ◽

Word Alignment ◽

Alignment Error ◽

Alignment Quality ◽

Alignment Task ◽

State Of Affairs ◽

The Relationship

Automatic word alignment plays a critical role in statistical machine translation. Unfortunately, the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature, the alignment task has frequently been decoupled from the translation task and assumptions have been made about measuring alignment quality for machine translation which, it turns out, are not justified. In particular, none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. This paper explains this state of affairs and presents steps towards measuring alignment quality in a way which is predictive of statistical machine translation performance.
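AER is defined over gold annotations that distinguish sure links S from possible links P (with S a subset of P). For a predicted link set A: precision = |A ∩ P| / |A|, recall = |A ∩ S| / |S|, and AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The Python sketch below computes these, plus a weighted F-measure whose `alpha` parameter trades precision against recall, a knob that AER does not expose. The `alpha` value is an arbitrary default here; which weighting best predicts translation performance is an empirical question.

```python
def alignment_scores(A, S, P, alpha=0.5):
    """Return (precision, recall, AER, F_alpha) for predicted links A.

    A, S, P are collections of (source_index, target_index) links;
    S holds the sure gold links, P the possible ones.
    """
    A, S = set(A), set(S)
    P = set(P) | S  # sure links always count as possible
    if not A or not S:
        raise ValueError("need at least one predicted and one sure link")
    precision = len(A & P) / len(A)
    recall = len(A & S) / len(S)
    aer = 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    # Weighted harmonic mean: alpha near 1 stresses precision, near 0 recall.
    if precision == 0 or recall == 0:
        f_alpha = 0.0
    else:
        f_alpha = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    return precision, recall, aer, f_alpha


if __name__ == "__main__":
    sure = {(0, 0), (1, 1)}
    possible = sure | {(2, 1)}
    predicted = {(0, 0), (2, 1), (2, 2)}
    print(alignment_scores(predicted, sure, possible))
```

For this toy example the sketch gives precision 2/3, recall 1/2, AER 0.4 and a balanced F of 4/7 (about 0.571). Because AER rewards links in P that are not in S through the precision side only, two aligners with the same AER can differ noticeably in recall.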

Estimating Word Alignment Quality for SMT Reordering Tasks

10.3115/v1/w14-3334 ◽

2014 ◽

Author(s):

Sara Stymne ◽

Jörg Tiedemann ◽

Joakim Nivre

Keyword(s):

Word Alignment ◽

Alignment Quality

The association of Raynaud’s phenomenon/syndrome with scleroderma and Sjögren’s syndrome – a meta-analysis

VASA ◽

10.1024/0301-1526.37.s73.26 ◽

2008 ◽

Vol 37 (Supplement 73) ◽

pp. 26-32 ◽

Cited By ~ 1

Author(s):

Schlattmann ◽

Höhne ◽

Plümper ◽

Heidrich

Keyword(s):

Quantitative Analysis ◽

Sjögren’s Syndrome ◽

Mixture Model ◽

Meta Analysis ◽

Raynaud’s Syndrome

Background: To analyze the prevalence of Raynaud’s syndrome in diseases such as scleroderma and Sjögren’s syndrome, a meta-analysis of published data was performed. Methods: The PubMed database of the National Library of Medicine was searched for studies dealing with Raynaud’s syndrome and scleroderma, or Raynaud’s syndrome and Sjögren’s syndrome, respectively. The studies retrieved provided sufficient data to estimate the prevalence of Raynaud’s syndrome. The statistical analysis was based on methods for a fixed-effects meta-analysis and a finite mixture model for proportions. Results: For scleroderma, a pooled prevalence of 80.9% (95% CI 78% to 83%) was obtained. A mixture model analysis found four latent classes. We identified a class with a very low prevalence of 11%, with a weight of 0.15. On the other hand, there is a class with a very high prevalence of 96%. For the association with Sjögren’s syndrome, the pooled analysis yields a prevalence of Raynaud’s syndrome of 32% (95% CI 26.7% to 37.7%). A mixture model finds a solution with two latent classes: 38% of the studies show a prevalence of 18.8%, whereas 62% observe a prevalence of 38.3%. Conclusion: The studies reporting the prevalence of Raynaud’s syndrome in patients suffering from scleroderma or Sjögren’s syndrome vary strongly. The available data are insufficient to perform a proper quantitative analysis of the association of Raynaud’s phenomenon with scleroderma or Sjögren’s syndrome. Properly planned and reported epidemiological studies are needed in order to perform a thorough quantitative analysis of risk factors for Raynaud’s syndrome.
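The abstract relies on a finite mixture model for proportions, in which each study belongs to a latent class with its own prevalence and each class carries a weight (such as the class "weighted with 0.15"). As an illustration only, the Python sketch below fits a mixture of binomial distributions by expectation-maximization. The function name, starting values and study counts are hypothetical and invented here, not the PubMed data, and we do not know which estimation routine the authors actually used.

```python
import math


def binomial_mixture_em(x, n, p_init, iters=200):
    """EM for a finite mixture of binomials (hypothetical helper).

    x[k] cases out of n[k] patients in study k; p_init holds one starting
    prevalence per latent class. Returns (class weights, class prevalences).
    """
    J = len(p_init)
    p = [min(max(v, 1e-9), 1 - 1e-9) for v in p_init]
    w = [1.0 / J] * J
    for _ in range(iters):
        # E-step: posterior probability that study k belongs to class j.
        # The binomial coefficient is identical across classes and cancels.
        resp = []
        for xk, nk in zip(x, n):
            logs = [
                (math.log(w[j]) if w[j] > 0 else -math.inf)
                + xk * math.log(p[j]) + (nk - xk) * math.log(1.0 - p[j])
                for j in range(J)
            ]
            top = max(logs)  # subtract the max for numerical stability
            e = [math.exp(v - top) for v in logs]
            z = sum(e)
            resp.append([v / z for v in e])
        # M-step: class weights and responsibility-weighted pooled proportions.
        for j in range(J):
            rj = [r[j] for r in resp]
            w[j] = sum(rj) / len(x)
            denom = sum(r * nk for r, nk in zip(rj, n))
            if denom > 0:
                p[j] = sum(r * xk for r, xk in zip(rj, x)) / denom
                p[j] = min(max(p[j], 1e-9), 1 - 1e-9)  # keep logs finite
    return w, p


if __name__ == "__main__":
    x = [10, 12, 9, 40, 38, 42]  # invented case counts
    n = [50] * 6                 # invented study sizes
    weights, prevalences = binomial_mixture_em(x, n, p_init=[0.3, 0.7])
    print(weights, prevalences)
```

On these invented counts the two classes settle near prevalences of 0.21 and 0.80 with equal weights. In a real analysis the number of classes would itself be chosen by a model-selection criterion, and EM would be restarted from several starting values because it only finds a local optimum.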
