2012 ◽  
Vol 7 ◽  
Author(s):  
Annette Rios ◽  
Anne Göhring ◽  
Martin Volk

Parallel treebanking is greatly facilitated by automatic word alignment. We work on building a trilingual treebank for German, Spanish and Quechua. We ran different alignment experiments on parallel Spanish-Quechua texts, measured the alignment quality, and compared these results to the figures we obtained aligning a comparable corpus of Spanish-German texts. This preliminary work has shown us which word segmentation of the agglutinative language Quechua works best for alignment. We also gained a first impression of how well Quechua can be aligned to Spanish, an important prerequisite for bilingual lexicon extraction, parallel treebanking or statistical machine translation.


2013 ◽  
Vol 1 ◽  
pp. 291-300 ◽  
Author(s):  
Zhiguo Wang ◽  
Chengqing Zong

Dependency cohesion refers to the observation that phrases dominated by disjoint dependency subtrees in the source language generally do not overlap in the target language. It has been verified to be a useful constraint for word alignment. However, previous work either treats this as a hard constraint or uses it as a feature in discriminative models, which is ineffective for large-scale tasks. In this paper, we take dependency cohesion as a soft constraint, and integrate it into a generative model for large-scale word alignment experiments. We also propose an approximate EM algorithm and a Gibbs sampling algorithm to estimate model parameters in an unsupervised manner. Experiments on large-scale Chinese-English translation tasks demonstrate that our model achieves improvements in both alignment quality and translation quality.
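The cohesion observation in this abstract can be illustrated with a small check. The sketch below is not the soft-constraint generative model the paper proposes; it only tests the underlying observation as a hard yes/no condition. The representation is an assumption made for illustration: `heads[i]` is the index of the head of source word `i` (`-1` for the root), and `links` is a set of `(source_index, target_index)` pairs. All function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)


def subtree(heads: List[int], root: int) -> Set[int]:
    """Source positions dominated by `root`, including `root` itself."""
    nodes = {root}
    changed = True
    while changed:
        changed = False
        for child, head in enumerate(heads):
            if head in nodes and child not in nodes:
                nodes.add(child)
                changed = True
    return nodes


def target_span(nodes: Set[int], links: Set[Link]) -> Optional[Tuple[int, int]]:
    """Smallest target interval covering every word aligned to `nodes`."""
    positions = [j for i, j in links if i in nodes]
    return (min(positions), max(positions)) if positions else None


def cohesion_violations(heads: List[int], links: Set[Link]) -> List[Tuple[int, int]]:
    """Pairs of disjoint source subtrees whose target spans overlap."""
    subtrees = [subtree(heads, i) for i in range(len(heads))]
    violations = []
    for a in range(len(heads)):
        for b in range(a + 1, len(heads)):
            # In a tree, two subtrees are either nested or disjoint;
            # only disjoint pairs are subject to the cohesion check.
            if subtrees[a] & subtrees[b]:
                continue
            span_a = target_span(subtrees[a], links)
            span_b = target_span(subtrees[b], links)
            if span_a and span_b and span_a[0] <= span_b[1] and span_b[0] <= span_a[1]:
                violations.append((a, b))
    return violations


# Hypothetical 4-word source sentence: word 1 is the root,
# words 0 and 3 depend on it, and word 2 depends on word 3.
heads = [1, -1, 3, 1]
monotone = {(0, 0), (1, 1), (2, 2), (3, 3)}
crossing = {(0, 3), (1, 1), (2, 2), (3, 4)}
```

With the monotone links no pair is reported. With the crossing links, word 0 lands inside the target span of the subtree rooted at word 3, so the pair `(0, 3)` is reported. A soft constraint, as described in the abstract, would penalize such a pair rather than forbid it.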


2010 ◽  
Vol 36 (3) ◽  
pp. 303-339 ◽  
Author(s):  
Yang Liu ◽  
Qun Liu ◽  
Shouxun Lin

Word alignment plays an important role in many NLP tasks as it indicates the correspondence between words in a parallel text. Although widely used to align large bilingual corpora, generative models are hard to extend to incorporate arbitrary useful linguistic information. This article presents a discriminative framework for word alignment based on a linear model. Within this framework, all knowledge sources are treated as feature functions, which depend on a source language sentence, a target language sentence, and the alignment between them. We describe a number of features that could produce symmetric alignments. Our model is easy to extend and can be optimized with respect to evaluation metrics directly. The model achieves state-of-the-art alignment quality on three word alignment shared tasks for five language pairs with varying divergence and richness of resources. We further show that our approach improves translation performance for various statistical machine translation systems.
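The linear framework described in this abstract scores a candidate alignment as a weighted sum of feature functions, each of which sees the source sentence, the target sentence, and the alignment. A minimal sketch follows. The three features, their weights, and all names are invented for illustration and are not the features of the article; the weights are set by hand here rather than optimized against an evaluation metric.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Link = Tuple[int, int]          # (source position, target position)
Alignment = FrozenSet[Link]
Feature = Callable[[List[str], List[str], Alignment], float]


def link_count(src: List[str], tgt: List[str], a: Alignment) -> float:
    """Number of links in the alignment."""
    return float(len(a))


def diagonal_penalty(src: List[str], tgt: List[str], a: Alignment) -> float:
    """Negative total distance of the links from the diagonal."""
    return -sum(abs(i / len(src) - j / len(tgt)) for i, j in a)


def identical_tokens(src: List[str], tgt: List[str], a: Alignment) -> float:
    """Number of links that connect identical strings (names, numbers)."""
    return float(sum(src[i] == tgt[j] for i, j in a))


# Hypothetical feature set and hand-picked weights.
FEATURES: Dict[str, Feature] = {
    "link_count": link_count,
    "diagonal_penalty": diagonal_penalty,
    "identical_tokens": identical_tokens,
}
WEIGHTS: Dict[str, float] = {
    "link_count": 0.1,
    "diagonal_penalty": 1.0,
    "identical_tokens": 2.0,
}


def score(src: List[str], tgt: List[str], a: Alignment) -> float:
    """Linear model: weighted sum of feature values for one alignment."""
    return sum(WEIGHTS[name] * h(src, tgt, a) for name, h in FEATURES.items())


def best_alignment(src: List[str], tgt: List[str],
                   candidates: List[Alignment]) -> Alignment:
    """Pick the highest-scoring alignment from an explicit candidate list."""
    return max(candidates, key=lambda a: score(src, tgt, a))


src = ["Berlin", "ist", "alt"]
tgt = ["Berlin", "is", "old"]
diagonal = frozenset({(0, 0), (1, 1), (2, 2)})
reversed_links = frozenset({(0, 2), (1, 1), (2, 0)})
```

Adding a knowledge source in this setup means adding one entry to `FEATURES` and one weight, which is the extensibility the abstract contrasts with generative models. A real system would search over alignments rather than enumerate a candidate list.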


2005 ◽  
Vol 12 (2) ◽  
pp. 175-188 ◽
Author(s):  
SETSUO YAMADA ◽  
MASAAKI NAGATA ◽  
KENJI YAMADA

2007 ◽  
Vol 33 (3) ◽  
pp. 293-303 ◽  
Author(s):  
Alexander Fraser ◽  
Daniel Marcu

Automatic word alignment plays a critical role in statistical machine translation. Unfortunately, the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature, the alignment task has frequently been decoupled from the translation task and assumptions have been made about measuring alignment quality for machine translation which, it turns out, are not justified. In particular, none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. This paper explains this state of affairs and presents steps towards measuring alignment quality in a way which is predictive of statistical machine translation performance.
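For reference, alignment error rate is commonly defined against a gold standard with sure links S and possible links P (where S is a subset of P) as AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), with A the hypothesized links. The sketch below computes this standard definition; the function name and the example links are made up for illustration.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]  # (source position, target position)


def alignment_error_rate(hypothesis: Iterable[Link],
                         sure: Iterable[Link],
                         possible: Iterable[Link]) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # sure links also count as possible links
    if not a and not s:
        return 0.0  # nothing predicted and nothing required
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical gold standard and system output.
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 1)}
system_links = {(0, 0), (1, 1), (2, 2)}
aer = alignment_error_rate(system_links, sure_links, possible_links)
```

Here the system finds both sure links and adds one link that is neither sure nor possible, giving 1 - (2 + 2) / (3 + 2) = 0.2. Note that a hypothesized link in P but not in S still counts in the numerator, so the metric treats possible links leniently.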


2014 ◽  
Author(s):  
Sara Stymne ◽  
Jörg Tiedemann ◽  
Joakim Nivre

VASA ◽  
2008 ◽  
Vol 37 (Supplement 73) ◽  
pp. 26-32 ◽  
Author(s):  
Schlattmann ◽  
Höhne ◽  
Plümper ◽  
Heidrich

Background: To analyze the prevalence of Raynaud’s syndrome in diseases such as scleroderma and Sjögren’s syndrome, a meta-analysis of published data was performed. Methods: The PubMed database of the National Library of Medicine was searched for studies dealing with Raynaud’s syndrome and scleroderma or Raynaud’s syndrome and Sjögren’s syndrome, respectively. The studies found provided data sufficient to estimate the prevalence of Raynaud’s syndrome. The statistical analysis was based on methods for a fixed effects meta-analysis and a finite mixture model for proportions. Results: For scleroderma a pooled prevalence of 80.9% with a 95% CI of (0.78, 0.83) was obtained. A mixture model analysis found four latent classes. We identified a class with a very low prevalence of 11%, weighted with 0.15. On the other hand, there is a class with a very high prevalence of 96%. Analysing the association with Sjögren’s syndrome, the pooled analysis leads to a prevalence of Raynaud’s syndrome of 32%, 95% CI (26.7%, 37.7%). A mixture model finds a solution with two latent classes. Here, 38% of the studies show a prevalence of 18.8% whereas 62% observe a prevalence of 38.3%. Conclusion: There is strong variability among studies reporting the prevalence of Raynaud’s syndrome in patients suffering from scleroderma or Sjögren’s syndrome. The available data are insufficient to perform a proper quantitative analysis of the association of Raynaud’s phenomenon with scleroderma or Sjögren’s syndrome. Properly planned and reported epidemiological studies are needed in order to perform a thorough quantitative analysis of risk factors for Raynaud’s syndrome.
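One common way to compute a fixed effects pooled prevalence is inverse-variance weighting of the per-study proportions. The abstract does not say which estimator the authors used, so the sketch below is an assumption, and the study counts in it are invented purely to make the example runnable; they are not data from the paper.

```python
import math
from typing import List, Tuple


def pooled_prevalence(studies: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Fixed effects pooled proportion with a 95% Wald confidence interval.

    `studies` holds (cases, sample_size) pairs. Each study is weighted by
    the inverse of its binomial variance p * (1 - p) / n.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for cases, n in studies:
        p = cases / n
        if p in (0.0, 1.0):
            # The variance estimate is zero here, so the weight is undefined.
            raise ValueError("a proportion of 0 or 1 needs a continuity correction")
        weight = n / (p * (1.0 - p))
        weighted_sum += weight * p
        weight_total += weight
    pooled = weighted_sum / weight_total
    standard_error = math.sqrt(1.0 / weight_total)
    return pooled, pooled - 1.96 * standard_error, pooled + 1.96 * standard_error


# Invented example counts (cases, sample size); not taken from the study.
example_studies = [(80, 100), (45, 50), (150, 200)]
estimate, lower, upper = pooled_prevalence(example_studies)
```

A fixed effects estimate of this kind assumes that all studies measure one common prevalence. The strong between-study variability reported in the abstract is the reason the authors also fit a finite mixture model, which this sketch does not cover.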

