Combining Diverse Word-Alignment Symmetrizations Improves Dependency Tree Projection

Author(s): David Mareček
2015
Author(s): Hoang Cuong, Khalil Sima'an

2014
Author(s): Yin-Wen Chang, Alexander M. Rush, John DeNero, Michael Collins

Database · 2018 · Vol 2018
Author(s): Neha Warikoo, Yung-Chun Chang, Wen-Lian Hsu

2021 · Vol 11 (4) · pp. 1868
Author(s): Sari Dewi Budiwati, Al Hafiz Akbar Maulana Siagian, Tirana Noor Fatyanosa, Masayoshi Aritsugi

Phrase table combination in pivot approaches can be an effective method for dealing with low-resource language pairs. The common practice when generating phrase tables in pivot approaches is to use the standard symmetrization heuristic, i.e., grow-diag-final-and. Although some researchers have found that non-standard symmetrizations can improve bilingual evaluation understudy (BLEU) scores, they have not been commonly employed in pivot approaches. In this study, we propose a strategy that uses non-standard symmetrization of word alignments in phrase table combination. The appropriate symmetrization is selected based on the highest BLEU score in each direct translation of source–target, source–pivot, and pivot–target for Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id). Our experiments show that the proposed strategy outperforms direct translation in Kk–En with absolute improvements of 0.35 (an 11.3% relative improvement) and 0.22 (a 6.4% relative improvement) BLEU points for 3-gram and 5-gram, respectively. For 3-gram Ja–Id, the proposed strategy shows an absolute gain of up to 0.11 BLEU points (a 0.9% relative improvement) over direct translation. Our proposed strategy with a small phrase table obtains better BLEU scores than one with a large phrase table. The size of the target monolingual corpus and the feature-function weight of the language model (LM) can also reduce perplexity scores.
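As a rough illustration of the pivot idea this abstract builds on, the Python sketch below triangulates a source–pivot and a pivot–target phrase table into a source–target table by marginalizing over shared pivot phrases. The function name, toy phrase pairs, and probabilities are all invented; the paper's actual strategy additionally chooses a word-alignment symmetrization heuristic per translation direction by dev-set BLEU before the tables are built.

```python
# Minimal sketch of phrase-table triangulation for pivot translation.
# Tables map (src_phrase, tgt_phrase) -> translation probability; in the
# paper's setting each table would be extracted from word alignments
# symmetrized with whichever heuristic scored highest BLEU for that
# direction (grow-diag-final-and, union, intersection, ...).
from collections import defaultdict

def triangulate(src_pvt, pvt_tgt):
    """Combine source-pivot and pivot-target phrase tables into a
    source-target table: p(t|s) ~= sum_p p(t|p) * p(p|s)."""
    src_tgt = defaultdict(float)
    # Index the pivot-target table by pivot phrase for the join.
    by_pivot = defaultdict(list)
    for (p, t), prob in pvt_tgt.items():
        by_pivot[p].append((t, prob))
    for (s, p), prob_ps in src_pvt.items():
        for t, prob_tp in by_pivot.get(p, []):
            src_tgt[(s, t)] += prob_ps * prob_tp
    return dict(src_tgt)

# Toy example: a Kazakh phrase routed to English through a pivot phrase.
src_pvt = {("kk_phrase", "pivot_phrase"): 0.6}
pvt_tgt = {("pivot_phrase", "en_phrase"): 0.5}
print(triangulate(src_pvt, pvt_tgt))  # {('kk_phrase', 'en_phrase'): 0.3}
```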


Author(s): Peng Yin, Zhou Shu, Yingjun Xia, Tianmei Shen, Xiao Guan, ...

Author(s): Shumin Shi, Dan Luo, Xing Wu, Congjun Long, Heyan Huang

Dependency parsing is an important task for Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which is still extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is available, and such resources are currently obtained only by manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to perform constituent-to-dependency treebank conversion for Tibetan under these scarce conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the dependency treebank obtained by this preliminary conversion. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, exceeding the best results of existing conversion methods. The experimental results show that our method is effective in a low-resource setting: it not only addresses the scarcity of Tibetan dependency treebanks but also avoids needless manual annotation. The method embodies the regularity captured by strongly knowledge-guided linguistic analysis, which is of great significance for advancing research on Tibetan information processing.
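The abstract does not spell out the multi-level chunk-based conversion, so the sketch below shows only the generic head-percolation idea behind constituent-to-dependency conversion: recursively pick a head child per phrase and attach the other children's heads to it. The head-rule table and the toy tree are invented for illustration; this is the task in miniature, not the authors' method.

```python
# Hedged sketch of constituent-to-dependency conversion via head rules.
# A node is (label, [children]) internally and (word, index) at a leaf.

HEAD_RULES = {      # phrase label -> index of the head child (assumed;
    "S": -1,        # e.g., a verb-final clause is headed by its last child,
    "NP": -1,       # which loosely matches Tibetan's SOV order)
    "VP": -1,
}

def to_dependencies(tree, deps=None):
    """Return the head leaf of `tree`, filling `deps` with (dependent,
    head) arcs between the heads of sibling constituents."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, int):          # leaf: (word, index)
        return tree, deps
    heads = [to_dependencies(c, deps)[0] for c in children]
    head = heads[HEAD_RULES.get(label, -1)]
    for h in heads:
        if h is not head:                  # attach siblings to the head
            deps.append((h, head))
    return head, deps

# Toy object-verb clause: ((book) (read)) with the VP heading the clause.
toy = ("S", [("NP", [("book", 0)]), ("VP", [("read", 1)])])
print(to_dependencies(toy)[1])  # [(('book', 0), ('read', 1))]
```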


2010 · Vol 36 (3) · pp. 295-302
Author(s): Sujith Ravi, Kevin Knight

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney, 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.
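For context on where such search errors can and cannot arise: under IBM Model 1 the alignment of each target word is independent of the others, so the exact Viterbi alignment is computable by a per-word argmax, as in the sketch below; it is in the richer fertility-based models, where GIZA++ falls back to greedy hill-climbing, that sub-optimal alignments can be returned. The tiny translation table here is invented.

```python
# Exact Viterbi alignment for IBM Model 1: each target word
# independently picks the source position (including NULL) that
# maximizes the lexical translation probability t(f | e).

def model1_viterbi(src, tgt, t):
    """src includes the NULL token at index 0; t[(f, e)] = p(f | e).
    Returns, for each target word, the index of its best source word."""
    return [max(range(len(src)), key=lambda i: t.get((f, src[i]), 0.0))
            for f in tgt]

# Toy German -> English pair with an assumed, hand-set t-table.
src = ["NULL", "das", "Haus"]
tgt = ["the", "house"]
t = {("the", "das"): 0.7, ("house", "Haus"): 0.8, ("the", "NULL"): 0.1}
print(model1_viterbi(src, tgt, t))  # [1, 2]
```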

