Hapax Legomena: Their Contribution in Number and Efficiency to Word Alignment

Author(s):  
Adrien Lardilleux ◽  
Yves Lepage
2014 ◽  
Author(s):  
Yin-Wen Chang ◽  
Alexander M. Rush ◽  
John DeNero ◽  
Michael Collins

2021 ◽  
Vol 11 (4) ◽  
pp. 1868
Author(s):  
Sari Dewi Budiwati ◽  
Al Hafiz Akbar Maulana Siagian ◽  
Tirana Noor Fatyanosa ◽  
Masayoshi Aritsugi

Phrase table combination in pivot approaches can be an effective method for dealing with low-resource language pairs. The common practice when generating phrase tables in pivot approaches is to use standard symmetrization, i.e., grow-diag-final-and. Although some researchers have found that non-standard symmetrization can improve bilingual evaluation understudy (BLEU) scores, it has not been commonly employed in pivot approaches. In this study, we propose a strategy that uses non-standard symmetrization of word alignment in phrase table combination. The appropriate symmetrization is selected based on the highest BLEU score in each direct translation of source–target, source–pivot, and pivot–target for Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id). Our experiments show that the proposed strategy outperforms direct translation in Kk–En, with absolute improvements of 0.35 (an 11.3% relative improvement) and 0.22 (a 6.4% relative improvement) BLEU points for 3-gram and 5-gram, respectively. The proposed strategy shows an absolute gain of up to 0.11 (a 0.9% relative improvement) BLEU points over direct translation for 3-gram in Ja–Id. The proposed strategy using a small phrase table obtains better BLEU scores than a strategy using a large phrase table. The size of the target monolingual data and the feature function weight of the language model (LM) could reduce perplexity scores.
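As a rough illustration of what symmetrization does, the sketch below combines two directional word alignments using the two simplest heuristics, intersection and union. It is not the authors' implementation and it does not implement grow-diag-final-and, which starts from the intersection and adds selected points from the union. The function name `symmetrize`, the `method` values, and the toy alignments are assumptions made for this example.

```python
def symmetrize(src2tgt, tgt2src, method="intersect"):
    """Combine two directional alignments into one set of (src, tgt) links.

    src2tgt: iterable of (src_index, tgt_index) pairs.
    tgt2src: iterable of (tgt_index, src_index) pairs (reverse direction).
    method:  "intersect" or "union" (hypothetical names for this sketch).
    """
    forward = set(src2tgt)
    # Flip the reverse-direction links so both sets use (src, tgt) order.
    backward = {(s, t) for (t, s) in tgt2src}
    if method == "intersect":
        return forward & backward   # high precision, fewer links
    if method == "union":
        return forward | backward   # high recall, more links
    raise ValueError(f"unknown method: {method}")


# Toy alignments, invented for illustration.
forward_links = [(0, 0), (1, 1), (2, 1)]
reverse_links = [(0, 0), (1, 1), (2, 2)]

print(sorted(symmetrize(forward_links, reverse_links, "intersect")))
print(sorted(symmetrize(forward_links, reverse_links, "union")))
```

Intersection keeps only links both directions agree on, while union keeps every link proposed by either direction; the choice changes which phrase pairs can later be extracted.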


Author(s):  
Peng Yin ◽  
Zhou Shu ◽  
Yingjun Xia ◽  
Tianmei Shen ◽  
Xiao Guan ◽  
...  

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Tian Shen ◽  
R. Harald Baayen

In structuralist linguistics, compounds are argued not to constitute morphological categories, due to the absence of systematic form-meaning correspondences. This study investigates subsets of compounds for which systematic form-meaning correspondences are present: adjective–noun compounds in Mandarin. We show that there are substantial differences in the productivity of these compounds. One set of productivity measures (the count of types, the count of hapax legomena, and the estimated count of unseen types) reflects compounds’ profitability. By contrast, the category-conditioned degree of productivity is found to correlate with the internal semantic transparency of the words belonging to a morphological category. Greater semantic transparency, gauged by distributional semantics, predicts greater category-conditioned productivity. This dovetails well with the hypothesis that semantic transparency is a prerequisite for a word formation process to be productive.
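The count-based measures named in this abstract can be sketched in a few lines. The example below computes the token count N, the type count V, and the hapax legomena count V1 for the tokens of one category, and it assumes the commonly used definition of category-conditioned productivity as P = V1 / N, which the abstract itself does not spell out. The estimated count of unseen types is left out because it requires a fitted frequency model. The function name and the toy token list are invented for illustration.

```python
from collections import Counter


def productivity_measures(tokens):
    """Return (N, V, V1, P) for the tokens of a single morphological category.

    N:  number of tokens
    V:  number of types
    V1: number of hapax legomena (types that occur exactly once)
    P:  V1 / N, assumed here as the category-conditioned degree of productivity
    """
    freqs = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freqs)
    n_hapax = sum(1 for count in freqs.values() if count == 1)
    p = n_hapax / n_tokens if n_tokens else 0.0
    return n_tokens, n_types, n_hapax, p


# Toy sample of compound tokens, invented for illustration.
sample = ["大人", "大人", "小孩", "老人", "新人", "老人", "好人"]
n, v, v1, p = productivity_measures(sample)
print(n, v, v1, round(p, 3))
```

On a real corpus the tokens would first be grouped by category (for example, by the adjective or noun constituent) so that each category gets its own counts.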


2010 ◽  
Vol 36 (3) ◽  
pp. 295-302 ◽  
Author(s):  
Sujith Ravi ◽  
Kevin Knight

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney, 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.
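To make the notion of a Viterbi alignment concrete, here is a minimal sketch for the simplest model in the Brown et al. family, IBM Model 1, where the best alignment can be found exactly by choosing, for each target word, the source position with the highest translation probability. This is a simplification chosen for illustration: the fertility-based models in that family do not decompose this way, so implementations typically rely on approximate search, which is where search errors can arise. The function name, the `NULL` token convention, and the probability table are assumptions, not part of GIZA++.

```python
def viterbi_model1(src_words, tgt_words, t_prob):
    """Return the most probable alignment under IBM Model 1.

    t_prob maps (tgt_word, src_word) to a translation probability.
    The result lists, for each target word, the index of its aligned
    source word, where index 0 is the empty NULL word.
    """
    src = ["NULL"] + list(src_words)
    alignment = []
    for f in tgt_words:
        # Model 1 treats target positions independently, so a per-word
        # argmax gives the globally optimal alignment.
        best = max(range(len(src)), key=lambda i: t_prob.get((f, src[i]), 0.0))
        alignment.append(best)
    return alignment


# Toy translation table, invented for illustration.
t_prob = {
    ("la", "the"): 0.7, ("la", "house"): 0.05, ("la", "NULL"): 0.1,
    ("maison", "house"): 0.8, ("maison", "the"): 0.1,
}
print(viterbi_model1(["the", "house"], ["la", "maison"], t_prob))
```

A search error in the sense of the abstract would be a returned alignment whose model score is lower than that of some other alignment for the same sentence pair.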

