Hapax Legomena: Their Contribution in Number and Efficiency to Word Alignment

Phrase table combination in pivot approaches can be an effective method for dealing with low-resource language pairs. The common practice for generating phrase tables in pivot approaches is to use standard symmetrization, i.e., grow-diag-final-and. Although some researchers have found that non-standard symmetrization can improve bilingual evaluation understudy (BLEU) scores, it has not been commonly employed in pivot approaches. In this study, we propose a strategy that uses non-standard symmetrization of word alignment in phrase table combination. The appropriate symmetrization is selected according to the highest BLEU score in each direct translation (source–target, source–pivot, and pivot–target) for Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id). Our experiments show that the proposed strategy outperforms direct translation in Kk–En, with absolute improvements of 0.35 BLEU points (an 11.3% relative improvement) and 0.22 BLEU points (a 6.4% relative improvement) for the 3-gram and 5-gram settings, respectively. For Ja–Id, the proposed strategy shows an absolute gain of up to 0.11 BLEU points (a 0.9% relative improvement) over direct translation in the 3-gram setting. Using a small phrase table, the proposed strategy obtains better BLEU scores than a strategy using a large phrase table. The size of the target-side monolingual corpus and the feature function weight of the language model (LM) could reduce perplexity scores.
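The symmetrization heuristics contrasted here (standard grow-diag-final-and versus non-standard alternatives such as intersection, union, grow-diag, and grow-diag-final) can be sketched as follows. This is a minimal Python reading of the heuristic as it is commonly described for Moses-style pipelines, not the authors' code; the function names, the representation of an alignment as a set of (source index, target index) links, and the toy alignments in the usage note are illustrative assumptions. The selection step described above would then amount to building one system per method and keeping the one with the highest BLEU score, which is not shown.

```python
from typing import Set, Tuple

# Hypothetical representation: a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]

# Horizontal, vertical, and diagonal neighbours used by the "diag" variants.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag(s2t: Alignment, t2s: Alignment, src_len: int, tgt_len: int) -> Alignment:
    """Start from the intersection and grow towards the union via neighbouring links."""
    union = s2t | t2s
    alignment = s2t & t2s  # a new set, so the inputs are never mutated
    src_aligned = {i for i, _ in alignment}
    tgt_aligned = {j for _, j in alignment}
    added = True
    while added:
        added = False
        for i in range(src_len):
            for j in range(tgt_len):
                if (i, j) not in alignment:
                    continue
                for di, dj in NEIGHBORS:
                    ni, nj = i + di, j + dj
                    # Accept a union link next to an existing link if it covers
                    # at least one word that is still unaligned.
                    if ((ni, nj) in union and (ni, nj) not in alignment
                            and (ni not in src_aligned or nj not in tgt_aligned)):
                        alignment.add((ni, nj))
                        src_aligned.add(ni)
                        tgt_aligned.add(nj)
                        added = True
    return alignment


def final(alignment: Alignment, directional: Alignment,
          src_len: int, tgt_len: int, require_both: bool) -> Alignment:
    """FINAL (either word unaligned) or FINAL-AND (both words unaligned)."""
    src_aligned = {i for i, _ in alignment}
    tgt_aligned = {j for _, j in alignment}
    for i in range(src_len):
        for j in range(tgt_len):
            if (i, j) not in directional:
                continue
            src_free = i not in src_aligned
            tgt_free = j not in tgt_aligned
            ok = (src_free and tgt_free) if require_both else (src_free or tgt_free)
            if ok:
                alignment.add((i, j))
                src_aligned.add(i)
                tgt_aligned.add(j)
    return alignment


def symmetrize(s2t: Alignment, t2s: Alignment, src_len: int, tgt_len: int,
               method: str = "grow-diag-final-and") -> Alignment:
    """Combine two directional alignments with the named heuristic."""
    if method == "intersection":
        return s2t & t2s
    if method == "union":
        return s2t | t2s
    alignment = grow_diag(s2t, t2s, src_len, tgt_len)
    if method == "grow-diag":
        return alignment
    if method not in ("grow-diag-final", "grow-diag-final-and"):
        raise ValueError(f"unknown method: {method}")
    both = method == "grow-diag-final-and"
    alignment = final(alignment, s2t, src_len, tgt_len, both)
    return final(alignment, t2s, src_len, tgt_len, both)
```

With the toy links `s2t = {(0, 0), (3, 3)}` and `t2s = {(0, 0), (3, 2)}` on a 4×4 sentence pair, grow-diag-final keeps both `(3, 3)` and `(3, 2)`, whereas grow-diag-final-and drops `(3, 2)` because source word 3 is already aligned; differences of this kind are what make the choice of heuristic matter for phrase extraction.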

A Low-Area and Low-Power Comma Detection and Word Alignment Circuits for JESD204B/C Controller

IEEE Transactions on Circuits and Systems I: Regular Papers

10.1109/tcsi.2021.3072772

2021

pp. 1-11

Author(s):

Peng Yin

Zhou Shu

Yingjun Xia

Tianmei Shen

Xiao Guan

...

Keyword(s):

Low Power

Word Alignment

Low Area

Adjective–noun compounds in Mandarin: a study on productivity

Corpus Linguistics and Linguistic Theory

10.1515/cllt-2020-0059

2021

Vol 0 (0)

Author(s):

Tian Shen

R. Harald Baayen

Keyword(s):

Formation Process

Word Formation

Distributional Semantics

Semantic Transparency

Noun Compounds

Hapax Legomena

In structuralist linguistics, compounds are argued not to constitute morphological categories, due to the absence of systematic form-meaning correspondences. This study investigates subsets of compounds for which systematic form-meaning correspondences are present: adjective–noun compounds in Mandarin. We show that there are substantial differences in the productivity of these compounds. One set of productivity measures (the count of types, the count of hapax legomena, and the estimated count of unseen types) reflects compounds’ profitability. By contrast, the category-conditioned degree of productivity is found to correlate with the internal semantic transparency of the words belonging to a morphological category. Greater semantic transparency, gauged by distributional semantics, predicts greater category-conditioned productivity. This dovetails well with the hypothesis that semantic transparency is a prerequisite for a word formation process to be productive.
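The counting measures mentioned here can be illustrated with Baayen's standard definitions: N (tokens in a category), V (types), V1 (hapax legomena, i.e. types seen exactly once), and the category-conditioned degree of productivity P = V1 / N. The sketch below is a minimal Python illustration, not the authors' code or data; the function names and the toy compound tokens are made up, and it does not estimate the count of unseen types, which requires fitting a model of the word frequency distribution.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def productivity(tokens: Iterable[str]) -> Dict[str, float]:
    """Type, hapax, and category-conditioned productivity counts for one category."""
    freqs = Counter(tokens)
    n_tokens = sum(freqs.values())                      # N: token count
    n_types = len(freqs)                                # V: type count
    n_hapax = sum(1 for c in freqs.values() if c == 1)  # V1: hapax legomena
    p = n_hapax / n_tokens if n_tokens else 0.0         # P = V1 / N
    return {"N": n_tokens, "V": n_types, "V1": n_hapax, "P": p}


def productivity_by_category(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Group (category, compound) token pairs and score each category separately."""
    groups = defaultdict(list)
    for category, compound in pairs:
        groups[category].append(compound)
    return {cat: productivity(toks) for cat, toks in groups.items()}
```

For the toy token list `["大树", "大树", "大门", "大脑", "大门", "大海"]` (one hypothetical category of compounds with 大 "big"), `productivity` gives N = 6, V = 4, V1 = 2, and P = 1/3. Note that V and V1 grow with sample size, whereas P normalizes by the category's own token count, which is why the two kinds of measure can pattern differently.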

The Septuagint's Rendering of Hebrew Hapax Legomena and the Characterization of its “Translation Technique”: The Case of Exodus

Acta Patristica et Byzantina

10.1080/10226486.2009.12128801

2009

Vol 20 (1)

pp. 360-376

Author(s):

Hans Ausloos

Keyword(s):

Translation Technique

Hapax Legomena

Does GIZA++ Make Search Errors?

Computational Linguistics

10.1162/coli_a_00008

2010

Vol 36 (3)

pp. 295-302

Author(s):

Sujith Ravi

Kevin Knight

Keyword(s):

Machine Translation

Statistical Machine Translation

Word Alignment

Alignment Algorithm

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) have provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney, 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.
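A search error, in the sense used here, means the returned alignment scores lower under the trained model than the best alignment that exists. The sketch below makes this concrete with IBM Model 1, the simplest of the Brown et al. (1993) models, where P(f, a | e) = ε / (l+1)^m · ∏_j t(f_j | e_{a_j}) factorizes per foreign word, so the exact Viterbi alignment is a per-word argmax that can be checked against brute-force enumeration on a toy sentence. It is an illustration under these assumptions, not GIZA++'s code; the function names, the NULL token spelling, the probability floor, and the translation-table values are all made up. The fertility-based models (IBM Models 3 and 4) do not factorize this way, which is where exact Viterbi search stops being straightforward.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

NULL = "<null>"  # hypothetical spelling of the empty word e_0

# t(f | e) keyed by (foreign_word, english_word); toy values, not trained.
TTable = Dict[Tuple[str, str], float]
FLOOR = 1e-12  # probability floor for unseen pairs (an assumption)


def log_score(f: Sequence[str], e: Sequence[str], a: Sequence[int], t: TTable) -> float:
    """log P(f, a | e) under Model 1, dropping the constant epsilon / (l+1)^m term."""
    return sum(math.log(t.get((fj, e[aj]), FLOOR)) for fj, aj in zip(f, a))


def viterbi_model1(f: Sequence[str], e: Sequence[str], t: TTable) -> List[int]:
    """Exact Viterbi for Model 1: each foreign word picks its best English position."""
    return [max(range(len(e)), key=lambda i: t.get((fj, e[i]), FLOOR)) for fj in f]


def brute_force_best(f: Sequence[str], e: Sequence[str], t: TTable) -> List[int]:
    """Enumerate all (l+1)^m alignments; feasible only for tiny sentences."""
    candidates = itertools.product(range(len(e)), repeat=len(f))
    return list(max(candidates, key=lambda a: log_score(f, e, a, t)))


def is_search_error(f: Sequence[str], e: Sequence[str], t: TTable,
                    returned: Sequence[int], tol: float = 1e-9) -> bool:
    """True if `returned` scores below the true optimum under the same model."""
    optimum = log_score(f, e, brute_force_best(f, e, t), t)
    return log_score(f, e, returned, t) < optimum - tol


# Toy example: the English side includes NULL at position 0.
E = [NULL, "the", "house"]
F = ["das", "Haus"]
T: TTable = {
    ("das", NULL): 0.2, ("das", "the"): 0.7, ("das", "house"): 0.1,
    ("Haus", NULL): 0.1, ("Haus", "the"): 0.1, ("Haus", "house"): 0.8,
}
```

On this toy table, `viterbi_model1(F, E, T)` returns `[1, 2]` (das→the, Haus→house), matching the brute-force optimum, so `is_search_error(F, E, T, [1, 2])` is `False`, while the worse alignment `[0, 2]` is flagged as a search error.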

Download Full-text