Combining Diverse Word-Alignment Symmetrizations Improves Dependency Tree Projection

Author(s): David Mareček
2015
Author(s): Hoang Cuong, Khalil Sima'an

2014
Author(s): Yin-Wen Chang, Alexander M. Rush, John DeNero, Michael Collins

Database · 2018 · Vol 2018
Author(s): Neha Warikoo, Yung-Chun Chang, Wen-Lian Hsu

2021 · Vol 11 (4) · pp. 1868
Author(s): Sari Dewi Budiwati, Al Hafiz Akbar Maulana Siagian, Tirana Noor Fatyanosa, Masayoshi Aritsugi

Phrase table combination in pivot approaches can be an effective method for dealing with low-resource language pairs. The common practice when generating phrase tables in pivot approaches is to use the standard symmetrization heuristic, i.e., grow-diag-final-and. Although some researchers have found that non-standard symmetrizations can improve bilingual evaluation understudy (BLEU) scores, they have not been commonly employed in pivot approaches. In this study, we propose a strategy that uses non-standard symmetrization of word alignments in phrase table combination. The appropriate symmetrization is selected based on the highest BLEU score in each direct translation of source–target, source–pivot, and pivot–target for Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id). Our experiments show that the proposed strategy outperforms direct translation in Kk–En with absolute improvements of 0.35 (an 11.3% relative improvement) and 0.22 (a 6.4% relative improvement) BLEU points for 3-gram and 5-gram, respectively. For 3-gram Ja–Id, the proposed strategy shows an absolute gain of up to 0.11 BLEU points (a 0.9% relative improvement) over direct translation. Our proposed strategy with a small phrase table obtains better BLEU scores than one with a large phrase table. The size of the target monolingual corpus and the feature-function weight of the language model (LM) can also reduce perplexity scores.
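As a rough illustration of the pivot idea this abstract builds on, the Python sketch below triangulates a source–pivot and a pivot–target phrase table into a source–target table by marginalizing over shared pivot phrases. The function name, toy phrase pairs, and probabilities are all invented; the paper's actual strategy additionally chooses a word-alignment symmetrization heuristic per translation direction by dev-set BLEU before the tables are built.

```python
# Minimal sketch of phrase-table triangulation for pivot translation.
# Tables map (src_phrase, tgt_phrase) -> translation probability; in the
# paper's setting each table would be extracted from word alignments
# symmetrized with whichever heuristic scored highest BLEU for that
# direction (grow-diag-final-and, union, intersection, ...).
from collections import defaultdict

def triangulate(src_pvt, pvt_tgt):
    """Combine source-pivot and pivot-target phrase tables into a
    source-target table: p(t|s) ~= sum_p p(t|p) * p(p|s)."""
    src_tgt = defaultdict(float)
    # Index the pivot-target table by pivot phrase for the join.
    by_pivot = defaultdict(list)
    for (p, t), prob in pvt_tgt.items():
        by_pivot[p].append((t, prob))
    for (s, p), prob_ps in src_pvt.items():
        for t, prob_tp in by_pivot.get(p, []):
            src_tgt[(s, t)] += prob_ps * prob_tp
    return dict(src_tgt)

# Toy example: a Kazakh phrase routed to English through a pivot phrase.
src_pvt = {("kk_phrase", "pivot_phrase"): 0.6}
pvt_tgt = {("pivot_phrase", "en_phrase"): 0.5}
print(triangulate(src_pvt, pvt_tgt))  # {('kk_phrase', 'en_phrase'): 0.3}
```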


Author(s): Peng Yin, Zhou Shu, Yingjun Xia, Tianmei Shen, Xiao Guan, ...

Author(s): Shumin Shi, Dan Luo, Xing Wu, Congjun Long, Heyan Huang

Dependency parsing is an important task for Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which is still extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is available, and such resources are currently obtained only by manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to perform constituent-to-dependency treebank conversion for Tibetan under these scarce conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the dependency treebank obtained by this preliminary conversion. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, exceeding the best results of existing conversion methods. The experimental results show that our method is effective in a low-resource setting: it not only addresses the scarcity of Tibetan dependency treebanks but also avoids needless manual annotation. The method embodies the regularity captured by strongly knowledge-guided linguistic analysis, which is of great significance for advancing research on Tibetan information processing.
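The abstract does not spell out the multi-level chunk-based conversion, so the sketch below shows only the generic head-percolation idea behind constituent-to-dependency conversion: recursively pick a head child per phrase and attach the other children's heads to it. The head-rule table and the toy tree are invented for illustration; this is the task in miniature, not the authors' method.

```python
# Hedged sketch of constituent-to-dependency conversion via head rules.
# A node is (label, [children]) internally and (word, index) at a leaf.

HEAD_RULES = {      # phrase label -> index of the head child (assumed;
    "S": -1,        # e.g., a verb-final clause is headed by its last child,
    "NP": -1,       # which loosely matches Tibetan's SOV order)
    "VP": -1,
}

def to_dependencies(tree, deps=None):
    """Return the head leaf of `tree`, filling `deps` with (dependent,
    head) arcs between the heads of sibling constituents."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, int):          # leaf: (word, index)
        return tree, deps
    heads = [to_dependencies(c, deps)[0] for c in children]
    head = heads[HEAD_RULES.get(label, -1)]
    for h in heads:
        if h is not head:                  # attach siblings to the head
            deps.append((h, head))
    return head, deps

# Toy object-verb clause: ((book) (read)) with the VP heading the clause.
toy = ("S", [("NP", [("book", 0)]), ("VP", [("read", 1)])])
print(to_dependencies(toy)[1])  # [(('book', 0), ('read', 1))]
```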


2010 · Vol 36 (3) · pp. 295-302
Author(s): Sujith Ravi, Kevin Knight

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney, 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.
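For context on where such search errors can and cannot arise: under IBM Model 1 the alignment of each target word is independent of the others, so the exact Viterbi alignment is computable by a per-word argmax, as in the sketch below; it is in the richer fertility-based models, where GIZA++ falls back to greedy hill-climbing, that sub-optimal alignments can be returned. The tiny translation table here is invented.

```python
# Exact Viterbi alignment for IBM Model 1: each target word
# independently picks the source position (including NULL) that
# maximizes the lexical translation probability t(f | e).

def model1_viterbi(src, tgt, t):
    """src includes the NULL token at index 0; t[(f, e)] = p(f | e).
    Returns, for each target word, the index of its best source word."""
    return [max(range(len(src)), key=lambda i: t.get((f, src[i]), 0.0))
            for f in tgt]

# Toy German -> English pair with an assumed, hand-set t-table.
src = ["NULL", "das", "Haus"]
tgt = ["the", "house"]
t = {("the", "das"): 0.7, ("house", "Haus"): 0.8, ("the", "NULL"): 0.1}
print(model1_viterbi(src, tgt, t))  # [1, 2]
```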

