2014
Author(s):
Yin-Wen Chang
Alexander M. Rush
John DeNero
Michael Collins

2021
Vol 11 (4)
pp. 1868
Author(s):  
Sari Dewi Budiwati
Al Hafiz Akbar Maulana Siagian
Tirana Noor Fatyanosa
Masayoshi Aritsugi

Phrase table combination in pivot approaches can be an effective method for dealing with low-resource language pairs. The common practice for generating phrase tables in pivot approaches is to use standard symmetrization, i.e., grow-diag-final-and. Although some researchers have found that non-standard symmetrization can improve bilingual evaluation understudy (BLEU) scores, it has not been commonly employed in pivot approaches. In this study, we propose a strategy that uses non-standard symmetrization of word alignments in phrase table combination. The appropriate symmetrization is selected based on the highest BLEU score in each direct translation of source–target, source–pivot, and pivot–target for Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id). Our experiments show that the proposed strategy outperforms direct translation in Kk–En, with absolute improvements of 0.35 (an 11.3% relative improvement) and 0.22 (a 6.4% relative improvement) BLEU points for 3-gram and 5-gram, respectively. For 3-gram in Ja–Id, the proposed strategy shows an absolute gain of up to 0.11 BLEU points (a 0.9% relative improvement) over direct translation. Our proposed strategy using a small phrase table obtains better BLEU scores than a strategy using a large phrase table. The size of the target-side monolingual corpus and the feature-function weight of the language model (LM) could reduce perplexity scores.
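As a rough illustration of the symmetrization step this abstract turns on, here is a minimal Python sketch, assuming forward and backward word alignments are given as sets of (source, target) index pairs. The heuristic names follow the Moses conventions the paper builds on; the `bleu_of` selection callback is a hypothetical stand-in for the paper's per-direction BLEU-based choice, not the authors' code.

```python
# Minimal sketch of alignment symmetrization heuristics (Moses-style).
# Alignments are sets of (src_index, tgt_index) pairs.

def intersect(fwd, bwd):
    """Intersection: high precision, low recall."""
    return fwd & bwd

def union(fwd, bwd):
    """Union: high recall, low precision."""
    return fwd | bwd

def grow_diag(fwd, bwd):
    """Start from the intersection and grow toward the union,
    adding neighbouring points whose source or target word is
    still unaligned."""
    alignment = intersect(fwd, bwd)
    candidates = union(fwd, bwd)
    neighbours = [(-1, 0), (0, -1), (1, 0), (0, 1),
                  (-1, -1), (-1, 1), (1, -1), (1, 1)]
    added = True
    while added:
        added = False
        for (i, j) in sorted(alignment):
            for (di, dj) in neighbours:
                p = (i + di, j + dj)
                src_free = all(a != p[0] for a, _ in alignment)
                tgt_free = all(b != p[1] for _, b in alignment)
                if p in candidates and p not in alignment and (src_free or tgt_free):
                    alignment.add(p)
                    added = True
    return alignment

HEURISTICS = {"intersection": intersect, "union": union, "grow-diag": grow_diag}

def pick_symmetrization(fwd, bwd, bleu_of):
    """Hypothetical selection step: keep the heuristic whose resulting
    phrase table scores highest on a dev set, applied separately to the
    source-pivot, pivot-target, and source-target directions."""
    return max(HEURISTICS, key=lambda name: bleu_of(HEURISTICS[name](fwd, bwd)))
```

Intersection favours precision and union favours recall; the standard grow-diag-final-and default sits in between, which is why letting each translation direction pick its own variant can pay off in a pivot setting.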


Author(s):  
Peng Yin
Zhou Shu
Yingjun Xia
Tianmei Shen
Xiao Guan
...  

2010
Vol 36 (3)
pp. 295-302
Author(s):  
Sujith Ravi
Kevin Knight

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney, 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.
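To make the search-error question concrete: under IBM Model 1 the Viterbi alignment is exactly computable, since the model factorizes over target words, whereas for the fertility-based models (Model 3 and beyond) exact search is intractable and GIZA++ falls back on hill-climbing from a seed alignment, which is where sub-optimal outputs can slip in. Below is a minimal sketch of the tractable Model 1 case; the `t_table` dictionary of translation probabilities is an assumed input format, not GIZA++'s API.

```python
# Minimal sketch: exact Viterbi alignment under IBM Model 1.
# t_table[(f, e)] holds the learned translation probability t(f | e).

def viterbi_model1(src_words, tgt_words, t_table):
    """Each target word independently picks its most probable source
    word (index 0 is the NULL token), which is exact for Model 1."""
    src = ["NULL"] + src_words
    alignment = []
    for f in tgt_words:
        best_j = max(range(len(src)),
                     key=lambda j: t_table.get((f, src[j]), 1e-12))
        alignment.append(best_j)  # 0 means aligned to NULL
    return alignment
```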


2016
Vol 23 (4)
pp. 327-351
Author(s):  
Hidetaka Kamigaito
Taro Watanabe
Hiroya Takamura
Manabu Okumura
Eiichiro Sumita

2010
Vol 36 (3)
pp. 481-504
Author(s):  
João V. Graça
Kuzman Ganchev
Ben Taskar

Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between the richness of captured constraints and correlations on the one hand, and the efficiency and tractability of inference on the other. In this article, we use the Posterior Regularization framework (Graça, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance, as measured by both precision and recall against manually annotated alignments, for six language pairs. We also report experiments on two tasks where word alignments are required, phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.
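A minimal sketch of the constrained E-step at the heart of this approach, assuming the HMM's alignment posteriors arrive as a J x I matrix of probabilities p(a_j = i): the posteriors are projected (in KL divergence) onto the set of distributions whose expected usage of each source word is at most one, an approximate bijectivity constraint, by gradient ascent on the dual. The function name, step size, and iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pr_project(post, max_use=1.0, lr=0.5, iters=100):
    """Project alignment posteriors onto {q : E_q[count(i)] <= max_use}
    by dual (projected) gradient ascent; post has shape (J, I)."""
    J, I = post.shape
    lam = np.zeros(I)  # one dual variable per source word
    for _ in range(iters):
        # Optimal primal for fixed duals: q(a_j = i) proportional to
        # p(a_j = i) * exp(-lam_i), renormalized per target word.
        q = post * np.exp(-lam)[None, :]
        q /= q.sum(axis=1, keepdims=True)
        usage = q.sum(axis=0)  # expected count of each source word
        lam = np.maximum(0.0, lam + lr * (usage - max_use))  # dual ascent
    return q
```

The projected posteriors q then stand in for p in the M-step, so the underlying HMM and its dynamic-programming inference are untouched, which is how the constraints enter "without changing the efficiency of the underlying model."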

