Alignment-Enhanced Transformer for Constraining NMT with Pre-Specified Translations

Kai Song; Kun Wang; Heng Yu; Yue Zhang; Zhongqiang Huang; Weihua Luo; Xiangyu Duan; Min Zhang

doi:10.1609/aaai.v34i05.6418

Proceedings of the AAAI Conference on Artificial Intelligence ◽

2020 ◽

Vol 34 (05) ◽

pp. 8886-8893

Keyword(s):

Industrial Applications ◽

Practical Significance ◽

Word Alignment ◽

Translation Quality ◽

Highly Effective ◽

Word Alignments ◽

Lexical Constraints

We investigate the task of constraining NMT with pre-specified translations, which has practical significance for a number of research and industrial applications. Existing work imposes pre-specified translations as lexical constraints during decoding, relying on word alignments derived from target-to-source attention weights. However, multiple recent studies have found that word alignments derived from generic attention heads in the Transformer are unreliable. We address this problem by introducing a dedicated head in the multi-head Transformer architecture to capture external supervision signals. Results on five language pairs show that our method is highly effective in constraining NMT with pre-specified translations, consistently outperforming previous methods in translation quality.
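The abstract does not spell out the training objective for the dedicated head. One common way to supervise a single attention head with external alignments is a cross-entropy loss between the head's target-to-source attention and the row-normalized reference alignment. The sketch below assumes that objective; the names `alignment_head_loss` and `hard_alignment` are hypothetical and are not taken from the paper.

```python
import math

def normalize_rows(alignment):
    """Turn a 0/1 target-by-source alignment matrix into per-row distributions.
    Rows with no aligned source word become None and are skipped in the loss."""
    rows = []
    for row in alignment:
        total = sum(row)
        rows.append([x / total for x in row] if total else None)
    return rows

def alignment_head_loss(attention, alignment, eps=1e-9):
    """Cross-entropy between one head's target-to-source attention rows and
    the external alignment, averaged over target positions that have links."""
    ref = normalize_rows(alignment)
    losses = [
        -sum(p * math.log(a + eps) for p, a in zip(ref_row, att_row))
        for ref_row, att_row in zip(ref, attention)
        if ref_row is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0

def hard_alignment(attention):
    """Most-attended source position for each target position."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]

attention = [[0.9, 0.1], [0.2, 0.8]]  # 2 target positions x 2 source positions
reference = [[1, 0], [0, 1]]          # external word alignment
print(alignment_head_loss(attention, reference), hard_alignment(attention))
```

At decoding time, a constrained system could read `hard_alignment` from the supervised head to decide which source word the current target position covers, and emit the pre-specified translation when that word is constrained; how the paper integrates this with beam search is not reproduced here.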

A Relationship: Word Alignment, Phrase Table, and Translation Quality

The Scientific World JOURNAL ◽

10.1155/2014/438106 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 3

Author(s):

Liang Tian ◽

Derek F. Wong ◽

Lidia S. Chao ◽

Francisco Oliveira

Keyword(s):

Machine Translation ◽

Ad Hoc ◽

Significant Loss ◽

The Other ◽

Word Alignment ◽

Translation Quality ◽

Theoretical Support ◽

Word Alignments ◽

The Relationship ◽

Pruning Technique

In recent years, researchers have conducted several studies to evaluate machine translation quality based on the relationship between word alignments and the phrase table. However, existing methods usually employ ad hoc heuristics without theoretical support. So far, no work has provided a formula describing the relationship among word alignments, the phrase table, and machine translation performance. In this paper, on the one hand, we formulate such a relationship for estimating the number of extracted phrase pairs given one or more word alignment points. On the other hand, we propose a corpus-motivated pruning technique to prune the large default phrase table. Experiments show that the derived formula is feasible: it can be used to predict the size of the phrase table, and it also serves as a valuable reference for investigating the relationship between translation performance and phrase tables built from different word alignment links. The corpus-motivated pruning results show that nearly 98% of phrases can be pruned without any significant loss in translation quality.
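The abstract relates the number of extracted phrase pairs to the alignment points but does not give the formula. The quantity being estimated can also be computed directly, if slowly, by enumerating the phrase pairs that are consistent with the alignment in the standard phrase-based SMT sense. The following is a minimal sketch of that enumeration, assuming the usual consistency criterion and a hypothetical `max_len` limit; it is neither the paper's formula nor its pruning technique.

```python
def extract_phrase_pairs(src_len, links, max_len=7):
    """Enumerate phrase pairs (s1, s2, t1, t2) consistent with a word alignment.

    links: set of (source_index, target_index) alignment points.
    Simplification: phrases are not extended over unaligned boundary words.
    """
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, min(src_len, s1 + max_len)):
            # Target positions linked to the source span [s1, s2].
            tgt = [t for (s, t) in links if s1 <= s <= s2]
            if not tgt:
                continue
            t1, t2 = min(tgt), max(tgt)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] may link outside [s1, s2].
            if any(t1 <= t <= t2 and not s1 <= s <= s2 for (s, t) in links):
                continue
            pairs.append((s1, s2, t1, t2))
    return pairs

# A monotone two-word alignment yields 3 pairs; one crossing link in a
# three-word sentence yields 5, because the source span [0, 1] becomes inconsistent.
print(len(extract_phrase_pairs(2, {(0, 0), (1, 1)})))
print(len(extract_phrase_pairs(3, {(0, 0), (1, 2), (2, 1)})))
```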

Efficient Word Alignment with Markov Chain Monte Carlo

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2016-0013 ◽

2016 ◽

Vol 106 (1) ◽

pp. 125-146 ◽

Cited By ~ 1

Author(s):

Robert Östling ◽

Jörg Tiedemann

Keyword(s):

Monte Carlo ◽

Markov Chain ◽

Markov Chain Monte Carlo ◽

Critical Word ◽

Statistical Machine Translation ◽

Monte Carlo Sampling ◽

Word Alignment ◽

Translation Quality ◽

Word Alignments ◽

Selection Of

We present EFMARAL, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture, we are able to surpass the fast_align system, commonly used for performance-critical word alignment, in both computational efficiency and alignment accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from EFMARAL than from fast_align, and that translation quality is on par with that obtained using GIZA++, a tool requiring orders of magnitude more processing time. More generally, we hope to convince the reader that Monte Carlo sampling, rather than being viewed as a slow method of last resort, should actually be the method of choice for the SMT practitioner and others interested in word alignment.
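EFMARAL's actual model is richer than what follows (the sketch covers none of its HMM or fertility components), but the core idea of MCMC word alignment can be illustrated with a collapsed Gibbs sampler for a Bayesian IBM Model 1. In this minimal sketch, which is an assumption rather than EFMARAL's code, each target word's link is resampled in turn from its conditional distribution given all other links, with the translation distributions integrated out under a symmetric Dirichlet prior `alpha`.

```python
import random
from collections import defaultdict

NULL = "<NULL>"

def gibbs_ibm1(corpus, iterations=100, alpha=0.01, seed=0):
    """corpus: list of (src_tokens, tgt_tokens). Returns one alignment per
    sentence pair: a[j] = 0 means target word j is linked to NULL, and
    a[j] = i > 0 means it is linked to source token i - 1."""
    rng = random.Random(seed)
    sents = [([NULL] + list(src), list(tgt)) for src, tgt in corpus]
    vocab_size = len({f for _, tgt in sents for f in tgt})
    pair = defaultdict(int)       # n(e, f): current links between e and f
    src_total = defaultdict(int)  # n(e): current links from source word e

    # Random initialization of all links.
    aligns = []
    for src, tgt in sents:
        a = [rng.randrange(len(src)) for _ in tgt]
        for j, i in enumerate(a):
            pair[(src[i], tgt[j])] += 1
            src_total[src[i]] += 1
        aligns.append(a)

    for _ in range(iterations):
        for (src, tgt), a in zip(sents, aligns):
            for j, f in enumerate(tgt):
                # Remove the current link from the counts.
                old = src[a[j]]
                pair[(old, f)] -= 1
                src_total[old] -= 1
                # Collapsed conditional: (n(e, f) + alpha) / (n(e) + V * alpha).
                weights = [(pair[(e, f)] + alpha) / (src_total[e] + alpha * vocab_size)
                           for e in src]
                a[j] = rng.choices(range(len(src)), weights=weights)[0]
                # Add the resampled link back.
                new = src[a[j]]
                pair[(new, f)] += 1
                src_total[new] += 1
    return aligns

corpus = [(["das", "haus"], ["the", "house"]), (["das", "buch"], ["the", "book"])]
print(gibbs_ibm1(corpus, iterations=20))
```

A real aligner would also average link samples over iterations and combine both translation directions; the fixed `seed` here only makes the sketch reproducible.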

Learning Tractable Word Alignment Models with Complex Constraints

Computational Linguistics ◽

10.1162/coli_a_00007 ◽

2010 ◽

Vol 36 (3) ◽

pp. 481-504 ◽

Cited By ~ 6

Author(s):

João V. Graça ◽

Kuzman Ganchev ◽

Ben Taskar

Keyword(s):

Probabilistic Models ◽

Learning Algorithm ◽

Word Alignment ◽

Word Level ◽

Word Alignments ◽

Symmetry Constraints ◽

Critical Resource ◽

Complex Constraints ◽

Bilingual Text ◽

Efficient Learning

Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graça, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.
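Posterior Regularization replaces the usual E-step posterior p with its KL projection q onto a constraint set. The sketch below assumes a simplified, position-factored posterior (as in IBM Model 1) rather than the paper's HMM, where the same projection would need forward-backward over reweighted potentials. It enforces an approximate bijectivity constraint (the expected number of target words linked to each source word is at most `bound`) by projected gradient ascent on the dual variables `lam`; the function name and step size are hypothetical.

```python
import math

def pr_project(posteriors, bound=1.0, step=0.5, iters=200):
    """posteriors[j][i]: model posterior that target word j links to source word i.
    Returns q minimizing KL(q || p) subject to sum_j q[j][i] <= bound for each i.
    The projected posterior has the form q[j][i] proportional to p[j][i] * exp(-lam[i])."""
    n_src = len(posteriors[0])
    lam = [0.0] * n_src  # one nonnegative dual variable per source word
    q = posteriors
    for _ in range(iters):
        # Reweight and renormalize each target position's distribution.
        q = []
        for row in posteriors:
            scores = [p * math.exp(-l) for p, l in zip(row, lam)]
            z = sum(scores)
            q.append([s / z for s in scores])
        # Dual gradient is (expected count - bound); clip lam at zero.
        for i in range(n_src):
            expected = sum(row[i] for row in q)
            lam[i] = max(0.0, lam[i] + step * (expected - bound))
    return q

# Both target words prefer source word 0 (expected count 1.8); after
# projection that count is pushed down to about 1.
print(pr_project([[0.9, 0.1], [0.9, 0.1]]))
```

To leave a NULL source position unconstrained, one would simply exclude its column from the dual update.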

Bilingual Embeddings and Word Alignments for Translation Quality Estimation

10.18653/v1/w16-2380 ◽

2016 ◽

Author(s):

Amal Abdelsalam ◽

Ondřej Bojar ◽

Samhaa El-Beltagy

Keyword(s):

Quality Estimation ◽

Translation Quality ◽

Word Alignments

Corpus Augmentation for Neural Machine Translation with Chinese-Japanese Parallel Corpora

Applied Sciences ◽

10.3390/app9102036 ◽

2019 ◽

Vol 9 (10) ◽

pp. 2036

Author(s):

Jinyi Zhang ◽

Tadahiro Matsumoto

Keyword(s):

Machine Translation ◽

Scientific Paper ◽

Training Data ◽

Word Alignment ◽

Sentence Pair ◽

Neural Machine Translation ◽

Parallel Corpora ◽

Translation Quality ◽

Parallel Data ◽

Source Sentence

The translation quality of Neural Machine Translation (NMT) systems depends strongly on the training data size. Sufficient amounts of parallel data are, however, not available for many language pairs. This paper presents a corpus augmentation method, which has two variations: one is for all language pairs, and the other is for the Chinese-Japanese language pair. The method uses both source and target sentences of the existing parallel corpus and generates multiple pseudo-parallel sentence pairs from a long parallel sentence pair containing punctuation marks as follows: (1) split the sentence pair into parallel partial sentences; (2) back-translate the target partial sentences; and (3) replace each partial sentence in the source sentence with the back-translated target partial sentence to generate pseudo-source sentences. The word alignment information, which is used to determine the split points, is modified with “shared Chinese character rates” in segments of the sentence pairs. The experiment results of the Japanese-Chinese and Chinese-Japanese translation with ASPEC-JC (Asian Scientific Paper Excerpt Corpus, Japanese-Chinese) show that the method substantially improves translation performance. We also supply the code (see Supplementary Materials) that can reproduce our proposed method.
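The first step of the augmentation method, splitting a long sentence pair at punctuation into parallel partial sentences, can be sketched as follows. The sketch assumes that a split point is acceptable only when no alignment link crosses it. It omits back-translation and the paper's “shared Chinese character rate” adjustment, and the punctuation set and function names are hypothetical.

```python
PUNCT = {",", "，", "、", ";", "；"}  # hypothetical set of split punctuation

def split_points(src, tgt, links):
    """Find (i, k) cut points: split after source token i and target token k,
    both punctuation, such that no alignment link crosses the cut."""
    cuts = []
    last_k = -1
    for i, tok in enumerate(src[:-1]):
        if tok not in PUNCT:
            continue
        for k in range(last_k + 1, len(tgt) - 1):
            # Every link must fall entirely before or entirely after the cut.
            if tgt[k] in PUNCT and all((s <= i) == (t <= k) for s, t in links):
                cuts.append((i, k))
                last_k = k
                break
    return cuts

def split_pair(src, tgt, links):
    """Return the list of parallel partial sentences (src_piece, tgt_piece)."""
    pieces = []
    s_start = t_start = 0
    for i, k in split_points(src, tgt, links):
        pieces.append((src[s_start:i + 1], tgt[t_start:k + 1]))
        s_start, t_start = i + 1, k + 1
    pieces.append((src[s_start:], tgt[t_start:]))
    return pieces

src = ["a", "b", ",", "c", "d"]
tgt = ["A", "B", ",", "C", "D"]
print(split_pair(src, tgt, {(i, i) for i in range(5)}))
```

Each resulting target piece would then be back-translated and substituted into the source sentence to form the pseudo-source sentences described above.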

Large-scale Word Alignment Using Soft Dependency Cohesion Constraints

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00228 ◽

2013 ◽

Vol 1 ◽

pp. 291-300 ◽

Cited By ~ 1

Author(s):

Zhiguo Wang ◽

Chengqing Zong

Keyword(s):

Large Scale ◽

Target Language ◽

Model Parameters ◽

Word Alignment ◽

Soft Constraint ◽

Alignment Quality ◽

Source Language ◽

Discriminative Models ◽

Translation Quality ◽

Gibbs Sampling Algorithm

Dependency cohesion refers to the observation that phrases dominated by disjoint dependency subtrees in the source language generally do not overlap in the target language. It has been verified to be a useful constraint for word alignment. However, previous work either treats this as a hard constraint or uses it as a feature in discriminative models, which is ineffective for large-scale tasks. In this paper, we take dependency cohesion as a soft constraint, and integrate it into a generative model for large-scale word alignment experiments. We also propose an approximate EM algorithm and a Gibbs sampling algorithm to estimate model parameters in an unsupervised manner. Experiments on large-scale Chinese-English translation tasks demonstrate that our model achieves improvements in both alignment quality and translation quality.
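To make the cohesion constraint concrete, the following sketch counts how often the target spans of two sibling subtrees in a source dependency tree overlap. Checking only sibling subtrees is a simplifying assumption. A soft constraint could penalize an alignment in proportion to this count, but the paper's generative formulation and its EM and Gibbs estimation are not reproduced here; all names are hypothetical.

```python
def subtrees(heads):
    """heads[i] is the parent of word i (-1 for the root).
    Returns (children, subtree) where subtree[i] is the set of words under i."""
    children = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    def collect(i):
        nodes = {i}
        for c in children[i]:
            nodes |= collect(c)
        return nodes

    return children, {i: collect(i) for i in range(len(heads))}

def target_span(nodes, links):
    """Smallest target interval covering all links from the given source words."""
    ts = [t for s, t in links if s in nodes]
    return (min(ts), max(ts)) if ts else None

def cohesion_violations(heads, links):
    """Number of sibling-subtree pairs whose target spans overlap."""
    children, subtree = subtrees(heads)
    violations = 0
    for kids in children.values():
        spans = [target_span(subtree[c], links) for c in kids]
        spans = [sp for sp in spans if sp is not None]
        for a in range(len(spans)):
            for b in range(a + 1, len(spans)):
                (l1, r1), (l2, r2) = spans[a], spans[b]
                if l1 <= r2 and l2 <= r1:
                    violations += 1
    return violations

heads = [-1, 0, 0]  # word 0 is the root; words 1 and 2 are its dependents
print(cohesion_violations(heads, {(0, 0), (1, 1), (2, 2)}))
print(cohesion_violations(heads, {(1, 0), (1, 2), (2, 1)}))
```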

Joint Prediction of Word Alignment with Alignment Types

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00076 ◽

2017 ◽

Vol 5 ◽

pp. 501-514

Author(s):

Anahita Mansouri Bigvand ◽

Te Bu ◽

Anoop Sarkar

Keyword(s):

Word Pair ◽

Probabilistic Model ◽

Generative Models ◽

Word Alignment ◽

Different Types ◽

Supervised Learning Algorithms ◽

Word Alignments ◽

Model Alignment ◽

Joint Prediction ◽

Current Word

Current word alignment models do not distinguish between different types of alignment links. In this paper, we provide a new probabilistic model for word alignment in which word alignments are associated with linguistically motivated alignment types. We propose a novel task of jointly predicting word alignments and alignment types, along with novel semi-supervised learning algorithms for this task. We also solve a sub-task of predicting the alignment type given an aligned word pair. In our experiments, the generative models we introduce to model alignment types significantly outperform models without alignment types.
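For the sub-task of predicting the alignment type of an already aligned word pair, a simple count-based baseline helps fix ideas: predict the type most often observed for that word pair in labeled data, and back off to the most frequent type overall for unseen pairs. This is a hypothetical baseline, not the paper's generative or semi-supervised model, and the type labels in the example are illustrative.

```python
from collections import Counter, defaultdict

def train_type_baseline(typed_links):
    """typed_links: iterable of (src_word, tgt_word, link_type) triples."""
    by_pair = defaultdict(Counter)
    overall = Counter()
    for src, tgt, link_type in typed_links:
        by_pair[(src, tgt)][link_type] += 1
        overall[link_type] += 1
    return by_pair, overall

def predict_type(model, src, tgt):
    """Most frequent type for this word pair, else the most frequent type overall."""
    by_pair, overall = model
    counts = by_pair.get((src, tgt))
    if counts:
        return counts.most_common(1)[0][0]
    return overall.most_common(1)[0][0]

model = train_type_baseline([
    ("the", "le", "function"), ("the", "le", "function"),
    ("house", "maison", "semantic"), ("house", "maison", "semantic"),
    ("house", "maison", "semantic"),
])
print(predict_type(model, "the", "le"), predict_type(model, "cat", "chat"))
```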

Improving syntactic rule extraction through deleting spurious links with translation span alignment

Natural Language Engineering ◽

10.1017/s1351324913000260 ◽

2013 ◽

Vol 21 (2) ◽

pp. 227-249 ◽

Cited By ~ 4

Author(s):

Jingbo Zhu ◽

Qiang Li ◽

Tong Xiao

Keyword(s):

Practical Problem ◽

Statistical Machine Translation ◽

Rule Extraction ◽

Word Alignment ◽

Syntactic Rule ◽

Tree Alignment ◽

Parallel Data ◽

Translation Rule ◽

Word Alignments ◽

Translation Systems

Most statistical machine translation systems rely on word alignments to extract translation rules. This approach suffers from a practical problem: even one spurious word alignment link can prevent some desirable translation rules from being extracted. To address this issue, this paper presents two approaches, referred to as sub-tree alignment and phrase-based forced decoding methods, to automatically learn translation span alignments from parallel data. We then improve translation rule extraction by deleting spurious links and inserting new links based on bilingual translation span correspondences. Comparative experiments are designed to demonstrate the effectiveness of the proposed approaches.
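The link-deletion step can be sketched as a filter over alignment links given learned span correspondences: a link is treated as spurious when it connects a word inside one side of an aligned span pair to a word outside the other side. This is a minimal sketch under that assumption. It does not cover learning the span alignments or inserting new links, and `delete_spurious_links` is a hypothetical name.

```python
def delete_spurious_links(links, span_pairs):
    """links: set of (source_index, target_index).
    span_pairs: list of ((s1, s2), (t1, t2)) aligned, inclusive spans.
    Keeps a link only if, for every span pair, it is either inside both
    spans or outside both spans."""
    kept = set()
    for s, t in links:
        spurious = False
        for (s1, s2), (t1, t2) in span_pairs:
            in_src = s1 <= s <= s2
            in_tgt = t1 <= t <= t2
            if in_src != in_tgt:  # the link leaves the span pair
                spurious = True
                break
        if not spurious:
            kept.add((s, t))
    return kept

links = {(0, 0), (1, 1), (1, 3), (2, 2), (3, 3)}
spans = [((0, 1), (0, 1)), ((2, 3), (2, 3))]
print(sorted(delete_spurious_links(links, spans)))
```

Here the link (1, 3) is dropped because source word 1 lies in the first span pair while target word 3 lies outside it.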

Transformation from Discontinuous to Continuous Word Alignment Improves Translation Quality

10.3115/v1/d14-1016 ◽

2014 ◽

Author(s):

Zhongjun He ◽

Hua Wu ◽

Haifeng Wang ◽

Ting Liu

Keyword(s):

Word Alignment ◽

Translation Quality

Graph-Based Word Alignment for Clinical Language Evaluation

Computational Linguistics ◽

10.1162/coli_a_00232 ◽

2015 ◽

Vol 41 (4) ◽

pp. 549-578 ◽

Cited By ~ 7

Author(s):

Emily Prud'hommeaux ◽

Brian Roark

Keyword(s):

Language Processing ◽

Expectation Maximization ◽

Automated Analysis ◽

Screening Tools ◽

Word Alignment ◽

Word Level ◽

Word Alignments ◽

Language Data ◽

Novel Method ◽

Time Required

Among the more recent applications for natural language processing algorithms has been the analysis of spoken language data for diagnostic and remedial purposes, fueled by the demand for simple, objective, and unobtrusive screening tools for neurological disorders such as dementia. The automated analysis of narrative retellings in particular shows potential as a component of such a screening tool since the ability to produce accurate and meaningful narratives is noticeably impaired in individuals with dementia and its frequent precursor, mild cognitive impairment, as well as other neurodegenerative and neurodevelopmental disorders. In this article, we present a method for extracting narrative recall scores automatically and highly accurately from a word-level alignment between a retelling and the source narrative. We propose improvements to existing machine translation–based systems for word alignment, including a novel method of word alignment relying on random walks on a graph that achieves alignment accuracy superior to that of standard expectation maximization–based techniques for word alignment in a fraction of the time required for expectation maximization. In addition, the narrative recall score features extracted from these high-quality word alignments yield diagnostic classification accuracy comparable to that achieved using manually assigned scores and significantly higher than that achieved with summary-level text similarity metrics used in other areas of NLP. These methods can be trivially adapted to spontaneous language samples elicited with non-linguistic stimuli, thereby demonstrating the flexibility and generalizability of these methods.
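The abstract does not describe the graph used for alignment. As a generic illustration of aligning with random walks, the sketch below builds a bipartite co-occurrence graph between retelling words and source-narrative words, scores source words by a random walk with restart (personalized PageRank) from each retelling word, and aligns each retelling word to the highest-scoring source word in the sentence. The graph construction, the restart probability, and all names are assumptions, not the authors' method.

```python
from collections import defaultdict

def build_graph(pairs):
    """pairs: list of (retelling_tokens, source_tokens). The edge weight between
    a retelling word and a source word is the number of pairs containing both."""
    graph = defaultdict(lambda: defaultdict(float))
    for retelling, source in pairs:
        for r in set(retelling):
            for s in set(source):
                graph[("R", r)][("S", s)] += 1.0
                graph[("S", s)][("R", r)] += 1.0
    return graph

def walk_scores(graph, start, restart=0.15, steps=50):
    """Random walk with restart from `start`, computed by power iteration."""
    scores = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        nxt[start] += restart
        for node, mass in scores.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            if total == 0:
                nxt[start] += (1 - restart) * mass  # dangling node: jump back
                continue
            for nb, w in neighbors.items():
                nxt[nb] += (1 - restart) * mass * w / total
        scores = nxt
    return scores

def align(graph, retelling, source):
    """Map each retelling word to its highest-scoring source word."""
    result = {}
    for r in retelling:
        scores = walk_scores(graph, ("R", r))
        result[r] = max(source, key=lambda s: scores.get(("S", s), 0.0))
    return result

pairs = [(["a"], ["A"]), (["b"], ["B"]), (["a", "b"], ["A", "B"])]
print(align(build_graph(pairs), ["a", "b"], ["A", "B"]))
```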
