scholarly journals Unsupervised Sub-tree Alignment for Tree-to-Tree Translation

2013 ◽  
Vol 48 ◽  
pp. 733-782 ◽  
Author(s):  
T. Xiao ◽  
J. Zhu

This article presents a probabilistic sub-tree alignment model and its application to tree-to-tree machine translation. Unlike previous work, we do not resort to surface heuristics or expensive annotated data, but instead derive an unsupervised model to infer the syntactic correspondence between two languages. More importantly, the developed model is syntactically-motivated and does not rely on word alignments. As a by-product, our model outputs a sub-tree alignment matrix encoding a large number of diverse alignments between syntactic structures, from which machine translation systems can efficiently extract translation rules that are often filtered out due to the errors in 1-best alignment. Experimental results show that the proposed approach outperforms three state-of-the-art baseline approaches in both alignment accuracy and grammar quality. When applied to machine translation, our approach yields a +1.0 BLEU improvement and a -0.9 TER reduction on the NIST machine translation evaluation corpora. With tree binarization and fuzzy decoding, it even outperforms a state-of-the-art hierarchical phrase-based system.

2014 ◽  
Vol 101 (1) ◽  
pp. 43-54 ◽  
Author(s):  
Gideon Maillette de Buy Wenniger ◽  
Khalil Sima’an

Abstract Translation equivalence constitutes the basis of all Machine Translation systems including the recent hierarchical and syntax-based systems. For hierarchical MT research it is important to have a tool that supports the qualitative and quantitative analysis of hierarchical translation equivalence relations extracted from word alignments in data. In this paper we present such a toolkit and exemplify some of its uses. The main challenges taken up in designing this tool are the efficient and compact, yet complete, representation of hierarchical translation equivalence coupled with an intuitive visualization of these hierarchical relations. We exploit a new hierarchical representation, called Hierarchical Alignment Trees (HATs), which is based on an extension of the algorithms used for factorizing n-ary branching SCFG rules into their minimally-branching equivalents. Our toolkit further provides a search capability based on hierarchically relevant properties of word alignments and/or translation equivalence relations. Finally, the tool allows detailed statistical analysis of word alignments, thereby providing a breakdown of alignment statistics according to the complexity of translation equivalence units or reordering phenomena. We illustrate this with an empirical study of the coverage of inversion-transduction grammars for a number of corpora enriched with manual or automatic word alignments, followed by a breakdown of corpus statistics to reordering complexity.


2012 ◽  
Vol 98 (1) ◽  
pp. 37-50
Author(s):  
Matthias Huck ◽  
Jan-Thorsten Peter ◽  
Markus Freitag ◽  
Stephan Peitz ◽  
Hermann Ney

Hierarchical Phrase-Based Translation with Jane 2 In this paper, we give a survey of several recent extensions to hierarchical phrase-based machine translation that have been implemented in version 2 of Jane, RWTH's open source statistical machine translation toolkit. We focus on the following techniques: Insertion and deletion models, lexical scoring variants, reordering extensions with non-lexicalized reordering rules and with a discriminative lexicalized reordering model, and soft string-to-dependency hierarchical machine translation. We describe the fundamentals of each of these techniques and present experimental results obtained with Jane 2 to confirm their usefulness in state-of-the-art hierarchical phrase-based translation (HPBT).


Author(s):  
Félix do Carmo ◽  
Dimitar Shterionov ◽  
Joss Moorkens ◽  
Joachim Wagner ◽  
Murhaf Hossari ◽  
...  

AbstractThis article presents a review of the evolution of automatic post-editing, a term that describes methods to improve the output of machine translation systems, based on knowledge extracted from datasets that include post-edited content. The article describes the specificity of automatic post-editing in comparison with other tasks in machine translation, and it discusses how it may function as a complement to them. Particular detail is given in the article to the five-year period that covers the shared tasks presented in WMT conferences (2015–2019). In this period, discussion of automatic post-editing evolved from the definition of its main parameters to an announced demise, associated with the difficulties in improving output obtained by neural methods, which was then followed by renewed interest. The article debates the role and relevance of automatic post-editing, both as an academic endeavour and as a useful application in commercial workflows.


2018 ◽  
Vol 111 (1) ◽  
pp. 113-124 ◽  
Author(s):  
Álvaro Peris ◽  
Francisco Casacuberta

Abstract We present NMT-Keras, a flexible toolkit for training deep learning models, which puts a particular emphasis on the development of advanced applications of neural machine translation systems, such as interactive-predictive translation protocols and long-term adaptation of the translation system via continuous learning. NMT-Keras is based on an extended version of the popular Keras library, and it runs on Theano and TensorFlow. State-of-the-art neural machine translation models are deployed and used following the high-level framework provided by Keras. Given its high modularity and flexibility, it also has been extended to tackle different problems, such as image and video captioning, sentence classification and visual question answering.


Author(s):  
Lucia Specia ◽  
Yorick Wilks

Machine Translation (MT) is and always has been a core application in the field of natural-language processing. It is a very active research area and it has been attracting significant commercial interest, most of which has been driven by the deployment of corpus-based, statistical approaches, which can be built in a much shorter time and at a fraction of the cost of traditional, rule-based approaches, and yet produce translations of comparable or superior quality. This chapter aims at introducing MT and its main approaches. It provides a historical overview of the field, an introduction to different translation methods, both rationalist (rule-based) and empirical, and a more in depth description of state-of-the-art statistical methods. Finally, it covers popular metrics to evaluate the output of machine translation systems.


2013 ◽  
Vol 21 (2) ◽  
pp. 227-249 ◽  
Author(s):  
JINGBO ZHU ◽  
QIANG LI ◽  
TONG XIAO

AbstractMost statistical machine translation systems typically rely on word alignments to extract translation rules. This approach would suffer from a practical problem that even one spurious word alignment link can prevent some desirable translation rules from being extracted. To address this issue, this paper presents two approaches, referred to as sub-tree alignment and phrase-based forced decoding methods, to automatically learn translation span alignments from parallel data. Then, we improve the translation rule extraction by deleting spurious links and inserting new links based on bilingual translation span correspondences. Some comparison experiments are designed to demonstrate the effectiveness of the proposed approaches.


2013 ◽  
Vol 21 (2) ◽  
pp. 201-226 ◽  
Author(s):  
DEYI XIONG ◽  
MIN ZHANG

AbstractThe language model is one of the most important knowledge sources for statistical machine translation. In this article, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We introduce algorithms to integrate the two proposed models into two kinds of state-of-the-art phrase-based decoders. Our experimental results on Chinese/Spanish/Vietnamese-to-English show that both models are able to significantly improve translation quality in terms of BLEU and METEOR over a competitive baseline.


2020 ◽  
Vol 34 (05) ◽  
pp. 8042-8049
Author(s):  
Tomoyuki Kajiwara ◽  
Biwa Miura ◽  
Yuki Arase

We tackle the low-resource problem in style transfer by employing transfer learning that utilizes abundantly available raw corpora. Our method consists of two steps: pre-training learns to generate a semantically equivalent sentence with an input assured grammaticality, and fine-tuning learns to add a desired style. Pre-training has two options, auto-encoding and machine translation based methods. Pre-training based on AutoEncoder is a simple way to learn these from a raw corpus. If machine translators are available, the model can learn more diverse paraphrasing via roundtrip translation. After these, fine-tuning achieves high-quality paraphrase generation even in situations where only 1k sentence pairs of the parallel corpus for style transfer is available. Experimental results of formality style transfer indicated the effectiveness of both pre-training methods and the method based on roundtrip translation achieves state-of-the-art performance.


2010 ◽  
Vol 36 (2) ◽  
pp. 247-277 ◽  
Author(s):  
Wei Wang ◽  
Jonathan May ◽  
Kevin Knight ◽  
Daniel Marcu

This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation.


2020 ◽  
Vol 30 (4) ◽  
pp. 225-248
Author(s):  
Cynthia Beatrice Costa ◽  
Igor A. Lourenço da Silva

The quality of state-of-the-art machine translation systems have prompted a number of scholars to tap into the readiness of such systems for “literary” translation. However, studies on literary machine translation have not overtly stated what they consider as literature and mistakenly assume that literary translation is a matter of transferring meaning and/or form from one language into another. By approaching literature as art and literary translation as an artistic work of re-creation, we counterpoint, in this article, the notion that literary machine translation can be seen as an indisputable evolution within translation technology. Ethical concerns may well be utilitarian in studies to date, but by advocating for a deontological approach, we consider that aesthetical value, cultural mediation (which includes the use of paratexts), and authorship of literary translation (should) rank higher in our ethical assessments of the feasibility and actual contributions of literary machine translation.


Sign in / Sign up

Export Citation Format

Share Document