Unsupervised Sub-tree Alignment for Tree-to-Tree Translation

This article presents a probabilistic sub-tree alignment model and its application to tree-to-tree machine translation. Unlike previous work, we do not resort to surface heuristics or expensive annotated data, but instead derive an unsupervised model to infer the syntactic correspondence between two languages. More importantly, the developed model is syntactically motivated and does not rely on word alignments. As a by-product, our model outputs a sub-tree alignment matrix encoding a large number of diverse alignments between syntactic structures, from which machine translation systems can efficiently extract translation rules that are often filtered out due to errors in the 1-best alignment. Experimental results show that the proposed approach outperforms three state-of-the-art baseline approaches in both alignment accuracy and grammar quality. When applied to machine translation, our approach yields a +1.0 BLEU improvement and a 0.9 TER reduction on the NIST machine translation evaluation corpora. With tree binarization and fuzzy decoding, it even outperforms a state-of-the-art hierarchical phrase-based system.

Visualization, Search and Analysis of Hierarchical Translation Equivalence in Machine Translation Data

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2014-0003 ◽

2014 ◽

Vol 101 (1) ◽

pp. 43-54 ◽

Cited By ~ 1

Author(s):

Gideon Maillette de Buy Wenniger ◽

Khalil Sima’an

Keyword(s):

Machine Translation ◽

Equivalence Relations ◽

Qualitative And Quantitative Analysis ◽

Complete Representation ◽

Qualitative And Quantitative ◽

Hierarchical Relations ◽

Word Alignments ◽

Detailed Statistical Analysis ◽

Translation Systems ◽

Search Capability

Translation equivalence constitutes the basis of all Machine Translation systems including the recent hierarchical and syntax-based systems. For hierarchical MT research it is important to have a tool that supports the qualitative and quantitative analysis of hierarchical translation equivalence relations extracted from word alignments in data. In this paper we present such a toolkit and exemplify some of its uses. The main challenges taken up in designing this tool are the efficient and compact, yet complete, representation of hierarchical translation equivalence coupled with an intuitive visualization of these hierarchical relations. We exploit a new hierarchical representation, called Hierarchical Alignment Trees (HATs), which is based on an extension of the algorithms used for factorizing n-ary branching SCFG rules into their minimally-branching equivalents. Our toolkit further provides a search capability based on hierarchically relevant properties of word alignments and/or translation equivalence relations. Finally, the tool allows detailed statistical analysis of word alignments, thereby providing a breakdown of alignment statistics according to the complexity of translation equivalence units or reordering phenomena. We illustrate this with an empirical study of the coverage of inversion-transduction grammars for a number of corpora enriched with manual or automatic word alignments, followed by a breakdown of corpus statistics according to reordering complexity.
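
As a rough illustration of the inversion-transduction grammar coverage question raised in this abstract, the sketch below checks whether a one-to-one word alignment, given as a permutation, can be built from binary straight and inverted merges alone. It uses the well-known shift-reduce test for binarizable permutations, not the toolkit's own HAT algorithm; the function name `is_itg_binarizable` and the restriction to permutations of 0..n-1 are assumptions made for this example.

```python
def is_itg_binarizable(perm):
    """Return True if `perm` (assumed to be a permutation of 0..n-1, where
    perm[i] is the target position of source word i) can be derived using
    only binary straight or inverted merges of adjacent spans."""
    stack = []  # each entry is a contiguous target span (lo, hi)
    for p in perm:
        stack.append((p, p))  # shift: the new word covers a single position
        # reduce: merge the top two spans while they are adjacent on the target side
        while len(stack) >= 2:
            lo2, hi2 = stack[-1]
            lo1, hi1 = stack[-2]
            if lo2 == hi1 + 1 or hi2 + 1 == lo1:  # straight or inverted order
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # the permutation is binarizable only if everything collapsed into one span
    return len(stack) == 1
```

Under these assumptions, a permutation such as [1, 3, 0, 2] fails the test, while [1, 0, 3, 2] passes. Many-to-many or discontiguous alignments fall outside this simple check and need the richer representation the abstract describes.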

Hierarchical Phrase-Based Translation with Jane 2

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-012-0007-8 ◽

2012 ◽

Vol 98 (1) ◽

pp. 37-50

Author(s):

Matthias Huck ◽

Jan-Thorsten Peter ◽

Markus Freitag ◽

Stephan Peitz ◽

Hermann Ney

Keyword(s):

Open Source ◽

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Experimental Results ◽

Insertion And Deletion

In this paper, we give a survey of several recent extensions to hierarchical phrase-based machine translation that have been implemented in version 2 of Jane, RWTH's open source statistical machine translation toolkit. We focus on the following techniques: insertion and deletion models, lexical scoring variants, reordering extensions with non-lexicalized reordering rules and with a discriminative lexicalized reordering model, and soft string-to-dependency hierarchical machine translation. We describe the fundamentals of each of these techniques and present experimental results obtained with Jane 2 to confirm their usefulness in state-of-the-art hierarchical phrase-based translation (HPBT).

A review of the state-of-the-art in automatic post-editing

Machine Translation ◽

10.1007/s10590-020-09252-y ◽

2020 ◽

Author(s):

Félix do Carmo ◽

Dimitar Shterionov ◽

Joss Moorkens ◽

Joachim Wagner ◽

Murhaf Hossari ◽

...

Keyword(s):

Machine Translation ◽

State Of The Art ◽

The State ◽

Definition Of ◽

Translation Systems

This article presents a review of the evolution of automatic post-editing, a term that describes methods to improve the output of machine translation systems, based on knowledge extracted from datasets that include post-edited content. The article describes the specificity of automatic post-editing in comparison with other tasks in machine translation, and it discusses how it may function as a complement to them. Particular detail is given in the article to the five-year period that covers the shared tasks presented in WMT conferences (2015–2019). In this period, discussion of automatic post-editing evolved from the definition of its main parameters to an announced demise, associated with the difficulties in improving output obtained by neural methods, which was then followed by renewed interest. The article debates the role and relevance of automatic post-editing, both as an academic endeavour and as a useful application in commercial workflows.

NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2018-0010 ◽

2018 ◽

Vol 111 (1) ◽

pp. 113-124 ◽

Cited By ~ 3

Author(s):

Álvaro Peris ◽

Francisco Casacuberta

Keyword(s):

Machine Translation ◽

Question Answering ◽

State Of The Art ◽

Translation System ◽

Extended Version ◽

Neural Machine Translation ◽

Video Captioning ◽

High Level ◽

Translation Systems

We present NMT-Keras, a flexible toolkit for training deep learning models, which puts a particular emphasis on the development of advanced applications of neural machine translation systems, such as interactive-predictive translation protocols and long-term adaptation of the translation system via continuous learning. NMT-Keras is based on an extended version of the popular Keras library, and it runs on Theano and TensorFlow. State-of-the-art neural machine translation models are deployed and used following the high-level framework provided by Keras. Given its high modularity and flexibility, it has also been extended to tackle different problems, such as image and video captioning, sentence classification and visual question answering.

Machine Translation

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.26 ◽

2016 ◽

Author(s):

Lucia Specia ◽

Yorick Wilks

Keyword(s):

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Research Area ◽

Rule Based ◽

Active Research ◽

Translation Methods ◽

The Cost ◽

Translation Systems ◽

Active Research Area

Machine Translation (MT) is and always has been a core application in the field of natural-language processing. It is a very active research area and it has been attracting significant commercial interest, most of which has been driven by the deployment of corpus-based, statistical approaches, which can be built in a much shorter time and at a fraction of the cost of traditional, rule-based approaches, and yet produce translations of comparable or superior quality. This chapter aims at introducing MT and its main approaches. It provides a historical overview of the field, an introduction to different translation methods, both rationalist (rule-based) and empirical, and a more in depth description of state-of-the-art statistical methods. Finally, it covers popular metrics to evaluate the output of machine translation systems.

Improving syntactic rule extraction through deleting spurious links with translation span alignment

Natural Language Engineering ◽

10.1017/s1351324913000260 ◽

2013 ◽

Vol 21 (2) ◽

pp. 227-249 ◽

Cited By ~ 4

Author(s):

Jingbo Zhu ◽

Qiang Li ◽

Tong Xiao

Keyword(s):

Practical Problem ◽

Statistical Machine Translation ◽

Rule Extraction ◽

Word Alignment ◽

Syntactic Rule ◽

Tree Alignment ◽

Parallel Data ◽

Translation Rule ◽

Word Alignments ◽

Translation Systems

Most statistical machine translation systems rely on word alignments to extract translation rules. This approach suffers from a practical problem: even one spurious word alignment link can prevent some desirable translation rules from being extracted. To address this issue, this paper presents two approaches, referred to as sub-tree alignment and phrase-based forced decoding methods, to automatically learn translation span alignments from parallel data. Then, we improve the translation rule extraction by deleting spurious links and inserting new links based on bilingual translation span correspondences. Some comparison experiments are designed to demonstrate the effectiveness of the proposed approaches.

Backward and trigger-based language models for statistical machine translation

Natural Language Engineering ◽

10.1017/s1351324913000168 ◽

2013 ◽

Vol 21 (2) ◽

pp. 201-226 ◽

Cited By ~ 2

Author(s):

Deyi Xiong ◽

Min Zhang

Keyword(s):

Mutual Information ◽

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Language Model ◽

Experimental Results ◽

Language Models ◽

Knowledge Sources ◽

Long Distance ◽

Translation Quality

The language model is one of the most important knowledge sources for statistical machine translation. In this article, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We introduce algorithms to integrate the two proposed models into two kinds of state-of-the-art phrase-based decoders. Our experimental results on Chinese/Spanish/Vietnamese-to-English show that both models are able to significantly improve translation quality in terms of BLEU and METEOR over a competitive baseline.
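
The two model types named in this abstract can be illustrated with simple count-based estimates. The sketch below is a toy under stated assumptions, not the authors' formulation: `backward_bigram_probs` gives unsmoothed maximum-likelihood estimates of a word given the word that follows it, and `trigger_pmi` scores ordered word pairs by pointwise mutual information over sentence-level co-occurrence. Both function names are hypothetical, and sentences are assumed to be lists of tokens.

```python
import math
from collections import Counter


def backward_bigram_probs(sentences):
    """Unsmoothed MLE of P(word | following word), i.e. a backward bigram model."""
    pair_counts = Counter()
    following_counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        for word, following in zip(tokens, tokens[1:]):
            pair_counts[(word, following)] += 1
            following_counts[following] += 1
    return {
        (word, following): count / following_counts[following]
        for (word, following), count in pair_counts.items()
    }


def trigger_pmi(sentences):
    """Pointwise mutual information of ordered pairs (x, y) where x occurs
    before y in the same sentence; probabilities are sentence-level fractions."""
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        word_counts.update(set(sent))  # count each word once per sentence
        pairs = {
            (sent[i], sent[j])
            for i in range(len(sent))
            for j in range(i + 1, len(sent))
        }
        pair_counts.update(pairs)  # count each ordered pair once per sentence
    return {
        (x, y): math.log((count / n) / ((word_counts[x] / n) * (word_counts[y] / n)))
        for (x, y), count in pair_counts.items()
    }
```

Because the trigger pairs are not restricted to adjacent words, they can relate words at any distance within a sentence, which is the kind of dependency a fixed-order n-gram model cannot see. A practical system would add smoothing and pruning, which this sketch omits.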

Monolingual Transfer Learning via Bilingual Translators for Style-Sensitive Paraphrase Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6314 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8042-8049

Author(s):

Tomoyuki Kajiwara ◽

Biwa Miura ◽

Yuki Arase

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

State Of The Art ◽

Experimental Results ◽

Fine Tuning ◽

Training Methods ◽

High Quality ◽

Style Transfer ◽

Parallel Corpus ◽

Paraphrase Generation

We tackle the low-resource problem in style transfer by employing transfer learning that utilizes abundantly available raw corpora. Our method consists of two steps: pre-training learns to generate a grammatical sentence that is semantically equivalent to the input, and fine-tuning learns to add a desired style. Pre-training has two options: auto-encoding and machine translation based methods. Pre-training based on AutoEncoder is a simple way to learn these from a raw corpus. If machine translators are available, the model can learn more diverse paraphrasing via roundtrip translation. After these, fine-tuning achieves high-quality paraphrase generation even in situations where only 1k sentence pairs of the parallel corpus for style transfer are available. Experimental results on formality style transfer indicate the effectiveness of both pre-training methods, and the method based on roundtrip translation achieves state-of-the-art performance.

Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation

Computational Linguistics ◽

10.1162/coli.2010.36.2.09054 ◽

2010 ◽

Vol 36 (2) ◽

pp. 247-277 ◽

Cited By ~ 11

Author(s):

Wei Wang ◽

Jonathan May ◽

Kevin Knight ◽

Daniel Marcu

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Syntactic Structure ◽

Statistical Machine Translation ◽

Training Data ◽

Word Alignment ◽

The Em Algorithm ◽

Rule Application ◽

Word Alignments ◽

Parse Trees

This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation.

On the Translation of Literature as a Human Activity par Excellence

Aletria Revista de Estudos de Literatura ◽

10.35699/2317-2096.2020.22047 ◽

2020 ◽

Vol 30 (4) ◽

pp. 225-248

Author(s):

Cynthia Beatrice Costa ◽

Igor A. Lourenço da Silva

Keyword(s):

Machine Translation ◽

Human Activity ◽

State Of The Art ◽

Literary Translation ◽

Cultural Mediation ◽

Ethical Concerns ◽

Artistic Work ◽

Translation Systems

The quality of state-of-the-art machine translation systems has prompted a number of scholars to tap into the readiness of such systems for “literary” translation. However, studies on literary machine translation have not overtly stated what they consider as literature and mistakenly assume that literary translation is a matter of transferring meaning and/or form from one language into another. By approaching literature as art and literary translation as an artistic work of re-creation, we counterpoint, in this article, the notion that literary machine translation can be seen as an indisputable evolution within translation technology. Ethical concerns may well be utilitarian in studies to date, but by advocating for a deontological approach, we consider that aesthetical value, cultural mediation (which includes the use of paratexts), and authorship of literary translation (should) rank higher in our ethical assessments of the feasibility and actual contributions of literary machine translation.
