Backward and trigger-based language models for statistical machine translation

The language model is one of the most important knowledge sources for statistical machine translation. In this article, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We introduce algorithms to integrate the two proposed models into two kinds of state-of-the-art phrase-based decoders. Our experimental results on Chinese/Spanish/Vietnamese-to-English show that both models are able to significantly improve translation quality in terms of BLEU and METEOR over a competitive baseline.
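
As a rough illustration of the two ideas named in this abstract, the sketch below trains a toy bigram model that can read sentences right-to-left (a backward language model) and scores ordered word pairs by pointwise mutual information (a trigger score). It is not the authors' implementation: the add-one smoothing, the sentence-level co-occurrence counts, and the names `train_bigram` and `train_pmi` are all assumptions made for this example, and decoder integration is not shown.

```python
import math
from collections import Counter


def _pad(sent, backward):
    """Optionally reverse the sentence, then add boundary markers."""
    toks = list(reversed(sent)) if backward else list(sent)
    return ["<s>"] + toks + ["</s>"]


def train_bigram(sentences, backward=False):
    """Return a log-probability scorer from an add-one-smoothed bigram model.

    With backward=True each sentence is reversed first, so every word is
    conditioned on the word that FOLLOWS it in the original order.
    """
    context, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = _pad(sent, backward)
        vocab.update(toks)
        for prev, cur in zip(toks, toks[1:]):
            context[prev] += 1
            bigrams[(prev, cur)] += 1
    size = len(vocab)

    def logprob(sent):
        toks = _pad(sent, backward)
        return sum(
            math.log((bigrams[(p, c)] + 1) / (context[p] + size))
            for p, c in zip(toks, toks[1:])
        )

    return logprob


def train_pmi(sentences):
    """Return pmi(x, y) for 'x occurs before y in the same sentence'.

    Assumption: probabilities are estimated from sentence-level counts,
    and unseen pairs score 0.0.
    """
    n = len(sentences)
    word_count, pair_count = Counter(), Counter()
    for sent in sentences:
        word_count.update(set(sent))
        pairs = {(x, y) for i, x in enumerate(sent) for y in sent[i + 1:]}
        pair_count.update(pairs)

    def pmi(x, y):
        if pair_count[(x, y)] == 0:
            return 0.0
        return math.log(pair_count[(x, y)] * n / (word_count[x] * word_count[y]))

    return pmi


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
forward_lm = train_bigram(corpus)
backward_lm = train_bigram(corpus, backward=True)
trigger = train_pmi(corpus)

hypothesis = ["the", "cat", "sat"]
# A decoder would typically add such scores as weighted features; here we
# simply print them side by side.
print(forward_lm(hypothesis), backward_lm(hypothesis), trigger("the", "sat"))
```

The trigger score depends only on which words co-occur in a sentence, not on how far apart they are, which is how such a feature can reach beyond a fixed n-gram window.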

String-to-Dependency Statistical Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00015 ◽

2010 ◽

Vol 36 (4) ◽

pp. 649-671 ◽

Cited By ~ 13

Author(s):

Libin Shen ◽

Jinxi Xu ◽

Ralph Weischedel

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Language Model ◽

Long Distance ◽

Model Experiments ◽

N Gram

We propose a novel string-to-dependency algorithm for statistical machine translation. This algorithm employs a target dependency language model during decoding to exploit long distance word relations, which cannot be modeled with a traditional n-gram language model. Experiments show that the algorithm achieves significant improvement in MT performance over a state-of-the-art hierarchical string-to-string system on NIST MT06 and MT08 newswire evaluation sets.

Pushdown Automata in Statistical Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00197 ◽

2014 ◽

Vol 40 (3) ◽

pp. 687-723 ◽

Cited By ~ 3

Author(s):

Cyril Allauzen ◽

Bill Byrne ◽

Adrià de Gispert ◽

Gonzalo Iglesias ◽

Michael Riley

Keyword(s):

Machine Translation ◽

Large Scale ◽

Complexity Analysis ◽

Statistical Machine Translation ◽

Language Model ◽

General Purpose ◽

Language Models ◽

Experimental Conditions ◽

Context Free ◽

Pushdown Automata

This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.

Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00231 ◽

2013 ◽

Vol 1 ◽

pp. 327-340 ◽

Cited By ~ 4

Author(s):

Arianna Bisazza ◽

Marcello Federico

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Search Space ◽

Input Word ◽

Binary Classifier ◽

Crucial Issue ◽

Trade Off ◽

Translation Quality ◽

Very High

Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. In fact, the optimal trade-off between accuracy and complexity of decoding is nowadays reached by harshly limiting the input permutation space. We propose a method to dynamically shape such space and, thus, capture long-range word movements without hurting translation quality or decoding time. The space defined by loose reordering constraints is dynamically pruned through a binary classifier that predicts whether a given input word should be translated right after another. The integration of this model into a phrase-based decoder improves a strong Arabic-English baseline already including state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.

Hierarchical Phrase-Based Translation with Jane 2

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-012-0007-8 ◽

2012 ◽

Vol 98 (1) ◽

pp. 37-50

Author(s):

Matthias Huck ◽

Jan-Thorsten Peter ◽

Markus Freitag ◽

Stephan Peitz ◽

Hermann Ney

Keyword(s):

Open Source ◽

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Experimental Results ◽

Insertion And Deletion

In this paper, we give a survey of several recent extensions to hierarchical phrase-based machine translation that have been implemented in version 2 of Jane, RWTH's open source statistical machine translation toolkit. We focus on the following techniques: insertion and deletion models, lexical scoring variants, reordering extensions with non-lexicalized reordering rules and with a discriminative lexicalized reordering model, and soft string-to-dependency hierarchical machine translation. We describe the fundamentals of each of these techniques and present experimental results obtained with Jane 2 to confirm their usefulness in state-of-the-art hierarchical phrase-based translation (HPBT).

A SYSTEMATIC READING IN STATISTICAL TRANSLATION: FROM THE STATISTICAL MACHINE TRANSLATION TO THE NEURAL TRANSLATION MODELS.

Journal of Information and Communication Technology ◽

10.32890/jict2017.16.2.8239 ◽

2017 ◽

Author(s):

Zakaria El Maazouzi ◽

Badr Eddine EL Mohajir ◽

Mohammed Al Achhab

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Statistical Machine Translation ◽

High Accuracy ◽

Neural Machine Translation ◽

Translation Quality ◽

Automatic Translation

Achieving high accuracy in automatic translation tasks has been one of the challenging goals for researchers in the area of machine translation for decades. Researchers in the field have therefore always been eager to explore new ways to improve machine translation. Automatic translation, as a key application in the natural language processing domain, has developed many approaches, namely statistical machine translation and, recently, neural machine translation, which have largely improved translation quality, especially for Latin languages. They have even made it possible for the translation of some language pairs to approach human translation quality. In this paper, we present a survey of the state of the art of statistical translation, where we describe the different existing methodologies, and we overview the recent research studies while pointing out the main strengths and limitations of the different approaches.

Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00131 ◽

2015 ◽

Vol 3 ◽

pp. 169-182 ◽

Cited By ~ 4

Author(s):

Rico Sennrich

Keyword(s):

Statistical Machine Translation ◽

Language Model ◽

Language Models ◽

Translation Quality ◽

N Gram ◽

Evaluation Metric ◽

Log Linear ◽

Dependency Structures ◽

Free Word

The role of language models in SMT is to promote fluent translation output, but traditional n-gram language models are unable to capture fluency phenomena between distant words, such as some morphological agreement phenomena, subcategorisation, and syntactic collocations with string-level gaps. Syntactic language models have the potential to fill this modelling gap. We propose a language model for dependency structures that is relational rather than configurational and thus particularly suited for languages with a (relatively) free word order. It is trainable with Neural Networks, and not only improves over standard n-gram language models, but also outperforms related syntactic language models. We empirically demonstrate its effectiveness in terms of perplexity and as a feature function in string-to-tree SMT from English to German and Russian. We also show that using a syntactic evaluation metric to tune the log-linear parameters of an SMT system further increases translation quality when coupled with a syntactic language model.

Margin Infused Relaxed Algorithm for Moses

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-011-0012-3 ◽

2011 ◽

Vol 96 (1) ◽

pp. 69-78 ◽

Cited By ~ 5

Author(s):

Eva Hasler ◽

Barry Haddow ◽

Philipp Koehn

Keyword(s):

Open Source ◽

Machine Translation ◽

Error Rate ◽

Statistical Machine Translation ◽

Experimental Results ◽

Minimum Error ◽

Feature Sets ◽

Translation Quality ◽

Core Feature ◽

Minimum Error Rate Training

We describe an open-source implementation of the Margin Infused Relaxed Algorithm (MIRA) for statistical machine translation (SMT). The implementation is part of the Moses toolkit and can be used as an alternative to standard minimum error rate training (MERT). A description of the implementation and its usage on core feature sets as well as large, sparse feature sets is given and we report experimental results comparing the performance of MIRA with MERT in terms of translation quality and stability.

CloudLM: a Cloud-based Language Model for Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2016-0002 ◽

2016 ◽

Vol 105 (1) ◽

pp. 51-61 ◽

Cited By ~ 1

Author(s):

Jorge Ferrández-Tordera ◽

Sergio Ortiz-Rojas ◽

Antonio Toral

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Language Model ◽

Essential Element ◽

Language Models ◽

Language Modelling ◽

Statistical Approaches

Language models (LMs) are an essential element in statistical approaches to natural language processing for tasks such as speech recognition and machine translation (MT). The advent of big data leads to the availability of massive amounts of data to build LMs, and in fact, for the most prominent languages, using current techniques and hardware, it is not feasible to train LMs with all the data available nowadays. At the same time, it has been shown that the more data is used for an LM the better the performance, e.g. for MT, without any indication yet of reaching a plateau. This paper presents CloudLM, an open-source cloud-based LM intended for MT, which allows querying distributed LMs. CloudLM relies on Apache Solr and provides the functionality of state-of-the-art language modelling (it builds upon KenLM), while allowing massive LMs to be queried (as the use of local memory is drastically reduced), at the expense of slower decoding speed.

A General Approach for Word Reordering in English-Vietnamese-English Statistical Machine Translation

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500244 ◽

2015 ◽

Vol 24 (06) ◽

pp. 1550024

Author(s):

Nhung T. H. Nguyen ◽

Vinh Q. Le ◽

Minh-Quoc Nghiem ◽

Dien Dinh

Keyword(s):

Machine Translation ◽

Long Range ◽

Word Order ◽

Short Range ◽

Statistical Machine Translation ◽

Experimental Results ◽

Translation Quality ◽

Part Of Speech

Word ordering is among the most important problems in machine translation. In this paper, we describe a general approach to solve this problem in English-Vietnamese-English statistical machine translation. Our model automatically extracts short-range and long-range reordering rules based on part-of-speech tags and alignment information. Our method, therefore, covers both local and global word order, and is more versatile than other methods. To obtain a better set of reordering rules, we omit generated rules if their weight is lower than a threshold [Formula: see text]. The experimental results have shown that the translation quality has been improved significantly compared to the distance-based reordering model and comparable to the lexicalized model. Our approach is not only suitable for English-Vietnamese but also for language pairs which have many differences in syntax, such as English-Chinese and Chinese-Vietnamese.

Character n-Gram Embeddings to Improve RNN Language Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015074 ◽

2019 ◽

Vol 33 ◽

pp. 5074-5082 ◽

Cited By ~ 2

Author(s):

Sho Takase ◽

Jun Suzuki ◽

Masaaki Nagata

Keyword(s):

Neural Network ◽

Machine Translation ◽

Recurrent Neural Network ◽

Language Model ◽

Language Modeling ◽

Word Embedding ◽

Experimental Results ◽

Language Models ◽

Word Embeddings ◽

N Gram

This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in the field of word embedding construction (Wieting et al. 2016). Our proposed method constructs word embeddings from character n-gram embeddings and combines them with ordinary word embeddings. We demonstrate that the proposed method achieves the best perplexities on the language modeling datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct experiments on application tasks: machine translation and headline generation. The experimental results indicate that our proposed method also positively affects these tasks.
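
To make the embedding construction in this abstract concrete, here is a minimal sketch that builds a word vector from character n-gram vectors and combines it with an ordinary word vector. The abstract does not say how the pieces are combined, so the plain summation used here is an assumption, as are the boundary markers `^` and `$`, the random initialisation, and the names `char_ngrams` and `CharNgramEmbedder`; in a real model the vectors would be trained parameters feeding an RNN.

```python
import random


def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams with boundary markers."""
    padded = "^" + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramEmbedder:
    """Toy embedding table: word vector plus the sum of its n-gram vectors."""

    def __init__(self, dim=4, n=3, seed=0):
        self.dim = dim
        self.n = n
        self.rng = random.Random(seed)
        self.word_vecs = {}   # ordinary word embeddings
        self.ngram_vecs = {}  # character n-gram embeddings

    def _lookup(self, table, key):
        # Create a small random vector on first use; a real model would
        # learn these values during training instead.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return table[key]

    def embed(self, word):
        # Sum the vectors of all character n-grams of the word.
        total = [0.0] * self.dim
        for gram in char_ngrams(word, self.n):
            for i, value in enumerate(self._lookup(self.ngram_vecs, gram)):
                total[i] += value
        # Combine with the ordinary word vector (assumed here: addition).
        word_vec = self._lookup(self.word_vecs, word)
        return [w + t for w, t in zip(word_vec, total)]


embedder = CharNgramEmbedder()
print(char_ngrams("cat"))
print(embedder.embed("cat"))
```

Because "cat" and "cats" share the n-grams `^ca` and `cat`, their combined vectors share parameters, which is one way character information can help with rare or related word forms.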
