N-gram-based Machine Translation

This article describes in detail an n-gram approach to statistical machine translation. This approach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions. Translation performance, which happens to be in the state of the art, is demonstrated with Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary Sessions (EPPS).

Download Full-text

Sulis: An Open Source Transfer Decoder for Deep Syntactic Statistical Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-010-0005-7 ◽

2010 ◽

Vol 93 (1) ◽

pp. 17-26 ◽

Cited By ~ 1

Author(s):

Yvette Graham

Keyword(s):

Open Source ◽

Linear Combination ◽

Machine Translation ◽

Statistical Machine Translation ◽

Language Model ◽

Beam Search ◽

Translation Model ◽

Transfer Rules ◽

Log Linear

Sulis: An Open Source Transfer Decoder for Deep Syntactic Statistical Machine Translation In this paper, we describe an open source transfer decoder for Deep Syntactic Transfer-Based Statistical Machine Translation. Transfer decoding involves the application of transfer rules to a SL structure. The N-best TL structures are found via a beam search of TL hypothesis structures which are ranked via a log-linear combination of feature scores, such as translation model and dependency-based language model.

Download Full-text

The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics ◽

10.1162/0891201042544884 ◽

2004 ◽

Vol 30 (4) ◽

pp. 417-449 ◽

Cited By ~ 212

Author(s):

Franz Josef Och ◽

Hermann Ney

Keyword(s):

Machine Translation ◽

Word Order ◽

Search Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Linear Modeling ◽

Translation Model ◽

Log Linear ◽

Translation Systems ◽

English Canadian

A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source-channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.

Download Full-text

Research on statistical machine translation model based on deep neural network

Computing ◽

10.1007/s00607-019-00752-1 ◽

2019 ◽

Vol 102 (3) ◽

pp. 643-661 ◽

Cited By ~ 2

Author(s):

Ying Xia

Keyword(s):

Neural Network ◽

Machine Translation ◽

Deep Neural Network ◽

Statistical Machine Translation ◽

Translation Model ◽

Model Based

Download Full-text

Query Rewriting Using Monolingual Statistical Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00010 ◽

2010 ◽

Vol 36 (3) ◽

pp. 569-582 ◽

Cited By ~ 23

Author(s):

Stefan Riezler ◽

Yi Liu

Keyword(s):

Machine Translation ◽

Query Expansion ◽

Web Search ◽

Query Language ◽

State Of The Art ◽

Statistical Machine Translation ◽

Target Language ◽

Translation Model ◽

User Query ◽

Query Logs

Long queries often suffer from low recall in Web search due to conjunctive term matching. The chances of matching words in relevant documents can be increased by rewriting query terms into new terms with similar statistical properties. We present a comparison of approaches that deploy user query logs to learn rewrites of query terms into terms from the document space. We show that the best results are achieved by adopting the perspective of bridging the “lexical chasm” between queries and documents by translating from a source language of user queries into a target language of Web documents. We train a state-of-the-art statistical machine translation model on query-snippet pairs from user query logs, and extract expansion terms from the query rewrites produced by the monolingual translation system. We show in an extrinsic evaluation in a real-world Web search task that the combination of a query-to-snippet translation model with a query language model achieves improved contextual query expansion compared to a state-of-the-art query expansion model that is trained on the same query log data.

Download Full-text

String-to-Dependency Statistical Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00015 ◽

2010 ◽

Vol 36 (4) ◽

pp. 649-671 ◽

Cited By ~ 13

Author(s):

Libin Shen ◽

Jinxi Xu ◽

Ralph Weischedel

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Language Model ◽

Long Distance ◽

Model Experiments ◽

N Gram

We propose a novel string-to-dependency algorithm for statistical machine translation. This algorithm employs a target dependency language model during decoding to exploit long distance word relations, which cannot be modeled with a traditional n-gram language model. Experiments show that the algorithm achieves significant improvement in MT performance over a state-of-the-art hierarchical string-to-string system on NIST MT06 and MT08 newswire evaluation sets.

Download Full-text

A statistical machine translation model based on a synthetic synchronous grammar

10.3115/1667583.1667623 ◽

2009 ◽

Cited By ~ 1

Author(s):

Hongfei Jiang ◽

Muyun Yang ◽

Tiejun Zhao ◽

Sheng Li ◽

Bo Wang

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation Model ◽

Model Based

Download Full-text

Neural Machine Translation with Joint Representation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6344 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8285-8292

Author(s):

Yanyang Li ◽

Qiang Wang ◽

Tong Xiao ◽

Tongran Liu ◽

Jingbo Zhu

Keyword(s):

Machine Translation ◽

English Translation ◽

Large Scale ◽

State Of The Art ◽

Statistical Machine Translation ◽

The State ◽

Small Scale ◽

Neural Machine Translation ◽

Joint Representation

Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, the recent Neural Machine Translation (NMT) systems resort to the attention which partially encodes the interaction for efficiency. In this paper, we employ Joint Representation that fully accounts for each possible interaction. We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation. The resulting Reformer models offer a new Sequence-to-Sequence modelling paradigm besides the Encoder-Decoder framework and outperform the Transformer baseline in either the small scale IWSLT14 German-English, English-German and IWSLT15 Vietnamese-English or the large scale NIST12 Chinese-English translation tasks by about 1 BLEU point. We also propose a systematic model scaling approach, allowing the Reformer model to beat the state-of-the-art Transformer in IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer parameters. The code is publicly available at https://github.com/lyy1994/reformer.

Download Full-text

Statistical Machine Translation Model Based on a Synchronous Tree-Substitution Grammar

Journal of Software ◽

10.3724/sp.j.1001.2009.03409 ◽

2010 ◽

Vol 20 (5) ◽

pp. 1241-1253

Author(s):

Hong-Fei JIANG ◽

Sheng LI ◽

Guo-Hong FU ◽

Tie-Jun ZHAO ◽

Min ZHANG

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation Model ◽

Model Based

Download Full-text

A Topic-Triggered Translation Model for Statistical Machine Translation

Chinese Journal of Electronics ◽

10.1049/cje.2016.10.007 ◽

2017 ◽

Vol 26 (1) ◽

pp. 65-72 ◽

Cited By ~ 2

Author(s):

Jinsong Su ◽

Zhihao Wang ◽

Qingqiang Wu ◽

Junfeng Yao ◽

Fei Long ◽

...

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation Model

Download Full-text

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach which performs much better than the Statistical machine translation (SMT) models when there is an abundance of parallel corpus. However, vanilla NMT is primarily based upon word-level with a fixed vocabulary. Therefore, low resource morphologically rich languages such as Sinhala are mostly affected by the out of vocabulary (OOV) and Rare word problems. Recent advancements in subword techniques have opened up opportunities for low resource communities by enabling open vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on English Sinhala language pair. Our models demonstrate that subword segmentation strategies along with the state-of-the-art NMT can perform remarkably when translating English sentences into a rich morphology language regardless of a large parallel corpus.

Download Full-text