Fast translation rule matching for syntax-based statistical machine translation

Translation rule selection is the task of selecting appropriate translation rules for an ambiguous source-language segment. As translation ambiguities are pervasive in statistical machine translation, we introduce two topic-based models for translation rule selection that incorporate global topic information into translation disambiguation. We associate each synchronous translation rule with source- and target-side topic distributions. With these topic distributions, we propose a topic dissimilarity model that selects desirable (less dissimilar) rules by penalizing rules whose topic distributions are highly dissimilar to those of the given documents. To encourage the use of non-topic-specific translation rules, we also present a topic sensitivity model that balances translation rule selection between generic and topic-specific rules. Furthermore, we project target-side topic distributions onto the source-side topic model space so that we can benefit from topic information of both the source and target languages. We integrate the proposed topic dissimilarity and sensitivity models into hierarchical phrase-based machine translation for synchronous translation rule selection. Experiments show that our topic-based translation rule selection model can substantially improve translation quality.
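
As a rough illustration of the idea in this abstract, the sketch below ranks candidate rules by how dissimilar their topic distributions are to a document's topic distribution, and uses entropy as a stand-in for topic sensitivity. The abstract does not name a dissimilarity measure or a sensitivity measure, so the Hellinger distance, the entropy function, the function names, and the three-topic toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def entropy(p):
    """Shannon entropy in nats; a flatter (more generic) distribution scores higher."""
    return -sum(a * math.log(a) for a in p if a > 0.0)


def rank_rules(rules, doc_topics):
    """Return the candidate rules sorted from least to most dissimilar to the document."""
    return sorted(rules, key=lambda r: hellinger(r["topics"], doc_topics))


# Hypothetical document and rule topic distributions over three topics.
doc_topics = [0.7, 0.2, 0.1]
rules = [
    {"rule": "bank -> riverside sense", "topics": [0.1, 0.1, 0.8]},
    {"rule": "bank -> financial sense", "topics": [0.8, 0.1, 0.1]},
    {"rule": "bank -> generic sense", "topics": [1 / 3, 1 / 3, 1 / 3]},
]

ranked = rank_rules(rules, doc_topics)
for r in ranked:
    print(r["rule"],
          round(hellinger(r["topics"], doc_topics), 3),
          round(entropy(r["topics"]), 3))
```

In a decoder, such a distance would typically enter the model as one more feature, so that it acts as a soft penalty instead of a hard filter; the generic rule's high entropy shows why a separate sensitivity term is useful, since a flat distribution is never very close to a peaked document distribution.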

A study of translation rule classification for syntax-based statistical machine translation

Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation - SSST '09 ◽

10.3115/1626344.1626350 ◽

2009 ◽

Author(s):

Hongfei Jiang ◽

Sheng Li ◽

Muyun Yang ◽

Tiejun Zhao

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation Rule

Neural Network Machine Translation Method Based on Unsupervised Domain Adaptation

Complexity ◽

10.1155/2020/6657344 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Rui Wang

Keyword(s):

Neural Network ◽

Machine Translation ◽

Large Scale ◽

Domain Adaptation ◽

Structural Information ◽

Statistical Machine Translation ◽

Target Language ◽

Great Success ◽

Parallel Corpora ◽

Translation Rule

Relying on large-scale parallel corpora, neural machine translation has achieved great success for certain language pairs. However, acquiring high-quality parallel corpora is one of the main difficulties in machine translation research. To address this problem, this paper proposes an unsupervised domain-adaptive neural machine translation method. The method can be trained using only two unrelated monolingual corpora and still obtain good translation results. First, the method measures how well translation rules match by adding relevant subject information to the rules and dynamically calculating the similarity between each translation rule and the document to be translated during decoding. Second, through joint training on multiple tasks, the source language can learn useful semantic and structural information from the monolingual corpus of a third language that is not parallel to either of the two current languages during translation into the target language. Experimental results show that the method obtains better results than traditional statistical machine translation.

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine translation is the automatic translation of sentences from one language into another. This paper tests the effectiveness of a new model of machine translation, factored machine translation. We compare the performance of an unfactored system, our baseline, with that of the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The tool we use is MOSES, which is free to download and use. We found, however, that the unfactored model scored over 24 BLEU and outperformed the factored model, which scored below 24 BLEU in all cases. In terms of the number of words translated, however, all factored models outperform the unfactored model.
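
For readers unfamiliar with the metric used in this comparison, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale, assuming one reference per hypothesis, pre-tokenized input, n-grams up to length 4 with uniform weights, and no smoothing. It is not the scoring script the authors used, and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100): geometric mean of clipped n-gram precisions times a brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


reference = "the cat sat on the mat".split()
exact = corpus_bleu([reference], [reference])
partial = corpus_bleu(["the cat sat on mat".split()], [reference])
disjoint = corpus_bleu(["a dog ran in a park".split()], [reference])
print(exact, partial, disjoint)
```

Because BLEU rewards exact n-gram overlap with the reference, a system can translate more words yet still score lower, which is consistent with the contrast the abstract draws between BLEU and the number of words translated.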

Proceedings of the Workshop on Statistical Machine Translation - StatMT '06

10.3115/1654650 ◽

2006 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation

Proceedings of the Second Workshop on Statistical Machine Translation - StatMT '07

10.3115/1626355 ◽

2007 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation

Improve Statistical Machine Translation with Context-Sensitive Bilingual Semantic Embedding Model

10.3115/v1/d14-1015 ◽

2014 ◽

Cited By ~ 3

Author(s):

Haiyang Wu ◽

Daxiang Dong ◽

Xiaoguang Hu ◽

Dianhai Yu ◽

Wei He ◽

...

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Context Sensitive ◽

Semantic Embedding

Synchronous Tree Sequence Substitution Grammar for Statistical Machine Translation

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2009.01317 ◽

2009 ◽

Vol 35 (10) ◽

pp. 1317-1326

Author(s):

Hong-Fei JIANG ◽

Sheng LI ◽

Min ZHANG ◽

Tie-Jun ZHAO ◽

Mu-Yun YANG

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Sequence Substitution

Analysis Accuracy of Similar Word Based Clustering (EWSB) Algorithm on Machine Translator Bahasa Indonesia-Minang

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v3i3.241 ◽

2018 ◽

Vol 3 (3) ◽

Author(s):

Herry Sujaini

Keyword(s):

Machine Translation ◽

Clustering Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Word Similarity ◽

Similar Word ◽

Word Clustering ◽

Translation Accuracy ◽

Bahasa Indonesia

Extended Word Similarity Based (EWSB) clustering is a word clustering algorithm based on word similarity values computed from a corpus. One benefit of clustering with this algorithm is improved translation quality in statistical machine translation. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, where the algorithm was applied to Indonesian as the target language. This paper discusses the results of applying the EWSB algorithm to an Indonesian-to-Minang statistical machine translator, where the algorithm is applied to Minang as the target language. The research found that the EWSB algorithm is quite effective when applied to Minang as the target language. The results of this study indicate that the EWSB algorithm can improve translation accuracy by 6.36%.

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective of this paper is to analyze translation based on an English-Dogri parallel corpus. Machine translation is the translation from one language into another language. Machine translation is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
