Forest to String Based Statistical Machine Translation with Hybrid Word Alignments

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y for some resource-rich language X_2 that is closely related to X_1. This is achieved by taking advantage of the opportunities that vocabulary overlap and similarities between the languages X_1 and X_2 in spelling, word order, and syntax offer: (1) we improve the word alignments for the resource-poor language, (2) we further augment it with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. The evaluation for Indonesian- >English using Malay and for Spanish -> English using Portuguese and pretending Spanish is resource-poor shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary "real'' training data by a factor of 2--5.

Download Full-text

Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation

Computational Linguistics ◽

10.1162/coli.2010.36.2.09054 ◽

2010 ◽

Vol 36 (2) ◽

pp. 247-277 ◽

Cited By ~ 11

Author(s):

Wei Wang ◽

Jonathan May ◽

Kevin Knight ◽

Daniel Marcu

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Syntactic Structure ◽

Statistical Machine Translation ◽

Training Data ◽

Word Alignment ◽

The Em Algorithm ◽

Rule Application ◽

Word Alignments ◽

Parse Trees

This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation.

Download Full-text

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine Translation is a machine that is going to automatically translate given sentences in a language to other particular language. This paper aims to test the effectiveness of a new model of machine translation which is factored machine translation. We compare the performance of the unfactored system as our baseline compared to the factored model in terms of BLEU score. We test the model in German-English language pair using Europarl corpus. The tools we are using is called MOSES. It is freely downloadable and use. We found, however, that the unfactored model scored over 24 in BLEU and outperforms the factored model which scored below 24 in BLEU for all cases. In terms of words being translated, however, all of factored models outperforms the unfactored model.

Download Full-text

Proceedings of the Workshop on Statistical Machine Translation - StatMT '06

10.3115/1654650 ◽

2006 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation

Download Full-text

Proceedings of the Second Workshop on Statistical Machine Translation - StatMT '07

10.3115/1626355 ◽

2007 ◽

Cited By ~ 1

Keyword(s):

Machine Translation ◽

Statistical Machine Translation

Download Full-text

Improve Statistical Machine Translation with Context-Sensitive Bilingual Semantic Embedding Model

10.3115/v1/d14-1015 ◽

2014 ◽

Cited By ~ 3

Author(s):

Haiyang Wu ◽

Daxiang Dong ◽

Xiaoguang Hu ◽

Dianhai Yu ◽

Wei He ◽

...

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Context Sensitive ◽

Semantic Embedding

Download Full-text

Synchronous Tree Sequence Substitution Grammar for Statistical Machine Translation

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2009.01317 ◽

2009 ◽

Vol 35 (10) ◽

pp. 1317-1326

Author(s):

Hong-Fei JIANG ◽

Sheng LI ◽

Min ZHANG ◽

Tie-Jun ZHAO ◽

Mu-Yun YANG

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Sequence Substitution

Download Full-text

Analysis Accuracy of Similar Word Based Clustering (EWSB) Algorithm on Machine Translator Bahasa Indonesia-Minang

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v3i3.241 ◽

2018 ◽

Vol 3 (3) ◽

Author(s):

Herry Sujaini

Keyword(s):

Machine Translation ◽

Clustering Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Word Similarity ◽

Similar Word ◽

Word Clustering ◽

Translation Accuracy ◽

Bahasa Indonesia

Extended Word Similarity Based (EWSB) Clustering is a word clustering algorithm based on the value of words similarity obtained from the computation of a corpus. One of the benefits of clustering with this algorithm is to improve the translation of a statistical machine translation. Previous research proved that EWSB algorithm could improve the Indonesian-English translator, where the algorithm was applied to Indonesian language as target language.This paper discusses the results of a research using EWSB algorithm on a Indonesian to Minang statistical machine translator, where the algorithm is applied to Minang language as the target language. The research obtained resulted that the EWSB algorithm is quite effective when used in Minang language as the target language. The results of this study indicate that EWSB algorithm can improve the translation accuracy by 6.36%.

Download Full-text

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective behind this paper is to analyze the English-Dogri parallel corpus translation. Machine translation is the translation from one language into another language. Machine translation is the biggest application of the Natural Language Processing (NLP). Moses is statistical machine translation system allow to train translation models for any language pair. We have developed translation system using Statistical based approach which helps in translating English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system gives accuracy of 80% in translating English to Dogri and the system gives accuracy of 87% in translating Dogri to English system.

Download Full-text