MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation

2019 ◽  
Vol 28 (3) ◽  
pp. 447-453 ◽  
Author(s):  
Sainik Kumar Mahata ◽  
Dipankar Das ◽  
Sivaji Bandyopadhyay

Abstract Machine translation (MT) is the automatic translation of text from a source language to a target language by a computer system. In the current paper, we propose using recurrent neural networks (RNNs) on top of traditional statistical MT (SMT). We compare the performance of the SMT phrase table to that of the proposed RNN and in turn improve the quality of the MT output. This work was done as part of the shared task provided by MTIL2017. We constructed the traditional MT model using the Moses toolkit and additionally enriched the language model with external data sets. Thereafter, we ranked the phrase tables using an RNN encoder-decoder module created originally as part of the GroundHog project of LISA lab.
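The reranking idea in the abstract can be sketched in a few lines: score each phrase-table entry with a neural model and re-sort by that score. The entries and the stub scorer below are invented for illustration; the paper's actual scorer is an RNN encoder-decoder.

```python
# Toy sketch of neural reranking of phrase-table entries. The phrase pairs
# and scores are invented; rnn_score is a stand-in for an encoder-decoder
# likelihood, not the paper's actual model.
phrase_table = [("la maison", "house the"), ("la maison", "the house")]

def rnn_score(src, tgt):
    # Hypothetical model scores; a real system would run the RNN here.
    return {"the house": 0.9, "house the": 0.2}[tgt]

ranked = sorted(phrase_table, key=lambda pair: rnn_score(*pair), reverse=True)
print(ranked[0][1])  # best-scored target phrase
```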

2021 ◽  
Vol 284 ◽  
pp. 08001
Author(s):  
Ilya Ulitkin ◽  
Irina Filippova ◽  
Natalia Ivanova ◽  
Alexey Poroykov

We report on various approaches to the automatic evaluation of machine translation quality and describe three widely used methods. These methods, based on string matching and n-gram models, make it possible to compare machine translation output against a reference translation. We employ modern metrics for the automatic evaluation of machine translation quality, such as BLEU, F-measure, and TER, to compare translations made by the Google and PROMT neural machine translation systems with translations obtained five years ago, when statistical machine translation and rule-based machine translation algorithms were employed by Google and PROMT, respectively, as the main translation algorithms [6]. Evaluating the candidate texts generated by Google and PROMT against reference translations with an automatic translation evaluation program reveals significant qualitative changes compared with the results obtained five years ago, indicating a dramatic improvement in the above-mentioned online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern systems for the automatic evaluation of translation quality allow errors made by machine translation systems to be identified and systematized, which will enable these systems' translation quality to be improved in the future.
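Of the metrics named above, BLEU is the most common n-gram-based one. A minimal self-contained sketch of sentence-level BLEU follows; the example sentences are invented, and real toolkits compute the metric at corpus level with more careful smoothing.

```python
# Simplified sentence-level BLEU: geometric mean of clipped n-gram
# precisions times a brevity penalty, with add-one smoothing for n > 1.
# A sketch only; toolkits such as sacrebleu handle corpus-level statistics.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    log_p = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum((hyp & ref).values())      # clipped n-gram matches
        total = sum(hyp.values())
        if n == 1:
            p = max(overlap, 1e-9) / total
        else:
            p = (overlap + 1) / (total + 1)      # add-one smoothing
        log_p.append(math.log(p))
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(sum(log_p) / max_n)

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()
print(f"BLEU: {bleu(hypothesis, reference):.3f}")
```

An identical hypothesis and reference yield a score of 1.0; every n-gram mismatch or length difference pulls the score down.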


2013 ◽  
Vol 39 (4) ◽  
pp. 999-1023 ◽  
Author(s):  
Gennadi Lembersky ◽  
Noam Ordan ◽  
Shuly Wintner

Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the “wrong” direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the “right” and the “wrong” directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
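The first adaptation technique above, interpolating two phrase tables with weights chosen to minimize held-out perplexity, can be sketched as follows. All phrase pairs, probabilities, and the held-out set are invented toy values, and a grid search stands in for the EM-style optimization a real system would use.

```python
# Toy mixture model over two phrase tables ("right" and "wrong" translation
# direction), with the interpolation weight chosen by minimizing perplexity
# on a held-out set. All values are invented for illustration.
import math

right_dir = {("maison", "house"): 0.7, ("maison", "home"): 0.3}
wrong_dir = {("maison", "house"): 0.4, ("maison", "home"): 0.6}
held_out = [("maison", "house"), ("maison", "house"), ("maison", "home")]

def perplexity(table, data):
    nll = -sum(math.log(table.get(pair, 1e-9)) for pair in data)
    return math.exp(nll / len(data))

def interpolate(t1, t2, lam):
    keys = set(t1) | set(t2)
    return {k: lam * t1.get(k, 0.0) + (1 - lam) * t2.get(k, 0.0) for k in keys}

# Grid search over the weight; real systems optimize this with EM.
best_ppl, best_lam = min(
    (perplexity(interpolate(right_dir, wrong_dir, l / 10), held_out), l / 10)
    for l in range(11)
)
print(f"best lambda = {best_lam}, held-out perplexity = {best_ppl:.3f}")
```

Note that the best weight is neither 0 nor 1: even the "wrong"-direction table contributes, which is the paper's point.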


Author(s):  
Mir Aadil ◽  
M. Asger

Machine translation is a combination of many complex sub-processes, and the quality of the results of each sub-process, executed in a well-defined sequence, determines the overall accuracy of the translation. The statistical machine translation (SMT) approach considers each sentence in the target language as a possible translation of any source-language sentence. The possibility is measured by probability and, naturally, the sentence with the highest probability is treated as the best translation. SMT is the most favoured approach not only because of its good results for corpus-rich language pairs, but also because of the tools the approach has been enriched with over the past two and a half decades. The paper gives a brief introduction to SMT: its steps and the different tools available for each step.
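The decision rule described above is the classic noisy-channel formulation: choose the target sentence maximizing P(target) × P(source | target). A toy sketch with an invented candidate set and made-up language-model and translation-model probabilities:

```python
# Noisy-channel SMT decision rule over a toy candidate set. The candidates
# and probabilities are invented; a real decoder searches a vast hypothesis
# space rather than enumerating a dictionary.
candidates = {
    "the house is small": {"lm": 0.04,  "tm": 0.30},  # P(e), P(f|e)
    "the home is small":  {"lm": 0.02,  "tm": 0.35},
    "small is the house": {"lm": 0.001, "tm": 0.30},
}

best = max(candidates, key=lambda e: candidates[e]["lm"] * candidates[e]["tm"])
print(best)
```

The fluent-but-likely candidate wins even though another candidate has a higher translation-model score, showing how the language model and translation model trade off.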


Author(s):  
Kai Fan ◽  
Jiayi Wang ◽  
Bo Li ◽  
Fengming Zhou ◽  
Boxing Chen ◽  
...  

The performance of machine translation (MT) systems is usually evaluated with the BLEU metric when golden references are provided. However, at model inference or in production deployment, golden references are usually expensive to obtain, requiring, for example, human annotation with bilingual expertise. To address the issue of translation quality estimation (QE) without references, we propose a general framework for automatic evaluation of translation output for the QE task of the Conference on Statistical Machine Translation (WMT). We first build a conditional target-language model with a novel bidirectional transformer, named the neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce a joint latent representation of the source and the translation, and real-valued measurements of possibly erroneous tokens based on the prior knowledge learned from parallel data. The features are then fed into a simple Bi-LSTM predictive model for quality estimation. The experimental results show that our approach achieves state-of-the-art performance on most publicly available datasets of the WMT 2017/2018 QE task.


2020 ◽  
Vol 4 (3) ◽  
pp. 519
Author(s):  
Permata Permata ◽  
Zaenal Abidin

In this research, automatic translation of the Lampung dialect into Indonesian was carried out using the statistical machine translation (SMT) approach. Translation from Lampung to Indonesian can be done with a dictionary; an alternative is to use a Lampung parallel corpus with its Indonesian translation and the SMT approach. The SMT approach proceeds in several phases: a pre-processing phase that prepares the parallel corpus, a training phase that processes the parallel corpus to obtain a language model and a translation model, a testing phase, and finally an evaluation phase. SMT testing used 25 single sentences without out-of-vocabulary (OOV) words, 25 single sentences with OOV words, 25 compound sentences without OOV words, and 25 compound sentences with OOV words. Testing the translation of Lampung sentences into Indonesian yielded Bilingual Evaluation Understudy (BLEU) scores of 77.07% on the single sentences without OOV, 72.29% on the single sentences with OOV, 79.84% on the compound sentences without OOV, and 80.84% on the compound sentences with OOV.


Author(s):  
Herry Sujaini

Extended Word Similarity Based (EWSB) Clustering is a word-clustering algorithm based on word-similarity values computed from a corpus. One of the benefits of clustering with this algorithm is improving the output of a statistical machine translator. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, with the algorithm applied to Indonesian as the target language. This paper discusses the results of applying the EWSB algorithm to an Indonesian-to-Minang statistical machine translator, with the algorithm applied to Minang as the target language. The research found that the EWSB algorithm is quite effective when Minang is the target language: it improved translation accuracy by 6.36%.
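Similarity-based word clustering of the general kind described above can be sketched as a greedy grouping of words whose pairwise similarity exceeds a threshold. This is a generic illustration, not the EWSB algorithm itself, and the word pairs and similarity scores are invented (real scores would come from corpus statistics).

```python
# Generic greedy similarity-based word clustering (not EWSB itself).
# The Indonesian word pairs and similarity values below are invented;
# in practice they would be computed from a corpus.
similarity = {
    ("rumah", "gedung"): 0.8,   # "house" / "building"
    ("makan", "minum"): 0.7,    # "eat" / "drink"
    ("rumah", "makan"): 0.1,
}

def sim(a, b):
    return similarity.get((a, b)) or similarity.get((b, a)) or 0.0

def cluster(words, threshold=0.5):
    clusters = []
    for w in words:
        for c in clusters:
            # Join the first cluster containing a sufficiently similar member.
            if any(sim(w, m) >= threshold for m in c):
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters

print(cluster(["rumah", "gedung", "makan", "minum"]))
```

In an SMT pipeline, such clusters are typically used to smooth the language model or generalize phrase-table entries for the target language.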


2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
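The general idea of syntax-based preordering can be sketched as follows: parse the English source, apply reordering rules to the tree, and read off the leaves in the new order. The tree encoding and the single VP-swap rule below are illustrative stand-ins, not the paper's actual rule set, though the SVO-to-SOV effect is the one relevant for Urdu.

```python
# Toy syntax-based preordering: swap (verb, object) children of VP nodes so
# the verb follows its object, approximating Urdu's SOV order. The tree
# format and the single rule are invented for illustration.
def reorder(node):
    if isinstance(node, str):
        return node
    label, children = node
    children = [reorder(c) for c in children]
    if label == "VP" and len(children) == 2:
        children = [children[1], children[0]]   # (V, NP) -> (NP, V)
    return (label, children)

def leaves(node):
    if isinstance(node, str):
        return [node]
    return [w for c in node[1] for w in leaves(c)]

tree = ("S", [("NP", ["John"]), ("VP", [("V", ["eats"]), ("NP", ["apples"])])])
print(" ".join(leaves(reorder(tree))))
```

The preordered source ("John apples eats") is then fed to the phrase-based system, which now only needs local reordering.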


2017 ◽  
Vol 108 (1) ◽  
pp. 257-269 ◽  
Author(s):  
Nasser Zalmout ◽  
Nizar Habash

Abstract Tokenization is very helpful for Statistical Machine Translation (SMT), especially when translating from morphologically rich languages. Typically, a single tokenization scheme is applied to the entire source-language text, regardless of the target language. In this paper, we evaluate the hypothesis that SMT performance may benefit from different tokenization schemes for different words within the same text, and for different target languages. We apply this approach to Arabic as a source language, with five target languages of varying morphological complexity: English, French, Spanish, Russian and Chinese. Our results show that different target languages indeed require different source-language tokenization schemes, and that a context-variable tokenization scheme can outperform a context-constant scheme with a statistically significant gain of about 1.4 BLEU points.
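The core idea, that the same source word may be segmented differently depending on the target language, can be illustrated with a tiny lookup-based tokenizer. The segmentations below are simplified invented examples keyed to a Buckwalter-style transliteration, not the paper's actual schemes.

```python
# Toy illustration of target-language-dependent tokenization of an Arabic
# source. Segmentations are invented examples: splitting the conjunction
# clitic "w+" may help when the target (like English) expresses it as a
# separate word, while keeping the token whole may suit other targets.
schemes = {
    "english": {"wktb": ["w+", "ktb"]},   # split the conjunction clitic
    "chinese": {"wktb": ["wktb"]},        # keep the word form whole
}

def tokenize(words, target):
    table = schemes[target]
    return [tok for w in words for tok in table.get(w, [w])]

print(tokenize(["wktb"], "english"))
print(tokenize(["wktb"], "chinese"))
```

The paper goes further and chooses the scheme per word in context, rather than per target language only, which is where the 1.4 BLEU-point gain comes from.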

