MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation

2019 ◽  
Vol 28 (3) ◽  
pp. 447-453 ◽  
Author(s):  
Sainik Kumar Mahata ◽  
Dipankar Das ◽  
Sivaji Bandyopadhyay

Abstract Machine translation (MT) is the automatic translation of text from a source language to a target language by a computer system. In the current paper, we propose using recurrent neural networks (RNNs) on top of traditional statistical MT (SMT). We compare the performance of the SMT phrase table to that of the proposed RNN and in turn improve the quality of the MT output. This work was done as part of the shared task provided by MTIL2017. We constructed the traditional MT model using the Moses toolkit and additionally enriched the language model with external data sets. Thereafter, we ranked the phrase tables using an RNN encoder-decoder module created originally as part of the GroundHog project of LISA lab.
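The reranking idea in the abstract can be sketched in a few lines: score each phrase-table entry with a neural model and re-sort by that score. The entries and the stub scorer below are invented for illustration; the paper's actual scorer is an RNN encoder-decoder.

```python
# Toy sketch of neural reranking of phrase-table entries. The phrase pairs
# and scores are invented; rnn_score is a stand-in for an encoder-decoder
# likelihood, not the paper's actual model.
phrase_table = [("la maison", "house the"), ("la maison", "the house")]

def rnn_score(src, tgt):
    # Hypothetical model scores; a real system would run the RNN here.
    return {"the house": 0.9, "house the": 0.2}[tgt]

ranked = sorted(phrase_table, key=lambda pair: rnn_score(*pair), reverse=True)
print(ranked[0][1])  # best-scored target phrase
```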

2021 ◽  
Vol 284 ◽  
pp. 08001
Author(s):  
Ilya Ulitkin ◽  
Irina Filippova ◽  
Natalia Ivanova ◽  
Alexey Poroykov

We report on various approaches to the automatic evaluation of machine translation quality and describe three widely used methods. These methods, based on string matching and n-gram models, make it possible to compare machine translation output against a reference translation. We employ modern metrics for the automatic evaluation of machine translation quality, such as BLEU, F-measure, and TER, to compare translations made by the Google and PROMT neural machine translation systems with translations obtained five years ago, when statistical machine translation and rule-based machine translation algorithms were employed by Google and PROMT, respectively, as the main translation algorithms [6]. Evaluating the candidate texts generated by Google and PROMT against reference translations with an automatic translation evaluation program reveals significant qualitative changes compared with the results obtained five years ago, indicating a dramatic improvement in the above-mentioned online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern systems for the automatic evaluation of translation quality allow errors made by machine translation systems to be identified and systematized, which will enable these systems' translation quality to be improved in the future.
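Of the metrics named above, BLEU is the most common n-gram-based one. A minimal self-contained sketch of sentence-level BLEU follows; the example sentences are invented, and real toolkits compute the metric at corpus level with more careful smoothing.

```python
# Simplified sentence-level BLEU: geometric mean of clipped n-gram
# precisions times a brevity penalty, with add-one smoothing for n > 1.
# A sketch only; toolkits such as sacrebleu handle corpus-level statistics.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    log_p = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum((hyp & ref).values())      # clipped n-gram matches
        total = sum(hyp.values())
        if n == 1:
            p = max(overlap, 1e-9) / total
        else:
            p = (overlap + 1) / (total + 1)      # add-one smoothing
        log_p.append(math.log(p))
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(sum(log_p) / max_n)

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()
print(f"BLEU: {bleu(hypothesis, reference):.3f}")
```

An identical hypothesis and reference yield a score of 1.0; every n-gram mismatch or length difference pulls the score down.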


2013 ◽  
Vol 39 (4) ◽  
pp. 999-1023 ◽  
Author(s):  
Gennadi Lembersky ◽  
Noam Ordan ◽  
Shuly Wintner

Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the “wrong” direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the “right” and the “wrong” directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
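The first adaptation technique above, interpolating two phrase tables with weights chosen to minimize held-out perplexity, can be sketched as follows. All phrase pairs, probabilities, and the held-out set are invented toy values, and a grid search stands in for the EM-style optimization a real system would use.

```python
# Toy mixture model over two phrase tables ("right" and "wrong" translation
# direction), with the interpolation weight chosen by minimizing perplexity
# on a held-out set. All values are invented for illustration.
import math

right_dir = {("maison", "house"): 0.7, ("maison", "home"): 0.3}
wrong_dir = {("maison", "house"): 0.4, ("maison", "home"): 0.6}
held_out = [("maison", "house"), ("maison", "house"), ("maison", "home")]

def perplexity(table, data):
    nll = -sum(math.log(table.get(pair, 1e-9)) for pair in data)
    return math.exp(nll / len(data))

def interpolate(t1, t2, lam):
    keys = set(t1) | set(t2)
    return {k: lam * t1.get(k, 0.0) + (1 - lam) * t2.get(k, 0.0) for k in keys}

# Grid search over the weight; real systems optimize this with EM.
best_ppl, best_lam = min(
    (perplexity(interpolate(right_dir, wrong_dir, l / 10), held_out), l / 10)
    for l in range(11)
)
print(f"best lambda = {best_lam}, held-out perplexity = {best_ppl:.3f}")
```

Note that the best weight is neither 0 nor 1: even the "wrong"-direction table contributes, which is the paper's point.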


Author(s):  
Mir Aadil ◽  
M. Asger

Machine translation is a combination of many complex sub-processes, and the quality of the results of each sub-process, executed in a well-defined sequence, determines the overall accuracy of the translation. The statistical machine translation (SMT) approach considers each sentence in the target language as a possible translation of any source-language sentence. The possibility is measured by probability and, naturally, the sentence with the highest probability is treated as the best translation. SMT is the most favoured approach not only because of its good results for corpus-rich language pairs, but also because of the tools the approach has been enriched with over the past two and a half decades. The paper gives a brief introduction to SMT: its steps and the different tools available for each step.
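The decision rule described above is the classic noisy-channel formulation: choose the target sentence maximizing P(target) × P(source | target). A toy sketch with an invented candidate set and made-up language-model and translation-model probabilities:

```python
# Noisy-channel SMT decision rule over a toy candidate set. The candidates
# and probabilities are invented; a real decoder searches a vast hypothesis
# space rather than enumerating a dictionary.
candidates = {
    "the house is small": {"lm": 0.04,  "tm": 0.30},  # P(e), P(f|e)
    "the home is small":  {"lm": 0.02,  "tm": 0.35},
    "small is the house": {"lm": 0.001, "tm": 0.30},
}

best = max(candidates, key=lambda e: candidates[e]["lm"] * candidates[e]["tm"])
print(best)
```

The fluent-but-likely candidate wins even though another candidate has a higher translation-model score, showing how the language model and translation model trade off.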


Author(s):  
Kai Fan ◽  
Jiayi Wang ◽  
Bo Li ◽  
Fengming Zhou ◽  
Boxing Chen ◽  
...  

The performance of machine translation (MT) systems is usually evaluated with the BLEU metric when golden references are provided. However, at model inference or in production deployment, golden references are usually expensive to obtain, requiring, for example, human annotation with bilingual expertise. To address the issue of translation quality estimation (QE) without references, we propose a general framework for automatic evaluation of translation output for the QE task of the Conference on Statistical Machine Translation (WMT). We first build a conditional target-language model with a novel bidirectional transformer, named the neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce a joint latent representation of the source and the translation, and real-valued measurements of possibly erroneous tokens based on the prior knowledge learned from parallel data. The features are then fed into a simple Bi-LSTM predictive model for quality estimation. The experimental results show that our approach achieves state-of-the-art performance on most publicly available datasets of the WMT 2017/2018 QE task.


2020 ◽  
Vol 4 (3) ◽  
pp. 519
Author(s):  
Permata Permata ◽  
Zaenal Abidin

In this research, automatic translation of the Lampung dialect into Indonesian was carried out using the statistical machine translation (SMT) approach. Translation from Lampung to Indonesian can be done with a dictionary; an alternative is to use a Lampung parallel corpus with its Indonesian translation and the SMT approach. The SMT approach proceeds in several phases: a pre-processing phase that prepares the parallel corpus, a training phase that processes the parallel corpus to obtain a language model and a translation model, a testing phase, and finally an evaluation phase. SMT testing used 25 single sentences without out-of-vocabulary (OOV) words, 25 single sentences with OOV words, 25 compound sentences without OOV words, and 25 compound sentences with OOV words. Testing the translation of Lampung sentences into Indonesian yielded Bilingual Evaluation Understudy (BLEU) scores of 77.07% on the single sentences without OOV, 72.29% on the single sentences with OOV, 79.84% on the compound sentences without OOV, and 80.84% on the compound sentences with OOV.


Author(s):  
Herry Sujaini

Extended Word Similarity Based (EWSB) Clustering is a word-clustering algorithm based on word-similarity values computed from a corpus. One of the benefits of clustering with this algorithm is improving the output of a statistical machine translator. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, with the algorithm applied to Indonesian as the target language. This paper discusses the results of applying the EWSB algorithm to an Indonesian-to-Minang statistical machine translator, with the algorithm applied to Minang as the target language. The research found that the EWSB algorithm is quite effective when Minang is the target language: it improved translation accuracy by 6.36%.
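Similarity-based word clustering of the general kind described above can be sketched as a greedy grouping of words whose pairwise similarity exceeds a threshold. This is a generic illustration, not the EWSB algorithm itself, and the word pairs and similarity scores are invented (real scores would come from corpus statistics).

```python
# Generic greedy similarity-based word clustering (not EWSB itself).
# The Indonesian word pairs and similarity values below are invented;
# in practice they would be computed from a corpus.
similarity = {
    ("rumah", "gedung"): 0.8,   # "house" / "building"
    ("makan", "minum"): 0.7,    # "eat" / "drink"
    ("rumah", "makan"): 0.1,
}

def sim(a, b):
    return similarity.get((a, b)) or similarity.get((b, a)) or 0.0

def cluster(words, threshold=0.5):
    clusters = []
    for w in words:
        for c in clusters:
            # Join the first cluster containing a sufficiently similar member.
            if any(sim(w, m) >= threshold for m in c):
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters

print(cluster(["rumah", "gedung", "makan", "minum"]))
```

In an SMT pipeline, such clusters are typically used to smooth the language model or generalize phrase-table entries for the target language.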


2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
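The general idea of syntax-based preordering can be sketched as follows: parse the English source, apply reordering rules to the tree, and read off the leaves in the new order. The tree encoding and the single VP-swap rule below are illustrative stand-ins, not the paper's actual rule set, though the SVO-to-SOV effect is the one relevant for Urdu.

```python
# Toy syntax-based preordering: swap (verb, object) children of VP nodes so
# the verb follows its object, approximating Urdu's SOV order. The tree
# format and the single rule are invented for illustration.
def reorder(node):
    if isinstance(node, str):
        return node
    label, children = node
    children = [reorder(c) for c in children]
    if label == "VP" and len(children) == 2:
        children = [children[1], children[0]]   # (V, NP) -> (NP, V)
    return (label, children)

def leaves(node):
    if isinstance(node, str):
        return [node]
    return [w for c in node[1] for w in leaves(c)]

tree = ("S", [("NP", ["John"]), ("VP", [("V", ["eats"]), ("NP", ["apples"])])])
print(" ".join(leaves(reorder(tree))))
```

The preordered source ("John apples eats") is then fed to the phrase-based system, which now only needs local reordering.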


2017 ◽  
Vol 108 (1) ◽  
pp. 257-269 ◽  
Author(s):  
Nasser Zalmout ◽  
Nizar Habash

Abstract Tokenization is very helpful for Statistical Machine Translation (SMT), especially when translating from morphologically rich languages. Typically, a single tokenization scheme is applied to the entire source-language text, regardless of the target language. In this paper, we evaluate the hypothesis that SMT performance may benefit from different tokenization schemes for different words within the same text, and for different target languages. We apply this approach to Arabic as a source language, with five target languages of varying morphological complexity: English, French, Spanish, Russian and Chinese. Our results show that different target languages indeed require different source-language tokenization schemes, and that a context-variable tokenization scheme can outperform a context-constant scheme with a statistically significant gain of about 1.4 BLEU points.
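The core idea, that the same source word may be segmented differently depending on the target language, can be illustrated with a tiny lookup-based tokenizer. The segmentations below are simplified invented examples keyed to a Buckwalter-style transliteration, not the paper's actual schemes.

```python
# Toy illustration of target-language-dependent tokenization of an Arabic
# source. Segmentations are invented examples: splitting the conjunction
# clitic "w+" may help when the target (like English) expresses it as a
# separate word, while keeping the token whole may suit other targets.
schemes = {
    "english": {"wktb": ["w+", "ktb"]},   # split the conjunction clitic
    "chinese": {"wktb": ["wktb"]},        # keep the word form whole
}

def tokenize(words, target):
    table = schemes[target]
    return [tok for w in words for tok in table.get(w, [w])]

print(tokenize(["wktb"], "english"))
print(tokenize(["wktb"], "chinese"))
```

The paper goes further and chooses the scheme per word in context, rather than per target language only, which is where the 1.4 BLEU-point gain comes from.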

