Phrase Based Statistical Machine Translation Javanese-Indonesian

2021 ◽  
Vol 5 (2) ◽  
pp. 378
Author(s):  
Aufa Eka Putri Lesatari ◽  
Arie Ardiyanti ◽  
Ibnu Asror

This research aims to build a statistical machine translation system for Javanese-Indonesian translation and to determine how the two main data sources of statistical machine translation, the parallel corpus and the monolingual corpus, affect translation quality. Testing was carried out by gradually increasing the size of the parallel corpus and the monolingual corpus across seven configurations of the Javanese-Indonesian system. All configurations were tested on test data totaling 500 lines of Javanese sentences. Translation output was evaluated automatically using Bilingual Evaluation Understudy (BLEU). Across the seven configurations, the evaluation score rose as the parallel corpus and the monolingual corpus grew. Adding parallel corpus data increased the score by 3.6% between configurations 1 and 2, by 8.23% between configurations 2 and 3, and by 14.92% between configurations 3 and 7. Adding monolingual corpus data increased the BLEU score by 0.18% between configurations 4 and 5, by 0.06% between configurations 5 and 6, and by 0.24% between configurations 6 and 7. The results show that enlarging either corpus can improve the evaluation score of Javanese-Indonesian statistical machine translation, but that the size of the parallel corpus has a greater influence than the size of the monolingual corpus.
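To make the evaluation step concrete, here is a minimal sketch of a BLEU-style score for a single sentence pair. It is an illustration only, not the scoring tool used in the study: the function names `ngrams` and `sentence_bleu` are hypothetical, the score is unsmoothed and uses one reference, and it is computed per sentence, whereas BLEU as normally reported is computed over a whole test set.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU for one sentence pair, in [0, 1]."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates no longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

Calling `sentence_bleu(hypothesis, reference)` on whitespace-tokenized strings returns a value between 0 and 1; published scores such as those above are usually this quantity multiplied by 100 and aggregated over the full test set.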

2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

The objective of this paper is to analyze English-Dogri translation based on a parallel corpus. Machine translation is the translation of text from one language into another. Machine translation is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.


2021 ◽  
Vol 8 (4) ◽  
pp. 2173-2186
Author(s):  
Quranul Alfahrezi Agigi

Despite rapid technological development, there are still few machine translators from regional languages to Indonesian. This paper therefore describes building a statistical machine translation system from the Muna language into Indonesian, a pair for which few translation systems exist. The approach is statistical and relies on a parallel corpus. The data were taken from a book entitled Folklore of Buton and Muna in Southeast Sulawesi and from several folklore articles on the internet. The parallel corpus contains 1050 sentence lines and the monolingual corpus contains 1351 sentence lines. The experiment is divided into two scenarios. Scenario 1 varies the parallel (training) corpus: each experiment adds sentence lines to the training set, and the remaining sentence lines are used as the parallel (testing) corpus. Scenario 2 compares results after removing sentences from and adding sentences to the monolingual corpus; it starts from the configuration with the best accuracy in scenario 1. Testing was carried out 6 times using BLEU (Bilingual Evaluation Understudy). The best accuracy obtained was 29.83%.


Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel data is abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. As a result, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.


2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.


2019 ◽  
Vol 28 (3) ◽  
pp. 447-453 ◽  
Author(s):  
Sainik Kumar Mahata ◽  
Dipankar Das ◽  
Sivaji Bandyopadhyay

Machine translation (MT) is the automatic translation of a source language into a target language by a computer system. In the current paper, we propose an approach that uses recurrent neural networks (RNNs) on top of traditional statistical MT (SMT). We compare the performance of the SMT phrase table with the performance of the proposed RNN and in turn improve the quality of the MT output. This work was done as part of the shared task provided by MTIL2017. We constructed the traditional MT model using the Moses toolkit and additionally enriched the language model using external data sets. Thereafter, we ranked the phrase tables using an RNN encoder-decoder module originally created as part of the GroundHog project of the LISA lab.


2014 ◽  
Vol 3 (3) ◽  
pp. 65-72
Author(s):  
Ayana Kuandykova ◽  
Amandyk Kartbayev ◽  
Tannur Kaldybekov

2005 ◽  
Vol 31 (4) ◽  
pp. 477-504 ◽  
Author(s):  
Dragos Stefan Munteanu ◽  
Daniel Marcu

We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.


2013 ◽  
Vol 791-793 ◽  
pp. 1622-1625
Author(s):  
Dan Han ◽  
Zhi Han Yu

In this article, we introduce some basic concepts of machine translation. Machine translation means translating a natural language text into another language by software. It can be divided into two categories: rule-based and corpus-based. IBM's statistical machine translation, Microsoft's multi-language machine translation project, AT&T's voice translation system, and CMU's PANGLOSS system are typical machine translation systems. Because Chinese sentences are written as continuous sequences of words without delimiters, Chinese word segmentation is essential. There are three kinds of Chinese word segmentation methods: methods based on string matching, methods based on understanding, and methods based on statistics.
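As an illustration of segmentation by string matching, the sketch below implements forward maximum matching, one common member of that family. The article does not say which string-matching variant it covers, so this choice is an assumption; the function name `forward_max_match`, the `max_word_len` parameter, and the toy dictionary are hypothetical and exist only for this example.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring (up to max_word_len
    characters) found in the dictionary; fall back to a single character
    when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a word matches.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Toy dictionary for illustration only (not taken from the article).
toy_dictionary = {"机器", "翻译", "机器翻译", "系统"}
segments = forward_max_match("机器翻译系统", toy_dictionary)
```

Because the match is greedy, the longer entry "机器翻译" is preferred over "机器" followed by "翻译". That greediness is also the method's main weakness on ambiguous strings, which is one reason understanding-based and statistical methods exist.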

