Phrase Based Statistical Machine Translation Javanese-Indonesian

This research aims to build a statistical machine translation system for Javanese-Indonesian translation and to determine how the two main data sources of statistical machine translation, the parallel corpus and the monolingual corpus, affect the quality of Javanese-Indonesian translation. Testing was carried out by gradually increasing the size of the parallel corpus and the monolingual corpus across seven configurations of the Javanese-Indonesian system. All configurations were tested on the same test set of 500 Javanese sentences, and the output was evaluated automatically using Bilingual Evaluation Understudy (BLEU). Across the seven configurations, the evaluation score increased as the parallel corpus and the monolingual corpus grew. Enlarging the parallel corpus increased the score by 3.6% between configurations 1 and 2, by 8.23% between configurations 2 and 3, and by 14.92% between configurations 3 and 7. Enlarging the monolingual corpus increased the BLEU score by 0.18% between configurations 4 and 5, by 0.06% between configurations 5 and 6, and by 0.24% between configurations 6 and 7. The results show that a larger parallel corpus and a larger monolingual corpus can both raise the evaluation score of Javanese-Indonesian statistical machine translation, but that the size of the parallel corpus has a greater influence than the size of the monolingual corpus.
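The abstract reports BLEU scores but does not say how they were computed. As a rough illustration only, the sketch below computes corpus-level BLEU from clipped n-gram precisions and a brevity penalty. It assumes one reference per sentence, pre-tokenized input, and no smoothing; the function names `ngrams` and `corpus_bleu` are hypothetical and are not taken from the paper or from any toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU for token lists, one reference per hypothesis.

    Assumption: no smoothing, so the score is 0.0 whenever any n-gram
    order has no match at all.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(totals) == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Toy English tokens invented for illustration, not data from the paper.
    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    print(corpus_bleu([hyp], [ref]))
```

Comparing two configurations then amounts to scoring both outputs against the same 500-sentence test set and taking the difference of the two scores.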

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective of this paper is to analyze translation with an English-Dogri parallel corpus. Machine translation is the translation of text from one language into another. It is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.

Language Statistical Machine Translation Muna to Indonesia Language

JATISI (Jurnal Teknik Informatika dan Sistem Informasi) ◽

10.35957/jatisi.v8i4.1149 ◽

2021 ◽

Vol 8 (4) ◽

pp. 2173-2186

Author(s):

Quranul Alfahrezi Agigi

Keyword(s):

Machine Translation ◽

Technological Development ◽

Statistical Machine Translation ◽

The Internet ◽

Parallel Corpus ◽

Translation Machine ◽

Regional Languages

Despite rapid technological development, there are still few machine translators from regional languages to Indonesian. This paper therefore discusses building a statistical machine translation system from the Muna language into Indonesian, for which few such systems exist. The approach is statistical and uses a parallel corpus. The data come from a book entitled Folklore of Buton and Muna in Southeast Sulawesi and from several folklore articles on the internet. The parallel corpus contains 1050 sentence lines and the monolingual corpus contains 1351 sentence lines. The experiment is divided into two scenarios. Scenario 1 tests the parallel corpus: the training set starts from the available sentence lines and grows with each experiment, while the remaining sentence lines are used as the test set. Scenario 2 compares results after sentences are removed from and added to the monolingual corpus; it uses the configuration that achieved the best accuracy in scenario 1. Testing was carried out 6 times using the BLEU (Bilingual Evaluation Understudy) tool. The best accuracy obtained in these tests is 29.83%.

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when abundant parallel data is available. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Low-resource, morphologically rich languages such as Sinhala are therefore heavily affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
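The abstract does not name the subword techniques it compares. As an illustration of how one widely used technique enables open-vocabulary translation, the sketch below learns byte-pair-encoding (BPE) merges from a toy word list and applies them to an unseen word. BPE is an assumption here, not a statement of the authors' method, and the names `merge_pair`, `learn_bpe` and `segment` are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from an iterable of words (toy version)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself so runs are repeatable.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            merged_vocab[merge_pair(symbols, best)] += freq
        vocab = merged_vocab
    return merges


def segment(word, merges):
    """Segment a word, seen or unseen, by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy word list invented for illustration.
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = learn_bpe(corpus, num_merges=10)
    print(segment("lowest", merges))
```

Because an unseen word always decomposes into known subword units (in the worst case single characters), no input token falls outside the vocabulary.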

Word-Order Issues in English-to-Urdu Statistical Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-011-0007-0 ◽

2011 ◽

Vol 95 (1) ◽

pp. 87-106 ◽

Cited By ~ 3

Author(s):

Bushra Jawaid ◽

Daniel Zeman

Keyword(s):

Machine Translation ◽

Word Order ◽

Statistical Machine Translation ◽

Parse Tree ◽

Hard Problem ◽

Long Distance ◽

Translation Process ◽

English Sentence ◽

European Languages

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.

MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0016 ◽

2019 ◽

Vol 28 (3) ◽

pp. 447-453 ◽

Cited By ~ 5

Author(s):

Sainik Kumar Mahata ◽

Dipankar Das ◽

Sivaji Bandyopadhyay

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Language Model ◽

Target Language ◽

Data Sets ◽

Shared Task ◽

Automatic Translation ◽

External Data ◽

Statistical Mt

Machine translation (MT) is the automatic translation of the source language to its target language by a computer system. In the current paper, we propose an approach of using recurrent neural networks (RNNs) over traditional statistical MT (SMT). We compare the performance of the phrase table of SMT to the performance of the proposed RNN and in turn improve the quality of the MT output. This work has been done as a part of the shared task problem provided by the MTIL2017. We have constructed the traditional MT model using the Moses toolkit and have additionally enriched the language model using external data sets. Thereafter, we have ranked the phrase tables using an RNN encoder-decoder module created originally as a part of the GroundHog project of LISA lab.

English-Kazakh Parallel Corpus For Statistical Machine Translation

International Journal on Natural Language Computing ◽

10.5121/ijnlc.2014.3306 ◽

2014 ◽

Vol 3 (3) ◽

pp. 65-72

Author(s):

Ayana Kuandykova ◽

Amandyk Kartbayev ◽

Tannur Kaldybekov

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Parallel Corpus

Enhancing the Quality of Phrase-Table in Statistical Machine Translation for Less-Common and Low-Resource Languages

2018 International Conference on Asian Language Processing (IALP) ◽

10.1109/ialp.2018.8629188 ◽

2018 ◽

Author(s):

Minh-Thuan Nguyen ◽

Van-Tan Bui ◽

Huy-Hien Vu ◽

Phuong-Thai Nguyen ◽

Chi-Mai Luong

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Low Resource

Improving Machine Translation Performance by Exploiting Non-Parallel Corpora

Computational Linguistics ◽

10.1162/089120105775299168 ◽

2005 ◽

Vol 31 (4) ◽

pp. 477-504 ◽

Cited By ~ 104

Author(s):

Dragos Stefan Munteanu ◽

Daniel Marcu

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpora ◽

Parallel Corpus ◽

Scarce Resources ◽

Parallel Data ◽

Machine Translation System ◽

Novel Method ◽

Arabic And English

We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.

Refining semi-automatic parallel corpus creation for Zulu to English statistical machine translation

2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech) ◽

10.1109/robomech.2016.7813168 ◽

2016 ◽

Author(s):

Gideon Kotze

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Parallel Corpus ◽

Corpus Creation

The Critical Technology Development Status of Machine Translation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.791-793.1622 ◽

2013 ◽

Vol 791-793 ◽

pp. 1622-1625

Author(s):

Dan Han ◽

Zhi Han Yu

Keyword(s):

Machine Translation ◽

Technology Development ◽

Statistical Machine Translation ◽

Word Segmentation ◽

Translation System ◽

Chinese Word ◽

Segmentation Method ◽

Chinese Word Segmentation ◽

Critical Technology ◽

Translation Machine

This article introduces some basic concepts of machine translation. Machine translation means translating text from one natural language into another by software. It can be divided into two categories: rule-based and corpus-based. IBM's statistical machine translation, Microsoft's multi-language machine translation project, AT&T's voice translation system and CMU's PANGLOSS system are typical machine translation systems. Because Chinese sentences are written as continuous sequences of characters without word delimiters, Chinese word segmentation is essential. There are three kinds of Chinese word segmentation methods: those based on string matching, those based on understanding, and those based on statistics.
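Of the three families of segmentation methods, string matching is the simplest to show in code. The sketch below implements forward maximum matching, a common dictionary-based variant: at each position it takes the longest dictionary word that matches and falls back to a single character otherwise. The abstract does not say which string-matching variant the article covers, so this choice is an assumption, and the function name `forward_max_match` and the toy dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily, left to right, preferring the longest dictionary word."""
    if max_len is None:
        # Longest dictionary entry bounds the window; default to 1 for an empty dictionary.
        max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest window first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Toy dictionary invented for illustration.
    toy_dictionary = {"研究", "研究生", "生命", "起源"}
    print(forward_max_match("研究生命起源", toy_dictionary))
```

On this toy input the greedy choice of 研究生 prevents the reading 研究 / 生命 / 起源, which shows the kind of ambiguity that pure string matching cannot resolve on its own.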
