A Novel Deep Learning Method for Obtaining Bilingual Corpus from Multilingual Website

Machine translation needs a large number of parallel sentence pairs to make sure of having a good translation performance. However, the lack of parallel corpus heavily limits machine translation for low-resources language pairs. We propose a novel method that combines the continuous word embeddings with deep learning to obtain parallel sentences. Since parallel sentences are very invaluable for low-resources language pair, we introduce cross-lingual semantic representation to induce bilingual signals. Our experiments show that we can achieve promising results under lacking external resources for low-resource languages. Finally, we construct a state-of-the-art machine translation system in low-resources language pair.

Download Full-text

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective behind this paper is to analyze the English-Dogri parallel corpus translation. Machine translation is the translation from one language into another language. Machine translation is the biggest application of the Natural Language Processing (NLP). Moses is statistical machine translation system allow to train translation models for any language pair. We have developed translation system using Statistical based approach which helps in translating English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system gives accuracy of 80% in translating English to Dogri and the system gives accuracy of 87% in translating Dogri to English system.

Download Full-text

Attention-Based Syllable Level Neural Machine Translation System for Myanmar to English Language Pair

International Journal on Natural Language Computing ◽

10.5121/ijnlc.2019.8201 ◽

2019 ◽

Vol 8 (2) ◽

pp. 01-11

Author(s):

Yi Mon Shwe Sin ◽

Khin Mar Soe

Keyword(s):

Machine Translation ◽

English Language ◽

Translation System ◽

Neural Machine Translation ◽

Machine Translation System ◽

Language Pair

Download Full-text

Improving Machine Translation Performance by Exploiting Non-Parallel Corpora

Computational Linguistics ◽

10.1162/089120105775299168 ◽

2005 ◽

Vol 31 (4) ◽

pp. 477-504 ◽

Cited By ~ 104

Author(s):

Dragos Stefan Munteanu ◽

Daniel Marcu

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpora ◽

Parallel Corpus ◽

Scarce Resources ◽

Parallel Data ◽

Machine Translation System ◽

Novel Method ◽

Arabic And English

We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.

Download Full-text

State-of-the-art English to Persian Statistical Machine Translation system

The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012) ◽

10.1109/aisp.2012.6313739 ◽

2012 ◽

Cited By ~ 6

Author(s):

Amin Mansouri ◽

Heshaam Faili

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Machine Translation System

Download Full-text

N-gram based Machine Translation for English-Assamese: Two Languages with High Syntactical Dissimilarity

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2320.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 2940-2949

Keyword(s):

Machine Translation ◽

Northeast India ◽

Translation System ◽

System Efficiency ◽

Translation Process ◽

The People ◽

Machine Translation System ◽

N Gram ◽

Assamese Language ◽

Language Pair

To bridge the language constraint of the people residing in northeastern region of India, machine translation system is a necessity. Large number of people in this region cannot access many services due to the language incomprehensibility. Among several languages spoken, Assamese is one of the major languages used in northeast India. Machine translation for Assamese language is limited compared to other languages. As a result, large number of people using Assamese language cannot avail lots of benefits associated with it. This paper has focused on the development of the English to Assamese translation system using n-gram model. The n-gram model works very well with the language pair having high dissimilarity in syntax compared to other models. The value of n has a very big role in the quality and efficiency of the system. Bilingual Evaluation Understudy (BLEU) score differs significantly with the change of the n-gram. This model uses tuples to reduce the consumption of excess memory and to accelerate the translation process. Parallel corpus has been used for training the n-gram based decoder called MARIE. The number of translation units extracted using n-gram model is much less than the translation units extracted using phrase based model. This has a high impact on system efficiency.

Download Full-text

Machine Translation System Using Deep Learning for English to Urdu

Computational Intelligence and Neuroscience ◽

10.1155/2022/7873012 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Syed Abdul Basit Andrabi ◽

Abdul Wahid

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Communication Technology ◽

Target Language ◽

Translation System ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Proposed Model ◽

Learning Technique ◽

Machine Translation System

Machine translation is an ongoing field of research from the last decades. The main aim of machine translation is to remove the language barrier. Earlier research in this field started with the direct word-to-word replacement of source language by the target language. Later on, with the advancement in computer and communication technology, there was a paradigm shift to data-driven models like statistical and neural machine translation approaches. In this paper, we have used a neural network-based deep learning technique for English to Urdu languages. Parallel corpus sizes of around 30923 sentences are used. The corpus contains sentences from English-Urdu parallel corpus, news, and sentences which are frequently used in day-to-day life. The corpus contains 542810 English tokens and 540924 Urdu tokens, and the proposed system is trained and tested using 70 : 30 criteria. In order to evaluate the efficiency of the proposed system, several automatic evaluation metrics are used, and the model output is also compared with the output from Google Translator. The proposed model has an average BLEU score of 45.83.

Download Full-text

Machine Translation System Using Deep Learning for Punjabi to English

Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences - Algorithms for Intelligent Systems ◽

10.1007/978-981-15-7533-4_69 ◽

2021 ◽

pp. 865-878

Author(s):

Kamal Deep ◽

Ajit Kumar ◽

Vishal Goyal

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Translation System ◽

Machine Translation System

Download Full-text

Rule-Based Machine Translation for the Italian–Sardinian Language Pair

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0022 ◽

2017 ◽

Vol 108 (1) ◽

pp. 221-232

Author(s):

Francis M. Tyers ◽

Hèctor Alòs i Font ◽

Gianfranco Fronteddu ◽

Adrià Martín-Mor

Keyword(s):

Machine Translation ◽

Translation System ◽

Rule Based ◽

Romance Language ◽

Machine Translation System ◽

The Mediterranean ◽

Language Pair

AbstractThis paper describes the process of creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation) and on some of the rules created, as well as on the orthographic model used for Sardinian.

Download Full-text

Otedama: Fast Rule-Based Pre-Ordering for Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2016-0015 ◽

2016 ◽

Vol 106 (1) ◽

pp. 159-168 ◽

Cited By ~ 1

Author(s):

Julian Hitschler ◽

Laura Jehl ◽

Sariya Karimova ◽

Mayumi Ohta ◽

Benjamin Körner ◽

...

Keyword(s):

Open Source ◽

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Training Data ◽

Translation System ◽

Rule Based ◽

Machine Translation System ◽

Target Languages ◽

Established Technique

Abstract We present Otedama, a fast, open-source tool for rule-based syntactic pre-ordering, a well established technique in statistical machine translation. Otedama implements both a learner for pre-ordering rules, as well as a component for applying these rules to parsed sentences. Our system is compatible with several external parsers and capable of accommodating many source and all target languages in any machine translation paradigm which uses parallel training data. We demonstrate improvements on a patent translation task over a state-of-the-art English-Japanese hierarchical phrase-based machine translation system. We compare Otedama with an existing syntax-based pre-ordering system, showing comparable translation performance at a runtime speedup of a factor of 4.5-10.

Download Full-text

Cadlaws – An English–French Parallel Corpus of Legally Equivalent Documents

Mutatis Mutandis Revista Latinoamericana de Traducción ◽

10.17533/udea.mut.v14n2a10 ◽

2021 ◽

Vol 14 (2) ◽

pp. 494-508

Author(s):

Francina Sole-Mauri ◽

Pilar Sánchez-Gijón ◽

Antoni Oliver

Keyword(s):

Machine Translation ◽

Translation System ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Legal Documents ◽

Legal Traditions ◽

Corpus Construction ◽

Machine Translation System ◽

French Corpus ◽

Language Pair

This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and includes unique features since it is composed of documents that are legally equivalent in both languages but not the result of a translation. The corpus is built upon enactments co-drafted by two jurists to ensure legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. In this article the corpus definition as a parallel corpus instead of a comparable one is also discussed. Cadlaws has been pre-processed for machine translation and baseline Bilingual Evaluation Understudy (bleu), a score for comparing a candidate translation of text to a gold-standard translation of a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts which convey the same meaning in this language pair and is freely available for non-commercial use.

Download Full-text