Attention-Based Syllable Level Neural Machine Translation System for Myanmar to English Language Pair

2019, Vol 8 (2), pp. 01-11
Author(s): Yi Mon Shwe Sin, Khin Mar Soe


2021, Vol 14 (2), pp. 494-508
Author(s): Francina Sole-Mauri, Pilar Sánchez-Gijón, Antoni Oliver

This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and is unusual in that it is composed of documents that are legally equivalent in both languages but are not the result of translation. The corpus is built from enactments co-drafted by two jurists to ensure the legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. The article also discusses why the corpus is defined as a parallel corpus rather than a comparable one. Cadlaws has been pre-processed for machine translation, and a baseline Bilingual Evaluation Understudy (BLEU) score, a metric that compares a candidate translation against a gold-standard translation, is provided for a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts that convey the same meaning in this language pair, and it is freely available for non-commercial use.


This submission describes a study of linguistically motivated features for estimating the quality of translated sentences at the sentence level for the English-Hindi language pair. Several classification algorithms are employed to build Quality Estimation (QE) models from the extracted features, which are computed from the source-language text and the MT output. Experiments show that the proposed approach is robust and produces competitive results with the DT-based QE model when applied to the output of a neural machine translation system.
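
The abstract does not list its features or how the DT model is configured. As a hedged illustration, assuming DT denotes a decision tree, the Python sketch below computes three hypothetical surface features (output/source length ratio, punctuation mismatch, numbers missing from the output) and fits a depth-one decision tree (a stump) that labels outputs as acceptable (1) or poor (0). These features only stand in for the linguistically motivated ones, which the abstract does not specify; the toy sentence pairs and labels are invented.

```python
from dataclasses import dataclass

PUNCTUATION = set(".,;:!?।")  # includes the Devanagari danda used in Hindi


def extract_features(source: str, mt_output: str) -> list[float]:
    """Hypothetical surface features for one (source, MT output) pair."""
    src_tokens, tgt_tokens = source.split(), mt_output.split()
    # Output length relative to source length; very short outputs are suspicious.
    length_ratio = len(tgt_tokens) / max(len(src_tokens), 1)
    # Mismatch in the number of punctuation characters.
    punct_diff = abs(sum(c in PUNCTUATION for c in source)
                     - sum(c in PUNCTUATION for c in mt_output))
    # Numbers in the source that do not appear in the output.
    missing_numbers = len({t for t in src_tokens if t.isdigit()} - set(tgt_tokens))
    return [length_ratio, float(punct_diff), float(missing_numbers)]


@dataclass
class Stump:
    """A depth-one decision tree: one feature, one threshold, two leaf labels."""
    feature: int
    threshold: float
    left_label: int   # predicted when the feature value is <= threshold
    right_label: int  # predicted when the feature value is > threshold

    def predict(self, x: list[float]) -> int:
        return self.left_label if x[self.feature] <= self.threshold else self.right_label


def fit_stump(X: list[list[float]], y: list[int]) -> Stump:
    """Exhaustively pick the feature/threshold/labeling with the fewest training errors."""
    best, best_errors = None, len(y) + 1
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                candidate = Stump(f, threshold, left, right)
                errors = sum(candidate.predict(x) != label for x, label in zip(X, y))
                if errors < best_errors:
                    best, best_errors = candidate, errors
    return best


# Hypothetical toy training pairs; label 1 = acceptable, 0 = poor.
TRAIN = [
    ("The meeting starts at 10 .", "बैठक 10 बजे शुरू होगी ।", 1),
    ("The meeting starts at 10 .", "बैठक", 0),
    ("I live in Delhi .", "मैं दिल्ली में रहता हूँ ।", 1),
    ("I live in Delhi .", "दिल्ली", 0),
]
X = [extract_features(src, mt) for src, mt, _ in TRAIN]
y = [label for _, _, label in TRAIN]
stump = fit_stump(X, y)
```

With only four pairs the stump simply splits on the length ratio; a real QE model would be trained on many labeled pairs and compared across several classifiers, as the submission does.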


Author(s): Sabina Mammadzada

Azerbaijan has adopted several scripts at different periods of its history. Literary manuscripts and political documents written in each of these scripts are of great importance and must be transliterated for future generations. However, the adoption of different scripts and their frequent changes have led to the emergence and spread of various transliteration variants on the web. This article examines the challenges of the Azerbaijani-English transliteration process in real life and online. In this regard, it explores the significance and dominant status of the English language on the Internet and throughout the world. Adopting a single transliteration standard for the Azerbaijani language may help solve this problem. Existing standards for transliterating Azerbaijani into other languages are analyzed, and the article argues that a new approach to Azerbaijani-English transliteration is inevitable. Moreover, it underlines the future contribution of the proposed transliteration system to machine translation for the Azerbaijani-English language pair.
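
The abstract argues for one fixed transliteration standard but does not give its letter mapping. The sketch below shows what applying such a standard mechanically could look like: a single deterministic table, here a hypothetical one (`AZ_TO_ASCII`) that rewrites the non-ASCII letters of the modern Azerbaijani Latin alphabet (Ç, Ə, Ğ, ı, İ, Ö, Ş, Ü) as ASCII letters or digraphs. The table is an assumption for illustration, not the system proposed in the article.

```python
import unicodedata

# Hypothetical mapping for the Azerbaijani Latin letters outside basic ASCII.
# It is NOT the standard proposed in the article. ASCII letters such as
# c, q and x are left unchanged.
AZ_TO_ASCII = str.maketrans({
    "Ç": "Ch", "ç": "ch",
    "Ə": "A", "ə": "a",
    "Ğ": "Gh", "ğ": "gh",
    "ı": "i",               # dotless lowercase i (its capital is plain I)
    "İ": "I",               # dotted capital I (its lowercase is plain i)
    "Ö": "O", "ö": "o",
    "Ş": "Sh", "ş": "sh",
    "Ü": "U", "ü": "u",
})


def transliterate(text: str) -> str:
    """Apply the fixed table after NFC normalization, so composed and
    decomposed input (e.g. 'I' + combining dot above) give the same result."""
    return unicodedata.normalize("NFC", text).translate(AZ_TO_ASCII)
```

For example, `transliterate("Ağdam")` yields `"Aghdam"`. The table leaves c, q and x untouched even though their Azerbaijani sound values differ from English usage, which is exactly the kind of decision a real standard has to settle.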


2020, Vol 2020, pp. 1-7
Author(s): Thien Nguyen, Huu Nguyen, Phuoc Tran

Building the first Russian-Vietnamese neural machine translation system, we faced the problem of choosing the translation unit system on which source and target embeddings are based. The available homogeneous translation unit systems, which use the same translation unit on the source and target sides, do not suit the investigated language pair well. To solve this problem, we propose a novel heterogeneous translation unit system that takes into account the linguistic characteristics of Russian, a synthetic language, and Vietnamese, an analytic language. Specifically, we lower the embedding level on the source side by splitting tokens into subtokens and raise it on the target side by merging neighboring tokens into supertokens. Experimental results show that the proposed heterogeneous system improves on the best existing homogeneous Russian-Vietnamese translation system by 1.17 BLEU. Our approach could be applied to building translation bots for language pairs with different linguistic characteristics.
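
The abstract describes splitting source tokens into subtokens and merging neighboring target tokens into supertokens, but not the exact procedures. A minimal sketch, assuming greedy longest-match rules in both directions, follows; the Russian subword vocabulary `RU_VOCAB`, the Vietnamese multi-syllable lexicon `VI_LEXICON`, the `max_len` limit and the `_` joiner are all hypothetical choices, not the authors'.

```python
def split_into_subtokens(token, vocab):
    """Greedy longest-match split of one source token into subtokens."""
    pieces, start = [], 0
    while start < len(token):
        end = len(token)
        # Shrink the candidate until it is in the vocabulary or one character long.
        while end > start + 1 and token[start:end] not in vocab:
            end -= 1
        pieces.append(token[start:end])
        start = end
    return pieces


def to_source_units(sentence, vocab):
    """Lower the source embedding level: every token becomes one or more subtokens."""
    return [p for token in sentence.split() for p in split_into_subtokens(token, vocab)]


def merge_into_supertokens(tokens, lexicon, max_len=3):
    """Raise the target embedding level: greedily merge neighboring tokens
    that form a known multi-token unit."""
    merged, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in lexicon:
                merged.append("_".join(candidate))
                i += n
                break
    return merged


RU_VOCAB = {"студент", "чита", "ют", "книг"}  # hypothetical stems and endings
VI_LEXICON = {("sinh", "viên")}              # hypothetical two-syllable word list
```

On "студенты читают книги" ("students read books") the source side becomes stems and endings, while "sinh viên đọc sách" on the target side becomes `["sinh_viên", "đọc", "sách"]`: finer units for the synthetic language, coarser ones for the analytic language.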


2016, Vol 1 (1), pp. 45-49
Author(s): Avinash Singh, Asmeet Kour, Shubhnandan S. Jamwal

The objective of this paper is to analyze translation using an English-Dogri parallel corpus. Machine translation is the translation of text from one language into another, and it is one of the largest applications of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
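
Moses is a full toolkit; its standard training pipeline starts from word alignments, typically produced by GIZA++, which implements the IBM alignment models. To show the statistical idea behind that first step, the sketch below runs expectation-maximization for IBM Model 1 on a three-sentence toy German-English corpus. The corpus is hypothetical and chosen only for readability (it is not the English-Dogri data); the sketch also omits the NULL source word and every later stage (phrase extraction, language model, tuning).

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """EM estimation of IBM Model 1 lexical probabilities t(e | f).

    pairs: list of (source sentence, target sentence) strings.
    Simplification: no NULL source word.
    """
    target_vocab = {e for _, tgt in pairs for e in tgt.split()}
    t = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of f being linked at all
        for src, tgt in pairs:
            src_words = src.split()
            for e in tgt.split():
                # E-step: share e's alignment among the source words in proportion to t.
                norm = sum(t[(e, f)] for f in src_words)
                for f in src_words:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalize expected counts per source word.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


# Hypothetical toy German-English corpus.
PAIRS = [("das haus", "the house"), ("das buch", "the book"), ("ein buch", "a book")]
t = train_ibm_model1(PAIRS)


def best_translation(f):
    """Target word with the highest t(e | f) for source word f."""
    candidates = [(e, p) for (e, src), p in t.items() if src == f]
    return max(candidates, key=lambda item: item[1])[0]
```

After ten iterations each German word's most probable English word is the intuitive one (das to the, haus to house, buch to book, ein to a), although no sentence pair states any of these links directly.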


2016, Vol 5 (4), pp. 51-66
Author(s): Krzysztof Wolk, Krzysztof P. Marasek

The quality of machine translation is evolving rapidly. Today several machine translation systems on the web provide reasonable translations, although they are not perfect, and in some specific domains their quality may decrease. A recently proposed approach in this area is neural machine translation, which aims to build a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training neural network-based and statistical translation systems. Implementing and comparing medical translation systems is the main focus of our experiments.
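
The encoder-decoder description above hinges on one property: a source sentence of any length is compressed into a vector of fixed size. The sketch below shows only the encoder half, as a single-layer Elman RNN in pure Python with random, untrained weights; the dimensions, seed and token ids are hypothetical, and the decoder, attention and training are omitted.

```python
import math
import random


def make_encoder(vocab_size, embed_dim, hidden_dim, seed=0):
    """Random, untrained parameters for a single-layer Elman RNN encoder."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": matrix(vocab_size, embed_dim),  # one embedding row per token id
        "W_xh": matrix(hidden_dim, embed_dim),   # input-to-hidden weights
        "W_hh": matrix(hidden_dim, hidden_dim),  # hidden-to-hidden (recurrent) weights
        "b_h": [0.0] * hidden_dim,               # hidden bias
    }


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(token_ids, params):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the last h_t is the sentence vector."""
    h = [0.0] * len(params["b_h"])
    for tok in token_ids:
        x = params["embed"][tok]
        pre = [a + b + c for a, b, c in
               zip(matvec(params["W_xh"], x), matvec(params["W_hh"], h), params["b_h"])]
        h = [math.tanh(z) for z in pre]
    return h


params = make_encoder(vocab_size=10, embed_dim=4, hidden_dim=8)
short_vec = encode([1, 2, 3], params)
long_vec = encode([4, 5, 6, 7, 8, 9, 1], params)
```

Both `short_vec` and `long_vec` have eight components; squeezing long sentences into this fixed size is the bottleneck that later attention-based models were designed to relieve.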


2020, Vol 30 (01), pp. 2050002
Author(s): Taichi Aida, Kazuhide Yamamoto

Current neural machine translation methods may generate sentences of varying quality. Methods for automatically evaluating machine translation output can be broadly classified into two types: methods that use human post-edited translations to train an evaluation model, and methods that compare the output with a reference translation (the correct answer) during evaluation. On the one hand, post-edited translations are difficult to prepare because each word must be tagged by comparing it with the original translated sentence. On the other hand, users who actually employ a machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without human post-edited sentences and, on the test set, estimates the quality of output sentences without reference translations. We define a set of indices and predict translation quality with a regression model. As the measure of translated-sentence quality, we use the BLEU score, computed from the number of word n-gram matches between the translated sentence and the reference translation. We then compute the correlation between the quality scores predicted by our method and the BLEU scores actually computed from references. According to the experimental results, the correlation with BLEU is highest when XGBoost uses all the indices. Moreover, examining each index individually, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important for BLEU estimation.
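
Two quantities in this abstract can be made concrete: BLEU computed from word n-gram matches, and the correlation between predicted and reference-based scores. The sketch below implements an unsmoothed sentence-level BLEU (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty) and the Pearson correlation coefficient. The abstract does not say which BLEU variant or correlation coefficient the authors used, so both choices are assumptions; sentence-level BLEU is often smoothed in practice, which this sketch omits.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence BLEU with uniform weights over 1..max_n-grams."""
    cand, ref = candidate.split(), reference.split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if matches == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision_sum += math.log(matches / sum(cand_counts.values()))
    # Brevity penalty: only candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


def pearson(xs, ys):
    """Pearson correlation, e.g. between predicted quality scores and true BLEU."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, "the cat sat on the mat today" against the reference "the cat sat on the mat" has precisions 6/7, 5/6, 4/5 and 3/4 and no brevity penalty, so its BLEU is (3/7)^(1/4), about 0.81.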


To overcome the language barriers faced by people living in the northeastern region of India, machine translation is a necessity. A large number of people in this region cannot access many services because of these barriers. Among the several languages spoken there, Assamese is one of the major languages of northeast India. Machine translation for Assamese is limited compared with other languages, so many Assamese speakers cannot benefit from it. This paper focuses on developing an English-to-Assamese translation system using an n-gram model. Compared with other models, the n-gram model works very well for language pairs whose syntax is highly dissimilar. The value of n plays a major role in the quality and efficiency of the system, and the Bilingual Evaluation Understudy (BLEU) score differs significantly as n changes. The model uses tuples to reduce memory consumption and to accelerate the translation process. A parallel corpus was used to train the n-gram-based decoder MARIE. The n-gram model extracts far fewer translation units than a phrase-based model, which has a large impact on system efficiency.
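
In the n-gram-based SMT framework that MARIE decodes, the translation model is, roughly, an n-gram model over tuples: bilingual units that pair a source phrase with a target phrase. The sketch below trains an add-one smoothed bigram model (n = 2) over tuple sequences and scores two candidates. Because the abstract gives no Assamese data, the toy corpus is English-Spanish and entirely hypothetical; the choice of n and the smoothing are also assumptions, and real systems typically combine this model with further feature functions.

```python
import math
from collections import Counter

BOS, EOS = ("<s>", "<s>"), ("</s>", "</s>")  # sentence-boundary tuples


def train_tuple_bigrams(corpus):
    """Count tuple bigrams; each tuple is a (source phrase, target phrase) pair."""
    history, bigrams = Counter(), Counter()
    for sentence in corpus:
        seq = [BOS] + sentence + [EOS]
        history.update(seq[:-1])                # how often each tuple is a history
        bigrams.update(zip(seq[:-1], seq[1:]))  # (previous tuple, current tuple)
    vocab = {tup for sentence in corpus for tup in sentence} | {EOS}
    return history, bigrams, vocab


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a tuple sequence."""
    history, bigrams, vocab = model
    seq = [BOS] + sentence + [EOS]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (history[prev] + len(vocab)))
        for prev, cur in zip(seq[:-1], seq[1:])
    )


# Hypothetical English-Spanish tuple sequences (not Assamese data).
CORPUS = [
    [("the", "la"), ("house", "casa"), ("is", "es"), ("white", "blanca")],
    [("the", "la"), ("white house", "casa blanca")],  # reordering kept inside one tuple
    [("the", "la"), ("house", "casa"), ("is", "es"), ("big", "grande")],
]
model = train_tuple_bigrams(CORPUS)

reordered = [("the", "la"), ("white house", "casa blanca")]
word_by_word = [("the", "la"), ("white", "blanca"), ("house", "casa")]
```

The `reordered` sequence scores higher than `word_by_word` (which would produce "la blanca casa"), because the model has seen the tuple `("white house", "casa blanca")` follow `("the", "la")`. Keeping local reordering inside a single tuple is one way this approach copes with syntactically dissimilar language pairs.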

