The Challenges of Azerbaijani Transliteration on the Multilingual Internet

Azerbaijan has adopted several scripts over different periods. Literary manuscripts and political documents written in each of these scripts are of great importance and must be transliterated for future generations. However, the adoption of different scripts and the frequent changes between them have led to many competing transliteration variants appearing and spreading on the web. This article examines the challenges of Azerbaijani-English transliteration both offline and online. In this context, it explores the significance and dominant status of English on the Internet and worldwide. Adopting a single transliteration standard for Azerbaijani may help solve this problem. Existing standards for transliterating Azerbaijani into other languages are analyzed, and the need for a new approach to Azerbaijani-English transliteration is emphasized. Finally, the article highlights how the proposed transliteration system could contribute to machine translation for the Azerbaijani-English language pair.
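To make the problem concrete, the sketch below applies one hypothetical character-level mapping from the Azerbaijani Latin alphabet to English-friendly ASCII. The mapping (`AZ_TO_EN`) and the function name are assumptions for illustration only; they are not the standard proposed in the article. Even a consistent scheme like this yields "Baki" where English usage has the Russian-derived "Baku", and a choice such as ə→a versus ə→e ("Shaki" versus "Sheki") is exactly where competing variants arise.

```python
# Hypothetical Azerbaijani (Latin script) -> English-friendly ASCII mapping.
# Illustrative only: this is NOT the transliteration scheme proposed in the article.
AZ_TO_EN = {
    "c": "j", "ç": "ch", "ə": "a", "ğ": "gh", "x": "kh",
    "ı": "i", "j": "zh", "q": "g", "ş": "sh", "ö": "o", "ü": "u",
}


def transliterate(text: str) -> str:
    """Transliterate Azerbaijani Latin text character by character."""
    out = []
    for ch in text:
        # Azerbaijani has two capital I's (dotted İ and dotless I); both become "I".
        # Handled explicitly because "İ".lower() yields "i" plus a combining dot.
        if ch in ("İ", "I"):
            out.append("I")
            continue
        lower = ch.lower()
        if lower in AZ_TO_EN:
            rep = AZ_TO_EN[lower]
            # Preserve title case: "Ş" -> "Sh", "ş" -> "sh".
            out.append(rep.capitalize() if ch.isupper() else rep)
        else:
            out.append(ch)  # letters shared with English pass through unchanged
    return "".join(out)
```

For example, `transliterate("Gəncə")` gives "Ganja" and `transliterate("İsmayıllı")` gives "Ismayilli". All-caps input is not handled specially, so a full transliteration standard would also need rules for case, context-dependent letters and established exonyms.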

Attention-Based Syllable Level Neural Machine Translation System for Myanmar to English Language Pair

International Journal on Natural Language Computing | 10.5121/ijnlc.2019.8201 | 2019 | Vol 8 (2) | pp. 01-11

Author(s): Yi Mon Shwe Sin, Khin Mar Soe

Keyword(s): Machine Translation; English Language; Translation System; Neural Machine Translation; Machine Translation System; Language Pair

English-Dogri Translation System using MOSES

Circulation in Computer Science | 10.22632/ccs-2016-251-25 | 2016 | Vol 1 (1) | pp. 45-49

Author(s): Avinash Singh, Asmeet Kour, Shubhnandan S. Jamwal

Keyword(s): Natural Language Processing; Machine Translation; Language Processing; Statistical Machine Translation; Translation System; Parallel Corpus; English System; Machine Translation System; Translation Machine; Language Pair

The objective of this paper is to analyze translation using an English-Dogri parallel corpus. Machine translation is the translation of text from one language into another, and it is one of the most prominent applications of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
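As general background rather than a detail reported in the abstract: Moses-style statistical systems usually score a candidate target sentence t for a source sentence s with a log-linear model over feature functions h_i (for example phrase translation probabilities, a target language model, a reordering model and a word penalty), whose weights λ_i are tuned on held-out data. The abstract does not say which features or tuning procedure were used, so treat this as a sketch of the general model. Keeping only two log-probability features with unit weights recovers the classic noisy-channel decision rule, because the logarithm is monotonic:

```latex
\begin{aligned}
\hat{t} &= \arg\max_{t} \sum_{i=1}^{M} \lambda_i \, h_i(s, t) \\
\text{with } M = 2,\; h_1 = \log p(s \mid t),\; h_2 = \log p(t),\; \lambda_1 = \lambda_2 = 1:\quad
\hat{t} &= \arg\max_{t} \bigl[\log p(s \mid t) + \log p(t)\bigr] = \arg\max_{t} \, p(s \mid t)\, p(t)
\end{aligned}
```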

An Overview of the Shared Task on Machine Translation in Indian Languages (MTIL) – 2017

Journal of Intelligent Systems | 10.1515/jisys-2018-0024 | 2019 | Vol 28 (3) | pp. 455-464 | Cited by ~4

Author(s): M. Anand Kumar, B. Premjith, Shivkaran Singh, S. Rajendran, K. P. Soman

Keyword(s): Machine Translation; The Internet; Translation System; Indian Languages; Shared Task; European Languages; Regional Language; Human Evaluation; Machine Translation System; Translation Systems

In recent years, multilingual content on the internet has grown exponentially along with the evolution of the internet. However, regional-language users are excluded from much of this content because of the language barrier. Machine translation between languages is therefore the only feasible way to make this content available to regional-language users. Machine translation is the process of translating a text from one language to another. It has already been well studied for English and other European languages, but it is still at a nascent stage for Indian languages. This paper presents an overview of the Machine Translation in Indian Languages shared task conducted on September 7–8, 2017, at Amrita Vishwa Vidyapeetham, Coimbatore, India. The shared task focused mainly on four language pairs: English-Tamil, English-Hindi, English-Malayalam and English-Punjabi. It had the following objectives: (a) to examine state-of-the-art machine translation systems when translating from English into Indian languages; (b) to investigate the challenges of translating from English into Indian languages; (c) to create an open-source parallel corpus for Indian languages, for which such resources are lacking. Evaluating machine translation output is another challenge, especially for Indian languages. In this shared task, the participants' outputs were evaluated by human annotators. To the best of our knowledge, this is the first shared task that relies entirely on human evaluation.

N-gram based Machine Translation for English-Assamese: Two Languages with High Syntactical Dissimilarity

International Journal of Engineering and Advanced Technology - Regular Issue | 10.35940/ijeat.b2320.129219 | 2019 | Vol 9 (2) | pp. 2940-2949

Keyword(s): Machine Translation; Northeast India; Translation System; System Efficiency; Translation Process; The People; Machine Translation System; N Gram; Assamese Language; Language Pair

To overcome the language barrier faced by people living in the northeastern region of India, a machine translation system is a necessity. A large number of people in this region cannot access many services because they do not understand the language in which those services are offered. Of the many languages spoken there, Assamese is one of the major languages of northeast India. Machine translation for Assamese is limited compared with other languages, so many Assamese speakers cannot take advantage of the benefits it offers. This paper focuses on developing an English-to-Assamese translation system using an n-gram model. Compared with other models, the n-gram model works very well for language pairs whose syntax is highly dissimilar. The value of n plays a major role in the quality and efficiency of the system, and the Bilingual Evaluation Understudy (BLEU) score varies significantly as the n-gram order changes. The model uses tuples to reduce memory consumption and to speed up translation. A parallel corpus was used to train MARIE, an n-gram-based decoder. The n-gram model extracts far fewer translation units than a phrase-based model, which has a large impact on system efficiency.
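The abstract says the model works over tuples. A minimal sketch, assuming the usual formulation of tuple-based n-gram translation: a sentence pair is segmented into bilingual units (source segment, target segment), and the pair is scored as the product of each tuple's probability given the previous n-1 tuples. The class name, boundary markers and toy data are hypothetical, the target strings are placeholders rather than real Assamese, and a real decoder such as MARIE adds smoothing and further feature functions.

```python
import math
from collections import Counter
from typing import Iterator, List, Sequence, Tuple

# A bilingual "tuple": (source segment, target segment).
BiTuple = Tuple[str, str]
BOS: BiTuple = ("<s>", "<s>")
EOS: BiTuple = ("</s>", "</s>")


def ngrams(seq: Sequence[BiTuple], n: int) -> Iterator[Tuple[BiTuple, ...]]:
    """Yield n-grams over a tuple sequence padded with boundary markers."""
    padded = [BOS] * (n - 1) + list(seq) + [EOS]
    for i in range(n - 1, len(padded)):
        yield tuple(padded[i - n + 1 : i + 1])


class TupleNgramModel:
    """Maximum-likelihood n-gram model over bilingual tuples (no smoothing)."""

    def __init__(self, corpus: List[List[BiTuple]], n: int = 3):
        self.n = n
        self.counts: Counter = Counter()
        self.context_counts: Counter = Counter()
        for sentence in corpus:
            for gram in ngrams(sentence, n):
                self.counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_prob(self, sentence: List[BiTuple]) -> float:
        """Sum of log p(t_k | previous n-1 tuples); -inf if any n-gram is unseen."""
        total = 0.0
        for gram in ngrams(sentence, self.n):
            count = self.counts[gram]
            if count == 0:
                return float("-inf")
            total += math.log(count / self.context_counts[gram[:-1]])
        return total
```

With the two-sentence toy corpus `[[("I", "x1"), ("eat rice", "x2 x3")], [("I", "x1"), ("drink water", "x4 x5")]]` and `n=2`, the first sentence scores log(0.5): after ("I", "x1") the model has seen two possible continuations. Raising n sharpens context but makes unseen n-grams more likely, which is one reason the choice of n affects quality so strongly.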

Rule-Based Machine Translation for the Italian–Sardinian Language Pair

Prague Bulletin of Mathematical Linguistics | 10.1515/pralin-2017-0022 | 2017 | Vol 108 (1) | pp. 221-232

Author(s): Francis M. Tyers, Hèctor Alòs i Font, Gianfranco Fronteddu, Adrià Martín-Mor

Keyword(s): Machine Translation; Translation System; Rule Based; Romance Language; Machine Translation System; The Mediterranean; Language Pair

This paper describes the creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation), on some of the rules created, and on the orthographic model used for Sardinian.

Cadlaws – An English–French Parallel Corpus of Legally Equivalent Documents

Mutatis Mutandis Revista Latinoamericana de Traducción | 10.17533/udea.mut.v14n2a10 | 2021 | Vol 14 (2) | pp. 494-508

Author(s): Francina Sole-Mauri, Pilar Sánchez-Gijón, Antoni Oliver

Keyword(s): Machine Translation; Translation System; Neural Machine Translation; Parallel Corpus; Legal Documents; Legal Traditions; Corpus Construction; Machine Translation System; French Corpus; Language Pair

This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes how the corpus was constructed along with preliminary statistics obtained from it. The corpus contains over 16 million words in each language and is unusual in that its documents are legally equivalent in both languages but are not the result of translation. It is built from enactments co-drafted by two jurists to ensure the legal equality of both versions and to reflect the concepts, terms and institutions of two legal traditions. The article also discusses why the corpus is defined as a parallel corpus rather than a comparable one. Cadlaws has been pre-processed for machine translation, and a baseline Bilingual Evaluation Understudy (BLEU) score, a metric that compares a candidate translation with a gold-standard reference translation, is provided for a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts conveying the same meaning in this language pair, and it is freely available for non-commercial use.
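The abstract mentions BLEU only briefly. A minimal sketch of the standard definition for a single sentence, assuming one reference, whitespace tokenisation and no smoothing: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and multiplied by a brevity penalty. The function names are illustrative. Published scores are usually computed at corpus level with an established tool, so this sketch is not expected to reproduce the paper's baseline.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU of one candidate against one reference.

    Returns 0.0 if any clipped n-gram precision is zero.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)
```

A candidate identical to its reference scores 1.0, while "the cat sat on the" against "the cat sat on the mat" has perfect precisions but is one word short, so it scores exp(1 - 6/5), about 0.82. Without smoothing, any candidate shorter than four tokens scores 0.0, which is why corpus-level aggregation is preferred in practice.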

Supervised Classification Based Machine Translation Quality Estimation

International Journal of Engineering and Advanced Technology - Regular Issue | 10.35940/ijeat.c6029.029320 | 2020 | Vol 9 (3) | pp. 2697-2703

Keyword(s): Machine Translation; Supervised Classification; Translation System; Quality Estimation; Neural Machine Translation; Translation Quality; Sentence Level; Machine Translation System; Language Text; Language Pair

This submission describes a study of linguistically motivated features for estimating the quality of translated sentences at the sentence level for the English-Hindi language pair. Several classification algorithms are used to build Quality Estimation (QE) models from the extracted features, which are derived from the source-language text and the MT output. Experiments show that the proposed approach is robust and produces competitive results with the decision tree (DT) based QE model on the output of a neural machine translation system.

Machine Translation System for Hindi-Dogri Language Pair

2013 International Conference on Machine Intelligence and Research Advancement | 10.1109/icmira.2013.89 | 2013 | Cited by ~2

Author(s): Preeti Dubey, Devanand

Keyword(s): Machine Translation; Translation System; Machine Translation System; Language Pair

Hindi Chhattisgarhi Machine Translation System Using Statistical Approach

Webology | 10.14704/web/v18si02/web18067 | 2021 | Vol 18 (Special Issue 02) | pp. 208-222

Author(s): Vikas Pandey, Dr. M.V. Padmavati, Dr. Ramesh Kumar

Keyword(s): Machine Translation; Language Processing; Statistical Approach; Statistical Machine Translation; Target Language; Translation System; Parallel Corpus; Machine Translation System; Unknown Words; Language Pair

Machine translation is a subfield of Natural Language Processing (NLP) concerned with translating text from a source language into a target language. This paper presents a Hindi-Chhattisgarhi machine translation system based on a statistical approach. In the state of Chhattisgarh there has long been a need for a system that translates Hindi into Chhattisgarhi, especially for people who do not speak Chhattisgarhi. The system was developed with Moses, an open-source statistical machine translation toolkit that automatically trains the translation model for the Hindi-Chhattisgarhi language pair from a parallel corpus. (A corpus is a structured collection of text used to study linguistic properties.) The system works on a parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences drawn from domains such as stories, novels, textbooks and newspapers. To handle proper nouns and unknown words, a transliteration system is also embedded in it. Tested on 1,000 sentences for grammatical correctness, the system achieved an accuracy of 75%.

wEBMT: Developing and Validating an Example-Based Machine Translation System Using the World Wide Web

Computational Linguistics | 10.1162/089120103322711596 | 2003 | Vol 29 (3) | pp. 421-457 | Cited by ~13

Author(s): Andy Way, Nano Gough

Keyword(s): World Wide Web; Machine Translation; World Wide; Translation System; Small Subset; Translation Process; The World; Machine Translation System; On Line; The Web

We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two purposes. First, we populate the system's memory with translations gathered from rule-based MT systems available on the Web. The source strings input to these systems were extracted automatically from an extremely small subset of the rule types in the Penn-II Treebank. In subsequent stages, the resulting source-target translation pairs are automatically transformed into a series of resources that make the translation process more successful. Although the output of online MT systems is often faulty, we demonstrate in a number of experiments that, when used to seed the memory of an EBMT system, it can in fact help generate high-quality translations robustly. We also demonstrate the relative gain of EBMT over the online systems. Second, despite the perception that documents available on the Web are of questionable quality, we demonstrate that such resources are extremely useful for automatically post-editing the translation candidates proposed by our system.
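At the core of any EBMT system is a memory of source-target example pairs and a way to find the examples that best match new input. The abstract does not describe wEBMT's matching procedure, so the sketch below is a generic illustration, not the authors' method: it retrieves the stored pair whose source side has the smallest word-level edit distance to the input. The function names and the English-French toy memory are hypothetical.

```python
from typing import List, Tuple


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over tokens (insert, delete and substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,  # delete tok_a
                curr[j - 1] + 1,  # insert tok_b
                prev[j - 1] + (tok_a != tok_b),  # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


def retrieve_example(source: str, memory: List[Tuple[str, str]]) -> Tuple[str, str, int]:
    """Return (example source, example target, distance) for the closest stored example.

    Ties go to the earliest example in memory; an empty memory raises ValueError.
    """
    tokens = source.lower().split()
    scored = [
        (word_edit_distance(tokens, src.lower().split()), src, tgt)
        for src, tgt in memory
    ]
    dist, src, tgt = min(scored, key=lambda item: item[0])
    return src, tgt, dist
```

With a memory of `("the cat sleeps", "le chat dort")` and `("the dog eats the bone", "le chien mange l'os")`, the input "the cat eats" retrieves the first pair at distance 1. A full EBMT system would then align and recombine fragments of the retrieved examples rather than return a whole stored translation.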
