A machine translation system for the target language inexpert

This paper proposes a method for improving the results of a statistical Machine Translation system using boundedness, a pragmatic component of the verbal phrase’s lexical aspect. First, the paper presents manual and automatic annotation experiments for lexical aspect in English-French parallel corpora. It will be shown that this aspectual property is identified and classified with ease both by humans and by automatic systems. Second, Statistical Machine Translation experiments using the boundedness annotations are presented. These experiments show that the information regarding lexical aspect is useful to improve the output of a Machine Translation system in terms of better choices of verbal tenses in the target language, as well as better lexical choices. Ultimately, this work aims at providing a method for the automatic annotation of data with boundedness information and at contributing to Machine Translation by taking into account linguistic data.

Download Full-text

Factors Behind the Effectiveness of an Unsupervised Neural Machine Translation System between Korean and Japanese

Applied Sciences ◽

10.3390/app11167662 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7662

Author(s):

Yong-Seok Choi ◽

Yo-Han Park ◽

Seung Yun ◽

Sang-Hun Kim ◽

Kong-Joo Lee

Keyword(s):

Machine Translation ◽

Word Order ◽

Target Language ◽

Translation System ◽

Generation Model ◽

Neural Machine Translation ◽

Language Differences ◽

Machine Translation System ◽

The Common ◽

Morphological System

Korean and Japanese have different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using a Masked Sequence-to-Sequence pre-training (MASS) method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we allow the smallest number of shared vocabularies between the two languages. Then, we build an unsupervised Neural Machine Translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different writing scripts and few shared vocabularies, the unsupervised NMT system performs well compared to other pairs of languages. Our interest is in the common characteristics of both languages that make the unsupervised NMT perform so well. In this study, we propose a new method to analyze cross-attentions between a source and target language to estimate the language differences from the perspective of machine translation. We calculate cross-attention measurements between Korean–Japanese and Korean–English pairs and compare their performances and characteristics. The Korean–Japanese pair has little difference in word order and a morphological system, and thus the unsupervised NMT between Korean and Japanese can be trained well even without parallel sentences and shared vocabularies.

Download Full-text

Machine Translation System Using Deep Learning for English to Urdu

Computational Intelligence and Neuroscience ◽

10.1155/2022/7873012 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Syed Abdul Basit Andrabi ◽

Abdul Wahid

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Communication Technology ◽

Target Language ◽

Translation System ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Proposed Model ◽

Learning Technique ◽

Machine Translation System

Machine translation is an ongoing field of research from the last decades. The main aim of machine translation is to remove the language barrier. Earlier research in this field started with the direct word-to-word replacement of source language by the target language. Later on, with the advancement in computer and communication technology, there was a paradigm shift to data-driven models like statistical and neural machine translation approaches. In this paper, we have used a neural network-based deep learning technique for English to Urdu languages. Parallel corpus sizes of around 30923 sentences are used. The corpus contains sentences from English-Urdu parallel corpus, news, and sentences which are frequently used in day-to-day life. The corpus contains 542810 English tokens and 540924 Urdu tokens, and the proposed system is trained and tested using 70 : 30 criteria. In order to evaluate the efficiency of the proposed system, several automatic evaluation metrics are used, and the model output is also compared with the output from Google Translator. The proposed model has an average BLEU score of 45.83.

Download Full-text

Word Sense Disambiguation Using Target Language Corpus in a Machine Translation System

Digital Scholarship in the Humanities ◽

10.1093/llc/fqi029 ◽

2005 ◽

Vol 20 (2) ◽

pp. 237-249 ◽

Cited By ~ 5

Author(s):

Tayebeh Mosavi Miangah ◽

Ali Delavar Khalafi

Keyword(s):

Machine Translation ◽

Word Sense Disambiguation ◽

Target Language ◽

Translation System ◽

Word Sense ◽

Sense Disambiguation ◽

Machine Translation System ◽

Language Corpus

Download Full-text

A Bicolano-to-Tagalog Transfer-Based Machine Translation System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1062.0882s819 ◽

2019 ◽

Vol 8 (2S8) ◽

pp. 1324-1330

Keyword(s):

Machine Translation ◽

Syntactic Structure ◽

Internal Representation ◽

Target Language ◽

Translation System ◽

Source Language ◽

Three Phase ◽

Transfer Rules ◽

Machine Translation System ◽

Overall Performance

The Bicolano-Tagalog Transfer-based Machine Translation System is a unidirectional machine translator for languages Bicolano and Tagalog. The transfer-based approach is divided into three phase: Pre-Processing Analysis, Morphological Transfer, and Sentence Generation. The system analyze first the source language (Bicolano) input to create some internal representation. This includes the tokenizer, stemmer, POS tag and parser. Through transfer rules, it then typically manipulates this internal representation to transfer parsed source language syntactic structure into target language syntactic structure. Finally, the system generates Tagalog sentence from own morphological and syntactic information. Each phase will undergo training and evaluation test for the competence of end-results. Overall performance shows a 71.71% accuracy rate.

Download Full-text

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00065 ◽

2017 ◽

Vol 5 ◽

pp. 339-351 ◽

Cited By ~ 159

Author(s):

Melvin Johnson ◽

Mike Schuster ◽

Quoc V. Le ◽

Maxim Krikun ◽

Yonghui Wu ◽

...

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Target Language ◽

Translation System ◽

Single Model ◽

Neural Machine Translation ◽

Comparable Performance ◽

Machine Translation System ◽

Input Sentence ◽

Multiple Languages

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT systems using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-theart results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hints at a universal interlingua representation in our models and also show some interesting examples when mixing languages.

Download Full-text

Hindi Chhattisgarhi Machine Translation System Using Statistical Approach

Webology ◽

10.14704/web/v18si02/web18067 ◽

2021 ◽

Vol 18 (Special Issue 02) ◽

pp. 208-222

Author(s):

Vikas Pandey ◽

Dr.M.V. Padmavati ◽

Dr. Ramesh Kumar

Keyword(s):

Machine Translation ◽

Language Processing ◽

Statistical Approach ◽

Statistical Machine Translation ◽

Target Language ◽

Translation System ◽

Parallel Corpus ◽

Machine Translation System ◽

Unknown Words ◽

Language Pair

Machine Translation is a subfield of Natural language Processing (NLP) which uses to translate source language to target language. In this paper an attempt has been made to make a Hindi Chhattisgarhi machine translation system which is based on statistical approach. In the state of Chhattisgarh there is a long awaited need for Hindi to Chhattisgarhi machine translation system for converting Hindi into Chhattisgarhi especially for non Chhattisgarhi speaking people. In order to develop Hindi Chhattisgarhi statistical machine translation system an open source software called Moses is used. Moses is a statistical machine translation system and used to automatically train the translation model for Hindi Chhattisgarhi language pair called as parallel corpus. A collection of structured text to study linguistic properties is called corpus. This machine translation system works on parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences. In order to overcome translation problem related to proper noun and unknown words, a transliteration system is also embedded in it. These sentences are extracted from various domains like stories, novels, text books and news papers etc. This system is tested on 1000 sentences to check the grammatical correctness of sentences and it was found that an accuracy of 75% is achieved.

Download Full-text

Memory-Based Machine Translation and Language Modeling

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-009-0012-8 ◽

2009 ◽

Vol 91 (1) ◽

pp. 17-26 ◽

Cited By ~ 5

Author(s):

Antal van den Bosch ◽

Peter Berck

Keyword(s):

Open Source ◽

Machine Translation ◽

Language Model ◽

Language Modeling ◽

Target Language ◽

Translation System ◽

Reference Guide ◽

Fast Training ◽

Open Source Software Package ◽

Machine Translation System

Memory-Based Machine Translation and Language Modeling We describe a freely available open source memory-based machine translation system, mbmt. Its translation model is a fast approximate memory-based classifier, trained to map trigrams of source-language words onto trigrams of target-language words. In a second decoding step, the predicted trigrams are rearranged according to their overlap, and candidate output sequences are ranked according to a memory-based language model. We report on the scaling abilities of the memory-based approach, observing fast training and testing times, and linear scaling behavior in speed and memory costs. The system is released as an open source software package1, for which we provide a first reference guide.

Download Full-text

Efficiency of Machine Translation in Urban Discourse

Vestnik Volgogradskogo gosudarstvennogo universiteta Serija 2 Jazykoznanije ◽

10.15688/jvolsu2.2021.3.8 ◽

2021 ◽

pp. 87-98

Author(s):

Svetlana Korolkova ◽

◽

Anna Novozhilova ◽

Keyword(s):

Machine Translation ◽

Russian Language ◽

Target Language ◽

Translation System ◽

Web Browser ◽

Translation Quality ◽

Machine Translation System ◽

Language Quality ◽

Urban Discourse ◽

Website Content

This article aims to analyze the use of Yandex.Translate, an online machine translation system, in translating urban discourse texts on the web. The authors use integrative linguistic-and-pragmatic approach to assess machine translation quality in a global digital setting. The aim is to show the efficiency of a state-of-the-art machine translation system and to investigate its usefulness in practical application. The authors perform a detailed analysis of the Paris city website content, which is automatically translated from French into Russian with Yandex.Translate. The data selection is justified by the absence of official foreign versions of this website, which points to the need of machine translation engines integrated in a web browser. Less than 20% of the analysed machine-translated texts demonstrate high language quality, whereas 60% can be referred to as acceptable – the text preserves the meaning of the source but contains some errors and inaccuracies in the target language. About 20% of the machine-translated text contains blunders, which violate Russian language norms. It causes source text contents distortion and communication failures. In the end, a classification of the system errors is presented. It is also concluded that machine translation would substitute middle-skilled human translators in the future. However, the use of such systems will enforce standardisation and simplification of the target language.

Download Full-text

Sharing Attention Weights for Fast Transformer

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/735 ◽

2019 ◽

Cited By ~ 1

Author(s):

Tong Xiao ◽

Yinqiao Li ◽

Jingbo Zhu ◽

Zhengtao Yu ◽

Tongran Liu

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Experimental Results ◽

Target Language ◽

Translation System ◽

Attention Model ◽

Machine Translation System ◽

Speed Up ◽

Auto Regressive ◽

Hidden States

Recently, the Transformer machine translation system has shown strong results by stacking attention layers on both the source and target-language sides. But the inference of this model is slow due to the heavy use of dot-product attention in auto-regressive decoding. In this paper we speed up Transformer via a fast and lightweight attention model. More specifically, we share attention weights in adjacent layers and enable the efficient re-use of hidden states in a vertical manner. Moreover, the sharing policy can be jointly learned with the MT model. We test our approach on ten WMT and NIST OpenMT tasks. Experimental results show that it yields an average of 1.3X speed-up (with almost no decrease in BLEU) on top of a state-of-the-art implementation that has already adopted a cache for fast inference. Also, our approach obtains a 1.8X speed-up when it works with the AAN model. This is even 16 times faster than the baseline with no use of the attention cache.

Download Full-text