machine translation system
Recently Published Documents


Total documents: 456 (five years: 115)
H-index: 10 (five years: 4)

2022, Vol 2022, pp. 1-11
Author(s): Syed Abdul Basit Andrabi, Abdul Wahid

Machine translation has been an active field of research for decades. Its main aim is to remove the language barrier. Early research in this field began with direct word-for-word replacement of the source language by the target language. Later, with advances in computing and communication technology, there was a paradigm shift to data-driven models such as statistical and neural machine translation. In this paper, we use a neural network-based deep learning technique for English-to-Urdu translation. A parallel corpus of around 30,923 sentences is used, containing sentences from an English-Urdu parallel corpus, news, and sentences frequently used in day-to-day life. The corpus contains 542,810 English tokens and 540,924 Urdu tokens, and the proposed system is trained and tested using a 70:30 split. To evaluate the efficiency of the proposed system, several automatic evaluation metrics are used, and the model output is also compared with the output of Google Translate. The proposed model achieves an average BLEU score of 45.83.
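The abstract reports BLEU without saying which implementation was used. As a rough illustration only, here is a minimal pure-Python sketch of corpus-level BLEU, assuming one reference per sentence, whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. It returns a score in [0, 1]; multiplying by 100 gives the 0-100 scale used above. Standard toolkits differ in tokenization and smoothing, so this sketch will not reproduce the paper's numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis.

    Assumptions: whitespace tokenization, uniform n-gram weights,
    no smoothing. Returns a score in [0, 1].
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For example, `corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"])` gives 1.0, while a hypothesis with no matching 4-gram scores 0.0. This zero score is why real evaluations usually add smoothing at the sentence level.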


Machine translation is the best alternative to traditional manual translation. The corpus of Sanskrit literature includes a rich tradition of philosophical and religious texts as well as poetry, music, drama, scientific, technical, and other texts. With the modernization of traditions and languages, Sanskrit is no longer widely spoken. Translation makes it convenient for users to understand unfamiliar text. This paper presents a rule-based machine translation system for Hindi-to-Sanskrit and Sanskrit-to-Hindi translation. We developed a machine translation tool, 'anuvaad', which translates Sanskrit prose into Hindi and vice versa. We also developed bilingual corpora covering Sanskrit and Hindi grammar rules and applied a rule-based method to perform the translation. Experimental results on 110 different examples show that the proposed anuvaad tool achieves an overall accuracy of 93% in both translation directions. The objective of our work is to provide confidentiality and multilingual support, which are tedious and time-consuming to achieve with manual translation.
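The abstract does not describe anuvaad's rules. As a hedged illustration of one common building block in rule-based translation, the sketch below performs greedy longest-match lookup in a bilingual phrase lexicon. The three-entry Hindi-to-Sanskrit lexicon is a hypothetical toy example, not taken from the anuvaad corpora. A real system would also need morphological analysis, agreement, and sandhi rules.

```python
# Hypothetical toy lexicon (Hindi phrase -> Sanskrit phrase); keys are token tuples.
LEXICON = {
    ("राम",): "रामः",
    ("घर",): "गृहं",
    ("जाता", "है"): "गच्छति",
}
MAX_PHRASE = max(len(k) for k in LEXICON)  # longest phrase in the lexicon


def translate(sentence):
    """Greedy longest-match, phrase-by-phrase substitution."""
    tokens = sentence.split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest phrase first so multi-word entries win over single words.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in LEXICON:
                out.append(LEXICON[phrase])
                i += n
                break
        else:
            # No entry matched: copy the unknown word through unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

Because Hindi and Sanskrit are both typically verb-final, this toy example needs no reordering: `translate("राम घर जाता है")` yields `"रामः गृहं गच्छति"` ("Rama goes home").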


2021, Vol 11 (22), pp. 10915
Author(s): Lanxin Zhao, Wanrong Gao, Jianbin Fang

The ability to automate machine translation has applications in international commerce, medicine, travel, education, and text digitization. Because Chinese has a different grammar and lacks clear word boundaries, it is challenging to translate from word-based languages (e.g., English) into Chinese. This article implements a GPU-enabled deep learning machine translation system based on a domain-specific corpus. Our system takes English text as input and uses an encoder-decoder model with an attention mechanism, based on Google’s Transformer, to produce Chinese output. The model was trained with a simple self-designed entropy loss function and the Adam optimizer on English–Chinese bilingual sentences from the News domain of the UM-Corpus. Training can run in parallel on common laptops, desktops, and servers with one or more GPUs. During training, we track not only the loss over epochs but also translation quality as measured by the BLEU score. We also provide an easy-to-use web interface for managing corpora, training projects, and trained models. The experimental results show a maximum BLEU score of 29.2, and this score can be improved further by tuning other hyperparameters. GPU-enabled training runs over 15x faster than on a multi-core CPU, which shortens turnaround time. As a case study, we compare our model's performance with that of Baidu's translation system, which shows that our model can compete with an industry-level system. We argue that our deep-learning-based translation system is particularly suitable for teaching purposes and small/medium-sized enterprises.
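The core of the Transformer's attention mechanism is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The following pure-Python sketch shows a single head with no masking and no batching. It illustrates the standard formula and is not the authors' GPU implementation.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: list of query vectors (one per target position)
    K, V: lists of key/value vectors (one per source position)
    Returns one output vector per query.
    """
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # attention weights sum to 1
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs
```

In an encoder-decoder Transformer's cross-attention, the queries come from decoder states and the keys and values come from encoder outputs. This is how each generated Chinese token can attend to the English source.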


2021, Vol 3 (2), pp. 34
Author(s): Zeshan Ali Ali

Urdu is Pakistan's national language. However, Chinese-language expertise is scarce in Pakistan and other Asian nations, and little research has been done on Chinese-to-Urdu machine translation. To address these problems, we designed a Chinese-Urdu electronic dictionary and studied sentence-level machine translation based on deep learning. For the dictionary, we collected and constructed an electronic dictionary containing 24,000 Chinese-to-Urdu entries. For sentence-level translation, we used English as an intermediate language: based on existing Chinese-English and English-Urdu parallel corpora, we constructed a Chinese-Urdu bilingual parallel corpus of 66,000 sentences. The corpus was used to train two NMT models (LSTM and Transformer), and the outputs of both models were compared against reference translations using the Bilingual Evaluation Understudy (BLEU) score. The LSTM model achieved BLEU gains of 0.067 to 0.41, while the Transformer model achieved gains of 0.077 to 0.52, outperforming the LSTM model. Furthermore, we compared the proposed model with Google and Microsoft translation services.
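The abstract says the Chinese-Urdu corpus was built through English as an intermediate language, but not how sentences were matched. One simple approach is to join the two corpora on identical English sentences. The sketch below does this, assuming exact matches after lowercasing and whitespace normalization. Both the normalization and the function name are illustrative assumptions.

```python
def build_pivot_corpus(zh_en, en_ur):
    """Join (zh, en) and (en, ur) pairs on the shared English sentence.

    Assumption: English sides match exactly after lowercasing and
    whitespace normalization. Returns a list of (zh, ur) pairs.
    """
    def norm(s):
        return " ".join(s.lower().split())

    # Index Urdu translations by normalized English sentence.
    en_to_ur = {}
    for en, ur in en_ur:
        en_to_ur.setdefault(norm(en), []).append(ur)

    # Emit a Chinese-Urdu pair for every shared English pivot sentence.
    pairs = []
    for zh, en in zh_en:
        for ur in en_to_ur.get(norm(en), []):
            pairs.append((zh, ur))
    return pairs
```

Exact matching is deliberately conservative. It discards sentences whose English sides differ only slightly, which keeps the pivot pairs clean at the cost of corpus size.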


2021, Vol 2030 (1), pp. 012098
Author(s): Ting Yang, Shinan Zhao, He Chen, Bo Chen

Author(s): Pavlo P. Maslianko, Yevhenii P. Sielskyi

Background. Few machine translation companies on the market offer products that are in demand. Examples include free and commercial products such as “Google Translate”, “DeepL Translator”, “ModernMT”, “Apertium”, and “Trident”. To develop high-quality neural machine translation systems (NMTS) efficiently and productively, scientifically grounded NMTS engineering methods are needed so that a high-quality, competitive product can be delivered as quickly as possible. Objective. The purpose of this article is to apply the Eriksson-Penker business profile to develop and formalize a method for the system engineering of NMTS. Methods. The proposed method applies the Eriksson-Penker system engineering methodology and business profile to formalize an ordered way of developing NMT systems. Results. The method consists of three main stages. In the first stage, the structure of the NMT system is modelled as an Eriksson-Penker business profile. In the second stage, a set of processes is determined that is specific to the class of Data Science systems and to the international CRISP-DM standard. In the third stage, the developed NMTS is verified and validated. Conclusions. The article proposes a system engineering method for NMTS based on a modified Eriksson-Penker business profile representation of the system at the meta-level, together with international Data Science and Data Mining process standards. The method's effectiveness was studied by developing a bidirectional English-Ukrainian NMTS, EUMT (English-Ukrainian Machine Translator). The EUMT system's English-Ukrainian translation quality was found to be at least as good as that of the popular Google Translate service.
The full version code of the EUMT system is published on the GitHub platform and is available at: https://github.com/EugeneSel/EUMT.


2021, Vol 11 (16), pp. 7662
Author(s): Yong-Seok Choi, Yo-Han Park, Seung Yun, Sang-Hun Kim, Kong-Joo Lee

Korean and Japanese have different writing scripts but share the same Subject-Object-Verb (SOV) word order. In this study, we pre-train a language-generation model using the Masked Sequence-to-Sequence (MASS) pre-training method on Korean and Japanese monolingual corpora. When building the pre-trained generation model, we keep the vocabulary shared between the two languages as small as possible. We then build an unsupervised neural machine translation (NMT) system between Korean and Japanese based on the pre-trained generation model. Despite the different scripts and the small shared vocabulary, the unsupervised NMT system performs well compared with other language pairs. We are interested in the common characteristics of the two languages that make unsupervised NMT perform so well. We propose a new method that analyzes cross-attention between a source and a target language to estimate language differences from the perspective of machine translation. We calculate cross-attention measurements for Korean–Japanese and Korean–English pairs and compare their performance and characteristics. Korean and Japanese differ little in word order and morphology, so unsupervised NMT between them can be trained well even without parallel sentences and shared vocabulary.
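MASS pre-training masks a contiguous fragment of a sentence on the encoder side and trains the decoder to predict that fragment. The sketch below builds one such training example from a token list. Several details are assumptions: the 50% fragment ratio (the default reported for MASS), the `[MASK]` and `<s>` symbols, and uniform random placement of the fragment. Real implementations work on subword IDs and also mask decoder positions outside the fragment. The sketch assumes a non-empty token list.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption)
BOS = "<s>"      # placeholder start-of-sequence symbol (assumption)


def mass_mask(tokens, ratio=0.5, rng=random):
    """Build one simplified MASS-style training example from a non-empty token list.

    A contiguous fragment of about `ratio` * len(tokens) tokens is replaced by
    MASK in the encoder input; the decoder learns to predict that fragment,
    conditioned on the fragment shifted right by one position.
    """
    k = max(1, round(len(tokens) * ratio))      # fragment length
    start = rng.randrange(len(tokens) - k + 1)  # fragment start position
    fragment = tokens[start:start + k]
    encoder_input = tokens[:start] + [MASK] * k + tokens[start + k:]
    decoder_input = [BOS] + fragment[:-1]       # teacher-forcing input
    decoder_target = fragment                   # what the decoder must predict
    return encoder_input, decoder_input, decoder_target
```

Because the encoder never sees the fragment and the decoder sees only the fragment's own prefix, the model has to rely on the unmasked context. This is what makes MASS useful for initializing unsupervised NMT from monolingual data alone.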


Author(s): Svetlana Korolkova, Anna Novozhilova

This article analyzes the use of Yandex.Translate, an online machine translation system, for translating urban discourse texts on the web. The authors use an integrative linguistic-pragmatic approach to assess machine translation quality in a global digital setting. The aim is to show the efficiency of a state-of-the-art machine translation system and to investigate its usefulness in practical application. The authors perform a detailed analysis of content from the Paris city website, automatically translated from French into Russian with Yandex.Translate. This data selection is justified by the absence of official foreign-language versions of the website, which points to the need for machine translation engines integrated into web browsers. Less than 20% of the analysed machine-translated texts demonstrate high language quality, whereas 60% can be classed as acceptable: the text preserves the meaning of the source but contains some errors and inaccuracies in the target language. About 20% of the machine-translated text contains blunders that violate Russian language norms, distorting the source content and causing communication failures. Finally, a classification of the system's errors is presented. The authors also conclude that machine translation will replace mid-skilled human translators in the future. However, the use of such systems will enforce standardisation and simplification of the target language.

