Integrating Knowledge Encoded by Linguistic Phenomena of Indian Languages with Neural Machine Translation

Machine Translation is an effort to bridge language barriers and misinterpretations, making communication more convenient through the automatic translation of languages. The quality of translations produced by corpus-based approaches predominantly depends on the availability of a large parallel corpus. Although machine translation of many Indian languages has progressively gained attention, there is very limited research on machine translation and the challenges of using various machine translation techniques for a low-resource language such as Mizo. In this article, we have implemented and compared statistical-based approaches with modern neural-based approaches for the English–Mizo language pair. We have experimented with different tokenization methods, architectures, and configurations. The performance of translations predicted by the trained models has been evaluated using automatic and human evaluation measures. Furthermore, we have analyzed the prediction errors of the models and the quality of predictions based on variations in sentence length and compared the model performance with the existing baselines.

Download Full-text

Experience of neural machine translation between Indian languages

Machine Translation ◽

10.1007/s10590-021-09263-3 ◽

2021 ◽

Author(s):

Shubham Dewangan ◽

Shreya Alva ◽

Nitish Joshi ◽

Pushpak Bhattacharyya

Keyword(s):

Machine Translation ◽

Indian Languages ◽

Neural Machine Translation

Download Full-text

Neural Machine Translation for Indian Languages

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0065 ◽

2019 ◽

Vol 28 (3) ◽

pp. 465-477 ◽

Cited By ~ 3

Author(s):

Amarnath Pathak ◽

Partha Pakray

Keyword(s):

Neural Network ◽

Machine Translation ◽

Indian Languages ◽

Indian Language ◽

Neural Machine Translation ◽

Parallel Corpora ◽

Parallel Corpus ◽

Linguistic Resources ◽

Linguistic Backgrounds

Abstract Machine Translation bridges communication barriers and eases interaction among people having different linguistic backgrounds. Machine Translation mechanisms exploit a range of techniques and linguistic resources for translation prediction. Neural machine translation (NMT), in particular, seeks optimality in translation through training of neural network, using a parallel corpus having a considerable number of instances in the form of a parallel running source and target sentences. Easy availability of parallel corpora for major Indian language forms and the ability of NMT systems to better analyze context and produce fluent translation make NMT a prominent choice for the translation of Indian languages. We have trained, tested, and analyzed NMT systems for English to Tamil, English to Hindi, and English to Punjabi translations. Predicted translations have been evaluated using Bilingual Evaluation Understudy and by human evaluators to assess the quality of translation in terms of its adequacy, fluency, and correspondence with human-predicted translation.

Download Full-text

Improving neural machine translation for low-resource Indian languages using rule-based feature extraction

Neural Computing and Applications ◽

10.1007/s00521-020-04990-9 ◽

2020 ◽

Author(s):

Muskaan Singh ◽

Ravinder Kumar ◽

Inderveer Chana

Keyword(s):

Feature Extraction ◽

Machine Translation ◽

Indian Languages ◽

Rule Based ◽

Neural Machine Translation ◽

Low Resource

Download Full-text

Neural Machine Translation System for English to Indian Language Translation Using MTIL Parallel Corpus

Journal of Intelligent Systems ◽

10.1515/jisys-2019-2510 ◽

2019 ◽

Vol 28 (3) ◽

pp. 387-398 ◽

Cited By ~ 1

Author(s):

B. Premjith ◽

M. Anand Kumar ◽

K.P. Soman

Keyword(s):

Neural Networks ◽

Machine Translation ◽

Deep Neural Networks ◽

Short Term Memory ◽

Language Translation ◽

Translation System ◽

Indian Languages ◽

Indian Language ◽

Neural Machine Translation ◽

Parallel Corpora

Abstract Introduction of deep neural networks to the machine translation research ameliorated conventional machine translation systems in multiple ways, specifically in terms of translation quality. The ability of deep neural networks to learn a sensible representation of words is one of the major reasons for this improvement. Despite machine translation using deep neural architecture is showing state-of-the-art results in translating European languages, we cannot directly apply these algorithms in Indian languages mainly because of two reasons: unavailability of the good corpus and Indian languages are morphologically rich. In this paper, we propose a neural machine translation (NMT) system for four language pairs: English–Malayalam, English–Hindi, English–Tamil, and English–Punjabi. We also collected sentences from different sources and cleaned them to make four parallel corpora for each of the language pairs, and then used them to model the translation system. The encoder network in the NMT architecture was designed with long short-term memory (LSTM) networks and bi-directional recurrent neural networks (Bi-RNN). Evaluation of the obtained models was performed both automatically and manually. For automatic evaluation, the bilingual evaluation understudy (BLEU) score was used, and for manual evaluation, three metrics such as adequacy, fluency, and overall ranking were used. Analysis of the results showed the presence of lengthy sentences in English–Malayalam, and the English–Hindi corpus affected the translation. Attention mechanism was employed with a view to addressing the problem of translating lengthy sentences (sentences contain more than 50 words), and the system was able to perceive long-term contexts in the sentences.

Download Full-text