Word Sense Based Hindi-Tamil Statistical Machine Translation

2020, pp. 410-421
Author(s): Vimal Kumar K., Divakar Yadav

Corpus-based natural language processing has achieved great success in recent years. It is used not only for languages such as English, French, Spanish, and Hindi but also for languages such as Tamil and Telugu. This paper aims to increase the accuracy of machine translation from Hindi to Tamil by considering each word's sense as well as its part of speech. The system performs word-by-word translation from Hindi to Tamil, making use of additional information such as the preceding words, the current word's part-of-speech tag, and the word's sense itself. Such a translation system requires the frequency of words in the corpus, the tagging of the input words, and the probability of the word preceding each tagged word. WordNet is used to identify the various synonyms of each source-language word, and the synonym most relevant to the source word is chosen for translation into the target language. Introducing this additional information, namely the part-of-speech tag, the preceding-word information, and the semantic analysis, greatly improves the accuracy of the system.
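As a rough illustration of the decision procedure described above, the following Python sketch scores candidate Tamil translations of a Hindi word using its POS tag, WordNet-style sense candidates, and the preceding translated word. All names, data structures, and toy probabilities are hypothetical; this is not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of choosing a Tamil translation
# for a Hindi word from sense/POS-keyed candidates, scored with simple
# corpus statistics. All resources below are hypothetical toy values.
from collections import defaultdict

sense_candidates = {                 # (Hindi word, POS) -> candidate Tamil words
    ("सोना", "NN"): ["தங்கம்"],       # 'gold' sense (noun)
    ("सोना", "VB"): ["தூங்கு"],       # 'to sleep' sense (verb)
}
tamil_bigram = defaultdict(float)          # P(word | preceding Tamil word)
tamil_bigram[("விலை", "தங்கம்")] = 0.4     # toy value, purely illustrative
tamil_unigram = defaultdict(lambda: 1e-6)  # corpus frequency of Tamil words
tamil_unigram["தங்கம்"] = 0.01

def translate_word(hindi_word, pos_tag, prev_tamil):
    """Pick the Tamil candidate that best fits the POS tag and local context."""
    candidates = sense_candidates.get((hindi_word, pos_tag), [])
    def score(cand):
        # Combine corpus frequency with the preceding-word probability.
        return tamil_unigram[cand] * (1.0 + tamil_bigram[(prev_tamil, cand)])
    return max(candidates, key=score) if candidates else hindi_word
```

In the full system such tables would be estimated from the tagged corpus and WordNet rather than specified by hand.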


2016, Vol 1 (1), pp. 45-49
Author(s): Avinash Singh, Asmeet Kour, Shubhnandan S. Jamwal

The objective of this paper is to analyze English-Dogri parallel corpus translation. Machine translation is the translation of text from one language into another and is among the largest applications of natural language processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
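For background, Moses-style statistical systems such as this one are usually described with the noisy-channel formulation below; this is a textbook relation rather than a formula taken from the paper, and Moses in practice combines such components log-linearly.

```latex
\hat{e} \;=\; \arg\max_{e} \, P(e \mid d)
        \;=\; \arg\max_{e} \, \underbrace{P(d \mid e)}_{\text{translation model}} \,
                              \underbrace{P(e)}_{\text{language model}}
```

Here d is the source sentence (e.g. Dogri) and e the target sentence; the two components are typically estimated from the parallel corpus and additional target-side text.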


2018, Vol 28 (09), pp. 1850007
Author(s): Francisco Zamora-Martinez, Maria Jose Castro-Bleda

Neural Network Language Models (NNLMs) are a successful approach to natural language processing tasks such as machine translation. In this work, we introduce a statistical machine translation (SMT) system that fully integrates NNLMs in the decoding stage, breaking with the traditional approach based on n-best list rescoring. The neural models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing them to influence translation quality more strongly. Computational issues are solved with a novel idea based on memorization and smoothing of the softmax normalization constants to avoid their computation, which introduces a trade-off between LM quality and computational cost. These ideas are studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and n-gram-based systems and showing that the integrated approach is more promising for n-gram-based systems, even with non-full-quality NNLMs.
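The key computational trick, memorizing the softmax normalization constants so the NNLM can be queried cheaply inside the decoder, can be sketched as follows. This is a minimal illustration under my own assumptions, not the authors' code: exact constants are stored for histories where they were computed, and a smoothed default is used otherwise.

```python
# Minimal sketch of caching softmax normalization constants Z(h) so that
# log P(word | history) can be returned without summing over the vocabulary
# at decoding time. Names and the smoothing strategy are assumptions.
import math

class CachedSoftmaxLM:
    def __init__(self, score_fn, default_logZ):
        self.score_fn = score_fn          # score_fn(history, word) -> unnormalized logit
        self.logZ_cache = {}              # history -> memorized log normalization constant
        self.default_logZ = default_logZ  # smoothed fallback for unseen histories

    def memorize(self, history, vocab):
        """Compute and store log Z(h) exactly for a given history."""
        logits = [self.score_fn(history, w) for w in vocab]
        m = max(logits)                   # log-sum-exp for numerical stability
        self.logZ_cache[history] = m + math.log(sum(math.exp(l - m) for l in logits))

    def logprob(self, history, word):
        """Approximate log P(word | history) without recomputing the softmax sum."""
        logZ = self.logZ_cache.get(history, self.default_logZ)
        return self.score_fn(history, word) - logZ
```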


2020, Vol 30 (01), pp. 2050002
Author(s): Taichi Aida, Kazuhide Yamamoto

Current methods of neural machine translation may generate sentences of varying quality. Methods for automatically evaluating machine translation output can be broadly classified into two types: those that use human post-edited translations to train an evaluation model, and those that use a correct reference translation during evaluation. On the one hand, post-edited translations are difficult to prepare because each word must be tagged in comparison with the original translated sentence. On the other hand, users who actually employ a machine translation system do not have a correct reference translation. We therefore propose a method that trains the evaluation model without human post-edited sentences and, at test time, estimates the quality of output sentences without reference translations. We define several indices and predict the quality of translations with a regression model. As the quality measure of a translated sentence, we employ the BLEU score computed from the number of word n-gram matches between the translated sentence and the reference translation. We then compute the correlation between the quality scores predicted by our method and the BLEU scores actually computed from references. According to the experimental results, the correlation with BLEU is highest when XGBoost uses all the indices. Moreover, examining each index, we find that the sentence log-likelihood and the model uncertainty, both based on the joint probability of generating the translated sentence, are important for BLEU estimation.
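A minimal sketch of the evaluation pipeline described above, assuming the reference-free indices have already been computed per sentence; the feature matrix here is a random placeholder, and the specific XGBoost settings are my own choices rather than the paper's.

```python
# Illustrative sketch (not the authors' code): predict sentence-level BLEU
# from reference-free indices with an XGBoost regressor, then measure the
# correlation with BLEU actually computed from references.
import numpy as np
from xgboost import XGBRegressor
from scipy.stats import pearsonr

# One row per translated sentence; columns would be indices such as length
# ratio, mean token log-probability, sentence log-likelihood, uncertainty.
X_train = np.random.rand(200, 4)   # placeholder feature matrix
y_train = np.random.rand(200)      # placeholder sentence BLEU of training outputs
X_test, y_test = np.random.rand(50, 4), np.random.rand(50)

model = XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# Correlation between predicted quality and reference-based BLEU.
print("Pearson r:", pearsonr(predicted, y_test)[0])
```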


2018, Vol 6 (3), pp. 79-92
Author(s): Sahar A. El-Rahman, Tarek A. El-Shishtawy, Raafat A. El-Kammar

This article presents a practical technique for a machine-aided translation system. In this technique, the system dictionary is partitioned into a multi-module structure for fast retrieval of the Arabic features of English words. Each module is accessed through an interface that includes the necessary morphological rules, which direct the search toward the proper sub-dictionary. Retrieval is further accelerated by predicting the word's category and accessing its sub-dictionary to retrieve the corresponding attributes. The system consists of three main parts: analysis of the source language, transfer rules between the source language (English) and the target language (Arabic), and generation of the target language. The proposed system is able to translate some negative forms, demonstratives, and conjunctions, and also adjusts nouns, verbs, and adjectives according to their attributes. It then adds the appropriate markers to the Arabic words to generate a correct sentence.
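The multi-module dictionary idea can be sketched roughly as follows; the sub-dictionaries, category-prediction rules, and entries are purely illustrative assumptions, not the article's actual resources.

```python
# Hypothetical sketch of a dictionary split into sub-dictionaries by word
# category, with a predicted category routing the lookup so only one small
# module is searched first.
SUB_DICTIONARIES = {
    "verb": {"write": {"arabic": "كتب",  "features": {"root": "ك ت ب"}}},
    "noun": {"book":  {"arabic": "كتاب", "features": {"gender": "masc"}}},
    "adj":  {"big":   {"arabic": "كبير", "features": {"gender": "masc"}}},
}

def predict_category(word, prev_tag):
    """Very rough category prediction from simple morphological/context cues."""
    if word.endswith("ing") or prev_tag == "TO":
        return "verb"
    if prev_tag in ("DT", "JJ"):
        return "noun"
    return "adj" if word.endswith("y") else "noun"

def lookup(word, prev_tag):
    """Search only the sub-dictionary selected by the predicted category."""
    category = predict_category(word, prev_tag)
    entry = SUB_DICTIONARIES[category].get(word)
    if entry is None:
        # Fall back to scanning the remaining modules if the prediction missed.
        for cat, module in SUB_DICTIONARIES.items():
            if cat != category and word in module:
                return module[word]
    return entry
```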


2019, Vol 25 (5), pp. 585-605
Author(s): T. Ruzsics, M. Lusetti, A. Göhring, T. Samardžić, E. Stark

Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized writing. This task is especially important for languages such as Swiss German, with strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance the performance of a plain character-level NMT model with the integration of a word-level language model and linguistic features in the form of part-of-speech (POS) tags. The two components are intended to improve performance by addressing two specific issues: the former improves the fluency of the predicted sequences, whereas the latter resolves cases of word-level ambiguity. Our systematic comparison shows that the proposed solution improves over a plain NMT system and also over a comparable character-level statistical machine translation system, considered the state of the art for this task until recently. We perform a thorough analysis of the compared systems' output, showing that our two components indeed produce the intended, complementary improvements.
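A much-simplified sketch of how a word-level language model can be combined with a character-level model's score when choosing among candidate normalizations; the interpolation weight, the bigram-style history, and the function names are assumptions of mine, and the paper's actual integration happens inside synchronized beam-search decoding rather than plain rescoring.

```python
# Simplified sketch: rescore candidate normalizations proposed by a
# character-level model with an additional word-level language model score,
# so fluency is judged over whole words.
def combined_score(candidate, char_model_logprob, word_lm, lam=0.5):
    """char_model_logprob: log P_char(candidate | source), precomputed.
    word_lm(prev_word, word) returns log P_LM(word | prev_word)."""
    lm_logprob = 0.0
    prev = "<s>"
    for w in candidate.split():
        lm_logprob += word_lm(prev, w)   # bigram-style word LM, illustrative
        prev = w
    return char_model_logprob + lam * lm_logprob

def pick_best(nbest, word_lm):
    """nbest: list of (candidate_string, char_model_logprob) pairs."""
    return max(nbest, key=lambda pair: combined_score(pair[0], pair[1], word_lm))
```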


2020, Vol 46 (1), pp. 1-52
Author(s): Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

Despite the recent success of deep neural networks in natural language processing and other spheres of artificial intelligence, their interpretability remains a challenge. We analyze the representations learned by neural machine translation (NMT) models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word structure captured within the learned representations, an important aspect when translating morphologically rich languages? (ii) Do the representations capture long-range dependencies and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena? (ii) How does the choice of translation unit (word, character, or subword unit) affect the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects of NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models trained end to end, without any direct supervision during training, learn a non-trivial amount of linguistic information. Notable findings include the following: (i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) in contrast, lexical semantics and non-local syntactic and semantic dependencies are better represented at the higher layers; (iii) representations learned from characters are more informed about word morphology than those learned from subword units; and (iv) representations learned by multilingual models are richer than those learned by bilingual models.
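The extrinsic evaluation methodology, training lightweight classifiers on frozen NMT representations to predict linguistic properties, can be sketched as follows; the activations here are random placeholders and the classifier choice is an assumption, not the authors' exact setup.

```python
# Sketch of classifier-based probing: extract per-token hidden states from a
# chosen NMT encoder layer and train a simple classifier to predict a
# linguistic property such as the POS tag. Placeholder data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

hidden_states = np.random.randn(1000, 512)   # (num_tokens, hidden_dim) from one layer
pos_labels = np.random.choice(["NOUN", "VERB", "ADJ", "ADP"], size=1000)

probe = LogisticRegression(max_iter=1000)
probe.fit(hidden_states[:800], pos_labels[:800])
predictions = probe.predict(hidden_states[800:])

# Higher probe accuracy is read as the layer encoding more POS information.
print("POS probe accuracy:", accuracy_score(pos_labels[800:], predictions))
```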


2020, Vol 26 (6), pp. 595-612
Author(s): Marcos Zampieri, Preslav Nakov, Yves Scherrer

There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.
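As a concrete example of one application mentioned above, a common baseline for discriminating between closely related language varieties uses character n-gram features with a linear classifier; the snippet below is a generic sketch with toy data, not a system from the survey.

```python
# Minimal dialect/variety identification baseline: character n-gram TF-IDF
# features fed to a linear SVM. Toy training sentences and labels only.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_texts = ["isso é um exemplo", "esto es un ejemplo"]   # toy sentences
train_labels = ["pt", "es"]                                  # variety labels

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 4)),   # char 1-4 grams
    LinearSVC(),
)
model.fit(train_texts, train_labels)
print(model.predict(["otro ejemplo corto"]))
```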


2021, Vol 2021, pp. 1-8
Author(s): Jing Ning, Haidong Ban

With the development of linguistics and the improvement of computer performance, machine translation has become increasingly effective and widely used. The Chinese-English phrase-based translation method takes short phrases as the basic translation unit and makes full use of phrase order; compared with word-based statistical machine translation methods, its quality is greatly improved. This article studies the design of phrase-based automatic machine translation systems by introducing machine translation methods and Chinese-English phrase translation, explores the design and testing of a translation system based on Chinese-English phrase combination, and discusses the role of automatic machine translation in advancing translation work. By combining machine translation experiments with system design methods, the article investigates the design and testing of a translation system based on Chinese-English phrase combinations, with the broader aim of deepening the understanding of language, knowledge, and intelligence, helping to solve other language processing problems, and promoting the development of corpus linguistics. The experimental results show that when the Chinese-English phrase translation probability table is changed from 82% to 51%, the BLEU evaluation score for the Chinese-English phrase combination improves. Automatic machine translation saves translation time and effort; its short development cycle and ability to process large-scale corpora highlight its advantages.
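For reference, the BLEU metric mentioned above can be computed as in the following sketch, here using the sacrebleu library with toy sentences; this is illustrative only and not the article's evaluation code.

```python
# Illustrative example of computing corpus-level BLEU with sacrebleu.
import sacrebleu

hypotheses = ["the cat sat on the mat"]            # system outputs
references = [["the cat is sitting on the mat"]]   # one list per reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```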

