Word-Level Confidence Estimation for Statistical Machine Translation Using IBM-1 Model

Author(s):  
Mohammad Mahdi Mahsuli ◽  
Shahram Khadivi
Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel corpora are abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the Transformer and explore standard subword techniques on top of it to identify which subword approach has the greatest effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies, combined with state-of-the-art NMT, can perform remarkably well when translating English sentences into a morphologically rich language even without a large parallel corpus.
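Byte-pair encoding (BPE) is the most widely used of these subword techniques. As a hedged illustration (not the paper's actual pipeline), a minimal sketch of learning BPE merges from a toy frequency-counted vocabulary:

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace each occurrence of the pair with its concatenation.

    The naive string replace is enough for this toy vocabulary.
    """
    merged = " ".join(pair)
    joined = "".join(pair)
    return {word.replace(merged, joined): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Toy vocabulary: words split into characters, with an end-of-word marker.
toy = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, segmented = learn_bpe(toy, 4)
```

A rare word such as "lowest" can then be segmented with the learned merges instead of falling out of the vocabulary.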


2019 ◽  
Vol 25 (5) ◽  
pp. 585-605
Author(s):  
T. Ruzsics ◽  
M. Lusetti ◽  
A. Göhring ◽  
T. Samardžić ◽  
E. Stark

Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized writing. This task is especially important for languages such as Swiss German, which has strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance the performance of a plain character-level NMT model with the integration of a word-level language model and linguistic features in the form of part-of-speech (POS) tags. The two components are intended to improve performance by addressing two specific issues: the former is intended to improve the fluency of the predicted sequences, whereas the latter aims at resolving cases of word-level ambiguity. Our systematic comparison shows that our proposed solution results in an improvement over a plain NMT system and also over a comparable character-level statistical machine translation system, considered the state of the art in this task until recently. We perform a thorough analysis of the compared systems' output, showing that our two components indeed produce the intended, complementary improvements.
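One way to picture the word-level language model component is as a log-linear interpolation of the character-level NMT score with a word-level LM score when comparing candidate normalizations. The weight, candidates, and scores below are purely illustrative, not values from the paper:

```python
def combined_score(nmt_logprob, lm_logprob, lm_weight=0.3):
    """Log-linear interpolation of a character-level NMT score and a
    word-level language model score (both log-probabilities)."""
    return (1 - lm_weight) * nmt_logprob + lm_weight * lm_logprob

# Two hypothetical candidate normalizations with made-up model scores.
candidates = {
    "viel glueck": {"nmt": -2.1, "lm": -8.5},   # fluent standard wording
    "vil gluek": {"nmt": -1.9, "lm": -14.0},    # closer to the source, but disfluent
}
best = max(candidates,
           key=lambda c: combined_score(candidates[c]["nmt"], candidates[c]["lm"]))
```

The word-level LM penalizes the disfluent candidate enough to override the slightly better NMT score, which is exactly the fluency effect the abstract describes.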


2016 ◽  
Vol 42 (2) ◽  
pp. 277-306 ◽  
Author(s):  
Pidong Wang ◽  
Preslav Nakov ◽  
Hwee Tou Ng

Most of the world's languages are resource-poor for statistical machine translation; still, many of them are actually related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source-language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext, from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is important for resource-poor machine translation because it can provide a useful guideline for people building machine translation systems for resource-poor languages. Our experiments on Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields an improvement of 7.26 BLEU points over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.
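The adaptation step can be pictured as rewriting RICH-side sentences with the learned word-level paraphrases. A minimal sketch, with hypothetical Malay-to-Indonesian word pairs and probabilities (illustrative only, not the paper's learned tables):

```python
def adapt_sentence(sentence, paraphrase_table):
    """Map each word to its most probable resource-poor variant, if one is known."""
    adapted = []
    for word in sentence.split():
        variants = paraphrase_table.get(word)
        if variants:
            # Pick the highest-probability variant for this word.
            adapted.append(max(variants, key=variants.get))
        else:
            adapted.append(word)  # fall back to the original word
    return " ".join(adapted)

# Hypothetical Malay -> Indonesian word-level paraphrases with probabilities.
table = {
    "kerana": {"karena": 0.9, "sebab": 0.1},
    "wang": {"uang": 0.95},
}
adapted = adapt_sentence("dia pergi kerana wang", table)
```

A real system would also use phrase-level paraphrases and morphological variants, as the abstract notes, rather than independent word-for-word substitution.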


2017 ◽  
Vol 23 (4) ◽  
pp. 617-639 ◽  
Author(s):  
NGOC-QUANG LUONG ◽  
LAURENT BESACIER ◽  
BENJAMIN LECOUTEUX

This paper presents two novel ideas for improving machine translation (MT) quality by applying word-level quality prediction in a second pass of decoding. In this manner, the word scores estimated by word confidence estimation systems help to reconsider the MT hypotheses, selecting a better candidate rather than accepting the current sub-optimal one. In the first attempt, the selection scope is limited to the MT N-best list, in which our proposed re-ranking features are combined with those of the decoder for re-scoring. Then, the search space is enlarged to the entire search graph, which stores many more hypotheses generated during the first pass of decoding. Over all paths containing words of the N-best list, we propose an algorithm to strengthen or weaken them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the official translation. The results obtained show that both approaches improve MT quality over the one-pass baseline, and that search-graph re-decoding achieves larger gains (in BLEU score) than the N-best-list re-ranking method.
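The N-best-list re-ranking idea can be sketched as combining the decoder score with a word-confidence feature. The feature weight, hypotheses, and scores below are illustrative, not the paper's trained values:

```python
def rerank(nbest, alpha=0.5):
    """Re-score each hypothesis: decoder log-prob plus weighted mean word confidence.

    nbest: list of (tokens, decoder_logprob, word_confidences) triples.
    """
    def score(hyp):
        tokens, decoder_lp, confs = hyp
        mean_conf = sum(confs) / len(confs)
        return decoder_lp + alpha * mean_conf
    return sorted(nbest, key=score, reverse=True)

# Hypothetical 3-best list: the decoder's top candidate contains
# low-confidence words, so re-ranking promotes the second hypothesis.
nbest = [
    (["the", "cat", "sat"], -1.0, [0.9, 0.1, 0.2]),
    (["the", "cat", "sits"], -1.2, [0.9, 0.9, 0.9]),
    (["a", "cat", "sat"], -2.0, [0.7, 0.8, 0.6]),
]
best_tokens = rerank(nbest)[0][0]
```

In the paper's second method the same principle is applied over the full search graph rather than this fixed list.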


2007 ◽  
Vol 33 (1) ◽  
pp. 9-40 ◽  
Author(s):  
Nicola Ueffing ◽  
Hermann Ney

This article introduces and evaluates several different word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities, as well as different ways of calculating them, will be introduced and compared. They can be divided into two categories: system-based methods that exploit knowledge provided by the translation system that generated the translations, and direct methods that are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contain the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account. They can be applied to output from non-statistical machine translation systems as well. Experimental assessment of the different confidence measures on various translation tasks and in several language pairs will be presented. Moreover, the application of confidence measures for rescoring of translation hypotheses will be investigated.
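For N-best lists, a position-based word posterior probability can be sketched as follows (toy hypotheses and probabilities; a real system would also handle alignment between hypotheses of different lengths):

```python
import math

def word_posterior(nbest_logprobs, position, word):
    """Posterior of `word` at `position`: the probability mass of hypotheses
    whose token at that position equals `word`, divided by the total mass.

    nbest_logprobs: list of (tokens, logprob) pairs.
    """
    total = 0.0
    matching = 0.0
    for tokens, lp in nbest_logprobs:
        p = math.exp(lp)
        total += p
        if position < len(tokens) and tokens[position] == word:
            matching += p
    return matching / total

# Toy 3-best list with made-up sentence probabilities.
nbest = [
    (["the", "cat", "sat"], math.log(0.5)),
    (["the", "cat", "sits"], math.log(0.3)),
    (["a", "dog", "sat"], math.log(0.2)),
]
conf = word_posterior(nbest, 1, "cat")  # ~0.8 with these toy probabilities
```

A low posterior would flag the word as likely incorrect; the article's system-based measures apply the same summation over word graphs, which cover a much larger hypothesis space.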


2019 ◽  
Vol 26 (1) ◽  
pp. 73-94
Author(s):  
Arda Tezcan ◽  
Véronique Hoste ◽  
Lieve Macken

Various studies show that statistical machine translation (SMT) systems suffer from fluency errors, especially in the form of grammatical errors and errors related to idiomatic word choices. In this study, we investigate the effectiveness of using monolingual information contained in the machine-translated text to estimate the word-level quality of SMT output. We propose a recurrent neural network architecture which uses morpho-syntactic features and word embeddings as word representations within surface and syntactic n-grams. We test the proposed method on two language pairs and for two tasks, namely detecting fluency errors and predicting overall post-editing effort. Our results show that this method is effective for capturing all types of fluency errors at once. Moreover, on the task of predicting post-editing effort, while relying solely on monolingual information, it achieves results on par with state-of-the-art quality estimation systems that use both bilingual and monolingual information.
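The surface n-gram representations can be pictured as the set of n-grams that include a given target word, with sentence-boundary padding. A minimal sketch (the function name and padding symbol are illustrative, not from the paper):

```python
def surface_ngrams(tokens, index, n=3, pad="<s>"):
    """All n-grams (as tuples) that contain the token at `index`,
    padding the sentence boundaries so every position yields n grams."""
    padded = [pad] * (n - 1) + tokens + [pad] * (n - 1)
    center = index + (n - 1)  # position of the target token after padding
    grams = []
    for start in range(center - n + 1, center + 1):
        grams.append(tuple(padded[start:start + n]))
    return grams

grams = surface_ngrams(["the", "cat", "sat"], 1)
```

The paper's model feeds such windows, built over both surface tokens and morpho-syntactic features, into a recurrent network rather than using the raw n-grams directly.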


2018 ◽  
Vol 5 (1) ◽  
pp. 37-45
Author(s):  
Darryl Yunus Sulistyan

Machine translation is the automatic translation of sentences from one language into another. This paper aims to test the effectiveness of a newer model of machine translation, factored machine translation. We compare the performance of an unfactored system, as our baseline, against the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus, with the MOSES toolkit, which is freely available for download and use. We found, however, that the unfactored model scored over 24 BLEU and outperformed the factored models, which scored below 24 BLEU, in all cases. In terms of the number of words translated, however, all factored models outperform the unfactored model.

