Word-level confidence estimation for machine translation using phrase-based translation models

Abstract: This paper presents two novel ideas for improving Machine Translation (MT) quality by applying word-level quality prediction in a second decoding pass. In this approach, the word scores estimated by word confidence estimation systems are used to reconsider the MT hypotheses and select a better candidate rather than accept the current sub-optimal one. In the first method, the selection scope is limited to the MT N-best list, where our proposed re-ranking features are combined with the decoder's features for re-scoring. In the second, the search space is enlarged to the entire search graph, which stores many more hypotheses generated during the first decoding pass. Over all paths containing words of the N-best list, we propose an algorithm that strengthens or weakens them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the final translation. The results show that both approaches improve MT quality over the one-pass baseline, and that search graph re-decoding achieves larger gains (in BLEU score) than N-best list re-ranking.
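
The first method combines word-confidence features with the decoder's own score to re-score the N-best list. The sketch below assumes a simple weighted (log-linear style) sum; the feature names `good_ratio` and `avg_conf`, the 0.5 "good" threshold, the weights, and the toy scores are hypothetical illustrations, not the paper's actual feature set or tuned values. Here "WCE" stands for word confidence estimation.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]        # target-side tokens of one N-best entry
    decoder_score: float    # model score from the first decoding pass (log domain)
    word_conf: list[float]  # hypothetical per-word P(good) from a WCE system


def wce_features(h: Hypothesis) -> dict[str, float]:
    """Two illustrative (hypothetical) sentence-level features built from word confidences."""
    n = len(h.word_conf)
    if n == 0:
        return {"good_ratio": 0.0, "avg_conf": 0.0}
    good_ratio = sum(c >= 0.5 for c in h.word_conf) / n  # share of words labeled "good"
    avg_conf = sum(h.word_conf) / n                      # mean word confidence
    return {"good_ratio": good_ratio, "avg_conf": avg_conf}


def rerank(nbest: list[Hypothesis], weights: dict[str, float]) -> list[Hypothesis]:
    """Sort the N-best list by a weighted sum of decoder score and WCE features, best first."""
    def total(h: Hypothesis) -> float:
        feats = wce_features(h)
        return weights["decoder"] * h.decoder_score + sum(
            weights[name] * value for name, value in feats.items()
        )
    return sorted(nbest, key=total, reverse=True)


# Toy N-best list: the decoder slightly prefers the ungrammatical first entry.
nbest = [
    Hypothesis(["the", "house", "are", "big"], -4.1, [0.9, 0.8, 0.2, 0.7]),
    Hypothesis(["the", "house", "is", "big"], -4.3, [0.9, 0.8, 0.9, 0.8]),
]
weights = {"decoder": 1.0, "good_ratio": 0.5, "avg_conf": 1.0}
print(" ".join(rerank(nbest, weights)[0].words))  # the house is big
```

With the WCE weights set to zero the decoder's original first choice wins again; in practice such weights would typically be tuned on a development set rather than set by hand.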

Word-Level Confidence Estimation for Machine Translation

Computational Linguistics, Vol 33 (1), pp. 9-40, 2007. DOI: 10.1162/coli.2007.33.1.9. Cited By ~ 35

Author(s): Nicola Ueffing, Hermann Ney

Keyword(s): Machine Translation; Posterior Probability; Direct Methods; Translation System; Posterior Probabilities; Confidence Estimation; Confidence Measures; Word Level; System Output; Translation Systems

This article introduces and evaluates several word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities, as well as different ways of calculating them, are introduced and compared. They fall into two categories: system-based methods, which exploit knowledge provided by the translation system that generated the translations, and direct methods, which are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contain the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account; they can also be applied to output from nonstatistical machine translation systems. Experimental assessment of the different confidence measures on various translation tasks and in several language pairs is presented. Moreover, the application of confidence measures to rescoring of translation hypotheses is investigated.
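
The system-based definition above can be illustrated on an N-best list. This sketch implements only the simplest, position-independent variant, in which a sentence counts toward a word's posterior if the word occurs anywhere in it; the article compares several concepts, and position-aware variants are not shown. Sentence posteriors are assumed to come from normalizing the N-best log scores over the list, and the 0.5 decision threshold and toy scores are illustrative assumptions.

```python
import math


def sentence_posteriors(log_scores: list[float]) -> list[float]:
    """Normalize N-best log scores into sentence posteriors that sum to 1."""
    m = max(log_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]


def word_posteriors(hyp: list[str], nbest: list[list[str]],
                    log_scores: list[float]) -> list[float]:
    """Position-independent posterior: total mass of N-best entries containing each word."""
    post = sentence_posteriors(log_scores)
    return [sum(p for sent, p in zip(nbest, post) if word in sent) for word in hyp]


def label_words(probs: list[float], threshold: float = 0.5) -> list[str]:
    """Tag each word as correct or incorrect by thresholding its posterior."""
    return ["correct" if p >= threshold else "incorrect" for p in probs]


nbest = [
    ["the", "house", "is", "small"],
    ["the", "house", "is", "little"],
    ["a", "home", "is", "small"],
]
log_scores = [math.log(0.5), math.log(0.3), math.log(0.2)]
probs = word_posteriors(nbest[1], nbest, log_scores)
print([round(p, 2) for p in probs])  # [0.8, 0.8, 1.0, 0.3]
print(label_words(probs))            # ['correct', 'correct', 'correct', 'incorrect']
```

"little" appears only in the second entry, so its posterior equals that sentence's share of the probability mass and it falls below the threshold.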

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing, pp. 2050017, 2021. DOI: 10.1142/s2717554520500174

Author(s): Rashmini Naranpanawa, Ravinga Perera, Thilakshi Fonseka, Uthayasanker Thayasivam

Keyword(s): Machine Translation; State Of The Art; Statistical Machine Translation; Translation System; Rare Word; Neural Machine Translation; Parallel Corpus; Low Resource; Word Level; Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel corpora are abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are particularly affected by out-of-vocabulary (OOV) and rare-word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system based on the Transformer, and explore standard subword techniques on top of it to identify which subword approach has the greatest effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
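
Byte-pair encoding (BPE) is one widely used subword technique for open-vocabulary translation; the abstract does not name the techniques it compares, so BPE is only an illustrative choice here. The minimal sketch below learns merge operations from toy word frequencies and then segments an unseen word into known subwords. The `</w>` end-of-word marker and the function names are assumptions of this sketch, not an existing library's API.

```python
from collections import Counter

Word = tuple[str, ...]


def pair_counts(vocab: dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(pair: tuple[str, str], symbols: Word) -> Word:
    """Replace every occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Greedily learn merge operations, most frequent pair first."""
    # Start from single characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)  # ties go to the first pair encountered
        merges.append(best)
        vocab = {merge_pair(best, s): f for s, f in vocab.items()}
    return merges


def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges in order to split a (possibly unseen) word into subwords."""
    symbols: Word = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(pair, symbols)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5)
print(merges)
print(segment("lowest", merges))  # ['low', 'est</w>']
```

"lowest" never occurs in the training counts, yet it is covered by two learned subwords instead of becoming an OOV token, which is the open-vocabulary property the abstract refers to.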

Confidence estimation for machine translation

2004. DOI: 10.3115/1220355.1220401. Cited By ~ 52

Author(s): John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, ...

Keyword(s): Machine Translation; Confidence Estimation

UAlacant word-level machine translation quality estimation system at WMT 2015

2015. DOI: 10.18653/v1/w15-3036. Cited By ~ 1

Author(s): Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel Forcada

Keyword(s): Machine Translation; Quality Estimation; Translation Quality; Word Level; Estimation System

Using sub-word-level information for confidence estimation with conditional random field models

2012. DOI: 10.21437/interspeech.2012-613

Author(s): Matthew S. Seigel, Phillip C. Woodland

Keyword(s): Random Field; Conditional Random Field; Confidence Estimation; Word Level; Level Information

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing

2020. DOI: 10.18653/v1/2020.acl-main.165

Author(s): Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao

Keyword(s): Machine Translation; Neural Machine Translation; Word Level

Hybrid Combination of Machine Translation with Part-of-Speech Analysis

Applied Mechanics and Materials, Vol 416-417, pp. 1552-1557, 2013. DOI: 10.4028/www.scientific.net/amm.416-417.1552

Author(s): Xiao Xu Hu

Keyword(s): Machine Translation; Linguistic Knowledge; Main Method; Advantages And Disadvantages; Part Of Speech; Word Level; The Arts; Sentence Level; Hybrid Framework; The Rich

Hypothesis combination is a major method for improving the performance of machine translation (MT) systems. State-of-the-art strategies include sentence-level and word-level methods, each with its own advantages and disadvantages. Moreover, current strategies depend mainly on statistical methods, with little guidance from rich linguistic knowledge. This paper proposes a hybrid framework that combines the strengths of sentence-level and word-level methods. In the word-level stage, the method selects well-translated words according to their part of speech and to how well the MT system that generated each word translates that part of speech. Experimental results with different MT systems demonstrate the effectiveness of this approach.
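
The word-level stage described above can be sketched as follows, assuming the hypotheses have already been aligned into slots (for example, with a confusion network) and that each system's translation accuracy per part of speech is known. The alignment step, the sentence-level stage, the tag set, the accuracy table, and the names `Candidate` and `select_words` are hypothetical, not details given in the abstract.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    word: str
    pos: str     # part-of-speech tag of the word
    system: str  # MT system that produced the word


def select_words(slots: list[list[Candidate]],
                 pos_accuracy: dict[str, dict[str, float]]) -> list[str]:
    """In each aligned slot, keep the word whose system is most reliable for that POS."""
    output = []
    for slot in slots:
        # Unknown POS tags get reliability 0.0; ties go to the first candidate.
        best = max(slot, key=lambda c: pos_accuracy[c.system].get(c.pos, 0.0))
        output.append(best.word)
    return output


# Hypothetical per-system, per-POS translation accuracy (e.g. measured on a dev set).
pos_accuracy = {
    "sysA": {"NOUN": 0.9, "VERB": 0.6},
    "sysB": {"NOUN": 0.7, "VERB": 0.8},
}
slots = [
    [Candidate("the", "DET", "sysA"), Candidate("the", "DET", "sysB")],
    [Candidate("dog", "NOUN", "sysA"), Candidate("hound", "NOUN", "sysB")],
    [Candidate("walks", "VERB", "sysA"), Candidate("runs", "VERB", "sysB")],
]
print(" ".join(select_words(slots, pos_accuracy)))  # the dog runs
```

Each slot can draw its word from a different system, which is what distinguishes word-level combination from choosing one whole sentence.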

Word-Level Error Correction in Non-autoregressive Neural Machine Translation

Communications in Computer and Information Science - Neural Information Processing, pp. 726-733, 2020. DOI: 10.1007/978-3-030-63820-7_83

Author(s): Ziyue Guo, Hongxu Hou, Nier Wu, Shuo Sun

Keyword(s): Error Correction; Machine Translation; Neural Machine Translation; Word Level

Word-Level Confidence Estimation for Statistical Machine Translation Using IBM-1 Model

Find the errors, get the better: Enhancing machine translation via word confidence estimation