Word-level confidence estimation for machine translation using phrase-based translation models

Author(s):  
Nicola Ueffing ◽  
Hermann Ney
2017 ◽  
Vol 23 (4) ◽  
pp. 617-639 ◽  
Author(s):  
NGOC-QUANG LUONG ◽  
LAURENT BESACIER ◽  
BENJAMIN LECOUTEUX

Abstract: This paper presents two novel ideas for improving Machine Translation (MT) quality by applying word-level quality prediction in a second pass of decoding. The word scores estimated by word confidence estimation systems are used to reconsider the MT hypotheses and select a better candidate, rather than accepting the current sub-optimal one. In the first approach, the selection scope is limited to the MT N-best list, in which our proposed re-ranking features are combined with those of the decoder for re-scoring. The search space is then enlarged to the entire search graph, which stores many more hypotheses generated during the first pass of decoding. For all paths containing words of the N-best list, we propose an algorithm that strengthens or weakens them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the official translation. The results show that both approaches improve MT quality over the one-pass baseline, and that search graph re-decoding achieves larger gains (in BLEU score) than the N-best list re-ranking method.
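
The N-best re-ranking idea in this abstract can be illustrated with a minimal sketch. The abstract does not specify the re-ranking features or how they are weighted, so everything here is an assumption made for illustration: the function name `rerank`, the single feature (the mean word confidence of a hypothesis), and the interpolation weight `weight` are all hypothetical.

```python
def rerank(nbest, weight=1.0):
    """Re-rank an N-best list using a word-confidence feature.

    nbest: list of dicts with keys
        "tokens"       - list of target words
        "decoder_score"- log-domain score from the first decoding pass
        "word_conf"    - per-word confidence estimates in [0, 1]
    weight: hypothetical interpolation weight for the confidence feature.
    """
    def combined_score(hyp):
        conf = hyp["word_conf"]
        # Assumed feature: average word confidence of the hypothesis.
        mean_conf = sum(conf) / len(conf) if conf else 0.0
        return hyp["decoder_score"] + weight * mean_conf

    # Highest combined score first.
    return sorted(nbest, key=combined_score, reverse=True)


# Toy N-best list with made-up scores.
nbest = [
    {"tokens": ["the", "home", "small"], "decoder_score": -2.0,
     "word_conf": [0.9, 0.2, 0.3]},
    {"tokens": ["the", "house", "small"], "decoder_score": -2.3,
     "word_conf": [0.9, 0.9, 0.9]},
]

best_decoder_only = rerank(nbest, weight=0.0)[0]
best_with_conf = rerank(nbest, weight=1.0)[0]
print(best_decoder_only["tokens"])
print(best_with_conf["tokens"])
```

With the weight set to zero the decoder's first choice is kept. With the confidence feature switched on, the second hypothesis overtakes it because its words are judged more reliable.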


2007 ◽  
Vol 33 (1) ◽  
pp. 9-40 ◽  
Author(s):  
Nicola Ueffing ◽  
Hermann Ney

This article introduces and evaluates several different word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities, as well as different ways of calculating them, are introduced and compared. They can be divided into two categories: system-based methods that explore knowledge provided by the translation system that generated the translations, and direct methods that are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contain the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account. They can be applied to output from nonstatistical machine translation systems as well. Experimental assessment of the different confidence measures on various translation tasks and in several language pairs is presented. Moreover, the application of confidence measures for rescoring of translation hypotheses is investigated.
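
The system-based word posterior described in this abstract (summing the probabilities of the hypotheses that contain the target word) can be sketched over an N-best list as follows. This is a simplified illustration, not the article's exact formulation: it ignores word position entirely, it normalizes the hypothesis probabilities over the N-best list only, and the function name `word_posteriors` and the toy data are assumptions.

```python
import math
from collections import defaultdict


def word_posteriors(nbest):
    """Estimate word posterior probabilities from an N-best list.

    nbest: non-empty list of (tokens, log_prob) pairs, where log_prob is
    the (unnormalized) log-probability the system assigned to the
    hypothesis.
    Returns a dict mapping each word to the normalized probability mass
    of the hypotheses that contain it (position is ignored).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    max_lp = max(lp for _, lp in nbest)
    weights = [math.exp(lp - max_lp) for _, lp in nbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (tokens, _), w in zip(nbest, weights):
        # set() ensures a hypothesis counts once per word, even if the
        # word occurs several times in it.
        for word in set(tokens):
            posteriors[word] += w / total
    return dict(posteriors)


# Toy N-best list with made-up probabilities 0.5, 0.3 and 0.2.
nbest = [
    (["the", "house", "is", "small"], math.log(0.5)),
    (["the", "home", "is", "small"], math.log(0.3)),
    (["a", "house", "is", "little"], math.log(0.2)),
]

post = word_posteriors(nbest)
# "house" occurs in the first and third hypotheses (mass 0.5 + 0.2),
# "is" occurs in all three.
print(post["house"], post["is"])
```

A word can then be labeled correct or incorrect by comparing its posterior against a threshold tuned on held-out data; the threshold itself is not given in the abstract.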


Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) performs much better than statistical machine translation (SMT) models when abundant parallel corpora are available. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Low-resource, morphologically rich languages such as Sinhala are therefore strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has the greatest effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.


Author(s):  
John Blatz ◽  
Erin Fitzgerald ◽  
George Foster ◽  
Simona Gandrabur ◽  
Cyril Goutte ◽  
...  

2013 ◽  
Vol 416-417 ◽  
pp. 1552-1557
Author(s):  
Xiao Xu Hu

Hypothesis combination is a principal method for improving the performance of machine translation (MT) systems. State-of-the-art strategies include sentence-level and word-level methods, each of which has its own advantages and disadvantages. Moreover, current strategies depend mainly on statistical methods, with little guidance from rich linguistic knowledge. This paper proposes a hybrid framework that combines the strengths of the sentence-level and word-level methods. In the word-level stage, the method selects well-translated words according to their part of speech and the ability of the MT system that generated each word to translate that part of speech. Experimental results with different MT systems demonstrate the effectiveness of this approach.

