Find the errors, get the better: Enhancing machine translation via word confidence estimation

2017 ◽  
Vol 23 (4) ◽  
pp. 617-639 ◽  
Author(s):  
NGOC-QUANG LUONG ◽  
LAURENT BESACIER ◽  
BENJAMIN LECOUTEUX

Abstract This paper presents two novel ideas for improving Machine Translation (MT) quality by applying word-level quality prediction in a second pass of decoding. In this manner, the word scores estimated by word confidence estimation systems help to reconsider the MT hypotheses and select a better candidate rather than accepting the current sub-optimal one. In the first attempt, the selection scope is limited to the MT N-best list, in which our proposed re-ranking features are combined with those of the decoder for re-scoring. Then, the search space is enlarged to the entire search graph, which stores many more hypotheses generated during the first pass of decoding. Over all paths containing words of the N-best list, we propose an algorithm to strengthen or weaken them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the official translation. The results show that both approaches improve MT quality over the one-pass baseline, and that search graph re-decoding achieves larger gains (in BLEU score) than the N-best list re-ranking method.
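The N-best re-ranking idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the single averaged-confidence feature, the `wce_weight` parameter, and the toy confidence table are all assumptions made for illustration, whereas the paper combines several re-ranking features with the decoder's own.

```python
# Hypothetical sketch: re-rank an N-best list by adding one word-confidence
# feature to the decoder score. Feature choice and weight are assumptions.

def rerank(nbest, word_confidence, wce_weight=1.0):
    """Return the best (tokens, decoder_score) pair after re-scoring.

    nbest: list of (tokens, decoder_score) pairs from the first pass.
    word_confidence: callable mapping a token list to per-word
        probabilities of being correct, each in [0, 1].
    """
    def combined(item):
        tokens, decoder_score = item
        conf = word_confidence(tokens)
        avg_conf = sum(conf) / len(conf) if conf else 0.0
        return decoder_score + wce_weight * avg_conf

    return max(nbest, key=combined)


# Toy stand-in for a trained word confidence estimation system.
TOY_CONF = {"good": 0.9, "bad": 0.1}

def toy_confidence(tokens):
    return [TOY_CONF.get(t, 0.5) for t in tokens]


nbest = [
    (["a", "bad", "translation"], -1.0),   # decoder's first choice
    (["a", "good", "translation"], -1.2),
]
best = rerank(nbest, toy_confidence, wce_weight=1.0)
print(" ".join(best[0]))
```

With `wce_weight=0.0` the sketch falls back to the decoder's original ranking, which is the one-pass baseline in this toy setting.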

2007 ◽  
Vol 33 (1) ◽  
pp. 9-40 ◽  
Author(s):  
Nicola Ueffing ◽  
Hermann Ney

This article introduces and evaluates several different word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities as well as different ways of calculating them will be introduced and compared. They can be divided into two categories: System-based methods that explore knowledge provided by the translation system that generated the translations, and direct methods that are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contain the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account. They can be applied to output from nonstatistical machine translation systems as well. Experimental assessment of the different confidence measures on various translation tasks and in several language pairs will be presented. Moreover, the application of confidence measures for rescoring of translation hypotheses will be investigated.
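The system-based word posterior described above, summing the probabilities of the hypotheses that contain a word, can be sketched over an N-best list as follows. This is a simplified, position-independent variant written for illustration; the function name, the softmax normalization of log scores, and the toy data are assumptions rather than the article's exact formulation.

```python
import math
from collections import defaultdict

def word_posteriors(nbest):
    """Map each word to the summed normalized probability of the
    hypotheses that contain it.

    nbest: list of (tokens, log_score) pairs.
    """
    # Normalize the log scores into a distribution over hypotheses,
    # subtracting the maximum for numerical stability.
    max_log = max(score for _, score in nbest)
    weights = [math.exp(score - max_log) for _, score in nbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (tokens, _), weight in zip(nbest, weights):
        for word in set(tokens):  # count each hypothesis once per word
            posteriors[word] += weight / total
    return dict(posteriors)


# Toy N-best list whose scores are log-probabilities (assumed values).
nbest = [
    (["the", "house", "is", "small"], math.log(0.5)),
    (["the", "home", "is", "small"], math.log(0.3)),
    (["a", "house", "is", "little"], math.log(0.2)),
]
post = word_posteriors(nbest)
print(round(post["house"], 3), round(post["home"], 3))
```

A word that appears in every hypothesis, such as "is" here, receives a posterior of 1, while a word found only in low-probability hypotheses receives a low one; thresholding these values gives a correct/incorrect label per word.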


2020 ◽  
Vol 51 (3) ◽  
pp. 544-560 ◽  
Author(s):  
Kimberly A. Murphy ◽  
Emily A. Diehm

Purpose Morphological interventions promote gains in morphological knowledge and in other oral and written language skills (e.g., phonological awareness, vocabulary, reading, and spelling), yet we have a limited understanding of critical intervention features. In this clinical focus article, we describe a relatively novel approach to teaching morphology that considers its role as the key organizing principle of English orthography. We also present a clinical example of such an intervention delivered during a summer camp at a university speech and hearing clinic. Method Graduate speech-language pathology students provided a 6-week morphology-focused orthographic intervention to children in first through fourth grade (n = 10) who demonstrated word-level reading and spelling difficulties. The intervention focused children's attention on morphological families, teaching how morphology is interrelated with phonology and etymology in English orthography. Results Comparing pre- and posttest scores, children demonstrated improvement in reading and/or spelling abilities, with the largest gains observed in spelling affixes within polymorphemic words. Children and their caregivers reacted positively to the intervention. Therefore, data from the camp offer preliminary support for teaching morphology within the context of written words, and the intervention appears to be a feasible approach for simultaneously increasing morphological knowledge, reading, and spelling. Conclusion Children with word-level reading and spelling difficulties may benefit from a morphology-focused orthographic intervention, such as the one described here. Research on the approach is warranted, and clinicians are encouraged to explore its possible effectiveness in their practice. Supplemental Material https://doi.org/10.23641/asha.12290687


1993 ◽  
Vol 8 (1-2) ◽  
pp. 49-58 ◽  
Author(s):  
Pamela W. Jordan ◽  
Bonnie J. Dorr ◽  
John W. Benoit

Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel corpora are abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.


2020 ◽  
Vol 30 (01) ◽  
pp. 2050002
Author(s):  
Taichi Aida ◽  
Kazuhide Yamamoto

Current methods of neural machine translation may generate sentences with different levels of quality. Methods for automatically evaluating translation output from machine translation can be broadly classified into two types: a method that uses human post-edited translations for training an evaluation model, and a method that uses a reference translation that is the correct answer during evaluation. On the one hand, it is difficult to prepare post-edited translations because it is necessary to tag each word in comparison with the original translated sentences. On the other hand, users who actually employ the machine translation system do not have a correct reference translation. Therefore, we propose a method that trains the evaluation model without using human post-edited sentences and, on the test set, estimates the quality of output sentences without using reference translations. We define some indices and predict the quality of translations with a regression model. For the quality of the translated sentences, we employ the BLEU score calculated from the number of word n-gram matches between the translated sentence and the reference translation. After that, we compute the correlation between quality scores predicted by our method and BLEU actually computed from references. According to the experimental results, the correlation with BLEU is the highest when XGBoost uses all the indices. Moreover, looking at each index, we find that the sentence log-likelihood and the model uncertainty, which are based on the joint probability of generating the translated sentence, are important in BLEU estimation.
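Two ingredients of the evaluation described above can be sketched in a few lines: a sentence log-likelihood index computed from per-token probabilities, and the correlation between predicted and reference-based scores. The sketch assumes Pearson correlation and uses made-up numbers; the abstract does not specify the correlation coefficient, and the regression model itself (XGBoost in the paper) is left out.

```python
import math

def sentence_log_likelihood(token_probs):
    """Sum of log-probabilities of the generated tokens, i.e. the log of
    the joint probability of the sentence under the model."""
    return sum(math.log(p) for p in token_probs)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-token probabilities for three translated sentences.
sentences = [[0.9, 0.8, 0.95], [0.5, 0.4, 0.6], [0.2, 0.3, 0.1]]
indices = [sentence_log_likelihood(s) for s in sentences]

# Hypothetical sentence-level BLEU scores for the same three sentences.
actual_bleu = [0.62, 0.35, 0.08]
print(round(pearson(indices, actual_bleu), 3))
```

In the setting the abstract describes, indices like this one would be the inputs to the regression model, and the correlation would be computed between the model's predictions and BLEU rather than between a raw index and BLEU.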


2020 ◽  
Vol 11 (5) ◽  
pp. 900-909 ◽  
Author(s):  
Aglaé Velasco Gonzalez ◽  
Dennis Görlich ◽  
Boris Buerke ◽  
Nico Münnich ◽  
Cristina Sauerland ◽  
...  

Abstract Complete recanalization after a single retrieval maneuver is an interventional goal in acute ischemic stroke and an independent factor for good clinical outcome. Anatomical biomarkers for predicting clot removal difficulties have not been comprehensively analyzed and remain unused. We retrospectively evaluated 200 consecutive patients who suffered acute stroke and occlusion of the anterior circulation and were treated with mechanical thrombectomy through a balloon guide catheter (BGC). The primary objective was to evaluate the influence of carotid tortuosity and BGC positioning on the one-pass Modified Thrombolysis in Cerebral Infarction Scale (mTICI) 3 rate, and secondarily, the influence of communicating arteries on the angiographic results. After the first-pass mTICI 3, recanalization fell from 51% to 13%. The regression models and decision tree (supervised machine learning) results concurred: carotid tortuosity was the main constraint on efficacy, reducing the likelihood of mTICI 3 after one pass to 30%. BGC positioning was relevant only in carotid arteries without elongation: BGCs located in the distal internal carotid artery (ICA) had a 70% probability of complete recanalization after one pass, dropping to 43% if located in the proximal ICA. These findings demonstrate that first-pass mTICI 3 is influenced by anatomical and interventional factors that can be anticipated, enabling the BGC technique to be adapted to the patient's anatomy to enhance effectiveness.


Author(s):  
John Blatz ◽  
Erin Fitzgerald ◽  
George Foster ◽  
Simona Gandrabur ◽  
Cyril Goutte ◽  
...  
