Optimizing Non-Decomposable Evaluation Metrics for Neural Machine Translation

2017 ◽  
Vol 32 (4) ◽  
pp. 796-804
Author(s):  
Shi-Qi Shen ◽  
Yang Liu ◽  
Mao-Song Sun
2021 ◽  
Author(s):  
Bo Sun ◽  
Fei Zhang ◽  
Jing Yuan ◽  
Zhao Wei ◽  
Shu Ting

Abstract
Background: As more people look for medical knowledge online, intelligent medical question-answering systems based on question matching have attracted growing attention, especially in China. However, the lack of a paraphrase corpus of medical questions limits progress in this field.
Objective: We propose a paraphrase generation method suited to the Chinese medical field and, for the first time, use deep learning models in place of manual evaluation. The method is designed to construct high-quality Chinese medical paraphrases automatically.
Methods: Validation experiments were carried out on two Chinese paraphrase datasets, one general and one medical. Neural machine translation is used to generate paraphrases: a sentence is translated into another language and then translated back into the original language to obtain the corresponding paraphrase. BLEU and ROUGE are used as quantitative evaluation metrics. Three deep text-matching models, rather than human annotators, are used to evaluate the generated paraphrases. Precision, recall, F1 and AUC are used as qualitative evaluation metrics.
Results: 49908 and 4062 paraphrases were generated on the two datasets, with generation efficiencies of 97.03% and 98.38%, respectively. On both datasets, the generated paraphrase pairs are very similar to the original pairs on the quantitative and qualitative evaluation metrics, especially in the medical field. Taking the medical data as an example, the BLEU scores of the generated and original paraphrase pairs are 0.556 and 0.626, respectively, and the mean difference in AUC between the two groups is 0.015.
Conclusions: We propose a paraphrase generation method based on neural machine translation and use deep text-matching models, instead of manual evaluation, to assess the generated paraphrases. The evaluation metrics indicate that the generated paraphrases reach or even exceed the level of manually constructed ones at the semantic level, especially in the medical field, and that deep text-matching models can replace manual evaluation, enabling automated paraphrase generation. This is of great significance to the development of Chinese medical paraphrase generation.
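The round-trip translation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Translator` signature, the `toy_translate` lookup table, the choice of English as the pivot language, and the rule that discards round trips identical to the input are all assumptions made for the example. In practice a real neural machine translation system would be plugged in as `translate`.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def back_translate(sentence: str, translate: Translator,
                   src: str = "zh", pivot: str = "en") -> str:
    """Translate src -> pivot -> src and return the round-trip result."""
    pivot_text = translate(sentence, src, pivot)
    return translate(pivot_text, pivot, src)


def generate_paraphrases(sentences: Iterable[str], translate: Translator,
                         src: str = "zh", pivot: str = "en") -> List[Tuple[str, str]]:
    """Return (original, paraphrase) pairs produced by round-trip translation."""
    pairs = []
    for sentence in sentences:
        candidate = back_translate(sentence, translate, src, pivot).strip()
        # Assumption: an empty or unchanged round trip is not a useful paraphrase.
        if candidate and candidate != sentence.strip():
            pairs.append((sentence, candidate))
    return pairs


def toy_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real NMT system: a tiny lookup table, for illustration only."""
    table = {
        ("zh", "en"): {"头疼怎么办": "what to do about a headache",
                       "感冒了": "caught a cold"},
        ("en", "zh"): {"what to do about a headache": "头痛该怎么处理",
                       "caught a cold": "感冒了"},
    }
    return table[(source, target)].get(text, text)


pairs = generate_paraphrases(["头疼怎么办", "感冒了"], toy_translate)
```

With the toy table, the first question comes back reworded and is kept as a pair, while the second returns unchanged and is dropped. The generated pairs would then be scored with BLEU, ROUGE, and the text-matching models mentioned in the abstract.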


2019 ◽  
Vol 28 (4) ◽  
pp. 1-29 ◽  
Author(s):  
Michele Tufano ◽  
Cody Watson ◽  
Gabriele Bavota ◽  
Massimiliano Di Penta ◽  
Martin White ◽  
...  

Procedia CIRP ◽  
2021 ◽  
Vol 96 ◽  
pp. 9-14
Author(s):  
Uwe Dombrowski ◽  
Alexander Reiswich ◽  
Raphael Lamprecht

2020 ◽  
Vol 34 (4) ◽  
pp. 325-346
Author(s):  
John E. Ortega ◽  
Richard Castro Mamani ◽  
Kyunghyun Cho
