Objective: To identify linguistic and textual features of English health education materials that predict the probabilistic distribution of critical conceptual mistakes in neural machine translations (Google Translate, English to Chinese) of public-oriented online health resources on infectious diseases and viruses.
Methods: We collected 200 English source texts on infectious diseases, together with their human Chinese translations, from HON.net-certified health education websites. Native Chinese speakers compared the human translations with machine translations (Google Translate) to identify critical conceptual mistakes. To overcome the overfitting issues of machine learning on small, high-dimensional datasets, a Bayesian machine learning classifier (relevance vector machine, RVM) was trained (70%/30% train/test split; 5-fold cross-validation) on English source texts labeled as linked or not linked with machine translation outputs containing critical conceptual mistakes, in order to identify source text features likely to cause clinically significant machine translation errors. We compared the performance of the RVM trained on combined features through separate optimization (CFSO: 21 features) with RVMs trained on the original combined features (OCF: 135 features, comprising 20 structural and 115 semantic features), combined features through joint optimization (CFJO: 48 features), optimized structural features (OTF: 5 features), and optimized semantic features (OSF: 16 features). In addition, the RVM (CFSO) was compared with classifiers using individual standard (currently available) parameters for measuring English complexity: Flesch Reading Ease (FRE), the Gunning Fog Index (GFI), and the SMOG Readability Index (SMOG).
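The evaluation pipeline described above (70%/30% split, 5-fold cross-validation, AUC/sensitivity/specificity on a held-out test set) can be sketched as follows. This is a minimal illustration, not the study's code: scikit-learn does not ship a relevance vector machine, so a logistic regression stands in for the RVM, and the feature matrix and labels are synthetic placeholders for the study's 200 texts and CFSO feature set.

```python
# Sketch of the Methods pipeline with a stand-in classifier.
# Assumptions (not from the study): LogisticRegression substitutes for the RVM;
# X and y are random placeholders for the 200 texts and 21 CFSO features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 21))       # 200 source texts, 21 features
y = rng.integers(0, 2, size=200)     # 1 = MT output contained a critical mistake

# 70%/30% train/test split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)
cv_auc = cross_val_score(clf, X_tr, y_tr, cv=5, scoring="roc_auc")  # 5-fold CV
clf.fit(X_tr, y_tr)

# Held-out test metrics, as reported in Results
pred = clf.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Swapping in an actual RVM (e.g., from a third-party package) would only change the `clf = ...` line; the split, cross-validation, and metric computations stay the same.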
Results: The AUC, sensitivity, specificity, and accuracy of the RVM classifiers trained on the different feature sets were: CFSO (AUC: 0.685; sensitivity: 0.73; specificity: 0.63; accuracy: 0.68); OCF (AUC: 0.700; sensitivity: 0.42; specificity: 0.80; accuracy: 0.625); CFJO (AUC: 0.690; sensitivity: 0.54; specificity: 0.73; accuracy: 0.64); OTF (AUC: 0.587; sensitivity: 0.58; specificity: 0.53; accuracy: 0.55); OSF (AUC: 0.679; sensitivity: 0.58; specificity: 0.67; accuracy: 0.625). The best-performing model was the RVM trained on the combined features through separate optimization (CFSO), which used only 16% of the original combined features. RVM (CFSO) outperformed binary classifiers (BCs) based on standard English readability tests. The accuracy, sensitivity, and specificity of the three BCs were: FRE (accuracy: 0.457; sensitivity: 0.903; specificity: 0.011); GFI (accuracy: 0.5735; sensitivity: 0.685; specificity: 0.462); SMOG (accuracy: 0.568; sensitivity: 0.674; specificity: 0.462).
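The readability-based baselines compared above turn a single readability score into a binary classifier via a cutoff. The sketch below illustrates this for Flesch Reading Ease using its standard formula; the syllable counter is a rough vowel-group heuristic, and the threshold of 60 is an illustrative assumption, not the cutoff used in the study.

```python
# Hedged sketch: a readability-score baseline classifier (FRE-style).
# Assumptions: vowel-group syllable heuristic; threshold=60 is illustrative only.
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: each run of vowels approximates one syllable (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Standard FRE formula: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    n_syll = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)

def predict_critical_error(text: str, threshold: float = 60.0) -> bool:
    # Below-threshold readability -> flag the text as likely to yield an MT error.
    return flesch_reading_ease(text) < threshold
```

A GFI or SMOG baseline would follow the same pattern with its own formula and cutoff; the study's point is that such one-dimensional thresholds discriminate poorly (e.g., FRE specificity 0.011) compared with the multi-feature RVM.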
Conclusion: Our study found that machine-generated Chinese medical translation errors were not caused by difficult medical jargon or by low readability of the source language information. Rather, certain English structures (the passive voice; sentences starting with conjunctions) and semantic polysemy (different meanings of a word when used in common versus specialized domains) tended to cause critical conceptual mistakes in neural machine translation (English to Chinese) of health education information on infectious diseases.