Predicting the Linguistic Accessibility of Chinese Health Translations: Using Machine Learning Algorithms (Preprint)
BACKGROUND Linguistic accessibility has an important impact on the reception and utilisation of translated health resources among multicultural and multilingual populations. The linguistic understandability of health translations has been under-studied. OBJECTIVE Our study aimed to develop novel machine learning models for studying the linguistic accessibility of health translations, comparing Chinese translations of World Health Organisation health materials with original Chinese health resources developed by the Chinese health authorities. METHODS Using natural language processing tools for assessing the readability of Chinese materials, we explored and compared the readability of Chinese health translations from the World Health Organisation with that of original Chinese materials from the China Centre for Disease Control and Prevention. RESULTS Pairwise adjusted t tests showed that three new machine learning models achieved statistically significant improvements in AUC over the baseline logistic regression: C5.0 decision tree (p<0.001, 95% CI: -0.249, -0.152), random forest (p<0.001, 95% CI: 0.139, 0.239) and XGBoost tree (p<0.001, 95% CI: 0.099, 0.193). There was, however, no significant difference between the C5.0 decision tree and the random forest (p=0.513). The extreme gradient boosting (XGBoost) tree was the best model, achieving statistically significant improvements over both the C5.0 model (p=0.003) and the random forest model (p=0.006) at the Bonferroni-adjusted significance threshold of 0.008. CONCLUSIONS The machine learning algorithms developed in this study significantly improved the accuracy and reliability of current approaches to evaluating the linguistic accessibility of Chinese health information, especially Chinese health translations in relation to original health resources. Although the new algorithms were developed on Chinese health resources, they can be adapted for other languages to advance current research in accessible health translation, communication, and promotion.
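The Bonferroni threshold of 0.008 quoted in the results can be reproduced with a short calculation. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: a family-wise significance level of 0.05, and a correction that counts all six pairwise comparisons among the four models. The variable names are hypothetical; the p values are those reported above.

```python
from itertools import combinations

# The four models compared in the results.
models = [
    "logistic regression",
    "C5.0 decision tree",
    "random forest",
    "XGBoost tree",
]

# Every unordered pair of models (assumption: all pairs enter the correction).
pairs = list(combinations(models, 2))

# Assumption: family-wise significance level of 0.05.
alpha = 0.05
threshold = alpha / len(pairs)  # Bonferroni: divide alpha by the number of tests

# Exact p values reported in the abstract for comparisons among the new models.
reported_p = {
    ("C5.0 decision tree", "random forest"): 0.513,
    ("C5.0 decision tree", "XGBoost tree"): 0.003,
    ("random forest", "XGBoost tree"): 0.006,
}

# A comparison is significant when its p value falls below the adjusted threshold.
significant = {pair: p < threshold for pair, p in reported_p.items()}

for pair, is_sig in significant.items():
    print(pair, "significant" if is_sig else "not significant")
```

Under these assumptions the threshold rounds to 0.008, and the two XGBoost comparisons fall below it while the comparison between C5.0 and random forest does not, which matches the pattern reported in the results.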