Predicting Health Educational Material Understandability using Machine Learning Algorithms (Preprint)

Mapping Intimacies

10.2196/preprints.28413

2021

Author(s):

Meng Ji

Yanmeng Liu

Mengdan Zhao

Ziqing Lyu

Boren Zhang

...

Keyword(s):

Machine Learning

Cost Effectiveness

Health Literacy

Health Education

Health Information

Learning Algorithms

Medical Knowledge

Machine Learning Algorithms

Health Resource

Education Resources

BACKGROUND Improving the understandability of health information can significantly increase the cost-effectiveness and efficiency of health education programs for vulnerable populations. There is a pressing need to develop clinically informed computerized tools that enable rapid, reliable assessment of the linguistic understandability of specialized health and medical education resources.

OBJECTIVE This paper fills a critical gap in current patient-oriented health resource development, which requires reliable, accurate evaluation instruments to increase the efficiency and cost-effectiveness of health education resource evaluation. We aim to translate an internationally endorsed clinical guideline, the Patient Education Materials Assessment Tool (PEMAT), into machine learning algorithms to facilitate the evaluation of the understandability of health resources for international students at Australian universities.

METHODS Based on international patient health resource assessment guidelines, we developed machine learning algorithms to predict the linguistic understandability of health texts for Australian college students (aged 25-30) from non-English speaking backgrounds. We compared extreme gradient boosting, random forest, neural networks, and C5 decision tree for automated evaluation of health information understandability. The five machine learning models achieved statistically better results than the baseline logistic regression model. We also evaluated the impact of each linguistic feature on the performance of each of the five models.

RESULTS Information evidentness, relevance to educational purposes, and logical sequence were consistently more important than numeracy skills and medical knowledge when assessing the linguistic understandability of health education resources for international tertiary students with adequate English skills (mean IELTS score 6.5) and high health literacy (mean score 16.5 on the Short Assessment of Health Literacy-English). The results challenged the traditional view that lack of medical knowledge and numeracy skills constitutes the barrier to understanding health educational materials.

CONCLUSIONS Machine learning algorithms were developed to predict health information understandability for international college students aged 25-30. Thirteen natural language features and 5 evaluation dimensions were identified and compared in terms of their impact on model performance. Health information understandability varies with the demographic profiles of the target readers, and for international tertiary students, improving the evidentness, relevance, and logic of health information is critical.
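
The abstract says the impact of each linguistic feature on each model was evaluated but does not say how. Permutation importance is one common way to do this, and the sketch below assumes it rather than reproducing the authors' method: shuffle one feature column at a time and record the mean drop in accuracy. `toy_model` and the six feature rows are hypothetical stand-ins for a trained classifier and PEMAT-style feature scores.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Build new rows so the caller's data set is left untouched.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(rows: List[Row]) -> List[int]:
    # Hypothetical classifier: a text counts as understandable when feature 0 > 0.5.
    return [int(r[0] > 0.5) for r in rows]


X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.7, 0.5], [0.3, 0.2]]
y = [1, 1, 0, 0, 1, 0]
print(permutation_importance(toy_model, X, y))
```

Feature 1 receives an importance of exactly zero because the toy model never reads it; a real model would be retrained per cross-validation fold and scored on held-out texts.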

Predicting Health Material Cognitive Accessibility Using Multidimensional Semantic Features and Readability Tools as Predicators (Preprint)

10.2196/preprints.29175

2021

Author(s):

Meng Ji

Yanmeng Liu

Tianyong Hao

Keyword(s):

Machine Learning

Health Education

Health Information

Domain Knowledge

Learning Algorithms

Machine Learning Algorithms

Semantic Features

Integrated Models

Advanced Education

Cognitive Accessibility

BACKGROUND Much current research on health information understandability uses medical readability formulas (MRF) to assess the cognitive difficulty of health education resources. This rests on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, constitutes the sole barrier to health information access among the public. Our study challenged this assumption by showing that, for readers from non-English speaking backgrounds with higher educational attainment, the semantic features of English health texts, rather than medical jargon, can explain the lack of cognitive access to health materials among readers who understand health terms well yet have limited exposure to English health education materials.

OBJECTIVE Our study explored combining MRF and multidimensional semantic features (MSF) to develop machine learning algorithms that predict the actual level of cognitive accessibility of English health materials on health risks and diseases for specific populations. We compared algorithms to evaluate the cognitive accessibility of specialized health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments.

METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from international health organization websites and rated by international tertiary students, we compared machine learning algorithms (decision tree, SVM, discriminant analysis, ensemble tree, and logistic regression) after automatic hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the AUC, sensitivity, specificity, and accuracy as measures of model performance.

RESULTS Using two sets of predictor features, the widely tested MRF and the MSF proposed in our study, we developed and compared three sets of machine learning algorithms: the first used MRF as predictors only, the second used MSF as predictors only, and the third used both MRF and MSF as integrated models. The integrated models performed best in terms of AUC, sensitivity, accuracy, and specificity.

CONCLUSIONS Our study showed that the cognitive accessibility of English health texts is not limited to the word length and sentence length conventionally measured by MRF. We compared machine learning algorithms combining MRF and MSF to explore the cognitive accessibility of health information from syntactic and semantic perspectives. The integrated models achieved statistically higher AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership, indicating that both MRF and MSF contribute to the comprehension of health information and that, for readers with advanced education, semantic features outweigh syntax and domain knowledge.
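
The four performance measures named in the Methods can be computed from a model's predictions. A minimal sketch follows, assuming binary labels where 1 marks the positive class (which class the study treated as positive is not stated, so the labeling here is hypothetical). AUC is computed through its rank interpretation: the probability that a randomly chosen positive text receives a higher score than a randomly chosen negative one, which equals the area under the receiver operating characteristic curve.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy, with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # hypothetical model scores
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 decision threshold
print(binary_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the studies report both kinds of measure.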

Predicting Health Material Cognitive Accessibility for Non-Native English Speakers Using Multidimensional Semantic Features as Predictors of Machine Learning Algorithms (Preprint)

10.2196/preprints.25110

2021

Author(s):

Meng Ji

Yanmeng Liu

Tianyong Hao

Keyword(s):

Machine Learning

Health Education

Decision Tree

Health Information

Domain Knowledge

Learning Algorithms

Native English Speakers

Machine Learning Algorithms

Semantic Features

Cognitive Accessibility

BACKGROUND Much current research on health information understandability uses medical readability formulas to assess the cognitive difficulty of health education resources. This rests on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, constitutes the sole barrier to health information access among the public. Our study challenged this assumption by showing that, for readers from non-English speaking backgrounds with higher educational attainment, the semantic features that underpin the knowledge structure of English health texts, rather than medical jargon, can explain the cognitive accessibility of health materials among readers who understand English health terms well yet have very limited exposure to English-based health education environments and traditions.

OBJECTIVE Our study explored multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutions. We compared algorithms to evaluate the cognitive accessibility of health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments.

METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites and rated by overseas tertiary students, we compared machine learning algorithms (decision tree, SVM, ensemble tree, and logistic regression) after hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the AUC, sensitivity, specificity, and accuracy as measures of model performance.

RESULTS We developed and compared four machine learning algorithms using multidimensional semantic features as predictors. Ensemble tree (LogitBoost) performed best in terms of AUC (0.97), sensitivity (0.966), specificity (0.972), and accuracy (0.969). Decision tree followed closely (AUC 0.924, sensitivity 0.912, specificity 0.9358, and accuracy 0.924), then SVM (AUC 0.8946, sensitivity 0.8952, specificity 0.894, and accuracy 0.8946). Decision tree, ensemble tree, and SVM achieved statistically significant improvements over logistic regression in AUC, specificity, and accuracy. As the best performing algorithm, ensemble tree achieved statistically significant improvements over SVM in AUC, specificity, and accuracy, and over decision tree in sensitivity.

CONCLUSIONS Our study showed that the cognitive accessibility of English health texts is not limited to the word length and sentence length conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for non-native English speakers. The new models achieved statistically higher AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership. Our study illustrated that semantic features, such as those related to cognitive abilities, communicative actions and processes, power relationships in healthcare settings, and the lexical familiarity and diversity of health texts, are large contributors to the comprehension of health information, and that for readers such as international students, the semantic features of health texts outweigh syntax and domain knowledge.
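
The Methods apply 10-fold cross-validation to the whole set of 1000 texts, so every text is used for testing exactly once and for training in the other nine folds. A minimal index-splitting sketch follows; it assumes plain shuffled (unstratified) folds, since the abstract does not say whether folds were stratified by label, and the function name `k_fold_indices` is illustrative.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices once and return one (train, test) index pair per fold."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(1000, k=10)
print([len(test) for _, test in folds])  # ten folds of 100 test texts each
```

Each model would be fitted on the `train` indices of a fold, scored on its `test` indices, and the per-fold metrics averaged; the same per-fold scores are what a paired significance test between two models would compare.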

Predicting Health Information Suitability for Children Using Machine-Learning Assisted Selection of Semantic Features (Preprint)

10.2196/preprints.30115

2021

Author(s):

Wenxiu Xie

Meng Ji

Yanmeng Liu

Tianyong Hao

Chi-Yin Chow

Keyword(s):

Machine Learning

Health Literacy

Health Information

Learning Algorithms

Machine Learning Algorithms

Online Health Information

Semantic Features

Young Readers

Selection Of

Automatic Feature Selection

BACKGROUND The suitability of health resources for specific readerships is a critical yet underexplored area of health informatics research, despite its importance for health literacy and health education. Highly relevant health information can improve the suitability and readability of online health educational resources for young readers. Suitability also plays an important role in developing the health literacy of children, who are increasingly exposed to online health information. Existing research on health resource evaluation is limited to the analysis of morphological and syntactic complexity, and no empirical instruments exist to evaluate the suitability of online health information for children.

OBJECTIVE We aimed to develop algorithms that predict the suitability of online health information for this understudied user group from a small number of semantic features, providing accurate and convenient tools for automatic prediction of the suitability of online health information for children.

METHODS Combining machine learning and linguistic insights, we identified semantic features to predict the suitability of online health information for children, an emerging and large readership of online health information. Natural language features were selected as predictor variables in two stages: initial automatic feature selection using Ridge Classifier, support vector machine, and extreme gradient boosting, followed by revision by linguists and education experts based on principles of effective health information design. We compared algorithms using the automatically selected features (19) and the linguistically enhanced features (20), with the initial features (115) as the baseline.

RESULTS Using 5-fold cross-validation, compared with the baseline (115 features), the Gaussian Naive Bayes model (20 features) achieved statistically higher mean sensitivity (P = 0.0206, 95% CI: -0.016, 0.1929), mean specificity (P = 0.0205, 95% CI: -0.016, 0.199), mean AUC (P = 0.017, 95% CI: -0.007, 0.140), and mean macro F1 (P = 0.0061, 95% CI: 0.016, 0.167). The statistically improved performance of the final model (20 features) contrasts with the statistically insignificant changes between the original feature set (115) and the automatically selected features (19): mean sensitivity (P = 0.134, 95% CI: -0.1699, 0.0681), mean specificity (P = 0.1001, 95% CI: -0.1389, 0.4017), mean AUC (P = 0.0082, 95% CI: 0.0059, 0.1126), and mean macro F1 (P = 0.9796, 95% CI: -0.0555, 0.0548). This demonstrates the importance and effectiveness of combining automatic feature selection with expert-based linguistic revision to develop the most effective machine learning algorithms from high-dimensional datasets.

CONCLUSIONS Our study developed machine learning algorithms for evaluating the suitability of health information for children, an important readership that increasingly relies on online health information to develop its health literacy. User-adaptive automatic assessment of online health content holds much promise for distance and remote health education among young readers. Our study leveraged the precision and adaptability of machine learning algorithms and insights from health linguistics to help advance this significant yet understudied area of research.
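
The Results compare feature sets by mean macro F1 across the five folds. Macro F1 is the unweighted mean of the per-class F1 scores, so the "suitable" and "not suitable" classes count equally regardless of their size. A minimal sketch follows; the label coding (1 = suitable for children) and the six-item example are hypothetical.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    per_class: List[float] = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because c occurs somewhere.
        per_class.append(2 * tp / (2 * tp + fp + fn))
    return sum(per_class) / len(per_class)


# Hypothetical labels: 1 = suitable for children, 0 = not suitable.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
print(macro_f1(y_true, y_pred))
```

In a 5-fold setup this value would be computed once per fold for each feature set (115, 19, or 20 features) and the fold means compared.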

Predicting Readability of Health Educational Resources for Children Using Semantic Features

International Linguistics Research

10.30560/ilr.v4n2p10

2021

Vol 4 (2)

p. 10

Author(s):

Yanmeng Liu

Keyword(s):

Machine Learning

Health Education

Learning Algorithms

Nearest Neighbors

Ensemble Classifier

Machine Learning Algorithms

Support Vector

Semantic Features

K Nearest Neighbors

Education Resources

The success of health education resources largely depends on their readability, as health information can only be understood and accepted by target readers when it is presented at an appropriate level of reading difficulty. Unlike other populations, children have limited knowledge and underdeveloped reading comprehension, which poses additional challenges for readability research on health education resources. This research explores readability prediction of health education resources for children by using semantic features to develop machine learning algorithms. A data-driven method was applied: 1000 health education articles were collected from international health organization websites and grouped into resources for kids and resources for non-kids according to their sources. Then, 73 semantic features were used to train five machine learning algorithms (decision tree, support vector machine, k-nearest neighbors, ensemble classifier, and logistic regression). The k-nearest neighbors algorithm and the ensemble classifier performed best in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy, and achieved good performance in predicting whether the readability of health education resources is suitable for children.
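
The k-nearest neighbors algorithm, one of the two best performers above, labels a text by a majority vote among the k training texts whose feature vectors lie closest to it. A minimal sketch follows, assuming Euclidean distance and k = 3 (the abstract gives neither the distance metric nor k); the two-feature vectors stand in for the 73 semantic features and are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                query: Sequence[float], k: int = 3) -> int:
    """Majority label among the k training points closest to query (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tied vote, most_common keeps first-encountered order, so the label seen nearest wins.
    return votes.most_common(1)[0][0]


# Hypothetical 2-feature semantic profiles; 1 = written for kids, 0 = not for kids.
train_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
           [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]]
train_y = [1, 1, 1, 0, 0, 0]
print(knn_predict(train_X, train_y, [0.2, 0.2]))  # 1
print(knn_predict(train_X, train_y, [0.8, 0.8]))  # 0
```

Because k-nearest neighbors compares raw distances, features on very different scales would normally be standardized first so that no single semantic feature dominates the vote.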

Predicting Health Material Accessibility: Development of Machine Learning Algorithms

JMIR Medical Informatics

10.2196/29175

2021

Vol 9 (9)

pp. e29175

Author(s):

Meng Ji

Yanmeng Liu

Tianyong Hao

Keyword(s):

Machine Learning

Health Education

Decision Tree

Health Information

Domain Knowledge

Learning Algorithms

Nonnative English Speakers

Machine Learning Algorithms

Semantic Features

Cognitive Accessibility

Background Current research on health information understandability uses medical readability formulas to assess the cognitive difficulty of health education resources. This rests on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, constitutes the sole barrier to health information access among the public. Our study challenged this assumption by showing that, for readers from non-English speaking backgrounds with higher educational attainment, the semantic features that underpin the knowledge structure of English health texts, rather than medical jargon, can explain the cognitive accessibility of health materials among readers who understand English health terms well yet have limited exposure to English-based health education environments and traditions.

Objective Our study explores multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutions. We compared algorithms to evaluate the cognitive accessibility of health information for nonnative English speakers with advanced education levels yet limited exposure to English health education environments.

Methods We used 113 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites and rated by overseas tertiary students, we compared machine learning algorithms (decision tree, support vector machine [SVM], ensemble tree, and logistic regression) after hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error). We applied 10-fold cross-validation on the whole data set for model training and testing, and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy as measures of model performance.

Results We developed and compared 4 machine learning algorithms using multidimensional semantic features as predictors. Ensemble tree (LogitBoost) performed best in terms of AUC (0.97), sensitivity (0.966), specificity (0.972), and accuracy (0.969). Decision tree (AUC 0.924, sensitivity 0.912, specificity 0.9358, and accuracy 0.924) and SVM (AUC 0.8946, sensitivity 0.8952, specificity 0.894, and accuracy 0.8946) followed closely. Decision tree, ensemble tree, and SVM achieved statistically significant improvements over logistic regression in AUC, specificity, and accuracy. As the best performing algorithm, ensemble tree achieved statistically significant improvements over SVM in AUC, specificity, and accuracy, and over decision tree in sensitivity.

Conclusions Our study shows that the cognitive accessibility of English health texts is not limited to the word length and sentence length conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for nonnative English speakers. The new models achieved statistically higher AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership. Our study illustrated that semantic features such as cognitive ability–related semantic features, communicative actions and processes, power relationships in health care settings, and lexical familiarity and diversity of health texts are large contributors to the comprehension of health information; for readers such as international students, semantic features of health texts outweigh syntax and domain knowledge.
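
The Methods describe grid search for the hyperparameter combination with minimal classification error. The sketch below shows the exhaustive search itself; `toy_cv_error` is a hypothetical stand-in for "train with these hyperparameters and return the cross-validated error", and the parameter names `max_depth` and `learning_rate` are illustrative, not the study's actual search space.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple

Params = Dict[str, Any]


def grid_search(param_grid: Dict[str, List[Any]],
                cv_error: Callable[[Params], float]) -> Tuple[Optional[Params], float]:
    """Try every hyperparameter combination and keep the one with the lowest error."""
    names = sorted(param_grid)
    best_params: Optional[Params] = None
    best_err = float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        err = cv_error(params)  # e.g. mean misclassification rate over CV folds
        if err < best_err:  # strict '<' keeps the earliest combination on ties
            best_params, best_err = params, err
    return best_params, best_err


def toy_cv_error(params: Params) -> float:
    # Hypothetical stand-in for "train with params, return cross-validated error".
    return abs(params["max_depth"] - 4) * 0.01 + abs(params["learning_rate"] - 0.1)


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
print(grid_search(grid, toy_cv_error))
```

The cost grows with the product of the grid sizes (nine fits here, each of which would itself be a full cross-validation run), which is why grids are usually kept small.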

Correction: Predicting Health Material Accessibility: Development of Machine Learning Algorithms (Preprint)

10.2196/preprints.33385

2021

Author(s):

Meng Ji

Yanmeng Liu

Tianyong Hao

Keyword(s):

Machine Learning

Support Vector Machine

Decision Tree

Health Information

Learning Algorithms

Ensemble Classifier

Machine Learning Algorithms

Support Vector

Semantic Features

Cognitive Accessibility

BACKGROUND Current research on health information understandability uses medical readability formulas to assess the cognitive difficulty of health education resources. This rests on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, constitutes the sole barrier to health information access among the public. Our study challenged this assumption by showing that, for readers from non-English speaking backgrounds with higher educational attainment, the semantic features that underpin the knowledge structure of English health texts, rather than medical jargon, can explain the cognitive accessibility of health materials among readers who understand English health terms well yet have limited exposure to English-based health education environments and traditions.

OBJECTIVE Our study explores multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutions. We compared algorithms to evaluate the cognitive accessibility of health information for nonnative English speakers with advanced education levels yet limited exposure to English health education environments.

METHODS We used 113 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites and rated by overseas tertiary students, we compared machine learning algorithms (decision tree, support vector machine, ensemble classifier, and logistic regression) after hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error). We applied 5-fold cross-validation on the whole data set for model training and testing, and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy as measures of model performance.

RESULTS We developed and compared 4 machine learning algorithms using multidimensional semantic features as predictors. The ensemble classifier (LogitBoost) performed best in terms of AUC (0.858), sensitivity (0.787), specificity (0.813), and accuracy (0.802). Support vector machine (AUC 0.848, sensitivity 0.783, specificity 0.791, and accuracy 0.786) and decision tree (AUC 0.754, sensitivity 0.7174, specificity 0.7424, and accuracy 0.732) followed. The ensemble classifier (LogitBoost), support vector machine, and decision tree achieved statistically significant improvements over logistic regression in AUC, sensitivity, specificity, and accuracy. Support vector machine achieved statistically significant improvements over decision tree in AUC and accuracy. As the best performing algorithm, the ensemble classifier (LogitBoost) achieved statistically significant improvements over decision tree in AUC, sensitivity, specificity, and accuracy.

CONCLUSIONS Our study shows that the cognitive accessibility of English health texts is not limited to the word length and sentence length conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for nonnative English speakers. The new models achieved statistically higher AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership. Our study illustrated that semantic features such as cognitive ability–related semantic features, communicative actions and processes, power relationships in health care settings, and lexical familiarity and diversity of health texts are large contributors to the comprehension of health information; for readers such as international students, semantic features of health texts outweigh syntax and domain knowledge.
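
The Results report statistically significant differences between classifiers but do not name the test. A paired t-test on per-fold scores is one common choice and is assumed in the sketch below: it compares the mean of the fold-by-fold differences with its standard error. The per-fold AUC values are hypothetical. Turning t into a P value requires the t distribution with n - 1 degrees of freedom; SciPy's `scipy.stats.ttest_rel` returns both the statistic and the P value.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for per-fold scores of two models (degrees of freedom = n - 1)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long score lists with at least 2 folds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("identical per-fold differences: t statistic is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-fold AUCs for two models over 5 folds.
model_a = [0.86, 0.85, 0.87, 0.84, 0.88]
model_b = [0.76, 0.75, 0.74, 0.77, 0.75]
print(paired_t_statistic(model_a, model_b))
```

Pairing by fold matters: both models are scored on the same held-out texts in each fold, so differences between folds cancel out and only the model difference is tested.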

Healthcare Analytics: Overcoming the Barriers to Health Information Using Machine Learning Algorithms

Advances in Intelligent Systems and Computing - Image Processing and Capsule Networks

10.1007/978-3-030-51859-2_44

2020

pp. 484-496

Author(s):

A. Veena

S. Gowrishankar

Keyword(s):

Machine Learning

Health Information

Learning Algorithms

Machine Learning Algorithms

Healthcare Analytics

Barriers To Health

Supplemental Material for One Model to Rule Them All? Using Machine Learning Algorithms to Determine the Number of Factors in Exploratory Factor Analysis

Psychological Methods

10.1037/met0000262.supp

2020

Keyword(s):

Machine Learning

Factor Analysis

Exploratory Factor Analysis

Learning Algorithms

Machine Learning Algorithms

Number Of Factors

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-189120

2020

Vol 39 (5)

pp. 6579-6590

Author(s):

Sandy Çağlıyor

Başar Öztayşi

Selime Sezgin

Keyword(s):

Machine Learning

Global Economy

Learning Algorithms

Forecast Model

Machine Learning Algorithms

Gradient Boosting

High Stakes

Box Office

Industry Forecast

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Given the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. However, these models are mostly used to predict domestic performance, and the industry still struggles to predict box office performance in overseas markets. This study aims to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forest) are employed, and the impact of each variable group is observed by comparing model performance. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.
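
The abstract discretizes attendance into three classes (later five and eight) but does not say how the class boundaries were set. The sketch below assumes equal-frequency (quantile) binning, which puts roughly the same number of films in each class; the attendance figures and the function name `quantile_classes` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_classes(values: Sequence[float], n_classes: int = 3) -> List[int]:
    """Map each value to a class 0..n_classes-1 using equal-frequency (quantile) cut points."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points sit at positions n/n_classes, 2n/n_classes, ... of the sorted data.
    cuts = [ordered[(i * n) // n_classes] for i in range(1, n_classes)]
    # bisect_right puts a value equal to a cut point in the higher class.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical attendance figures for nine films.
attendance = [100, 250, 400, 900, 1500, 3000, 5000, 12000, 40000]
print(quantile_classes(attendance))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Moving to five or eight classes only changes `n_classes`; with heavily tied attendance values the classes can end up uneven, since equal values always share a class.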

Intelligent system of English composition scoring model based on improved machine learning algorithm

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-189235

2020

pp. 1-11

Author(s):

Jie Liu

Lin Lin

Xiufang Liang

Keyword(s):

Machine Learning

Evaluation System

Intelligent System

Learning Algorithm

Learning Algorithms

Machine Learning Algorithms

Assessment System

English Composition

Region Extraction

Constraint Model

Online English teaching systems require intelligent scoring, and the most difficult stage of intelligent scoring in English tests is scoring English compositions through an intelligent model. To improve the intelligence of English composition scoring, this study combines intelligent image recognition technology with machine learning algorithms and proposes an improved MSER-based algorithm for extracting character candidate regions and a convolutional neural network-based algorithm for filtering pseudo-character regions. In addition, to verify the feasibility of the proposed model, that is, whether it meets the requirements of group text, its performance is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as a constraint model. The results show that the proposed algorithm is of some practical use and can be applied to English assessment systems and online homework evaluation systems.
