Correction: Predicting Health Material Accessibility: Development of Machine Learning Algorithms (Preprint)

Support Vector ◽

Semantic Features ◽

BACKGROUND Current health information understandability research uses medical readability formulas to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that, for readers from non-English speaking backgrounds with higher education attainment, semantic features that underpin the knowledge structure of English health texts, rather than medical jargon, can explain the cognitive accessibility of health materials among readers with better understanding of English health terms yet limited exposure to English-based health education environments and traditions. OBJECTIVE Our study explored multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutes. We compared algorithms to evaluate the cognitive accessibility of health information for nonnative English speakers with advanced education levels yet limited exposure to English health education environments. METHODS We used 113 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites and rated by overseas tertiary students, we compared machine learning algorithms (decision tree, support vector machine, ensemble classifier, and logistic regression) after hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error).
We applied 5-fold cross-validation on the whole data set for model training and testing, and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy as measures of model performance. RESULTS We developed and compared 4 machine learning algorithms using multidimensional semantic features as predictors. The ensemble classifier (LogitBoost) performed best in terms of AUC (0.858), sensitivity (0.787), specificity (0.813), and accuracy (0.802). Support vector machine (AUC 0.848, sensitivity 0.783, specificity 0.791, and accuracy 0.786) and decision tree (AUC 0.754, sensitivity 0.7174, specificity 0.7424, and accuracy 0.732) followed. Ensemble classifier (LogitBoost), support vector machine, and decision tree achieved statistically significant improvement over logistic regression in AUC, sensitivity, specificity, and accuracy. Support vector machine reached statistically significant improvement over decision tree in AUC and accuracy. As the best performing algorithm, ensemble classifier (LogitBoost) reached statistically significant improvement over decision tree in AUC, sensitivity, specificity, and accuracy. CONCLUSIONS Our study shows that cognitive accessibility of English health texts is not limited to word length and sentence length as had been conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for nonnative English speakers. The results showed that the new models reached statistically significant increases in AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership.
Our study illustrated that semantic features such as cognitive ability–related semantic features, communicative actions and processes, power relationships in health care settings, and lexical familiarity and diversity of health texts are large contributors to the comprehension of health information; for readers such as international students, semantic features of health texts outweigh syntax and domain knowledge.

Predicting Health Material Cognitive Accessibility for Non-Native English Speakers Using Multidimensional Semantic Features as Predictors of Machine Learning Algorithms (Preprint)

10.2196/preprints.25110 ◽

2021 ◽

Author(s):

Meng Ji ◽

Yanmeng Liu ◽

Tianyong Hao

Keyword(s):

Machine Learning ◽

Health Education ◽

Decision Tree ◽

Health Information ◽

Domain Knowledge ◽

Learning Algorithms ◽

Native English Speakers ◽

Semantic Features ◽

BACKGROUND Much of current health information understandability research uses medical readability formulas to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that, for readers from non-English speaking backgrounds with higher education attainment, semantic features that underpin the knowledge structure of English health texts, rather than medical jargon, can explain the cognitive accessibility of health materials among readers with better understanding of English health terms yet very limited exposure to English-based health education environments and traditions. OBJECTIVE Our study explored multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutes. We compared algorithms to evaluate the cognitive accessibility of health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments. METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites and rated by overseas tertiary students, we compared machine learning algorithms (decision tree, support vector machine [SVM], ensemble tree, and logistic regression) after hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy as measures of model performance.
RESULTS We developed and compared four machine learning algorithms using multidimensional semantic features as predictors. The ensemble tree (LogitBoost) performed best in terms of AUC (0.97), sensitivity (0.966), specificity (0.972), and accuracy (0.969). Decision tree followed closely (AUC 0.924, sensitivity 0.912, specificity 0.9358, and accuracy 0.924), and then SVM (AUC 0.8946, sensitivity 0.8952, specificity 0.894, and accuracy 0.8946). Decision tree, ensemble tree, and SVM achieved statistically significant improvement over logistic regression in AUC, specificity, and accuracy. As the best performing algorithm, ensemble tree reached statistically significant improvement over SVM in AUC, specificity, and accuracy, and a statistically significant improvement over decision tree in sensitivity. CONCLUSIONS Our study showed that cognitive accessibility of English health texts is not limited to word length and sentence length as had been conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for non-native English speakers. The results showed that the new models reached statistically significant increases in AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership. Our study illustrated that semantic features such as cognitive ability-related semantic features, communicative actions and processes, power relationships in healthcare settings, and lexical familiarity and diversity of health texts are large contributors to the comprehension of health information, and that for readers such as international students, semantic features of health texts outweigh syntax and domain knowledge.

Predicting Health Material Accessibility: Development of Machine Learning Algorithms

JMIR Medical Informatics ◽

10.2196/29175 ◽

2021 ◽

Vol 9 (9) ◽

pp. e29175

Author(s):

Meng Ji ◽

Yanmeng Liu ◽

Tianyong Hao

Keyword(s):

Machine Learning ◽

Health Education ◽

Decision Tree ◽

Health Information ◽

Domain Knowledge ◽

Learning Algorithms ◽

Nonnative English Speakers ◽

Semantic Features ◽

Background Current health information understandability research uses medical readability formulas to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that, for readers from non-English speaking backgrounds with higher education attainment, semantic features that underpin the knowledge structure of English health texts, rather than medical jargon, can explain the cognitive accessibility of health materials among readers with better understanding of English health terms yet limited exposure to English-based health education environments and traditions. Objective Our study explored multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutes. We compared algorithms to evaluate the cognitive accessibility of health information for nonnative English speakers with advanced education levels yet limited exposure to English health education environments. Methods We used 113 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites and rated by overseas tertiary students, we compared machine learning algorithms (decision tree, support vector machine [SVM], ensemble tree, and logistic regression) after hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error).
We applied 10-fold cross-validation on the whole data set for model training and testing, and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy as measures of model performance. Results We developed and compared 4 machine learning algorithms using multidimensional semantic features as predictors. The ensemble tree (LogitBoost) performed best in terms of AUC (0.97), sensitivity (0.966), specificity (0.972), and accuracy (0.969). Decision tree (AUC 0.924, sensitivity 0.912, specificity 0.9358, and accuracy 0.924) and SVM (AUC 0.8946, sensitivity 0.8952, specificity 0.894, and accuracy 0.8946) followed closely. Decision tree, ensemble tree, and SVM achieved statistically significant improvement over logistic regression in AUC, specificity, and accuracy. As the best performing algorithm, ensemble tree reached statistically significant improvement over SVM in AUC, specificity, and accuracy, and statistically significant improvement over decision tree in sensitivity. Conclusions Our study shows that cognitive accessibility of English health texts is not limited to word length and sentence length as had been conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for nonnative English speakers. The results showed that the new models reached statistically significant increases in AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership.
Our study illustrated that semantic features such as cognitive ability–related semantic features, communicative actions and processes, power relationships in health care settings, and lexical familiarity and diversity of health texts are large contributors to the comprehension of health information; for readers such as international students, semantic features of health texts outweigh syntax and domain knowledge.
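
The four performance measures named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the labels and scores are made-up toy values, the function names are hypothetical, and the 0.5 decision threshold is an assumption. AUC is computed here through its rank interpretation, that is, the share of positive-negative pairs in which the positive item receives the higher score (ties count as half).

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly positive items that were predicted positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly negative items that were predicted negative."""
    return tn / (tn + fp)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Share of all items that were predicted correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def auc(y_true: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:", accuracy(tp, fn, tn, fp))
print("AUC:", auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason abstracts such as this one report all four.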

Predicting Health Material Cognitive Accessibility Using Multidimensional Semantic Features and Readability Tools as Predicators (Preprint)

10.2196/preprints.29175 ◽

2021 ◽

Author(s):

Meng Ji ◽

Yanmeng Liu ◽

Tianyong Hao

Keyword(s):

Machine Learning ◽

Health Education ◽

Health Information ◽

Domain Knowledge ◽

Learning Algorithms ◽

Semantic Features ◽

Integrated Models ◽

Advanced Education ◽

BACKGROUND Much of current health information understandability research uses medical readability formulas (MRF) to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that, for readers from non-English speaking backgrounds with higher education attainment, semantic features of English health texts, rather than medical jargon, can explain the lack of cognitive access to health materials among readers with better understanding of health terms yet limited exposure to English health education materials. OBJECTIVE Our study explored combined MRF and multidimensional semantic features (MSF) for developing machine learning algorithms to predict the actual level of cognitive accessibility of English health materials on health risks and diseases for specific populations. We compared algorithms to evaluate the cognitive accessibility of specialised health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments. METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from international health organization websites and rated by international tertiary students, we compared machine learning algorithms (decision tree, support vector machine [SVM], discriminant analysis, ensemble tree, and logistic regression) after automatic hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy as measures of model performance.
RESULTS Using two sets of predictor features, the widely tested MRF and the MSF proposed in our study, we developed and compared three sets of machine learning algorithms: the first set used MRF only as predictors, the second set used MSF only as predictors, and the last set used both MRF and MSF as integrated models. The results showed that the integrated models performed best in terms of AUC, sensitivity, accuracy, and specificity. CONCLUSIONS Our study showed that cognitive accessibility of English health texts is not limited to word length and sentence length as conventionally measured by MRF. We compared machine learning algorithms combining MRF and MSF to explore the cognitive accessibility of health information from syntactic and semantic perspectives. The results showed the strength of integrated models in terms of statistically significant increases in AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership, indicating that both MRF and MSF contribute to the comprehension of health information, and that for readers with advanced education, semantic features outweigh syntax and domain knowledge.
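
The 10-fold cross-validation mentioned in the methods can be sketched as an index split over the 1000 texts. The following is a minimal illustration, not the authors' procedure: the function name `k_fold_indices`, the shuffling step, and the fixed seed are all assumptions, and the abstract does not say whether the folds were stratified.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k folds and return (train, test) index lists per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    splits = []
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# 1000 texts and 10 folds, as stated in the abstract.
splits = k_fold_indices(1000, k=10)
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Each text is used for testing exactly once and for training in the remaining folds; the reported measures would then typically be averaged over the folds.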

Encrypted DNP3 Traffic Classification Using Supervised Machine Learning Algorithms

Machine Learning and Knowledge Extraction ◽

10.3390/make1010022 ◽

2019 ◽

Vol 1 (1) ◽

pp. 384-399 ◽

Cited By ~ 2

Author(s):

Thais de Toledo ◽

Nunzio Torrisi

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Smart Grids ◽

Learning Algorithms ◽

Electric Utility ◽

Supervised Machine Learning ◽

Support Vector ◽

Communication Link

The Distributed Network Protocol (DNP3) is predominately used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic, in which a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are an important part of their infrastructure, so it is of critical importance to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques to classify messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations, using a decision tree learning algorithm, and to gather significant information from a system that communicates using encrypted DNP3 traffic.

A new ML-based approach to enhance student engagement in online environment

PLoS ONE ◽

10.1371/journal.pone.0258788 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0258788

Author(s):

Sarra Ayouni ◽

Fahima Hajjej ◽

Mohamed Maddeh ◽

Shaha Al-Otaibi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Student Engagement ◽

Decision Tree ◽

Learning Algorithms ◽

Support Vector ◽

Online Environment

Educational research increasingly emphasizes the potential of student engagement and its impact on performance, retention and persistence. This construct has been an important paradigm in the higher education field for many decades. However, evaluating and predicting the student’s engagement level in an online environment remains a challenge. The purpose of this study is to suggest an intelligent predictive system that predicts the student’s engagement level and then provides the students with feedback to enhance their motivation and dedication. Three categories of students are defined depending on their engagement level (Not Engaged, Passively Engaged, and Actively Engaged). We applied three different machine-learning algorithms, namely Decision Tree, Support Vector Machine and Artificial Neural Network, to students’ activities recorded in Learning Management System reports. The results demonstrate that machine learning algorithms could predict the student’s engagement level. In addition, according to the performance metrics of the different algorithms, the Artificial Neural Network has a greater accuracy rate (85%) compared to the Support Vector Machine (80%) and Decision Tree (75%) classification techniques. Based on these results, the intelligent predictive system sends feedback to the students and alerts the instructor once a student’s engagement level decreases. The instructor can identify the students’ difficulties during the course and motivate them through e-mail reminders, course messages, or scheduling an online meeting.

Miss Predicting Readability of Health Educational Resources for Children Using Semantic Features

International Linguistics Research ◽

10.30560/ilr.v4n2p10 ◽

2021 ◽

Vol 4 (2) ◽

pp. p10

Author(s):

Yanmeng Liu

Keyword(s):

Machine Learning ◽

Health Education ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Ensemble Classifier ◽

Support Vector ◽

Semantic Features ◽

K Nearest Neighbors ◽

Education Resources

The success of health education resources largely depends on their readability, as health information can only be understood and accepted by the target readers when it is presented at an appropriate reading difficulty. Unlike other populations, children have limited knowledge and underdeveloped reading comprehension, which poses more challenges for readability research on health education resources. This research aims to explore the readability prediction of health education resources for children by using semantic features to develop machine learning algorithms. A data-driven method was applied in this research: 1000 health education articles were collected from international health organization websites, and they were grouped into resources for kids and resources for non-kids according to their sources. Moreover, 73 semantic features were used to train five machine learning algorithms (decision tree, support vector machine, k-nearest neighbors algorithm, ensemble classifier, and logistic regression). The results showed that the k-nearest neighbors algorithm and ensemble classifier outperformed the other algorithms in terms of area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy, and achieved good performance in predicting whether the readability of health education resources is suitable for children or not.
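
As a rough illustration of the k-nearest neighbors idea named in this abstract, the sketch below labels a new text by majority vote among its closest training examples. It is a toy example under stated assumptions: the two-dimensional feature vectors are invented (the study used 73 semantic features), Euclidean distance and k = 3 are arbitrary choices, and `knn_predict` is a hypothetical helper rather than the author's implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training points closest to x."""
    # Pair each training point's Euclidean distance to x with its label, nearest first.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Invented two-feature vectors standing in for semantic feature scores.
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
train_y = ["kids", "kids", "kids", "non-kids", "non-kids", "non-kids"]

print(knn_predict(train_X, train_y, (0.2, 0.2)))
print(knn_predict(train_X, train_y, (0.8, 0.8)))
```

With two classes, an odd k avoids tied votes; in practice k would be tuned, for example by cross-validation.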

Prediction of longitudinal facial crack in steel thin slabs funnel mold using different machine learning algorithms

International Journal of Innovation Science ◽

10.1108/ijis-09-2020-0172 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Kushalkumar Thakkar ◽

Suhas Suresh Ambekar ◽

Manoj Hudnurkar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Manufacturing Companies ◽

Content Type ◽

Steel Manufacturing

Purpose Longitudinal facial cracks (LFC) are one of the major defects occurring in the continuous-casting stage of thin slab casters using funnel molds. Longitudinal cracks occur mainly owing to non-uniform cooling, varying thermal conductivity along the mold length, use of high superheat during casting, and improper casting powder characteristics. These defects are difficult to capture and are visible only in the final stages of a process or even at the customer end. Besides, there is a seasonality associated with this defect, where defect intensity increases during the winter season. To address the issue, a model based on data analytics is developed. Design/methodology/approach Around six months of data from the steel manufacturing process are taken, and around 60 data collection points are analyzed. The model uses different classification machine learning algorithms, such as logistic regression, decision tree, ensemble methods of a decision tree, support vector machine and naïve Bayes (at different cut-off levels), to investigate the data. Findings The proposed research framework shows that most models give good results at cut-off levels between 0.6 and 0.8, and that random forest, gradient boosting for decision trees, and support vector machine models perform better than the other models. Practical implications Based on the model's predictions, steel manufacturing companies can identify the optimal operating range in which this defect can be reduced. Originality/value An analytical approach to identifying LFC defects provides objective models for the reduction of LFC defects. By reducing LFC defects, the quality of steel can be improved.
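
The role of a cut-off level can be made concrete with a short Python sketch: a classifier's predicted defect probabilities are turned into yes/no predictions at several cut-offs, and accuracy is compared across them. The probabilities and labels below are invented for illustration, the helper names are hypothetical, and accuracy is only one of several measures the authors may have used.

```python
from typing import Dict, List


def classify_at_cutoff(probs: List[float], cutoff: float) -> List[int]:
    """Predict 1 (defect) when the predicted probability reaches the cut-off."""
    return [1 if p >= cutoff else 0 for p in probs]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sweep_cutoffs(y_true: List[int], probs: List[float],
                  cutoffs: List[float]) -> Dict[float, float]:
    """Map each cut-off level to the accuracy obtained at that level."""
    return {c: accuracy(y_true, classify_at_cutoff(probs, c)) for c in cutoffs}


# Invented labels (1 = crack observed) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.95, 0.85, 0.65, 0.70, 0.55, 0.40, 0.30, 0.10]

results = sweep_cutoffs(y_true, probs, [0.5, 0.6, 0.7, 0.8])
for cutoff, acc in results.items():
    print(cutoff, acc)
```

Raising the cut-off trades missed defects against false alarms, so the best level depends on which error is more costly in production.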

A Support Vector Machine and Decision Tree Based Breast Cancer Prediction

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1752.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2972-2976

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Misclassification Rate ◽

Support Vector

The first step in the diagnosis of breast cancer is the identification of the disease. Early detection of breast cancer is important for reducing the mortality rate due to breast cancer. Machine learning algorithms can be used in the identification of breast cancer. Supervised machine learning algorithms such as the Support Vector Machine (SVM) and the Decision Tree are widely used in classification problems, such as the identification of breast cancer. In this study, a machine learning model is proposed by employing two learning algorithms, namely the support vector machine and the decision tree. A data set from the Kaggle data repository, consisting of 569 malignant and benign observations, is used to develop the proposed model. Finally, the model is evaluated using accuracy, confusion matrix, precision, and recall as metrics for evaluation of performance on the test set. The analysis showed that the support vector machine (SVM) has better accuracy, a lower misclassification rate, and better precision than the decision tree algorithm. The average accuracy of the support vector machine (SVM) is 91.92 % and that of the decision tree classification model is 87.12 %.

Machine Learning Models for Finger Bend Evaluation using Implemented Low cost Flex Sensor

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35742 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 3605-3611

Author(s):

Pratyush Kaware

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Low Cost ◽

Learning Algorithms ◽

Cost Effective ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

In this paper, a cost-effective sensor has been implemented to read finger bend signals, by attaching the sensor to a finger, so as to classify them based on the degree of bend as well as the joint about which the finger was being bent. This was done by testing various machine learning algorithms to find the most accurate and consistent classifier. Finally, we found that Support Vector Machine was the algorithm best suited to classifying our data; using it, we were able to predict the live state of a finger, i.e., the degree of bend and the joints involved. The live voltage values from the sensor were transmitted using a NodeMCU micro-controller, converted to digital, and uploaded to a database for analysis.

Book Genre Categorization Using Machine Learning Algorithms (K-Nearest Neighbor, Support Vector Machine and Logistic Regression) using Customized Dataset

International Journal of Computer Science and Mobile Computing ◽

10.47760/ijcsmc.2021.v10i03.002 ◽

2021 ◽

Vol 10 (3) ◽

pp. 14-25

Author(s):

Parilkumar Shiroya

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor