An Ensemble Prediction Model for Potential Student Recommendation Using Machine Learning

Student performance prediction has become a popular research topic. Most existing prediction models are built with machine learning methods; they focus on prediction accuracy but pay less attention to interpretability. We propose a stacking ensemble model to predict and analyze student performance in academic competitions. In this model, student performance is classified into two symmetric categorical classes. To improve accuracy, three machine learning algorithms, support vector machine (SVM), random forest, and AdaBoost, are built at the first level and then integrated by logistic regression via stacking. A feature importance analysis was applied to identify important variables. The experimental data were collected over four academic years at Hankou University. According to comparative studies on five evaluation metrics (precision, recall, F1, error, and area under the receiver operating characteristic curve (AUC)), the proposed model generally performs better than the compared models. The important variables identified by the analysis are interpretable and can be used as guidance for selecting potential students.
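The five evaluation metrics named above can all be computed from binary labels and classifier scores. The sketch below is a minimal, dependency-free illustration, assuming labels coded 1 (positive) and 0 (negative), a decision threshold of 0.5, and the rank-based (Mann–Whitney) formulation of AUC. It is not the authors' evaluation code, and the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Precision, recall, F1, error rate, and ROC AUC for 0/1 labels."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    error = (fp + fn) / len(y_true)          # share of misclassified cases

    # AUC = probability that a random positive outscores a random negative
    # (ties count as half), i.e. the normalized Mann-Whitney U statistic.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"precision": precision, "recall": recall, "f1": f1,
            "error": error, "auc": auc}


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(binary_metrics(y, s))
```

For the toy data in the `__main__` block, precision, recall, and F1 are each 2/3, the error rate is 1/3, and AUC is 8/9, because exactly one positive (score 0.4) is ranked below one negative (score 0.6).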

Development and Validation of a Model for the Prediction of Mortality in Children Under Five Years with Clinical Pneumonia in Rural Gambia

10.1101/2021.08.04.21260737 · 2021

Author(s): Alexander Jarde, David Jeffries, Grant A Mackenzie

Keyword(s): Machine Learning; Predictive Model; Prediction Models; Characteristic Curve; Machine Learning Algorithms; Support Vector; Learning Methods; Final Model; Machine Learning Methods; Prediction Of Mortality

Background: Pneumonia is the leading cause of death in children aged 1-59 months. Prediction models for child pneumonia mortality have been developed using regression methods, but their performance is insufficient for clinical use. Methods: We used a variety of machine learning methods to develop a predictive model for mortality in children with clinical pneumonia enrolled in population-based surveillance in the Basse Health and Demographic Surveillance System in rural Gambia (n=11,012). Four machine learning algorithms (support vector machine, random forest, artificial neural network, and regularized logistic regression) were implemented, fitting all possible combinations of two or more of 16 selected features. Models were shortlisted based on their training set performance, the number of included features, and the reliability of feature measurement. The final model was selected considering its clinical interpretability. Results: When we applied the final model to the test set (55 deaths), the area under the Receiver Operating Characteristic Curve was 0.88 (95% confidence interval: 0.84, 0.91), sensitivity was 0.78, and specificity was 0.77. Conclusions: Our evaluation of multiple machine learning methods combined with minimal and pragmatic feature selection led to a predictive model with very good performance. We plan further validation of our model in different populations.
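"Fitting all possible combinations of two or more of 16 selected features" describes an exhaustive subset search. Its size follows from the binomial theorem: the number of subsets containing at least two of 16 features is 2^16 − 1 − 16 = 65,519. The sketch below enumerates such subsets with `itertools.combinations` and ranks them with a caller-supplied scoring function. The feature names and the toy `score_subset` callback are hypothetical placeholders, not the study's features or training procedure.

```python
from itertools import combinations
from math import comb
from typing import Callable, Iterator, List, Sequence, Tuple


def feature_subsets(features: Sequence[str],
                    min_size: int = 2) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of `features` with at least `min_size` members."""
    for k in range(min_size, len(features) + 1):
        yield from combinations(features, k)


def count_subsets(n: int, min_size: int = 2) -> int:
    """Closed-form count: sum of C(n, k) for k = min_size..n."""
    return sum(comb(n, k) for k in range(min_size, n + 1))


def best_subsets(features: Sequence[str],
                 score_subset: Callable[[Tuple[str, ...]], float],
                 top: int = 5) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score each subset and keep the `top` highest-scoring ones.

    `score_subset` is a hypothetical stand-in for "fit a model on these
    features and return its training-set performance".
    """
    scored = [(score_subset(s), s) for s in feature_subsets(features)]
    scored.sort(key=lambda pair: pair[0], reverse=True)   # stable on ties
    return scored[:top]


if __name__ == "__main__":
    names = [f"f{i}" for i in range(16)]     # hypothetical feature names
    print(count_subsets(16))                  # 65519 = 2**16 - 1 - 16
    # Toy score that prefers smaller subsets, mimicking a parsimony-driven
    # shortlist; a real search would fit and evaluate a model here.
    print(best_subsets(names[:4], lambda s: -len(s), top=3))
```

If each of the four algorithms were fitted on every subset, the inner model fit would dominate the run time, which is one reason shortlisting criteria such as the number of included features matter.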

Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

International Journal of Environmental Research and Public Health · 10.3390/ijerph18052584 · 2021 · Vol 18 (5) · pp. 2584

Author(s): Cheng-Chien Lai, Wei-Hsin Huang, Betty Chia-Chen Chang, Lee-Ching Hwang

Keyword(s): Machine Learning; Smoking Cessation; Success Rate; Prediction Models; Smoking Status; Medical Center; Machine Learning Algorithms; Classification And Regression Tree; Support Vector; Smoking Cessation Outcome

Predictors of success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models based on artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the patients' final smoking status over a six-month period. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which performed slightly better than the others, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for predicting smoking cessation outcome. The resulting predictive model could provide a predicted success rate for every smoker and has the potential to support personalized and precision medicine in the treatment of smoking cessation.
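Sensitivity, specificity, and accuracy measured at one threshold on one test set are linked through the share π of cases in whichever class is treated as positive. The identity below follows directly from the confusion-matrix definitions (P positives, Q negatives, N = P + Q). Plugging in the reported ANN values is only a rough consistency check under the assumption that all three figures come from the same threshold and test set; because the inputs are rounded to three decimals, the result is approximate.

```latex
\[
\text{accuracy} = \frac{TP + TN}{N}
  = \underbrace{\frac{TP}{P}}_{\text{sens}} \cdot \frac{P}{N}
  + \underbrace{\frac{TN}{Q}}_{\text{spec}} \cdot \frac{Q}{N}
  = \text{sens}\cdot\pi + \text{spec}\cdot(1 - \pi),
  \qquad \pi = \frac{P}{N}
\]
\[
\pi = \frac{\text{accuracy} - \text{spec}}{\text{sens} - \text{spec}}
  \approx \frac{0.640 - 0.567}{0.704 - 0.567}
  = \frac{0.073}{0.137} \approx 0.53
\]
```

Read this way, the reported accuracy sits between specificity and sensitivity because the evaluation set was close to balanced between the two outcome classes.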

Linear Support Vector Machines for Prediction of Student Performance in School-Based Education

Mathematical Problems in Engineering · 10.1155/2020/4761468 · 2020 · Vol 2020 · pp. 1-7

Author(s): Nalindren Naicker, Timothy Adeliyi, Jeanette Wing

Keyword(s): Machine Learning; Support Vector Machines; Student Performance; State Of The Art; Learning Algorithms; The State; Machine Learning Algorithms; Superior Performance; Support Vector; Vector Machines

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners insights for developing appropriate intervention strategies to improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers depends heavily on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as predictors of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms in order to determine which algorithm would improve prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alphanumeric student records. Benchmarked against ten categorical machine learning algorithms, linear support vector machines showed superior performance in predicting student performance. The results showed that features such as race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.
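A linear SVM learns a weight vector w that minimizes the regularized hinge loss λ/2·‖w‖² + (1/n)·Σ max(0, 1 − yᵢ w·xᵢ), where a constant 1 appended to each xᵢ lets the last weight act as a bias (so the bias is regularized too, a common simplification). The sketch below trains one with the Pegasos stochastic sub-gradient method in plain Python. The solver, the hyperparameters (`lam`, `epochs`), and the toy data are assumptions chosen for illustration; the abstract does not say which solver or settings the study used.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Train a linear SVM with Pegasos (stochastic sub-gradient descent).

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x))). Labels must be
    -1 or +1. A constant 1.0 is appended to every sample, so the last weight
    acts as a bias term (regularized along with the other weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                     # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the regularizer shrinks every weight ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and a point inside the margin adds its hinge sub-gradient.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


if __name__ == "__main__":
    # Toy, linearly separable data: class +1 when x0 + x1 > 0.
    X = [(-1.0, -1.0), (-1.5, -0.5), (-0.5, -1.5),
         (1.0, 1.0), (1.5, 0.5), (0.5, 1.5)]
    y = [-1, -1, -1, 1, 1, 1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])   # expected: [-1, -1, -1, 1, 1, 1]
```

Pegasos is used here only because it fits in a few lines; library solvers (for example, liblinear-based ones) optimize the same kind of objective more efficiently on a dataset of 1000 records.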

Prediction of Cardiac Arrest in the Emergency Department Based on Machine Learning and Sequential Characteristics: Model Development and Retrospective Clinical Validation Study (Preprint)

10.2196/preprints.15932 · 2019

Author(s): Sungjun Hong, Sungjoo Lee, Jeonghoon Lee, Won Chul Cha, Kyunga Kim

Keyword(s): Machine Learning; Emergency Department; Cardiac Arrest; Prediction Model; Prediction Models; Characteristic Curve; Machine Learning Algorithms; Clinical Usefulness; Class Imbalance Problem; Data Set

BACKGROUND The development and application of clinical prediction models using machine learning in clinical decision support systems is attracting increasing attention. OBJECTIVE The aims of this study were to develop a prediction model for cardiac arrest in the emergency department (ED) using machine learning and sequential characteristics and to validate its clinical usefulness. METHODS This retrospective study was conducted with ED patients at a tertiary academic hospital who suffered cardiac arrest. To resolve the class imbalance problem, sampling was performed using propensity score matching. The data set was chronologically allocated to a development cohort (years 2013 to 2016) and a validation cohort (year 2017). We trained three machine learning algorithms with repeated 10-fold cross-validation. RESULTS The main performance parameters were the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The random forest algorithm (AUROC 0.97; AUPRC 0.86) outperformed the recurrent neural network (AUROC 0.95; AUPRC 0.82) and the logistic regression algorithm (AUROC 0.92; AUPRC 0.72). The performance of the model was maintained over time, with the AUROC remaining at least 0.80 across the monitored time points during the 24 hours before event occurrence. CONCLUSIONS We developed a prediction model of cardiac arrest in the ED using machine learning and sequential characteristics. The model was validated for clinical usefulness by chronological visualization focused on clinical usability.
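AUROC and AUPRC can rank models differently when positive cases are rare, because precision depends on how many negatives are ranked above each positive. One common estimator of AUPRC is average precision: rank cases by descending score, record the precision at the rank of every positive, and average those values. The dependency-free sketch below assumes 0/1 labels and breaks score ties by input order; it is an illustration, not the study's code, and other AUPRC estimators (for example, trapezoidal interpolation of the PR curve) give slightly different numbers.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the PR curve.

    Cases are ranked by descending score (ties keep input order); precision
    is recorded at the rank of every positive, and those values are averaged.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank          # precision at this cut-off
    return total / n_pos


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0, 1]
    s = [0.95, 0.90, 0.80, 0.40, 0.30, 0.20]
    print(round(average_precision(y, s), 4))   # (1 + 2/3 + 3/6) / 3 = 0.7222
```

In the example the positives sit at ranks 1, 3, and 6, so the precisions recorded are 1, 2/3, and 1/2; a perfect ranking would score 1.0.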

Comparison of machine learning algorithms applied to symptoms to determine infectious causes of death in children: national survey of 18,000 verbal autopsies in the Million Death Study in India

BMC Public Health · 10.1186/s12889-021-11829-y · 2021 · Vol 21 (1)

Author(s): Susan Idicula-Thomas, Ulka Gawde, Prabhat Jha

Keyword(s): Machine Learning; Verbal Autopsy; Causes Of Death; Prediction Models; Machine Learning Algorithms; Classification And Regression Tree; Gradient Boosting; Support Vector; Diarrhoeal Diseases

Background Machine learning (ML) algorithms have been successfully employed to predict outcomes in clinical research. In this study, we explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS). Methods From the MDS, 18,826 unique childhood deaths at ages 1–59 months during 2004–13 were selected for generating the prediction models; over 70% of these deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML algorithms (support vector machine, gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, and classification and regression tree) were used to build the CoD prediction models. Results The SVM algorithm performed best, with a prediction accuracy of over 0.8. The highest accuracy was found for diarrhoeal diseases (accuracy = 0.97) and the lowest for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classifying these CoDs were also extracted for each disease. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis. Conclusions Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.

Novel Privacy Preserving Non-Invasive Sensing-Based Diagnoses of Pneumonia Disease Leveraging Deep Network Model

Sensors · 10.3390/s22020461 · 2022 · Vol 22 (2) · pp. 461

Author(s): Mujeeb Ur Rehman, Arslan Shafique, Kashif Hesham Khan, Sohail Khalid, Abdullah Alhumaidi Alotaibi, ...

Keyword(s): Machine Learning; Deep Learning; Medical Records; Learning Algorithms; Machine Learning Algorithms; X-Ray; Non-Invasive; Proposed Model; Pneumonia Diagnosis; Better Than

This article presents non-invasive, sensing-based diagnosis of pneumonia that exploits a deep learning model and is coupled with security preservation. Sensing and securing healthcare and medical images such as X-rays, which can be used to diagnose viral diseases such as pneumonia, is a challenging task for researchers. In the past few years, patients' medical records have been shared using various wireless technologies. Wirelessly transmitted data are prone to attacks, which can result in the misuse of patients' medical records. It is therefore important to secure medical data in the form of images. The proposed work is divided into two parts. In the first part, primary data in the form of images are encrypted using the proposed technique based on chaos and a convolutional neural network (CNN); multiple chaotic maps are combined to create a random number generator, and the generated random sequence is used for pixel permutation and substitution. In the second part, a new deep learning technique for pneumonia diagnosis is proposed, using X-ray images as the dataset. Several physiological features such as cough, fever, chest pain, flu, low energy, sweating, shaking, chills, shortness of breath, fatigue, loss of appetite, and headache, and statistical features such as entropy, correlation, contrast, dissimilarity, etc., are extracted from the X-ray images for pneumonia diagnosis. Machine learning algorithms such as support vector machines, decision trees, random forests, and naive Bayes are also implemented and compared with the proposed CNN-based model, which is further improved with transfer learning and fine tuning.
The CNN performs better than the other machine learning algorithms: the accuracy of the proposed work is 89% with naive Bayes and 97% with the CNN, and the CNN's 97% also exceeds the 90% average accuracy of existing schemes. K-fold analysis and voting techniques are also incorporated to improve the accuracy of the proposed model. Metrics such as entropy, correlation, contrast, and energy are used to gauge the performance of the proposed encryption technique, while precision, recall, F1 score, and support are used to evaluate the proposed machine learning-based model for pneumonia diagnosis. The entropy and correlation of the proposed work are 7.999 and 0.0001, respectively, indicating that the proposed encryption algorithm offers a high level of security for digital data. Finally, a detailed comparison with existing work shows that both proposed models outperform it.
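The two encryption metrics quoted above have standard definitions: the Shannon entropy of the 8-bit grey-level histogram, H = −Σ pᵢ log₂ pᵢ, whose maximum for 256 levels is 8 bits (so 7.999 is near-ideal), and the Pearson correlation between horizontally adjacent pixels, which a good cipher drives toward 0. The sketch below computes both for an image given as a list of rows of integers in 0..255. It is a generic illustration of the metrics and does not reproduce the authors' chaos-based cipher; the image format is an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence


def shannon_entropy(image: Sequence[Sequence[int]]) -> float:
    """Entropy in bits per pixel of the grey-level histogram."""
    counts = Counter(p for row in image for p in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def adjacent_correlation(image: Sequence[Sequence[int]]) -> float:
    """Pearson correlation between each pixel and its right-hand neighbour."""
    xs: List[int] = []
    ys: List[int] = []
    for row in image:
        xs.extend(row[:-1])
        ys.extend(row[1:])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # A 16x16 ramp containing every grey level exactly once has maximal
    # entropy (8 bits); each right-hand neighbour is exactly one level
    # brighter, so adjacent pixels are perfectly correlated.
    ramp = [[r * 16 + c for c in range(16)] for r in range(16)]
    print(shannon_entropy(ramp))
    print(round(adjacent_correlation(ramp), 4))
```

The ramp shows why both metrics are reported together: a histogram can be perfectly flat (maximal entropy) while neighbouring pixels remain fully predictable, which a secure cipher must also destroy.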

A machine learning approach to predict ethnicity using personal name and census location in Canada

PLoS ONE · 10.1371/journal.pone.0241239 · 2020 · Vol 15 (11) · pp. e0241239

Author(s): Kai On Wong, Osmar R. Zaïane, Faith G. Davis, Yutaka Yasui

Keyword(s): Machine Learning; First Nations; Predictive Value; Large Scale; Performance Metrics; Characteristic Curve; Machine Learning Algorithms; Support Vector; Learning Approach; Machine Learning Approach

Background Canada is an ethnically diverse country, yet the lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap, but its performance in Canada is largely unknown. This study developed a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features. Methods Using the 1901 census, multiclass and binary-class classification machine learning pipelines were developed. The 13 ethnic categories examined were Aboriginal (First Nations, Métis, Inuit, and all combined), Chinese, English, French, Irish, Italian, Japanese, Russian, Scottish, and others. Machine learning algorithms included regularized logistic regression, C-support vector, and naïve Bayes classifiers. Name features consisted of the entire name string, substrings, double-metaphones, and various name-entity patterns, while location features consisted of the entire location string and substrings of province, district, and subdistrict. Predictive performance metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1, area under the receiver operating characteristic curve, and accuracy. Results The census had 4,812,958 unique individuals. For multiclass classification, the highest performance achieved was 76% F1 and 91% accuracy. For binary classifications for Chinese, French, Italian, Japanese, Russian, and others, F1 ranged from 68% to 95% (median 87%). The lower performance for English, Irish, and Scottish (F1 ranged from 63% to 67%) was likely due to their shared cultural and linguistic heritage. Adding census location features to the name-based models strongly improved Aboriginal classification (F1 increased from 50% to 84%).
Conclusions The automated machine learning approach using only name and census location features can predict the ethnicity of Canadians, with performance varying by ethnic category.
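The abstract lists name "substrings" and location substrings as features without defining them; a common choice for names is character n-grams. The sketch below builds padded character n-gram counts for each name token plus whole-string and per-level location indicators, merged into one sparse feature dictionary. The padding symbols, n-gram sizes, feature-key prefixes, and the "Province/District/Subdistrict" location format are all assumptions for illustration, and the study's double-metaphone and name-entity features are not reproduced.

```python
from collections import Counter
from typing import Dict, Iterable


def char_ngrams(text: str, sizes: Iterable[int] = (2, 3)) -> Counter:
    """Count character n-grams of a lower-cased, boundary-padded string."""
    padded = f"^{text.lower().strip()}$"    # ^ and $ mark the word boundaries
    grams: Counter = Counter()
    for n in sizes:
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def name_location_features(name: str, location: str) -> Dict[str, int]:
    """Merge name n-grams and location parts into one sparse feature dict."""
    features: Dict[str, int] = {f"name_full:{name.lower()}": 1,
                                f"loc_full:{location.lower()}": 1}
    for token in name.split():
        for gram, count in char_ngrams(token).items():
            key = f"name:{gram}"
            features[key] = features.get(key, 0) + count
    # Assumed (hypothetical) location format: "Province/District/Subdistrict".
    for level, part in zip(("prov", "dist", "subdist"), location.split("/")):
        features[f"loc_{level}:{part.strip().lower()}"] = 1
    return features


if __name__ == "__main__":
    feats = name_location_features("Ann Lee", "Ontario/Toronto/Ward 1")
    print(sorted(k for k in feats if k.startswith("name:^")))
    # ['name:^a', 'name:^an', 'name:^l', 'name:^le']
```

A dictionary like this can be fed to a linear classifier through any feature-hashing or dictionary-vectorizing step; the boundary markers let the model distinguish, for example, a name-initial "le" from one in the middle of a name.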

Detection of Online Fake News Using Blending Ensemble Learning

Scientific Programming · 10.1155/2021/3434458 · 2021 · Vol 2021 · pp. 1-10

Author(s): Arvin Hansrajh, Timothy T. Adeliyi, Jeanette Wing

Keyword(s): Machine Learning; Performance Metrics; Machine Learning Algorithms; Stochastic Gradient Descent; Support Vector; Fake News; Learning Models; Linear Discriminant; Proposed Model; Machine Learning Models

The exponential growth in fake news and its inherent threat to democracy, public trust, and justice have escalated the need for fake news detection and mitigation. Detecting fake news is a complex challenge because it is intentionally written to mislead and hoodwink. Humans are not good at identifying fake news: their detection rate is reported to be 54%, and an additional 4% is reported in the literature as speculative. The significance of fighting fake news is exemplified by the present pandemic. Consequently, social networks are ramping up the use of detection tools and educating the public in recognising fake news. In the literature, several machine learning algorithms have been applied to fake news detection with limited and mixed success. Several advanced machine learning models, however, are not being applied, even though recent studies demonstrate the efficacy of the ensemble machine learning approach. The purpose of this study is therefore to assist in the automated detection of fake news, and an ensemble approach is adopted to help resolve the identified gap. This study proposes a blended machine learning ensemble model built from logistic regression, support vector machine, linear discriminant analysis, stochastic gradient descent, and ridge regression, which is then used on a publicly available dataset to predict whether a news report is true or not. The proposed model is compared with popular classical machine learning models using performance metrics such as AUC, ROC, recall, accuracy, precision, and F1-score. The results show that the proposed model outperformed the other popular classical machine learning models.
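Blending differs from stacking in how the second-level training data are produced: base models are fitted on a training split, their scores on a separate holdout split become the meta-features, and the meta-learner is trained on those holdout scores only. The sketch below shows that data flow in plain Python. It assumes logistic regression (fitted by batch gradient descent) as the meta-learner and uses a toy, hypothetical `nearest_centroid` base learner in place of the paper's logistic regression, SVM, LDA, SGD, and ridge base models; it is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
Learner = Callable[[List[Vector], List[int]], Scorer]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nearest_centroid(features: Sequence[int]) -> Learner:
    """Toy base learner (hypothetical): score is the distance to the class-0
    centroid minus the distance to the class-1 centroid on `features`."""
    def fit(X: List[Vector], y: List[int]) -> Scorer:
        def centroid(label: int) -> List[float]:
            rows = [x for x, t in zip(X, y) if t == label]
            return [sum(r[j] for r in rows) / len(rows) for j in features]
        c0, c1 = centroid(0), centroid(1)

        def score(x: Vector) -> float:
            point = [x[j] for j in features]
            return math.dist(point, c0) - math.dist(point, c1)
        return score
    return fit


def fit_logistic(Z: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean cross-entropy loss."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def blend(learners: List[Learner], X_train: List[Vector], y_train: List[int],
          X_hold: List[Vector], y_hold: List[int]) -> Scorer:
    """Level-0 models fit on the training split; the level-1 model fits on
    their scores for the separate holdout split."""
    base = [learn(X_train, y_train) for learn in learners]
    Z = [[m(x) for m in base] for x in X_hold]      # holdout meta-features
    w, b = fit_logistic(Z, y_hold)

    def predict_proba(x: Vector) -> float:
        z = [m(x) for m in base]
        return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    return predict_proba


if __name__ == "__main__":
    X_train = [(0.0, 0.0), (0.2, 1.0), (2.0, 0.0), (1.8, 1.0)]
    y_train = [0, 0, 1, 1]
    X_hold = [(0.1, 0.3), (0.4, 0.9), (1.6, 0.2), (2.2, 0.7)]
    y_hold = [0, 0, 1, 1]
    # Base model 1 sees feature 0 (informative); base model 2 sees only
    # feature 1, which carries no class signal in this toy data.
    model = blend([nearest_centroid([0]), nearest_centroid([1])],
                  X_train, y_train, X_hold, y_hold)
    print([round(model(x), 3) for x in X_hold])
```

Because the meta-learner never sees base-model scores on data those models were trained on, it learns how much to trust each base model out of sample; the cost is that the holdout rows are not used to train the base models.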

A Novel Approach of Weighted Support Vector Machine with Applied Chance Theory for Forecasting Air Pollution Phenomenon in Egypt

International Journal of Computational Intelligence and Applications · 10.1142/s1469026818500013 · 2018 · Vol 17 (01) · pp. 1850001 · Cited By ~ 4

Author(s): Nabil Mohamed Eldakhly, Magdy Aboul-Ela, Areeg Abdalla

Keyword(s): Machine Learning; Support Vector Machine; Prediction Models; Learning Algorithms; Management Control; Air Pollutant; Machine Learning Algorithms; Machine Learning Techniques; Support Vector; Chance Theory

Particulate matter with a diameter of less than 10 micrometers (PM10), a category of air pollutants that includes solid and liquid particles, can be a health hazard for several reasons: it can harm lung tissue and the throat, aggravate asthma, and increase respiratory illness. Accurate prediction models of PM10 concentrations are essential for proper management and control and for designing public warning strategies. Machine learning techniques can be used to develop methods or tools that discover unseen patterns in given data to solve a particular task or problem. Chance theory provides advanced concepts for treating cases in which randomness and fuzziness play roles simultaneously. The main objective is to study a modification of a single machine learning algorithm, the support vector machine (SVM), that applies the chance weight of the target variable, based on chance theory, to the corresponding dataset point, so that the modified SVM outperforms ensemble machine learning algorithms. The results show that SVM algorithms, when modified and combined with the right theory or technique, especially chance theory, outperform other modern ensemble learning algorithms.

Prediction of Recurrence after Transsphenoidal Surgery for Cushing’s Disease: The Use of Machine Learning Algorithms

Neuroendocrinology · 10.1159/000496753 · 2019 · Vol 108 (3) · pp. 201-210 · Cited By ~ 7

Author(s): Yifan Liu, Xiaohai Liu, Xinyu Hong, Penghao Liu, Xinjie Bao, ...

Keyword(s): Machine Learning; Cushing's Disease; Predictive Models; Transsphenoidal Surgery; Characteristic Curve; Serum Cortisol; Machine Learning Algorithms; Morning Serum Cortisol; Better Than

Background: There are no reliable predictive models for recurrence after transsphenoidal surgery (TSS) for Cushing’s disease (CD). Objectives: This study aimed to develop machine learning (ML)-based predictive models for CD recurrence after initial TSS and to evaluate their performance. Method: A total of 354 CD patients were included in this retrospective, supervised learning, data mining study. Predictive models for recurrence were developed from 17 variables using 7 algorithms and were evaluated by the area under the receiver operating characteristic curve (AUC). Results: All patients were followed up for over 12 months (mean ± SD 43.80 ± 35.61 months). The recurrence rate was 13.0%. Age (p < 0.001), postoperative morning serum cortisol nadir (p = 0.002), and postoperative (p < 0.001) and preoperative (p = 0.04) morning adrenocorticotropin (ACTH) levels were significantly related to recurrence. AUCs of the 7 models ranged from 0.608 to 0.781. The best performance (AUC = 0.781, 95% CI 0.706–0.856) was obtained when 8 variables were introduced to the random forest (RF) algorithm, which performed much better than logistic regression (AUC = 0.684, p = 0.008) and than postoperative morning serum cortisol alone (AUC = 0.635, p < 0.001). According to the feature selection algorithms, the top 3 predictors were age, postoperative serum cortisol, and postoperative ACTH. Conclusions: Using ML-based models to predict recurrence after initial TSS for CD is feasible, and RF performs best. Most ML-based models performed significantly better than some conventional models.
