Predicting all-cause 90-day hospital readmission for dental patients using machine learning methods

Abstract Introduction: Hospital readmission rates are an indicator of the quality of care that hospitals provide. Applying machine learning (ML) to a hospital readmission database offers the potential to identify patients at the highest risk for readmission. However, few studies have applied ML methods to predict hospital readmission. This study sought to assess ML as a tool to develop prediction models for all-cause 90-day hospital readmission for dental patients. Methods: Using the 2013 Nationwide Readmissions Database (NRD), the study identified 9260 cases for all-cause 90-day index admission for dental patients. Five ML classification algorithms, namely decision tree, logistic regression, support vector machine, k-nearest neighbors, and artificial neural network (ANN), were implemented to build predictive models. Model performance was estimated and compared using area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and precision. Results: Hospital readmission within 90 days occurred in 1746 cases (18.9%). Total charges, number of diagnoses, age, number of chronic conditions, length of hospital stay, number of procedures, primary expected payer, and severity of illness emerged as the top eight features for all-cause 90-day hospital readmission. All models performed similarly, with ANN (AUC = 0.743) slightly outperforming the rest. Conclusion: This study demonstrates a potential annual saving of over $500 million if all of the 90-day readmission cases could be prevented for the 21 states represented in the NRD. Among the methods used, the prediction model built by ANN exhibited the best performance. Further testing using ANN and other methods can help to assess important readmission risk factors and to target interventions to those at the greatest risk.
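
The evaluation metrics named in this abstract can be illustrated with a minimal Python sketch. Only the case counts (1746 readmissions among 9260 index admissions) come from the abstract; the confusion-matrix counts are made up for illustration, and the names `ConfusionCounts`, `classification_metrics`, and `auc_from_scores` are hypothetical helpers, not anything the study reports using.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Counts from a binary confusion matrix (positive = readmitted)."""
    tp: int
    fp: int
    tn: int
    fn: int


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and precision from raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate (recall)
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "precision": c.tp / (c.tp + c.fp),    # positive predictive value
    }


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Readmission rate from the counts given in the abstract.
print(f"{1746 / 9260:.1%}")  # 18.9%

# Hypothetical counts, for illustration only (not from the study).
toy = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
print(classification_metrics(toy))
```

With roughly 19% positives, accuracy alone can look strong for a model that rarely predicts readmission, which is one reason to read it alongside sensitivity, precision, and AUC.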

Forecasting the risk at infractions: an ensemble comparison of machine learning approach

Industrial Management & Data Systems, DOI 10.1108/imds-10-2020-0603, 2021, Vol ahead-of-print (ahead-of-print)

Author(s): Lei Li, Desheng Wu

Keyword(s): Machine Learning, Prediction Models, Short Term Memory, Model Performance, Large Data, Support Vector, Learning Approaches, Content Type, Day To Day Operations, Prediction Approach

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the infraction risk of listed companies, addressing financial supervision that is neither effective nor precise. Design/methodology/approach: The proposed research framework for forecasting ISRs includes data collection and cleaning, feature engineering, data splitting, application of prediction approaches, and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Network, and Long Short-Term Memory networks (LSTMs) as ISR prediction models. Findings: The results show that models that include prior infractions predict ISRs significantly better than those without them, especially for large sample sets. The results also indicate that, when judging whether a company has committed infractions, attention should be paid to novel artificial intelligence methods, the company's previous infractions, and large data sets. Originality/value: The findings could be used, to a certain degree, to identify the ISRs of listed companies. Overall, the results elucidate the value of prior infractions of securities regulations. This shows the importance of including more data sources when constructing distress models rather than only building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.

Classification models using circulating neutrophil transcripts can detect unruptured intracranial aneurysm

Journal of Translational Medicine, DOI 10.1186/s12967-020-02550-2, 2020, Vol 18 (1)

Author(s): Kerry E. Poppenberg, Vincent M. Tutino, Lu Li, Muhammad Waqas, Armond June, ...

Keyword(s): Machine Learning, Random Forest, Prediction Models, Model Performance, Supervised Machine Learning, Support Vector, Learning Methods, Training Cohort, Network Analyses, Machine Learning Methods

Abstract Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict the presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under the ROC curve (AUC) of 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate the effect of covariates.

An Ensemble Prediction Model for Potential Student Recommendation Using Machine Learning

Symmetry, DOI 10.3390/sym12050728, 2020, Vol 12 (5), pp. 728, Cited By ~ 2

Author(s): Lijuan Yan, Yanshen Liu

Keyword(s): Machine Learning, Student Performance, Prediction Models, Characteristic Curve, Machine Learning Algorithms, Ensemble Prediction, Support Vector, Proposed Model, Importance Analysis, Better Than

Student performance prediction has become a hot research topic. Most existing prediction models are built with a single machine learning method; they focus on prediction accuracy but pay less attention to interpretability. We propose a stacking ensemble model to predict and analyze student performance in academic competition. In this model, student performance is classified into two symmetrical categorical classes. To improve accuracy, three machine learning algorithms, support vector machine (SVM), random forest, and AdaBoost, are established in the first level and then integrated by logistic regression via stacking. A feature importance analysis was applied to identify important variables. The experimental data were collected over four academic years at Hankou University. According to comparative studies on five evaluation metrics (precision, recall, F1, error, and area under the receiver operating characteristic curve (AUC)), the proposed model generally performs better than the compared models. The important variables identified from the analysis are interpretable and can be used as guidance to select potential students.

Prediction of Dansgaard-Oeschger events using machine learning

DOI 10.5194/egusphere-egu21-9699, 2021

Author(s): Nuno Moniz, Susana Barbosa

Keyword(s): Machine Learning, Time Series, Prediction Models, Learning Algorithms, Ice Core, Model Performance, Predictive Performance, Oxygen Isotopic Composition, Machine Learning Algorithms, Support Vector

The Dansgaard-Oeschger (DO) events are one of the most striking examples of abrupt climate change in the Earth's history, representing temperature oscillations of about 8 to 16 degrees Celsius within a few decades. DO events have been studied extensively in paleoclimatic records, particularly in ice core proxies. Examples include the Greenland NGRIP record of oxygen isotopic composition.

This work addresses the anticipation of DO events using machine learning algorithms. We consider the NGRIP time series from 20 to 60 kyr b2k with the GICC05 timescale and 20-year temporal resolution. Forecasting horizons range from 0 (nowcasting) to 400 years. We adopt three different machine learning algorithms (random forests, support vector machines, and logistic regression) in training windows of 5 kyr. We perform validation on subsequent test windows of 5 kyr, based on the timestamps of DO events in Greenland as classified by Rasmussen et al. (2014). We perform experiments with both sliding and growing windows.

Results show that predictions on sliding windows are better overall, indicating that modelling is affected by non-stationary characteristics of the time series. The three algorithms' predictive performance is similar, with slightly better performance from random forest models at shorter forecast horizons. The models' predictive capability decreases as the forecasting horizon lengthens but remains reasonable up to 120 years. The degradation in model performance is mostly related to imprecision in determining the start and end times of events and to some periods being incorrectly identified as DO events.
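
The difference between sliding and growing training windows can be sketched in a few lines of Python. The window sizes follow the abstract (5 kyr windows at 20-year resolution, so 250 points per window over roughly 2000 points), but the function name `window_splits`, the choice to advance by one test window per step, and the assumption that indices run in chronological order are illustrative assumptions, not the authors' stated procedure.

```python
from typing import Iterator, Tuple


def window_splits(
    n_points: int, train_len: int, test_len: int, growing: bool = False
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges over a chronologically ordered series.

    Sliding: the training window keeps a fixed length and moves forward.
    Growing: the training window always starts at index 0 and gets longer.
    Assumption: each step advances by one test window.
    """
    train_end = train_len
    while train_end + test_len <= n_points:
        train_start = 0 if growing else train_end - train_len
        yield (range(train_start, train_end),
               range(train_end, train_end + test_len))
        train_end += test_len


# (60 - 20) kyr at 20-year resolution gives about 2000 steps;
# a 5 kyr window then spans 250 steps.
N_POINTS, WINDOW = 2000, 250

sliding = list(window_splits(N_POINTS, WINDOW, WINDOW))
growing = list(window_splits(N_POINTS, WINDOW, WINDOW, growing=True))

print(len(sliding))                      # 7
print(sliding[-1][0], sliding[-1][1])    # range(1500, 1750) range(1750, 2000)
print(growing[-1][0])                    # range(0, 1750)
```

A sliding window discards older data, which is consistent with the abstract's observation that it helps when the series is non-stationary; a growing window keeps everything and so mixes regimes.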

Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review

BMJ, DOI 10.1136/bmj.m958, 2020, pp. m958, Cited By ~ 7

Author(s): Elham Mahmoudi, Neil Kamdar, Noa Kim, Gabriella Gonzales, Karandeep Singh, ...

Keyword(s): Machine Learning, Systematic Review, Language Processing, Predictive Models, Hospital Readmission, Prediction Models, Cox Regression, Patient Specific, Day Hospital, Significant Difference

Abstract Objective: To provide focused evaluation of predictive modeling of electronic medical record (EMR) data to predict 30 day hospital readmission. Design: Systematic review. Data source: Ovid Medline, Ovid Embase, CINAHL, Web of Science, and Scopus from January 2015 to January 2019. Eligibility criteria for selecting studies: All studies of predictive models for 28 day or 30 day hospital readmission that used EMR data. Outcome measures: Characteristics of included studies, methods of prediction, predictive features, and performance of predictive models. Results: Of 4442 citations reviewed, 41 studies met the inclusion criteria. Seventeen models predicted risk of readmission for all patients and 24 developed predictions for patient specific populations, with 13 of those being developed for patients with heart conditions. Except for two studies from the UK and Israel, all were from the US. The total sample size for each model ranged between 349 and 1 195 640. Twenty five models used a split sample validation technique. Seventeen of 41 studies reported C statistics of 0.75 or greater. Fifteen models used calibration techniques to further refine the model. Using EMR data enabled final predictive models to use a wide variety of clinical measures such as laboratory results and vital signs; however, use of socioeconomic features or functional status was rare. Using natural language processing, three models were able to extract relevant psychosocial features, which substantially improved their predictions. Twenty six studies used logistic or Cox regression models, and the rest used machine learning methods. No statistically significant difference (difference 0.03, 95% confidence interval −0.0 to 0.07) was found between average C statistics of models developed using regression methods (0.71, 0.68 to 0.73) and machine learning (0.74, 0.71 to 0.77). Conclusions: On average, prediction models using EMR data have better predictive performance than those using administrative data. However, this improvement remains modest. Most of the studies examined lacked inclusion of socioeconomic features, failed to calibrate the models, neglected to conduct rigorous diagnostic testing, and did not discuss clinical impact.

Machine learning-based mortality prediction model for heat-related illness

Scientific Reports, DOI 10.1038/s41598-021-88581-1, 2021, Vol 11 (1)

Author(s): Yohei Hirano, Yutaka Kondo, Toru Hifumi, Shoji Yokobori, Jun Kanda, ...

Keyword(s): Machine Learning, Prediction Model, Prediction Models, Characteristic Curve, Vital Signs, Mortality Prediction, APACHE II, APACHE II Score, Support Vector, Mortality Prediction Model

Abstract: In this study, we aimed to develop and validate a machine learning-based mortality prediction model for hospitalized heat-related illness patients. After 2393 hospitalized patients were extracted from a multicentered heat-related illness registry in Japan, subjects were divided into the training set for development (n = 1516, data from 2014, 2017–2019) and the test set (n = 877, data from 2020) for validation. Twenty-four variables, including patient characteristics, vital signs, and laboratory test data at hospital arrival, were used as predictor features for machine learning. The outcome was death during hospital stay. In validation, the developed machine learning models (logistic regression, support vector machine, random forest, XGBoost) demonstrated favorable performance for outcome prediction, with significantly higher values of the area under the precision-recall curve (AUPR) of 0.415 [95% confidence interval (CI) 0.336–0.494], 0.395 [CI 0.318–0.472], 0.426 [CI 0.346–0.506], and 0.528 [CI 0.442–0.614], respectively, compared to 0.287 [CI 0.222–0.351] for the conventional acute physiology and chronic health evaluation (APACHE)-II score as a reference standard. The area under the receiver operating characteristic curve (AUROC) values were also high (over 0.92) in all models, although there were no statistical differences compared to APACHE-II. This is the first demonstration of the potential of machine learning-based mortality prediction models for heat-related illnesses.
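
The area under the precision-recall curve reported above can be estimated in several ways; the following Python sketch uses average precision, one common estimator, and is not necessarily the computation the authors used. The function name `average_precision` and the toy labels and scores are illustrative assumptions.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision values at each true positive.

    Samples are ranked by descending score. Tied scores keep their input
    order here, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Toy data, for illustration only (1 = died during hospital stay).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(toy_labels, toy_scores), 4))  # 0.8333
```

Unlike AUROC, the expected AUPR of a random ranking is roughly the prevalence of the positive class, so AUPR values for a rare outcome can look low in absolute terms and are best read against a reference such as the APACHE-II figure quoted in the abstract.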

Machine learning to predict distal caries in mandibular second molars associated with impacted third molars

Scientific Reports, DOI 10.1038/s41598-021-95024-4, 2021, Vol 11 (1)

Author(s): Sung-Hwi Hur, Eun-Young Lee, Min-Kyung Kim, Somi Kim, Ji-Yeon Kang, ...

Keyword(s): Machine Learning, Decision Making, Clinical Decision Making, Prediction Models, Contact Point, Characteristic Curve, Gradient Boosting, Support Vector, Third Molars, Extreme Gradient Boosting

Abstract: Impacted mandibular third molars (M3M) are associated with the occurrence of distal caries on the adjacent mandibular second molars (DCM2M). In this study, we aimed to develop and validate five machine learning (ML) models designed to predict the occurrence of DCM2Ms due to proximity to M3Ms and to determine the relative importance of the predictive variables for DCM2Ms that matter for clinical decision making. A total of 2642 mandibular second molars adjacent to M3Ms were analyzed and DCM2Ms were identified in 322 cases (12.2%). The models were trained using logistic regression, random forest, support vector machine, artificial neural network, and extreme gradient boosting ML methods and were subsequently validated using testing datasets. The performance of the ML models was significantly superior to that of single predictors. The area under the receiver operating characteristic curve of the machine learning models ranged from 0.88 to 0.89. Six features (sex, age, contact point at the cementoenamel junction, angulation of M3Ms, Winter's classification, and Pell and Gregory classification) were identified as relevant predictors. These prediction models could be used to detect patients at a high risk of developing DCM2M and ultimately contribute to caries prevention and treatment decision-making for impacted M3Ms.

Classification Models using Circulating Neutrophil Transcripts Can Detect Unruptured Intracranial Aneurysm

DOI 10.21203/rs.3.rs-17161/v2, 2020

Author(s): Kerry E Poppenberg, Vincent M Tutino, Lu Li, Muhammad Waqas, Armond June, ...

Keyword(s): Machine Learning, Random Forest, Prediction Models, Model Performance, Supervised Machine Learning, Support Vector, Learning Methods, Training Cohort, Network Analyses, Machine Learning Methods

Abstract Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict the presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under the ROC curve (AUC) of 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate the effect of covariates.

Machine Learning Approach to Predict Risk of 90-Day Hospital Readmissions in Patients With Atrial Fibrillation: Implications for Quality Improvement in Healthcare

Health Services Research and Managerial Epidemiology, DOI 10.1177/2333392820961887, 2020, Vol 7, pp. 233339282096188

Author(s): Man Hung, Eric S. Hon, Evelyn Lauren, Julie Xu, Gary Judd, ...

Keyword(s): Machine Learning, Atrial Fibrillation, Support Vector Machine, Catheter Ablation, Hospital Readmission, Hospital Readmissions, Support Vector, Day Hospital, Learning Methods, Machine Learning Methods

Background: Atrial fibrillation (AF) in the elderly population is projected to increase over the next several decades. Catheter ablation shows promise as a treatment option and is becoming increasingly available. We examined 90-day hospital readmission for AF patients undergoing catheter ablation and used machine learning methods to explore the risk factors associated with these readmission trends. Methods: Data from the 2013 Nationwide Readmissions Database on AF cases were used to predict 90-day readmissions for AF with catheter ablation. Multiple machine learning methods, such as k-Nearest Neighbors, Decision Tree, and Support Vector Machine, were employed to determine variable importance and build risk prediction models. Accuracy, precision, sensitivity, specificity, and area under the curve were compared for each model. Results: The 90-day hospital readmission rate was 17.6%; the average age of the patients was 64.9 years; 62.9% of patients were male. Important variables in predicting 90-day hospital readmissions in patients with AF undergoing catheter ablation included the age of the patient, number of diagnoses on the patient’s record, and the total number of discharges from a hospital. The k-Nearest Neighbors model performed best, with a prediction accuracy of 85%. Decision Tree followed closely, whereas Support Vector Machine performed less well. Conclusions: Machine learning methods can produce accurate models for predicting hospital readmissions for patients with AF. The likelihood of readmission to the hospital increases as patient age, total number of hospital discharges, and total number of patient diagnoses increase. Findings from this study can inform quality improvement in healthcare and help achieve patient-centered care.

Exploration of Machine Learning for Hyperuricemia Prediction Models Based on Basic Health Checkup Tests

Journal of Clinical Medicine, DOI 10.3390/jcm8020172, 2019, Vol 8 (2), pp. 172, Cited By ~ 6

Author(s): Sangwoo Lee, Eun Choe, Boram Park

Keyword(s): Machine Learning, Uric Acid, Prediction Models, Characteristic Curve, Support Vector, K Nearest Neighbor, Health Checkup, Classification Rate, Data Set, Acid Status

Background: Machine learning (ML) is a promising methodology for classification and prediction applications in healthcare. However, this method has not been practically established for clinical data. Hyperuricemia is a biomarker of various chronic diseases. We aimed to predict uric acid status from basic healthcare checkup test results using several ML algorithms and to evaluate their performance. Methods: Using a comprehensive health checkup database, we built prediction models for hyperuricemia with several ML classification algorithms: discrimination analysis, K-nearest neighbor, naïve Bayes classification (NBC), support vector machine, decision tree, and random forest classification (RFC). The performance of each algorithm was evaluated and compared with that of a conventional logistic regression (CLR) algorithm by receiver operating characteristic curve analysis. Results: Of the 38,001 participants, 7705 were hyperuricemic. Among the ML algorithms for predicting uric acid status, under the maximum sensitivity criterion NBC showed the highest sensitivity (0.73) and RFC the second highest (0.66); under the maximum balanced classification rate (BCR) criterion, RFC showed the highest BCR (0.68) and NBC the second highest (0.66). Compared with a CLR algorithm (area under the curve (AUC) = 0.568, 95% confidence interval (CI) = 0.563–0.571), NBC (AUC = 0.669, 95% CI = 0.669–0.675) and RFC (AUC = 0.775, 95% CI = 0.770–0.780) showed significantly better performance (p < 0.001). Conclusions: The ML model was superior to the CLR model for the prediction of hyperuricemia. Future studies are needed to determine the best-performing ML algorithms based on data set characteristics. We believe that this study will be informative for studies using ML tools in clinical research.
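
The "maximum BCR criterion" mentioned above can be sketched in Python under one assumption: that the balanced classification rate is the mean of sensitivity and specificity, which is the usual definition but is not spelled out in the abstract. The helper names and the toy labels and scores are hypothetical; only the participant counts (7705 of 38,001) come from the abstract.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive if score >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_classification_rate(sensitivity: float, specificity: float) -> float:
    """Assumed definition: the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def best_threshold_by_bcr(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float]:
    """Try each observed score as a cut-off; return (threshold, BCR) with highest BCR."""
    best = (float("nan"), -1.0)
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(labels, scores, t)
        bcr = balanced_classification_rate(sens, spec)
        if bcr > best[1]:  # the first (highest) threshold wins on ties
            best = (t, bcr)
    return best


# Share of hyperuricemic participants, from the counts in the abstract.
print(round(7705 / 38001, 3))  # 0.203

# Toy data, for illustration only (1 = hyperuricemic).
print(best_threshold_by_bcr([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1]))  # (0.6, 1.0)
```

Because only about one participant in five is hyperuricemic, a criterion that weights both classes equally, such as BCR, avoids rewarding a model that simply predicts the majority class.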
