Predicting hospitalization following psychiatric crisis care using machine learning

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. 
Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.
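The net reclassification improvement (NRI) mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which NRI variant was used, so the two-category form at a single risk threshold of 0.5 is an assumption here, and the function name and toy data are hypothetical.

```python
def net_reclassification_improvement(y_true, p_old, p_new, threshold=0.5):
    """Two-category NRI of a new model over an old one (assumed variant).

    y_true: 1 for an event (e.g. hospitalization), 0 otherwise.
    p_old, p_new: predicted probabilities from the two models.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for y, old, new in zip(y_true, p_old, p_new):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high      # reclassified to high risk
        moved_down = old_high and not new_high    # reclassified to low risk
        if y == 1:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up, non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents


# Toy data (made up for illustration, not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
p_old = [0.6, 0.4, 0.4, 0.7, 0.6, 0.3, 0.6, 0.2]
p_new = [0.7, 0.6, 0.4, 0.8, 0.4, 0.3, 0.6, 0.6]
nri = net_reclassification_improvement(y_true, p_old, p_new)
```

A positive value means the new model moves more cases in the correct direction than in the wrong one, which is how a statement such as "outperformed by 2.9%" would be read under this assumed definition.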

Predicting Hospitalization following Psychiatric Crisis Care using Machine Learning

10.21203/rs.2.12338/v1 ◽

2019 ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Predictor Variables ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Psychiatric Crisis ◽

Crisis Care

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may be helpful to improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms including the commonly used generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared. We also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. The target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts. Results: We found Gradient Boosting to perform the best (AUC=0.774) and K-Nearest Neighbors to perform the worst (AUC=0.702). The performance of GLM/logistic regression (AUC=0.76) was above average among the tested algorithms. Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors in a Net Reclassification Improvement analysis, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use. 
Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider combining multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing suboptimal performing algorithms.
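For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an AUC can be computed from predicted scores, using its interpretation as the probability that a randomly chosen positive case is ranked above a randomly chosen negative one (ties count as half). The function name and toy data are hypothetical; the study's own computation is not described in the abstract.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked higher
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration, not from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.35]
auc = roc_auc(y_true, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so values such as 0.702 and 0.774 both sit between the two. The pairwise loop is quadratic in the number of cases; it is chosen here for clarity rather than speed.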

PigLeg: prediction of swine phenotype using machine learning

PeerJ ◽

10.7717/peerj.8764 ◽

2020 ◽

Vol 8 ◽

pp. e8764 ◽

Cited By ~ 2

Author(s):

Siroj Bakoev ◽

Lyubov Getmantseva ◽

Maria Kolosova ◽

Olga Kostyunina ◽

Duane R. Chartier ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Average Daily Gain ◽

Nearest Neighbors ◽

The State ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Leg Weakness

Industrial pig farming is associated with negative technological pressure on the bodies of pigs. Leg weakness and lameness are the sources of significant economic loss in raising pigs. Therefore, it is important to identify the predictors of limb condition. This work presents assessments of the state of limbs using indicators of growth and meat characteristics of pigs based on machine learning algorithms. We have evaluated and compared the accuracy of prediction for nine ML classification algorithms (Random Forest, K-Nearest Neighbors, Artificial Neural Networks, C50Tree, Support Vector Machines, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) and have identified the Random Forest and K-Nearest Neighbors as the best-performing algorithms for predicting pig leg weakness using a small set of simple measurements that can be taken at an early stage of animal development. Measurements of Muscle Thickness, Back Fat amount, and Average Daily Gain were found to be significant predictors of the conformation of pig limbs. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the state of limbs in pigs based on growth rate and meat characteristics.

Comparative analysis of multiple classification models to improve PM10 prediction performance

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i3.pp2500-2507 ◽

2021 ◽

Vol 11 (3) ◽

pp. 2500

Author(s):

Yong-Jin Jung ◽

Kyoung-Woo Cho ◽

Jong-Sung Lee ◽

Chang-Heon Oh

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Particulate Matter ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Occurrence Rate ◽

Classification Models ◽

Comparison Results ◽

Multiple Classification

With the increasing requirement of high accuracy for particulate matter prediction, various attempts have been made to improve prediction accuracy by applying machine learning algorithms. However, the characteristics of particulate matter and the problem of the occurrence rate by concentration make it difficult to train prediction models, resulting in poor prediction. To solve this problem, in this paper we propose multiple classification models that predict particulate matter concentrations by dividing them into AQI-based classes. We designed multiple classification models using logistic regression, decision tree, SVM and ensemble among the various machine learning algorithms. A comparison of the performance of the four classification models through error matrices confirmed an F-score of 0.82 or higher for all the models other than the logistic regression model.

A Study of Predictive Models for Early Outcomes of Post-Prostatectomy Incontinence: Machine Learning Approach vs. Logistic Regression Analysis Approach

Applied Sciences ◽

10.3390/app11136225 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6225

Author(s):

Seongkeun Park ◽

Jieun Byun

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Odds Ratio ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

P Value ◽

Early Recovery ◽

Consistent Group ◽

Recovery Group

Background: Post-prostatectomy incontinence (PPI) is a major complication that can significantly decrease quality of life. Approximately 20% of patients experience consistent PPI as long as 1 year after radical prostatectomy (RP). This study develops a preoperative predictive model and compares its diagnostic performance with conventional tools. Methods: A total of 166 prostate cancer patients who underwent magnetic resonance imaging (MRI) and RP were evaluated. According to the date of the RP, patients were divided into a development cohort (n = 109) and a test cohort (n = 57). Patients were classified as PPI early-recovery or consistent on the basis of pad usage for incontinence at 3 months after RP. Uni- and multi-variable logistic regression analyses were performed to identify associates of PPI early recovery. Four well-known machine learning algorithms (k-nearest neighbor, decision tree, support-vector machine (SVM), and random forest) and a logistic regression model were used to build prediction models for recovery from PPI using preoperative clinical and imaging data. The performances of the prediction models were assessed internally and externally using sensitivity, specificity, accuracy, and area-under-the-curve values and estimated probabilities and the actual proportion of cases of recovery from PPI within 3 months were compared using a chi-squared test. Results: Clinical and imaging findings revealed that age (70.1 years old for the PPI early-recovery group vs. 72.8 years old for the PPI consistent group), membranous urethral length (MUL; 15.7 mm for the PPI early-recovery group vs. 13.9 mm for the PPI consistent group), and obturator internal muscle (18.2 mm for the PPI early-recovery group vs. 17.5 mm for the PPI consistent group) were significantly different between the PPI early-recovery and consistent groups (all p-values < 0.05). 
Multivariate analysis confirmed that age (odds ratio = 1.07, 95% confidence interval = 1.02–1.14, p-value = 0.007) and MUL (odds ratio = 0.87, 95% confidence interval = 0.80–0.95, p-value = 0.002) were significant independent factors for early recovery. The prediction model using machine learning algorithms showed superior diagnostic performance compared with conventional logistic regression (AUC = 0.59 ± 0.07), especially SVM (AUC = 0.65 ± 0.07). Moreover, all models showed good calibration between the estimated probability and actual observed proportion of cases of recovery from PPI within 3 months. Conclusions: Preoperative clinical data and anatomic features on preoperative MRI can be used to predict early recovery from PPI after RP, and machine learning algorithms provide greater diagnostic accuracy compared with conventional statistical approaches.

Wind Power Prediction Based on Three Machine-Learning Algorithms: Decision Tree, K-Nearest Neighbors and Random Forest

Proceedings of the Fifteenth International Conference on Management Science and Engineering Management - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-79203-9_38 ◽

2021 ◽

pp. 490-499

Author(s):

Tingting Liu ◽

Lurong Fan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Wind Power ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Power Prediction ◽

K Nearest Neighbors ◽

Wind Power Prediction

Miss Predicting Readability of Health Educational Resources for Children Using Semantic Features

International Linguistics Research ◽

10.30560/ilr.v4n2p10 ◽

2021 ◽

Vol 4 (2) ◽

pp. p10

Author(s):

Yanmeng Liu

Keyword(s):

Machine Learning ◽

Health Education ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Ensemble Classifier ◽

Machine Learning Algorithms ◽

Support Vector ◽

Semantic Features ◽

K Nearest Neighbors ◽

Education Resources

The success of health education resources largely depends on their readability, as the health information can only be understood and accepted by the target readers when it is presented at an appropriate reading difficulty. Unlike other populations, children have limited knowledge and underdeveloped reading comprehension, which poses more challenges for readability research on health education resources. This research aims to explore the readability prediction of health education resources for children by using semantic features to develop machine learning algorithms. A data-driven method was applied in this research: 1000 health education articles were collected from international health organization websites, and they were grouped into resources for kids and resources for non-kids according to their sources. Moreover, 73 semantic features were used to train five machine learning algorithms (decision tree, support vector machine, k-nearest neighbors algorithm, ensemble classifier, and logistic regression). The results showed that the k-nearest neighbors algorithm and ensemble classifier outperformed the other algorithms in terms of area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy, and achieved good performance in predicting whether the readability of health education resources is suitable for children or not.

APPLICATION OF MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION OF WEED VARIETIES

Bulletin Series of Physics & Mathematical Sciences ◽

10.51889/2021-3.1728-7901.10 ◽

2021 ◽

Vol 75 (3) ◽

pp. 83-93

Author(s):

Zh. A. Buribayev ◽

Zh. E. Amirgaliyeva ◽

A.S. Ataniyazova ◽

Z. M. Melis ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Agricultural Land ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Weed Detection ◽

K Nearest Neighbors ◽

Data Set

The article considers the relevance of introducing intelligent weed detection systems in order to save herbicides and pesticides, as well as to obtain environmentally friendly products. A brief review of researchers' scientific works is carried out, describing the methods of identification, classification and discrimination of weeds that they developed based on machine learning algorithms, convolutional neural networks and deep learning algorithms. This research paper presents a program for detecting pests of agricultural land using the algorithms K-Nearest Neighbors, Random Forest and Decision Tree. The data set is collected from 4 types of weeds: amaranthus, ambrosia, bindweed and bromus. According to the results of the assessment, the accuracy of weed detection by the classifiers K-Nearest Neighbors, Random Forest and Decision Tree was 83.3%, 87.5%, and 80%, respectively. Quantitative results obtained on real data demonstrate that the proposed approach can provide good results in classifying low-resolution images of weeds.

Tutorial: Applying Machine Learning in Behavioral Research

10.31234/osf.io/9w6a3 ◽

2020 ◽

Author(s):

Stephanie Turgeon ◽

Marc Lanovaz

Keyword(s):

Machine Learning ◽

Behavior Analysis ◽

Gradient Descent ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Support Vector ◽

K Nearest Neighbors ◽

Research Questions

Machine learning algorithms hold promise in revolutionizing how educators and clinicians make decisions. However, researchers in behavior analysis have been slow to adopt this methodology to further develop their understanding of human behavior and improve the application of the science to problems of applied significance. One potential explanation for the scarcity of research is that machine learning is not typically taught as part of training programs in behavior analysis. This tutorial aims to address this barrier by promoting increased research using machine learning in behavior analysis. We present how to apply the random forest, support vector machine, stochastic gradient descent, and k-nearest neighbors algorithms on a small dataset to better identify parents who would benefit from a behavior analytic interactive web training. These step-by-step applications should allow researchers to implement machine learning algorithms with novel research questions and datasets.

Clinical Classifiers to Identify Ascending Aortic Dilatation in Patients With Bicuspid Versus Tricuspid Aortic Valves

10.21203/rs.3.rs-957446/v1 ◽

2021 ◽

Author(s):

Bamba Gaye ◽

Maxime Vignac ◽

Jesper R. Gådin ◽

Magalie Ladouceur ◽

Kenneth Caidahl ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Aortic Valve ◽

Regression Models ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Multidimensional Data ◽

Aortic Dilatation ◽

Logistic Regression Models

Abstract Objective: We aimed to develop clinical classifiers to identify prevalent ascending aortic dilatation in patients with bicuspid aortic valve (BAV) and tricuspid aortic valve (TAV). Methods: This study included BAV (n=543) and TAV (n=491) patients with aortic valve disease and/or ascending aortic dilatation but devoid of coronary artery disease undergoing cardiothoracic surgery. We applied machine learning algorithms and classic logistic regression models, using multiple variable selection methodologies to identify predictors of high risk of ascending aortic dilatation (ascending aorta with a diameter above 40 mm). Analyses included comprehensive multidimensional data (i.e., valve morphology, clinical data, family history of cardiovascular diseases, prevalent diseases, demographic, lifestyle and medication). Results: BAV patients were younger (60.4±12.4 years) than TAV patients (70.4±9.1 years), and had a higher frequency of aortic dilatation (45.3% vs. 28.9% for BAV and TAV, respectively; P<0.001). The aneurysm prediction models showed mean AUC values above 0.8 for TAV patients, with the absence of aortic stenosis being the main predictor, followed by diabetes and high sensitivity C-Reactive Protein. Using the same clinical measures in BAV patients, our prediction model resulted in AUC values between 0.5 and 0.55, which is not useful for prediction of aortic dilatation. The classification results were consistent for all machine learning algorithms and classic logistic regression models. Conclusions: Cardiovascular risk profiles appear to be more predictive of aortopathy in TAV patients than in patients with BAV. This adds evidence to the fact that BAV- and TAV-associated aortopathy involve different pathways to aneurysm formation and highlights the need for specific aneurysm preventions in these patients. Further, our results highlight that machine learning approaches do not outperform classical prediction methods in addressing complex interactions and non-linear relations between variables.

Predictive modeling for 14-day unplanned hospital readmission risk by using machine learning algorithms

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01639-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Yu-Tai Lo ◽

Jay Chie-hen Liao ◽

Mei-Hua Chen ◽

Chia-Ming Chang ◽

Cheng-Te Li

Keyword(s):

Machine Learning ◽

High Risk ◽

Prediction Models ◽

Transitional Care ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Unplanned Readmission ◽

Extreme Gradient Boosting ◽

Readmission Risk

Abstract Background Early unplanned hospital readmissions are associated with increased harm to patients, increased medical costs, and negative hospital reputation. With the identification of at-risk patients, a crucial step toward improving care, appropriate interventions can be adopted to prevent readmission. This study aimed to build machine learning models to predict 14-day unplanned readmissions. Methods We conducted a retrospective cohort study on 37,091 consecutive hospitalized adult patients with 55,933 discharges between September 1, 2018, and August 31, 2019, in a 1193-bed university hospital. Patients who were aged < 20 years, were admitted for cancer-related treatment, participated in a clinical trial, were discharged against medical advice, died during admission, or lived abroad were excluded. Predictors for analysis included 7 categories of variables extracted from the hospital’s medical record dataset. In total, four machine learning algorithms, namely logistic regression, random forest, extreme gradient boosting, and categorical boosting, were used to build classifiers for prediction. The performance of prediction models for 14-day unplanned readmission risk was evaluated using precision, recall, F1-score, area under the receiver operating characteristic curve (AUROC), and area under the precision–recall curve (AUPRC). Results In total, 24,722 patients were included for the analysis. The mean age of the cohort was 57.34 ± 18.13 years. The 14-day unplanned readmission rate was 1.22%. Among the 4 machine learning algorithms selected, Catboost had the best average performance in fivefold cross-validation (precision: 0.9377, recall: 0.5333, F1-score: 0.6780, AUROC: 0.9903, and AUPRC: 0.7515). After incorporating 21 most influential features in the Catboost model, its performance improved (precision: 0.9470, recall: 0.5600, F1-score: 0.7010, AUROC: 0.9909, and AUPRC: 0.7711). 
Conclusions Our models reliably predicted 14-day unplanned readmissions and were explainable. They can be used to identify patients with a high risk of unplanned readmission based on influential features, particularly features related to diagnoses. The operation of the models with physiological indicators also corresponded to clinical experience and literature. Identifying patients at high risk with these models can enable early discharge planning and transitional care to prevent readmissions. Further studies should include additional features that may enable further sensitivity in identifying patients at a risk of early unplanned readmissions.
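The F1-score reported in this abstract is the harmonic mean of precision and recall. A minimal sketch of that relationship follows; the function name is hypothetical. Applying it to the averaged precision and recall quoted above gives values close to, but not exactly equal to, the reported F1-scores. One plausible reason, which is an assumption since the abstract does not say, is that each metric was averaged over the five folds separately.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged precision and recall as quoted in the abstract above.
f1_all_features = f1_score(0.9377, 0.5333)   # reported F1-score: 0.6780
f1_top_features = f1_score(0.9470, 0.5600)   # reported F1-score: 0.7010

print(f"{f1_all_features:.4f} {f1_top_features:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F1-score here stays near 0.7 despite precision above 0.93: recall of roughly 0.53 to 0.56 dominates.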
