Predicting unplanned medical visits among patients with diabetes: translation from machine learning to clinical implementation

2021, Vol 21 (1)
Author(s): Arielle Selya, Drake Anshutz, Emily Griese, Tess L. Weber, Benson Hsu, ...

Abstract Background: Diabetes is a medical and economic burden in the United States. In this study, a machine learning predictive model was developed to predict unplanned medical visits among patients with diabetes, and the findings were used to design a clinical intervention in the sponsoring healthcare organization. This study presents a case study of how predictive analytics can inform clinical actions and describes practical factors that must be incorporated in order to translate research into clinical practice. Methods: Data were drawn from electronic medical records (EMRs) of a large healthcare organization in the Northern Plains region of the US, covering adult (≥ 18 years old) patients with type 1 or type 2 diabetes who received care at least once during the 3-year study period. A variety of machine-learning classification models were run using standard EMR variables as predictors (age, body mass index (BMI), systolic blood pressure (BP), diastolic BP, low-density lipoprotein (LDL), high-density lipoprotein (HDL), glycohemoglobin (A1C), smoking status, number of diagnoses and number of prescriptions). The best-performing model after cross-validation testing was analyzed to identify the strongest predictors. Results: The best-performing model was a linear-basis support vector machine, which achieved a balanced accuracy (average of sensitivity and specificity) of 65.7%. This model outperformed a conventional logistic regression by 0.4 percentage points. A sensitivity analysis identified BP and HDL as the strongest predictors: disrupting these variables with random noise decreased the model’s overall balanced accuracy by 1.3 and 1.4 percentage points, respectively. These findings, along with stakeholder engagement, behavioral economics strategies, and implementation science principles, helped to inform the design of a clinical intervention targeting behavioral changes.
Conclusion: Our machine-learning predictive model predicted unplanned medical visits among patients with diabetes more accurately than conventional models. Post-hoc analysis of the model was used for hypothesis generation, namely that HDL and BP are the strongest contributors to unplanned medical visits among patients with diabetes. These findings were translated into a clinical intervention now being piloted at the sponsoring healthcare organization. In this way, this predictive model can support the move from prediction to implementation and improved diabetes care management in clinical settings.
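The Results above report performance as balanced accuracy and rank predictors by how much balanced accuracy drops when a variable is disrupted with random noise. A minimal Python sketch of both ideas follows. The Gaussian noise model, the noise scale, and the function names are assumptions made for illustration, not the authors' code; any classifier exposed as a `predict(rows)` function could be plugged in.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def noise_importance(
    predict: Callable[[List[Row]], List[int]],
    X: List[Row],
    y: Sequence[int],
    column: int,
    noise_sd: float,
    seed: int = 0,
) -> float:
    """Drop in balanced accuracy, in percentage points, after adding
    Gaussian noise (an assumed noise model) to one predictor column."""
    rng = random.Random(seed)
    baseline = balanced_accuracy(y, predict(X))
    X_noisy = [list(row) for row in X]  # copy so the original data stay untouched
    for row in X_noisy:
        row[column] += rng.gauss(0.0, noise_sd)
    return 100 * (baseline - balanced_accuracy(y, predict(X_noisy)))


if __name__ == "__main__":
    # Toy data: column 0 is informative, column 1 is ignored by the model.
    X = [[0.2, 5.0], [0.4, 1.0], [0.6, 3.0], [0.9, 2.0]]
    y = [0, 0, 1, 1]
    model = lambda rows: [1 if r[0] > 0.5 else 0 for r in rows]  # hypothetical classifier
    print(balanced_accuracy(y, model(X)))  # 1.0
    print(noise_importance(model, X, y, column=1, noise_sd=1.0))  # 0.0
```

A predictor the model ignores shows no drop, while perturbing an informative predictor lowers balanced accuracy; in practice the drop would be averaged over several noise seeds.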

2020
Author(s): Arielle Selya, Drake Anshutz, Emily Griese, Tess L Weber, Benson Hsu, ...

Abstract Background: Diabetes is common and imposes an economic burden in the United States. In this study, a machine learning predictive model was developed to predict unplanned medical visits among patients with diabetes. Methods: Data were drawn from electronic medical records (EMRs) of a large healthcare organization in the Northern Plains region of the US, covering adult (≥18 years old) patients with type 1 or type 2 diabetes who received care at least once during the 3-year study period. A variety of machine-learning classification models were run using standard EMR variables as predictors (age, body mass index (BMI), systolic blood pressure (BP), diastolic BP, low-density lipoprotein (LDL), high-density lipoprotein (HDL), glycohemoglobin (A1C), smoking status, number of diagnoses and number of prescriptions). The best-performing model after cross-validation testing was analyzed to identify the strongest predictors. Results: The best-performing model was a radial-basis support vector machine, which achieved a prediction accuracy (average of sensitivity and specificity) of 66.2%. This outperformed a conventional logistic regression by 1.5 percentage points. High BP and low HDL were identified as the strongest predictors: eliminating these from the model decreased its overall prediction accuracy by 1.9 and 1.8 percentage points, respectively. Conclusion: Our machine-learning predictive model predicted unplanned medical visits among patients with diabetes more accurately than conventional models. Post-hoc analysis of the model was used for hypothesis generation, namely that HDL and BP are the strongest contributors to unplanned medical visits among patients with diabetes. In this way, this predictive model can support the move from prediction to implementation and improved diabetes care management in clinical settings.


2021, Vol 23 (Supplement_6), pp. vi143–vi144
Author(s): Omaditya Khanna, Anahita Fathi Kazerooni, Jose A Garcia, Chiharu Sako, Sherjeel Arif, ...

Abstract PURPOSE Although WHO grade I meningiomas are considered ‘benign’ tumors, an elevated Ki-67 is one crucial factor that has been shown to influence clinical outcomes. In this study, we use standard pre-operative MRI to develop a machine learning (ML) model that predicts the Ki-67 in WHO grade I meningiomas. METHODS A retrospective analysis was performed of 306 patients who underwent surgical resection. The mean and median Ki-67 of tumor specimens were 4.84 ± 4.03% (range: 0.3–33.6%) and 3.7% (Q1: 2.3%, Q3: 6%), respectively. Pre-operative MRI was used to perform radiomic feature extraction (N = 2,520 features), followed by ML modeling using least absolute shrinkage and selection operator (LASSO) wrapped with a support vector machine (SVM) through nested cross-validation on a discovery cohort (N = 230), to stratify tumors by Ki-67 < 5% versus ≥ 5%. A replication cohort (N = 76) was kept ‘unseen’ to provide insight into the generalizability of the predictive model. RESULTS A total of 60 radiomic features extracted from seven different MRI sequences were used in the final model. In the discovery cohort, this model achieved an AUC of 0.84 (95% CI: 0.78–0.90), with a sensitivity of 84.1% and a specificity of 73.3%. The selected features and the trained model were then applied independently to the replication cohort, yielding an AUC of 0.83 (95% CI: 0.73–0.94), with a sensitivity of 82.6% and a specificity of 85.5%. Furthermore, the model performed well when applied separately to skull base and non-skull base tumors in the full patient cohort, with comparable AUC values of 0.86 and 0.83, respectively. CONCLUSION The results of this study may give surgeons enhanced pre-operative diagnostic information to guide surgical strategy and individual patient treatment paradigms.
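The METHODS describe LASSO wrapped with an SVM and evaluated through nested cross-validation on the discovery cohort. The point of nesting is that feature selection and hyperparameter tuning happen only inside each outer training fold, so the outer test fold yields a less optimistic performance estimate. The sketch below generates such index splits in plain Python. The fold counts, the seeded shuffle, and the function names are assumptions (the abstract does not state them), and the sketch does not stratify by Ki-67 class, which a real study often would.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold(indices: List[int], k: int) -> List[List[int]]:
    """Partition indices into k contiguous folds whose sizes differ by at most one."""
    n, folds, start = len(indices), [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def nested_cv_splits(
    n: int, outer_k: int, inner_k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    Inner splits are drawn only from outer_train, so feature selection and
    tuning never see the outer test fold used to estimate performance.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    for outer_test in k_fold(order, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in order if i not in held_out]
        inner_splits: List[Split] = []
        for inner_val in k_fold(outer_train, inner_k):
            val = set(inner_val)
            inner_splits.append(([i for i in outer_train if i not in val], inner_val))
        yield outer_train, outer_test, inner_splits
```

For example, `nested_cv_splits(230, outer_k=5, inner_k=5)` would yield five outer folds over the 230 discovery-cohort patients; LASSO selection and SVM tuning would be fitted using each `inner_splits` list, and the chosen pipeline scored once on the matching `outer_test`.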


2022
Author(s): Mostafa Rezapour, Lucas Hansen

Abstract In late December 2019, the novel coronavirus (SARS-CoV-2) and the resulting disease, COVID-19, were first identified in Wuhan, China. The disease slipped through containment measures, and the first known case in the United States was identified on January 20th, 2020. In this paper, we use survey data from the Inter-university Consortium for Political and Social Research and apply several statistical and machine learning models and techniques, such as Decision Trees, Multinomial Logistic Regression, Naive Bayes, k-Nearest Neighbors, Support Vector Machines, Neural Networks, Random Forests, Gradient Tree Boosting, XGBoost, CatBoost, LightGBM, Synthetic Minority Oversampling, and the Chi-Squared Test, to analyze the impact the COVID-19 pandemic has had on the mental health of frontline workers in the United States. From interpreting the many models applied to the mental health survey data, we conclude that the most important factor in predicting the mental health decline of a frontline worker is the individual's healthcare role (Nurse, Emergency Room Staff, Surgeon, etc.), followed by the amount of sleep the individual has had in the last week, the amount of COVID-19-related news the individual consumes on average per day, the age of the worker, and the use of alcohol and cannabis.
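Synthetic Minority Oversampling (SMOTE), listed above, rebalances a skewed outcome by creating new minority-class samples rather than duplicating existing ones. The sketch below shows only the core interpolation step in plain Python; it is an illustration under stated assumptions (Euclidean distance, numeric features, a seeded generator, and a hypothetical function name), not the authors' pipeline, which would normally rely on a library implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def smote(minority: List[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate synthetic minority-class samples in the style of SMOTE.

    Each new sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours by Euclidean distance, excluding the base sample itself.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic
```

Oversampling of this kind is commonly applied only to training folds, so that synthetic points never leak into the data used for evaluation.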


Author(s): Kamila Bonifacio, Michael Maes, Carine de Farias, Andressa Matsumoto, Crisieli Tomereli, ...

Purpose: To investigate alterations in nitro-oxidative stress (OS) and antioxidant status in adolescents with metabolic syndrome (MetS), and whether these alterations occur independently of the effects of overweight or obesity. Methods: Blood was collected from 47 adolescents with MetS and 94 adolescents without MetS, as defined by the International Diabetes Federation criteria. The International Obesity Task Force (IOTF) criteria were used to classify subjects as overweight or obese. We measured nitro-oxidative biomarkers, including nitric oxide metabolites (NOx), lipid hydroperoxides (LOOH), and malondialdehyde (MDA); antioxidant biomarkers, i.e. total radical-trapping antioxidant parameter (TRAP), paraoxonase (PON)-1 activity, and thiol (SH-) groups; as well as tumor necrosis factor-α, glucose, insulin, triglycerides, uric acid and high-density lipoprotein cholesterol (HDL-C). Results: Logistic regression analysis showed that increased MDA and NOx and a lowered TRAP/uric acid ratio were associated with MetS. Machine learning, including soft independent modeling of class analogy (SIMCA), showed that the three most important features of MetS were increased glucose and MDA and lowered HDL-C. A support vector machine using MDA, glucose, insulin, HDL-C, triglycerides and body mass index as input variables yielded a 10-fold cross-validated accuracy of 89.8% in discriminating MetS from controls. The association between MetS and increased MDA was independent of the effects of overweight/obesity, glucose, insulin, triglycerides and HDL-C. Conclusion: In adolescents, increased MDA formation is a key component of MetS, indicating that increased production of reactive oxygen species, with consequent lipid peroxidation and aldehyde formation, participates in the development of MetS.


2019
Author(s): Safwan Wshah, Christian Skalka, Matthew Price

BACKGROUND A majority of adults in the United States are exposed to a potentially traumatic event, but only a small fraction go on to develop impairing mental health conditions such as posttraumatic stress disorder (PTSD). OBJECTIVE Identifying those at elevated risk shortly after trauma exposure is a clinical challenge. The aim of this study was to develop computational methods to identify at-risk patients more effectively and thereby support better early interventions. METHODS We used machine learning (ML) to induce models that automatically predict elevated PTSD symptoms 1 month after a trauma, using self-reported symptoms collected via smartphones. RESULTS We show that an ensemble model (a bag of support vector machines, naive Bayes, logistic regression, and random forest algorithms) accurately predicts elevated PTSD symptoms, with an area under the curve (AUC) of 0.85. Furthermore, we show that only 7 self-reported items (features) are needed to obtain this AUC. Most importantly, we show that accurate predictions can be made 10 to 20 days posttrauma. CONCLUSIONS These results suggest that simple smartphone-based patient surveys, coupled with automated analysis using ML-trained models, can identify those at risk of developing elevated PTSD symptoms and thus target them for early intervention.
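The RESULTS describe an ensemble of four classifiers evaluated by AUC. The abstract does not say how the base models' outputs are combined; averaging their predicted probabilities (soft voting) is one common choice and is assumed in the sketch below, together with a rank-based AUC. The base models are left as arbitrary callables, and all names here are hypothetical.

```python
from typing import Callable, Sequence

# A base model maps a feature vector to P(elevated symptoms); assumed interface.
Model = Callable[[Sequence[float]], float]


def soft_vote(models: Sequence[Model], x: Sequence[float]) -> float:
    """Average the predicted probabilities of several base models."""
    return sum(m(x) for m in models) / len(models)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count one half), i.e. the normalised Mann-Whitney U statistic."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

To score the ensemble, one would compute `soft_vote(models, x)` for every held-out participant and pass those scores with the true labels to `auc`.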


Floods and droughts are frequent natural disasters in most countries. They can cause considerable damage to a country's agriculture, ecology and economy, so mitigating their impacts is of great value. The main driver of both is precipitation: too much leads to flooding and too little to drought. If past precipitation data are analyzed properly, future flood and drought events can be anticipated more readily. The Standardized Precipitation Index (SPI) is a way to characterize the wet or dry condition of a region or country. In this paper, SPI values with different lead times are calculated over a long period. These SPI values are analyzed by a predictive model using the machine learning algorithm Support Vector Regression (SVR) with an RBF (Radial Basis Function) kernel, whose parameters are optimized with a grid search. The forecast results of this predictive model demonstrate the predictive skill of SVR with the RBF kernel.
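The SPI is conventionally computed by accumulating precipitation over a chosen timescale, fitting a gamma distribution to those totals, and mapping each total's cumulative probability to a standard-normal value. The plain-Python sketch below follows that recipe under stated assumptions: all totals are strictly positive (the usual correction for zero-precipitation months is omitted), the gamma parameters use Thom's maximum-likelihood approximation, and the helper names are hypothetical. `make_lagged_dataset` shows one possible way a lead time could define training pairs for a regressor; the SVR and grid search themselves are not shown and would typically come from a library such as scikit-learn (`SVR`, `GridSearchCV`).

```python
import math
from statistics import NormalDist, fmean
from typing import List, Sequence, Tuple


def accumulate(monthly: Sequence[float], window: int) -> List[float]:
    """Rolling precipitation totals over `window` months (e.g. 3 for SPI-3)."""
    return [sum(monthly[i - window + 1:i + 1]) for i in range(window - 1, len(monthly))]


def fit_gamma(totals: Sequence[float]) -> Tuple[float, float]:
    """Thom's approximation of gamma shape and scale.
    Assumes strictly positive, non-constant totals."""
    mean = fmean(totals)
    a = math.log(mean) - fmean(math.log(t) for t in totals)
    shape = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    return shape, mean / shape


def gamma_cdf(shape: float, x: float) -> float:
    """Regularised lower incomplete gamma P(shape, x) via its power series."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / shape
    n = 0
    while term > 1e-15 * total and n < 10_000:
        n += 1
        term *= x / (shape + n)
        total += term
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))


def spi(totals: Sequence[float]) -> List[float]:
    """Map each total to a standard-normal quantile through the fitted gamma CDF."""
    shape, scale = fit_gamma(totals)
    normal = NormalDist()  # mean 0, standard deviation 1
    result = []
    for t in totals:
        p = gamma_cdf(shape, t / scale)
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep inv_cdf's argument strictly inside (0, 1)
        result.append(normal.inv_cdf(p))
    return result


def make_lagged_dataset(series: Sequence[float], n_lags: int, lead: int):
    """Pair each window of `n_lags` past values with the value `lead` steps
    ahead, giving (X, y) for a regressor such as SVR."""
    X, y = [], []
    for t in range(n_lags - 1, len(series) - lead):
        X.append(list(series[t - n_lags + 1:t + 1]))
        y.append(series[t + lead])
    return X, y
```

Negative SPI values indicate drier-than-usual conditions and positive values wetter-than-usual ones; fitting the gamma distribution separately for each calendar month, as is common practice, would remove seasonality that this sketch ignores.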


Author(s): Inssaf El Guabassi, Zakaria Bousalem, Rim Marah, Aimad Qazdar

In recent years, there has been a growing demand to predict the future with certainty, and predicting the right information in any area is becoming a necessity. One way to approach this is to determine the possible futures. In this sense, machine learning is a way to analyze huge datasets to make strong predictions or decisions. The main objective of this research work is to build a predictive model for evaluating students’ performance. The contributions are threefold. The first is to apply several supervised machine learning algorithms (namely ANCOVA, Logistic Regression, Support Vector Regression, Log-linear Regression, Decision Tree Regression, Random Forest Regression, and Partial Least Squares Regression) to our education dataset. The second is to compare and evaluate the algorithms used to create a predictive model based on various evaluation metrics. The third is to determine the most important factors that influence the success or failure of students. The experimental results showed that Log-linear Regression provides better predictions than the other algorithms, and identified the behavioral factors that influence students’ performance.


Circulation, 2020, Vol 142 (Suppl_3)
Author(s): Mariam N Rana, Chang Kim, Chris T Longenecker, Sadeer Al-Kindi

Background: Low-density lipoprotein cholesterol (LDL-C) is commonly estimated from total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) using predefined equations, which assume fixed or varying relationships between these parameters and may under- or overestimate LDL-C. Machine learning (ML) algorithms can capture complex non-linear relationships. We therefore investigated the utility of ML for predicting LDL-C, in comparison with two widely used methods (Friedewald and Hopkins). Methods: We identified 7397 direct LDL-C measurements (4716 from women with HIV, 2060 from uninfected controls) in the Women's Interagency HIV Study (WIHS), a prospective study of women with and without HIV undergoing serial assessments. We trained and optimized 5 ML methods (linear regression, random forest, gradient boosting machine, support vector machine, and neural networks) to predict LDL-C from TC, HDL-C, and TG on 80% of the measurements, and tested model performance on a holdout test set (the remaining 20%). Results: Overall, the support vector machine model had the best performance characteristics, outperforming the Friedewald and Hopkins methods with a higher R², lower root mean square error and lower mean absolute error. The support vector machine model remained superior in participant subgroups with and without HIV and in those with non-fasting measurements. Model performance parameters for the test dataset are shown in the Figure. Conclusions: A support vector machine model predicts directly measured LDL-C more accurately than the Friedewald and Hopkins methods, especially in non-fasting patients with HIV. Further studies are needed to provide external validation.
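For reference, the Friedewald baseline mentioned above estimates LDL-C (in mg/dL) as TC − HDL-C − TG/5, and the comparison uses R², root mean square error and mean absolute error. A minimal Python sketch of the equation and the three metrics follows; the function names are hypothetical, the 400 mg/dL cut-off reflects the commonly cited limit of the Friedewald equation, and the Martin-Hopkins method is not reproduced because it depends on a published lookup table of TG factors.

```python
import math
from typing import Sequence


def friedewald_ldl(tc: float, hdl: float, tg: float) -> float:
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5.
    Widely regarded as unreliable when TG >= 400 mg/dL."""
    if tg >= 400:
        raise ValueError("Friedewald equation is not recommended for TG >= 400 mg/dL")
    return tc - hdl - tg / 5


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error between measured and predicted values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

For example, TC = 200, HDL-C = 50 and TG = 150 mg/dL give a Friedewald LDL-C of 200 − 50 − 30 = 120 mg/dL; evaluating any ML model would apply `rmse`, `mae` and `r_squared` to the directly measured LDL-C in the holdout set.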


Diagnostics, 2020, Vol 10 (5), pp. 307
Author(s): Chih-Min Tsai, Chun-Hung Richard Lin, Huan Zhang, I-Min Chiu, Chi-Yung Cheng, ...

Blood culture is frequently used to detect bacteremia in febrile children. However, negative or false-positive blood culture results are common at the pediatric emergency department (PED). The aim of this study was to use machine learning to build a model that could predict bacteremia in febrile children. We conducted a retrospective case-control study of febrile children who presented to the PED from 2008 to 2015, and adopted machine learning methods and cost-sensitive learning to establish a predictive model of bacteremia. We enrolled 16,967 febrile children with blood culture tests during the eight-year study period. Only 146 had true bacteremia; more than 99% had a contaminant or negative blood culture result. The maximum areas under the curve for predicting bacteremia were 0.768 with logistic regression and 0.832 with support vector machines. Using the predictive model, we can categorize febrile children by risk value into five classes: class 5 had the highest probability of bacteremia, while class 1 had no risk. Obtaining blood cultures in febrile children at the PED rarely identifies a causative pathogen. Prediction models can help physicians determine whether patients have bacteremia and may reduce unnecessary expenses.
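The abstract names cost-sensitive learning without giving details. Two common forms are weighting classes inversely to their frequency during training and choosing the decision threshold that minimises expected misclassification cost; both are sketched below in plain Python as assumptions, not as the authors' method, and the cost values in any real use would have to come from clinical judgement.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * n_in_class), so a rare
    class contributes as much total weight as a common one."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def min_cost_threshold(
    y: Sequence[int], scores: Sequence[float], fn_cost: float, fp_cost: float
) -> Tuple[float, float]:
    """Return (threshold, cost) minimising fn_cost * FN + fp_cost * FP,
    where a case is predicted positive when score >= threshold."""
    best: Optional[Tuple[float, float]] = None
    for thr in sorted(set(scores)) + [float("inf")]:
        fn = sum(1 for t, s in zip(y, scores) if t == 1 and s < thr)
        fp = sum(1 for t, s in zip(y, scores) if t == 0 and s >= thr)
        cost = fn_cost * fn + fp_cost * fp
        if best is None or cost < best[1]:
            best = (thr, cost)
    return best
```

With 146 bacteremia cases among 16,967 children, the balanced weight for the positive class would be about 16,967 / (2 × 146) ≈ 58.1, so each missed case counts roughly 58 times as much as in unweighted training.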


2019, Vol 11 (7), pp. 819
Author(s): Julia Marrs, Wenge Ni-Meister

The use of light detection and ranging (LiDAR) techniques for recording and analyzing tree and forest structural variables shows strong promise for improving established hyperspectral-based tree species classifications; however, previous multi-sensor projects were often limited by errors resulting from seasonal or flight-path differences. The National Aeronautics and Space Administration (NASA) Goddard’s LiDAR, hyperspectral, and thermal imager (G-LiHT) now provides co-registered data on experimental forests in the United States that are associated with established ground truths from existing forest plots. Free, user-friendly machine learning applications such as the Orange Data Mining Extension for Python have recently simplified the process of combining datasets, handling variable redundancy and noise, and reducing dimensionality in remotely sensed datasets. Neural networks, CN2 rules, and support vector machine methods are used here to achieve a final classification accuracy of 67% for dominant tree species in experimental plots of Howland Experimental Forest, a mixed coniferous–deciduous forest with ten dominant tree species, and 59% for plots in Penobscot Experimental Forest, a mixed coniferous–deciduous forest with 15 dominant tree species. These accuracies are higher than those produced using the LiDAR or hyperspectral datasets separately, suggesting that combined spectral and structural data contain richer complementary information than either dataset alone. Using the greatly simplified datasets created by our dimensionality reduction methodology, machine learner performance remains comparable to or higher than that obtained with the full dataset.
Across forests, the identification of shared structural and spectral variables suggests that this methodology can successfully identify parameters with high explanatory power for differentiating among tree species, and opens the possibility of addressing large-scale forestry questions using optimized remote sensing workflows.
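The abstract credits Orange with handling variable redundancy and dimensionality reduction but does not specify the method. As a hypothetical stand-in, not the authors' procedure, the sketch below greedily drops any feature whose absolute Pearson correlation with an already-kept feature exceeds a threshold (0.9 here, an arbitrary assumption), which is one simple way to prune redundant spectral or structural variables.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; assumes neither input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_redundant(features: Dict[str, Sequence[float]], max_abs_r: float = 0.9) -> List[str]:
    """Greedy filter: keep a feature only if its |r| with every kept feature
    is at most max_abs_r. Order matters: earlier features win ties."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, listing features in order of prior importance (for example, by univariate accuracy) determines which member of each correlated group survives.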

