Predicting self-intercepted medication ordering errors using machine learning

PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254358
Author(s):  
Christopher Ryan King ◽  
Joanna Abraham ◽  
Bradley A. Fritz ◽  
Zhicheng Cui ◽  
William Galanter ◽  
...  

Current approaches to understanding medication ordering errors rely on relatively small, manually captured error samples. These approaches are resource-intensive, do not scale for computerized provider order entry (CPOE) systems, and are likely to miss important risk factors associated with medication ordering errors. Previously, we described a dataset of CPOE-based medication voiding accompanied by univariable and multivariable regression analyses. However, these traditional techniques require expert guidance and may perform poorly compared to newer approaches. In this paper, we update that analysis using machine learning (ML) models to predict erroneous medication orders and identify their contributing factors. We retrieved patient demographics (race/ethnicity, sex, age), clinician characteristics, type of medication order (inpatient, prescription, home medication by history), and order content. We compared logistic regression, random forest, boosted decision tree, and artificial neural network models. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The dataset included 5,804,192 medication orders, of which 28,695 (0.5%) were voided. The ML models classified voids with reasonable accuracy: at a positive predictive value of 10%, roughly 20% of errors were captured. Gradient-boosted decision trees achieved the highest AUROC (0.7968) and AUPRC (0.0647) among all models. Logistic regression had the poorest performance. The models identified predictive factors with high face validity (e.g., student orders), and a decision tree revealed interacting contexts with high error rates that were not identified by previous regression models. Prediction models using order-entry information offer promise for error surveillance, patient safety improvements, and targeted clinical review. The improved performance of models with complex interactions points to the importance of contextual medication ordering information for understanding contributors to medication errors.
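
As a rough illustration of the model comparison this abstract describes, the sketch below fits logistic regression, a random forest, and gradient-boosted trees on a synthetic dataset with a comparable ~0.5% positive rate and reports both AUROC and AUPRC. It assumes scikit-learn; the features and data are placeholders, not the study's variables.

```python
# Sketch only: compare logistic regression, random forest and gradient-boosted
# trees on a synthetic dataset with a comparable ~0.5% positive rate, scoring
# both AUROC and AUPRC. Features and data are placeholders, not the study's.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.995], random_state=0)  # ~0.5% voided
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {"logistic": LogisticRegression(max_iter=1000),
          "random_forest": RandomForestClassifier(n_estimators=200,
                                                  random_state=0),
          "gbdt": GradientBoostingClassifier(random_state=0)}
for name, model in models.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    # At ~0.5% prevalence, AUPRC (average precision) is far more informative
    # than AUROC, which can look deceptively strong.
    print(f"{name}: AUROC={roc_auc_score(y_te, proba):.3f} "
          f"AUPRC={average_precision_score(y_te, proba):.3f}")
```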

Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

Abstract In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We aimed to develop a prehospital stroke scale using ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop the models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. Predictive ability was validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracy was 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification ability was also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, higher than previously reported prediction models for LVO. The ML models developed to predict the probability and type of stroke at the prehospital stage showed superior predictive ability.
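
A minimal sketch of the multi-class evaluation described above, assuming scikit-learn; GradientBoostingClassifier stands in for XGBoost to keep the example self-contained, and the data are synthetic placeholders for the prehospital variables.

```python
# Sketch only: multi-class stroke-type prediction with the three algorithm
# families named above. GradientBoostingClassifier stands in for XGBoost, and
# the data are synthetic placeholders for the prehospital variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Four outcome classes, e.g. LVO, ICH, SAH, and CI other than LVO.
X, y = make_classification(n_samples=6000, n_features=15, n_informative=8,
                           n_classes=4, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for name, model in {"logistic": LogisticRegression(max_iter=1000),
                    "random_forest": RandomForestClassifier(random_state=1),
                    "gbdt": GradientBoostingClassifier(random_state=1)}.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)
    # Per-outcome one-vs-rest AUCs, as when an AUC is reported for LVO alone.
    aucs = [roc_auc_score((y_te == k).astype(int), proba[:, k])
            for k in range(proba.shape[1])]
    print(name, np.round(aucs, 3))
```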


2020 ◽  
Author(s):  
Jun Ke ◽  
Yiwei Chen ◽  
Xiaoping Wang ◽  
Zhiyong Wu ◽  
Qiongyao Zhang ◽  
...  

Abstract Background: The purpose of this study was to identify risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods: Data on ACS patients who presented to the emergency department of Fujian Provincial Hospital with chest pain from January 1, 2017 to March 31, 2020 were retrospectively collected. Univariate and multivariate logistic regression analyses were used to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results: A total of 7810 ACS patients were included, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age and levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), high-density lipoprotein (HDL) cholesterol, and calcium channel blockers were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the top three variables contributing most to the prediction performance of the GBDT and random forest models. Conclusions: The predictive models developed using logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death of ACS patients. Based on our findings, we recommend that clinicians focus on monitoring changes in NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.
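
The feature-importance step the authors describe could look roughly like the following scikit-learn sketch; the column names echo the abstract's predictors, but the data frame and outcome are simulated for illustration.

```python
# Sketch only: fit GBDT and random forest models and rank predictors by
# impurity-based feature importance. Column names echo the abstract's
# predictors, but the data and outcome are simulated.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

rng = np.random.default_rng(0)
cols = ["age", "d_dimer", "troponin_i", "nt_probnp", "ldh", "hdl_chol"]
X = pd.DataFrame(rng.normal(size=(2000, len(cols))), columns=cols)
y = (X["nt_probnp"] + 0.5 * X["ldh"] - 0.5 * X["hdl_chol"]
     + rng.normal(scale=0.5, size=2000) > 1.5).astype(int)

for model in (GradientBoostingClassifier(random_state=0),
              RandomForestClassifier(random_state=0)):
    model.fit(X, y)
    ranking = pd.Series(model.feature_importances_, index=cols)
    print(type(model).__name__)
    print(ranking.sort_values(ascending=False).head(3))  # top-3 contributors
```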


Author(s):  
Cemil Kuzey ◽  
Ali Uyar ◽  
Dursun Delen

Purpose: The paper aims to identify and critically analyze the factors influencing cost system functionality (CSF) using several machine learning techniques, including decision trees, support vector machines and logistic regression. Design/methodology/approach: The study used a self-administered survey to collect the necessary data from companies conducting business in Turkey. Several prediction models were developed and tested; a series of sensitivity analyses was performed on the developed prediction models to assess the ranked importance of factors/variables. Findings: Certain factors/variables influence CSF much more than others. The findings suggest that utilization of management accounting practices requires a functional cost system, which is supported by a comprehensive cost data management process (i.e. acquisition, storage and utilization). Research limitations/implications: The underlying data were collected using a questionnaire survey; thus, they are subjective, reflecting the perceptions of the respondents rather than the objective practices of the firms. Second, the authors measured CSF on a binary "Yes"/"No" basis, which did not allow respondents to reply in between; this might have limited their choices. Third, the Likert scales adopted in the measurement of the other constructs might have limited the answers of the respondents. Practical implications: Information technology plays a very important role in the success of CSF practices. That is, successful implementation of a functional cost system relies heavily on a fully integrated information infrastructure capable of constantly feeding CSF with accurate, relevant and timely data. Originality/value: In addition to providing evidence regarding the factors underlying CSF based on a broad range of industries, this study also illustrates the viability of machine learning methods as a research framework for critically analyzing domain-specific data.
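
One concrete way to realize the ranked sensitivity analysis described above is permutation importance computed per model, as in this hedged scikit-learn sketch; the eight survey factors are hypothetical stand-ins.

```python
# Sketch only: permutation importance as one concrete form of the ranked
# sensitivity analysis described above, computed for each model family.
# The eight survey factors are hypothetical stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

for model in (DecisionTreeClassifier(max_depth=4, random_state=2),
              SVC(probability=True, random_state=2),
              LogisticRegression(max_iter=1000)):
    model.fit(X_tr, y_tr)
    imp = permutation_importance(model, X_te, y_te, n_repeats=20,
                                 random_state=2)
    # Indices of the three most influential factors for this model.
    print(type(model).__name__, np.argsort(-imp.importances_mean)[:3])
```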


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test whether it is possible to reliably predict remission from BDD in a sample of 88 individuals who had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower at subsequent follow-ups (68%, 66% and 61% correctly classified at the 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.
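
With only 88 participants, a cross-validated comparison of random forests against logistic regression is one plausible reading of this analysis; the sketch below assumes scikit-learn, uses the abstract's predictor names, and simulates the data and remission labels.

```python
# Sketch only: with ~88 participants, cross-validation gives a more stable
# accuracy estimate than a single split. Predictor names follow the abstract;
# the data and remission labels are simulated.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(3)
cols = ["depressive_symptoms", "treatment_credibility",
        "working_alliance", "bdd_severity"]
X = pd.DataFrame(rng.normal(size=(88, len(cols))), columns=cols)
y = (X["depressive_symptoms"] - X["treatment_credibility"]
     + rng.normal(size=88) > 0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)
for model in (RandomForestClassifier(n_estimators=500, random_state=3),
              LogisticRegression(max_iter=1000)):
    acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(type(model).__name__, f"mean accuracy={acc.mean():.2f}")
```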


Author(s):  
Byunghyun Kang ◽  
Cheol Choi ◽  
Daeun Sung ◽  
Seongho Yoon ◽  
Byoung-Ho Choi

In this study, friction tests are performed, via a custom-built friction tester, on specimens of natural rubber used in automotive suspension bushings. By analyzing problematic suspension bushings, eleven candidate factors that influence squeak noise are selected: surface lubrication, hardness, vulcanization condition, surface texture, additive content, sample thickness, thermal aging, temperature, surface moisture, friction speed, and normal force. Through friction tests, changes in frictional force and squeak noise occurrence are investigated across various levels of the influencing factors. The degree of correlation of frictional force and squeak noise occurrence with the factors is determined through statistical tests, and the relationship between frictional force and squeak noise occurrence is discussed based on the test results. Squeak noise prediction models are constructed by considering the interactions among the influencing factors through both multiple logistic regression and neural network analysis. The accuracies of the two prediction models are evaluated by comparing predicted and measured results. The accuracies of the multiple logistic regression and neural network models in predicting the occurrence of squeak noise are 88.2% and 87.2%, respectively.
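
The two model families named above might be set up as follows, with interaction-only polynomial features giving the logistic regression access to factor interactions. This is a sketch assuming scikit-learn; the data are simulated and the factor encoding is hypothetical.

```python
# Sketch only: a multiple logistic regression with pairwise interaction terms
# versus a small neural network for binary squeak-noise occurrence. The
# eleven factor values are simulated placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 11))            # eleven influencing factors
y = (X[:, 0] * X[:, 7] + X[:, 9]
     + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

# Interaction-only polynomial features let the logistic model capture the
# factor interactions discussed in the study.
logit = make_pipeline(PolynomialFeatures(degree=2, interaction_only=True),
                      StandardScaler(), LogisticRegression(max_iter=2000))
nnet = make_pipeline(StandardScaler(),
                     MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                   random_state=4))
for name, model in {"logistic+interactions": logit, "neural_net": nnet}.items():
    print(name, f"accuracy={model.fit(X_tr, y_tr).score(X_te, y_te):.3f}")
```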


2018 ◽  
Vol 8 (1) ◽  
pp. 16 ◽  
Author(s):  
Irina Matijosaitiene ◽  
Peng Zhao ◽  
Sylvain Jaume ◽  
Joseph Gilkey Jr

Predicting the exact urban places where crime is most likely to occur is of great interest to police departments. Therefore, the goal of the research presented in this paper is to identify specific urban areas where a crime could happen in Manhattan, NY for every hour of the day. The outputs of this research are the following: (i) predicting the land uses that generate the top three most committed crimes in Manhattan, using machine learning (random forest and logistic regression); (ii) identifying the exact hours when most assaults are committed, together with hot spots during these hours, by applying time series and hot spot analysis; (iii) building hourly prediction models for assaults based on land use, by deploying logistic regression. Assault, a physical attack on someone under criminal law, is the third most committed crime in Manhattan. A land use (residential, commercial, recreational, mixed use, etc.) is assigned to every area or lot in Manhattan, determining the actual use or activities within each particular lot. While plotting assaults on the map for every hour, this investigation identified that the hot spots where assaults occur were 'moving' and not confined to specific lots within Manhattan. This raises a number of questions: Why are hot spots of assaults not static in an urban environment? What makes them 'move'? Is it a particular urban pattern? Is the 'movement' of hot spots related to human activities during the day and night? Answering these questions helps build the initial frame for assault prediction within every hour of the day. Knowing a specific land use's vulnerability to assault during each exact hour can assist police departments in allocating forces during those hours in risky areas. The study uses two datasets, each obtained from open databases: a crime dataset with the geographical location, date and time of each crime, and a geographic dataset of land uses with a land use code for every lot. The study joins the two datasets based on spatial location and classifies the data into 24 classes based on the time range when the assault occurred. Machine learning methods reveal the effect of land uses on larceny, harassment and assault, the three most committed crimes in Manhattan. Finally, logistic regression provides hourly prediction models and unveils the types of land use where assaults could occur during each hour of the day and night.
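
The spatial join and hourly modeling step might look like the sketch below, assuming geopandas (0.10+ for the predicate argument) and scikit-learn; the file names and the columns datetime, offense, and LandUse are hypothetical placeholders for the open datasets.

```python
# Sketch only: spatial join of crime points to land-use lots, then hourly
# logistic models. File names and the columns "datetime", "offense" and
# "LandUse" are hypothetical placeholders for the open datasets.
import geopandas as gpd
import pandas as pd
from sklearn.linear_model import LogisticRegression

crimes = gpd.read_file("manhattan_crimes.geojson")   # crime points + timestamps
lots = gpd.read_file("manhattan_landuse.geojson")    # lot polygons + land-use code
crimes = crimes.to_crs(lots.crs)                     # align coordinate systems

# Attach each crime to the lot it falls within (geopandas >= 0.10).
joined = gpd.sjoin(crimes, lots, how="inner", predicate="within")
joined["hour"] = pd.to_datetime(joined["datetime"]).dt.hour  # 24 time classes

X = pd.get_dummies(joined["LandUse"])                # one-hot land-use codes
y = (joined["offense"] == "ASSAULT").astype(int)

# One model per hour: how well does land use predict assault at that hour?
for hour, idx in joined.groupby("hour").groups.items():
    if y.loc[idx].nunique() < 2:
        continue                                     # need both classes to fit
    model = LogisticRegression(max_iter=1000).fit(X.loc[idx], y.loc[idx])
```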


2021 ◽  
Vol 42 (Supplement_1) ◽  
pp. S33-S34
Author(s):  
Morgan A Taylor ◽  
Randy D Kearns ◽  
Jeffrey E Carter ◽  
Mark H Ebell ◽  
Curt A Harris

Abstract Introduction: A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers. Methods: This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between 2009 and 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest model (RF) and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics. Results: During the study period, 82,404 burn patients with a thermal etiology were identified for the analysis. The ANN models will likely tend to overfit the data, which can be resolved by ending the model training early or adding additional regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available. Conclusions: In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs consider resource limitations during catastrophic incidents resulting in burn injuries.
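
A hedged sketch of the described pipeline follows: a 67%/33% stratified split, a random forest, and an ANN with early stopping as one standard remedy for the overfitting the authors anticipate. It assumes scikit-learn, and the registry features are simulated.

```python
# Sketch only: 67%/33% split, a random forest, and an ANN with early stopping
# to curb the overfitting the authors anticipate. Burn-registry features are
# simulated placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=82_404, n_features=12,
                           weights=[0.95], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                          stratify=y, random_state=5)

rf = RandomForestClassifier(n_estimators=300, random_state=5)
# early_stopping holds out part of the training data and stops when the
# validation score stops improving, one remedy for ANN overfitting.
ann = MLPClassifier(hidden_layer_sizes=(32, 16), early_stopping=True,
                    validation_fraction=0.1, max_iter=500, random_state=5)
for name, model in {"random_forest": rf, "ann": ann}.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(name, f"AUROC={roc_auc_score(y_te, proba):.3f}")
```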


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy, and we explore individual predictors of hospitalization. Methods: Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients' socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best- and least-performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best-performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best-performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%; GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best-performing model can be achieved by combining multiple algorithms in an ensemble model.
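
Stacking several base learners under a logistic-regression meta-learner, as sketched below with scikit-learn's StackingClassifier, mirrors the ensemble approach described; the particular base learners and the synthetic data are illustrative.

```python
# Sketch only: stack several base learners under a logistic-regression
# meta-learner, mirroring the paper's stacking of its best algorithms.
# Base-learner choice and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2084, n_features=39, random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)

stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=6)),
                ("rf", RandomForestClassifier(random_state=6)),
                ("svc", SVC(probability=True, random_state=6)),
                ("knn", KNeighborsClassifier()),
                ("glm", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000), cv=5)
proba = stack.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print(f"ensemble AUC={roc_auc_score(y_te, proba):.3f}")
```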


2020 ◽  
Author(s):  
Vincent Bremer ◽  
Philip I Chow ◽  
Burkhardt Funk ◽  
Frances P Thorndike ◽  
Lee M Ritterband

BACKGROUND User dropout is a widespread concern in the delivery and evaluation of digital (ie, web and mobile apps) health interventions. Researchers have yet to fully realize the potential of the large amount of data generated by these technology-based programs. Of particular interest is the ability to predict who will drop out of an intervention. This may be possible through the analysis of user journey data—self-reported as well as system-generated data—produced by the path (or journey) an individual takes to navigate through a digital health intervention. OBJECTIVE The purpose of this study is to provide a step-by-step process for the analysis of user journey data and eventually to predict dropout in the context of digital health interventions. The process is applied to data from an internet-based intervention for insomnia as a way to illustrate its use. The completion of the program is contingent upon completing 7 sequential cores, which include an initial tutorial core. Dropout is defined as not completing the seventh core. METHODS Steps of user journey analysis, including data transformation, feature engineering, and statistical model analysis and evaluation, are presented. Dropouts were predicted based on data from 151 participants from a fully automated web-based program (Sleep Healthy Using the Internet) that delivers cognitive behavioral therapy for insomnia. Logistic regression with L1 and L2 regularization, support vector machines, and boosted decision trees were used and evaluated based on their predictive performance. Relevant features from the data are reported that predict user dropout. RESULTS Accuracy of predicting dropout (area under the curve [AUC] values) varied depending on the program core and the machine learning technique. After model evaluation, boosted decision trees achieved AUC values ranging between 0.6 and 0.9. Additional handcrafted features, including time to complete certain steps of the intervention, time to get out of bed, and days since the last interaction with the system, contributed to the prediction performance. CONCLUSIONS The results support the feasibility and potential of analyzing user journey data to predict dropout. Theory-driven handcrafted features increased the prediction performance. The ability to predict dropout at an individual level could be used to enhance decision making for researchers and clinicians as well as inform dynamic intervention regimens.
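
The feature-engineering step for user-journey data might be sketched as follows: derive per-user handcrafted features (e.g., days since last interaction) from raw event logs, then fit L1- and L2-regularized logistic regressions. The toy log, column names, and dropout labels are hypothetical, and pandas/scikit-learn are assumed.

```python
# Sketch only: derive handcrafted per-user features from raw event logs and
# fit L1- and L2-regularized logistic regressions. The toy log, column names
# and dropout labels are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

events = pd.DataFrame({                      # stand-in for system-generated logs
    "user": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-20",
                                 "2020-01-02", "2020-01-04"]),
    "core_completed": [1, 2, 3, 1, 2],
})
snapshot = pd.Timestamp("2020-02-01")

features = events.groupby("user").agg(
    cores_done=("core_completed", "max"),
    days_since_last=("timestamp", lambda t: (snapshot - t.max()).days),
    mean_days_between=("timestamp", lambda t: t.diff().dt.days.mean()),
)
dropout = pd.Series({1: 0, 2: 1})            # 1 = never completed core 7

# penalty="l1" requires a compatible solver such as liblinear or saga.
for penalty, solver in (("l1", "liblinear"), ("l2", "lbfgs")):
    model = LogisticRegression(penalty=penalty, solver=solver, max_iter=1000)
    model.fit(features.fillna(0), dropout)
```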


2021 ◽  
Author(s):  
Chris J. Kennedy ◽  
Dustin G. Mark ◽  
Jie Huang ◽  
Mark J. van der Laan ◽  
Alan E. Hubbard ◽  
...  

Background: Chest pain is the second leading reason for emergency department (ED) visits and is commonly identified as a leading driver of low-value health care. Accurate identification of patients at low risk of major adverse cardiac events (MACE) is important to improve resource allocation and reduce over-treatment. Objectives: We sought to assess machine learning (ML) methods and electronic health record (EHR) covariate collection for MACE prediction. We aimed to maximize the pool of low-risk patients who are accurately predicted to have less than 0.5% MACE risk and may be eligible for reduced testing. Population Studied: 116,764 adult patients presenting with chest pain in the ED and evaluated for potential acute coronary syndrome (ACS); the 60-day MACE rate was 1.9%. Methods: We evaluated ML algorithms (lasso, splines, random forest, extreme gradient boosting, Bayesian additive regression trees) and SuperLearner stacked ensembling. We tuned ML hyperparameters through nested ensembling, and imputed missing values with generalized low-rank models (GLRM). We benchmarked performance against key biomarkers, validated clinical risk scores, decision trees, and logistic regression. We explained the models through variable importance ranking and accumulated local effect visualization. Results: The best discrimination (area under the precision-recall [PR-AUC] and receiver operating characteristic [ROC-AUC] curves) was provided by SuperLearner ensembling (0.148, 0.867), followed by random forest (0.146, 0.862). Logistic regression (0.120, 0.842) and decision trees (0.094, 0.805) exhibited worse discrimination, as did risk scores [HEART (0.064, 0.765), EDACS (0.046, 0.733)] and biomarkers [serum troponin level (0.064, 0.708), electrocardiography (0.047, 0.686)]. The ensemble's risk estimates were miscalibrated by 0.2 percentage points. The ensemble accurately identified 50% of patients as being below a 0.5% 60-day MACE risk threshold. The most important predictors were age, peak troponin, HEART score, EDACS score, and electrocardiogram. GLRM imputation achieved a 90% reduction in root mean-squared error compared to median-mode imputation. Conclusion: Use of ML algorithms, combined with broad predictor sets, improved MACE risk prediction compared to simpler alternatives, while providing calibrated predictions and interpretability. Standard risk scores may neglect important health information available in other characteristics and combined in nuanced ways via ML.
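
The low-risk screening step could be prototyped as below: calibrate a classifier's probabilities, then count how many patients fall below the 0.5% predicted-risk threshold and check the observed event rate in that group. The data, the base model, and the isotonic calibration are illustrative assumptions, with scikit-learn assumed rather than the authors' SuperLearner stack.

```python
# Sketch only: calibrate a classifier, then measure how many patients fall
# below the 0.5% predicted-MACE threshold and the observed event rate there.
# The data, base model and isotonic calibration are illustrative assumptions.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100_000, n_features=20,
                           weights=[0.981], random_state=7)  # ~1.9% event rate
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

# Isotonic calibration keeps probability estimates honest near the extreme
# 0.5% decision threshold.
model = CalibratedClassifierCV(GradientBoostingClassifier(random_state=7),
                               method="isotonic", cv=3).fit(X_tr, y_tr)
risk = model.predict_proba(X_te)[:, 1]
low = risk < 0.005
print(f"{low.mean():.1%} of patients below 0.5% predicted risk; "
      f"observed event rate in that group: {y_te[low].mean():.3%}")
```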

