A Machine Learning Study to Improve Surgical Case Duration Prediction

AbstractPredictive accuracy of surgical case duration plays a critical role in reducing cost of operation room (OR) utilization. The most common approaches used by hospitals rely on historic averages based on a specific surgeon or a specific procedure type obtained from the electronic medical record (EMR) scheduling systems. However, low predictive accuracy of EMR leads to negative impacts on patients and hospitals, such as rescheduling of surgeries and cancellation. In this study, we aim to improve prediction of operation case duration with advanced machine learning (ML) algorithms. We obtained a large data set containing 170,748 operation cases (from Jan 2017 to Dec 2019) from a hospital. The data covered a broad variety of details on patients, operations, specialties and surgical teams. Meanwhile, a more recent data with 8,672 cases (from Mar to Apr 2020) was also available to be used for external evaluation. We computed historic averages from EMR for surgeon- or procedure-specific and they were used as baseline models for comparison. Subsequently, we developed our models using linear regression, random forest and extreme gradient boosting (XGB) algorithms. All models were evaluated with R-squre (R2), mean absolute error (MAE), and percentage overage (case duration > prediction + 10 % & 15 mins), underage (case duration < prediction - 10 % & 15 mins) and within (otherwise). The XGB model was superior to the other models by having higher R2 (85 %) and percentage within (48 %) as well as lower MAE (30.2 mins). The total prediction errors computed for all the models showed that the XGB model had the lowest inaccurate percent (23.7 %). As a whole, this study applied ML techniques in the field of OR scheduling to reduce medical and financial burden for healthcare management. It revealed the importance of operation and surgeon factors in operation case duration prediction. This study also demonstrated the importance of performing an external evaluation to better validate performance of ML models.

Download Full-text

A Machine Learning Study to Improve Surgical Case Duration Prediction

10.21203/rs.3.rs-40927/v1 ◽

2020 ◽

Author(s):

Ching-Chieh Huang ◽

Jesyin Lai ◽

Der-Yang Cho ◽

Jiaxin Yu

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Healthcare Management ◽

Gradient Boosting ◽

External Evaluation ◽

Data Set ◽

Surgical Case ◽

Case Duration ◽

Extreme Gradient Boosting ◽

Duration Prediction

Abstract Since the emergence of COVID-19, many hospitals have encountered challenges in performing efficient scheduling and good resource management to ensure the quality of healthcare provided to patients is not compromised. Operating room (OR) scheduling is one of the issues that has gained our attention because it is related to workflow efficiency and critical care of hospitals. Automatic scheduling and high predictive accuracy of surgical case duration have a critical role in improving OR utilization. To estimate surgical case duration, many hospitals rely on historic averages based on a specific surgeon or a specific procedure type obtained from electronic medical record (EMR) scheduling systems. However, the low predictive accuracy with EMR data leads to negative impacts on patients and hospitals, such as rescheduling of surgeries and cancellation. In this study, we aim to improve the prediction of surgical case duration with advanced machine learning (ML) algorithms. We obtained a large data set containing 170,748 surgical cases (from Jan 2017 to Dec 2019) from a hospital. The data covered a broad variety of details on patients, surgeries, specialties and surgical teams. In addition, a more recent data set with 8,672 cases (from Mar to Apr 2020) was available to be used for external evaluation. We computed historic averages from the EMR data for surgeon- or procedure-specific cases, and they were used as baseline models for comparison. Subsequently, we developed our models using linear regression, random forest and extreme gradient boosting (XGB) algorithms. All models were evaluated with R-square (R2), mean absolute error (MAE), and percentage overage (actual duration longer than prediction), underage (shorter than prediction) and within (within prediction). The XGB model was superior to the other models, achieving a higher R2 (85 %) and percentage within (48 %) as well as a lower MAE (30.2 min). The total prediction errors computed for all models showed that the XGB model had the lowest inaccurate percentage (23.7 %). Overall, this study applied ML techniques in the field of OR scheduling to reduce the medical and financial burden for healthcare management. The results revealed the importance of surgery and surgeon factors in surgical case duration prediction. This study also demonstrated the importance of performing an external evaluation to better validate the performance of ML models.

Download Full-text

Exploiting Rules to Enhance Machine Learning in Extracting Information From Multi-Institutional Prostate Pathology Reports

JCO Clinical Cancer Informatics ◽

10.1200/cci.20.00028 ◽

2020 ◽

pp. 865-874

Author(s):

Enrico Santus ◽

Tal Schuster ◽

Amir M. Tahmasebi ◽

Clara Li ◽

Adam Yala ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Systems ◽

High Performance ◽

Feature Model ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Extreme Gradient Boosting ◽

Pathology Reports

PURPOSE Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.

Download Full-text

The Application of Machine Learning to a General Risk–Need Assessment Instrument in the Prediction of Criminal Recidivism

Criminal Justice and Behavior ◽

10.1177/0093854820969753 ◽

2020 ◽

pp. 009385482096975

Author(s):

Mehdi Ghasemi ◽

Daniel Anvari ◽

Mahshid Atapour ◽

J. Stephen wormith ◽

Keira C. Stockdale ◽

...

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Characteristic Curve ◽

Assessment Instrument ◽

Support Vector ◽

Data Set ◽

Applied Machine Learning ◽

Vector Machines ◽

Individual Scores

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.

Download Full-text

Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences

Social Science Computer Review ◽

10.1177/0894439319888445 ◽

2019 ◽

pp. 089443931988844

Author(s):

Ranjith Vijayakumar ◽

Mike W.-L. Cheung

Keyword(s):

Machine Learning ◽

Empirical Data ◽

Fixed Effects ◽

Predictive Accuracy ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Replication Studies ◽

Machine Learning Methods ◽

Accuracy Measure

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with the use of an empirical data set, assessing the replication success of a predictive accuracy measure, namely, R 2 on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection will be supported if the 95% confidence interval (CI) of R 2 difference estimates between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide apriori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons. The 90% CI of a different estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R 2 is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.

Download Full-text

Discovery of Depression-Associated Factors From a Nationwide Population-Based Survey: Epidemiological Study Using Machine Learning and Network Analysis (Preprint)

10.2196/preprints.27344 ◽

2021 ◽

Author(s):

Sang Min Nam ◽

Thomas A Peterson ◽

Kyoung Yul Seo ◽

Hyun Wook Han ◽

Jee In Kang

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Network Analysis ◽

Survey Data ◽

Associated Factors ◽

Statistical Tests ◽

Epidemiological Studies ◽

Gradient Boosting ◽

Data Set ◽

Extreme Gradient Boosting

BACKGROUND In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large. OBJECTIVE Our study had two aims. First, we aimed to identify essential depression-associated factors using the extreme gradient boosting (XGBoost) machine learning algorithm from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016). Second, we aimed to achieve a comprehensive understanding of multifactorial features in depression using network analysis. METHODS An XGBoost model was trained and tested to classify “current depression” and “no lifetime depression” for a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set by an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection using the feature importance value of XGBoost. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and drew a correlation network among factors. We also adopted statistical tests for the confounder or interaction effect of selected risk factors when it was suspected on the network. RESULTS The XGBoost-derived depression model consisted of 18 factors with an area under the weighted receiver operating characteristic curve of 0.86. Two nonmodel factors could be found using the model factors, and the factors were classified into direct (<i>P</i><.05) and indirect (<i>P</i>≥.05), according to the statistical significance of the association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality of life factors and suggested that educational level and sex might be predisposing factors. Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking had a significant risk in females, and weight gain was associated with depression involving diabetes. CONCLUSIONS XGBoost and network analysis were useful to discover depression-related factors and their relationships and can be applied to epidemiological studies using big survey data.

Download Full-text

Application of machine learning algorithm for predicting gestational diabetes mellitus in early pregnancy†

Frontiers of Nursing ◽

10.2478/fon-2021-0022 ◽

2021 ◽

Vol 8 (3) ◽

pp. 209-221

Author(s):

Li-Li Wei ◽

Yue-Shuai Pan ◽

Yan Zhang ◽

Kai Chen ◽

Hao-Yu Wang ◽

...

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Random Forest ◽

Prediction Model ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Random Forest Algorithm ◽

Random Forest Regression ◽

Data Set

Abstract Objective To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Based on Python, the indicators were classified and modeled using a random forest regression algorithm, and the performance of the prediction model was analyzed. Results We obtained 4806 analyzable data from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used for training the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy of GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of sacrificial indicators on GDM prediction, the F3 data set was established using 3265 samples (F1) with 38 indicators (F2). After training, the overall predictive accuracy of the F3 model was 91.60%, AUC was 0.58, and the predictive accuracy of positive cases was 15.85%. Conclusions In this study, a model for predicting GDM with several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited a good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in sample data sets when the random forest algorithm is applied to the early prediction of GDM.

Download Full-text

Probabilistic forecasting of surgical case duration using machine learning: model development and validation

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa140 ◽

2020 ◽

Vol 27 (12) ◽

pp. 1885-1893

Author(s):

York Jiao ◽

Anshuman Sharma ◽

Arbi Ben Abdallah ◽

Thomas M Maddox ◽

Thomas Kannampallil

Keyword(s):

Machine Learning ◽

Probability Distribution ◽

Language Processing ◽

Tertiary Care ◽

Model Development ◽

Performance Measure ◽

Probabilistic Forecasting ◽

Data Set ◽

Mixture Density ◽

Surgical Case

Abstract Objective Accurate estimations of surgical case durations can lead to the cost-effective utilization of operating rooms. We developed a novel machine learning approach, using both structured and unstructured features as input, to predict a continuous probability distribution of surgical case durations. Materials and Methods The data set consisted of 53 783 surgical cases performed over 4 years at a tertiary-care pediatric hospital. Features extracted included categorical (American Society of Anesthesiologists [ASA] Physical Status, inpatient status, day of week), continuous (scheduled surgery duration, patient age), and unstructured text (procedure name, surgical diagnosis) variables. A mixture density network (MDN) was trained and compared to multiple tree-based methods and a Bayesian statistical method. A continuous ranked probability score (CRPS), a generalized extension of mean absolute error, was the primary performance measure. Pinball loss (PL) was calculated to assess accuracy at specific quantiles. Performance measures were additionally evaluated on common and rare surgical procedures. Permutation feature importance was measured for the best performing model. Results MDN had the best performance, with a CRPS of 18.1 minutes, compared to tree-based methods (19.5–22.1 minutes) and the Bayesian method (21.2 minutes). MDN had the best PL at all quantiles, and the best CRPS and PL for both common and rare procedures. Scheduled duration and procedure name were the most important features in the MDN. Conclusions Using natural language processing of surgical descriptors, we demonstrated the use of ML approaches to predict the continuous probability distribution of surgical case durations. The more discerning forecast of the ML-based MDN approach affords opportunities for guiding intelligent schedule design and day-of-surgery operational decisions.

Download Full-text

Machine Learning Can Improve Estimation of Surgical Case Duration: A Pilot Study

Journal of Medical Systems ◽

10.1007/s10916-019-1160-5 ◽

2019 ◽

Vol 43 (3) ◽

Cited By ~ 7

Author(s):

Justin P. Tuwatananurak ◽

Shayan Zadeh ◽

Xinling Xu ◽

Joshua A. Vacanti ◽

William R. Fulton ◽

...

Keyword(s):

Machine Learning ◽

Pilot Study ◽

Surgical Case ◽

Case Duration

Download Full-text

Application of Machine Learning to Interpret Steady State Drainage Relative Permeability Experiments

10.2118/207877-ms ◽

2021 ◽

Author(s):

Eric Sonny Mathew ◽

Moussa Tembely ◽

Waleed AlAmeri ◽

Emad W. Al-Shalabi ◽

Abdul Ravoof Shaik

Keyword(s):

Neural Network ◽

Machine Learning ◽

Experimental Data ◽

Steady State ◽

Relative Permeability ◽

Learning Model ◽

Gradient Boosting ◽

Data Set ◽

Machine Learning Model ◽

Extreme Gradient Boosting

Abstract A meticulous interpretation of steady-state or unsteady-state relative permeability (Kr) experimental data is required to determine a complete set of Kr curves. In this work, three different machine learning models was developed to assist in a faster estimation of these curves from steady-state drainage coreflooding experimental runs. The three different models that were tested and compared were extreme gradient boosting (XGB), deep neural network (DNN) and recurrent neural network (RNN) algorithms. Based on existing mathematical models, a leading edge framework was developed where a large database of Kr and Pc curves were generated. This database was used to perform thousands of coreflood simulation runs representing oil-water drainage steady-state experiments. The results obtained from these simulation runs, mainly pressure drop along with other conventional core analysis data, were utilized to estimate Kr curves based on Darcy's law. These analytically estimated Kr curves along with the previously generated Pc curves were fed as features into the machine learning model. The entire data set was split into 80% for training and 20% for testing. K-fold cross validation technique was applied to increase the model accuracy by splitting the 80% of the training data into 10 folds. In this manner, for each of the 10 experiments, 9 folds were used for training and the remaining one was used for model validation. Once the model is trained and validated, it was subjected to blind testing on the remaining 20% of the data set. The machine learning model learns to capture fluid flow behavior inside the core from the training dataset. The trained/tested model was thereby employed to estimate Kr curves based on available experimental results. The performance of the developed model was assessed using the values of the coefficient of determination (R2) along with the loss calculated during training/validation of the model. The respective cross plots along with comparisons of ground-truth versus AI predicted curves indicate that the model is capable of making accurate predictions with error percentage between 0.2 and 0.6% on history matching experimental data for all the three tested ML techniques (XGB, DNN, and RNN). This implies that the AI-based model exhibits better efficiency and reliability in determining Kr curves when compared to conventional methods. The results also include a comparison between classical machine learning approaches, shallow and deep neural networks in terms of accuracy in predicting the final Kr curves. The various models discussed in this research work currently focusses on the prediction of Kr curves for drainage steady-state experiments; however, the work can be extended to capture the imbibition cycle as well.

Download Full-text

Iterative Reweighted Noninteger Norm Regularizing SVM for Gene Expression Data Classification

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/768404 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 5

Author(s):

Jianwei Liu ◽

Shuang Cheng Li ◽

Xionglin Luo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Adaptive Learning ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Training Dataset ◽

Support Vector ◽

Data Set ◽

Cancer Data ◽

Public Data

Support vector machine is an effective classification and regression method that uses machine learning theory to maximize the predictive accuracy while avoiding overfitting of data.L2regularization has been commonly used. If the training dataset contains many noise variables,L1regularization SVM will provide a better performance. However, bothL1andL2are not the optimal regularization method when handing a large number of redundant values and only a small amount of data points is useful for machine learning. We have therefore proposed an adaptive learning algorithm using the iterative reweightedp-norm regularization support vector machine for 0 <p≤ 2. A simulated data set was created to evaluate the algorithm. It was shown that apvalue of 0.8 was able to produce better feature selection rate with high accuracy. Four cancer data sets from public data banks were used also for the evaluation. All four evaluations show that the new adaptive algorithm was able to achieve the optimal prediction error using apvalue less thanL1norm. Moreover, we observe that the proposedLppenalty is more robust to noise variables than theL1andL2penalties.

Download Full-text