Prediction of Prolonged Length of Hospital Stay After Cancer Surgery Using Machine Learning on Electronic Health Records: Retrospective Cross-sectional Study

Background: Postoperative length of stay is a key indicator in the management of medical resources and an indirect predictor of the incidence of surgical complications and the degree of recovery of the patient after cancer surgery. Recently, machine learning has been used to predict complex medical outcomes, such as prolonged length of hospital stay, using extensive medical information. Objective: The objective of this study was to develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach. Methods: In our retrospective study, electronic health records (EHRs) from 42,751 patients who underwent primary surgery for 17 types of cancer between January 1, 2000, and December 31, 2017, were sourced from a single cancer center. The EHRs included numerous variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer was defined as bed-days of the group of patients who accounted for the top 50% of the distribution of bed-days by cancer type. Results: In the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrated excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve [AUC] >0.85). A moderate performance (AUC 0.70-0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases each, the extreme gradient boosting classifier model showed slightly better performance than the logistic regression model, although the logistic regression model also performed adequately.
We identified risk variables for the prediction of prolonged postoperative length of stay for each type of cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables. Conclusions: A machine learning approach using EHRs may improve the prediction of prolonged length of hospital stay after primary cancer surgery. This algorithm may help to provide a more effective allocation of medical resources in cancer surgery.
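
The outcome definition in the Methods (the top 50% of bed-days within each cancer type) can be illustrated with a minimal sketch. It assumes that "top 50%" means strictly above the per-type median and that ties at the cutoff count as not prolonged; the record layout, the function name `label_prolonged_stay`, and the toy bed-day values are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import median


def label_prolonged_stay(records):
    """Return one 0/1 label per record; 1 means bed-days above the median
    of that record's own cancer type (assumed reading of "top 50%").

    `records` is a list of (cancer_type, bed_days) tuples (hypothetical layout).
    """
    by_type = defaultdict(list)
    for cancer_type, bed_days in records:
        by_type[cancer_type].append(bed_days)
    # One cutoff per cancer type, so a long stay is judged against its own group.
    cutoffs = {t: median(days) for t, days in by_type.items()}
    return [int(bed_days > cutoffs[t]) for t, bed_days in records]


# Toy values for illustration only.
records = [
    ("stomach", 7), ("stomach", 9), ("stomach", 12), ("stomach", 20),
    ("thyroid", 2), ("thyroid", 3), ("thyroid", 3), ("thyroid", 6),
]
labels = label_prolonged_stay(records)
print(labels)
```

Computing the cutoff per cancer type rather than over the pooled cohort keeps the label balanced within each type, which matters when typical stays differ widely between surgeries.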

Prediction of the prolonged length of hospital stay after cancer surgery using the machine learning on electronic health records: Retrospective cross-sectional study (Preprint)

10.2196/preprints.23147 ◽

2020 ◽

Author(s):

Yong-Yeon Jo ◽

Jai Hong Han ◽

Hyun Woo Park ◽

Hyojung Jung ◽

Jaedong Lee ◽

...

Keyword(s):

Machine Learning ◽

Length Of Stay ◽

Cancer Surgery ◽

Gradient Boosting ◽

Cancer Center ◽

Cancer Type ◽

Prolonged Length ◽

Extreme Gradient Boosting ◽

Postoperative Length ◽

After Cancer

BACKGROUND: Postoperative length of stay is a key indicator in the management of medical resources and an indirect parameter of the incidence of surgical complications and recovery of systemic conditions in cancer surgery. To our knowledge, machine learning models have not been used to predict prolonged length of stay after cancer surgery using extensive medical information. OBJECTIVE: To develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach. METHODS: In our retrospective study, electronic health records (EHRs) of 42,751 patients who underwent primary surgery for 17 types of cancer from January 1, 2000 to December 31, 2017, sourced from a single cancer center, were used. Those records include various variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer was defined as bed-days of the group accounting for the top 50% of the distribution of bed-days by cancer type. RESULTS: In the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrated excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve [AUC] > 0.85). A moderate performance (AUC: 0.70–0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases each, the extreme gradient boosting classifier model outperformed the other models.
We identified risk variables for the prediction of prolonged postoperative length of stay for each cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables. CONCLUSIONS: A machine learning approach using EHRs may improve the prediction of prolonged length of stay after primary cancer surgery. This algorithm may help in a more effective allocation of medical resources in cancer surgery. CLINICAL TRIAL: This study was approved by the institutional review board of the National Cancer Center-Korea, with a waiver for written informed consent (NCC-2018-0113).

Machine Learning Applications for the Prediction of Bone Cement Leakage in Percutaneous Vertebroplasty

Frontiers in Public Health ◽

10.3389/fpubh.2021.812023 ◽

2021 ◽

Vol 9 ◽

Author(s):

Wenle Li ◽

Jiaming Wang ◽

Wencai Liu ◽

Chan Xu ◽

Wanying Li ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Bone Cement ◽

Percutaneous Vertebroplasty ◽

Cement Leakage ◽

Gradient Boosting ◽

Vertebral Compression Fractures ◽

Good Prediction ◽

Extreme Gradient Boosting ◽

The Web

Background: Bone cement leakage is a common complication of percutaneous vertebroplasty and can in some cases be life-threatening. The aim of this study was to develop a machine learning model for predicting the risk of cement leakage in patients with osteoporotic vertebral compression fractures undergoing percutaneous vertebroplasty. Furthermore, we developed an online calculator for clinical application. Methods: This was a retrospective study including 385 patients who had osteoporotic vertebral compression fractures and underwent surgery at the Department of Spine Surgery, Liuzhou People's Hospital from June 2016 to June 2018. Combining the patients' clinical characteristics, we applied six machine learning (ML) algorithms to develop models predicting the risk of bone cement leakage: logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGB), random forest (RF), decision tree (DT), and multilayer perceptron (MLP). We evaluated the six models with ten-fold cross-validation, calculated the area under the curve (AUC) of each, and selected the model with the highest AUC to build the web calculator. Results: Multivariate logistic regression analysis of the 385 subjects showed that injection volume of bone cement, surgery time, and multiple vertebral fractures were all independent predictors of bone cement leakage. Furthermore, a heatmap revealed the relative proportions of the 15 clinical variables.
In bone cement leakage prediction, the AUC of the six ML algorithms ranged from 0.633 to 0.898; the RF model had the highest AUC (0.898) and was selected as the best-performing model. A web calculator (https://share.streamlit.io/liuwencai0/pvp_leakage/main/pvp_leakage) was developed to estimate the risk of bone cement leakage for each patient undergoing vertebroplasty. Conclusion: Our ML model achieved good prediction of the occurrence of bone cement leakage. The web calculator based on the RF model can help orthopedists make more individualized and rational clinical decisions.
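
The model-selection step described in the Methods (ten-fold cross-validation, then keeping the model with the highest AUC) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names `kfold_indices` and `select_best_model`, the model labels, and the per-fold AUC values are all hypothetical placeholders, and the actual model fitting is omitted.

```python
import random
from statistics import mean


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_best_model(fold_aucs):
    """Return the model name with the highest mean AUC across folds,
    together with the mean AUC of every model."""
    mean_auc = {name: mean(scores) for name, scores in fold_aucs.items()}
    best = max(mean_auc, key=mean_auc.get)
    return best, mean_auc


# 385 is the cohort size reported in the abstract; each fold serves once
# as the held-out set while the remaining nine are used for training.
folds = kfold_indices(385, k=10)

# Placeholder per-fold AUCs (three folds shown for brevity, not study results).
fold_aucs = {
    "model_a": [0.70, 0.72, 0.68],
    "model_b": [0.80, 0.78, 0.82],
}
best, mean_auc = select_best_model(fold_aucs)
print(best, round(mean_auc[best], 2))
```

Averaging over folds before comparing models reduces the chance that a single favorable split decides which model is deployed.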

Using machine learning to improve risk prediction in durable left ventricular assist devices

PLoS ONE ◽

10.1371/journal.pone.0247866 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0247866

Author(s):

Arman Kilic ◽

Daniel Dochtermann ◽

Rema Padman ◽

James K. Miller ◽

Artur Dubrawski

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Left Ventricular ◽

Ventricular Assist Devices ◽

Gradient Boosting ◽

Left Ventricular Assist Devices ◽

Ventricular Assist ◽

Assist Devices ◽

Extreme Gradient Boosting ◽

Left Ventricular Assist

Risk models have historically displayed only moderate predictive performance in estimating mortality risk in left ventricular assist device therapy. This study evaluated whether machine learning can improve risk prediction for left ventricular assist devices. Primary durable left ventricular assist devices reported in the Interagency Registry for Mechanically Assisted Circulatory Support between March 1, 2006 and December 31, 2016 were included. The study cohort was randomly divided 3:1 into training and testing sets. Logistic regression and machine learning models (extreme gradient boosting) were created in the training set for 90-day and 1-year mortality and their performance was evaluated after bootstrapping with 1000 replications in the testing set. Differences in model performance were also evaluated in cases of concordance versus discordance in predicted risk between logistic regression and extreme gradient boosting as defined by equal size patient tertiles. A total of 16,120 patients were included. Calibration metrics were comparable between logistic regression and extreme gradient boosting. C-index was improved with extreme gradient boosting (90-day: 0.707 [0.683–0.730] versus 0.740 [0.717–0.762] and 1-year: 0.691 [0.673–0.710] versus 0.714 [0.695–0.734]; each p<0.001). Net reclassification index analysis similarly demonstrated an improvement of 48.8% and 36.9% for 90-day and 1-year mortality, respectively, with extreme gradient boosting (each p<0.001). Concordance in predicted risk between logistic regression and extreme gradient boosting resulted in substantially improved c-index for both logistic regression and extreme gradient boosting (90-day logistic regression 0.536 versus 0.752, 1-year logistic regression 0.555 versus 0.726, 90-day extreme gradient boosting 0.623 versus 0.772, 1-year extreme gradient boosting 0.613 versus 0.742, each p<0.001). 
These results demonstrate that machine learning can improve risk model performance for durable left ventricular assist devices, both independently and as an adjunct to logistic regression.
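
The bracketed ranges reported with each c-index come from bootstrapping the testing set with 1000 replications. A minimal sketch of that idea is given below, assuming a binary outcome (for example, death within 90 days) and a percentile interval; the function names, the toy labels and scores, and the choice of a percentile interval are assumptions, since the abstract does not state how the intervals were formed.

```python
import random


def c_index(y_true, y_score):
    """Concordance for a binary outcome: the share of (event, non-event)
    pairs in which the event has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return None  # undefined when a resample contains only one class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the c-index."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        c = c_index([y_true[i] for i in sample], [y_score[i] for i in sample])
        if c is not None:
            stats.append(c)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


# Toy labels and predicted risks for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.8, 0.5, 0.9]
point = c_index(y_true, y_score)
lo, hi = bootstrap_ci(y_true, y_score)
print(point, lo, hi)
```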

Machine Learning-Based Three-Month Outcome Prediction in Acute Ischemic Stroke: A Single Cerebrovascular-Specialty Hospital Study in South Korea

Diagnostics ◽

10.3390/diagnostics11101909 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1909

Author(s):

Dougho Park ◽

Eunhwan Jeong ◽

Haejong Kim ◽

Hae Wook Pyun ◽

Haemin Kim ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Ischemic Stroke ◽

Acute Ischemic Stroke ◽

Functional Outcome ◽

Outcome Prediction ◽

Prediction Models ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as physicians and surgeons who make the clinical decisions. We developed machine learning (ML)-based functional outcome prediction models in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke between January 2019 and March 2021 were included. Variables such as demographic factors, stroke-related factors, laboratory findings, and comorbidities were utilized at the time of admission. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines represented the second-highest AUC of 0.85 with the highest F1-score of 0.86, and finally, all ML models applied achieved an AUC > 0.8. The National Institute of Health Stroke Scale at admission and age were consistently the top two important variables for generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and proven to be readily applicable and useful.

Machine Learning for the Prediction of Progression in Patients with Acute Kidney Injury in Critical Care

10.21203/rs.3.rs-412422/v1 ◽

2021 ◽

Author(s):

Lifan Zhang ◽

Canzheng Wei ◽

Yunxia Feng ◽

Aijia Ma ◽

Yan Kang

Keyword(s):

Machine Learning ◽

Acute Kidney Injury ◽

Logistic Regression ◽

Critical Care ◽

Intensive Care ◽

Logistic Regression Model ◽

Kidney Injury ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Stage 1

Background: Acute kidney injury (AKI) is a severe and harmful syndrome in the intensive care unit. The purpose of this study was to develop a model that predicts whether patients with AKI stage 1/2 will progress to AKI stage 3. Methods: Patients with AKI stage 1/2 at the time of their first AKI diagnosis in the Medical Information Mart for Intensive Care (MIMIC-III) were included. We excluded patients who had undergone renal replacement therapy (RRT) or progressed to AKI stage 3 within 72 hours of the first AKI diagnosis. We also excluded patients with chronic kidney disease (CKD). We used logistic regression and extreme gradient boosting (XGBoost) to build two models that predict which patients will progress to AKI stage 3. The models were evaluated by cross-validation, receiver operating characteristic (ROC) curves, and precision-recall curves (PRC). Results: We included 25,711 patients, of whom 2,130 (8.3%) progressed to AKI stage 3. Creatinine, multiple organ failure syndromes (MODS), blood urea nitrogen (BUN), sepsis, and respiratory failure were the most important variables in AKI progression prediction. The XGBoost model performed better than the logistic regression model in predicting AKI stage 3 progression (AU-ROC, 0.926; 95% CI, 0.917 to 0.931 vs. 0.784; 95% CI, 0.771 to 0.796, respectively). Conclusions: The XGBoost model can better identify patients with AKI progression than the logistic regression model. Machine learning techniques may improve predictive modeling in medical research. Keywords: Acute kidney injury; Critical care; Logistic Models; Extreme gradient boosting

Machine Learning Models of Acute Kidney Injury Prediction in Acute Pancreatitis Patients

Gastroenterology Research and Practice ◽

10.1155/2020/3431290 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Cheng Qu ◽

Lin Gao ◽

Xian-qiang Yu ◽

Mei Wei ◽

Guo-quan Fang ◽

...

Keyword(s):

Machine Learning ◽

Acute Kidney Injury ◽

Acute Pancreatitis ◽

Logistic Regression ◽

Kidney Injury ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Background. Acute kidney injury (AKI) has long been recognized as a common and important complication of acute pancreatitis (AP). In the study, machine learning (ML) techniques were used to establish predictive models for AKI in AP patients during hospitalization. This is a retrospective review of prospectively collected data of AP patients admitted within one week after the onset of abdominal pain to our department from January 2014 to January 2019. Eighty patients developed AKI after admission (AKI group) and 254 patients did not (non-AKI group) in the hospital. With the provision of additional information such as demographic characteristics or laboratory data, support vector machine (SVM), random forest (RF), classification and regression tree (CART), and extreme gradient boosting (XGBoost) were used to build models of AKI prediction and compared to the predictive performance of the classic model using logistic regression (LR). XGBoost performed best in predicting AKI with an AUC of 91.93% among the machine learning models. The AUC of logistic regression analysis was 87.28%. Present findings suggest that compared to the classical logistic regression model, machine learning models using features that can be easily obtained at admission had a better performance in predicting AKI in the AP patients.

A machine learning-based prediction of hospital mortality in patients with postoperative sepsis

10.21203/rs.2.24188/v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ren-qi Yao ◽

Xin Jin ◽

Guo-wei Wang ◽

Yue Yu ◽

Guo-sheng Wu ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Hospital Mortality ◽

Regression Model ◽

Logistic Regression Model ◽

Gradient Boosting ◽

Stepwise Logistic Regression ◽

Postoperative Sepsis ◽

Extreme Gradient Boosting ◽

Stepwise Logistic Regression Model

Background: The incidence of postoperative sepsis is continually increasing, while few studies have specifically focused on the risk factors and clinical outcomes associated with the development of sepsis after surgical procedures. The present study aimed to develop a mathematical model for predicting in-hospital mortality among patients with postoperative sepsis. Methods: Surgical patients in the Medical Information Mart for Intensive Care (MIMIC-III) database who simultaneously fulfilled Sepsis 3.0 and Agency for Healthcare Research and Quality (AHRQ) criteria during ICU admission were included. We employed both extreme gradient boosting (XGBoost) and a stepwise logistic regression model to predict in-hospital mortality among the included patients with postoperative sepsis. Model performance was then assessed in terms of discrimination and calibration. Results: We included 3713 patients who fulfilled our inclusion criteria, of whom 397 (10.7%) died during hospitalization and 3316 (89.3%) survived to discharge. Fluid-electrolyte disturbance, coagulopathy, renal replacement therapy (RRT), urine output, and cardiovascular surgery were important features related to in-hospital mortality. The XGBoost model performed better than the stepwise logistic regression model in both discriminatory ability (c-statistic, 0.835 [95% CI, 0.786 to 0.877] vs. 0.737 [95% CI, 0.688 to 0.786]) and goodness of fit (visualized by calibration curve). Conclusion: The XGBoost model appears to perform better than the conventional stepwise logistic regression model in predicting hospital mortality among postoperative septic patients. Machine learning-based algorithms might have significant application in the development of early warning systems for septic patients following major operations.

Predicting Adverse Drug Events in Chinese Pediatric Inpatients With the Associated Risk Factors: A Machine Learning Study

Frontiers in Pharmacology ◽

10.3389/fphar.2021.659099 ◽

2021 ◽

Vol 12 ◽

Author(s):

Ze Yu ◽

Huanhuan Ji ◽

Jianwen Xiao ◽

Ping Wei ◽

Lin Song ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Adverse Drug Events ◽

Length Of Hospital Stay ◽

Gradient Boosting ◽

Multiple Risk Factors ◽

Extreme Gradient Boosting ◽

Novel Method ◽

Associated Risk Factors ◽

Medical University

The aim of this study was to apply machine learning methods to explore in depth the risk factors associated with adverse drug events (ADEs) and to predict the occurrence of ADEs in Chinese pediatric inpatients. Data of 1,746 patients aged between 28 days and 18 years (mean age = 3.84 years), admitted from January 1, 2013, to December 31, 2015, to the Children’s Hospital of Chongqing Medical University, were included in the study. There were 247 cases of ADE occurrence, of which the most common drugs inducing ADEs were antibacterials. Seven algorithms, including eXtreme Gradient Boosting (XGBoost), CatBoost, AdaBoost, LightGBM, Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and TPOT, were used to select the important risk factors, and GBDT was chosen to establish the prediction model with the best predictive ability (precision = 44%, recall = 25%, F1 = 31.88%). The GBDT model performed better than Global Trigger Tools (GTTs) for ADE prediction (precision 44% vs. 13.3%). In addition, multiple risk factors were identified via GBDT, such as the number of trigger true (TT) (+), number of doses, BMI, number of drugs, number of admissions, height, length of hospital stay, weight, age, and number of diagnoses. The direction of influence of each risk factor on ADEs was displayed through Shapley Additive exPlanations (SHAP). This study provides a novel method to accurately predict adverse drug events in Chinese pediatric inpatients with the associated risk factors, which may be applicable in clinical practice in the future.
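
The reported F1 value follows directly from the reported precision and recall, since F1 is their harmonic mean. The short check below uses only the figures stated in the abstract; the function name `f1_from_precision_recall` is a hypothetical helper.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision = 44% and recall = 25%, as stated in the abstract.
f1 = f1_from_precision_recall(0.44, 0.25)
print(f"{f1:.2%}")  # 31.88%
```

Because the harmonic mean is pulled toward the smaller of the two values, the low recall keeps F1 well below the 44% precision.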

Using a Multiclass Machine Learning Model to Predict the Outcome of Acute Ischemic Stroke Requiring Reperfusion Therapy

Diagnostics ◽

10.3390/diagnostics11010080 ◽

2021 ◽

Vol 11 (1) ◽

pp. 80

Author(s):

I-Min Chiu ◽

Wun-Huei Zeng ◽

Chi-Yung Cheng ◽

Shih-Hsuan Chen ◽

Chun-Hung Richard Lin

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Ischemic Stroke ◽

Acute Ischemic Stroke ◽

Outcome Prediction ◽

Reperfusion Therapy ◽

Gradient Boosting ◽

Stroke Patients ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Prediction of functional outcome in ischemic stroke patients is useful for clinical decisions. Previous studies mostly elaborate on the prediction of favorable outcomes. Miserable outcomes, which are usually defined as modified Rankin Scale (mRS) 5–6, should be considered as well before further invasive intervention. By using a machine learning algorithm, we aimed to develop a multiclass classification model for outcome prediction in acute ischemic stroke patients requiring reperfusion therapy. This was a retrospective study performed at a stroke medical center in Taiwan. Patients with acute ischemic stroke who visited between January 2016 and December 2019 and who were candidates for reperfusion therapy were included. Clinical outcomes were classified as favorable outcome, intermediate outcome, and miserable outcome. We developed four different multiclass machine learning models (Logistic Regression, Support Vector Machine, Random Forest, and Extreme Gradient Boosting) to predict clinical outcomes and compared their performance to the DRAGON score. A sample of 590 patients was included in this study. Of them, 180 (30.5%) had favorable outcomes and 152 (25.8%) had miserable outcomes. All selected machine learning models outperformed the DRAGON score on accuracy of outcome prediction (Logistic Regression: 0.70, Support Vector Machine: 0.67, Random Forest: 0.69, and Extreme Gradient Boosting: 0.67, vs. DRAGON: 0.51, p < 0.001). Among all selected models, Logistic Regression also had a better performance than the DRAGON score on positive predictive value, sensitivity, and specificity. Compared with the DRAGON score, the multiclass machine learning approach showed better performance on the prediction of the 3-month functional outcome of acute ischemic stroke patients requiring reperfusion therapy.

A Comparison of Text Classification Techniques Applied to Indonesian Text Dataset

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195629 ◽

2019 ◽

pp. 217-222

Author(s):

Umniy Salamah

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Logistic Regression ◽

Learning Algorithm ◽

Computational Time ◽

Gradient Boosting ◽

Multiple Parameters ◽

Best Value ◽

Extreme Gradient Boosting ◽

It Organization

In an organization, statements containing opinions and complaints about a service or program run by its IT organization can be processed using machine learning, and the results can be used by the organization to improve and enhance its quality. This research attempted to classify reports from social media as complaint or non-complaint using two machine learning algorithms, Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost). The Logistic Regression model was trained with CountVectorizer and TfidfVectorizer feature extraction. Moreover, the XGBoost algorithm has multiple parameters and can be improved by tuning them, i.e. eta (learning rate), gamma, max_depth, min_child_weight, subsample, colsample_bytree and alpha. As a result, the best parameter values for XGBoost are 'reg_alpha': 0.01, 'colsample_bytree': 0.9, 'learning_rate': 0.5, 'min_child_weight': 1, 'subsample': 0.8, 'max_depth': 3, 'gamma': 0.0, for which the computational time is 13870.012468 and the best accuracy achieved is 0.927943760984. Furthermore, the performance evaluation results for Logistic Regression using TfidfVectorizer and CountVectorizer feature extraction are 0.9181 and 0.9356, respectively.
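
The two feature extraction schemes compared for Logistic Regression differ in how each term is weighted: raw counts versus counts scaled by inverse document frequency. The plain-Python sketch below illustrates the difference without any library. It assumes whitespace tokenization, a common smoothed IDF formulation, and L2 row normalization; whether these match the settings used in the paper is an assumption, and the toy English sentences stand in for the Indonesian reports.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words counts: one row per document, one column per vocabulary term."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Counts reweighted by smoothed IDF, then L2-normalized per document."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


# Toy documents for illustration only.
docs = ["service is slow", "service is great", "slow slow response"]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)
print(counts[2])
print([round(x, 3) for x in tfidf[1]])
```

In the second document, "great" appears in only one document and therefore receives a larger TF-IDF weight than "is", even though both have a raw count of 1.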
