Prediction and Analysis of Length of Stay Based on Nonlinear Weighted XGBoost Algorithm in Hospital

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yong Chen

An improved nonlinear weighted extreme gradient boosting (XGBoost) technique is developed to forecast length of stay for patients with imbalanced data. The algorithm first chooses an effective technique for fitting the length of stay and determining its distribution law, and then optimizes the negative log-likelihood loss function using a heuristic nonlinear weighting method based on sample percentage. Theoretical and practical results show that, compared to existing algorithms, the XGBoost method based on nonlinear weighting can achieve higher classification accuracy and better prediction performance, which is beneficial in treating more patients with fewer hospital beds.
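The weighting idea in this abstract can be illustrated with a small sketch. The abstract does not state the weighting function, so the power-law form `(n_samples / class_count) ** alpha`, the parameter `alpha`, and the function names below are assumptions made only for illustration. They are not the author's method and not part of the XGBoost API.

```python
import math
from collections import Counter


def nonlinear_class_weights(labels, alpha=0.5):
    """Return one weight per class: (n_samples / class_count) ** alpha.

    alpha = 0 gives uniform weights; alpha = 1 gives plain inverse-frequency
    weights. The power-law form and alpha are assumptions for this sketch.
    """
    counts = Counter(labels)
    n = len(labels)
    return {cls: (n / count) ** alpha for cls, count in counts.items()}


def weighted_nll(labels, probs, weights, eps=1e-12):
    """Mean weighted negative log-likelihood.

    probs[i] maps each class to the predicted probability for sample i.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        total -= weights[y] * math.log(max(p[y], eps))
    return total / len(labels)


# Hypothetical toy data: 8 short stays and 2 long stays.
labels = ["short"] * 8 + ["long"] * 2
weights = nonlinear_class_weights(labels, alpha=0.5)
uninformed = [{"short": 0.5, "long": 0.5} for _ in labels]
loss = weighted_nll(labels, uninformed, weights)
```

With `alpha` between 0 and 1, the rare class is weighted more heavily than the common one, but less aggressively than with plain inverse-frequency weighting.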

2021 ◽  
pp. 22-37
Author(s):  
Han Gao ◽  
Pei Shan Fam ◽  
Lea Tien Tay ◽  
Heng Chin Low

Tree-based gradient boosting (TGB) models have gained popularity in various areas due to their powerful prediction ability and fast processing speed. This study aims to compare the landslide spatial prediction performance of TGB models and non-tree-based machine learning (NML) models in Penang Island, Malaysia. Two specific instances of TGB models, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), and two specific instances of NML models, artificial neural network (ANN) and support vector machine (SVM), are applied to make predictions of landslide susceptibility. Feature selection and oversampling techniques are also considered to improve the prediction performance. The results are analyzed and discussed mainly based on receiver operating characteristic (ROC) curves and the area under the curve (AUC). The results show that TGB models give better prediction performance than NML models, regardless of sample size. The TGB models’ performance improves when training on a dataset that has undergone either feature selection or oversampling. The highest AUC value of 0.9525 is obtained from the combination of XGBoost and SMOTE. The landslide susceptibility maps (LSMs) produced by XGBoost and LightGBM can provide valuable information for landslide management and mitigation in Penang Island, Malaysia.
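For readers unfamiliar with the AUC metric used in this comparison, the following minimal sketch computes it as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 0/1; scores are model outputs where higher means "more
    likely positive". Ties between a positive and a negative count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores for four map cells.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch. Production code would normally use a rank-based implementation from an established library.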


Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 89 ◽  
Author(s):  
Tuong Le ◽  
Sung Baik

Recently, a standard dataset, SCADI (Self-Care Activities Dataset), based on the International Classification of Functioning, Disability, and Health for Children and Youth framework, was introduced for identifying self-care problems of children with physical and motor disabilities. This is an interesting, important, and challenging topic due to its usefulness in medical diagnosis. This study proposes a robust framework using a sampling technique and extreme gradient boosting (FSX) to improve the prediction performance for the SCADI dataset. The proposed framework first converts the original dataset to a new dataset with a smaller number of dimensions. It then balances the reduced dataset using oversampling techniques with different ratios. Next, extreme gradient boosting is used to diagnose the problems. Experiments on prediction performance and feature importance were conducted to show the effectiveness of FSX and to analyse the results. The experimental results show that FSX using the Synthetic Minority Over-sampling Technique (SMOTE) for the oversampling module outperforms the artificial neural network (ANN)-based approach, support vector machine (SVM), and random forest on the SCADI dataset. The overall accuracy of the proposed framework reaches 85.4%, a high level of performance that can be used for self-care problem classification in medical diagnosis.
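The oversampling step described above can be sketched as follows. This is a simplified SMOTE-style interpolation written from the general idea of the technique, not the authors' FSX code. The function names, the neighbour count `k`, and the way the target ratio is turned into a sample count are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def samples_needed(n_minority, n_majority, ratio):
    """Synthetic samples required so that minority / majority reaches ratio."""
    return max(0, round(ratio * n_majority) - n_minority)


# Hypothetical minority class: four points on the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority, samples_needed(4, 10, ratio=1.0))
```

Varying `ratio` corresponds to the "different ratios" mentioned in the abstract: a ratio of 1.0 fully balances the classes, while smaller values oversample less.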


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 477
Author(s):  
Wei-Jen Chen ◽  
Mao-Jhen Jhou ◽  
Tian-Shyug Lee ◽  
Chi-Jie Lu

The sports market has grown rapidly over the last several decades. Sports outcome prediction is an attractive sports analytics challenge, as it provides useful information for operations in the sports market. In this study, a hybrid basketball game outcome prediction scheme is developed for predicting the final score of National Basketball Association (NBA) games by integrating five data mining techniques: extreme learning machine, multivariate adaptive regression splines, k-nearest neighbors, eXtreme gradient boosting (XGBoost), and stochastic gradient boosting. Designed features are generated by merging different game-lag information from fundamental basketball statistics and are used in the proposed scheme. This study collected data from all the games of the NBA 2018–2019 season. There are 30 teams in the NBA, and each team plays 82 games per season. A total of 2460 NBA game data points were collected. Empirical results show that the proposed hybrid basketball game prediction scheme achieves high prediction performance and identifies suitable game-lag information and relevant game features (statistics). Our findings suggest that a two-stage XGBoost model using four pieces of game-lag information achieves the best prediction performance among all competing models. The six designed features from four game-lags, namely averaged defensive rebounds, averaged two-point field goal percentage, averaged free throw percentage, averaged offensive rebounds, averaged assists, and averaged three-point field goal attempts, have a greater effect on the prediction of final scores of NBA games than those from other game-lags. The findings of this study provide relevant insights and guidance for research on outcome prediction in other team or individual sports.
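A game-lag feature of the kind described above can be sketched as a rolling average over a team's previous games. The abstract does not define how the game-lag information is merged, so the plain mean over the previous `lag` games and the function name `lagged_average` are assumptions for illustration.

```python
def lagged_average(values, lag):
    """For game i, average the statistic over the `lag` games before it.

    Returns None for games that do not yet have `lag` earlier games,
    so the current game's own value never leaks into its feature.
    """
    features = []
    for i in range(len(values)):
        if i < lag:
            features.append(None)
        else:
            window = values[i - lag:i]
            features.append(sum(window) / lag)
    return features


# Hypothetical defensive rebounds for one team's first five games.
defensive_rebounds = [30, 34, 38, 42, 40]
avg_last_four = lagged_average(defensive_rebounds, lag=4)

# The data set size quoted in the abstract: one record per team per game.
team_game_records = 30 * 82
```

The count of 2460 matches 30 teams times 82 games, which suggests each data point is a team-game record rather than a unique game.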


2020 ◽  
Author(s):  
Yong-Yeon Jo ◽  
Jai Hong Han ◽  
Hyun Woo Park ◽  
Hyojung Jung ◽  
Jaedong Lee ◽  
...  

BACKGROUND Postoperative length of stay is a key indicator in the management of medical resources and an indirect parameter of the incidence of surgical complications and recovery of systemic conditions in cancer surgery. To our knowledge, machine learning models have not been used to predict prolonged length of stay after cancer surgery using extensive medical information. OBJECTIVE To develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach. METHODS In our retrospective study, electronic health records (EHR) of 42,751 patients who underwent primary surgery for 17 types of cancer from January 1, 2000 to December 31, 2017, sourced from a single cancer center, were used. Those records include various variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer is defined as bed-days of the group accounting for the top 50% of the distribution of bed-days by cancer type. RESULTS In the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrate excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve (AUC) > 0.85). A moderate performance (AUC: 0.70–0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases, the extreme gradient boosting classifier model outperformed the other models.
We identified risk variables for the prediction of prolonged postoperative length of stay for each cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables. CONCLUSIONS A machine learning approach using EHR may improve the prediction of prolonged length of stay after primary cancer surgery. This algorithm may help in a more effective allocation of medical resources in cancer surgery. CLINICALTRIAL This study was approved by the institutional review board of the National Cancer Center-Korea, with a waiver for written informed consent (NCC-2018-0113).


10.2196/23147 ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. e23147
Author(s):  
Yong-Yeon Jo ◽  
JaiHong Han ◽  
Hyun Woo Park ◽  
Hyojung Jung ◽  
Jae Dong Lee ◽  
...  

Background Postoperative length of stay is a key indicator in the management of medical resources and an indirect predictor of the incidence of surgical complications and the degree of recovery of the patient after cancer surgery. Recently, machine learning has been used to predict complex medical outcomes, such as prolonged length of hospital stay, using extensive medical information. Objective The objective of this study was to develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach. Methods In our retrospective study, electronic health records (EHRs) from 42,751 patients who underwent primary surgery for 17 types of cancer between January 1, 2000, and December 31, 2017, were sourced from a single cancer center. The EHRs included numerous variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer was defined as bed-days of the group of patients who accounted for the top 50% of the distribution of bed-days by cancer type. Results In the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrated excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve [AUC] >0.85). A moderate performance (AUC 0.70-0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases each, the extreme gradient boosting classifier model showed slightly better performance than the logistic regression model, although the logistic regression model also performed adequately. 
We identified risk variables for the prediction of prolonged postoperative length of stay for each type of cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables. Conclusions A machine learning approach using EHRs may improve the prediction of prolonged length of hospital stay after primary cancer surgery. This algorithm may help to provide a more effective allocation of medical resources in cancer surgery.
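The outcome definition used in this study (the top 50% of bed-days within each cancer type) can be sketched as a per-group median split. The function name, the record layout, and the choice to label stays strictly above the group median as prolonged are assumptions. The abstract does not say how stays exactly at the median were handled.

```python
from collections import defaultdict
from statistics import median


def label_prolonged_stay(records):
    """Label each (cancer_type, bed_days) record 1 if its bed-days exceed
    the median for that cancer type, else 0."""
    days_by_type = defaultdict(list)
    for cancer_type, bed_days in records:
        days_by_type[cancer_type].append(bed_days)
    cutoffs = {t: median(days) for t, days in days_by_type.items()}
    return [1 if bed_days > cutoffs[cancer_type] else 0
            for cancer_type, bed_days in records]


# Hypothetical records; the bed-day values are invented for illustration.
records = [
    ("kidney", 5), ("kidney", 7), ("kidney", 9), ("kidney", 12),
    ("breast", 2), ("breast", 3), ("breast", 4), ("breast", 6),
]
prolonged = label_prolonged_stay(records)
```

Because the cutoff is computed separately per cancer type, a four-day stay can count as prolonged for one cancer and not for another, and the two classes stay roughly balanced within each type.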


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e14530-e14530
Author(s):  
Petri Bono ◽  
Jussi Ekström ◽  
Matti K Karvonen ◽  
Jami Mandelin ◽  
Jussi Koivunen

e14530 Background: Bexmarilimab, an investigational immunotherapeutic antibody targeting Clever-1, is currently being investigated in the phase I/II MATINS study (NCT03733990) for advanced solid tumors. Machine learning (ML) based models combining extensive data could be generated to predict treatment responses to this first-in-class macrophage checkpoint inhibitor. Methods: 58 baseline features from 30 patients included in part 1 of the phase I/II MATINS trial were included in ML modelling. Seven patients were classified as benefitting from the therapy by RECIST 1.1 (PR or SD response in target or non-target lesions). Initial feature selection was done using a combination of domain knowledge and removal of features with several missing values, resulting in 20 clinically relevant features from 25 patients. The remaining data were standardized, and feature selection using analysis of variance (ANOVA) based on F-values between response and features was performed. With this approach, the number of features could be further reduced as the prediction performance increased until the most important features were included in the model. Several prediction models were trained, and prediction performance was evaluated using leave-one-out cross-validation (LOOCV), with and without SMOTE oversampling of the positive class of the training data inside each LOOCV fold. In LOOCV the prediction model was trained 25 times. A stacked meta classifier with SMOTE oversampling, combining three classifiers (elastic-net logistic regression, random forest, and extreme gradient boosting), was chosen as the best performing prediction model. Results: Seven baseline features were associated with bexmarilimab treatment benefit. Increasing bexmarilimab dose and high tumor FoxP3 cells showed positive benefit. On the contrary, high baseline blood neutrophils, CD4, T-cells, B-cells, and CXCL10 indicated a negative relationship to the treatment benefit.
The ML model trained with these seven features performed well in LOOCV, as 6/7 benefitting and 16/18 non-benefitting patients were classified correctly, and all considered classification performance metrics were good. In feature importance analysis, low baseline CXCL10 and neutrophils were characterized as the most important predictors for treatment benefit, with values of 0.19 and 0.16. Conclusions: This study highlights the possibility of using ML models to predict treatment benefit for novel cancer drugs such as bexmarilimab and to boost their clinical development. These findings are in line with the expected immune activation of bexmarilimab treatment. The generated ML models should be further validated in a larger patient cohort. Clinical trial information: NCT03733990.
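The two counts reported above (6 of 7 benefitting and 16 of 18 non-benefitting patients classified correctly) are enough to derive the usual confusion-matrix metrics. The sketch below only does that arithmetic. The derived values are not quoted in the abstract, and the function name is an assumption.

```python
def classification_summary(tp, fn, tn, fp):
    """Derive standard metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # share of benefitting patients found
        "specificity": tn / (tn + fp),   # share of non-benefitting patients found
        "ppv": tp / (tp + fp),           # share of predicted benefit that is correct
        "accuracy": (tp + tn) / total,
    }


# Counts implied by "6/7 benefitting and 16/18 non-benefitting".
summary = classification_summary(tp=6, fn=1, tn=16, fp=2)
```

With only 25 patients, each misclassified patient moves these figures by several percentage points, which is consistent with the abstract's call for validation in a larger cohort.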


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
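The cut-off trade-off described in the results can be reproduced in miniature. The sketch below computes positive predictive value and sensitivity at a chosen probability cut-off. The function name and the eight toy predictions are invented for illustration and do not come from the study's data.

```python
def ppv_and_sensitivity(labels, probabilities, cutoff):
    """Return (ppv, sensitivity) when predicting positive for p >= cutoff.

    ppv is None if no sample is predicted positive at this cut-off.
    """
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= cutoff and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= cutoff and y == 0)
    fn = sum(1 for y, p in zip(labels, probabilities) if p < cutoff and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return ppv, sensitivity


# Hypothetical predictions: 1 = patient does not benefit from treatment.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.7, 0.55, 0.4, 0.6, 0.45, 0.3, 0.1]

strict = ppv_and_sensitivity(toy_labels, toy_probs, cutoff=0.65)
default = ppv_and_sensitivity(toy_labels, toy_probs, cutoff=0.5)
lenient = ppv_and_sensitivity(toy_labels, toy_probs, cutoff=0.35)
```

Raising the cut-off makes the positive predictions more trustworthy but finds fewer of the true positives. Lowering it does the opposite, which is the same pattern the abstract reports for cut-offs of 0.63 and 0.38.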


Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein-protein interaction (PPI) analysis can be used to identify proteins that have a supporting function on the main protein, especially in the synthesis process. Insulin is synthesized by proteins that share the same molecular function but cover different, mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin, using the centrality method as a feature extractor and extreme gradient boosting as the classification algorithm. Characterization using the centrality method produces  features as a central function of protein. Classification results are measured using measurements, precision, recall and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost performs above the average of other machine learning methods.
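To make the idea of centrality as a feature extractor concrete, the sketch below computes normalized degree centrality for each node of a small interaction graph. The abstract does not say which centrality measures were used, so degree centrality is an assumption here. The edge list is a toy example with placeholder protein names, not real interaction data.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph.

    Each node's score is its number of distinct neighbours divided by
    (number of nodes - 1), so a node linked to every other node scores 1.0.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        raise ValueError("need at least two connected nodes")
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Toy graph: "INS" stands for insulin; P1..P3 are placeholder proteins.
toy_edges = [("INS", "P1"), ("INS", "P2"), ("INS", "P3"), ("P1", "P2")]
centrality = degree_centrality(toy_edges)
```

In a pipeline like the one described, one or more such per-protein scores would form the feature vector passed to the classifier.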


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.


Author(s):  
Irfan Ullah Khan ◽  
Nida Aslam ◽  
Malak Aljabri ◽  
Sumayh S. Aljameel ◽  
Mariam Moataz Aly Kamaleldin ◽  
...  

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, such as Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN) and DL model (containing six layers with ReLU and output layer with sigmoid activation), to predict the mortality rate in COVID-19 cases. Models were trained using confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.

