The Random Forest Model Has the Best Accuracy Among the Four Pressure Ulcer Prediction Models Using Machine Learning Algorithms

2021 ◽  
Vol 14 ◽  
pp. 1175-1187
Author(s):  
Jie Song ◽  
Yuan Gao ◽  
Pengbin Yin ◽  
Yi Li ◽  
Yang Li ◽  
...  
2019 ◽  
Author(s):  
Ruilin Li ◽  
Xinyin Han ◽  
Liping Sun ◽  
Yannan Feng ◽  
Xiaolin Sun ◽  
...  

Abstract: Precisely predicting the required pre-surgery blood volume (PBV) in surgical patients is a formidable challenge in China. Inaccurate estimation is associated with excessive costs, postponed surgeries and adverse outcomes after surgery due to insufficient supply or inventory. This study aimed to predict the required PBV using machine learning techniques. 181,027 medical documents spanning 6 years were cleaned, finally yielding 92,057 blood transfusion records. Blood transfusion and surgery-related factors of perioperative patients, surgeons' experience, and the actual volumes of transfused RBCs were extracted. Six machine learning algorithms were used to build prediction models. Surgical patients who received allogeneic RBCs or no transfusion, had a total transfusion volume of less than 10 units, and had their latest pre-surgery laboratory examinations within 7 days were included, providing 118,823 data points. 39 predictive factors related to RBC transfusion were identified. The random forest model was selected to predict the required PBV of RBCs, achieving 72.9% accuracy when 90% of the data was used for training, a striking improvement of 30.4% over surgeons' experience. We tested and demonstrated that the data-driven models, and the random forest model in particular, achieved higher accuracy than surgeons' experience. Furthermore, we developed a computational tool, PTRBC, to precisely estimate the required PBV in surgical patients, and we believe this tool will find broader application in assisting clinical decisions, not confined to accurate pre-surgery blood requirement prediction.
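As an illustration of the core technique in this abstract, here is a minimal random-forest sketch on synthetic data, assuming scikit-learn is available; the features and the discrete RBC-unit target are invented stand-ins, not the study's 39 predictors:

```python
# Minimal sketch: random-forest prediction of required RBC units
# on synthetic data (feature names/values are illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-ins for pre-surgery predictors (e.g. labs, demographics).
X = rng.normal(size=(n, 5))
# Synthetic target: discrete RBC units, loosely tied to the features.
y = np.clip((X[:, 0] + X[:, 1] > 0.5).astype(int) + (X[:, 2] > 1).astype(int), 0, 3)

# The paper trains on 90% of the data; the split below mirrors that.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
print(f"hold-out accuracy: {acc:.2f}")
```

The same pattern extends to the multiclass volume prediction described above once real perioperative features replace the synthetic ones.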


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Silvia Alonso ◽  
Sara Cáceres ◽  
Daniel Vélez ◽  
Luis Sanz ◽  
Gema Silvan ◽  
...  

Abstract: Steroidal hormone interaction in pregnancy is crucial for adequate fetal evolution and preparation for childbirth and extrauterine life. Estrone sulphate, estriol, progesterone and cortisol play important roles in the initiation of the labour mechanism at the start of contractions and cervical effacement. However, their interaction remains uncertain. Although several studies regarding the hormonal mechanism of labour have been reported, the prediction of date of birth remains a challenge. In this study, we present for the first time machine learning algorithms for predicting whether spontaneous labour will occur from week 37 onwards. Estrone sulphate, estriol, progesterone and cortisol were analysed by enzyme-immunoassay (EIA) techniques in saliva samples collected from 106 pregnant women from week 34 onwards. We compared a random forest model with a traditional logistic regression on a dataset constructed from the observed values of these measures. We observed that the results, evaluated in terms of accuracy and area under the curve (AUC) metrics, are noticeably better for the random forest model. For this reason, we consider that machine learning methods can contribute in an important way to obstetric practice.
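The comparison described here, random forest versus logistic regression evaluated on accuracy and AUC, can be sketched as below, assuming scikit-learn; the data are synthetic (via `make_classification`), not the hormone assays:

```python
# Sketch: compare logistic regression and random forest on accuracy and AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic binary outcome (labour from week 37 onwards: yes/no),
# with four predictors standing in for the four hormones.
X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("random forest", RandomForestClassifier(random_state=1))]:
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: accuracy={acc:.2f}, AUC={auc:.2f}")
```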


2021 ◽  
Vol 11 (12) ◽  
pp. 1271
Author(s):  
Jaehyeong Cho ◽  
Jimyung Park ◽  
Eugene Jeong ◽  
Jihye Shin ◽  
Sangjeong Ahn ◽  
...  

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which have a significant influence on survival after surgery. This study aimed to develop prediction models using routine immediate postoperative laboratory values for predicting postoperative mortality. Methods: Two tertiary hospital databases were used in this research: one for model development and another for external validation of the resulting models. The following algorithms were utilized for model development: LASSO logistic regression, random forest, deep neural network, and XGBoost. We built the models on the lab values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: There were 3817 patients who had immediate postoperative blood test values. All models trained on immediate postoperative lab values outperformed the SASA model. Furthermore, the developed random forest model had the best AUROC of 0.82 and AUPRC of 0.13, and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may be beneficial in identifying patients at increased risk of postoperative death.
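The two metrics reported above, AUROC and AUPRC, can be computed as follows on synthetic, heavily imbalanced labels, which (as the low AUPRC baseline in the abstract suggests) is the setting where AUPRC is most informative; this sketch assumes scikit-learn and invents the scores:

```python
# Sketch: AUROC vs. AUPRC on imbalanced synthetic labels.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = (rng.random(2000) < 0.05).astype(int)   # ~5% positives (rare outcome)
# Scores that are moderately higher for positives.
scores = rng.normal(size=2000) + 1.5 * y_true

auroc = roc_auc_score(y_true, scores)
auprc = average_precision_score(y_true, scores)  # AUPRC baseline ~ prevalence
print(f"AUROC={auroc:.2f}, AUPRC={auprc:.2f}")
```

Note that an AUPRC of 0.13 against a much lower prevalence baseline can still represent a substantial improvement, which is why the paper reports both metrics.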


2017 ◽  
Vol 41 (S1) ◽  
pp. S95-S96 ◽  
Author(s):  
V. De Luc ◽  
A. Bani Fatemi ◽  
N. Hettige

Objective: Suicide is a major concern for those afflicted by schizophrenia. Identifying patients at the highest risk for future suicide attempts remains a complex problem for psychiatric intervention. Machine learning models allow for the integration of many risk factors in order to build an algorithm that predicts which patients are likely to attempt suicide. Currently, it is unclear how to integrate previously identified risk factors into a clinically relevant predictive tool that estimates the probability that a patient with schizophrenia will attempt suicide. Methods: We conducted a cross-sectional assessment on a sample of 345 participants diagnosed with schizophrenia spectrum disorders. Suicide attempters and non-attempters were clearly identified using the Columbia Suicide Severity Rating Scale (C-SSRS) and the Beck Suicide Ideation Scale (BSS). We developed two classification algorithms, a regularized logistic regression and a random forest model, with sociocultural and clinical variables as features to train the models. Results: Both classification models performed similarly in identifying suicide attempters and non-attempters. Our regularized logistic regression model demonstrated an accuracy of 66% and an area under the curve (AUC) of 0.71, while the random forest model demonstrated 65% accuracy and an AUC of 0.67. Conclusion: Machine learning algorithms offer a relatively successful method for incorporating many clinical features to predict individuals at risk for future suicide attempts. Increased performance of these models using clinically relevant variables offers the potential to facilitate early treatment and intervention to prevent future suicide attempts. Disclosure of interest: The authors have not supplied their declaration of competing interest.
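A regularized logistic regression classifier of the kind this abstract describes can be sketched as follows, assuming scikit-learn; the sample size mirrors the study's 345 participants, but the features are invented, not the study's sociocultural and clinical variables:

```python
# Sketch: L1-regularized logistic regression on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = make_classification(n_samples=345, n_features=20, n_informative=5,
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

# The L1 penalty shrinks uninformative coefficients toward zero,
# which helps when many candidate risk factors are included.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"accuracy={acc:.2f}, AUC={auc:.2f}")
```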


2021 ◽  
Vol 13 (17) ◽  
pp. 3404
Author(s):  
Rong Tang ◽  
Yuting Zhao ◽  
Huilong Lin

Accurate estimation of the aboveground biomass (AGB) of grassland is a key link in understanding the regional carbon cycle. We used 501 aboveground measurements, 29 environmental variables, and machine learning algorithms to construct and verify a custom model of grassland biomass in the Headwater of the Yellow River (HYR), and selected the random forest model to analyze the temporal and spatial distribution characteristics and dynamic trends of the biomass in the HYR from 2001 to 2020. The results show that: (1) the random forest model is superior to the other three models (R2val = 0.56, RMSEval = 51.3 g/m2); (2) the aboveground biomass in the HYR decreases spatially from southeast to northwest, with annual average and total values of 176.8 g/m2 and 20.73 Tg, respectively; (3) 69.51% of the area showed an increasing trend, while 30.14% showed a downward trend, mainly concentrated in the southeast of Hongyuan County, the northeast of Aba County, and the north of Qumalai County. These results can provide accurate spatial data and a scientific basis for the protection of grassland resources in the HYR.
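A random-forest biomass regression scored with R² and RMSE, the validation metrics reported above, might be fitted as below, assuming scikit-learn; the covariates and biomass values are synthetic, only loosely scaled to the g/m² range in the abstract:

```python
# Sketch: random-forest regression of AGB, scored with R2 and RMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(3)
# Synthetic stand-ins for environmental covariates and measured AGB (g/m^2).
X = rng.normal(size=(501, 6))
y = 150 + 40 * X[:, 0] - 25 * X[:, 1] + rng.normal(scale=30, size=501)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
model = RandomForestRegressor(n_estimators=200, random_state=3).fit(X_tr, y_tr)
pred = model.predict(X_te)
r2 = r2_score(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
print(f"R2={r2:.2f}, RMSE={rmse:.1f} g/m2")
```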


2021 ◽  
Vol 13 (4) ◽  
pp. 694
Author(s):  
Kyuhee Shin ◽  
Joon Jin Song ◽  
Wonbae Bang ◽  
GyuWon Lee

Traditional radar-based rainfall estimation is typically done by known functional relationships between the rainfall intensity (R) and radar measurables, such as R–Zh, R–(Zh, ZDR), etc. One of the biggest advantages of machine learning algorithms is the applicability to a non-linear relationship between a dependent variable and independent variables without any predefined relationships. We explored the potential use of two supervised machine learning methods (regression tree and random forest) in rainfall estimation using dual-polarization radar variables. The regression tree does not require normalization and scaling of data; however, this method is quite unstable since each split depends on the parent split. Since the random forest is an ensemble method of regression trees, it has less variability in prediction compared with regression trees, but consumes more computer resources. We considered several different configurations for machine learning algorithms with different sets of dependent and independent variables. The random forest model was appropriately tuned. In the test of variable importance, the specific differential phase (differential reflectivity) was the most important variable to predict the rainfall rate (residual that is the difference between the true rainfall rate and the one estimated from the R–Z relationship). The models were evaluated by 10-fold cross-validation. The best model was the random forest model using a residual with the non-classified training set. The results indicated that the machine learning algorithms outperformed the traditional R–Z relationship. Then, we applied the best machine learning model to an S-band dual-polarization radar (Mt. Myeonbong) and validated the result with ground rain gauges. The results of the application to radar data showed that the estimates of the residuals had spatial variability. 
The stratiform and weak rain areas had positive residuals while convective areas had negative residuals, indicating that the spatial error structure driven by the R–Z relationship was well captured by the model. The rainfall rates of all pixels over the study area were adjusted with the estimated residuals. The rainfall rates adjusted by residual showed excellent agreement with the rain gauge, especially at high rainfall rates.
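The regression-tree versus random-forest comparison under the 10-fold cross-validation mentioned above can be sketched as follows, assuming scikit-learn; synthetic regression data replace the dual-polarization radar variables:

```python
# Sketch: regression tree vs. random forest under 10-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for radar predictors and a rainfall-rate target.
X, y = make_regression(n_samples=400, n_features=5, noise=10, random_state=4)

for name, model in [("regression tree", DecisionTreeRegressor(random_state=4)),
                    ("random forest",
                     RandomForestRegressor(n_estimators=100, random_state=4))]:
    scores = cross_val_score(model, X, y, cv=10, scoring="r2")
    print(f"{name}: mean R2={scores.mean():.2f} (+/- {scores.std():.2f})")
```

The ensemble's lower fold-to-fold variability, visible in the standard deviation across folds, is the instability argument the abstract makes against single regression trees.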


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
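Sensitivity and specificity, the metrics reported above, follow directly from a confusion matrix; a tiny worked example with invented labels, assuming scikit-learn:

```python
# Sketch: sensitivity and specificity from a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy predicted vs. true GnRHa responses (1 = positive response).
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
# Here both equal 0.80: 4 of 5 positives and 4 of 5 negatives are correct.
```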


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize accuracy, and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients' socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%; GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%.
Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the differences between the machine learning algorithms were in most cases modest in magnitude. The results show that a predictive accuracy similar to the best performing model can be achieved by combining multiple algorithms in an ensemble model.
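A stacking ensemble of the sort described, where base learners' out-of-fold predictions feed a meta-learner, can be sketched with scikit-learn's `StackingClassifier`; the data and learner choices below are illustrative, not the study's exact configuration:

```python
# Sketch: stacking ensemble with a logistic meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import (StackingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=800, n_features=10, n_informative=6,
                           random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=5)

# Base learners echo some of the algorithms compared in the paper;
# their cross-validated predictions feed the logistic meta-learner.
stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=5)),
                ("rf", RandomForestClassifier(random_state=5)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"ensemble AUC={auc:.2f}")
```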


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3532 ◽  
Author(s):  
Nicola Mansbridge ◽  
Jurgen Mitsch ◽  
Nicola Bollard ◽  
Keith Ellis ◽  
Giuliana Miguel-Pacheco ◽  
...  

Grazing and ruminating are the most important behaviours for ruminants, as they spend most of their daily time budget performing them. Continuous surveillance of eating behaviour is an important means of monitoring ruminant health, productivity and welfare. However, surveillance performed by human operators is prone to human variance, time-consuming and costly, especially for animals kept at pasture or free-ranging. The use of sensors to automatically acquire data, and software to classify and identify behaviours, offers significant potential in addressing such issues. In this work, data collected from sheep by means of an accelerometer/gyroscope sensor attached to the ear and collar, sampled at 16 Hz, were used to develop classifiers for grazing and ruminating behaviour using various machine learning algorithms: random forest (RF), support vector machine (SVM), k nearest neighbour (kNN) and adaptive boosting (AdaBoost). Multiple features extracted from the signals were ranked on their importance for classification. Several performance indicators were considered when comparing classifiers as a function of the algorithm used, sensor localisation and number of features used. Random forest yielded the highest overall accuracies: 92% for collar and 91% for ear. Gyroscope-based features were shown to have the greatest relative importance for eating behaviours. The optimum number of feature characteristics to be incorporated into the model was 39, from both ear and collar data. The findings suggest that one can successfully classify eating behaviours in sheep with very high accuracy; this could be used to develop a device for automatic monitoring of feed intake in the sheep sector to monitor health and welfare.
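Feature ranking by importance, as used above, can be sketched with a random forest's impurity-based importances, assuming scikit-learn; the sensor-derived features are synthetic stand-ins:

```python
# Sketch: rank features by random-forest importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for accelerometer/gyroscope-derived features;
# with shuffle=False, the informative columns come first.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=6)
rf = RandomForestClassifier(n_estimators=200, random_state=6).fit(X, y)

# Rank features by mean decrease in impurity (importances sum to 1).
ranking = np.argsort(rf.feature_importances_)[::-1]
for i in ranking[:3]:
    print(f"feature {i}: importance={rf.feature_importances_[i]:.3f}")
```

Selecting the top-ranked features in this way is one route to the kind of optimal feature subset (39 features in the study) reported above.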


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detecting malware, like packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. The machine learning algorithms are trained using the training set, and the resulting models are evaluated against the testing set to assess their respective performances. We further tune the hyperparameters of the algorithms in order to achieve better results. The random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
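Hyperparameter tuning of the kind mentioned is commonly done with a cross-validated grid search; a minimal sketch assuming scikit-learn, with an invented grid and synthetic data rather than the authors' dataset and settings:

```python
# Sketch: cross-validated grid search over random-forest hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for flow-metadata features (packet size, timing, ...).
X, y = make_classification(n_samples=300, n_features=12, n_informative=5,
                           random_state=7)

# Small illustrative grid over two hyperparameters, scored by AUC.
grid = GridSearchCV(
    RandomForestClassifier(random_state=7),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3, scoring="roc_auc",
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print(f"best CV AUC={grid.best_score_:.3f}")
```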

