Retrospective Study on the Influencing Factors and Prediction of Hospitalization Expenses for Chronic Renal Failure in China Based on Random Forest and LASSO Regression

2021 ◽  
Vol 9 ◽  
Author(s):  
Pingping Dai ◽  
Weifu Chang ◽  
Zirui Xin ◽  
Haiwei Cheng ◽  
Wei Ouyang ◽  
...  

Aim: With the improvement in people's living standards, the incidence of chronic renal failure (CRF) is increasing annually. The growing number of patients with CRF has significantly increased pressure on China's medical budget. Predicting hospitalization expenses for CRF can provide guidance for effective allocation and control of medical costs. The purpose of this study was to use the random forest (RF) method and least absolute shrinkage and selection operator (LASSO) regression to predict personal hospitalization expenses of hospitalized patients with CRF and to evaluate related influencing factors. Methods: The data set was collected from the front-page data of the medical records of three tertiary first-class hospitals for the whole year of 2016. Factors influencing hospitalization expenses for CRF were analyzed. RF and LASSO regression models were used to establish prediction models for the hospitalization expenses of patients with CRF, and the two models were compared and evaluated. Results: For CRF inpatients, statistically significant differences in hospitalization expenses were found for major procedures, medical payment method, hospitalization frequency, length of stay, number of other diagnoses, and number of procedures. The R² values of the LASSO regression model and the RF regression model were 0.6992 and 0.7946, respectively. The mean absolute error (MAE) and root mean square error (RMSE) of the LASSO regression model were 0.0268 and 0.043, respectively, and the MAE and RMSE of the RF prediction model were 0.0171 and 0.0355, respectively. In the RF model, the weight of length of stay was the highest (0.730). Conclusions: The hospitalization expenses of patients with CRF are most affected by length of stay. The RF prediction model is superior to the LASSO regression model and can be used to predict the hospitalization expenses of patients with CRF. Health administration departments may consider formulating accurate individualized hospitalization expense reimbursement mechanisms accordingly.
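
The three figures used to compare the models (R², MAE, and RMSE) can be computed from paired observed and predicted values. The following is a minimal sketch in plain Python; the function name and the sample numbers are hypothetical and are not taken from the study, and the sketch assumes the expenses have already been normalized to a common scale.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical normalized expenses, for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.4f} MAE={mae:.4f} RMSE={rmse:.4f}")
```

A higher R² together with lower MAE and RMSE, as reported for the RF model, indicates a closer fit on the evaluated data.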

2021 ◽  
Author(s):  
Nianyue Wu ◽  
Siru Liu ◽  
Haotian Zhang ◽  
Xiaomin Hou ◽  
Ping Zhang ◽  
...  

BACKGROUND The intensive care unit (ICU) length of stay is important for evaluating the effect of cardiac surgical treatment in inpatients. OBJECTIVE This research aims to accurately predict the ICU length of stay in patients with cardiac surgery. METHODS We used machine learning methods to construct the model, and the medical information mart for intensive care (MIMIC IV) database was used as the data source. A total of 7,567 patients were enrolled and the mean length of stay in the ICU was 3.12 days. A total of 126 predictors were included, and 44 important predictors were screened by least absolute shrinkage and selection operator (Lasso) regression. RESULTS The mean accuracies were 0.603 (95% confidence interval (CI): [0.602-0.604]), 0.687 (95% CI: [0.687-0.688]) and 0.688 (95% CI: [0.687-0.689]) for the logistic regression (LR) with all variables, the gradient boosted decision tree (GBDT) with important variables and the GBDT with all variables, respectively. CONCLUSIONS The GBDT model with important predictors partly overestimated the length of stay of patients who stayed less than 3 days and underestimated that of patients who stayed longer than 3 days. However, the better prediction performance of GBDT facilitates early intervention for ICU patients with a long period of hospitalization.


2021 ◽  
Vol 9 ◽  
Author(s):  
Qiao-Ying Xie ◽  
Ming-Wei Wang ◽  
Zu-Ying Hu ◽  
Cheng-Jian Cao ◽  
Cong Wang ◽  
...  

Aim: Metabolic syndrome (MS) screening is essential for the early detection of MS in the occupational population. This study aimed to screen out biomarkers related to MS and establish a risk assessment and prediction model for the routine physical examination of an occupational population. Methods: The least absolute shrinkage and selection operator (Lasso) regression algorithm of machine learning was used to screen biomarkers related to MS. Then, the accuracy of the logistic regression model was further verified based on the Lasso regression algorithm. The areas under the receiver operating characteristic curves were used to evaluate the accuracy of the selected biomarkers in identifying subjects at risk of MS. The screened biomarkers were used to establish a logistic regression model and calculate the odds ratio (OR) of the corresponding biomarkers. A nomogram risk prediction model was established based on the selected biomarkers, and the consistency index (C-index) and calibration curve were derived. Results: A total of 2,844 occupational workers were included, and 10 biomarkers related to MS were screened. The number of non-MS cases was 2,189 and that of MS cases was 655. The area under the curve (AUC) values for non-Lasso and Lasso logistic regression were 0.652 and 0.907, respectively. The established risk assessment model revealed that the main risk biomarkers were absolute basophil count (OR: 3.38, CI: 1.05–6.85), platelet packed volume (OR: 2.63, CI: 2.31–3.79), leukocyte count (OR: 2.01, CI: 1.79–2.19), red blood cell count (OR: 1.99, CI: 1.80–2.71), and alanine aminotransferase level (OR: 1.53, CI: 1.12–1.98). Furthermore, a favorable C-index (0.840) and calibration curves close to the ideal curve indicated the accurate predictive ability of this nomogram. Conclusions: The risk assessment model based on the Lasso logistic regression algorithm helped identify MS with high accuracy in the physical examination of an occupational population.
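
The AUC used above to compare models can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison; the function name, labels, and scores are hypothetical illustrations, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = MS, 0 = non-MS) and predicted risks.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a few thousand records; a rank-based formulation would scale better.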


2020 ◽  
Author(s):  
Qiao-Ying Xie ◽  
Ming-Wei Wang ◽  
Zu-Ying Hu ◽  
Yan-Ming Chu ◽  
Cheng-Jian Cao ◽  
...  

Abstract Background: Metabolic syndrome (MS) screening is important for the early detection of MS in the occupational population. This study aimed to screen out biomarkers related to MS and establish a risk assessment and prediction model for the routine physical examination of an occupational population. Methods: The least absolute shrinkage and selection operator (Lasso) regression algorithm of machine learning was used to screen biomarkers related to MS. Then, the accuracy of the logistic regression model was further verified based on the Lasso regression algorithm. Finally, the screened biomarkers were used to establish a logistic regression model and calculate the odds ratio (OR) of the corresponding biomarkers. Results: A total of 2,844 occupational workers were included, and 10 biomarkers related to MS were screened. The area under the curve (AUC) values for non-Lasso and Lasso regression were 0.652 and 0.907, respectively. The established risk assessment model revealed that the main risk factors were basophil absolute count (OR: 3.38), platelet packed volume (OR: 2.63), leukocyte count (OR: 2.01), red blood cell count (OR: 1.99), and alanine aminotransferase level (OR: 1.53). Conclusion: The risk assessment model based on the Lasso regression algorithm helped identify MS with high accuracy in the physical examination of an occupational population.
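
In a logistic regression model, the odds ratio of a predictor is the exponential of its fitted coefficient, and a Wald-type confidence interval follows from the coefficient's standard error. The sketch below shows that conversion; the coefficient and standard error are hypothetical values chosen for illustration, and the abstract does not state which interval method the authors used.

```python
import math


def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Convert a logistic regression coefficient into (OR, lower, upper).

    Uses a Wald interval on the log-odds scale; z=1.96 gives roughly 95%.
    """
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )


# Hypothetical coefficient and standard error, not values from the study.
odds_ratio, lower, upper = odds_ratio_with_ci(coef=0.70, std_err=0.10)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An OR above 1 with an interval that excludes 1 is what marks a biomarker as a risk factor in this kind of model.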


2020 ◽  
Vol 23 (2) ◽  
pp. 148-156
Author(s):  
Wei Wang ◽  
Bin Liu ◽  
Xiaoran Duan ◽  
Xiaolei Feng ◽  
Tuanwei Wang ◽  
...  

Objective: The aim of this study was to screen microRNAs (miRNAs) related to the prognosis of lung adenocarcinoma (LUAD) and to explore the possible molecular mechanisms. Methods: Data for a total of 535 patients with LUAD were downloaded from The Cancer Genome Atlas (TCGA) database. The miRNAs for LUAD prognosis were screened by both the Cox proportional hazards regression model and the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. The performance of the models was verified by time-dependent Receiver Operating Characteristic (ROC) curves. The possible biological processes linked to the miRNAs' target genes were analyzed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Results: Among the 127 differentially expressed miRNAs identified from the screening analysis, 111 were up-regulated and 16 were down-regulated. Three of them, hsa-miR-1293, hsa-miR-490 and hsa-miR-5571, were also significantly associated with the survival of the LUAD patients. The targets of the three miRNAs are significantly enriched in systemic lupus erythematosus pathways. Conclusion: Hsa-miR-1293, hsa-miR-490 and hsa-miR-5571 can potentially be used as novel biomarkers for the prognosis prediction of LUAD.


2020 ◽  
Vol 21 (7) ◽  
pp. 2517
Author(s):  
Jeong-An Gim ◽  
Yonghan Kwon ◽  
Hyun A Lee ◽  
Kyeong-Ryoon Lee ◽  
Soohyun Kim ◽  
...  

Tacrolimus is an immunosuppressive drug with a narrow therapeutic index and large interindividual variability. We identified genetic variants to predict tacrolimus exposure in healthy Korean males using machine learning algorithms such as decision tree, random forest, and least absolute shrinkage and selection operator (LASSO) regression. rs776746 (CYP3A5) and rs1137115 (CYP2A6) are single nucleotide polymorphisms (SNPs) that can affect exposure to tacrolimus. A decision tree, when coupled with random forest analysis, is an efficient tool for predicting the exposure to tacrolimus based on genotype. These tools are helpful for determining an individualized dose of tacrolimus.


Author(s):  
Omar Al-Sahili ◽  
Haitham Al-Deek ◽  
Adrian Sandt ◽  
Alexander V. Mantzaris ◽  
John H. Rogers ◽  
...  

Illegal U-turns on freeways and toll roads are risky maneuvers that sometimes result in the turning vehicles causing various types of collisions or disturbances to approaching traffic. These illegal U-turn maneuvers can occur at traversable grass medians and emergency crossovers. Limited literature was found regarding the impact of illegal U-turns on these facilities. Therefore, to understand the roadway and median characteristics that could influence drivers’ propensity to commit illegal U-turns, a sequential modeling methodology was adopted. This methodology combined a Poisson regression model with a Least Absolute Shrinkage and Selection Operator (LASSO) regression procedure to predict the cited violations at traversable median segments. Additionally, a logistic regression model was developed to predict the probability of a cited violation at official use only emergency crossovers. These models included illegal U-turn citations and crashes for the Orlando and Miami metropolitan areas in Florida from 2011 to 2016. The findings indicated that the average distance between access points, median width, speed limit, segment length, and distance to nearest segment were significant in predicting cited violations at traversable medians. Furthermore, the distance to the nearest interchange, distance to the nearest adjacent crossover, and median width were significant in predicting the probability of a cited violation occurring at an emergency crossover. This study helps agencies to predict the locations of illegal U-turn violations and to prioritize roadways for possible treatment to minimize the potential risk of head-on or other collisions due to illegal U-turn events.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Marie Beth van Egmond ◽  
Gabriele Spini ◽  
Onno van der Galien ◽  
Arne IJpma ◽  
Thijs Veugen ◽  
...  

Abstract Background Recent developments in machine learning have shown its potential impact for clinical use such as risk prediction, prognosis, and treatment selection. However, relevant data are often scattered across different stakeholders and their use is regulated, e.g. by GDPR or HIPAA. As a concrete use-case, hospital Erasmus MC and health insurance company Achmea have data on individuals in the city of Rotterdam, which would in theory enable them to train a regression model in order to identify high-impact lifestyle factors for heart failure. However, privacy and confidentiality concerns make it unfeasible to exchange these data. Methods This article describes a solution where vertically-partitioned synthetic data of Achmea and of Erasmus MC are combined using Secure Multi-Party Computation. First, a secure inner join protocol takes place to securely determine the identifiers of the patients that are represented in both datasets. Then, a secure Lasso Regression model is trained on the securely combined data. The involved parties thus obtain the prediction model but no further information on the input data of the other parties. Results We implement our secure solution and describe its performance and scalability: we can train a prediction model on two datasets with 5000 records each and a total of 30 features in less than one hour, with a minimal difference from the results of standard (non-secure) methods. Conclusions This article shows that it is possible to combine datasets and train a Lasso regression model on this combination in a secure way. Such a solution thus further expands the potential of privacy-preserving data analysis in the medical domain.
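
For reference, the standard (non-secure) Lasso fit that the secure result is compared against can be sketched with coordinate descent and soft-thresholding. This is only an illustration of plain Lasso on data held in one place; it is not the article's Secure Multi-Party Computation protocol. The function names and toy data are hypothetical, and the sketch assumes centered features and targets, so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 over b."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            if norm == 0:
                beta[j] = 0.0
            else:
                beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy data: feature 0 carries the signal, feature 1 is nearly irrelevant.
X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
y = [2.0, -2.0, 0.1, -0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)
```

On this toy input the penalty shrinks the first coefficient below its least-squares value and sets the second exactly to zero, which is the behavior that lets Lasso pick out high-impact factors.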


Vaccines ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1221
Author(s):  
Fan Hu ◽  
Ruijie Gong ◽  
Yexin Chen ◽  
Jinxin Zhang ◽  
Tian Hu ◽  
...  

Since China's launch of COVID-19 vaccination, the situation of the public, especially the mobile population, has not been optimistic. We surveyed 782 factory workers about whether they would get a COVID-19 vaccine within the next 6 months. The participants were divided into a training set and a testing set for external validation at a ratio of 3:1 using R software. The variables were screened by Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis. Then, the prediction model, which included the important variables, was built using multivariate logistic regression analysis and presented as a nomogram. The Receiver Operating Characteristic (ROC) curve, Kolmogorov–Smirnov (K-S) test, Lift test and Population Stability Index (PSI) were used to test the validity and stability of the model and summarize the validation results. Only 45.54% of the participants had vaccination intentions, while 339 (43.35%) were unsure. Four of the 16 screened variables (self-efficacy, risk perception, perceived support and capability) were included in the prediction model. The results indicated that the model has high predictive power and is highly stable. The government should take the leading position, the whole society should be mobilized, and full use should be made of peer education during vaccination initiatives.
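
The Population Stability Index compares how model scores are distributed across the same bins in two samples, for example the training and testing sets. A minimal sketch follows; the function name, the bin counts, and the small floor used to avoid taking the logarithm of zero are assumptions made for illustration, not details from the study.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI = sum over bins of (a - e) * ln(a / e), using bin proportions."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # floor avoids log(0) for empty bins
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi


# Hypothetical score-bin counts for a training and a testing sample.
train_bins = [50, 50]
test_bins = [25, 75]
psi = population_stability_index(train_bins, test_bins)
print(f"PSI = {psi:.4f}")
```

Identical distributions give a PSI of zero, and larger values indicate a bigger shift; a commonly cited rule of thumb treats values below about 0.1 as stable, although the cut-off is a convention rather than a formal test.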


2021 ◽  
Vol 8 ◽  
Author(s):  
Lilin Yang ◽  
Haikuan Wang ◽  
Yanfang Li ◽  
Cheng Zeng ◽  
Xi Lin ◽  
...  

Objective: The aim of this study was to develop a nomogram to predict the risk of premature rupture of membrane (PROM) in pregnant women with vulvovaginal candidiasis (VVC). Patients and methods: We developed a prediction model based on a training dataset of 417 gravidas with VVC; the data were collected from January 2013 to December 2020. The least absolute shrinkage and selection operator regression model was used to optimize feature selection for the model. Multivariable logistic regression analysis was applied to build a prediction model incorporating the features selected in the least absolute shrinkage and selection operator regression model. Discrimination, calibration, and clinical usefulness of the prediction model were assessed using the C-index, calibration plot, and decision curve analysis. Internal validation was assessed using bootstrapping validation. Results: Predictors contained in the prediction nomogram included age, regular perinatal visits, history of VVC before pregnancy, symptoms with VVC, cured of VVC during pregnancy, and bacterial vaginitis. The model displayed discrimination with a C-index of 0.684 (95% confidence interval: 0.631–0.737). Decision curve analysis showed that the PROM nomogram was clinically useful when intervention was decided at a PROM possibility threshold of 13%. Conclusion: This novel PROM nomogram incorporating age, regular perinatal visits, history of VVC before pregnancy, symptoms with VVC, cured of VVC during pregnancy, and bacterial vaginitis could be conveniently used to facilitate PROM risk prediction in gravidas.
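
Decision curve analysis judges clinical usefulness by the net benefit at a chosen risk threshold: true positives per patient minus false positives per patient weighted by the odds of the threshold. The sketch below evaluates that quantity at the 13% threshold mentioned above; the function name, outcomes, and predicted risks are hypothetical and are not the study's data.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Hypothetical outcomes (1 = PROM) and predicted risks for ten patients.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
probs = [0.40, 0.05, 0.20, 0.30, 0.10, 0.08, 0.12, 0.02, 0.09, 0.11]

nb_model = net_benefit(labels, probs, threshold=0.13)
# "Treat all" reference strategy: every patient counts as above threshold.
nb_treat_all = net_benefit(labels, [1.0] * len(labels), threshold=0.13)
print(f"model: {nb_model:.4f}, treat all: {nb_treat_all:.4f}")
```

A model is considered useful at a threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.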

