US Medical Expense Analysis Through Frequency and Severity Bootstrapping and Regression Model

2022 ◽  
pp. 177-207
Author(s):  
Fangjun Li ◽  
Gao Niu

To help control health expenditures, several papers have investigated the characteristics of patients who may incur high expenditures. However, fewer papers are based on overall medical conditions, so this chapter sought a relationship among the prevalence of medical conditions, utilization of healthcare services, and average expenses per person. The authors used bootstrapping simulation for data preprocessing and then used linear regression and random forest methods to train several models. The metrics root mean square error (RMSE), mean absolute percent error (MAPE), and mean absolute error (MAE) all showed that the selected linear regression model performed slightly better than the selected random forest regression model. The linear model used medical conditions, type of services, and their interaction terms as predictors.
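The three error metrics named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code; the sample values are hypothetical and are not taken from the chapter.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percent error, in percent. Undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical average expenses per person (not data from the chapter).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

Because RMSE squares the residuals, it penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the size of each actual value. Comparing two models on all three, as the abstract does, guards against a ranking that holds under only one of these weightings.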

2020 ◽  
Vol 9 (11) ◽  
pp. 654
Author(s):  
Guanwei Zhao ◽  
Muzhuang Yang

Mapping population distribution at fine resolutions with high accuracy is crucial to urban planning and management. This paper takes Guangzhou city as the study area and produces a gridded population distribution map using machine learning methods based on a zoning strategy, with multisource geospatial data such as night light remote sensing data, point of interest data, and land use data. The street-level accuracy evaluation results show that the proposed approach achieved good overall accuracy, with a coefficient of determination (R2) of 0.713 and a root mean square error (RMSE) of 5512.9. Meanwhile, the goodness of fit for the single linear regression (LR) model and the single random forest (RF) regression model was 0.0039 and 0.605, respectively. For dense areas, the random forest model is more accurate than the linear regression model, while for sparse areas, the linear regression model is more accurate than the random forest model. The results indicate that the proposed method has great potential in fine-scale population mapping. The authors therefore advise that the zonal modeling strategy be the primary choice for handling regional differences in population distribution mapping research.
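For readers unfamiliar with the goodness-of-fit figure quoted above, the coefficient of determination can be sketched as below. The sketch assumes the common definition R2 = 1 - SS_res / SS_tot; the abstract does not say which variant the authors used, and the street-level counts are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical street-level population counts (not data from the paper).
census = [1000.0, 2000.0, 3000.0, 4000.0]
estimated = [1200.0, 1900.0, 3100.0, 3800.0]

print(r_squared(census, estimated))
```

Under this definition a value near 0 means the model explains little more than the mean of the observations would.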


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Wentao Hu ◽  
...  

Abstract Background: The use of GFR-estimating equations is critical for managing kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could outperform revised linear regression, which is used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop an estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision, and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT), and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy, and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P<0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P<0.01; 19.08 vs 20.60, P<0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P=0.10; 0.8 vs 0.78, P=0.19, respectively). Conclusions: Random forest regression models perform better than revised linear regression models for GFR estimation.
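The abstract does not define bias, precision, or 30% accuracy. The sketch below assumes definitions that are common in GFR-equation studies: bias as the median of estimated minus measured GFR, precision as the interquartile range of those differences, and 30% accuracy as the fraction of estimates within 30% of the measured value. The authors' exact definitions and sign convention may differ, and all values are hypothetical.

```python
from statistics import median, quantiles


def gfr_metrics(measured, estimated):
    """Return (bias, precision, p30) under the assumed definitions above."""
    diffs = [e - m for m, e in zip(measured, estimated)]
    bias = median(diffs)                  # assumed: median difference
    q1, _, q3 = quantiles(diffs, n=4)     # quartile cut points of the differences
    precision = q3 - q1                   # assumed: interquartile range
    within_30 = sum(abs(e - m) <= 0.30 * m for m, e in zip(measured, estimated))
    p30 = within_30 / len(measured)       # assumed: share of estimates within 30%
    return bias, precision, p30


# Hypothetical measured and estimated GFR values (not data from the study).
measured = [60.0, 90.0, 30.0, 100.0, 45.0]
estimated = [66.0, 80.0, 45.0, 98.0, 44.0]

bias, precision, p30 = gfr_metrics(measured, estimated)
print(bias, precision, p30)
```

Under these definitions, smaller absolute bias and smaller precision values are better, while a higher 30% accuracy is better, which matches the direction of the comparisons reported in the abstract.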


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Ming Li ◽  
...  

Abstract Background: The use of GFR-estimating equations is critical for managing kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could outperform revised linear regression, which is used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop an estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision, and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT), and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy, and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P < 0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P < 0.01; 19.08 vs 20.60, P < 0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P = 0.10; 0.8 vs 0.78, P = 0.19, respectively). Conclusions: Random forest regression models perform better than revised linear regression models for GFR estimation.


2016 ◽  
Vol 74 (9) ◽  
pp. 2225-2233 ◽  
Author(s):  
Alaa H. Hawari ◽  
Wael Alnahhal

The impact of flow rate and turbidity on the performance of multi-media filtration has been studied using an artificial neural network (ANN) based model. The ANN model was developed and tested based on experimental data collected from a pilot-scale multi-media filter system. Several ANN models were tested, and the best results with the lowest errors were achieved with two hidden layers and five neurons per layer. To examine the significance and efficiency of the developed ANN model, it was compared with a linear regression model. The R2 values for the actual versus predicted results were 0.9736 and 0.9617 for the ANN model and the linear regression model, respectively. The ANN model showed an R-squared value increase of 1.22% when compared to the linear regression model. In addition, the ANN model reduced the mean absolute error and the root mean square error by 91.5% and 97.9%, respectively, compared with the linear regression model. The proposed model gives plausible results for modeling complex relationships and can be used in real-life water treatment plants.


2021 ◽  
Vol 13 (16) ◽  
pp. 3123
Author(s):  
Chunzhu Wei ◽  
Qianying Zhao ◽  
Yang Lu ◽  
Dongjie Fu

The Pearl River Delta (PRD), one of the most densely populated regions in the world, is facing both natural changes (e.g., sea level rise) and human-induced changes (e.g., dredging for navigation and land reclamation). Bathymetric information is thus important for the protection and management of the estuarine environment, but little effort has been made to comprehensively evaluate the performance of different methods and datasets. In this study, two linear regression models (the linear band model and the log-transformed band ratio model) and two non-linear regression models (the support vector regression model and the random forest regression model) were applied to Landsat 8 (L8) and Sentinel-2 (S2) imagery for bathymetry mapping in 2019 and 2020. Results suggested that a priori area clustering based on spectral features using the K-means algorithm improved estimation accuracy. The random forest regression model performed best, and the three-band combinations outperformed two-band combinations in all models. When the non-linear models were applied with the three-band combination (red, green, blue) to L8 and S2 imagery, the Root Mean Square Error (Mean Absolute Error) decreased by 23.10% (35.53%), and the coefficient of determination (Kling-Gupta efficiency) increased by 0.08 (0.09) on average, compared to those using the linear regression models. Despite the differences in spatial resolution and band wavelength, L8 and S2 performed similarly in bathymetry estimation. This study quantified the relative performance of different models and may shed light on the potential combination of multiple data sources for more timely and accurate bathymetry mapping.
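The Kling-Gupta efficiency mentioned above is less widely known than RMSE or R2. The sketch below assumes the original 2009 formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the Pearson correlation, alpha the ratio of standard deviations, and beta the ratio of means. The abstract does not say which KGE variant was used, and the depth values are hypothetical.

```python
import math
from statistics import mean, pstdev


def kling_gupta_efficiency(observed, simulated):
    """KGE (assumed 2009 form) from correlation, variability ratio, and mean ratio."""
    mu_o, mu_s = mean(observed), mean(simulated)
    sd_o, sd_s = pstdev(observed), pstdev(simulated)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(observed, simulated)) / len(observed)
    r = cov / (sd_o * sd_s)   # Pearson correlation coefficient
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # mean (bias) ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical water depths in meters (not data from the study).
observed_depth = [2.0, 4.0, 6.0, 8.0]
doubled_depth = [4.0, 8.0, 12.0, 16.0]  # perfectly correlated but biased estimate

print(kling_gupta_efficiency(observed_depth, observed_depth))
print(kling_gupta_efficiency(observed_depth, doubled_depth))
```

The second call shows why KGE complements R2: an estimate that is perfectly correlated with the observations but twice as large is still penalized through the alpha and beta terms.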


Author(s):  
Reza Norouzi ◽  
Rasoul Daneshfaraz ◽  
John Abraham ◽  
Parveen Sihag

Drops are the most important and most common energy dissipators in irrigation networks and erodible canals; consequently, their performance must be well understood. This study was designed to evaluate the capability of Artificial Intelligence (AI) methods including ANN, ANFIS, GRNN, SVM, GP, MLR, and LR to predict the relative energy dissipation (∆E/E0) in vertical drops equipped with a horizontal screen. For this study, 108 experiments were carried out to investigate energy dissipation with variable discharge, varying drop height, and varying porosity of the horizontal screens. The parameters yc/h, yd/yc, and p are the input variables, and ∆E/E0 is the output variable. The efficiency of the models was compared using Taylor's diagram, a box plot of the applied error distribution, the correlation coefficient (CC), mean absolute error (MAE), and root-mean-square error (RMSE). Results indicate that the ANFIS_gbellmf-based model, with a CC value of 0.9953, an RMSE value of 0.0069, and an MAE value of 0.0042, was superior to the other applied models. Also, the linear regression model, with CC = 0.9933, RMSE = 0.0083, and MAE = 0.0067, performs better than the multiple linear regression model in this study. Results of a sensitivity study suggest that yc/h is the most effective parameter for predicting ∆E/E0.


Plant Disease ◽  
2021 ◽  
Author(s):  
Yuxiang Zeng ◽  
Junjie Dong ◽  
Zhijuan Ji ◽  
Yan Liang ◽  
Changdeng Yang

Rice sheath blight (SB) disease is a global issue that causes great yield losses each year. To explore whether SB field resistance can be predicted, 273 rice genotypes were inoculated and evaluated for SB field resistance across nine environments (2012-2019) to identify loci associated with SB resistance by association mapping. A total of 80 significant marker-trait associations were detected in nine environments, among which six loci (D130B, D230A, D304B, D309, D427A, and RM409) were repeatedly detected in at least two environments. A linear regression model for predicting SB lesion length was developed using genotypic data of these six loci and SB field resistance data of the 273 rice genotypes: y = 34.44 - 0.56 x, where y is the predicted value of lesion length, and x is the total genotypic value of the six loci. A recombinant inbred line (RIL) population consisting of 219 lines, grown in six environments (from 2013 to 2018) for evaluation of SB field resistance, was used to check the accuracy of the prediction model. The average absolute error between the predicted lesion length and real lesion length for the RIL population was 6.67 cm. The absolute errors between predicted and real lesion lengths were below 6 cm for 51.22% of the lines and below 9 cm for 71.22% of the lines. An SB visual rating prediction model was also developed; the average absolute error between the predicted visual rating and real visual rating for the RIL population was 0.94. These results indicate that rice SB lesion length can be predicted by a linear regression model developed using both genotypic and phenotypic data.
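The reported equation and the two accuracy summaries (average absolute error, and the share of lines below an error threshold) can be applied as in the sketch below. Only the coefficients 34.44 and 0.56 come from the abstract. The genotypic values and observed lesion lengths are hypothetical, and how the total genotypic value is scored is not described here.

```python
def predict_lesion_length(total_genotypic_value):
    """Predicted SB lesion length (cm) from the reported model y = 34.44 - 0.56 x."""
    return 34.44 - 0.56 * total_genotypic_value


def average_absolute_error(observed, predicted):
    """Mean of |observed - predicted| across lines."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def share_below(observed, predicted, threshold_cm):
    """Fraction of lines whose absolute error is below the threshold."""
    hits = sum(abs(o - p) < threshold_cm for o, p in zip(observed, predicted))
    return hits / len(observed)


# Hypothetical lines: total genotypic values and observed lesion lengths (cm).
genotypic_values = [0.0, 6.0, 12.0]
observed_lengths = [30.0, 35.0, 25.0]

predicted_lengths = [predict_lesion_length(x) for x in genotypic_values]
print(predicted_lengths)
print(average_absolute_error(observed_lengths, predicted_lengths))
print(share_below(observed_lengths, predicted_lengths, 6.0))
```

The negative slope means that a higher total genotypic value predicts a shorter lesion.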


Author(s):  
Aliva Bera ◽  
D.P. Satapathy

In this paper, a linear regression model using ANN and a linear regression model using MS Excel were developed to estimate physico-chemical concentrations in groundwater, using pH, EC, TDS, TH, and HCO3 as input parameters and Ca, Mg, and K as output parameters. A comparison indicated that the ANN model was better able to estimate the physico-chemical concentrations in groundwater. An analytical survey, along with simulation-based tests, was carried out to determine climatic change and its effect on agriculture and water bodies in the Angul-Talcher area. Various seasonal parameters such as pH, BOD, COD, TDS, and TSS, along with the concentrations of heavy elements such as Pb, Cd, Zn, Cu, Fe, and Mn in water resources, were analyzed. Rainfall data for the past 30 years were analyzed, and water quality index values were studied to identify normal and abnormal quality of water resources; MATLAB-based simulation was used for performance analysis. All results were analyzed, and the condition was found to be stable.

