Urban Population Distribution Mapping with Multisource Geospatial Data Based on Zonal Strategy

Mapping population distribution at fine resolutions with high accuracy is crucial to urban planning and management. Taking Guangzhou city as the study area, this paper produces a gridded population distribution map using machine learning methods based on a zoning strategy with multisource geospatial data such as night light remote sensing data, point of interest data, and land use data. The street-level accuracy evaluation shows that the proposed approach achieved good overall accuracy, with a coefficient of determination (R²) of 0.713 and a root mean square error (RMSE) of 5512.9. By comparison, the goodness of fit for the single linear regression (LR) model and the random forest (RF) regression model is 0.0039 and 0.605, respectively. For dense areas, the random forest model is more accurate than the linear regression model, while for sparse areas, the linear regression model is more accurate than the random forest model. The results indicate that the proposed method has great potential in fine-scale population mapping. It is therefore advised that the zonal modeling strategy be the primary choice for addressing regional differences in population distribution mapping research.
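The two accuracy measures quoted in this abstract, and the idea of routing each area to a zone-specific model, can be illustrated with a minimal standard-library sketch. The sample numbers, the function names, and the pairing of one model per zone are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def zonal_predict(samples, dense_model, sparse_model):
    """Route each (features, is_dense) sample to the model for its zone.

    Both models are plain callables here; which model serves which zone
    is an assumption of this sketch.
    """
    return [
        dense_model(features) if is_dense else sparse_model(features)
        for features, is_dense in samples
    ]

# Hypothetical street-level census counts and model estimates.
census = [12000.0, 45000.0, 8000.0, 30000.0]
estimate = [15000.0, 40000.0, 9000.0, 28000.0]

print(f"RMSE = {rmse(census, estimate):.1f}")
print(f"R^2  = {r_squared(census, estimate):.3f}")
```

Because RMSE is expressed in the units of the target (people per street unit here), it is only comparable across models evaluated on the same set of areas.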


Applying Artificial Neural Networks. I. Estimating Nicotine in Tobacco from near Infrared Data

Journal of Near Infrared Spectroscopy ◽

10.1255/jnirs.64 ◽

1995 ◽

Vol 3 (3) ◽

pp. 133-142 ◽

Cited By ~ 10

Author(s):

M. Hana ◽

W.F. McClure ◽

T.B. Whitaker ◽

M. White ◽

D.R. Bahler

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Near Infrared ◽

Back Propagation ◽

Linear Network ◽

Data Set ◽

Input Layer ◽

Propagation Network ◽

Better Than

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared with the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, each having 840 spectral data points. The fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the linear regression model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network. The linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than that of the MLR and linear network by 35.14%.
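The compression step described for data set B can be sketched as keeping only the lowest-frequency Fourier coefficients of each spectrum. The sketch below uses a direct discrete Fourier transform rather than an FFT, and how the paper counts its 13 coefficients (complex values, or separate real and imaginary parts) is not stated in the abstract, so treating them as the first 13 complex coefficients is an assumption. The synthetic spectrum is hypothetical.

```python
import cmath
import math

def dft_coefficients(signal, n_keep):
    """Return the first n_keep complex DFT coefficients of a real signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n_keep)
    ]

def reconstruct(coeffs, n):
    """Rebuild a length-n real signal from its low-frequency coefficients.

    The discarded negative frequencies are recovered through the conjugate
    symmetry of a real signal, which requires len(coeffs) < n / 2.
    """
    out = []
    for t in range(n):
        value = coeffs[0].real
        for k in range(1, len(coeffs)):
            value += 2 * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(value / n)
    return out

# Hypothetical smooth spectrum with 840 points, as in data set B.
n_points = 840
spectrum = [1.0 + 0.5 * math.cos(2 * math.pi * 3 * t / n_points) for t in range(n_points)]

compressed = dft_coefficients(spectrum, 13)
restored = reconstruct(compressed, n_points)
max_error = max(abs(a - b) for a, b in zip(spectrum, restored))
```

A smooth spectrum is reproduced almost exactly from a handful of coefficients, which is why such a compression can shrink the network's input layer from 840 values to a few.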


Modified One-Parameter Liu Estimator for the Linear Regression Model

Modelling and Simulation in Engineering ◽

10.1155/2020/9574304 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

Adewale F. Lukman ◽

B. M. Golam Kibria ◽

Kayode Ayinde ◽

Segun L. Jegede

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Ridge Regression ◽

Real Life ◽

Liu Estimator ◽

Real Life Data ◽

Squared Prediction Error ◽

Simulation Results ◽

Better Than

Motivated by the ridge regression (Hoerl and Kennard, 1970) and Liu (1993) estimators, this paper proposes a modified Liu estimator to solve the multicollinearity problem for the linear regression model. This modification places the estimator in the class of the ridge and Liu estimators with a single biasing parameter. Theoretical comparisons, a real-life application, and simulation results show that it consistently dominates the usual Liu estimator. Under some conditions, it performs better than the ridge regression estimators in the smaller MSE sense. Two real-life data sets are analyzed to illustrate the findings of the paper, and the performances of the estimators are assessed by MSE and the mean squared prediction error. The application result agrees with the theoretical and simulation results.
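For reference, the standard textbook forms of the ordinary least squares, ridge, and Liu estimators for the model $y = X\beta + \varepsilon$ are given below, together with the scalar MSE criterion used to compare them. The notation ($k$, $d$, $\tilde{\beta}$) is chosen here for illustration; the formula of the modified estimator proposed in the paper is not given in the abstract and is not reproduced.

```latex
\begin{align}
\hat{\beta}_{\mathrm{OLS}} &= (X^{\top}X)^{-1}X^{\top}y, \\
\hat{\beta}_{\mathrm{ridge}}(k) &= (X^{\top}X + kI)^{-1}X^{\top}y, \qquad k > 0, \\
\hat{\beta}_{\mathrm{Liu}}(d) &= (X^{\top}X + I)^{-1}(X^{\top}X + dI)\,\hat{\beta}_{\mathrm{OLS}}, \qquad 0 < d < 1, \\
\mathrm{MSE}(\tilde{\beta}) &= \operatorname{tr}\operatorname{Cov}(\tilde{\beta})
  + \bigl(E[\tilde{\beta}] - \beta\bigr)^{\top}\bigl(E[\tilde{\beta}] - \beta\bigr).
\end{align}
```

Both biased estimators trade a nonzero bias term for a smaller variance term, which is what makes a lower MSE than OLS possible when $X^{\top}X$ is ill-conditioned.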


CASE-BASED SOFTWARE QUALITY PREDICTION

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194000000092 ◽

2000 ◽

Vol 10 (02) ◽

pp. 139-152 ◽

Cited By ~ 37

Author(s):

K. GANESAN ◽

TAGHI M. KHOSHGOFTAAR ◽

EDWARD B. ALLEN

Keyword(s):

Linear Regression ◽

Regression Model ◽

Software Quality ◽

Linear Regression Model ◽

Multiple Linear Regression Model ◽

Case Based Reasoning ◽

Quality Prediction ◽

Software Quality Prediction ◽

Case Based ◽

Better Than

Highly reliable software is becoming an essential ingredient in many systems. However, assuring reliability often entails time-consuming costly development processes. One cost-effective strategy is to target reliability-enhancement activities to those modules that are likely to have the most problems. Software quality prediction models can predict the number of faults expected in each module early enough for reliability enhancement to be effective. This paper introduces a case-based reasoning technique for the prediction of software quality factors. Case-based reasoning is a technique that seeks to answer new problems by identifying similar "cases" from the past. A case-based reasoning system can function as a software quality prediction model. To our knowledge, this study is the first to use case-based reasoning systems for predicting quantitative measures of software quality. A case study applied case-based reasoning to software quality modeling of a family of full-scale industrial software systems. The case-based reasoning system's accuracy was much better than a corresponding multiple linear regression model in predicting the number of design faults. When predicting faults in code, its accuracy was significantly better than a corresponding multiple linear regression model for two of three test data sets and statistically equivalent for the third.
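The idea of answering a new problem from similar past cases can be sketched as a nearest-neighbour lookup over a case library. The Euclidean similarity measure, the averaging of fault counts, the choice of k, and the sample metrics are all assumptions of this sketch; the abstract does not specify the system's similarity function or how it combines retrieved cases.

```python
import math

def predict_faults(case_library, new_module, k=3):
    """Average the fault counts of the k past modules nearest to new_module.

    case_library is a list of (metrics_vector, fault_count) pairs.
    """
    ranked = sorted(case_library, key=lambda case: math.dist(case[0], new_module))
    nearest = ranked[:k]
    return sum(faults for _, faults in nearest) / len(nearest)

# Hypothetical past cases: (lines of code, cyclomatic complexity) -> faults found.
library = [
    ((120.0, 4.0), 2),
    ((900.0, 25.0), 14),
    ((150.0, 6.0), 3),
    ((850.0, 22.0), 11),
    ((400.0, 10.0), 5),
]

estimate = predict_faults(library, (130.0, 5.0), k=2)
```

In practice the metrics would usually be rescaled first, since an unscaled distance is dominated by whichever metric has the largest numeric range.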


Applying Random Forest Model Algorithm to GFR Estimation

10.21203/rs.3.rs-74843/v1 ◽

2020 ◽

Author(s):

Peijia Liu ◽

Dong Yang ◽

Shaomin Li ◽

Yutian Chong ◽

Wentao Hu ◽

...

Keyword(s):

Random Forest ◽

Kidney Disease ◽

Linear Regression ◽

Regression Model ◽

Regression Models ◽

Random Forest Regression ◽

Variable Model ◽

Data Set ◽

Development Data ◽

Better Than

Abstract Background The utilization of estimating-GFR equations is critical for kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could go beyond revised linear regression, which is used to build the CKD-EPI equation. Methods A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop the estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision and root mean square error (RMSE). Results Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT) and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models surpassed that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P<0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P<0.01; 19.08 vs 20.60, P<0.001). Bias and 30% accuracy were preferable, but the results were not statistically significant (0.34 vs 2.07, P=0.10; 0.8 vs 0.78, P=0.19, respectively). Conclusions Random forest regression models perform better than revised linear regression models for GFR estimation.
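The four performance measures named in this abstract can be sketched as follows. The abstract does not define them, so this sketch assumes definitions often used in GFR studies: bias as the median difference between estimated and measured GFR, precision as the interquartile range of that difference, and 30% accuracy as the share of estimates within 30% of the measured value. The sample values are hypothetical.

```python
import math
import statistics

def gfr_metrics(measured, estimated):
    """Bias, precision, 30% accuracy and RMSE under the assumed definitions."""
    diffs = [e - m for m, e in zip(measured, estimated)]
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    within_30 = sum(abs(e - m) <= 0.3 * m for m, e in zip(measured, estimated))
    return {
        "bias": statistics.median(diffs),          # median of (estimated - measured)
        "precision": q3 - q1,                      # interquartile range of the differences
        "p30": within_30 / len(measured),          # share of estimates within +/- 30%
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }

# Hypothetical measured and estimated GFR values (mL/min/1.73 m^2).
measured_gfr = [100.0, 60.0, 30.0, 90.0, 45.0]
estimated_gfr = [110.0, 50.0, 45.0, 95.0, 40.0]

metrics = gfr_metrics(measured_gfr, estimated_gfr)
```

Under these definitions, lower is better for bias (in absolute value), precision and RMSE, while higher is better for 30% accuracy, which matches the direction of the comparisons reported above.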


A New Ridge-Type Estimator for the Linear Regression Model: Simulations and Applications

Scientifica ◽

10.1155/2020/9758378 ◽

2020 ◽

Vol 2020 ◽

pp. 1-16 ◽

Cited By ~ 6

Author(s):

B. M. Golam Kibria ◽

Adewale F. Lukman

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Ridge Regression ◽

Real Life ◽

Economic Data ◽

Nonlinear Regression Models ◽

Shrinkage Methods ◽

Regression Estimators ◽

Better Than

The ridge regression-type (Hoerl and Kennard, 1970) and Liu-type (Liu, 1993) estimators are consistently attractive shrinkage methods to reduce the effects of multicollinearity for both linear and nonlinear regression models. This paper proposes a new estimator to solve the multicollinearity problem for the linear regression model. Theory and simulation results show that, under some conditions, it performs better than both Liu and ridge regression estimators in the smaller MSE sense. Two real-life (chemical and economic) data sets are analyzed to illustrate the findings of the paper.


Applying Random Forest Model Algorithm to GFR estimation

10.21203/rs.3.rs-22422/v1 ◽

2020 ◽

Author(s):

Peijia Liu ◽

Dong Yang ◽

Shaomin Li ◽

Yutian Chong ◽

Ming Li ◽

...

Keyword(s):

Random Forest ◽

Kidney Disease ◽

Linear Regression ◽

Regression Model ◽

Regression Models ◽

Random Forest Regression ◽

Variable Model ◽

Data Set ◽

Development Data ◽

Better Than

Abstract Background The utilization of estimating-GFR equations is critical for kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could go beyond revised linear regression, which is used to build the CKD-EPI equation. Methods A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop the estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision and root mean square error (RMSE). Results Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT) and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models surpassed that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P < 0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P < 0.01; 19.08 vs 20.60, P < 0.001). Bias and 30% accuracy were preferable, but the results were not statistically significant (0.34 vs 2.07, P = 0.10; 0.8 vs 0.78, P = 0.19, respectively). Conclusions Random forest regression models perform better than revised linear regression models for GFR estimation.


An Adaptive Random Forest Model for Predicting Demands and Solar Power of a Real Integrated Energy System

10.36227/techrxiv.17149367 ◽

2021 ◽

Author(s):

Jie Mei ◽

Christopher Lee ◽

James L. Kirtley

Keyword(s):

Random Forest ◽

Solar Power ◽

Energy Systems ◽

Energy System ◽

Random Forest Model ◽

Operation Optimization ◽

Forest Model ◽

Energy Networks ◽

Integrated Energy System ◽

Better Than

In order to address the challenges of improving energy efficiency and integration of renewable energy, multi-energy systems, composed of electric, natural gas, heat and other energy networks, have received more and more attention in recent years and have been rapidly developed. Through integration as a multi-energy system, different energy infrastructures can be scheduled and managed as one unit. One of the main stages in the optimal scheduling of a multi-energy system is the predictions of various demands and sustainable energy in the scheduling horizon. This paper proposes a prediction model based on adaptive random forest for demands and solar power of a real MES, Stone Edge Farm, in California. The adaptive random forest model can provide a probability distribution of the prediction results. This allows users to consider a variety of scenarios that may occur in the future for further system operation optimization and help users evaluate the reliability of the results. Besides, an online self-adaptability feature is implemented with the model so it can adapt to the new forecasting environment when new observations are detected. The simulations show that the adaptive random forest model is better than the benchmark models in terms of prediction accuracy.
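One simple way an ensemble can yield a distribution rather than a single forecast is to treat the individual tree outputs as an empirical sample and read quantiles from it. The abstract does not say how the adaptive random forest obtains its distribution, so this approach, the function name, and the sample values below are assumptions made purely for illustration.

```python
import statistics

def prediction_interval(tree_predictions, lower=0.05, upper=0.95):
    """Return (lower quantile, median, upper quantile) of per-tree predictions.

    Quantiles use linear interpolation between the sorted values.
    """
    ordered = sorted(tree_predictions)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return quantile(lower), statistics.median(ordered), quantile(upper)

# Hypothetical solar power forecasts (kW) from five trees for one time step.
per_tree_kw = [10.0, 12.0, 11.0, 14.0, 13.0]
low, mid, high = prediction_interval(per_tree_kw, lower=0.25, upper=0.75)
```

A scheduler could then plan against the lower and upper values as pessimistic and optimistic scenarios instead of relying on the central forecast alone.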


The Performance of Some Restricted Estimators In Restricted Linear Regression Model

Al-Qadisiyah Journal Of Pure Science ◽

10.29350/qjps.2021.26.2.1287 ◽

2021 ◽

Vol 26 (2) ◽

Author(s):

Bader Aboud ◽

Mustafa Ismaeel Naif

Keyword(s):

Linear Regression ◽

Regression Model ◽

Simulation Study ◽

Linear Regression Model ◽

Regression Estimator ◽

Mean Square ◽

Ridge Regression Estimator ◽

The Mean ◽

Biased Estimators ◽

Better Than

In the linear regression model, restricted biased estimation is one of the important methods for addressing the high variance and multicollinearity problems. In this paper, we conduct a simulation study of some restricted biased estimators. The mean square error (MSE) criterion is used to compare them. According to the simulation study, the restricted modified unbiased ridge regression estimator (RMUR) proposed by Bader and Alheety (2020) performs better than the other estimators. A numerical example is considered to illustrate the performance of the estimators.


An Adaptive Random Forest Model for Predicting Demands and Solar Power of a Real Integrated Energy System

10.36227/techrxiv.17149367.v1 ◽

2021 ◽

Author(s):

Jie Mei ◽

Christopher Lee ◽

James L. Kirtley

Keyword(s):

Random Forest ◽

Solar Power ◽

Energy Systems ◽

Energy System ◽

Random Forest Model ◽

Operation Optimization ◽

Forest Model ◽

Energy Networks ◽

Integrated Energy System ◽

Better Than


The statistical importance of P-POSSUM scores for predicting mortality after emergency laparotomy in geriatric patients

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-1100-9 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Yang Cao ◽

Gary A. Bass ◽

Rebecka Ahl ◽

Arvid Pourlotfi ◽

Håkan Geijer ◽

...

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Regression Model ◽

Mortality Risk ◽

Logistic Regression Model ◽

Geriatric Patients ◽

Random Forest Model ◽

Emergency Laparotomy ◽

Relative Importance ◽

Forest Model

Abstract Background Geriatric patients frequently undergo emergency general surgery and accrue a greater risk of postoperative complications and fatal outcomes than the general population. It is highly relevant to develop the most appropriate care measures and to guide patient-centered decision-making around end-of-life care. Portsmouth - Physiological and Operative Severity Score for the enumeration of Mortality and morbidity (P-POSSUM) has been used to predict mortality in patients undergoing different types of surgery. In the present study, we aimed to evaluate the relative importance of the P-POSSUM score for predicting 90-day mortality in the elderly subjected to emergency laparotomy from statistical aspects. Methods One hundred and fifty-seven geriatric patients aged ≥65 years undergoing emergency laparotomy between January 1st, 2015 and December 31st, 2016 were included in the study. Mortality and 27 other patient characteristics were retrieved from the computerized records of Örebro University Hospital in Örebro, Sweden. Two supervised classification machine methods (logistic regression and random forest) were used to predict the 90-day mortality risk. Three scalers (Standard scaler, Robust scaler and Min-Max scaler) were used for variable engineering. The performance of the models was evaluated using accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC). The importance of the predictors was evaluated using permutation variable importance and Gini importance. Results The mean age of the included patients was 75.4 years (standard deviation = 7.3 years) and the 90-day mortality rate was 29.3%. The most common indication for surgery was bowel obstruction, occurring in 92 (58.6%) patients. The frequency of the different types of post-operative complications ranged from 7.0% to 36.9%, with infection being the most common type. Both the logistic regression and random forest models showed satisfactory performance for predicting 90-day mortality risk in geriatric patients after emergency laparotomy, with AUCs of 0.88 and 0.93, respectively. Both models had an accuracy > 0.8 and a specificity ≥0.9. P-POSSUM had the greatest relative importance for predicting 90-day mortality in the logistic regression model and was the fifth most important predictor in the random forest model. No notable change was found in sensitivity analysis using different variable engineering methods, with P-POSSUM being among the five most accurate variables for mortality prediction. Conclusion P-POSSUM is important for predicting 90-day mortality after emergency laparotomy in geriatric patients. The logistic regression model and random forest model may have an accuracy of > 0.8 and an AUC around 0.9 for predicting 90-day mortality. Further validation of the variables’ importance and the models’ robustness is needed using a larger dataset.
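The three scalers named in the Methods can be sketched with the standard library using the definitions commonly associated with those names: centring on the mean and dividing by the standard deviation, centring on the median and dividing by the interquartile range, and mapping the range onto [0, 1]. Whether the study used exactly these definitions is an assumption, and the sample values are hypothetical.

```python
import statistics

def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def robust_scale(values):
    """Subtract the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

# Hypothetical predictor with one extreme value.
# A constant column would make every divisor zero and is not handled here.
lab_values = [1.0, 2.0, 3.0, 4.0, 100.0]

standardised = standard_scale(lab_values)
robust = robust_scale(lab_values)
min_max = min_max_scale(lab_values)
```

The extreme value compresses the other four points toward zero under min-max scaling, while the robust version keeps them evenly spaced; a sensitivity analysis across scalers checks whether such differences change the conclusions.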
