A COMPARISON OF SINGLE SAMPLE AND CROSS-VALIDATION METHODS FOR ESTIMATING THE MEAN SQUARED ERROR OF PREDICTION IN MULTIPLE LINEAR REGRESSION

Village welfare and development can be advanced through sound village financial management. The village government has authority ranging from planning and implementation to reporting and accountability. Two variables are central to village finances: village income and village expenditure. The village budget is a plan compiled systematically, and planning depends on prediction, that is, an indication of what is supposed to happen and of what will happen. Good village budget planning therefore requires a budget prediction feature. Here the prediction feature is built with data mining, using a multiple linear regression model. The sample was selected by purposive sampling and comprises 29 villages. The dependent variable is village expenditure (Y), and the independent variables are village funds (X1) and village funding allocation (X2). The best validation results were obtained in the 3rd fold, with a correlation coefficient of 0.8907, a Mean Absolute Error of 87209395.37, a Root Mean Squared Error of 114867675.6, a Relative Absolute Error (RAE) of 42%, and a Root Relative Squared Error of 44%.
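
The abstract reports four error measures but does not define them. The sketch below computes them under the commonly used definitions, which are an assumption here: the two relative measures compare the model's errors with those of a baseline that always predicts the mean of the observed values. The function name `regression_errors` and its dictionary keys are illustrative and do not come from the paper.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, RMSE, RAE and RRSE for paired observations and predictions.

    Assumed definitions (not stated in the abstract): RAE and RRSE are
    relative to a baseline that always predicts the mean of y_true.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    mean_y = sum(y_true) / n

    # Errors of the model.
    abs_err = sum(abs(y - p) for y, p in zip(y_true, y_pred))
    sq_err = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

    # Errors of the mean-only baseline.
    abs_base = sum(abs(y - mean_y) for y in y_true)
    sq_base = sum((y - mean_y) ** 2 for y in y_true)
    if sq_base == 0:
        raise ValueError("relative errors are undefined when y_true is constant")

    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / abs_base,               # multiply by 100 for a percentage
        "rrse": math.sqrt(sq_err / sq_base),     # multiply by 100 for a percentage
    }
```

Under these definitions, an RAE of 42% would mean the model's total absolute error is 42% of the baseline's, so values below 100% indicate an improvement over predicting the mean.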

CV-α: designing validations sets to increase the precision and enable multiple comparison tests in genomic prediction

10.1101/2020.11.11.376343 ◽

2020 ◽

Author(s):

Rafael Massahiro Yassue ◽

José Felipe Gonzaga Sabadin ◽

Giovanni Galli ◽

Filipe Couto Alves ◽

Roberto Fritsche-Neto

Keyword(s):

Genomic Prediction ◽

Cross Validation ◽

Prediction Models ◽

Mean Squared Error ◽

Predictive Ability ◽

Proof Of Concept ◽

Squared Error ◽

High Effect ◽

The Mean ◽

Fold Cross Validation

Usually, the comparison among genomic prediction models is based on validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of training and validation sets strongly affects how, and how subjectively, models are compared. In the procedures cited above, observations overlap across replicates, which may lead to overestimation and a lack of residual independence due to resampling, and thus to less accurate results. Furthermore, post hoc tests, such as ANOVA, are not recommended because the assumption of independent residuals is not met. Thus, we propose a new way to sample observations to build training and validation sets based on a cross-validation alpha-based design (CV-α). The CV-α was meant to create several validation scenarios (replicates x folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as proof of concept, we could use ANOVA to compare the proposed methodology with RRS and K-fold, applying four genomic prediction models to a simulated and a real dataset. Concerning predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds neither cost nor complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
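
The overlap the abstract attributes to repeated K-fold can be made concrete with a small sketch. It does not implement CV-α, whose construction the abstract does not describe. It only counts, for ordinary repeated K-fold with independent random splits, how often each pair of items lands in the same fold across replicates. The function names, the round-robin fold assignment and the seed are all illustrative assumptions.

```python
import random
from itertools import combinations


def kfold_assignment(n_items, k, rng):
    """Randomly assign items 0..n_items-1 to k folds of near-equal size."""
    order = list(range(n_items))
    rng.shuffle(order)
    fold_of = [0] * n_items
    for position, item in enumerate(order):
        fold_of[item] = position % k  # deal shuffled items round-robin into folds
    return fold_of


def shared_fold_counts(n_items, k, n_replicates, seed=0):
    """For every pair of items, count the replicates in which they share a fold."""
    rng = random.Random(seed)
    counts = {pair: 0 for pair in combinations(range(n_items), 2)}
    for _ in range(n_replicates):
        fold_of = kfold_assignment(n_items, k, rng)
        for a, b in counts:
            if fold_of[a] == fold_of[b]:
                counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    counts = shared_fold_counts(n_items=20, k=5, n_replicates=4)
    repeated = sum(1 for c in counts.values() if c > 1)
    print(f"pairs sharing a fold in more than one replicate: {repeated}")
```

A design that lowers these pair counts, as the abstract reports for CV-α, would reduce the dependence between replicates that this sketch exposes.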

A Novel S-Regression Model on an Auto Price

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.29.14282 ◽

2018 ◽

Vol 7 (2.29) ◽

pp. 912

Author(s):

Fadzilah Salim ◽

Nur Azman Abu

Keyword(s):

Linear Regression ◽

Regression Model ◽

Mean Squared Error ◽

Squared Error ◽

Curve Model ◽

Used Car ◽

Independent Variable ◽

The Mean ◽

Used Cars ◽

The Relationship

A simple linear regression model is useful as a prediction model. A general linear regression with more than one independent variable is still not popular. A nonlinear regression can readily produce a better predictive model, but it is difficult to construct. The objective of this paper is to propose a technique for predicting the price of used cars in Malaysia using an S-shaped curve model. In this paper, the S-shaped Membership Function (SMF) is used as the basis to develop a novel S-Regression model. Linear regression, cubic regression and S-Regression are compared on used car prices. The mean squared error of the S-Regression model is found to be closer to that of cubic regression than to that of linear regression. The S-Regression model is found to be quite suitable for representing the relationship between the price of a used car and its make year. The result demonstrates that the S-Regression model gives a better and more practical estimate of the price of a used car in Malaysia.
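
The abstract does not give the S-Regression equation. For orientation only, the sketch below shows the S-shaped membership function in its widely used piecewise-quadratic form, which is assumed here to be the SMF the abstract refers to. The parameter names `a` and `b` (the points where the curve leaves 0 and reaches 1) are illustrative, and how the paper scales or fits this curve to car prices is not stated.

```python
def s_membership(x, a, b):
    """S-shaped membership function rising smoothly from 0 at a to 1 at b.

    Assumed standard piecewise-quadratic form; not taken from the paper.
    """
    if a >= b:
        raise ValueError("a must be smaller than b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    midpoint = (a + b) / 2.0
    if x <= midpoint:
        # Convex half: accelerating rise from 0 to 0.5.
        return 2.0 * ((x - a) / (b - a)) ** 2
    # Concave half: decelerating rise from 0.5 to 1.
    return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2
```

With make year as `x`, a curve of this shape stays flat for very old cars, rises fastest near the midpoint, and levels off for the newest cars, which is the kind of relationship the abstract describes.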

On the Stochastic Restricted r-k Class Estimator and Stochastic Restricted r-d Class Estimator in Linear Regression Model

Journal of Applied Mathematics ◽

10.1155/2014/173836 ◽

2014 ◽

Vol 2014 ◽

pp. 1-6 ◽

Cited By ~ 3

Author(s):

Jibo Wu

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Mean Squared Error ◽

Multiple Linear Regression Model ◽

Error Matrix ◽

Squared Error ◽

The Mean ◽

Linear Restrictions ◽

Theoretical Results

The stochastic restricted r-k class estimator and stochastic restricted r-d class estimator are proposed for the vector of parameters in a multiple linear regression model with stochastic linear restrictions. The mean squared error matrix of the proposed estimators is derived and compared, and some properties of the proposed estimators are also discussed. Finally, a numerical example is given to show some of the theoretical results.

Estimation of the Monthly Dynamics of Surface Water in Wetlands from Satellite and Secondary Hydro-Climatological Data

Remote Sensing ◽

10.3390/rs13122380 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2380

Author(s):

Antonio-Juan Collados-Lara ◽

Eulogio Pardo-Igúzquiza ◽

David Pulido-Velazquez ◽

Leticia Baena-Ruiz

Keyword(s):

Surface Water ◽

Regression Models ◽

Cross Validation ◽

Mean Squared Error ◽

Climatological Data ◽

Explanatory Variables ◽

Squared Error ◽

Historical Series ◽

Novel Method ◽

The Mean

Satellites produce valuable information for studying the surface water in wetlands, but in many cases the period covered, the spatial resolution and/or the revisit frequency are not sufficient to produce long historical series. In this paper we propose a novel method which uses regression models that include climatic and hydrological variables to complete the satellite information. We used this method in the Lagunas de Ruidera wetland (Spain). We approximated the monthly dynamics of the surface water for a long period (1984–2015). Information from LANDSAT (30-m resolution) and MODIS (250-m resolution) satellites was tested but, due to the size of some lagoons, only the LANDSAT approach produced satisfactory results. An ensemble of regression models based on hydro-climatological explanatory variables was defined to fill the gaps in the monthly surface water. It showed a root mean squared error of around 476 pixels (0.4 km²) in the cross-validation analysis. Our analysis showed that the explanatory variables that contribute most to the regression ensemble are the aquifer discharge, the effective precipitation and the surface water from the previous month. From January to June, the mean surface water in Lagunas de Ruidera is around 4.3 km². In summer a reduction of around 13% of the surface water can be observed, which is recovered during the autumn.

Linearized Ridge Regression Estimator Under the Mean Squared Error Criterion in a Linear Regression Model

Communications in Statistics - Simulation and Computation ◽

10.1080/03610918.2011.575506 ◽

2011 ◽

Vol 40 (9) ◽

pp. 1434-1443 ◽

Cited By ~ 6

Author(s):

Feng Gao ◽

Xu-Qing Liu

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Ridge Regression ◽

Mean Squared Error ◽

Regression Estimator ◽

Error Criterion ◽

Squared Error ◽

Ridge Regression Estimator ◽

The Mean

Estimators of the Mean Squared Error of Prediction in Linear Regression

Technometrics ◽

10.1080/00401706.1984.10487940 ◽

1984 ◽

Vol 26 (2) ◽

pp. 145-155 ◽

Cited By ~ 9

Author(s):

O. Bunke ◽

B. Droge

Keyword(s):

Linear Regression ◽

Mean Squared Error ◽

Squared Error ◽

The Mean

Two Classes of Almost Unbiased Type Principal Component Estimators in Linear Regression Model

Journal of Applied Mathematics ◽

10.1155/2014/639070 ◽

2014 ◽

Vol 2014 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Yalian Li ◽

Hu Yang

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Mean Squared Error ◽

Principal Component ◽

Monte Carlo Simulation Study ◽

Error Matrix ◽

Squared Error ◽

The Mean ◽

Almost Unbiased

This paper is concerned with parameter estimation in the linear regression model. To overcome the multicollinearity problem, two new classes of estimators are proposed: the almost unbiased ridge-type principal component estimator (AURPCE) and the almost unbiased Liu-type principal component estimator (AULPCE). The mean squared error matrix of the proposed estimators is derived and compared, and some properties of the proposed estimators are also discussed. Finally, a Monte Carlo simulation study is given to illustrate the performance of the proposed estimators.
