A COMPARISON OF SINGLE SAMPLE AND CROSS-VALIDATION METHODS FOR ESTIMATING THE MEAN SQUARED ERROR OF PREDICTION IN MULTIPLE LINEAR REGRESSION

Author(s):  
M. W. Browne
Kursor ◽  
2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Nisa Hanum Harani ◽  
Hanna Theresia Siregar ◽  
Cahyo Prianto

Village welfare and village development can be advanced through sound management of village finances. The village government has authority over planning, implementation, reporting, and accountability. Two variables are central to village finances: village income and village expenditure. The village budget is a plan that is compiled systematically. Planning is tied to prediction: a plan indicates what is supposed to happen, while a prediction concerns what will happen. Good village budget planning therefore requires a budget prediction feature. This feature is built with data mining, modeled using a multiple linear regression algorithm. The sample was selected using a purposive sampling technique and comprises 29 villages. The dependent variable is village expenditure (Y), and the independent variables are village funds (X1) and village funding allocation (X2). The best validation results were obtained in the 3rd fold, with a correlation coefficient of 0.8907, a Mean Absolute Error of 87209395.37, a Root Mean Squared Error of 114867675.6, a Relative Absolute Error (RAE) of 42%, and a Root Relative Squared Error of 44%.
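
As a rough illustration of the workflow this abstract describes, the sketch below runs k-fold cross-validation on a two-predictor linear regression and reports the same five metrics per fold. It is not the authors' implementation: the data are synthetic (only the sample size of 29 is taken from the abstract), the function names are hypothetical, k = 5 is an arbitrary choice, and the relative errors are assumed to be measured against a baseline that always predicts the training-set mean.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def regression_metrics(y_true, y_pred, baseline):
    """Correlation, MAE, RMSE, and errors relative to a constant baseline."""
    err = y_true - y_pred
    base = y_true - baseline
    return {
        "r": np.corrcoef(y_true, y_pred)[0, 1],
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "rae": np.sum(np.abs(err)) / np.sum(np.abs(base)),
        "rrse": np.sqrt(np.sum(err ** 2) / np.sum(base ** 2)),
    }

def cross_validate(X, y, k=5, seed=0):
    """Fit on k-1 folds, evaluate on the held-out fold, for each fold."""
    folds = kfold_indices(len(y), k, seed)
    results = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_ols(X[train_idx], y[train_idx])
        y_pred = predict(coef, X[test_idx])
        # Assumption: the baseline predictor is the training-set mean of y.
        results.append(regression_metrics(y[test_idx], y_pred, y[train_idx].mean()))
    return results

# Synthetic stand-in data: 29 observations, two predictors (X1, X2).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(29, 2))
y = 3.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, size=29)

results = cross_validate(X, y, k=5)
for fold, m in enumerate(results, start=1):
    print(fold, {name: round(float(v), 4) for name, v in m.items()})
```

Reporting only the best fold, as the abstract does, tends to give an optimistic picture; averaging the per-fold metrics is the more common summary.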


2020 ◽  
Author(s):  
Rafael Massahiro Yassue ◽  
José Felipe Gonzaga Sabadin ◽  
Giovanni Galli ◽  
Filipe Couto Alves ◽  
Roberto Fritsche-Neto

Usually, the comparison among genomic prediction models is based on validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of training and validation sets strongly affects how, and how subjectively, models are compared. The procedures cited above overlap across replicates, which might cause overestimation and a lack of residual independence due to resampling issues, and might lead to less accurate results. Furthermore, post hoc tests, such as ANOVA, are not recommended because the assumption of residual independence is not fulfilled. Thus, we propose a new way to sample observations to build training and validation sets, based on a cross-validation alpha-based design (CV-α). The CV-α was meant to create several scenarios of validation (replicates x folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as proof of concept, via ANOVA, we could compare the proposed methodology to RRS and K-fold, applying four genomic prediction models to simulated and real datasets. Concerning predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds neither cost nor complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
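
The overlap concern raised in this abstract can be made concrete with a small sketch that counts, for repeated K-fold with independent reshuffling, how many replicates place each pair of observations in the same fold. This only illustrates the quantity being discussed; it does not implement the CV-α design, and the function name and the sizes (20 observations, 5 folds, 3 replicates) are hypothetical.

```python
import numpy as np

def same_fold_counts(n, k, reps, seed=0):
    """Count, for every pair (i, j), the replicates in which i and j share a fold."""
    rng = np.random.default_rng(seed)
    counts = np.zeros((n, n), dtype=int)
    for _ in range(reps):
        # Assign each observation a fold label for this replicate.
        fold_of = np.empty(n, dtype=int)
        for f, idx in enumerate(np.array_split(rng.permutation(n), k)):
            fold_of[idx] = f
        # Add 1 wherever two observations carry the same fold label.
        counts += (fold_of[:, None] == fold_of[None, :]).astype(int)
    return counts

n, k, reps = 20, 5, 3
counts = same_fold_counts(n, k, reps)
off_diag = counts[~np.eye(n, dtype=bool)]
print("max times a pair shared a fold:", off_diag.max())
print("mean times a pair shared a fold:", off_diag.mean())
```

Pairs with a count above 1 are the repeated co-occurrences that a design-based sampling scheme would aim to reduce.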


2018 ◽  
Vol 7 (2.29) ◽  
pp. 912
Author(s):  
Fadzilah Salim ◽  
Nur Azman Abu

A simple linear regression model is useful for prediction. A general linear regression beyond a single independent variable is still not popular. A nonlinear regression can easily produce a better predictive model, but it is difficult to construct. The objective of this paper is to propose a technique for predicting the price of used cars in Malaysia using an S-shaped curve model. In this paper, the S-shaped Membership Function [SMF] is used as the basis to develop a novel S-Regression model. Comparisons between linear regression, cubic regression and S-Regression have been made on used car prices. The mean squared error of the S-Regression model is found to be closer to that of cubic regression than to that of linear regression. The S-Regression model is found to be quite suitable to represent the relationship between the price of a used car and the make year of the car. The result demonstrates that the S-Regression model gives a better and more practical estimate of the price of a used car in Malaysia.


2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Author(s):  
Jibo Wu

The stochastic restricted r-k class estimator and the stochastic restricted r-d class estimator are proposed for the vector of parameters in a multiple linear regression model with stochastic linear restrictions. The mean squared error matrix of the proposed estimators is derived and compared, and some properties of the proposed estimators are also discussed. Finally, a numerical example is given to show some of the theoretical results.
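
For orientation, the mean squared error matrix criterion mentioned here is usually stated as follows. This is the standard textbook definition, not a formula taken from the paper, and the symbols are assumptions: \beta is the true parameter vector and \hat{\beta}, \hat{\beta}_1, \hat{\beta}_2 are estimators of it.

```latex
% Mean squared error matrix of an estimator \hat{\beta} of \beta:
\operatorname{MSEM}(\hat{\beta})
  = E\!\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\top}\right]
  = \operatorname{Cov}(\hat{\beta})
    + \operatorname{Bias}(\hat{\beta})\,\operatorname{Bias}(\hat{\beta})^{\top},
\qquad
\operatorname{Bias}(\hat{\beta}) = E[\hat{\beta}] - \beta .

% \hat{\beta}_1 is at least as good as \hat{\beta}_2 under this criterion when
\operatorname{MSEM}(\hat{\beta}_2) - \operatorname{MSEM}(\hat{\beta}_1) \succeq 0
\quad \text{(nonnegative definite)} .
```

The scalar mean squared error is the trace of this matrix, so a biased estimator can be preferred whenever its reduction in variance outweighs the squared bias it introduces.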


2021 ◽  
Vol 13 (12) ◽  
pp. 2380
Author(s):  
Antonio-Juan Collados-Lara ◽  
Eulogio Pardo-Igúzquiza ◽  
David Pulido-Velazquez ◽  
Leticia Baena-Ruiz

Satellites produce valuable information for studying the surface water in wetlands, but in many cases the period covered, the spatial resolution and/or the revisit frequency are not enough to produce long historical series. In this paper we propose a novel method which uses regression models that include climatic and hydrological variables to complete the satellite information. We used this method in the Lagunas de Ruidera wetland (Spain). We approached the monthly dynamic of the surface water for a long period (1984–2015). Information from LANDSAT (30-m resolution) and MODIS (250-m resolution) satellites was tested but, due to the size of some lagoons, only the LANDSAT approach produced satisfactory results. An ensemble of regression models based on hydro-climatological explanatory variables was defined to complete the gaps in the monthly surface water. It showed a root mean squared error of around 476 pixels (0.4 km²) in the cross-validation analysis. Our analysis showed that the explanatory variables with the most significant participation in the regression ensemble are the aquifer discharge, the effective precipitation and the surface water from the previous month. From January to June, the mean surface water in Lagunas de Ruidera is around 4.3 km². In summer a reduction of around 13% of the surface water can be observed, which is recovered during the autumn.


2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Author(s):  
Yalian Li ◽  
Hu Yang

This paper is concerned with parameter estimation in the linear regression model. To overcome the multicollinearity problem, two new classes of estimators, called the almost unbiased ridge-type principal component estimator (AURPCE) and the almost unbiased Liu-type principal component estimator (AULPCE), are proposed. The mean squared error matrix of the proposed estimators is derived and compared, and some properties of the proposed estimators are also discussed. Finally, a Monte Carlo simulation study is given to illustrate the performance of the proposed estimators.


2011 ◽  
Vol 60 (2) ◽  
pp. 248-255 ◽  
Author(s):  
Sangmun Shin ◽  
Funda Samanlioglu ◽  
Byung Rae Cho ◽  
Margaret M. Wiecek
