On Cross Validation for Model Selection

1999 ◽  
Vol 11 (4) ◽  
pp. 863-870 ◽  
Author(s):  
Isabelle Rivals ◽  
Léon Personnaz

In response to Zhu and Rohwer (1996), a recent communication (Goutte, 1997) established that leave-one-out cross validation is not subject to the “no-free-lunch” criticism. Despite this optimistic conclusion, we show here that cross validation performs very poorly for the selection of linear models compared with classic statistical tests. We conclude that the statistical tests are preferable to cross validation for linear as well as for nonlinear model selection.
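To make the two selection criteria concrete, here is a minimal sketch (not the authors' experiment) that scores two nested linear models, a constant and a straight line, by leave-one-out cross validation and by the classical F statistic for nested models. The data set, the function names, and the choice of candidate models are assumptions made purely for illustration.

```python
def fit_const(xs, ys):
    """Constant model y = a; returned as (intercept, slope) with slope 0."""
    return sum(ys) / len(ys), 0.0


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rss(fit, xs, ys):
    """Residual sum of squares of a model fitted on all the data."""
    a, b = fit(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def loo_cv_error(fit, xs, ys):
    """Mean squared prediction error over all leave-one-out splits."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / len(xs)


def f_statistic(xs, ys):
    """F statistic for constant vs. line: 1 extra parameter, n - 2 residual df."""
    n = len(xs)
    rss0 = rss(fit_const, xs, ys)
    rss1 = rss(fit_line, xs, ys)
    return (rss0 - rss1) / (rss1 / (n - 2))


# Hypothetical, nearly linear data (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

cv_const = loo_cv_error(fit_const, xs, ys)
cv_line = loo_cv_error(fit_line, xs, ys)
f_stat = f_statistic(xs, ys)
print(cv_const, cv_line, f_stat)
```

On this toy data both criteria prefer the straight line: its leave-one-out error is lower, and the F statistic lies far above 7.71, the 5% critical value of the F distribution with (1, 4) degrees of freedom. The abstract's claim concerns how often each criterion picks the right model over many noisy samples, which a single example like this cannot show.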

Author(s):  
Federico Belotti ◽  
Franco Peracchi

In this article, we describe jackknife2, a new prefix command for jackknifing linear estimators. It takes full advantage of the available leave-one-out formula, thereby allowing for substantial reduction in computing time. Of special note is that jackknife2 allows the user to compute cross-validation and diagnostic measures that are currently not available after ivregress 2sls, xtreg, and xtivregress.
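As an illustration of why a leave-one-out formula saves computing time, the sketch below uses the standard identity for ordinary least squares: the leave-one-out prediction error for observation i equals e_i / (1 - h_ii), where e_i is the full-sample residual and h_ii the leverage. This is plain Python for a one-regressor model, not the jackknife2 implementation (a Stata command), and it assumes this identity is the kind of formula the abstract has in mind. The data and function names are hypothetical.

```python
def ols_line(xs, ys):
    """OLS fit of y = a + b*x; also returns mean(x) and Sxx for leverages."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b, mx, sxx


def loo_residuals_shortcut(xs, ys):
    """Leave-one-out residuals from a single fit: e_i / (1 - h_ii)."""
    n = len(xs)
    a, b, mx, sxx = ols_line(xs, ys)
    out = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - mx) ** 2 / sxx  # h_ii for simple regression
        residual = y - (a + b * x)
        out.append(residual / (1.0 - leverage))
    return out


def loo_residuals_refit(xs, ys):
    """Same quantities by brute force: refit n times, each time omitting one point."""
    out = []
    for i in range(len(xs)):
        a, b, _, _ = ols_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append(ys[i] - (a + b * xs[i]))
    return out


# Hypothetical data (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

fast = loo_residuals_shortcut(xs, ys)
slow = loo_residuals_refit(xs, ys)
max_gap = max(abs(f - s) for f, s in zip(fast, slow))
print(max_gap)  # agreement up to floating-point rounding
```

The shortcut needs one fit instead of n, which is where the reduction in computing time comes from. The mean of the squared leave-one-out residuals is the usual leave-one-out cross-validation error.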


1996 ◽  
Vol 8 (7) ◽  
pp. 1421-1426 ◽  
Author(s):  
Huaiyu Zhu ◽  
Richard Rohwer

It is known theoretically that an algorithm cannot be good for an arbitrary prior. We show that in practical terms this also applies to the technique of “cross-validation,” which has been widely regarded as defying this general rule. Numerical examples are analyzed in detail. Their implications for research on learning algorithms are discussed.


2020 ◽  
Vol 10 (7) ◽  
pp. 2448
Author(s):  
Liye Lv ◽  
Xueguan Song ◽  
Wei Sun

Leave-one-out cross validation (LOO-CV), a model-independent evaluation method, cannot always select the best of several models when the sample size is small. We modify the LOO-CV method by moving a validation point around random normal distributions, rather than leaving it out, and name the result move-one-away cross validation (MOA-CV), a model-dependent method. The aim of this method is to improve the accuracy of model selection, which is unreliable in LOO-CV without enough samples. The errors from LOO-CV and MOA-CV (LOO-CVerror and MOA-CVerror, respectively) are used to select the best of four typical surrogate models on four standard mathematical functions and one engineering problem. The coefficient of determination (R-square, R²) is used as a calibration of MOA-CVerror and LOO-CVerror. Results show that: (i) in terms of selecting the best models, MOA-CV and LOO-CV both improve as sample size increases; (ii) MOA-CV performs better than LOO-CV in selecting the best models; (iii) in the engineering problem, both MOA-CV and LOO-CV can choose the worst models, and in most cases MOA-CV has a higher probability of selecting the best model than LOO-CV.


2007 ◽  
Vol 11 (5) ◽  
pp. 1673-1682 ◽  
Author(s):  
H. Hellebrand ◽  
L. Hoffmann ◽  
J. Juilleret ◽  
L. Pfister

In this study, two approaches are used to predict winter storm flow coefficients in meso-scale basins (10 km² to 1000 km²) with a view to regionalization. The winter storm flow coefficient is the ratio between direct discharge and rainfall. It is basin specific and is supposed to give an integrated response to rainfall. The two approaches, which use the permeability of the substratum and the dominating runoff generation processes as basin attributes, are compared. The study area is the Rhineland Palatinate and the Grand Duchy of Luxembourg, and the study focuses on the Nahe basin and its 16 sub-basins (Rhineland Palatinate). For the comparison, three statistical models were derived by means of regression analysis. The models used the winter storm flow coefficient as the dependent variable; the independent variables were the permeability of the substratum, the preliminarily derived dominating runoff generation processes, and a combination of both. It is demonstrated that the permeability and the preliminarily derived processes carry different layers of information. Cross-validation and statistical tests were used to determine and evaluate model differences. In cross-validation, the model that used both parameters performed best, followed by the model that used the dominant runoff generation processes. From the statistical tests it was concluded that the models come from different populations, carrying different information layers. Analysis of the residuals of the models indicated that the permeability and runoff generation processes did provide complementary information. Simple linear models appeared to perform well in describing the winter storm flow coefficient at the meso-scale when a combination of the permeability of the substratum and dominating runoff generation processes served as independent parameters.


2019 ◽  
Vol 158 ◽  
pp. 394-400
Author(s):  
Taha Houcine Kerbaa ◽  
Amar Mezache ◽  
Houcine Oudira
