Limitations of Bayesian Leave-One-Out Cross-Validation for Model Selection

Author(s):  
Quentin Frederik Gronau ◽  
Eric-Jan Wagenmakers

Cross-validation (CV) is increasingly popular as a generic method to adjudicate between mathematical models of cognition and behavior. To measure model generalizability, CV quantifies out-of-sample predictive performance, and the CV preference goes to the model that predicted the out-of-sample data best. The advantages of CV include theoretical simplicity and practical feasibility. Despite its prominence, however, the limitations of CV are often underappreciated. Here we demonstrate the limitations of a particular form of CV (Bayesian leave-one-out cross-validation, or LOO) with three concrete examples. In each example, a data set of infinite size is perfectly in line with the predictions of a simple model (i.e., a general law or invariance). Nevertheless, LOO shows bounded and relatively modest support for the simple model. We conclude that CV is not a panacea for model selection.
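The generic LOO procedure the abstract critiques can be sketched in a few lines. The following is a minimal pure-Python illustration with toy data and squared-error loss (illustrative assumptions, not the authors' Bayesian log-predictive-density setup): each observation is held out once, each candidate model is refit on the rest, and the model with the lowest out-of-sample error is preferred.

```python
# Minimal sketch of leave-one-out cross-validation (LOO-CV) for model
# comparison. Toy data and squared-error loss are illustrative assumptions,
# not the authors' Bayesian examples.

def fit_constant(xs, ys):
    """Simple model: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """More complex model: ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def loo_error(fit, xs, ys):
    """Mean squared prediction error over all leave-one-out folds."""
    err = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        err += (model(xs[i]) - ys[i]) ** 2
    return err / len(xs)

# Data that are (noisily) constant, i.e., the simple model is true.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8]

loo_simple = loo_error(fit_constant, xs, ys)
loo_complex = loo_error(fit_linear, xs, ys)
```

On these toy numbers LOO favors the simple model; the abstract's point is that even as such data accumulate without limit, the degree of support LOO grants the simple model stays bounded.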

2019 ◽  
Vol 56 (4) ◽  
pp. 514-528 ◽  
Author(s):  
Stijn van Weezel

This study exploits a sudden and abrupt decline in precipitation during the long rains season in the Horn of Africa to analyze the possible link between climate change and violent armed conflict. Following the 1998 El Niño there has been an overall reduction in precipitation levels, associated with sea-surface temperature changes in the Indian and Pacific Oceans, resulting in an increase in the number and severity of droughts. Given that the probable cause of this shift is anthropogenic forcing, it provides a unique opportunity to study the effect of climate change on society, compared with statistical inference based on weather variation. Focusing on communal conflict in Ethiopia and Kenya between 1999 and 2014, and exploiting cross-sectional variation across districts, the regression analysis links the precipitation decline to an additional 1.3 conflict events per district. The main estimates show a negative correlation between precipitation and communal conflict with a probability of 0.90. Changing the model specification to consider plausible alternative models and accommodate other identifying assumptions produces broadly similar results. However, the generalizability of the link between precipitation decline and conflict breaks down when out-of-sample cross-validation is used to test external validity: a leave-one-out cross-validation exercise shows that accounting for climate contributes relatively little to the predictive performance of the model. This suggests that other, more salient factors underlie communal violence in Ethiopia and Kenya. As such, in this case the link between climate and conflict should not be overstated.
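The leave-one-out exercise described in this abstract can be sketched with hypothetical numbers (not the study's data or models): compare the out-of-sample error of a baseline model against one that adds the covariate of interest; if the error does not drop, the covariate contributes little predictive power.

```python
# Hedged sketch with made-up numbers: does adding a covariate improve
# leave-one-out predictive performance? Here the covariate is, by
# construction, nearly unrelated to the outcome, so it should not help.

def loo_mse(build, xs, ys):
    """Leave-one-out mean squared prediction error."""
    total = 0.0
    for i in range(len(xs)):
        predict = build(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (predict(xs[i]) - ys[i]) ** 2
    return total / len(xs)

def baseline(xs, ys):
    """Intercept-only model (ignores the covariate)."""
    m = sum(ys) / len(ys)
    return lambda x: m

def with_covariate(xs, ys):
    """Least-squares line on the covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return lambda x, a=my - b * mx, b=b: a + b * x

covariate = [1, 2, 3, 4, 5, 6]  # e.g., a hypothetical rainfall anomaly
outcome = [3, 1, 2, 2, 3, 1]    # e.g., hypothetical conflict-event counts

mse_base = loo_mse(baseline, covariate, outcome)
mse_full = loo_mse(with_covariate, covariate, outcome)
skill = (mse_base - mse_full) / mse_base  # <= 0 means no out-of-sample gain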


2020 ◽  
Vol 10 (1) ◽  
pp. 1-11
Author(s):  
Arvind Shrivastava ◽  
Nitin Kumar ◽  
Kuldeep Kumar ◽  
Sanjeev Gupta

The paper applies Random Forest, a popular machine learning classification algorithm, to predict bankruptcy (distress) for Indian firms. Random Forest orders firms according to their propensity to default or their likelihood of becoming distressed. This is also useful for explaining the association between the tendency of firm failure and firm features. The results are analyzed vis-à-vis Tree Net. Both in-sample and out-of-sample estimations have been performed to compare Random Forest with Tree Net, a cutting-edge data mining tool known to provide satisfactory estimation results. An exhaustive data set comprising companies from varied sectors has been included in the analysis. It is found that the Tree Net procedure consistently provides improved classification and predictive performance vis-à-vis the Random Forest methodology, and it may be utilized further by industry analysts and researchers alike for predictive purposes.
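The bagging idea behind Random Forest can be illustrated in miniature. The sketch below uses bootstrap-aggregated one-split "stumps" on a single hypothetical financial ratio (real Random Forests grow full trees over many features, and this is not the paper's implementation or data):

```python
import random

# Toy stand-in for Random Forest: bagged single-split "stumps" on one
# hypothetical ratio. Label 1 = distressed firm. Illustrative only.

X = [0.10, 0.20, 0.25, 0.30, 0.70, 0.75, 0.80, 0.90]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_stump(xs, ys):
    """Best split midpoint t for the rule: predict 1 (distressed) if x > t."""
    pts = sorted(set(xs))
    cands = [(a + b) / 2 for a, b in zip(pts, pts[1:])] or pts
    return min(cands,
               key=lambda t: sum((x > t) != bool(lab)
                                 for x, lab in zip(xs, ys)))

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict(forest, x):
    """Majority vote across stumps."""
    return int(sum(x > t for t in forest) * 2 > len(forest))

forest = fit_forest(X, y)
in_sample = sum(predict(forest, x) == lab for x, lab in zip(X, y)) / len(X)
```

The ensemble's vote also orders unseen firms by their propensity to be classed as distressed, which is the ranking use the abstract describes.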


Author(s):  
Federico Belotti ◽  
Franco Peracchi

In this article, we describe jackknife2, a new prefix command for jackknifing linear estimators. It takes full advantage of the available leave-one-out formula, thereby allowing for a substantial reduction in computing time. Of special note is that jackknife2 allows the user to compute cross-validation and diagnostic measures that are currently not available after ivregress 2sls, xtreg, and xtivreg.
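The kind of leave-one-out formula the abstract refers to can be verified on toy data: for OLS, the residual from a fit that omits observation i equals e_i / (1 - h_i), where e_i is the full-sample residual and h_i the leverage, so no refitting is needed. A sketch for simple regression (toy numbers, not the command's implementation):

```python
# Closed-form leave-one-out residual for OLS: e_i / (1 - h_i), checked
# against a brute-force refit for each i. Simple regression, toy data.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx

gaps = []
for i in range(n):
    e_i = ys[i] - (a + b * xs[i])            # full-sample residual
    h_i = 1.0 / n + (xs[i] - mx) ** 2 / sxx  # leverage of observation i
    fast = e_i / (1.0 - h_i)                 # closed-form LOO residual

    # Brute force: actually refit without observation i.
    rx, ry = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
    rmx, rmy = sum(rx) / (n - 1), sum(ry) / (n - 1)
    rb = sum((x - rmx) * (y - rmy) for x, y in zip(rx, ry)) / \
         sum((x - rmx) ** 2 for x in rx)
    ra = rmy - rb * rmx
    slow = ys[i] - (ra + rb * xs[i])

    gaps.append(abs(fast - slow))

max_gap = max(gaps)  # numerically zero: one fit replaces n refits
```

Replacing n refits with one fit plus this identity is the source of the computing-time reduction.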


2020 ◽  
Vol 10 (7) ◽  
pp. 2448
Author(s):  
Liye Lv ◽  
Xueguan Song ◽  
Wei Sun

The leave-one-out cross validation (LOO-CV), a model-independent evaluation method, cannot always select the best of several models when the sample size is small. We modify the LOO-CV method by moving a validation point around random normal distributions, rather than leaving it out, and name the result move-one-away cross validation (MOA-CV), a model-dependent method. The key aim of this method is to improve the accuracy of model selection, which is unreliable under LOO-CV when samples are scarce. The errors from LOO-CV and MOA-CV (LOO-CVerror and MOA-CVerror, respectively) are employed to select the best of four typical surrogate models on four standard mathematical functions and one engineering problem. The coefficient of determination (R-square, R2) serves as a calibration for MOA-CVerror and LOO-CVerror. Results show that: (i) in terms of selecting the best model, both MOA-CV and LOO-CV improve as the sample size increases; (ii) MOA-CV performs better than LOO-CV at selecting the best model; and (iii) in the engineering problem, both MOA-CV and LOO-CV may select the worst model, but in most cases MOA-CV has a higher probability of selecting the best model than LOO-CV.
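How LOO-CVerror ranks candidate surrogates can be sketched with two toy surrogate types and a made-up response function (stand-ins for the paper's four surrogate models and test problems): fit each candidate on all-but-one sample, accumulate its prediction errors, and select the candidate with the smallest total.

```python
# Toy illustration of using LOO-CV error to select a surrogate model.
# The two candidate surrogates and the sampled function are illustrative
# assumptions, not the four surrogates used in the paper.

def linear_surrogate(pts):
    """Least-squares line through the samples."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    b = sum((x - mx) * (y - my) for x, y in pts) / \
        sum((x - mx) ** 2 for x, _ in pts)
    a = my - b * mx
    return lambda x: a + b * x

def nearest_surrogate(pts):
    """Predict the response of the nearest sampled point."""
    return lambda x: min(pts, key=lambda p: abs(p[0] - x))[1]

def loo_cv_error(build, pts):
    """Sum of squared leave-one-out prediction errors."""
    return sum(
        (build(pts[:i] + pts[i + 1:])(pts[i][0]) - pts[i][1]) ** 2
        for i in range(len(pts))
    )

# Samples from an exactly linear response: the linear surrogate should win.
samples = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
candidates = {"linear": linear_surrogate, "nearest": nearest_surrogate}
best = min(candidates,
           key=lambda name: loo_cv_error(candidates[name], samples))
```

With only a handful of samples, rankings produced this way can be unstable, which is the failure mode MOA-CV is designed to mitigate.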


2019 ◽  
Vol 57 (2) ◽  
pp. 314-323 ◽  
Author(s):  
Jamal Ouenniche ◽  
Oscar Javier Uvalle Perez ◽  
Aziz Ettouhami

Purpose: Nowadays, the field of data analytics is witnessing an unprecedented interest from a variety of stakeholders. The purpose of this paper is to contribute to the subfield of predictive analytics by proposing a new non-parametric classifier.

Design/methodology/approach: The proposed new non-parametric classifier performs both in-sample and out-of-sample predictions, where in-sample predictions are devised with a new Evaluation Based on Distance from Average Solution (EDAS)-based classifier, and out-of-sample predictions are devised with a CBR-based classifier trained on the class predictions provided by the proposed EDAS-based classifier.

Findings: The performance of the proposed new non-parametric classification framework is tested on a data set of UK firms in predicting bankruptcy. Numerical results demonstrate an outstanding predictive performance, which is robust to implementation choices.

Practical implications: The exceptional predictive performance of the proposed new non-parametric classifier makes it a real contender in actual applications in areas such as finance and investment, internet security, fraud and medical diagnosis, where the accuracy of the risk-class predictions has serious consequences for the relevant stakeholders.

Originality/value: Over and above the design elements of the new integrated in-sample-out-of-sample classification framework and its non-parametric nature, it delivers an outstanding predictive performance for a bankruptcy prediction application.
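The EDAS method the classifier builds on scores alternatives by their distance from the average solution. A minimal sketch of that scoring idea follows, assuming benefit-type criteria and made-up numbers; the paper's classifier additionally maps such scores to risk classes, which is not shown here.

```python
# Hedged sketch of EDAS scoring (Evaluation Based on Distance from Average
# Solution). Benefit-type criteria and invented numbers only; not the
# paper's full classification framework.

def edas_scores(matrix, weights):
    """Appraisal score in [0, 1] for each alternative (row)."""
    m, k = len(matrix), len(matrix[0])
    avg = [sum(row[j] for row in matrix) / m for j in range(k)]
    # Weighted positive / negative distances from the average solution.
    sp = [sum(w * max(0.0, v - a) / a for v, a, w in zip(row, avg, weights))
          for row in matrix]
    sn = [sum(w * max(0.0, a - v) / a for v, a, w in zip(row, avg, weights))
          for row in matrix]
    max_sp = max(sp) or 1.0  # guard against all-zero columns
    max_sn = max(sn) or 1.0
    return [(p / max_sp + 1.0 - q / max_sn) / 2.0 for p, q in zip(sp, sn)]

# Three hypothetical firms scored on two benefit criteria (higher = better).
firms = [[9.0, 9.0], [5.0, 5.0], [1.0, 1.0]]
scores = edas_scores(firms, [0.5, 0.5])
```

Alternatives far above the average solution score near 1, those far below score near 0; a classifier can then threshold or rank these scores.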


2018 ◽  
Vol 124 (5) ◽  
pp. 1284-1293 ◽  
Author(s):  
Alexander H. K. Montoye ◽  
Bradford S. Westgate ◽  
Morgan R. Fonley ◽  
Karin A. Pfeiffer

Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study utilized two previously collected data sets [Ball State University (BSU) and Michigan State University (MSU)] in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus 2 combination methods) and six feature sets were used to predict activity intensity (30-s intervals) with the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3–78.5%) had higher accuracy than other machine learning models (70.9–76.4%) and accuracy similar to combination methods (77.0–77.9%). Feature sets utilizing frequency-domain features had improved accuracy over other feature sets in leave-one-out cross-validation (92.6–92.8% vs. 87.8–91.9% in MSU data set; 79.3–80.2% vs. 76.7–78.4% in BSU data set) but similar or worse accuracy in out-of-sample testing (74.0–77.4% vs. 74.1–79.1% in MSU data set; 76.1–77.0% vs. 75.5–77.3% in BSU data set). All machine learning models outperformed the Euclidean norm minus one/GGIR method in out-of-sample testing (69.5–78.5% vs. 53.6–70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models. Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer. NEW & NOTEWORTHY This study includes in-sample and out-of-sample cross-validation of an alternate method for deriving meaningful physical activity outcomes from accelerometer data collected with a wrist-worn accelerometer. This method uses machine learning to directly predict activity intensity. By so doing, this study provides a classification model that may avoid high errors present with energy expenditure prediction while still allowing researchers to assess adherence to physical activity guidelines.
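The recommendation to confirm generalizability out-of-sample can be illustrated with a deliberately simple cut-point classifier and invented activity counts: a rule tuned on one data set can look perfect in-sample yet degrade on data collected elsewhere.

```python
# Why out-of-sample testing matters: a cut-point tuned on one (hypothetical)
# data set is perfect in-sample but transfers poorly to a second site whose
# feature distribution is shifted. Counts and labels are invented;
# label 1 = higher-intensity activity.

train = [(10, 0), (20, 0), (30, 0), (70, 1), (80, 1), (90, 1)]  # "site A"
other = [(35, 0), (40, 0), (45, 0), (75, 1), (85, 1), (95, 1)]  # "site B"

def learn_cutpoint(data):
    """Threshold t minimizing training error for the rule: 1 if x > t."""
    return min(
        (x for x, _ in data),
        key=lambda t: sum((x > t) != bool(lab) for x, lab in data),
    )

def accuracy(t, data):
    return sum((x > t) == bool(lab) for x, lab in data) / len(data)

t = learn_cutpoint(train)        # picks t = 30 on this sample
in_sample = accuracy(t, train)   # 1.0: looks perfect within-sample
out_sample = accuracy(t, other)  # lower: the cut-point does not transfer
```

The same logic motivates testing machine learning models on a data set they were never trained on, as done here with the BSU and MSU data sets.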

