Applying Non-Random Block Cross-Validation to Improve Reliability of Model Selection and Evaluation in Hydrology: An illustration using an algorithmic model of seasonal snowpack

Author(s):  
Charles Luce ◽  
Abigail Lute

A central question in model structural uncertainty is how complex a model should be in order to have greatest generality or transferability. One school of thought is that models become more general by adding process subroutines. On the other hand, model parameters and structures have been shown to change significantly when calibrated to different basins or time periods, suggesting that model complexity and model transferability may be antithetical. An important facet of this discussion is that the validation methods and data applied to model evaluation and selection may bias answers to this question. Here we apply non-random block cross-validation as a direct assessment of model transferability to a series of algorithmic space-time models of April 1 snow water equivalent (SWE) across 497 SNOTEL stations for 20 years. In general, we show that low to moderate complexity models transfer most successfully to new conditions in space and time. In other words, there is an optimum between overly complex and overly simple models. Because data contain structure arising from temporal dynamics and spatial dependency in atmospheric and hydrological processes, naïvely applied cross-validation practices can lead to overfitting, overconfidence in model precision or reliability, and poor ability to infer causal mechanisms. For example, random k-fold cross-validation methods, which are in common use for evaluating models, essentially assume independence of the data and would promote selection of more complex models. We further demonstrate that blocks sampled with pseudoreplicated data can produce similar outcomes. Some sampling strategies favored for hydrologic model validation may tend to promote pseudoreplication, requiring heightened attentiveness during model selection and evaluation. While the illustrative examples are drawn from snow modeling, the concepts apply readily to common hydrologic modeling issues.
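
As an illustration of the contrast the abstract draws, the sketch below compares random k-fold cross-validation with block cross-validation that holds out whole stations. The synthetic data, the station and year effects, the regressor, and the block definitions are all assumptions for the example (using scikit-learn utilities), not the authors' SNOTEL analysis.

```python
# Minimal sketch: random k-fold vs. station-block cross-validation on data
# with spatial (station) and temporal (year) dependency. Illustrative only.
import numpy as np
from sklearn.model_selection import KFold, GroupKFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_stations, n_years = 50, 20
station_effect = rng.normal(0, 1, n_stations)      # spatial dependency
year_effect = rng.normal(0, 1, n_years)             # temporal dependency
x = rng.normal(0, 1, (n_stations * n_years, 3))     # predictors
station = np.repeat(np.arange(n_stations), n_years)
y = (x[:, 0] + station_effect[station] + np.tile(year_effect, n_stations)
     + rng.normal(0, 0.3, n_stations * n_years))

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Random k-fold: folds mix stations and years, so test data share structure
# with training data and skill for transfer to new sites is overestimated.
random_cv = cross_val_score(model, x, y, cv=KFold(5, shuffle=True, random_state=0))

# Block cross-validation: hold out whole stations (spatial blocks), a direct
# test of transferability to unsampled locations.
block_cv = cross_val_score(model, x, y, cv=GroupKFold(5), groups=station)

print("random k-fold R^2:", random_cv.mean())
print("station-block R^2:", block_cv.mean())
```

On data with shared station and year structure, the random folds typically report higher skill than the station blocks, which is the overconfidence the abstract warns about.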

2008 ◽  
Vol 26 (3) ◽  
pp. 275-292 ◽  
Author(s):  
Geng Cui ◽  
Man Leung Wong ◽  
Guichang Zhang ◽  
Lin Li

Purpose – The purpose of this paper is to assess the performance of competing methods and model selection, which are non-trivial issues given the financial implications. Researchers have adopted various methods, including statistical models and machine learning methods such as neural networks, to assist decision making in direct marketing. However, due to the different performance criteria and validation techniques currently in practice, comparing different methods is often not straightforward.
Design/methodology/approach – This study compares the performance of neural networks with that of classification and regression trees, latent class models and logistic regression using three criteria – simple error rate, area under the receiver operating characteristic curve (AUROC), and cumulative lift – and two validation methods, i.e. bootstrap and stratified k-fold cross-validation. Systematic experiments are conducted to compare their performance.
Findings – The results suggest that these methods vary in performance across different criteria and validation methods. Overall, neural networks outperform the others in AUROC value and cumulative lift, and stratified ten-fold cross-validation produces more accurate results than bootstrap validation.
Practical implications – To select predictive models to support direct marketing decisions, researchers need to adopt appropriate performance criteria and validation procedures.
Originality/value – The study addresses the key issues in model selection, i.e. performance criteria and validation methods, and conducts systematic analyses to generate the findings and practical implications.
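
A minimal sketch of the comparison protocol described above, assuming scikit-learn implementations and a synthetic imbalanced dataset in place of the study's direct-marketing data; the model settings are placeholders.

```python
# Hedged sketch: stratified ten-fold cross-validation with AUROC as the
# criterion, comparing a tree, a logistic regression, and a small neural net.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Imbalanced classes, loosely mimicking response rates in direct marketing.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "classification tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=0),
}
for name, clf in models.items():
    auroc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUROC = {auroc.mean():.3f}")
```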


2015 ◽  
Vol 12 (4) ◽  
pp. 3945-4004 ◽  
Author(s):  
S. Pande ◽  
L. Arkesteijn ◽  
H. Savenije ◽  
L. A. Bastidas

Abstract. This paper shows that the instability of a hydrological system representation in response to different pieces of information, and the associated prediction uncertainty, is a function of model complexity. After demonstrating the connection between unstable model representation and model complexity, complexity is analyzed in a step-by-step manner. This is done by measuring differences between simulations of a model under different realizations of input forcings. Algorithms are then suggested to estimate model complexity. Model complexities of two model structures, SAC-SMA (Sacramento Soil Moisture Accounting) and its simplified version SIXPAR (Six Parameter Model), are computed on resampled input data sets from basins that span the continental US. The model complexities for SIXPAR are estimated for various parameter ranges. It is shown that the complexity of SIXPAR increases with lower storage capacity and/or higher recession coefficients. Thus it is argued that a conceptually simple model structure, such as SIXPAR, can be more complex than an intuitively more complex model structure, such as SAC-SMA, for certain parameter ranges. We therefore contend that the magnitudes of feasible model parameters influence the complexity of the model selection problem just as parameter dimensionality (number of parameters) does, and that parameter dimensionality alone is an incomplete indicator of the stability of hydrological model selection and prediction problems.
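
The sketch below is a schematic of the idea of probing complexity through differences between simulations under resampled input forcings. The single linear reservoir, the gamma-distributed forcing, and the bootstrap resampling are illustrative assumptions, not the authors' algorithm or the SIXPAR/SAC-SMA structures.

```python
# Illustrative sketch: complexity probed as the spread of a model's
# simulations across resampled realizations of the input forcing.
import numpy as np

rng = np.random.default_rng(1)

def linear_reservoir(precip, k, s0=10.0):
    """Simulate discharge q_t = k * s_t for a single storage s with recession k."""
    s, q = s0, []
    for p in precip:
        s = s + p - k * s
        q.append(k * s)
    return np.array(q)

base_precip = rng.gamma(shape=0.5, scale=4.0, size=365)   # toy daily forcing

def simulation_spread(k, n_resamples=200):
    """RMS deviation of simulations about their mean over resampled forcings."""
    sims = np.array([
        linear_reservoir(rng.choice(base_precip, size=365, replace=True), k)
        for _ in range(n_resamples)
    ])
    return np.sqrt(((sims - sims.mean(axis=0)) ** 2).mean())

for k in (0.05, 0.5, 0.9):
    print(f"recession coefficient k = {k}: simulation spread = {simulation_spread(k):.2f}")
```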


2011 ◽  
Vol 14 (2) ◽  
pp. 443-463 ◽  
Author(s):  
Saket Pande ◽  
Luis A. Bastidas ◽  
Sandjai Bhulai ◽  
Mac McKee

We provide analytical bounds on convergence rates for a class of hydrologic models and consequently derive a complexity measure based on Vapnik–Chervonenkis (VC) generalization theory. The class of hydrologic models is a spatially explicit interconnected set of linear reservoirs, with the aim of representing globally nonlinear hydrologic behavior by locally linear models. Here, by convergence rate, we mean the rate at which the empirical risk converges to the expected risk. The derived complexity measure quantifies a model's propensity to overfit data. We explore how data finiteness can affect model selection for this class of hydrologic models and provide theoretical results on how model performance on a finite sample converges to its expected performance as data size approaches infinity. These bounds can then be used for model selection, as they provide a tradeoff between model complexity and model performance on finite data. The convergence bounds for the considered hydrologic models depend on the magnitude of their parameters, which are the recession parameters of the constituent linear reservoirs. Further, the complexity of hydrologic models not only varies with the magnitude of their parameters but also depends on the network structure of the models (in terms of the spatial heterogeneity of parameters and the nature of hydrologic connectivity).
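
To make the model-selection use of such bounds concrete, here is a schematic structural-risk-style calculation. The bound form is the generic VC expression, and the candidate names, empirical risks, and complexity values are hypothetical; this is not the bound derived in the paper.

```python
# Schematic structural-risk-style selection: pick the candidate minimizing
# empirical risk plus a complexity penalty that shrinks as sample size grows.
import numpy as np

def guaranteed_risk(empirical_risk, h, n, delta=0.05):
    """Generic VC-type bound: empirical risk plus a confidence term in h and n."""
    return empirical_risk + np.sqrt((h * (np.log(2 * n / h) + 1) + np.log(4 / delta)) / n)

n = 500  # finite sample size (hypothetical)
candidates = {
    "2-reservoir network": (0.18, 3.0),    # (empirical risk, complexity measure)
    "5-reservoir network": (0.14, 9.0),
    "10-reservoir network": (0.12, 25.0),
}
for name, (emp, h) in candidates.items():
    print(f"{name}: guaranteed risk = {guaranteed_risk(emp, h, n):.3f}")
```

With the numbers above, the lowest-complexity candidate attains the smallest bound even though it has the largest empirical risk, which is the tradeoff the abstract describes.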


2021 ◽  
Author(s):  
Thi Lan Anh Dinh ◽  
Filipe Aires

Abstract. The use of statistical models to study the impact of weather on crop yield continues to increase. Unfortunately, this type of application is characterised by datasets with a very limited number of samples (typically one sample per year). In general, statistical inference uses three datasets: the training dataset to optimise the model parameters, the validation dataset to select the best model, and the testing dataset to evaluate the model's generalisation ability. Splitting the overall database into three datasets is impossible in crop yield modelling. The leave-one-out cross-validation method, or simply leave-one-out (LOO), has been introduced to facilitate statistical modelling when the database is limited. However, the model choice is then made using the testing dataset, which can be misleading by favouring unnecessarily complex models. The nested cross-validation approach was introduced in machine learning to avoid this problem by truly using three datasets, especially for problems with limited databases. In this study, we propose one particular implementation of nested cross-validation, called the leave-two-out method (LTO), to choose the best model with an optimal model complexity (using the validation dataset) and to estimate the true model quality (using the testing dataset). Two applications are considered: Robusta coffee in Cu M'gar (Dak Lak, Vietnam) and grain maize over 96 French departments. In both cases, LOO is misleading by choosing models that are too complex; LTO indicates that simpler models actually perform better when a reliable generalisation test is considered. The simple models obtained using the LTO approach have reasonable yield anomaly forecasting skill for both crops studied. This LTO approach can also be used in seasonal forecasting applications. We suggest that the LTO method should become a standard procedure for statistical crop modelling.
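
The nested leave-two-out logic can be sketched on synthetic data as below; the polynomial-degree candidates and the data generator are assumptions for illustration, not the crop-yield models of the study.

```python
# Sketch of leave-two-out (nested leave-one-out): the inner loop selects model
# complexity on a validation sample, the outer loop reports generalisation on
# a sample never used for selection. Synthetic data, ~one sample per "year".
import numpy as np

rng = np.random.default_rng(0)
n = 25
x = rng.uniform(-1, 1, n)
y = 1.5 * x + rng.normal(0, 0.4, n)       # true relation is linear
degrees = [1, 2, 3, 4, 5]                 # candidate model complexities

def fit_predict(xtr, ytr, xte, d):
    coef = np.polyfit(xtr, ytr, deg=d)
    return np.polyval(coef, xte)

outer_errors = []
for i in range(n):                        # outer loop: testing sample i
    idx = np.delete(np.arange(n), i)
    val_err = np.zeros(len(degrees))
    for j in idx:                         # inner loop: validation sample j
        tr = np.setdiff1d(idx, j)
        for k, d in enumerate(degrees):
            val_err[k] += (fit_predict(x[tr], y[tr], x[j], d) - y[j]) ** 2
    best = degrees[int(np.argmin(val_err))]           # chosen on validation only
    pred_i = fit_predict(x[idx], y[idx], x[i], best)  # evaluated on the test sample
    outer_errors.append((pred_i - y[i]) ** 2)

print("LTO-estimated test RMSE:", np.sqrt(np.mean(outer_errors)))
```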


1996 ◽  
Vol 8 (3) ◽  
pp. 583-593 ◽  
Author(s):  
Mikko Lehtokangas ◽  
Jukka Saarinen ◽  
Pentti Huuhtanen ◽  
Kimmo Kaski

Nonlinear time series modeling with a multilayer perceptron network is presented. An important aspect of this modeling is model selection, i.e., the problem of determining the size as well as the complexity of the model. To overcome this problem we apply the predictive minimum description length (PMDL) principle as a minimization criterion. In the neural network scheme it means minimizing the number of input and hidden units. Three time series modeling experiments are used to examine the usefulness of the PMDL model selection scheme. A comparison with the widely used cross-validation technique is also presented. In our experiments the PMDL scheme and the cross-validation scheme yield similar results in terms of model complexity. However, the PMDL method was found to be two times faster to compute. This is a significant improvement, since model selection in general is very time consuming.
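
A rough sketch of the predictive MDL idea follows, with an autoregressive model standing in for the multilayer perceptron and model order standing in for the number of input and hidden units; this illustrates the principle only and is not the paper's implementation.

```python
# Predictive MDL sketch: each candidate model predicts the series one step
# ahead using only past samples, and the accumulated code length (negative
# log predictive likelihood under a Gaussian error model) is the criterion.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(300)
series = np.sin(0.2 * t) + 0.1 * rng.normal(size=300)   # toy nonlinear series

def ar_design(x, p):
    """Design matrix (lags 1..p) and targets for an AR(p) least-squares fit."""
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    return X, x[p:]

def pmdl(x, p, warmup=40):
    """Accumulated predictive code length (nats) for an AR(p) stand-in model."""
    total = 0.0
    for i in range(warmup, len(x)):
        past = x[:i]
        X, y = ar_design(past, p)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = max(np.mean((y - X @ coef) ** 2), 1e-6)
        pred = past[-p:][::-1] @ coef                    # one-step-ahead forecast
        total += 0.5 * (np.log(2 * np.pi * sigma2) + (x[i] - pred) ** 2 / sigma2)
    return total

for p in (1, 2, 4, 8):
    print(f"AR({p}): predictive code length = {pmdl(series, p):.1f}")
```

The order with the smallest accumulated code length is selected, in the same spirit as choosing the number of input and hidden units in the network scheme.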


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Nick Pepper ◽  
Luca Gerardo-Giorda ◽  
Francesco Montomoli

Abstract Invasive species are recognized as a significant threat to biodiversity. The mathematical modeling of their spatio-temporal dynamics can provide significant help to environmental managers in devising suitable control strategies. Several mathematical approaches have been proposed in recent decades to efficiently model the dispersal of invasive species. Relying on the assumption that the dispersal of an individual is random, but that the density of individuals at the scale of the population can be considered smooth, reaction-diffusion models are a good trade-off between model complexity and flexibility for use in different situations. In this paper we present a continuous reaction-diffusion model coupled with arbitrary Polynomial Chaos (aPC) to assess the impact of uncertainties in the model parameters. We show how the finite element framework is well suited to handle important landscape heterogeneities such as elevation and the complex geometries associated with the boundaries of an actual geographical region. We demonstrate the main capabilities of the proposed coupled model by assessing the uncertainties in the spread of an alien species invading the Basque Country region in northern Spain.
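
For orientation, a toy one-dimensional Fisher-KPP reaction-diffusion front is sketched below with explicit finite differences. The paper's model is a finite-element implementation over real geography coupled with aPC, so the domain, parameters, and numerics here are purely illustrative.

```python
# Toy 1-D reaction-diffusion invasion front: du/dt = D u_xx + r u (1 - u/K),
# integrated with an explicit Euler scheme on a uniform grid.
import numpy as np

nx, dx = 501, 0.2                  # grid
dt, n_steps = 0.01, 5000           # time stepping (dt < dx^2 / (2 D) for stability)
D, r, K = 1.0, 0.5, 1.0            # diffusion, growth rate, carrying capacity (illustrative)

u = np.zeros(nx)
u[:10] = K                         # initial local introduction of the invader

for _ in range(n_steps):
    lap = np.zeros_like(u)
    lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    u = u + dt * (D * lap + r * u * (1 - u / K))    # explicit Euler step

front = np.argmax(u < 0.5 * K) * dx
print(f"invasion front position after t = {n_steps * dt:.0f}: x ~ {front:.1f}")
```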


2005 ◽  
Vol 44 (03) ◽  
pp. 438-443 ◽  
Author(s):  
R. Spang ◽  
F. Markowetz

Summary Objectives: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces. Methods: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small sample-size situations, efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating model parameters in nested-loop cross-validation. Results: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation. Conclusions: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is a must.
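
The in-loop versus out-of-loop distinction can be demonstrated with a short sketch (scikit-learn, random noise standing in for expression data); the feature count, sample size, and classifier are assumptions for the example.

```python
# Feature-selection bias: selecting genes before cross-validation (out-of-loop)
# inflates estimated accuracy; selection refit inside each training fold
# (in-loop, via a pipeline) gives an honest estimate. The data are pure noise,
# so true accuracy is ~50%.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))           # 60 samples, 5000 "genes", no signal
y = np.array([0, 1] * 30)
cv = StratifiedKFold(5, shuffle=True, random_state=0)

# Out-of-loop (wrong): features chosen using all samples, including test folds.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
biased = cross_val_score(LogisticRegression(max_iter=1000), X_sel, y, cv=cv)

# In-loop (right): selection is refit inside every training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=cv)

print("out-of-loop accuracy:", biased.mean())   # optimistically high
print("in-loop accuracy:", honest.mean())       # ~0.5, as expected for noise
```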


Author(s):  
Daniel Bittner ◽  
Beatrice Richieri ◽  
Gabriele Chiogna

Abstract Uncertainties in hydrologic model outputs can arise for many reasons, such as structural, parametric and input uncertainty. Identification of the sources of uncertainties and the quantification of their impacts on model results are important to appropriately reproduce hydrodynamic processes in karst aquifers and to support decision-making. The present study investigates the time-dependent relevance of model input uncertainties, defined as the conceptual uncertainties affecting the representation and parameterization of processes relevant for groundwater recharge, i.e. interception, evapotranspiration and snow dynamics, for the lumped karst model LuKARS. A total of nine different models are applied: three to compute interception (DVWK, Gash and Liu), three to compute evapotranspiration (Thornthwaite, Hamon and Oudin) and three to compute snow processes (Martinec, Girons Lopez and Magnusson). All the input model combinations are tested for the case study of the Kerschbaum spring in Austria. The model parameters are kept constant for all combinations. While parametric uncertainties computed for the same model in previous studies do not show pronounced temporal variations, the results of the present work show that input uncertainties vary seasonally. Moreover, the input uncertainties of evapotranspiration and snowmelt are higher than the interception uncertainties. The results show that the importance of a specific process for groundwater recharge can be estimated from the respective input uncertainties. These findings have practical implications, as they can guide researchers in obtaining relevant field data to improve the representation of different processes in lumped parameter models and to support model calibration.
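
The experimental design of running all combinations of process routines can be sketched as follows; the placeholder routines and forcing below are invented for illustration and bear no relation to the actual LuKARS formulations or the named methods.

```python
# Schematic of the design: every combination of three interception, three
# evapotranspiration and three snow routines is run with fixed parameters,
# and the spread of simulated recharge across the 27 combinations is read
# as input (conceptual) uncertainty. Routines are dummy placeholders.
from itertools import product
import numpy as np

rng = np.random.default_rng(0)
forcing = rng.gamma(0.6, 3.0, size=365)          # daily precipitation, illustrative

def dummy_routine(scale):
    """Placeholder process routine: simply scales the forcing."""
    return lambda p: scale * p

interception = {"DVWK": dummy_routine(0.95), "Gash": dummy_routine(0.92),
                "Liu": dummy_routine(0.90)}
evapotransp = {"Thornthwaite": dummy_routine(0.80), "Hamon": dummy_routine(0.85),
               "Oudin": dummy_routine(0.75)}
snow = {"Martinec": dummy_routine(1.00), "Girons Lopez": dummy_routine(0.97),
        "Magnusson": dummy_routine(1.03)}

recharge = []
for (_, fi), (_, fe), (_, fs) in product(interception.items(),
                                         evapotransp.items(), snow.items()):
    recharge.append(fs(fe(fi(forcing))))         # chain the three process routines

recharge = np.array(recharge)                    # shape (27, 365)
daily_spread = recharge.std(axis=0)              # input uncertainty through time
print("mean spread:", daily_spread.mean(), "max spread:", daily_spread.max())
```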


2010 ◽  
Vol 11 (3) ◽  
pp. 781-796 ◽  
Author(s):  
Jonathan J. Gourley ◽  
Scott E. Giangrande ◽  
Yang Hong ◽  
Zachary L. Flamig ◽  
Terry Schuur ◽  
...  

Abstract Rainfall estimated from the polarimetric prototype of the Weather Surveillance Radar-1988 Doppler [WSR-88D (KOUN)] was evaluated using a dense Micronet rain gauge network for nine events on the Ft. Cobb research watershed in Oklahoma. The operation of KOUN and its upgrade to dual polarization was completed by the National Severe Storms Laboratory. Storm events included an extreme rainfall case from Tropical Storm Erin that had a 100-yr return interval. Comparisons with collocated Micronet rain gauge measurements indicated that all six rainfall algorithms that used polarimetric observations had lower root-mean-squared errors and higher Pearson correlation coefficients than the conventional algorithm that used reflectivity factor alone when considering all events combined. The reflectivity-based relation R(Z) was the least biased, with an event-combined normalized bias of −9%. The bias for R(Z), however, was found to vary significantly from case to case and as a function of rainfall intensity. This variability was attributed to different drop size distributions (DSDs) and the presence of hail. The synthetic polarimetric algorithm R(syn) had a large normalized bias of −31%, but this bias was found to be stationary. To evaluate whether polarimetric radar observations improve discharge simulation, recent advances in Markov chain Monte Carlo simulation using the Hydrology Laboratory Research Distributed Hydrologic Model (HL-RDHM) were used. This Bayesian approach infers the posterior probability density function of model parameters and output predictions, which allows us to quantify HL-RDHM uncertainty. Hydrologic simulations were compared to observed streamflow and also to simulations forced by rain gauge inputs. The hydrologic evaluation indicated that all polarimetric rainfall estimators outperformed the conventional R(Z) algorithm, but only after their long-term biases were identified and corrected.
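
As a small illustration of the bias diagnostics mentioned above, the sketch below computes a normalized bias against gauges and removes a stationary multiplicative bias. The numbers are synthetic, chosen only to mimic the magnitude reported for R(syn); this is not the Ft. Cobb evaluation.

```python
# Normalized bias of a radar rainfall estimate against collocated gauges, and
# removal of a stationary long-term bias before the estimate forces a model.
import numpy as np

rng = np.random.default_rng(0)
gauge = rng.gamma(2.0, 3.0, size=500)                 # gauge accumulations (mm)
radar = 0.69 * gauge * rng.lognormal(0, 0.2, 500)     # estimator with roughly -31% bias

def normalized_bias(est, ref):
    """Normalized bias in percent: 100 * (sum(est) - sum(ref)) / sum(ref)."""
    return 100.0 * (est.sum() - ref.sum()) / ref.sum()

print(f"normalized bias before correction: {normalized_bias(radar, gauge):.0f}%")

# If the bias is stationary, a single multiplicative factor removes it.
corrected = radar * gauge.sum() / radar.sum()
print(f"normalized bias after correction: {normalized_bias(corrected, gauge):.0f}%")
```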


2021 ◽  
Author(s):  
Mikhail Sviridov ◽  
Anton Mosin ◽  
Sergey Lebedev ◽  
Ron Thompson ◽  
...  

In proactive geosteering, special inversion algorithms are used to process the readings of logging-while-drilling resistivity tools in real time and provide oil field operators with formation models to make informed steering decisions. Currently, there is no industry standard for inversion deliverables and corresponding quality indicators because major tool vendors develop their own device-specific algorithms and use them internally. This paper presents the first implementation of a vendor-neutral inversion approach applicable to any induction resistivity tool, enabling operators to standardize the evaluation of various geosteering services. The need for such a universal inversion approach was inspired by the activity of the LWD Deep Azimuthal Resistivity Services Standardization Workgroup initiated by the SPWLA Resistivity Special Interest Group in 2016. The proposed inversion algorithm utilizes a 1D layer-cake formation model and is performed interval by interval. The following model parameters can be determined: horizontal and vertical resistivities of each layer, positions of layer boundaries, and formation dip. The inversion can support an arbitrary deep azimuthal induction resistivity tool with coaxial, tilted, or orthogonal transmitting and receiving antennas. The inversion is purely data-driven; it works in automatic mode and provides fully unbiased results obtained from tool readings only. The algorithm is based on the statistical reversible-jump Markov chain Monte Carlo method, which does not require any predefined assumptions about the formation structure and enables searching for models that explain the data even when the number of layers in the model is unknown. To globalize the search, the algorithm runs several Markov chains capable of exchanging their states with one another to move from the vicinity of a local minimum to a more promising region of the model parameter space. During execution, the inversion keeps all models it has evaluated in order to estimate the resolution accuracy of formation parameters and generate several quality indicators. Eventually, these indicators are delivered together with the recovered resistivity models to help operators evaluate the reliability of inversion results. To ensure high performance of the inversion, a fast and accurate semi-analytical forward solver is employed to compute the required responses of a tool with a specific geometry and their derivatives with respect to any parameter of the multi-layered model. Moreover, the reliance on the simultaneous evolution of multiple Markov chains makes the algorithm suitable for parallel execution, which significantly decreases the computational time. Application of the proposed inversion is shown on a series of synthetic examples and field case studies, such as navigating the well along the reservoir roof or near the oil-water contact in oil sands. Inversion results for all scenarios confirm that the proposed algorithm can successfully evaluate formation model complexity, recover model parameters, and quantify their uncertainty within a reasonable computational time. The presented vendor-neutral stochastic approach to data processing leads to the standardization of the inversion output, including the resistivity model and its quality indicators, which helps operators better understand the capabilities of tools from different vendors and eventually make more confident geosteering decisions.
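
A highly simplified sketch of the chain-exchange idea follows (a parallel-tempering-style swap on a toy misfit). It omits the reversible-jump moves, the layered resistivity model, and the forward solver, and only illustrates why exchanging states between chains helps escape local minima.

```python
# Several chains sample a multimodal misfit at different "temperatures" and
# periodically swap states, so a chain stuck near one minimum can jump to a
# more promising region of parameter space.
import numpy as np

rng = np.random.default_rng(0)

def misfit(m):
    """Toy multimodal objective standing in for the data misfit."""
    return 0.5 * (m - 3.0) ** 2 * (m + 2.0) ** 2 / 10.0

temps = [1.0, 3.0, 10.0]                 # one chain per temperature
states = [rng.normal(-2, 0.1) for _ in temps]

for step in range(5000):
    for i, T in enumerate(temps):        # within-chain Metropolis updates
        prop = states[i] + rng.normal(0, 0.3)
        if np.log(rng.uniform()) < (misfit(states[i]) - misfit(prop)) / T:
            states[i] = prop
    if step % 20 == 0:                   # exchange move between neighbouring chains
        i = rng.integers(len(temps) - 1)
        log_a = (misfit(states[i]) - misfit(states[i + 1])) * (1 / temps[i] - 1 / temps[i + 1])
        if np.log(rng.uniform()) < log_a:
            states[i], states[i + 1] = states[i + 1], states[i]

print("cold-chain state (near a misfit minimum):", round(states[0], 2))
```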

