Robust data worth analysis with surrogate models in groundwater

Author(s):  
Moritz Gosses ◽  
Thomas Wöhling

Physically-based groundwater models allow highly detailed spatial resolution, parameterization and process representation, among other advantages. Unfortunately, their size and complexity make many model applications computationally demanding. This is especially problematic for uncertainty and data worth analysis methods, which often require many model runs.

To alleviate the high computational demand of applying groundwater models to data worth analysis, we combine two different solutions:

a) the use of surrogate models as faster alternatives to a complex model, and
b) a robust data worth analysis method based on linear predictive uncertainty estimation, coupled with highly efficient null-space Monte Carlo techniques.

We compare the performance of a complex benchmark model of a real-world aquifer in New Zealand to two different surrogate models: a spatially and parametrically simplified version of the complex model, and a projection-based surrogate model created with proper orthogonal decomposition (POD). We generate predictive uncertainty estimates with all three models using linearization techniques implemented in the PEST Toolbox (Doherty 2016) and calculate the worth of existing, “future” and “parametric” data in relation to predictive uncertainty. To account, at least in part, for non-uniqueness of the model parameters, we use null-space Monte Carlo methods (Doherty 2016) to efficiently generate a multitude of calibrated model parameter sets. These are used to compute the variability of the data worth estimates generated by the three models.

Comparison between the results of the complex benchmark model and the two surrogates shows good agreement for both surrogates in estimating the worth of the existing data sets for various model predictions. The simplified surrogate model has difficulties in estimating the worth of “future” data and is unable to reproduce “parametric” data worth due to its simplified parameter representation. The POD model successfully reproduces both “future” and “parametric” data worth for different predictions. Many of its data worth estimates exhibit a high variance, though, demonstrating the need for robust data worth methods as presented here, which can (to some degree) account for parameter non-uniqueness.

Literature:

Doherty, J., 2016. PEST: Model-Independent Parameter Estimation - User Manual. Watermark Numerical Computing, 6th Edition.
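
The linear predictive uncertainty machinery referenced above (PREDUNC-style analysis in PEST) reduces to a Schur-complement formula, and data worth can be expressed as the relative drop in predictive variance once a set of observations is admitted. Below is a minimal numpy sketch of that calculation under first-order, Gaussian assumptions; the function and variable names (X, C_p, C_eps, y, keep) are illustrative and not taken from the paper.

```python
import numpy as np

def predictive_variance(X, C_p, C_eps, y):
    """Linear (FOSM / Schur-complement) variance of a prediction s = y.p.

    X     : Jacobian of the observations w.r.t. the parameters (n_obs x n_par)
    C_p   : prior parameter covariance (n_par x n_par)
    C_eps : observation-noise covariance (n_obs x n_obs)
    y     : sensitivity of the prediction to the parameters (n_par,)
    """
    if X.shape[0] == 0:                           # no observations: prior variance
        return y @ C_p @ y
    S = X @ C_p @ X.T + C_eps                     # data-space covariance
    gain = C_p @ X.T @ np.linalg.solve(S, X @ C_p)
    return y @ (C_p - gain) @ y

def data_worth(X, C_p, C_eps, y, keep):
    """Relative reduction in predictive variance delivered by the observations in `keep`."""
    prior = predictive_variance(np.empty((0, len(y))), C_p, np.empty((0, 0)), y)
    post = predictive_variance(X[keep], C_p, C_eps[np.ix_(keep, keep)], y)
    return (prior - post) / prior
```

Repeating the second call for many null-space Monte Carlo parameter sets (each yielding its own Jacobian) would give the variability of the data worth estimates described in the abstract.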

Energies ◽  
2019 ◽  
Vol 12 (10) ◽  
pp. 1906 ◽  
Author(s):  
Mohamed Ibrahim ◽  
Saad Al-Sobhi ◽  
Rajib Mukherjee ◽  
Ahmed AlNouss

Data-driven models are essential tools for the development of surrogate models that can be used for the design, operation, and optimization of industrial processes. One approach to developing surrogate models is through the use of input–output data obtained from a process simulator. To enhance model robustness, proper sampling techniques are required to cover the entire domain of the process variables uniformly. In the present work, pseudo-random Monte Carlo samples, Latin hypercube samples, and quasi-Monte Carlo samples based on Hammersley Sequence Sampling (HSS) are generated. The sampled data obtained from the process simulator are fitted to neural networks to generate a surrogate model. An illustrative case study is solved to predict the performance of a gas stabilization unit. From the surrogate models developed to predict process data, it can be concluded that Latin hypercube sampling and HSS perform better than pseudo-random sampling for designing the surrogate model. This conclusion is based on the maximum absolute value, standard deviation, and confidence interval of the relative average error obtained with the different sampling techniques.
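
As a sketch of how the three sampling designs could be generated and fed to a neural-network surrogate, the snippet below uses numpy, scipy and scikit-learn; the toy_simulator function, sample sizes and network layout are placeholders rather than the gas stabilization unit model from the paper.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.neural_network import MLPRegressor

def van_der_corput(n, base):
    """Radical-inverse sequence in the given base (building block of Hammersley points)."""
    seq = np.zeros(n)
    for i in range(1, n + 1):
        f, x, k = 1.0, 0.0, i
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        seq[i - 1] = x
    return seq

def hammersley(n, dim):
    """Hammersley set: first coordinate i/n, remaining coordinates are radical inverses in prime bases."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
    cols = [np.arange(1, n + 1) / n]
    cols += [van_der_corput(n, primes[d]) for d in range(dim - 1)]
    return np.column_stack(cols)

def toy_simulator(x):
    """Stand-in for the process simulator output (e.g. a unit performance indicator)."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2 - 0.5 * x[:, 2]

n, dim = 200, 3
designs = {
    "pseudo-random":   np.random.default_rng(0).random((n, dim)),
    "latin-hypercube": qmc.LatinHypercube(d=dim, seed=0).random(n),
    "hammersley":      hammersley(n, dim),
}

for name, x in designs.items():
    surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                             random_state=0).fit(x, toy_simulator(x))
    print(name, "R^2 on training design:", round(surrogate.score(x, toy_simulator(x)), 3))
```

In practice the comparison would be made on an independent test set drawn from the simulator, using the error statistics named in the abstract.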


2020 ◽  
Author(s):  
Daniel Erdal ◽  
Sinan Xiao ◽  
Wolfgang Nowak ◽  
Olaf A. Cirpka

Global sensitivity analysis and uncertainty quantification of nonlinear models may be performed using ensembles of model runs. However, already in moderately complex models, many combinations of parameters that appear reasonable by prior knowledge can lead to unrealistic model outcomes, such as perennial rivers that fall dry in the model or simulated severe floodings that have not been observed in the real system. We denote these parameter combinations with implausible outcomes as “non-behavioral”. Creating a sufficiently large ensemble of behavioral model realizations can be computationally prohibitive if the individual model runs are expensive and only a small fraction of the parameter space is behavioral. In this work, we design a stochastic, sequential sampling engine that utilizes fast and simple surrogate models trained on past realizations of the original, complex model. Our engine uses the surrogate model to estimate whether a candidate realization will turn out to be behavioral or not. Only parameter sets with a reasonable certainty of being behavioral (as predicted by the surrogate model) are simulated using the original, complex model. For a subsurface flow model of a small south-western German catchment, we show high accuracy of the surrogate model predictions regarding the behavioral status of the parameter sets. This increases the fraction of behavioral model runs (actually computed with the original, complex model) over total complex-model runs to 20–90%, compared to 0.1% without our method (e.g., using brute-force Monte Carlo sampling). This notable performance increase depends on the choice of surrogate modeling technique. To this end, we consider both Gaussian Process Emulation (GPE) and models based on polynomials of active variables determined by Active Subspace decomposition as surrogate models. For the GPE-based surrogate model, we also compare random search and active learning strategies for the training of the surrogate model.
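
A minimal sketch of such a surrogate-filtered sampling engine, using a scikit-learn Gaussian process as the emulator, is shown below. The complex_model_metric function, the behavioral threshold and the acceptance rule (lower confidence bound above the threshold) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)

def complex_model_metric(theta):
    """Placeholder for the expensive model: returns a scalar behaviour metric,
    with the realization counted as behavioral if the metric is > 0."""
    return 1.0 - 4.0 * np.sum((theta - 0.6) ** 2)

# seed the surrogate with a small batch of brute-force complex-model runs
X = rng.random((20, 4))
y = np.array([complex_model_metric(t) for t in X])

behavioral = []
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
for _ in range(300):
    cand = rng.random(4)                          # candidate parameter set from the prior
    mu, sd = gp.predict(cand[None, :], return_std=True)
    # only spend a complex-model run when the surrogate is reasonably sure
    # the candidate is behavioral (lower confidence bound above the threshold)
    if mu[0] - sd[0] > 0.0:
        val = complex_model_metric(cand)
        X, y = np.vstack([X, cand]), np.append(y, val)
        gp.fit(X, y)                              # retrain on all past realizations
        if val > 0.0:
            behavioral.append(cand)
```

The ratio of behavioral runs to complex-model runs in this loop corresponds to the 20–90% figure quoted in the abstract; without the surrogate filter every candidate would be simulated.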


2020 ◽  
Author(s):  
Kyle Mosley ◽  
David Applegate ◽  
James Mather ◽  
John Shevelan ◽  
Hannah Woollard

The issue of safely dealing with radioactive waste has been addressed in several countries by opting for a geological disposal solution, in which the waste material is isolated in a subsurface repository. Safety assessments of such facilities require an in-depth understanding of the environment they are constructed in. Assessments are commonly underpinned by simulations of groundwater flow and transport, using numerical models of the subsurface. Accordingly, it is imperative that the level of uncertainty associated with key model outputs is accurately characterised and communicated. Only in this way can decisions on the long-term safety and operation of these facilities be effectively supported by modelling.

In view of this, a new approach for quantifying uncertainty in the modelling process has been applied to hydrogeological models for the UK Low Level Waste Repository, which is constructed in a complex system of Quaternary sediments of glacial origin. Model calibration was undertaken against a dataset of observed groundwater heads, acquired from a borehole monitoring network of over 200 locations. The new methodology comprises an evolution of the calibration process, in which greater emphasis is placed on understanding the propagation of uncertainty. This is supported by the development of methods for evaluating uncertainty in the observed heads data, as well as the application of mathematical regularisation tools (Doherty, 2018) to constrain the solution and ensure stability of the inversion. Additional information sources, such as data on the migration of key solutes, are used to further constrain specific model parameters. The sensitivity of model predictions to the representation of heterogeneity and other geological uncertainties is determined by smaller studies. Then, with the knowledge of posterior parameter uncertainty provided by the calibration process, the resulting implications for model predictive capacity can be explored. This is achieved using the calibration-constrained Monte Carlo methodology developed by Tonkin and Doherty (2009).

The new approach affords greater insight into the model calibration process, providing valuable information on the constraining influence of the observed data as it pertains to individual model parameters. Similarly, characterisation of the uncertainty associated with different model outputs provides a deeper understanding of the model's predictive power. Such information can also be used to determine the appropriate level of model complexity; the guiding principle being that additional complexity is justified only where it contributes either to the characterisation of expert knowledge of the system, or to the model's capacity to represent details of the system's behaviour that are relevant for the predictions of interest (Doherty, 2015). Finally, the new approach enables more effective communication of modelling results – and limitations – to stakeholders, which should allow management decisions to be better supported by modelling work.

References:

Doherty, J., 2015. Calibration and Uncertainty Analysis for Complex Environmental Models. Watermark Numerical Computing, Brisbane, Australia. ISBN: 978-0-9943786-0-6.
Doherty, J., 2018. PEST Model-Independent Parameter Estimation. User Manual Part I. 7th Edition. Watermark Numerical Computing, Brisbane, Australia.
Tonkin, M. and Doherty, J., 2009. Calibration-constrained Monte Carlo analysis of highly parameterized models using subspace techniques. Water Resources Research, 45, W00B10.
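
The core idea of the calibration-constrained Monte Carlo step (Tonkin and Doherty, 2009) is to project random prior parameter sets onto the null space of the weighted Jacobian, so that the projected sets retain the calibrated fit to first order. The numpy sketch below assumes a weighted Jacobian J and a calibrated parameter vector p_cal are available; the energy-based truncation rule and all names are illustrative, not the repository model's actual workflow.

```python
import numpy as np

def null_space_monte_carlo(J, p_cal, prior_sampler, n_reals, energy=0.999):
    """Project random prior samples onto the null space of the weighted Jacobian J,
    so that (to first order) each realization preserves the calibrated fit to the data."""
    U, s, Vt = np.linalg.svd(J, full_matrices=True)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum, energy)) + 1      # retained solution-space dimension
    V2 = Vt[k:].T                                  # null-space basis vectors
    reals = []
    for _ in range(n_reals):
        dp = prior_sampler() - p_cal               # random departure from the calibrated set
        reals.append(p_cal + V2 @ (V2.T @ dp))     # keep only its null-space component
    return np.array(reals)

# usage with illustrative shapes (J, p_cal and the prior would come from the calibrated model)
rng = np.random.default_rng(0)
J = rng.normal(size=(30, 120))                     # 30 observations, 120 parameters
p_cal = rng.normal(size=120)
reals = null_space_monte_carlo(J, p_cal, lambda: p_cal + rng.normal(size=120), 50)
```

Each projected realization would then be re-checked (and, if necessary, lightly re-calibrated) with the full model before being used to explore predictive uncertainty.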


2020 ◽  
Author(s):  
Citlali Cabrera Gutiérrez ◽  
Jean Christophe Jouhaud

Complex model calculations can be very expensive and time consuming. A surrogate model aims at producing results which are very close to those obtained with a complex model, but with largely reduced calculation times. Building a surrogate model requires only a few calculations with the real model. Once the surrogate model is built, further calculations can be carried out quickly.

In this study, we propose to build surrogate models by combining Proper Orthogonal Decomposition (POD) and kriging (also known as Gaussian Process Regression) for immediate forecasts. More precisely, we create surrogate models for rainfall forecasts at short lead times. Currently, rainfall forecasts in France are calculated at 15-minute time steps using the AROME-PI model developed by Météo-France. In this work, we show that the results obtained with our surrogate models are not only close to those obtained by AROME-PI, but also have a better time resolution (1 minute) and a reduced calculation time.
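
A minimal sketch of the POD-plus-kriging combination is given below: snapshots of the full-model output are compressed into a few POD modes, and a Gaussian process is fitted to each modal coefficient as a function of the inputs. The synthetic snapshot matrix, the input variable and the number of modes are placeholders, not AROME-PI data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# snapshots: each column is one simulated rainfall field (flattened grid);
# params: the inputs (e.g. forecast lead time) that produced each snapshot.
params = np.linspace(0, 1, 30)[:, None]
grid = np.linspace(0, 1, 400)
snapshots = np.array([np.exp(-((grid - p) ** 2) / 0.01) for p in params[:, 0]]).T

mean = snapshots.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
r = 5                                     # number of retained POD modes
basis = U[:, :r]                          # spatial POD modes
coeffs = basis.T @ (snapshots - mean)     # modal coefficients per snapshot

# kriging (GP regression) of each POD coefficient as a function of the inputs
gps = [GaussianProcessRegressor(kernel=RBF(0.1), normalize_y=True)
       .fit(params, coeffs[i]) for i in range(r)]

def surrogate(p_new):
    """Reconstruct a full field at a new input by predicting the POD coefficients."""
    a = np.array([gp.predict(np.atleast_2d(p_new))[0] for gp in gps])
    return mean[:, 0] + basis @ a

field = surrogate([[0.37]])               # forecast at an input not in the snapshot set
```

Because evaluating the surrogate only requires a few GP predictions and a small matrix product, forecasts can be produced at a finer time resolution than the original model runs.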


1996 ◽  
Vol 33 (2) ◽  
pp. 79-90 ◽  
Author(s):  
Jian Hua Lei ◽  
Wolfgang Schilling

Physically-based urban rainfall-runoff models are mostly applied without parameter calibration. Given some preliminary estimates of the uncertainty of the model parameters, the associated model output uncertainty can be calculated. Monte Carlo simulation followed by multi-linear regression is used for this analysis. The calculated model output uncertainty can be compared to the uncertainty estimated by comparing model output and observed data. Based on this comparison, systematic or spurious errors can be detected in the observation data, the validity of the model structure can be confirmed, and the most sensitive parameters can be identified. If the calculated model output uncertainty is unacceptably large, the most sensitive parameters should be calibrated to reduce the uncertainty. Observation data for which systematic and/or spurious errors have been detected should be discarded from the calibration data. This procedure is referred to as preliminary uncertainty analysis; it is illustrated with an example. The HYSTEM program is applied to predict the runoff volume from an experimental catchment with a total area of 68 ha and an impervious area of 20 ha. Based on the preliminary uncertainty analysis, for 7 of 10 events the measured runoff volume is within the calculated uncertainty range, i.e. less than or equal to the calculated model predictive uncertainty. The remaining 3 events most likely include systematic or spurious errors in the observation data (either in the rainfall or the runoff measurements). These events are then discarded from further analysis. After calibrating the model, the predictive uncertainty of the model is estimated.
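
The "Monte Carlo simulation followed by multi-linear regression" step can be sketched as follows: parameters are sampled within their preliminary uncertainty ranges, the model is run for each sample, and a linear regression of the output on the parameters yields scaled sensitivities and an estimate of the output uncertainty. The runoff_model function and the parameter ranges are illustrative stand-ins, not the HYSTEM model or the study's values.

```python
import numpy as np

rng = np.random.default_rng(0)

def runoff_model(theta):
    """Placeholder for the rainfall-runoff model; returns an event runoff volume."""
    imperviousness, depression_storage, roughness = theta
    return 1000.0 * imperviousness - 50.0 * depression_storage + 5.0 * roughness

# preliminary parameter uncertainty expressed as sampling ranges (illustrative values)
lo = np.array([0.25, 0.5, 0.01])
hi = np.array([0.35, 2.0, 0.03])
samples = lo + (hi - lo) * rng.random((2000, 3))
outputs = np.array([runoff_model(t) for t in samples])

# multi-linear regression of the output on the parameters
A = np.column_stack([np.ones(len(samples)), samples])
coef, *_ = np.linalg.lstsq(A, outputs, rcond=None)
scaled_sens = coef[1:] * samples.std(axis=0)       # standardized sensitivities

print("calculated model output uncertainty (std):", outputs.std())
print("parameters ranked by sensitivity:", np.argsort(-np.abs(scaled_sens)))
```

Comparing the printed output uncertainty with the observed model-vs-data mismatch for each event is what flags the events with likely systematic or spurious errors.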


Water ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 1484 ◽
Author(s):  
Dagmar Dlouhá ◽  
Viktor Dubovský ◽  
Lukáš Pospíšil

We present an approach for the calibration of simplified evaporation model parameters based on the optimization of those parameters against the most complex model for evaporation estimation, the Penman–Monteith equation. This model computes the evaporation from several input quantities, such as air temperature, wind speed, heat storage and net radiation. However, not all of these values are always available, so simplified models must be used. Our interest in free water surface evaporation is motivated by the ongoing hydric reclamation of the former Ležáky–Most quarry, i.e., the ongoing restoration of the land that has been mined to a natural and economically usable state. For emerging pit lakes, the prediction of evaporation and of the water level plays a crucial role. We examine the methodology on several popular models and standard statistical measures. The presented approach can be applied in a general model calibration process subject to any theoretical or measured evaporation.
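
The calibration step amounts to a least-squares fit of a simplified model's parameters against Penman–Monteith estimates computed from the full set of inputs. The sketch below uses a Dalton-type formula purely as an illustrative stand-in for a simplified model; the formula, parameter names and synthetic meteorological series are assumptions, not the models or data used in the paper.

```python
import numpy as np
from scipy.optimize import least_squares

def simplified_evaporation(params, wind, vp_deficit):
    """Illustrative Dalton-type formula: E = a * (1 + b * u) * (e_s - e_a)."""
    a, b = params
    return a * (1.0 + b * wind) * vp_deficit

def calibrate(wind, vp_deficit, e_penman_monteith, x0=(0.1, 0.5)):
    """Fit the simplified model's parameters against Penman-Monteith estimates."""
    def residual(p):
        return simplified_evaporation(p, wind, vp_deficit) - e_penman_monteith
    return least_squares(residual, x0=x0).x

# usage with placeholder meteorological series (not data from the paper)
rng = np.random.default_rng(0)
wind = rng.uniform(0.5, 5.0, 365)           # daily mean wind speed, m/s
vpd = rng.uniform(0.1, 2.5, 365)            # vapour pressure deficit, kPa
e_pm = 0.12 * (1.0 + 0.45 * wind) * vpd     # synthetic "reference" evaporation series
a, b = calibrate(wind, vpd, e_pm)
```

Standard statistical measures (RMSE, bias, correlation) between the fitted simplified model and the Penman–Monteith series then quantify how well each simplified model performs after calibration.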


2021 ◽  
Vol 11 (11) ◽  
pp. 5234 ◽
Author(s):  
Jin Hun Park ◽  
Pavel Pereslavtsev ◽  
Alexandre Konobeev ◽  
Christian Wegmann

For the stable and self-sufficient functioning of the DEMO fusion reactor, one of the most important parameters that must be demonstrated is the Tritium Breeding Ratio (TBR). The reliable assessment of the TBR with safety margins is a matter of fusion reactor viability. The uncertainty of the TBR in neutronic simulations includes many different aspects, such as the uncertainty due to the simplification of the geometry models used, the uncertainty of the reactor layout and the uncertainty introduced by the neutronic calculations themselves. The last of these can be reduced by applying high-fidelity Monte Carlo simulations for TBR estimations. Nevertheless, these calculations have inherent statistical errors, controlled by the number of neutron histories and straightforward to quantify for a quantity such as the TBR, as well as underlying errors due to nuclear data uncertainties. In fact, every evaluated nuclear data file involved in the MCNP calculations can be replaced with a set of random data files representing particular deviations of the nuclear model parameters, each of them being correct and valid for applications. To account for the uncertainty of the nuclear model parameters introduced in the evaluated data file, a Total Monte Carlo (TMC) method can be used to analyze the uncertainty of the TBR owing to the nuclear data used for the calculations. To this end, two 3D fully heterogeneous geometry models of the helium cooled pebble bed (HCPB) and water cooled lithium lead (WCLL) European DEMOs were utilized for the calculations of the TBR. The TMC calculations were performed making use of the TENDL-2017 nuclear data library random files, with high enough statistics to provide a well-resolved Gaussian distribution of the TBR value. The assessment was done for the estimation of the TBR uncertainty due to the nuclear data for the entire material compositions and for separate materials: structural, breeder and neutron multipliers. The overall TBR uncertainty due to the nuclear data was estimated to be 3–4% for the HCPB and WCLL DEMOs, respectively.
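
The TMC post-processing amounts to collecting the TBR values obtained with the different random nuclear-data files and separating the spread into a statistical component and a nuclear-data component. The sketch below shows this variance decomposition with illustrative numbers; the sample values and the per-run statistical error are assumptions, not results from the paper.

```python
import numpy as np

def tmc_uncertainty(tbr_samples, sigma_stat):
    """Total Monte Carlo post-processing: the spread of TBR values obtained with
    different random nuclear-data files contains both the statistical (MC) error
    and the nuclear-data component; the latter is recovered by subtraction."""
    tbr_samples = np.asarray(tbr_samples)
    mean = tbr_samples.mean()
    var_total = tbr_samples.var(ddof=1)
    var_nd = max(var_total - sigma_stat**2, 0.0)    # nuclear-data variance
    return mean, 100.0 * np.sqrt(var_nd) / mean     # mean TBR and relative ND uncertainty [%]

# usage with illustrative numbers: one TBR value per MCNP run, each run using a
# different TENDL-2017 random file; sigma_stat is the per-run statistical std. dev.
fake_tbr_values = np.random.default_rng(0).normal(1.15, 0.04, 300)
mean_tbr, rel_unc_percent = tmc_uncertainty(fake_tbr_values, sigma_stat=0.005)
```

Repeating the decomposition with only one material's data files randomized at a time gives the per-material (structural, breeder, multiplier) contributions discussed in the abstract.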

