Technical note: Diagnostic efficiency – specific evaluation of model performance

2021 ◽  
Vol 25 (4) ◽  
pp. 2187-2198
Author(s):  
Robin Schwemmle ◽  
Dominic Demand ◽  
Markus Weiler

Abstract. A better understanding of the reasons why hydrological model performance is unsatisfying represents a crucial part of meaningful model evaluation. However, current evaluation efforts are mostly based on aggregated efficiency measures such as Kling–Gupta efficiency (KGE) or Nash–Sutcliffe efficiency (NSE). These aggregated measures provide a relative gradation of model performance. Especially in the case of weak model performance, it is important to identify the different errors which may have caused such unsatisfactory predictions. These errors may originate from the model parameters, the model structure, and/or the input data. In order to provide more insight, we define three types of errors which may be related to their source: constant error (e.g. caused by consistent input data error such as precipitation), dynamic error (e.g. structural model errors such as a deficient storage routine) and timing error (e.g. caused by input data errors or deficient model routines/parameters). Based on these types of errors, we propose the novel diagnostic efficiency (DE) measure, which accounts for these three error types. The disaggregation of DE into its three metric terms can be visualized in a plain radial space using diagnostic polar plots. A major advantage of this visualization technique is that error contributions can be clearly differentiated. In order to provide a proof of concept, we first generated time series artificially with the three different error types (i.e. simulations are surrogated by manipulating observations). By computing DE and the related diagnostic polar plots for the reproduced errors, we could then supply evidence for the concept. Finally, we tested the applicability of our approach for a modelling example. For a particular catchment, we compared streamflow simulations realized with different parameter sets to the observed streamflow. For this modelling example, the diagnostic polar plot suggests that dynamic errors explain the overall error to a large extent. The proposed evaluation approach provides a diagnostic tool for model developers and model users and the diagnostic polar plot facilitates interpretation of the proposed performance measure as well as a relative gradation of model performance similar to the well-established efficiency measures in hydrology.
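The structure of DE lends itself to a compact implementation. The sketch below is illustrative only, not the paper's exact formulation: it takes the constant error as the mean relative bias along the flow duration curve, the dynamic error as the mean absolute residual bias, and lets the timing error enter through the Pearson correlation, aggregating the three terms KGE-style as a Euclidean distance from the optimum.

```python
import numpy as np

def diagnostic_efficiency_sketch(obs, sim):
    """Illustrative sketch of a DE-style measure (not the paper's exact
    definition): three error terms aggregated as a Euclidean distance."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)

    # Relative bias along the flow duration curve (both series sorted in
    # descending order, i.e. compared by exceedance probability).
    obs_fdc = np.sort(obs)[::-1]
    sim_fdc = np.sort(sim)[::-1]
    brel = (sim_fdc - obs_fdc) / obs_fdc

    b_const = brel.mean()                  # constant error: mean relative bias
    b_dyn = np.abs(brel - b_const).mean()  # dynamic error: residual bias spread
    r = np.corrcoef(obs, sim)[0, 1]        # timing error enters via correlation

    return 1.0 - np.sqrt(b_const**2 + b_dyn**2 + (r - 1.0)**2)

obs = np.array([1.2, 3.4, 2.1, 5.6, 4.3, 2.8, 1.9, 3.1])
sim = 1.1 * obs                            # pure constant (bias) error
print(diagnostic_efficiency_sketch(obs, sim))  # -> 0.9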

2020 ◽  
Author(s):  
Robin Schwemmle ◽  
Dominic Demand ◽  
Markus Weiler

Abstract. A better understanding of the reasons why hydrological model performance is good or poor represents a crucial part of meaningful model evaluation. However, current evaluation efforts are mostly based on aggregated efficiency measures such as Kling-Gupta Efficiency (KGE) or Nash-Sutcliffe Efficiency (NSE). These aggregated measures only distinguish between good and poor model performance. Especially in the case of poor model performance, it is important to identify the different errors which may have caused such unsatisfactory predictions. These errors may originate from the model parameters, the model structure, and/or the input data. In order to provide more insight, we define three types of errors which may be related to their origin: constant error (e.g. caused by consistent input data error such as precipitation), dynamic error (e.g. structural model errors such as a deficient storage routine) and timing error (e.g. caused by input data errors or deficient model routines/parameters). Based on these types of errors, we propose the novel Diagnostic Efficiency (DE) measure, which accounts for the three error types. The disaggregation of DE into its three metric terms can be visualized in a plain radial space using diagnostic polar plots. A major advantage of this visualization technique is that error contributions can be clearly differentiated. In order to provide a proof of concept, we first generated errors systematically by mimicking the three error types (i.e. simulations are surrogated by manipulating observations). By computing DE and the related diagnostic polar plots for the mimicked errors, we could then supply evidence for the concept. Finally, we tested the applicability of our approach for a modelling example. For a particular catchment, we compared streamflow simulations realized with different parameter sets to the observed streamflow. For this modelling example, the diagnostic polar plot suggests that dynamic errors explain the model performance to a large extent. The proposed evaluation approach provides a diagnostic tool for model developers and model users, and the diagnostic polar plot facilitates interpretation of the proposed performance measure.


2020 ◽  
Author(s):  
Robin Schwemmle ◽  
Dominic Demand ◽  
Markus Weiler

A better understanding of what is causing the performance of hydrological models to be “poor” or “good” is crucial for a diagnostically meaningful evaluation approach. However, current evaluation efforts are mostly based on aggregated efficiency measures such as Kling-Gupta Efficiency (KGE) and Nash-Sutcliffe Efficiency (NSE). These aggregated measures only allow one to distinguish between “poor” and “good” model performance. Especially in the case of “poor” model performance, it is important to identify the errors which may have caused such unsatisfactory simulations. These errors may have their origin in the model parameters, the model structure, and/or the input data. In order to provide insight into the origin of the error, we define three types of errors which may be related to the source of error: constant error (e.g. caused by consistent precipitation overestimation), dynamic error (e.g. caused by deficient vertical redistribution) and timing error (e.g. caused by the precipitation or infiltration routine). Based on these types of errors, we propose the novel Diagnostic Efficiency (DE) measure, which accounts for the three error types by representing them in three individual metric components. The disaggregation of DE into its three metric components can be used for visualization in a 2-D space using a diagnostic polar plot. A major advantage of this visualization technique is that regions of error terms can be clearly distinguished from each other. In order to prove our concept, we first systematically generated errors by mimicking the three error types (i.e. simulations are calculated by manipulating observations). Secondly, by computing DE and the related diagnostic polar plots for the mimicked errors, we could supply evidence for the concept. Moreover, we tested our approach on a real case example. For this we used the CAMELS dataset. In particular, we compared streamflow simulations of a single catchment realized with different parameter sets to the observed streamflow. For this real case example, the diagnostic polar plot suggests that dynamic errors explain the model performance to a large extent. With the proposed evaluation approach, we aim to provide a diagnostic tool for model developers and model users. In particular, the diagnostic polar plot enables hydrological interpretation of the proposed performance measure.
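The polar visualization described here can be mimicked with matplotlib. The coordinate conventions below are a plausible sketch, not the paper's exact definition: the radius is the overall deviation from a perfect score and the angle encodes the balance between constant and dynamic error contributions, so runs dominated by different error types land in different regions of the plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# Each model run is summarized by (b_const, b_dyn, r): constant error,
# dynamic error and timing correlation. Values here are made up.
runs = {
    "constant-dominated": (0.30, 0.05, 0.98),
    "dynamic-dominated":  (0.02, 0.35, 0.97),
    "timing-dominated":   (0.03, 0.04, 0.75),
}

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for label, (b_const, b_dyn, r) in runs.items():
    radius = np.sqrt(b_const**2 + b_dyn**2 + (r - 1.0)**2)  # overall deviation
    angle = np.arctan2(b_dyn, b_const)      # 0 rad: purely constant error
    ax.plot(angle, radius, "o", label=label)
ax.set_title("Sketch of a diagnostic polar plot")
ax.legend(loc="lower left", bbox_to_anchor=(1.05, 0.0))
plt.show()
```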


2008 ◽  
Vol 5 (3) ◽  
pp. 1641-1675 ◽  
Author(s):  
A. Bárdossy ◽  
S. K. Singh

Abstract. The estimation of hydrological model parameters is a challenging task. With the increasing capacity of computational power, several complex optimization algorithms have emerged, but none of the algorithms gives a unique and very best parameter vector. The parameters of hydrological models depend upon the input data. The quality of input data cannot be assured, as there may be measurement errors for both input and state variables. In this study a methodology has been developed to find a set of robust parameter vectors for a hydrological model. To see the effect of observational error on parameters, stochastically generated synthetic measurement errors were applied to observed discharge and temperature data. With these modified data, the model was calibrated and the effect of measurement errors on parameters was analysed. It was found that the measurement errors have a significant effect on the best performing parameter vector. The erroneous data led to very different optimal parameter vectors. To overcome this problem and to find a set of robust parameter vectors, a geometrical approach based on the half-space depth was used. The depth of the set of N randomly generated parameters was calculated with respect to the set with the best model performance (Nash-Sutcliffe efficiency was used for this study) for each parameter vector. Based on the depth of the parameter vectors, one can find a set of robust parameter vectors. The results show that the parameters chosen according to the above criteria have low sensitivity and perform well when transferred to a different time period. The method is demonstrated on the upper Neckar catchment in Germany. The conceptual HBV model was used for this study.
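The half-space depth at the core of this approach can be approximated with random projections: the depth of a point is the minimum, over directions, of the fraction of the cloud lying on one side of a hyperplane through that point. A minimal sketch with a synthetic stand-in for the set of well-performing parameter vectors:

```python
import numpy as np

def halfspace_depth(x, points, n_dir=2000, rng=None):
    """Monte Carlo approximation of Tukey's half-space depth of x with
    respect to a point cloud: the minimum, over random directions, of the
    fraction of points on one side of a hyperplane through x."""
    rng = np.random.default_rng(rng)
    points = np.asarray(points, float)
    dirs = rng.normal(size=(n_dir, points.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = (points - x) @ dirs.T            # shape (n_points, n_dir)
    side = (proj >= 0).mean(axis=0)         # fraction on the positive side
    return np.minimum(side, 1.0 - side).min()

# Depth of behavioural parameter vectors with respect to the whole set:
# deep (central) vectors are the robust candidates, shallow ones are not.
rng = np.random.default_rng(42)
good_params = rng.normal(size=(500, 4))     # stand-in for behavioural vectors
depths = [halfspace_depth(p, good_params) for p in good_params[:10]]
print(depths)
```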


2013 ◽  
Vol 70 (3) ◽  
pp. 470-484 ◽  
Author(s):  
Kiersten L. Curti ◽  
Jeremy S. Collie ◽  
Christopher M. Legault ◽  
Jason S. Link

Predation is a substantial source of mortality that is a function of the abundance of predator and prey species. This source of mortality creates the challenge of incorporating species interactions in statistical catch-at-age models in a way that accounts for the uncertainty in input data, parameters, and results. We developed a statistical, age-structured, multispecies model for three important species in the Georges Bank fish community: Atlantic cod (Gadus morhua), silver hake (Merluccius bilinearis), and Atlantic herring (Clupea harengus). The model was fit to commercial catch, survey, and diet data from 1978 to 2007. The estimated predation rates were high compared with fishing mortality and variable over time. The dynamics of the three species can be explained by the interplay between fishing and predation mortality. Monte Carlo simulations were used to evaluate the ability of the model to estimate parameters with known error introduced into each of the data types. The model parameters could be estimated with confidence from input data with error levels similar to those obtained from the model fit to the observed data. This evaluation of model performance should help to move multispecies statistical catch-at-age models from proof of concept to functional tools for ecosystem-based fisheries management.
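The Monte Carlo check described here follows a generic recipe: simulate observations from known parameters, perturb them with error at a realistic level, refit, and compare the estimates with the truth. The sketch below uses a hypothetical two-parameter toy model rather than the study's multispecies model:

```python
import numpy as np

rng = np.random.default_rng(0)
true_a, true_b = 2.0, 0.5
t = np.linspace(0, 10, 50)

def model(a, b, t):
    return a * np.exp(-b * t)     # toy stand-in for the population model

estimates = []
for _ in range(500):
    # Perturb the "observations" with multiplicative error at a level
    # comparable to the fit residuals.
    obs = model(true_a, true_b, t) * rng.lognormal(0.0, 0.2, size=t.size)
    # Refit by least squares on the log scale: log(obs) = log(a) - b*t.
    A = np.column_stack([np.ones_like(t), -t])
    coef, *_ = np.linalg.lstsq(A, np.log(obs), rcond=None)
    estimates.append((np.exp(coef[0]), coef[1]))

est = np.array(estimates)
print("mean estimates:", est.mean(axis=0), "vs truth:", (true_a, true_b))
```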


2016 ◽  
Vol 18 (6) ◽  
pp. 961-974 ◽  
Author(s):  
Younggu Her ◽  
Conrad Heatwole

Parameter uncertainty in hydrologic modeling is commonly evaluated, but assessing the impact of spatial input data uncertainty in spatially descriptive ‘distributed’ models is not common. This study compares the significance of uncertainty in spatial input data and model parameters on the output uncertainty of a distributed hydrology and sediment transport model, HYdrology Simulation using Time-ARea method (HYSTAR). The Shuffled Complex Evolution Metropolis (SCEM-UA) algorithm was used to quantify parameter uncertainty of the model. Errors in elevation and land cover layers were simulated using the Sequential Gaussian/Indicator Simulation (SGS/SIS) techniques and then incorporated into the model to evaluate their impact on the outputs relative to those of the parameter uncertainty. This study demonstrated that parameter uncertainty had a greater impact on model output than did errors in the spatial input data. In addition, errors in elevation data had a greater impact on model output than did errors in land cover data. Thus, for the HYSTAR distributed hydrologic model, accuracy and reliability can be improved more effectively by refining parameters rather than further improving the accuracy of spatial input data and by emphasizing the topographic data over the land cover data.
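The comparison of parameter versus input-data uncertainty can be illustrated with a toy distributed model: perturb the parameter while holding the inputs fixed, then perturb the input field while holding the parameter fixed, and compare the output spreads. Note that the study generated spatially correlated error fields with SGS/SIS; the sketch below substitutes uncorrelated noise for brevity, and the model itself is a hypothetical stand-in for HYSTAR:

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_runoff(rain, slope, k):
    """Hypothetical stand-in for a distributed model: runoff driven by a
    rainfall grid, a terrain-derived slope field and one routing parameter."""
    return (rain * slope).sum() / k

rain = rng.uniform(0, 10, size=(100, 100))
slope = rng.uniform(0.01, 0.3, size=(100, 100))

# Spread due to parameter uncertainty (k varies, inputs fixed).
q_par = [toy_runoff(rain, slope, k) for k in rng.uniform(1.0, 3.0, 200)]

# Spread due to input-data uncertainty (slope field perturbed, k fixed).
q_inp = [toy_runoff(rain, slope + rng.normal(0, 0.02, slope.shape), 2.0)
         for _ in range(200)]

print("parameter-driven std:", np.std(q_par))
print("input-driven std:    ", np.std(q_inp))
```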


2008 ◽  
Vol 12 (6) ◽  
pp. 1273-1283 ◽  
Author(s):  
A. Bárdossy ◽  
S. K. Singh

Abstract. The estimation of hydrological model parameters is a challenging task. With the increasing capacity of computational power, several complex optimization algorithms have emerged, but none of the algorithms gives a unique and very best parameter vector. The parameters of fitted hydrological models depend upon the input data. The quality of input data cannot be assured, as there may be measurement errors for both input and state variables. In this study a methodology has been developed to find a set of robust parameter vectors for a hydrological model. To see the effect of observational error on parameters, stochastically generated synthetic measurement errors were applied to observed discharge and temperature data. With these modified data, the model was calibrated and the effect of measurement errors on parameters was analysed. It was found that the measurement errors have a significant effect on the best performing parameter vector. The erroneous data led to very different optimal parameter vectors. To overcome this problem and to find a set of robust parameter vectors, a geometrical approach based on Tukey's half-space depth was used. The depth of the set of N randomly generated parameters was calculated with respect to the set with the best model performance (Nash-Sutcliffe efficiency was used for this study) for each parameter vector. Based on the depth of the parameter vectors, one can find a set of robust parameter vectors. The results show that the parameters chosen according to the above criteria have low sensitivity and perform well when transferred to a different time period. The method is demonstrated on the upper Neckar catchment in Germany. The conceptual HBV model was used for this study.


2021 ◽  
Vol 13 (12) ◽  
pp. 2405
Author(s):  
Fengyang Long ◽  
Chengfa Gao ◽  
Yuxiang Yan ◽  
Jinling Wang

Precise modeling of the weighted mean temperature (Tm) is critical for realizing real-time conversion from zenith wet delay (ZWD) to precipitable water vapor (PWV) in Global Navigation Satellite System (GNSS) meteorology applications. Empirical Tm models developed with neural network techniques have been shown to perform better on the global scale; they also have fewer model parameters and are thus easy to operate. This paper aims to deepen the research on Tm modeling with neural networks, to expand the application scope of Tm models, and to provide global users with more solutions for the real-time acquisition of Tm. An enhanced neural network Tm model (ENNTm) has been developed with globally distributed radiosonde data. Compared with other empirical models, the ENNTm has some advanced features in both model design and model performance. Firstly, the data for modeling cover the whole troposphere rather than just the region near the Earth's surface. Secondly, ensemble learning was employed to weaken the impact of sample disturbance on model performance, and elaborate data preprocessing, including up-sampling and down-sampling, was adopted to achieve better model performance on the global scale. Furthermore, the ENNTm was designed to meet the requirements of three different application conditions by providing three sets of model parameters, i.e., Tm estimation without measured meteorological elements, Tm estimation with only measured temperature, and Tm estimation with both measured temperature and water vapor pressure. The validation work was carried out using globally distributed radiosonde data, and the results show that the ENNTm performs better than other competing models from different perspectives under the same application conditions. The proposed model expands the application scope of Tm estimation and provides global users with more choices in the applications of real-time GNSS-PWV retrieval.
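A neural-network Tm regression of this general kind is straightforward to sketch with scikit-learn. The example below is purely illustrative: the feature set, the synthetic data and the small MLP are assumptions, not the ENNTm architecture, and the underlying "truth" is a Bevis-style linear Ts–Tm relation (Tm = 70.2 + 0.72 Ts) plus noise:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic stand-in for radiosonde samples: latitude, day of year, height
# and surface temperature Ts as predictors of Tm (all hypothetical).
n = 5000
lat = rng.uniform(-90, 90, n)
doy = rng.uniform(1, 365, n)
h = rng.uniform(0, 5000, n)
ts = 288.0 - 0.0065 * h - 0.3 * np.abs(lat) + 5 * np.sin(2 * np.pi * doy / 365)
tm = 70.2 + 0.72 * ts + rng.normal(0, 2, n)   # Bevis-style linear truth + noise

X = np.column_stack([lat, doy, h, ts])
net = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                                 random_state=0))
net.fit(X[:4000], tm[:4000])
rmse = np.sqrt(np.mean((net.predict(X[4000:]) - tm[4000:]) ** 2))
print(f"hold-out RMSE: {rmse:.2f} K")
```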


Author(s):  
Stephen A Solovitz

Abstract. Following volcanic eruptions, forecasters need accurate estimates of the mass eruption rate (MER) to appropriately predict the downstream effects. Most analyses use simple correlations or models based on large eruptions at steady conditions, even though many volcanoes feature significant unsteadiness. To address this, a superposition model is developed based on a technique used for spray injection applications, which predicts plume height as a function of the time-varying exit velocity. This model can be inverted, providing estimates of MER using field observations of a plume. The model parameters are optimized using laboratory data for plumes with physically relevant exit profiles and Reynolds numbers, resulting in predictions that agree to within 10% of measured exit velocities. The model performance is examined using a historic eruption from Stromboli with well-documented unsteadiness, again providing MER estimates of the correct order of magnitude. This method can provide a rapid alternative for real-time forecasting of small, unsteady eruptions.
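For contrast with the unsteady superposition model, the kind of simple steady-state correlation the abstract refers to can be written in a few lines. The sketch below inverts the widely used Mastin et al. (2009) fit, H = 2.00 V^0.241 (plume height H in km above the vent, volumetric flow V in m³/s of dense-rock equivalent); the DRE density is an assumed value:

```python
def mer_from_height(height_km, rho_dre=2500.0):
    """Steady-state baseline: invert the Mastin et al. (2009) fit
    H = 2.00 * V**0.241 for the volumetric flow rate V (m^3/s DRE),
    then convert to a mass eruption rate with an assumed DRE density.
    This is the simple correlation the superposition model improves on."""
    volume_rate = (height_km / 2.00) ** (1.0 / 0.241)   # m^3/s DRE
    return rho_dre * volume_rate                        # kg/s

for h in (2.0, 10.0, 25.0):
    print(f"H = {h:5.1f} km  ->  MER ~ {mer_from_height(h):.2e} kg/s")
```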


2021 ◽  
Author(s):  
Elzbieta Wisniewski ◽  
Wit Wisniewski

The presented research examines what minimum combination of input variables is required to obtain state-of-the-art fractional snow cover (FSC) estimates for heterogeneous alpine-forested terrains. Currently, one of the most accurate FSC estimators for alpine regions is based on training an Artificial Neural Network (ANN) that can deconvolve the relationships among numerous compounded and possibly non-linear bio-geophysical relations encountered in alpine terrain. Under the assumption that the ANN optimally extracts available information from its input data, we can exploit the ANN as a tool to assess the contributions toward FSC estimation of each of the data sources, and combinations thereof. By assessing the quality of the modeled FSC estimates versus ground equivalent data, suitable combinations of input variables can be identified. High spatial resolution IKONOS images are used to estimate snow cover for ANN training and validation, and also for error assessment of the ANN FSC results. Input variables are initially chosen to represent information already incorporated into leading snow cover estimators (e.g. two multispectral bands for NDSI). Additional variables such as topographic slope, aspect, and shadow distribution are evaluated to observe the ANN as it accounts for illumination incidence and directional reflectance of surfaces affecting the viewed radiance in complex terrain. Snow usually covers vegetation and underlying geology partially; therefore the ANN also has to resolve spectral mixtures of unobscured surfaces surrounded by snow. Multispectral imagery is therefore acquired in the fall prior to the first snow of the season and is included in the ANN analyses for assessing the baseline reflectance values of the environment that later become modified by the snow. In this study, nine representative scenarios of input data are selected to analyze the FSC performance. Numerous selections of input data combinations produced good results, attesting to the powerful ability of ANNs to extract information and utilize redundancy. The best ANN FSC model performance was achieved when all 15 pre-selected inputs were used. The need for non-linear modeling to estimate FSC was verified by forcing the ANN to behave linearly. The linear ANN model exhibited profoundly decreased FSC performance, indicating that non-linear processing more optimally estimates FSC in alpine-forested environments.
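The linear-versus-nonlinear comparison described here can be reproduced in miniature: train the same network once with an identity activation (forcing it to behave linearly) and once with a nonlinear activation. The data below are a synthetic stand-in in which an NDSI-like band ratio drives fractional snow cover, so the target is genuinely nonlinear in the raw inputs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

# Synthetic stand-in for the FSC problem: two "band" reflectances whose
# nonlinear combination (an NDSI-like ratio) drives fractional snow cover.
n = 4000
green, swir = rng.uniform(0.05, 0.9, n), rng.uniform(0.05, 0.9, n)
ndsi = (green - swir) / (green + swir)
fsc = np.clip(0.5 + ndsi, 0, 1) + rng.normal(0, 0.02, n)
X = np.column_stack([green, swir])

for act in ("identity", "relu"):            # linear vs nonlinear ANN
    net = MLPRegressor(hidden_layer_sizes=(16, 16), activation=act,
                       max_iter=3000, random_state=0).fit(X[:3000], fsc[:3000])
    r2 = net.score(X[3000:], fsc[3000:])    # hold-out coefficient of determination
    print(f"{act:8s}  hold-out R^2 = {r2:.3f}")
```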


2018 ◽  
Vol 22 (8) ◽  
pp. 4565-4581 ◽  
Author(s):  
Florian U. Jehn ◽  
Lutz Breuer ◽  
Tobias Houska ◽  
Konrad Bestian ◽  
Philipp Kraft

Abstract. The ambiguous representation of hydrological processes has led to the formulation of the multiple hypotheses approach in hydrological modeling, which requires new ways of model construction. However, most recent studies focus only on the comparison of predefined model structures or on building a model step by step. This study tackles the problem the other way around: we start with one complex model structure, which includes all processes deemed to be important for the catchment. Next, we create 13 additional simplified models, where some of the processes from the starting structure are disabled. The performance of those models is evaluated using three objective functions (logarithmic Nash–Sutcliffe efficiency; percentage bias, PBIAS; and the ratio between the root mean square error and the standard deviation of the measured data). Through this incremental breakdown, we identify the most important processes and detect the restraining ones. This procedure allows us to construct a more streamlined, subsequent 15th model with improved model performance, less uncertainty and higher model efficiency. We benchmark the original Model 1 and the final Model 15 against HBV Light. The final model is not able to outperform HBV Light, but we find that the incremental model breakdown leads to a structure with good model performance, fewer but more relevant processes and fewer model parameters.
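The three objective functions used for the model comparison are standard and compact to implement; the sketch below follows common conventions (e.g. the PBIAS sign convention in which positive values indicate underestimation), which may differ in detail from the study's code:

```python
import numpy as np

def log_nse(obs, sim, eps=1e-6):
    """Nash-Sutcliffe efficiency on log-transformed flows (emphasizes low flows)."""
    lo, ls = np.log(obs + eps), np.log(sim + eps)
    return 1.0 - np.sum((ls - lo) ** 2) / np.sum((lo - lo.mean()) ** 2)

def pbias(obs, sim):
    """Percentage bias; positive values indicate underestimation here."""
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def rsr(obs, sim):
    """Root mean square error divided by the standard deviation of the observations."""
    return np.sqrt(np.mean((obs - sim) ** 2)) / np.std(obs)

obs = np.array([0.8, 1.5, 3.2, 2.4, 1.1, 0.9, 4.0, 2.2])
sim = np.array([0.9, 1.3, 2.9, 2.6, 1.0, 1.1, 3.5, 2.0])
print(f"logNSE = {log_nse(obs, sim):.3f}, "
      f"PBIAS = {pbias(obs, sim):.1f}%, RSR = {rsr(obs, sim):.3f}")
```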

