Interactive comment on “Identifying rainfall-runoff events in discharge time series: A data-driven method based on Information Theory” by Stephanie Thiesen et al.

2018 ◽  
Author(s):  
Yiwen Mei
2018 ◽  
Author(s):  
Stephanie Thiesen ◽  
Paul Darscheid ◽  
Uwe Ehret

Abstract. In this study, we propose a data-driven approach to automatically identify rainfall-runoff events in discharge time series. The core of the concept is to construct and apply discrete multivariate probability distributions to obtain probabilistic predictions of whether each time step is part of an event. The approach permits any data to serve as predictors, and it is non-parametric in the sense that it can handle any kind of relation between the predictor(s) and the target. Each choice of a particular predictor data set is equivalent to formulating a model hypothesis. Among competing models, the best is found by comparing their predictive power in a training data set with user-classified events. For evaluation, we use measures from Information Theory such as Shannon Entropy and Conditional Entropy to select the best predictors and models and, additionally, measure the risk of overfitting via Cross Entropy and Kullback–Leibler Divergence. As all these measures are expressed in bits, we can combine them to identify models with the best tradeoff between predictive power and robustness given the available data. We applied the method to data from the Dornbirnerach catchment in Austria, distinguishing three different model types: models relying on discharge data, models using both discharge and precipitation data, and recursive models, i.e., models using their own predictions of a previous time step as an additional predictor. In the case study, the additional use of precipitation reduced predictive uncertainty only by a small amount, likely because the information provided by precipitation is already contained in the discharge data.
More generally, we found that the robustness of a model quickly dropped with the increase in the number of predictors used (an effect well known as the Curse of Dimensionality), such that, in the end, the best model was a recursive one applying four predictors (three standard and one recursive): discharge from two distinct time steps, the relative magnitude of discharge in a 65-hour time window and event predictions from the previous time step. Applying the model reduced the uncertainty about event classification by 77.8 %, decreasing Conditional Entropy from 0.516 to 0.114 bits. Given enough data to build data-driven models, their potential lies in the way they learn and exploit relations between data unconstrained by functional or parametric assumptions and choices. And, beyond that, the use of these models to reproduce a hydrologist's way of identifying rainfall-runoff events is just one of many potential applications.
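The information-theoretic measures named in this abstract can be estimated from simple counts once the predictors are discretized. The sketch below is illustrative only and is not the authors' implementation: it assumes each time step already has a hashable, binned predictor value (for example a tuple of discharge bins) and a 0/1 event label. Because every measure is in bits, they can be added and subtracted directly, for instance cross entropy equals entropy plus Kullback–Leibler divergence. The helper `uncertainty_reduction` is a hypothetical name for the relative drop from H(Y) to H(Y | X).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(predictors, labels):
    """H(Y | X) in bits, estimated from joint counts of (x, y) pairs.

    `predictors` holds one hashable (already binned) value per time step;
    `labels` holds the event (1) / no-event (0) class of the same step.
    """
    n = len(labels)
    joint = Counter(zip(predictors, labels))
    count_x = Counter(predictors)
    # -sum p(x, y) * log2 p(y | x), with p(y | x) = count(x, y) / count(x)
    return -sum((c / n) * math.log2(c / count_x[x]) for (x, _), c in joint.items())


def cross_entropy(p, q):
    """H(p, q) in bits; p and q map outcome -> probability."""
    return -sum(pi * math.log2(q[k]) for k, pi in p.items() if pi > 0)


def kl_divergence(p, q):
    """D_KL(p || q) in bits, equal to H(p, q) - H(p)."""
    return sum(pi * math.log2(pi / q[k]) for k, pi in p.items() if pi > 0)


def uncertainty_reduction(h_prior, h_conditional):
    """Hypothetical helper: fraction of H(Y) removed by knowing the predictors."""
    return 1.0 - h_conditional / h_prior


labels = [0, 0, 0, 0, 1, 1, 1, 0]
binned_q = ["low", "low", "low", "mid", "high", "high", "mid", "low"]
h_y = entropy(labels)                          # about 0.954 bits
h_y_given_x = conditional_entropy(binned_q, labels)  # 0.25 bits: only "mid" is ambiguous
print(uncertainty_reduction(h_y, h_y_given_x))
```

In this toy series, "low" and "high" discharge bins determine the class completely, so the only remaining uncertainty comes from the two "mid" steps.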


2015 ◽  
Vol 17 (6) ◽  
pp. 943-958 ◽  
Author(s):  
Carolina Massmann

The main objective of this paper is to assess the usefulness of parameter sensitivity information from conceptual hydrological models for data-driven models, an approach that might allow us to take advantage of the strengths of both data-based and process-based models. This study uses the parameter sensitivity of three widely used conceptual hydrological models (GR4J, Hymod and SAC-SMA) and combines it with M5 model trees. The study was carried out for three case studies dealing with different problems to which model trees are applied: one using model trees as error correctors and two in which model trees were used as rainfall–runoff models, which differ in how the sensitivity information is used. The results show that sensitivity time series can improve the predictions of M5 model trees, especially when the trees do not include the time series of previous discharge as predictor variables. Using parameter sensitivity information to cluster the time series resulted in model trees whose structure was consistent with the hydrological processes taking place in the considered cluster, indicating that sensitivity indices could be a viable way of introducing hydrological knowledge into data-based models.


2013 ◽  
Vol 17 (5) ◽  
pp. 2001-2016 ◽  
Author(s):  
N. De Vleeschouwer ◽  
V. R. N. Pauwels

Abstract. In this paper the potential of discharge-based indirect calibration of the probability-distributed model (PDM), a lumped rainfall-runoff (RR) model, is examined for six selected catchments in Flanders. Indirect calibration means that the calibration data have to be estimated because the catchment is ungauged or scarcely gauged. A first case in which indirect calibration is applied is that of spatial gauging divergence: because no observed discharge records are available at the outlet of the ungauged catchment, the calibration is carried out based on a rescaled discharge time series of a very similar donor catchment. Calibrations are carried out both in the time domain and in the frequency domain (also known as the spectral domain). Furthermore, the case of temporal gauging divergence is considered: limited (e.g. historical or very recent) discharge records are available at the outlet of the scarcely gauged catchment. Additionally, no time overlap exists between the forcing and discharge records. Therefore, only an indirect spectral calibration can be performed in this case. To conclude, the combined case of spatio-temporal gauging divergence is also considered. In this last case only limited discharge records are available at the outlet of a donor catchment. Again, the forcing and discharge records are not concomitant, so only an indirect spectral calibration is feasible. For most catchments the modelled discharge time series is found to be acceptable in the considered cases. In the case of spatial gauging divergence, indirect temporal calibration results in a better model performance than indirect spectral calibration. Furthermore, indirect spectral calibration in the case of temporal gauging divergence leads to a better model performance than indirect spectral calibration in the case of spatial gauging divergence.
Finally, the combination of spatial and temporal gauging divergence does not lead to a notably worse model performance compared to the case of spatial gauging divergence.
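The reason a spectral calibration still works when forcing and discharge records do not overlap is that it compares the frequency content of simulated and observed discharge rather than their values at matching times. The sketch below contrasts a time-domain mean squared error with a simple log-periodogram distance. That distance is swapped in purely for illustration; the objective function actually used in the paper may differ (spectral calibration is often based on a likelihood such as Whittle's). The function names are hypothetical, and the naive discrete Fourier transform is only suitable for short series.

```python
import cmath
import math


def periodogram(series):
    """Raw periodogram |X_k|^2 / n for frequencies k = 1 .. n//2 (naive DFT, mean removed)."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2 / n)
    return power


def time_domain_mse(sim, obs):
    """Time-domain objective: mean squared error; sim and obs must be concurrent."""
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)


def spectral_distance(sim, obs, eps=1e-12):
    """Frequency-domain objective: mean squared difference of log-periodograms.

    Only the spectra are compared, so sim and obs need equal length (so the
    frequency bins align) but not the same time period.
    """
    if len(sim) != len(obs):
        raise ValueError("series must have equal length so frequency bins align")
    ps, po = periodogram(sim), periodogram(obs)
    return sum((math.log(a + eps) - math.log(b + eps)) ** 2 for a, b in zip(ps, po)) / len(ps)


obs = [2 + math.sin(2 * math.pi * t / 12) for t in range(48)]
shifted = obs[6:] + obs[:6]  # same signal, shifted by half a period
print(time_domain_mse(shifted, obs))    # about 2.0: large time-domain error
print(spectral_distance(shifted, obs))  # about 0: identical spectra
```

The shifted series is judged very poor in the time domain but perfect in the frequency domain, which is what allows a spectral fit against non-concurrent records and also why such a fit cannot correct timing errors.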


Water ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 212 ◽  
Author(s):  
Haniyeh Asadi ◽  
Kaka Shahedi ◽  
Ben Jarihani ◽  
Roy Sidle

The input selection process for data-driven rainfall-runoff models is critical because input vectors determine the structure of the model and, hence, can influence model results. Here, hydro-geomorphic and biophysical time series inputs, including the Normalized Difference Vegetation Index (NDVI) and the Index of Connectivity (IC; a type of hydrological connectivity index), were assessed in addition to climatic and hydrologic inputs. Selected inputs were used to develop Artificial Neural Networks (ANNs) in the Haughton River catchment and the Calliope River catchment, Queensland, Australia. Results show that incorporating IC as a hydro-geomorphic parameter and remote sensing NDVI as a biophysical parameter, together with rainfall and runoff as hydro-climatic parameters, can improve ANN model performance compared to ANN models using only hydro-climatic parameters. Comparisons amongst different input patterns showed that IC inputs contribute more to improving model performance than NDVI inputs. Overall, ANN model simulations showed that using IC along with hydro-climatic inputs noticeably improved model performance in both catchments, especially in the Calliope catchment. This improvement is indicated by a slight increase (9.77% and 11.25%) in the Nash–Sutcliffe efficiency and a noticeable decrease (24.43% and 37.89%) in the root mean squared error of monthly runoff for the Haughton River and Calliope River, respectively. Here, we demonstrate the significant effect of hydro-geomorphic and biophysical time series inputs on estimating monthly runoff with ANN data-driven models, which are valuable for water resources planning and management.
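The two scores quoted in this abstract can be computed as follows. This is a minimal sketch using the standard definitions of the Nash–Sutcliffe efficiency and the root mean squared error. The abstract does not say whether its percentages are relative changes or absolute differences, so `percent_change` (a hypothetical helper) shows only one plausible reading, namely relative change with respect to the baseline model.

```python
import math


def nse(sim, obs):
    """Nash–Sutcliffe efficiency: 1 is a perfect fit, 0 equals predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(sim, obs):
    """Root mean squared error, in the units of the runoff series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def percent_change(baseline, new):
    """Assumed reading: relative change from a baseline score to a new score, in percent."""
    return 100.0 * (new - baseline) / abs(baseline)


obs = [10.0, 20.0, 30.0, 40.0]
baseline_sim = [14.0, 17.0, 33.0, 36.0]  # e.g. hydro-climatic inputs only (made-up values)
extended_sim = [12.0, 19.0, 31.0, 38.0]  # e.g. with an additional input (made-up values)
print(percent_change(nse(baseline_sim, obs), nse(extended_sim, obs)))
print(percent_change(rmse(baseline_sim, obs), rmse(extended_sim, obs)))
```

A positive change is an improvement for NSE, while a negative change is an improvement for RMSE, which matches the direction of the results reported above.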


2013 ◽  
Vol 10 (1) ◽  
pp. 103-144
Author(s):  
N. De Vleeschouwer ◽  
V. R. N. Pauwels

Abstract. In this paper the potential of discharge-based indirect calibration of the Probability Distributed Model (PDM), a lumped rainfall-runoff (RR) model, is examined for six selected catchments in Flanders. Indirect calibration means that the calibration data have to be estimated because the catchment is ungauged. A first case in which indirect calibration is applied is that of spatial gauging divergence: because no observed discharge records are available at the outlet of the ungauged catchment, the calibration is carried out based on a rescaled discharge time series of a very similar donor catchment. Calibrations are carried out both in the time domain and in the frequency domain (a.k.a. spectral domain). Furthermore, the case of temporal gauging divergence is considered: limited (e.g. historical or very recent) discharge records are available at the outlet of the ungauged catchment. Additionally, no time overlap exists between the forcing and discharge records. Therefore, only an indirect spectral calibration can be performed in this case. To conclude, the combined case of spatio-temporal gauging divergence is also considered. In this last case only limited discharge records are available at the outlet of a donor catchment. Again, the forcing and discharge records are not contemporaneous, so only an indirect spectral calibration is feasible. The modelled discharge time series are found to be acceptable in all three considered cases. In the case of spatial gauging divergence, indirect temporal calibration results in a slightly better model performance than indirect spectral calibration. Furthermore, indirect spectral calibration in the case of temporal gauging divergence leads to a better model performance than indirect spectral calibration in the case of spatial gauging divergence.
Finally, the combination of spatial and temporal gauging divergence does not necessarily lead to a worse model performance compared to the separate cases of spatial and temporal gauging divergence.


2018 ◽  
Vol 22 (10) ◽  
pp. 5081-5095 ◽  
Author(s):  
Petra Hulsman ◽  
Thom A. Bogaard ◽  
Hubert H. G. Savenije

Abstract. Hydrological models play an important role in water resources management. These models generally rely on discharge data for calibration. Discharge time series are normally derived from observed water levels by using a rating curve. However, this method suffers from many uncertainties due to insufficient observations, inadequate rating curve fitting procedures, rating curve extrapolation, and temporal changes in the river geometry. Unfortunately, this problem is prominent in many African river basins. In this study, an alternative calibration method is presented using water-level time series instead of discharge, applied to a semi-distributed rainfall-runoff model for the semi-arid and poorly gauged Mara River basin in Kenya. The modelled discharges were converted into water levels using the Strickler–Manning formula. This method yields an additional model output: a “geometric rating curve equation” that relates the modelled discharge to the observed water level using the Strickler–Manning formula and a calibrated slope-roughness parameter. This procedure resulted in good and consistent model results during calibration and validation. The hydrological model was able to reproduce the water levels for the entire basin as well as for the Nyangores sub-catchment in the north. The newly derived geometric rating curves were subsequently compared to the existing rating curves. At the catchment outlet of the Mara, these differed significantly, most likely due to uncertainties in the recorded discharge time series. However, at the “Nyangores” sub-catchment, the geometric and recorded discharges were almost identical. In conclusion, the results obtained for the Mara River basin illustrate that with the proposed calibration method, the water-level time series can be simulated well, and that the discharge-water-level relation can also be derived, even in catchments with uncertain or lacking rating curve information.
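The conversion from modelled discharge to water level can be illustrated with the Strickler–Manning formula Q = k_s A R^(2/3) S^(1/2). The sketch below makes simplifying assumptions that are not taken from the paper: a wide rectangular channel of width B (so A ≈ B h and R ≈ h), a water level equal to flow depth (zero gauge-datum offset), and a single lumped slope-roughness parameter c = k_s S^(1/2) fitted in log space. Under these assumptions Q = c B h^(5/3), which inverts to h = (Q / (c B))^(3/5). All function names are hypothetical.

```python
import math


def discharge_from_depth(h, c, width):
    """Q = c * B * h**(5/3) for a wide rectangular channel (c = k_s * sqrt(S))."""
    return c * width * h ** (5.0 / 3.0)


def depth_from_discharge(q, c, width):
    """Inverse of discharge_from_depth: h = (Q / (c * B)) ** (3/5)."""
    return (q / (c * width)) ** 0.6


def calibrate_c(discharges, depths, width):
    """Least-squares estimate of c in log space, assuming zero gauge-datum offset.

    With log Q = log c + log B + (5/3) log h, the best intercept is the mean
    of log(Q / (B * h**(5/3))) over all observations.
    """
    logs = [math.log(q / (width * h ** (5.0 / 3.0))) for q, h in zip(discharges, depths)]
    return math.exp(sum(logs) / len(logs))


# Synthetic check: k_s = 25 m^(1/3)/s and S = 0.01 give c = 2.5 (illustrative values).
width = 20.0
depths = [0.5, 1.0, 1.8]
flows = [discharge_from_depth(h, 2.5, width) for h in depths]
c_hat = calibrate_c(flows, depths, width)
print(c_hat, [depth_from_discharge(q, c_hat, width) for q in flows])
```

In the calibration setting described above, `depth_from_discharge` would be applied to the rainfall-runoff model's simulated discharge so that the model can be scored against observed water levels; the fitted relation then plays the role of a geometric rating curve.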


2019 ◽  
Vol 23 (2) ◽  
pp. 1015-1034 ◽  
Author(s):  
Stephanie Thiesen ◽  
Paul Darscheid ◽  
Uwe Ehret

Abstract. In this study, we propose a data-driven approach for automatically identifying rainfall-runoff events in discharge time series. The core of the concept is to construct and apply discrete multivariate probability distributions to obtain probabilistic predictions of whether each time step is part of an event. The approach permits any data to serve as predictors, and it is non-parametric in the sense that it can handle any kind of relation between the predictor(s) and the target. Each choice of a particular predictor data set is equivalent to formulating a model hypothesis. Among competing models, the best is found by comparing their predictive power in a training data set with user-classified events. For evaluation, we use measures from information theory such as Shannon entropy and conditional entropy to select the best predictors and models and, additionally, measure the risk of overfitting via cross entropy and Kullback–Leibler divergence. As all these measures are expressed in “bit”, we can combine them to identify models with the best tradeoff between predictive power and robustness given the available data. We applied the method to data from the Dornbirner Ach catchment in Austria, distinguishing three different model types: models relying on discharge data, models using both discharge and precipitation data, and recursive models, i.e., models using their own predictions of a previous time step as an additional predictor. In the case study, the additional use of precipitation reduced predictive uncertainty only by a small amount, likely because the information provided by precipitation is already contained in the discharge data.
More generally, we found that the robustness of a model quickly dropped with the increase in the number of predictors used (an effect well known as the curse of dimensionality) such that, in the end, the best model was a recursive one applying four predictors (three standard and one recursive): discharge from two distinct time steps, the relative magnitude of discharge compared with all discharge values in a surrounding 65 h time window and event predictions from the previous time step. Applying the model reduced the uncertainty in event classification by 77.8 %, decreasing conditional entropy from 0.516 to 0.114 bits. To assess the quality of the proposed method, its results were binarized and validated through a holdout method and then compared to a physically based approach. The comparison showed similar behavior of both models (both with accuracy near 90 %), and the cross-validation reinforced the quality of the proposed model. Given enough data to build data-driven models, their potential lies in the way they learn and exploit relations between data unconstrained by functional or parametric assumptions and choices. And, beyond that, the use of these models to reproduce a hydrologist's way of identifying rainfall-runoff events is just one of many potential applications.
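Two steps in this abstract lend themselves to a short illustration: the "relative magnitude of discharge compared with all discharge values in a surrounding 65 h time window" predictor, and binarizing probabilistic predictions before computing accuracy on held-out data. The sketch below rests on assumptions that are not confirmed by the abstract. It reads "relative magnitude" as a percentile rank within a centred window, assumes hourly data (so 65 steps span 65 h), uses a 0.5 probability threshold, and uses a contiguous holdout split. All names are hypothetical, and the physically based reference approach is not reproduced.

```python
def relative_magnitude(discharge, window=65):
    """Percentile rank of each value among the values in a centred window.

    Assumes one value per hour, so window=65 spans 65 h; the window is
    truncated at the start and end of the series.
    """
    half = window // 2
    ranks = []
    for t, value in enumerate(discharge):
        lo, hi = max(0, t - half), min(len(discharge), t + half + 1)
        neighbours = discharge[lo:hi]
        ranks.append(sum(v <= value for v in neighbours) / len(neighbours))
    return ranks


def binarize(probabilities, threshold=0.5):
    """Turn P(event) into a hard event (1) / no-event (0) classification."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(predicted, observed):
    """Fraction of time steps whose predicted class matches the reference class."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def holdout_split(n_steps, train_fraction=0.5):
    """Contiguous split (no shuffling) so that autocorrelated steps stay together."""
    cut = int(n_steps * train_fraction)
    return list(range(cut)), list(range(cut, n_steps))


q = [1.0, 1.1, 5.0, 9.0, 4.0, 1.2, 1.0, 1.0]
print(relative_magnitude(q, window=5))
train_idx, test_idx = holdout_split(len(q))
p_event = [0.1, 0.2, 0.7, 0.9, 0.6, 0.4, 0.1, 0.05]  # made-up model output
reference = [0, 0, 1, 1, 1, 1, 0, 0]                 # made-up user classification
test_pred = binarize([p_event[i] for i in test_idx])
print(accuracy(test_pred, [reference[i] for i in test_idx]))
```

A contiguous split is chosen here because shuffling hourly steps would place near-identical neighbours in both training and test sets and overstate accuracy.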


RBRH ◽  
2019 ◽  
Vol 24 ◽  
Author(s):  
Luiz Claudio Galvão do Valle Junior ◽  
Dulce Buchala Bicca Rodrigues ◽  
Paulo Tarso Sanches de Oliveira

ABSTRACT The Curve Number (CN) method is extensively used to predict surface runoff from storm events. However, some uncertainties remain in the method, such as the use of a standard initial abstraction ratio (λ) of 0.2 and the choice of the most suitable CN values. Here, we compute λ and CN values from rainfall and runoff data for a rural basin located in Midwestern Brazil. We used 30 observed rainfall-runoff events with rainfall depths greater than 25 mm to derive associated CN values using five statistical methods. We found λ values ranging from 0.005 to 0.455, with a median of 0.045, suggesting the use of λ = 0.05 instead of 0.2. We found an S0.2-to-S0.05 conversion factor of 2.865. We also found negative Nash–Sutcliffe efficiency values when comparing estimated and observed runoff. Therefore, our findings indicate that the CN method was not suitable for estimating runoff in the studied basin. This poor performance suggests that the runoff mechanisms in the studied area are dominated by subsurface stormflow.
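The role of λ becomes clearer from the standard SCS-CN equations in metric units: S = 25400/CN − 254 (mm), initial abstraction Ia = λS, and direct runoff Q = (P − Ia)² / (P − Ia + S) for P > Ia (otherwise Q = 0). The sketch below implements these equations and inverts the runoff equation to obtain S, and hence CN, from one observed event for a given λ. It is only an illustration. The five statistical methods the authors used to derive CN are not reproduced, and the function names are hypothetical.

```python
import math


def retention_from_cn(cn):
    """Potential maximum retention S in mm: S = 25400 / CN - 254."""
    return 25400.0 / cn - 254.0


def cn_from_retention(s):
    """Inverse of retention_from_cn: CN = 25400 / (S + 254)."""
    return 25400.0 / (s + 254.0)


def runoff_mm(p, s, lam=0.2):
    """SCS-CN direct runoff Q = (P - lam*S)^2 / (P - lam*S + S), zero if P <= lam*S."""
    ia = lam * s
    if p <= ia:
        return 0.0
    return (p - ia) ** 2 / (p - ia + s)


def retention_from_event(p, q, lam=0.2):
    """Solve the runoff equation for S given event rainfall P and runoff Q (0 < Q < P, lam > 0).

    Rearranging gives lam^2 S^2 - b S + (P^2 - P Q) = 0 with b = 2 lam P + Q (1 - lam);
    the smaller root is the one satisfying lam * S < P.
    """
    b = 2.0 * lam * p + q * (1.0 - lam)
    disc = b * b - 4.0 * lam * lam * (p * p - p * q)
    return (b - math.sqrt(disc)) / (2.0 * lam * lam)


# One hypothetical event: 50 mm of rain producing 10 mm of runoff.
for lam in (0.2, 0.05):
    s = retention_from_event(50.0, 10.0, lam)
    print(lam, round(s, 1), round(cn_from_retention(s), 1))
```

Fitting the same event with λ = 0.05 instead of 0.2 yields a different S and CN, which is why CN values derived under one λ need converting (as with the S0.2-to-S0.05 factor reported above) before they are used with the other.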

