Identifying rainfall-runoff events in discharge time series: a data-driven method based on information theory

Abstract. In this study, we propose a data-driven approach for automatically identifying rainfall-runoff events in discharge time series. The core of the concept is to construct and apply discrete multivariate probability distributions to obtain probabilistic predictions of each time step that is part of an event. The approach permits any data to serve as predictors, and it is non-parametric in the sense that it can handle any kind of relation between the predictor(s) and the target. Each choice of a particular predictor data set is equivalent to formulating a model hypothesis. Among competing models, the best is found by comparing their predictive power in a training data set with user-classified events. For evaluation, we use measures from information theory such as Shannon entropy and conditional entropy to select the best predictors and models and, additionally, measure the risk of overfitting via cross entropy and Kullback–Leibler divergence. As all these measures are expressed in “bit”, we can combine them to identify models with the best tradeoff between predictive power and robustness given the available data. We applied the method to data from the Dornbirner Ach catchment in Austria, distinguishing three different model types: models relying on discharge data, models using both discharge and precipitation data, and recursive models, i.e., models using their own predictions of a previous time step as an additional predictor. In the case study, the additional use of precipitation reduced predictive uncertainty only by a small amount, likely because the information provided by precipitation is already contained in the discharge data. 
More generally, we found that the robustness of a model quickly dropped with the increase in the number of predictors used (an effect well known as the curse of dimensionality) such that, in the end, the best model was a recursive one applying four predictors (three standard and one recursive): discharge from two distinct time steps, the relative magnitude of discharge compared with all discharge values in a surrounding 65 h time window and event predictions from the previous time step. Applying the model reduced the uncertainty in event classification by 77.8 %, decreasing conditional entropy from 0.516 to 0.114 bits. To assess the quality of the proposed method, its results were binarized and validated through a holdout method and then compared to a physically based approach. The comparison showed similar behavior of both models (both with accuracy near 90 %), and the cross-validation reinforced the quality of the proposed model. Given enough data to build data-driven models, their potential lies in the way they learn and exploit relations between data unconstrained by functional or parametric assumptions and choices. And, beyond that, the use of these models to reproduce a hydrologist's way of identifying rainfall-runoff events is just one of many potential applications.
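
The information measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the contingency table below is hypothetical, and the function names are chosen here for illustration only. It computes Shannon entropy, conditional entropy and Kullback–Leibler divergence in bits, and then the relative uncertainty reduction implied by the two rounded conditional-entropy values quoted above.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a discrete distribution; zero cells are ignored."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy_bits(joint):
    """H(target | predictor) in bits from a table joint[predictor_bin, target_class]."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    # Chain rule: H(Y|X) = H(X, Y) - H(X)
    return entropy_bits(joint.ravel()) - entropy_bits(joint.sum(axis=1))

def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Hypothetical counts: rows = binned discharge, columns = (no event, event)
counts = np.array([[80, 2],
                   [10, 8],
                   [1, 19]])

h_target = entropy_bits(counts.sum(axis=0) / counts.sum())  # uncertainty without predictor
h_cond = conditional_entropy_bits(counts)                   # uncertainty given the predictor
print(f"H(event) = {h_target:.3f} bit, H(event | discharge bin) = {h_cond:.3f} bit")

# Relative reduction from the rounded values quoted in the abstract
reduction = (0.516 - 0.114) / 0.516
print(f"relative reduction = {reduction:.3f}")
```

With the rounded values, the ratio is about 0.779, which agrees with the reported 77.8 % up to rounding of the entropies.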

Identifying rainfall-runoff events in discharge time series: A data-driven method based on Information Theory

10.5194/hess-2018-427 ◽

2018 ◽

Author(s):

Stephanie Thiesen ◽

Paul Darscheid ◽

Uwe Ehret

Keyword(s):

Predictive Power ◽

Conditional Entropy ◽

Data Driven ◽

Rainfall Runoff ◽

Discharge Time ◽

Time Step ◽

Data Set ◽

Discharge Data ◽

Previous Time Step ◽

Runoff Events

Abstract. In this study, we propose a data-driven approach to automatically identify rainfall-runoff events in discharge time series. The core of the concept is to construct and apply discrete multivariate probability distributions to obtain probabilistic predictions of each time step being part of an event. The approach permits any data to serve as predictors, and it is non-parametric in the sense that it can handle any kind of relation between the predictor(s) and the target. Each choice of a particular predictor data set is equivalent to formulating a model hypothesis. Among competing models, the best is found by comparing their predictive power in a training data set with user-classified events. For evaluation, we use measures from Information Theory such as Shannon Entropy and Conditional Entropy to select the best predictors and models and, additionally, measure the risk of overfitting via Cross Entropy and Kullback–Leibler Divergence. As all these measures are expressed in bit, we can combine them to identify models with the best tradeoff between predictive power and robustness given the available data. We applied the method to data from the Dornbirnerach catchment in Austria distinguishing three different model types: Models relying on discharge data, models using both discharge and precipitation data, and recursive models, i.e., models using their own predictions of a previous time step as an additional predictor. In the case study, the additional use of precipitation reduced predictive uncertainty only by a small amount, likely because the information provided by precipitation is already contained in the discharge data. 
More generally, we found that the robustness of a model quickly dropped with the increase in the number of predictors used (an effect well known as the Curse of Dimensionality), such that in the end, the best model was a recursive one applying four predictors (three standard and one recursive): discharge from two distinct time steps, the relative magnitude of discharge in a 65-hour time window and event predictions from the previous time step. Applying the model reduced the uncertainty about event classification by 77.8 %, decreasing Conditional Entropy from 0.516 to 0.114 bits. Given enough data to build data-driven models, their potential lies in the way they learn and exploit relations between data unconstrained by functional or parametric assumptions and choices. And, beyond that, the use of these models to reproduce a hydrologist's way to identify rainfall-runoff events is just one of many potential applications.

Picturing and modeling catchments by representative hillslopes

Hydrology and Earth System Sciences ◽

10.5194/hess-21-1225-2017 ◽

2017 ◽

Vol 21 (2) ◽

pp. 1225-1249 ◽

Cited By ~ 19

Author(s):

Ralf Loritz ◽

Sibylle K. Hassler ◽

Conrad Jackisch ◽

Niklas Allroggen ◽

Loes van Schaik ◽

...

Keyword(s):

Expert Knowledge ◽

Water Dynamics ◽

Rainfall Runoff ◽

Data Set ◽

Physically Based Model ◽

Optimal Representation ◽

Physically Based ◽

Physically Based Models ◽

Streamflow Data ◽

Runoff Events

Abstract. This study explores the suitability of a single hillslope as a parsimonious representation of a catchment in a physically based model. We test this hypothesis by picturing two distinctly different catchments in perceptual models and translating these pictures into parametric setups of 2-D physically based hillslope models. The model parametrizations are based on a comprehensive field data set, expert knowledge and process-based reasoning. Evaluation against streamflow data highlights that both models predicted the annual pattern of streamflow generation as well as the hydrographs acceptably. However, a look beyond performance measures revealed deficiencies in streamflow simulations during the summer season and during individual rainfall–runoff events as well as a mismatch between observed and simulated soil water dynamics. Some of these shortcomings can be related to our perception of the systems and to the chosen hydrological model, while others point to limitations of the representative hillslope concept itself. Nevertheless, our results confirm that representative hillslope models are a suitable tool to assess the importance of different data sources as well as to challenge our perception of the dominant hydrological processes we want to represent therein. Consequently, these models are a promising step forward in the search for the optimal representation of catchments in physically based models.

A Long–Term Response-Based Rainfall-Runoff Hydrologic Model: Case Study of The Upper Blue Nile

Hydrology ◽

10.3390/hydrology6030069 ◽

2019 ◽

Vol 6 (3) ◽

pp. 69 ◽

Cited By ~ 2

Author(s):

Eatemad Keshta ◽

Mohamed A. Gad ◽

Doaa Amin

Keyword(s):

River Flow ◽

Hydrologic Model ◽

Initial Assessment ◽

Rainfall Runoff ◽

Time Step ◽

Data Set ◽

Blue Nile ◽

Catchment Areas ◽

Surface And Groundwater

This study develops a response-based hydrologic model for long-term (continuous) rainfall-runoff simulations over the catchment areas of large rivers. The model overcomes the typical difficulties in estimating infiltration and evapotranspiration parameters by using a modified version of the Soil Conservation Service curve number (SCS-CN) method. In addition, the model simulates the surface and groundwater hydrograph components using the response unit-hydrograph approach instead of a linear reservoir approach for routing surface and groundwater to the basin outlet. The unit responses are pre-calculated with Geographic Information Systems (GIS) on a semi-distributed short-term basis and applied in every time step of the simulation. They are based on the time-area technique, which can better simulate the real routing behavior of the basin. The model is less sensitive to groundwater infiltration parameters since groundwater is controlled by the surface component and not the opposite. For that reason, the model is called the SCHydro model (Surface Controlled Hydrologic model). The model is tested on the upper Blue Nile catchment area using a 28-year daily river flow data set for calibration and validation. The results show that the SCHydro model can simulate the long-term transforming behavior of the upper Blue Nile basin. Our initial assessment indicates that the model is a promising tool for long-term river flow simulations, especially for long-term forecasting purposes, due to its stability in performing the water balance.
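
For orientation, the standard (unmodified) SCS-CN relation can be sketched as follows. This is the textbook form in metric units, not the modified version used in SCHydro, which the abstract does not specify. The function name and the default initial-abstraction ratio of 0.2 are assumptions made for this illustration.

```python
def scs_cn_runoff_mm(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (mm) from rainfall depth p_mm using the standard SCS-CN method.

    cn is the curve number (0 < cn <= 100); ia_ratio is the assumed ratio
    between the initial abstraction Ia and the potential retention S.
    """
    if not 0 < cn <= 100:
        raise ValueError("curve number must lie in (0, 100]")
    s = 25400.0 / cn - 254.0      # potential maximum retention S (mm)
    ia = ia_ratio * s             # initial abstraction Ia (mm)
    if p_mm <= ia:
        return 0.0                # all rainfall is abstracted, no runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

# Hypothetical storm of 50 mm on a catchment with CN = 80
print(round(scs_cn_runoff_mm(50.0, 80.0), 2))
```

Because the retention S depends only on the curve number, the method needs a single land-surface parameter per response unit, which is one reason it is often chosen when infiltration parameters are hard to estimate.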

A Comparison of Emotional Neural Network (ENN) and Artificial Neural Network (ANN) Approach for Rainfall-Runoff Modelling

Civil Engineering Journal ◽

10.28991/cej-2019-03091398 ◽

2019 ◽

Vol 5 (10) ◽

pp. 2120-2130 ◽

Cited By ~ 5

Author(s):

Suraj Kumar ◽

Thendiyath Roshni ◽

Dar Himayoun

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Data Sets ◽

Rainfall Runoff ◽

Data Set ◽

Discharge Data ◽

Artificial Neural ◽

Artificial Neural Network Ann ◽

Emotional Neural Network ◽

Selection Of

A reliable method of rainfall-runoff modeling is a prerequisite for proper management and mitigation of extreme events such as floods. The objective of this paper is to contrast the hydrological performance of an Emotional Neural Network (ENN) and an Artificial Neural Network (ANN) for modelling rainfall-runoff in the Sone Command, Bihar, as this area experiences floods due to heavy rainfall. ENN is a modified version of ANN that includes neural parameters which enhance the network learning process. Selection of inputs is a crucial task for a rainfall-runoff model. This paper uses cross-correlation analysis for the selection of potential predictors. Three sets of input data (Set 1, Set 2 and Set 3) were prepared using weather and discharge data from 2 rain gauge stations and 1 discharge station located in the command for the period 1986-2014. Principal Component Analysis (PCA) was then performed on the selected data sets to select those showing principal tendencies. The data sets obtained after PCA were then used to develop the ENN and ANN models. Performance indices were computed for the developed models for the three data sets. The results obtained from Set 2 showed that ENN, with R = 0.933, R² = 0.870, Nash-Sutcliffe = 0.8689, RMSE = 276.1359 and Relative Peak Error = 0.00879, outperforms ANN in simulating the discharge. Therefore, the ENN model is suggested as the better model for rainfall-runoff discharge in the Sone Command, Bihar.
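
Two of the performance indices quoted above, the Nash-Sutcliffe efficiency and the RMSE, can be computed as in the following minimal sketch. The discharge values are hypothetical and the function names are chosen here for illustration. The abstract does not state how its indices were implemented.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Hypothetical observed and simulated discharge values
obs = np.array([120.0, 340.0, 980.0, 610.0, 250.0])
sim = np.array([150.0, 300.0, 900.0, 650.0, 230.0])
print(f"NSE = {nash_sutcliffe(obs, sim):.3f}, RMSE = {rmse(obs, sim):.1f}")
```

The RMSE carries the units of the discharge series, whereas the Nash-Sutcliffe efficiency is dimensionless, so the two are usually reported together.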

The Impact of Rainfall-Runoff Events on the Water Quality of the Upper Catchment of the Jordan River, Israel

Integrated Water Resources Management: Concept, Research and Implementation ◽

10.1007/978-3-319-25071-7_6 ◽

2016 ◽

pp. 129-146 ◽

Cited By ~ 2

Author(s):

Oren Reichmann ◽

Yona Chen ◽

Iggy M. Litaor

Keyword(s):

Water Quality ◽

Rainfall Runoff ◽

Jordan River ◽

The Impact ◽

Runoff Events

Improving the estimates of nitrate concentrations at subsurface drained agricultural catchment scale using a new conceptual water quality model

10.5194/egusphere-egu21-14540 ◽

2021 ◽

Author(s):

Julien Tournebize ◽

Samy Chelil ◽

Hocine Henine ◽

Cedric Chaumont

Keyword(s):

Water Quality ◽

Soil Profile ◽

Model Calibration ◽

Input Data ◽

Performance Criteria ◽

Time Step ◽

Data Set ◽

Calibration And Validation ◽

Nitrate Concentrations

Agricultural pollutants such as nutrients and pesticides affect the quality of surface water and groundwater. Agricultural nonpoint source pollution due to excessive land fertilization is considered by researchers and governments a concerning and sensitive issue. At the scale of agricultural catchments, the modeling of nitrate-leaching losses has been addressed in several studies. However, most of the developed models require a large number of input data and parameters. Some of them include a complex representation of biogeochemical nitrogen processes or a full agronomic module and can be computationally time-consuming. Moreover, the quality of the input data makes the model calibration less efficient. The objective of this study is to present a new conceptual reservoir model (SIDRA-N), developed to better assess the time variation of nitrate concentrations [NO3-] at the outlet of a subsurface drainage network. The model represents a simplified scheme of subsurface flow and nitrate transfer processes in the soil profile, between the drain and the mid-drain. The soil profile is decomposed into three interconnected compartments: the first compartment represents the rapid transfer of water and nitrate through the soil macroporosity; the two other compartments describe the progressive contribution of the horizontal transfer. The input data to the nitrate module consist of the Remaining pools of Nitrate at the Beginning of Winter season (RNBW), introduced before the winter of each hydrological year. This value should represent all biogeochemical transformations of nitrogen and agricultural practices from the previous crop. This variable can explain up to 80 % of the total nitrate flux exported yearly. Hence, the SIDRA-N model requires only two input variables: the drainage discharge and the RNBW. A set of parameters was introduced to regulate nitrate fluxes and discharge transiting through the compartments to the drain outlet. Calibration and validation (C/V) procedures are fundamental to the assessment of the performance and the robustness of water quality models. In this study, the split-sample test for model calibration and validation was carried out using a data set from the Rampillon study site (355 ha, data for 6 years), located east of Paris, France. The C/V step was performed using high-frequency observations (hourly time step) of nitrate concentrations and drainage discharge. The results showed performance criteria of KGE greater than 0.5 and RMSE less than 5 mgN/l. These results confirm the very good quality of the simulations. Finally, a seasonal model calibration was implemented to observe the yearly parameter variability and ensure the model stability and consistency.
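
The KGE criterion quoted above is the Kling-Gupta efficiency. The abstract does not say which variant was used, so the sketch below assumes the original 2009 formulation, which combines correlation, variability ratio and bias ratio. The function name and the nitrate series are hypothetical.

```python
import numpy as np

def kling_gupta_efficiency(obs, sim):
    """Kling-Gupta efficiency (2009 form): 1 is a perfect fit, lower is worse."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]    # linear correlation
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return float(1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2))

# Hypothetical hourly nitrate concentrations (mgN/l)
obs = np.array([8.0, 9.5, 12.0, 15.5, 13.0, 10.0])
sim = np.array([7.5, 9.0, 12.5, 14.0, 13.5, 10.5])
print(f"KGE = {kling_gupta_efficiency(obs, sim):.3f}")
```

Splitting the score into three components makes it possible to see whether a low value comes from timing, amplitude or bias errors.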

Subsurface runoff and recharge dynamics in a Mediterranean catchment based on StorAge Selection functions and end-member splitting analysis

10.5194/egusphere-egu2020-829 ◽

2020 ◽

Author(s):

Matthias Sprenger ◽

Pilar Llorens ◽

Francesc Gallart ◽

Jérôme Latron

Keyword(s):

Stream Water ◽

Pore Space ◽

Data Driven ◽

High Temporal Resolution ◽

Rainfall Runoff ◽

Catchment Scale ◽

Data Set ◽

Isotope Data ◽

Subsurface Runoff

Investigations at the long-term experimental catchment Vallcebre in the Pyrenees revealed that rainfall-runoff dynamics are highly variable due to the Mediterranean climatic conditions affecting the storage and release of water in the subsurface [1]. In a changing climate, the consequences of which could lead to greater variation in catchment wetness due to an increase in both droughts and high-intensity rainfalls, there is a strong need to better understand subsurface storage and runoff processes. While our previous isotope studies (using 2H and 18O) demonstrated a pronounced heterogeneity of water flow in the unsaturated zone at the plot scale [2], we also observed that the contributions of young waters to catchment runoff are highly dependent on the catchment's wetness [3]. These analyses provided a basis from which we present new insights into the relationship between subsurface runoff and storage dynamics applying StorAge Selection functions [4] and end-member splitting analysis [5]. Thus, we combined modeling and data-driven approaches to disentangle the partitioning of subsurface waters into storage and runoff based on water age dynamics. We gathered an extensive isotope data set with >550 rainfall samples and >980 stream samples taken at high temporal resolution (30 minutes to one week), with the highest frequencies during high discharge to improve the coverage of rainfall-runoff events. Using this high-frequency isotope data set, we calibrated the StorAge Selection functions and put special emphasis on the representation of the isotopic response during high-flow rainfall-runoff periods. We further tested whether time-variant representations of StorAge Selection functions dependent on varying wetness improve the stream water isotope simulations, and the ways in which isotope data from different compartments (groundwater and tree water) can assist in constraining the parameter space. Furthermore, end-member splitting analysis provided an independent view into the flow dynamics based on these long-term isotope data sets. As such, the analysis allowed us to derive estimates of the dynamics of rainfall partitioning into runoff and evapotranspiration. Therefore, the combination of the modeling and data-driven approaches enabled an assessment of the dynamics of subsurface runoff at the catchment scale, underlining the relevance of heterogeneous flow patterns that were observed at the plot scale.

References
1. Llorens, P. et al. What have we learnt about mediterranean catchment hydrology? 30 years observing hydrological processes in the Vallcebre research catchments. Geogr. Res. Lett. 44, 475–502; 10.18172/cig.3432 (2018).
2. Sprenger, M., Llorens, P., Cayuela, C., Gallart, F. & Latron, J. Mechanisms of consistently disjunct soil water pools over (pore) space and time. Hydrol. Earth Syst. Sci. 23, 2751–2762; 10.5194/hess-23-2751-2019 (2019).
3. Gallart, F. et al. Investigating young water fractions in a small Mediterranean mountain catchment: both precipitation forcing and sampling frequency matter. Hydrol. Process. (in review).
4. Benettin, P. & Bertuzzo, E. tran-SAS v1.0: a numerical model to compute catchment-scale hydrologic transport using StorAge Selection functions. Geosci. Model Dev. 11, 1627–1639; 10.5194/gmd-11-1627-2018 (2018).
5. Kirchner, J. W. & Allen, S. T. Seasonal partitioning of precipitation between streamflow and evapotranspiration, inferred from end-member splitting analysis. Hydrol. Earth Syst. Sci. 24, 17–39; 10.5194/hess-24-17-2020 (2020).

Abstraction Hierarchy Based Explainable Artificial Intelligence

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1071181320641073 ◽

2020 ◽

Vol 64 (1) ◽

pp. 319-323

Author(s):

Murat Dikmen ◽

Catherine Burns

Keyword(s):

Artificial Intelligence ◽

Domain Knowledge ◽

Evaluation Process ◽

Data Driven ◽

Data Set ◽

Knowledge Based ◽

Cognitive Work ◽

Explainable Artificial Intelligence ◽

Loan Approval

This work explores the application of Cognitive Work Analysis (CWA) in the context of Explainable Artificial Intelligence (XAI). We built an AI system using a loan evaluation data set and applied an XAI technique to obtain data-driven explanations for predictions. Using an Abstraction Hierarchy (AH), we generated domain knowledge-based explanations to accompany data-driven explanations. An online experiment was conducted to test the usefulness of AH-based explanations. Participants read financial profiles of loan applicants, the AI system’s loan approval/rejection decisions, and explanations that justify the decisions. The presence or absence of AH-based explanations was manipulated, and participants’ perceptions of the explanation quality were measured. The results showed that providing AH-based explanations helped participants learn about the loan evaluation process and improved the perceived quality of explanations. We conclude that a CWA approach can increase understandability when explaining the decisions made by AI systems.

Addressing Overfitting Problem in Deep Learning-Based Solutions for Next Generation Data-Driven Networks

Wireless Communications and Mobile Computing ◽

10.1155/2021/8493795 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Mansheng Xiao ◽

Yuezhong Wu ◽

Guocai Zuo ◽

Shuangnan Fan ◽

Huijun Yu ◽

...

Keyword(s):

Deep Learning ◽

Data Driven ◽

Next Generation Networks ◽

Next Generation ◽

Data Set ◽

Practical Applications ◽

Maximum Value ◽

Network Problems ◽

Overfitting Problem

Next-generation networks are data-driven by design but face uncertainty due to changing user group patterns and the hybrid nature of the infrastructures running these systems. Meanwhile, the amount of data gathered in computer systems is increasing. How to classify and process these massive data so as to reduce the amount of data transmitted in the network is an important problem. Recent research uses deep learning to propose solutions for these and related issues. However, deep learning faces problems such as overfitting that may undermine the effectiveness of its applications in solving different network problems. This paper considers the overfitting problem of convolutional neural network (CNN) models in practical applications. An algorithm combining maximum pooling dropout and weight attenuation is proposed to avoid overfitting. First, maximum-value pooling dropout is applied in the pooling layer of the model to sparsify the neurons; then, regularization based on weight attenuation is introduced to reduce the complexity of the model when the gradient of the loss function is calculated by backpropagation. Theoretical analysis and experiments show that the proposed method can effectively avoid overfitting and can reduce the error rate of data set classification by more than 10% on average compared with other methods. The proposed method can improve the quality of different deep learning-based solutions designed for data management and processing in next-generation networks.

Rainfall-Runoff modelling using Long-Short-Term-Memory (LSTM) networks

10.5194/hess-2018-247 ◽

2018 ◽

Cited By ~ 6

Author(s):

Frederik Kratzert ◽

Daniel Klotz ◽

Claire Brenner ◽

Karsten Schulz ◽

Mathew Herrnegger

Keyword(s):

Short Term Memory ◽

Regional Scale ◽

Model Performance ◽

Data Driven ◽

Rainfall Runoff ◽

Short Term ◽

Data Set ◽

Term Memory ◽

Soil Moisture Accounting ◽

Long Short Term Memory

Abstract. Rainfall-runoff modelling is one of the key challenges in the field of hydrology. Various approaches exist, ranging from physically based over conceptual to fully data-driven models. In this paper, we propose a novel data-driven approach using the Long-Short-Term-Memory (LSTM) network, a special type of recurrent neural network. The advantage of the LSTM is its ability to learn long-term dependencies between the provided input and output of the network, which are essential for modelling storage effects in, e.g., catchments with snow influence. We use 241 catchments of the freely available CAMELS data set to test our approach and also compare the results to the well-known Sacramento Soil Moisture Accounting Model (SAC-SMA) coupled with the Snow-17 snow routine. We also show the potential of the LSTM as a regional hydrological model, in which one model predicts the discharge for a variety of catchments. In our last experiment, we show the possibility of transferring process understanding, learned at the regional scale, to individual catchments, thereby increasing model performance compared to an LSTM trained only on the data of single catchments. Using this approach, we were able to achieve better model performance than the SAC-SMA + Snow-17, which underlines the potential of the LSTM for hydrological modelling applications.
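
How an LSTM carries information across time steps can be illustrated with a single forward step of a standard LSTM cell. This is a minimal NumPy sketch, not the authors' model: the weights are random, the sizes are arbitrary, and the gate ordering inside the stacked weight matrices is a convention chosen here. The cell state `c` is the component that can retain information over many steps, loosely analogous to a storage such as a snowpack.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W (4n x m), U (4n x n) and b (4n) stack the parameters of the input,
    forget, candidate and output gates, in that (assumed) order.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    g = np.tanh(z[2 * n:3 * n])    # candidate cell update
    o = sigmoid(z[3 * n:4 * n])    # output gate
    c = f * c_prev + i * g         # cell state: the long-term memory
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 3              # e.g. two meteorological inputs (hypothetical)
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

forcing = rng.random((10, n_in))   # hypothetical 10-step input sequence
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in forcing:
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```

In a trained network, a regression layer on top of the final hidden state would map it to discharge; that layer is omitted here.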
