Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

2010 ◽  
Vol 14 (10) ◽  
pp. 1943-1961 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The gamma test is used as a guide to assist in selecting the appropriate modeling technique. The results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies, unlike in the two nonlinear soil moisture case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to that of GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice, and its performance can improve if the kernel is appropriately selected. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.

2009 ◽  
Vol 6 (6) ◽  
pp. 7095-7142 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The gamma test is used as a guide to assist in selecting the appropriate modeling technique. The results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies, unlike in the two nonlinear soil moisture case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to that of GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice, and its performance can improve if the kernel is appropriately selected. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.


2009 ◽  
Vol 6 (6) ◽  
pp. 7055-7093 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and the rainfall-runoff process are described and proposed for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both the predictive accuracy and the uncertainty of the modeling techniques can be evaluated. The implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.


2010 ◽  
Vol 14 (10) ◽  
pp. 1931-1941 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and the rainfall-runoff process are proposed for the modeling experiment and described in the second paper. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both the prediction accuracy and the uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
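
The realization procedure described in these abstracts (twelve groups per dataset, each with training, cross-validation, and testing subsets, drawn by random sampling without replacement) can be illustrated with a minimal sketch. The abstracts do not give the split proportions or the exact sampling procedure, so the function name `make_realizations`, the 50/25/25 fractions, and the use of an independent random permutation per group are assumptions made only for illustration.

```python
import numpy as np

def make_realizations(n_samples, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return one (train, cross_val, test) tuple of index arrays per realization.

    The fractions are hypothetical; the abstracts do not state the split sizes.
    """
    rng = np.random.default_rng(seed)
    n_train = int(fractions[0] * n_samples)
    n_cv = int(fractions[1] * n_samples)
    groups = []
    for _ in range(n_groups):
        # A random permutation draws every record exactly once,
        # i.e. sampling without replacement from the original dataset.
        order = rng.permutation(n_samples)
        train = order[:n_train]
        cross_val = order[n_train:n_train + n_cv]
        test = order[n_train + n_cv:]  # whatever remains goes to testing
        groups.append((train, cross_val, test))
    return groups

groups = make_realizations(n_samples=100)
```

Fitting each technique on every group and looking at the spread of an error measure across the twelve test subsets then gives a rough picture of both accuracy and its variability, which is the purpose the abstract attributes to the realizations.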


Author(s):  
Vijaya V. N. Sriram Malladi ◽  
Mohammad I. Albakri ◽  
Pablo A. Tarazaga ◽  
Serkan Gugercin

Dispersion relations describe the frequency-dependent nature of elastic waves propagating in structures. Experimental determination of the dispersion relations of structural components, such as the floor of a building, can be a tedious task due to material inhomogeneity, complex boundary conditions, and the physical dimensions of the structure under test. In this work, data-driven modeling techniques are utilized to reconstruct dispersion relations over a predetermined frequency range. The feasibility of this approach is demonstrated on a one-dimensional beam, for which an exact solution of the dispersion relations is attainable. Frequency response functions of the beam are obtained numerically over the frequency range of 0–50 kHz. A data-driven dynamical model, constructed by the vector fitting approach, is then deployed to develop a state-space model based on the simulated frequency response functions at 16 locations along the beam. This model is then utilized to construct the dispersion relations of the structure through a series of numerical simulations. The techniques discussed in this paper are especially beneficial in scenarios where it is neither possible to find analytical solutions to the wave equations nor feasible to measure dispersion curves experimentally. Validation with actual experimental data is left for future work, but the complete framework is presented here.
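
As an illustration of the kind of exact beam solution against which a reconstructed dispersion curve could be checked, the sketch below evaluates the classical Euler-Bernoulli relation for flexural waves, ω = k²·sqrt(EI/(ρA)). The abstract does not say which beam theory the authors used, so the choice of Euler-Bernoulli theory, the function name, and the material and cross-section values in the usage lines are all assumptions.

```python
import numpy as np

def euler_bernoulli_dispersion(freq_hz, E, I, rho, A):
    """Wavenumber, phase velocity, and group velocity of flexural waves.

    Assumes Euler-Bernoulli beam theory (an assumption, not taken from the
    abstract). freq_hz must be strictly positive.
    E: Young's modulus [Pa], I: second moment of area [m^4],
    rho: density [kg/m^3], A: cross-sectional area [m^2].
    """
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    # omega = k^2 * sqrt(E*I / (rho*A))  =>  k = sqrt(omega) * (rho*A / (E*I))^(1/4)
    k = np.sqrt(omega) * (rho * A / (E * I)) ** 0.25
    c_phase = omega / k        # phase velocity
    c_group = 2.0 * c_phase    # d(omega)/dk = 2 * omega / k for this relation
    return k, c_phase, c_group

# Hypothetical aluminium beam with a 20 mm x 5 mm rectangular cross-section.
b, h = 0.02, 0.005
freqs = np.linspace(100.0, 50e3, 200)
k, c_phase, c_group = euler_bernoulli_dispersion(
    freqs, E=70e9, I=b * h**3 / 12.0, rho=2700.0, A=b * h
)
```

Euler-Bernoulli theory neglects shear deformation and rotary inertia, so it becomes less accurate towards the upper end of a range such as 0–50 kHz; a Timoshenko model would be the usual refinement.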


Author(s):  
A. R. Nemati ◽  
M. Zakeri Niri ◽  
S. Moazami

Simulation of the rainfall-runoff process is one of the most important research fields in hydrology and water resources. Generally, the models used in this field are divided into two categories: conceptual and data-driven. In this study, a conceptual model and two data-driven models are used to simulate the rainfall-runoff process in the Tamer sub-catchment, located in the Gorganroud watershed in Iran. The conceptual model is HEC-HMS, and the data-driven models are a multi-layer perceptron (MLP) neural network and support vector regression (SVR). In addition to simulating the rainfall-runoff process using the recorded land precipitation, the performance of four satellite precipitation algorithms, namely CMORPH, PERSIANN, TRMM 3B42 and TRMM 3B42RT, was studied. In the simulation of the rainfall-runoff process, calibration and accuracy assessment of the models were based on satellite data. The results, based on three criteria, the correlation coefficient (R), root mean square error (RMSE) and mean absolute error (MAE), showed that in this part of the study the SVR and MLP models simulated runoff reasonably well, but the model errors increased when simulating the maximum flow values.
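
The three evaluation criteria named in this abstract have standard definitions, sketched below with NumPy. The function name `error_measures` and the sample flow values are hypothetical and are not data from the study.

```python
import numpy as np

def error_measures(observed, simulated):
    """Correlation coefficient (R), RMSE, and MAE between two series."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    residuals = sim - obs
    return {
        "R": np.corrcoef(obs, sim)[0, 1],           # Pearson correlation
        "RMSE": np.sqrt(np.mean(residuals ** 2)),   # root mean square error
        "MAE": np.mean(np.abs(residuals)),          # mean absolute error
    }

# Hypothetical observed and simulated flows, for illustration only.
scores = error_measures([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

R measures only linear association, so a simulation with a constant bias can still score R = 1 while RMSE and MAE are nonzero, which is one reason to report the three criteria together.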


Author(s):  
Bob Vergauwen ◽  
Oscar Mauricio Agudelo ◽  
Raj Thilak Rajan ◽  
Frank Pasveer ◽  
Bart De Moor

2022 ◽  
Vol 32 (1) ◽  
pp. 1-33
Author(s):  
Jinghui Zhong ◽  
Dongrui Li ◽  
Zhixing Huang ◽  
Chengyu Lu ◽  
Wentong Cai

Data-driven crowd modeling has become a popular and effective approach for generating realistic crowd simulations and has been applied to a range of applications, such as anomaly detection and game design. In the past decades, a number of data-driven crowd modeling techniques have been proposed, providing many options for generating virtual crowd simulations. This article provides a comprehensive survey of these state-of-the-art data-driven modeling techniques. We first describe the commonly used datasets for crowd modeling. Then, we categorize and discuss the state-of-the-art data-driven crowd modeling methods. After that, data-driven crowd model validation techniques are discussed. Finally, six promising future research topics in data-driven crowd modeling are discussed.


2015 ◽  
Vol 17 (6) ◽  
pp. 943-958 ◽  
Author(s):  
Carolina Massmann

The main objective of this paper is to assess the usefulness of parameter sensitivity information from conceptual hydrological models for data-driven models, an approach which might allow us to take advantage of the strengths of both data-based and process-based models. This study uses the parameter sensitivity of three widely used conceptual hydrological models (GR4J, Hymod and SAC-SMA) and combines it with M5 model trees. The study was carried out for three case studies dealing with different problems to which model trees are applied: one using model trees as error correctors, and two in which model trees were used as rainfall–runoff models and which differ in how the sensitivity information is used. The results show that sensitivity time series can improve the predictions of M5 model trees, especially when they do not include the time series of previous discharge as predictor variables. The use of parameter sensitivity information for clustering the time series resulted in model trees that had a structure consistent with the hydrological processes that were taking place in the considered cluster, indicating that the use of sensitivity indices could be a viable way of introducing hydrological knowledge into data-based models.

