Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology

2010 ◽  
Vol 14 (10) ◽  
pp. 1931-1941 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. A comprehensive data-driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data-driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and the rainfall-runoff process are described and proposed, in the second paper, for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both prediction accuracy and uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
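
The grouping scheme described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `make_realizations`, the 50/25/25 split fractions, the fixed seed, and the choice to draw each realization by shuffling all indices without replacement are hypothetical choices that this abstract does not specify.

```python
import random


def make_realizations(n_samples, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return n_groups random partitions of the indices 0..n_samples-1.

    Each partition is a dict with 'training', 'cross_validation', and
    'testing' index lists. The split fractions are illustrative
    assumptions; only the first two are used, the rest goes to testing.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_train = int(fractions[0] * n_samples)
    n_cv = int(fractions[1] * n_samples)
    groups = []
    for _ in range(n_groups):
        # Sampling without replacement: a random permutation of all indices.
        order = rng.sample(range(n_samples), n_samples)
        groups.append({
            "training": order[:n_train],
            "cross_validation": order[n_train:n_train + n_cv],
            "testing": order[n_train + n_cv:],
        })
    return groups


groups = make_realizations(n_samples=100)
print(len(groups), [len(v) for v in groups[0].values()])
```

Fitting a technique on the training subset of every group and then comparing the spread of its test errors across the 12 groups is what would allow both accuracy and uncertainty to be examined.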

2009 ◽  
Vol 6 (6) ◽  
pp. 7055-7093 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. A comprehensive data-driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data-driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and the rainfall-runoff process are described and proposed for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both predictive accuracy and uncertainty of the modeling techniques can be evaluated. The implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.


2010 ◽  
Vol 14 (10) ◽  
pp. 1943-1961 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. In this second part of the two-part paper, the data-driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. The results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies, unlike the two nonlinear soil moisture case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice, and its performance can improve if the kernel is selected appropriately. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.


2009 ◽  
Vol 6 (6) ◽  
pp. 7095-7142 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. In this second part of the two-part paper, the data-driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. The results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies, unlike the two nonlinear soil moisture case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice, and its performance can improve if the kernel is selected appropriately. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.


Author(s):  
Hedieh Sajedi ◽  
Mehran Bahador

In this paper, a new approach for segmentation and recognition of Persian handwritten numbers is presented. The method combines the framing feature technique with the outer profile feature; we call this combination the adapted framing feature. In the proposed approach, segmentation of the numbers into digits is carried out automatically. In the classification stage, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated digit level on IFHCDB, a recognition rate of 99.27% is achieved using SVM with a polynomial kernel. At the isolated digit level on HODA, a recognition rate of 99.07% is achieved using SVM with a polynomial kernel. The experiments illustrate that the proposed method results in higher accuracy than previous work.


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 60 ◽  
Author(s):  
Irvin Hussein Lopez-Nava ◽  
Matias Garcia-Constantino ◽  
Jesus Favela

Activity recognition is an important task in many fields, such as ambient intelligence, pervasive healthcare, and surveillance. In particular, the recognition of human gait can be useful to identify the characteristics of the places or physical spaces in which people move, such as whether the person is walking on level ground or walking down stairs. For example, ascending or descending stairs can be a risky activity for older adults because of a possible fall, which can have more severe consequences than if it occurred on a flat surface. While portable and wearable devices have been widely used to detect Activities of Daily Living (ADLs), few research works in the literature have focused on characterizing only actions of human gait. In the present study, a method is introduced for recognizing gait activities using acceleration data obtained from a smartphone and a wearable inertial sensor placed on the ankle. The acceleration signals were segmented based on the automatic detection of strides, also called gait cycles. Subsequently, a feature vector of the segmented signals was extracted, which was used to train four classifiers using the Naive Bayes, C4.5, Support Vector Machines, and K-Nearest Neighbors algorithms. Data were collected from seven young subjects who performed five gait activities: (i) going down an incline, (ii) going up an incline, (iii) walking on level ground, (iv) going down stairs, and (v) going up stairs. The results demonstrate the viability of using the proposed method and technologies in ambient assisted living contexts.
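
The segmentation and feature-extraction steps mentioned in this abstract might look roughly like the sketch below. It is a hedged illustration only: the abstract does not describe how strides are detected or which features are used, so the stride start indices are taken as a given input, and the helper names `segment_by_strides` and `stride_features`, the sample values, and the five summary statistics are all hypothetical.

```python
from statistics import mean, pstdev


def segment_by_strides(signal, stride_starts):
    """Split a 1-D acceleration signal into gait cycles.

    stride_starts holds the index where each stride begins (assumed to be
    produced by some stride detector that is not shown here). The last
    stride runs to the end of the signal.
    """
    bounds = list(stride_starts) + [len(signal)]
    return [signal[a:b] for a, b in zip(bounds[:-1], bounds[1:])]


def stride_features(segment):
    """Illustrative feature vector: mean, std, min, max, and length."""
    return [mean(segment), pstdev(segment), min(segment), max(segment), len(segment)]


# Made-up acceleration samples and stride boundaries, for illustration only.
accel = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 4.0, 2.0]
starts = [0, 4]
segments = segment_by_strides(accel, starts)
features = [stride_features(s) for s in segments]
print(features)
```

Each row of `features` would then serve as one training example for a classifier, labeled with the gait activity performed during that stride.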


2020 ◽  
Vol 12 (4) ◽  
pp. 297-308
Author(s):  
Chris H. Miller ◽  
Matthew D. Sacchet ◽  
Ian H. Gotlib

Support vector machines (SVMs) are being used increasingly in affective science as a data-driven classification method and feature reduction technique. Whereas traditional statistical methods typically compare group averages on selected variables, SVMs use a predictive algorithm to learn multivariate patterns that optimally discriminate between groups. In this review, we provide a framework for understanding the methods of SVM-based analyses and summarize the findings of seminal studies that use SVMs for classification or data reduction in the behavioral and neural study of emotion and affective disorders. We conclude by discussing promising directions and potential applications of SVMs in future research in affective science.


Author(s):  
Vijaya V. N. Sriram Malladi ◽  
Mohammad I. Albakri ◽  
Pablo A. Tarazaga ◽  
Serkan Gugercin

Dispersion relations describe the frequency-dependent nature of elastic waves propagating in structures. Experimental determination of dispersion relations of structural components, such as the floor of a building, can be a tedious task due to material inhomogeneity, complex boundary conditions, and the physical dimensions of the structure under test. In this work, data-driven modeling techniques are utilized to reconstruct dispersion relations over a predetermined frequency range. The feasibility of this approach is demonstrated on a one-dimensional beam where an exact solution of the dispersion relations is attainable. Frequency response functions of the beam are obtained numerically over the frequency range of 0–50 kHz. A data-driven dynamical model, constructed by the vector fitting approach, is then deployed to develop a state-space model based on the simulated frequency response functions at 16 locations along the beam. This model is then utilized to construct dispersion relations of the structure through a series of numerical simulations. The techniques discussed in this paper are especially beneficial in scenarios where it is neither possible to find analytical solutions to the wave equations nor feasible to measure dispersion curves experimentally. The use of actual experimental data is left for future work, but the complete framework is presented here.


2020 ◽  
Vol 9 (9) ◽  
pp. 533 ◽  
Author(s):  
Ricardo Afonso ◽  
André Neves ◽  
Carlos Viegas Damásio ◽  
João Moura Pires ◽  
Fernando Birra ◽  
...  

Every year, wildfires strike the Portuguese territory and are a concern for public entities and the population. To prevent wildfire progression and minimize its impact, Fuel Management Zones (FMZs) have been stipulated, by law, around buildings and settlements, along national roads, and around other infrastructures. FMZs require monitoring of the vegetation condition so that maintenance and cleaning of these zones can proceed promptly. To improve FMZ monitoring, this paper proposes the use of satellite images, such as those from Sentinel-1 and Sentinel-2, along with vegetation indices and extracted temporal characteristics (max, min, mean and standard deviation) associated with the vegetation within and outside the FMZs, in order to determine whether the zones were treated. These characteristics feed machine-learning algorithms, such as XGBoost, Support Vector Machines, K-nearest neighbors and Random Forest. The results show that it is possible to detect an intervention in an FMZ with high accuracy, namely with an F1-score ranging from 90% up to 94% and a Kappa ranging from 0.80 up to 0.89.
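
As a rough illustration of the quantities named in this abstract, the sketch below computes the four temporal characteristics of a vegetation-index time series and derives an F1-score and Cohen's Kappa from a binary confusion matrix. The function names, the index values, and the confusion-matrix counts are made up for the example; the use of the population standard deviation is an assumption, since the abstract does not say which variant was used.

```python
from statistics import mean, pstdev


def temporal_features(series):
    """Max, min, mean, and (population) standard deviation of a time series."""
    return {
        "max": max(series),
        "min": min(series),
        "mean": mean(series),
        "std": pstdev(series),  # assumption: population, not sample, std
    }


def f1_and_kappa(tp, fp, fn, tn):
    """F1-score and Cohen's Kappa from binary confusion-matrix counts.

    Assumes non-degenerate counts (no zero denominators).
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    observed = (tp + tn) / n  # observed agreement
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return f1, kappa


# Hypothetical vegetation-index values for one zone over four dates.
feats = temporal_features([0.2, 0.4, 0.6, 0.8])
# Hypothetical counts: "treated" is the positive class.
f1, kappa = f1_and_kappa(tp=40, fp=10, fn=10, tn=40)
print(feats, round(f1, 3), round(kappa, 3))
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it is reported alongside the F1-score.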

