Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology

Abstract. A comprehensive data-driven modeling experiment is presented in a two-part paper. In this first part, the extensive modeling experiment is proposed. The most important concerns about how data-driven modeling (DDM) techniques and data were handled, compared, and evaluated, and about the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that compared various DDM techniques is presented. Six DDM techniques, namely neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with these techniques. Five datasets from Canada and Europe, representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff processes, are proposed for the modeling experiment and described in the second paper. Twelve different realizations (groups) of each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is to be applied to each of the 12 groups of each dataset, so that both the prediction accuracy and the uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, the results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
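The random-sampling procedure for building the 12 realizations can be sketched in Python as below. This is a minimal sketch, not the authors' procedure: the function name `make_realizations`, the assumption that each realization is a full random permutation of the records, and the 50/25/25 training/cross-validation/testing split are all hypothetical choices made for illustration.

```python
import random

def make_realizations(n_records, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return n_groups random train/cross-validation/test splits of record indices.

    Each group samples all indices without replacement (a random permutation)
    and cuts it according to `fractions`, which is an assumed split.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # fixed seed so the groups are reproducible
    n_train = round(fractions[0] * n_records)
    n_cv = round(fractions[1] * n_records)
    groups = []
    for _ in range(n_groups):
        idx = rng.sample(range(n_records), n_records)  # permutation, no repeats
        groups.append({
            "train": idx[:n_train],
            "cv": idx[n_train:n_train + n_cv],
            "test": idx[n_train + n_cv:],  # remainder goes to testing
        })
    return groups

groups = make_realizations(1000)
print(len(groups), [len(groups[0][k]) for k in ("train", "cv", "test")])  # prints: 12 [500, 250, 250]
```

Because every group reshuffles the same records, each technique sees 12 different training/testing partitions, which is what allows the spread of results (uncertainty), not just a single accuracy figure, to be examined.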


Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 1: Concepts and methodology

Hydrology and Earth System Sciences Discussions, Vol 6 (6), pp. 7055-7093, 2009. DOI: 10.5194/hessd-6-7055-2009. Cited by ~4.

Author(s): A. Elshorbagy, G. Corzo, S. Srinivasulu, D. P. Solomatine

Keyword(s): Polynomial Regression, Predictive Accuracy, Lower Layer, Data Driven, Support Vector, K Nearest Neighbors, Evolutionary Polynomial Regression, Modeling Techniques, Modeling Experiment, Data Driven Modeling

Abstract. A comprehensive data-driven modeling experiment is presented in a two-part paper. In this first part, the extensive modeling experiment is proposed. The most important concerns about how data-driven modeling (DDM) techniques and data were handled, compared, and evaluated, and about the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that compared various DDM techniques is presented. Six DDM techniques, namely neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with these techniques. Five datasets from Canada and Europe, representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff processes, are described and proposed for the modeling experiment. Twelve different realizations (groups) of each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is to be applied to each of the 12 groups of each dataset, so that both the predictive accuracy and the uncertainty of the modeling techniques can be evaluated. The implementation of the modeling techniques, the results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
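The two baselines can be illustrated with a small sketch. It assumes the naïve model is a persistence forecast (the next value equals the current one), which the abstract does not specify; it fits multiple linear regression by ordinary least squares through the normal equations and scores models with root mean squared error (RMSE), one common overall error measure. All function names are hypothetical.

```python
import math

def fit_mlr(X, y):
    """Ordinary least squares with intercept, solving the normal equations (A^T A) b = A^T y."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]  # [intercept, b1, b2, ...]

def predict_mlr(b, X):
    return [b[0] + sum(bi * xi for bi, xi in zip(b[1:], row)) for row in X]

def naive_persistence(series):
    """Assumed naive model: the forecast for step t+1 is the observed value at step t."""
    return series[:-1]

def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]           # exactly linear toy target
b = fit_mlr(X, y)
print([round(v, 6) for v in b])                  # should recover [2.0, 3.0, -1.0]
print(rmse(y[1:], naive_persistence(y)))         # persistence error on the same series
```

Any data-driven technique that cannot beat both of these simple references on the testing subset adds little value, which is the point of including them.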


Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

Hydrology and Earth System Sciences, Vol 14 (10), pp. 1943-1961, 2010. DOI: 10.5194/hess-14-1943-2010. Cited by ~107.

Author(s): A. Elshorbagy, G. Corzo, S. Srinivasulu, D. P. Solomatine

Keyword(s): Soil Moisture, Case Studies, Data Driven, Modeling Technique, Actual Evapotranspiration, Support Vector, Rainfall Runoff, Highly Nonlinear, Modeling Techniques, Data Driven Modeling

Abstract. In this second part of the two-part paper, the data-driven modeling (DDM) experiment presented and explained in the first part is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified either from previous studies or by using the mutual information content. Twelve groups (realizations) were generated from each dataset by random sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainty of the techniques are assessed using several average overall error measures, scatter plots, frequency distributions of model residuals, and the rate at which prediction performance deteriorates during the testing phase. The Gamma test is used as a guide in selecting the appropriate modeling technique. Unlike for the two nonlinear soil moisture case studies, the results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique, owing to its ability to adapt model complexity to the modeled data. EPR performance could be close to that of GP for datasets that are more linear than nonlinear. SVM is sensitive to the choice of kernel; with an appropriately selected kernel, its performance can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than the other modeling techniques. K-nn is also successful in linear situations and should not be ignored as a potential modeling technique for hydrological applications.
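K-nn regression predicts a value from the most similar records in the training set. The sketch below is a minimal, assumed variant (unweighted mean of the K nearest neighbors under Euclidean distance); the abstract does not state which distance or weighting was used, and `knn_predict` is a hypothetical name.

```python
import math

def knn_predict(X_train, y_train, x, k=3):
    """K-nn regression: unweighted mean target of the k training points nearest to x."""
    # Pair each training target with its Euclidean distance to x, nearest first.
    ranked = sorted(zip((math.dist(xi, x) for xi in X_train), y_train))
    nearest = ranked[:k]
    return sum(yi for _, yi in nearest) / len(nearest)

X_train = [[0.0], [1.0], [2.0], [3.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 3.0, 10.0]
print(knn_predict(X_train, y_train, [1.1], k=3))  # neighbors 1, 2, 0 -> 1.0
```

Because the prediction is a local average, K-nn needs no assumed functional form, which is consistent with it working in both linear and nonlinear cases; its main tuning choice is K.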


Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 2: Application

Hydrology and Earth System Sciences Discussions, Vol 6 (6), pp. 7095-7142, 2009. DOI: 10.5194/hessd-6-7095-2009. Cited by ~6.

Author(s): A. Elshorbagy, G. Corzo, S. Srinivasulu, D. P. Solomatine

Keyword(s): Soil Moisture, Case Studies, Data Driven, Modeling Technique, Actual Evapotranspiration, Support Vector, Rainfall Runoff, Highly Nonlinear, Modeling Techniques, Data Driven Modeling

Abstract. In this second part of the two-part paper, the data-driven modeling (DDM) experiment presented and explained in the first part is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified either from previous studies or by using the mutual information content. Twelve groups (realizations) were generated from each dataset by random sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainty of the techniques are assessed using several average overall error measures, scatter plots, frequency distributions of model residuals, and the rate at which prediction performance deteriorates during the testing phase. The Gamma test is used as a guide in selecting the appropriate modeling technique. Unlike for the two nonlinear soil moisture case studies, the results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique, owing to its ability to adapt model complexity to the modeled data. EPR performance could be close to that of GP for datasets that are more linear than nonlinear. SVM is sensitive to the choice of kernel; with an appropriately selected kernel, its performance can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than the other modeling techniques. K-nn is also successful in linear situations and should not be ignored as a potential modeling technique for hydrological applications.
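The Gamma test estimates, before any model is fitted, how much of the output variance is noise that no smooth model of the chosen inputs can explain, which is why it can guide the choice of technique. The brute-force sketch below follows the usual formulation (delta and gamma statistics over the first p nearest neighbors, with the regression intercept as the noise-variance estimate); the function name, `p_max = 10`, and the synthetic data are assumptions for illustration.

```python
import math
import random

def gamma_test(X, y, p_max=10):
    """Gamma test: estimate the variance of the noise in y that no smooth model of X explains.

    delta(p): mean squared input distance from each point to its p-th nearest neighbor.
    gamma(p): mean of 0.5 * (y_neighbor - y_i)^2 over the same neighbor pairs.
    The intercept of the least-squares line gamma = A * delta + Gamma estimates Var(noise).
    """
    M = len(X)
    deltas = [0.0] * p_max
    gammas = [0.0] * p_max
    for i in range(M):
        # Squared distances from point i to every other point, nearest first (brute force).
        ranked = sorted((math.dist(X[i], X[j]) ** 2, j) for j in range(M) if j != i)
        for p in range(p_max):
            d2, j = ranked[p]
            deltas[p] += d2 / M
            gammas[p] += 0.5 * (y[j] - y[i]) ** 2 / M
    # Ordinary least-squares line through the p_max (delta, gamma) points.
    md, mg = sum(deltas) / p_max, sum(gammas) / p_max
    slope = (sum((d - md) * (g - mg) for d, g in zip(deltas, gammas))
             / sum((d - md) ** 2 for d in deltas))
    return mg - slope * md, slope  # (Gamma, gradient A)

# Usage: a smooth 1-D signal plus Gaussian noise with standard deviation 0.1 (variance 0.01).
xs = [[i / 199] for i in range(200)]
rng = random.Random(42)
ys = [math.sin(2 * math.pi * x[0]) + rng.gauss(0.0, 0.1) for x in xs]
gamma, grad = gamma_test(xs, ys)
print(round(gamma, 4))  # should be close to the true noise variance 0.01
```

A Gamma value that is small relative to Var(y) suggests the inputs carry enough information for a smooth model to fit the data well; a large one warns that no technique is likely to do much better than that noise floor.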


Regional-Scale Mineral Prospectivity Mapping: Support Vector Machines and an Improved Data-Driven Multi-criteria Decision-Making Technique

Natural Resources Research, 2021. DOI: 10.1007/s11053-021-09842-4.

Author(s): Reza Ghezelbash, Abbas Maghsoudi, Amirreza Bigdeli, Emmanuel John M. Carranza

Keyword(s): Decision Making, Support Vector Machines, Regional Scale, Data Driven, Support Vector, Multi Criteria Decision Making, Mineral Prospectivity Mapping, Vector Machines, Mineral Prospectivity, Prospectivity Mapping


Data-driven prognostic scheme for rolling-element bearings using a new health index and variants of least-square support vector machines

Mechanical Systems and Signal Processing, Vol 160, pp. 107853, 2021. DOI: 10.1016/j.ymssp.2021.107853.

Author(s): M.M. Manjurul Islam, Alexander E. Prosvirin, Jong-Myon Kim

Keyword(s): Support Vector Machines, Least Square, Data Driven, Support Vector, Rolling Element Bearings, Health Index, Vector Machines, Rolling Element


Persian Handwritten Number Recognition Using Adapted Framing Feature and Support Vector Machines

International Journal of Computational Intelligence and Applications, Vol 15 (01), pp. 1650004, 2016. DOI: 10.1142/s1469026816500048. Cited by ~3.

Author(s): Hedieh Sajedi, Mehran Bahador

Keyword(s): Support Vector Machines, Recognition Rate, Nearest Neighbors, Polynomial Kernel, Support Vector, K Nearest Neighbors, New Approach, Number Recognition, Vector Machines

In this paper, a new approach for the segmentation and recognition of Persian handwritten numbers is presented. The method combines the framing feature technique with an outer profile feature; we call this combination the adapted framing feature. In the proposed approach, numbers are segmented into digits automatically. In the classification stage, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated-digit level, SVM with a polynomial kernel achieves a recognition rate of 99.27% on IFHCDB and 99.07% on HODA. The experiments show that the proposed method achieves higher accuracy than previous studies.
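With a polynomial kernel, an SVM works implicitly in a space of polynomial feature products without ever constructing it. The sketch below, assuming a degree-2 kernel (the abstract does not report the degree or other kernel parameters), checks that the kernel value equals the inner product of an explicit degree-2 feature map. The parameter names `gamma` and `coef0` follow a common convention and are assumptions here.

```python
import math

def poly_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def phi_deg2(x):
    """Explicit feature map for 2-D inputs whose inner product equals (<x, z> + 1) ** 2."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, z)                                   # (1*3 + 2*(-1) + 1) ** 2 = 4
explicit = sum(a * b for a, b in zip(phi_deg2(x), phi_deg2(z)))
print(implicit, round(explicit, 10))                           # both equal 4.0
```

For digit images with many pixels or feature values, the explicit map grows quadratically with the input size, while the kernel still costs a single dot product, which is why the kernel form is used in practice.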


Recognition of Gait Activities Using Acceleration Data from A Smartphone and A Wearable Device

Proceedings, Vol 31 (1), pp. 60, 2019. DOI: 10.3390/proceedings2019031060. Cited by ~1.

Author(s): Irvin Hussein Lopez-Nava, Matias Garcia-Constantino, Jesus Favela

Keyword(s): Assisted Living, Inertial Sensor, Ambient Assisted Living, Human Gait, Support Vector, K Nearest Neighbors, Acceleration Data, Vector Machines, Young Subjects, Physical Spaces

Activity recognition is an important task in many fields, such as ambient intelligence, pervasive healthcare, and surveillance. In particular, recognizing human gait can help identify characteristics of the places or physical spaces in which people move, such as whether a person is walking on level ground or walking down stairs. For example, ascending or descending stairs can be risky for older adults because of a possible fall, which can have more severe consequences than a fall on a flat surface. While portable and wearable devices have been widely used to detect Activities of Daily Living (ADLs), few works in the literature have focused solely on characterizing human gait. In the present study, a method is introduced for recognizing gait activities using acceleration data obtained from a smartphone and from a wearable inertial sensor placed on the ankle. The acceleration signals were segmented based on the automatic detection of strides, also called gait cycles. A feature vector was then extracted from the segmented signals and used to train four classifiers: Naive Bayes, C4.5, Support Vector Machines, and K-Nearest Neighbors. Data were collected from seven young subjects who performed five gait activities: (i) going down an incline, (ii) going up an incline, (iii) walking on level ground, (iv) going down stairs, and (v) going up stairs. The results demonstrate the viability of the proposed method and technologies in ambient assisted living contexts.
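Stride-based segmentation and feature extraction might look roughly like the sketch below. The abstract does not describe the stride detector or the features, so the threshold-plus-minimum-gap peak detector, the [mean, std, min, max] feature vector, and the function names are hypothetical stand-ins, and the toy signal is invented.

```python
import statistics

def detect_strides(signal, threshold, min_gap):
    """Indices of local maxima >= threshold, at least min_gap samples apart (one per stride)."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_peak = signal[i] >= threshold and signal[i - 1] < signal[i] >= signal[i + 1]
        if is_peak and (not peaks or i - peaks[-1] >= min_gap):
            peaks.append(i)
    return peaks

def stride_features(signal, peaks):
    """One feature vector [mean, std, min, max] per peak-to-peak segment (gait cycle)."""
    features = []
    for start, end in zip(peaks, peaks[1:]):
        seg = signal[start:end]
        features.append([statistics.fmean(seg), statistics.pstdev(seg), min(seg), max(seg)])
    return features

acc = [0, 0, 5, 0, 0, 0, 6, 0, 0, 0, 5, 0]   # toy acceleration magnitude
peaks = detect_strides(acc, threshold=3, min_gap=2)
print(peaks)                                # [2, 6, 10]
print(stride_features(acc, peaks)[0][0])    # mean of [5, 0, 0, 0] -> 1.25
```

Each row of the resulting feature matrix, labeled with the activity being performed, would then be one training example for classifiers such as those listed in the abstract.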


Support Vector Machines and Affective Science

Emotion Review, Vol 12 (4), pp. 297-308, 2020. DOI: 10.1177/1754073920930784.

Author(s): Chris H. Miller, Matthew D. Sacchet, Ian H. Gotlib

Keyword(s): Support Vector Machines, Reduction Technique, Feature Reduction, Data Driven, Future Research, Support Vector, Predictive Algorithm, Affective Science, Vector Machines, Potential Applications

Support vector machines (SVMs) are being used increasingly in affective science as a data-driven classification method and feature reduction technique. Whereas traditional statistical methods typically compare group averages on selected variables, SVMs use a predictive algorithm to learn multivariate patterns that optimally discriminate between groups. In this review, we provide a framework for understanding the methods of SVM-based analyses and summarize the findings of seminal studies that use SVMs for classification or data reduction in the behavioral and neural study of emotion and affective disorders. We conclude by discussing promising directions and potential applications of SVMs in future research in affective science.
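To make the idea of a learned separating pattern concrete, the sketch below trains a linear SVM by Pegasos-style stochastic subgradient descent on the regularized hinge loss. This is one standard way to train a linear SVM, not a method taken from the review; folding the bias into the weight vector via a constant feature, the regularization value `lam`, the function names, and the toy two-group data are assumptions.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD for min (lam/2)||w||^2 + mean(max(0, 1 - y_i * w.x_i)); y_i in {-1, +1}."""
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]   # constant feature acts as a (regularized) bias
    w = [0.0] * len(Xa[0])
    t = 0
    order = list(range(len(Xa)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                                  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]               # gradient step on (lam/2)||w||^2
            if margin < 1.0:                                       # hinge loss active: move toward y_i x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def decision(w, x):
    """Signed score; its sign gives the predicted group."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

# Toy two-group data (e.g. two hypothetical participant groups with two measured features).
X = [(2, 2), (3, 1), (1, 3), (2.5, 2.5), (-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([1 if decision(w, x) > 0 else -1 for x in X])
```

The learned weight vector is itself the "multivariate pattern": its entries indicate how strongly each feature pushes an observation toward one group or the other, which is also what makes linear SVM weights usable for feature reduction.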


Data-Driven Modeling Techniques to Estimate Dispersion Relations of Structural Components

Volume 1: Development and Characterization of Multifunctional Materials; Modeling, Simulation, and Control of Adaptive Systems; Integrated System Design and Implementation, 2018. DOI: 10.1115/smasis2018-8135.

Author(s): Vijaya V. N. Sriram Malladi, Mohammad I. Albakri, Pablo A. Tarazaga, Serkan Gugercin

Keyword(s): Frequency Response, Dispersion Relations, Data Driven, Response Functions, Structural Components, Frequency Range, Frequency Response Functions, Material Inhomogeneity, Modeling Techniques, Data Driven Modeling

Dispersion relations describe the frequency-dependent nature of elastic waves propagating in structures. Experimental determination of the dispersion relations of structural components, such as the floor of a building, can be tedious due to material inhomogeneity, complex boundary conditions, and the physical dimensions of the structure under test. In this work, data-driven modeling techniques are used to reconstruct dispersion relations over a predetermined frequency range. The feasibility of this approach is demonstrated on a one-dimensional beam, for which an exact solution of the dispersion relations is attainable. Frequency response functions of the beam are obtained numerically over the frequency range 0–50 kHz. A data-driven dynamical model, constructed by the vector fitting approach, is then used to develop a state-space model from the simulated frequency response functions at 16 locations along the beam. This model is then used to construct the dispersion relations of the structure through a series of numerical simulations. The techniques discussed in this paper are especially beneficial in scenarios where it is neither possible to find analytical solutions to the wave equations nor feasible to measure dispersion curves experimentally. Application to actual experimental data is left for future work, but the complete framework is presented here.
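For a one-dimensional beam, the exact dispersion relation against which a data-driven reconstruction can be checked has a closed form. The sketch below assumes Euler-Bernoulli beam theory, where omega = k^2 * sqrt(E*I / (rho*A)); the abstract does not state which beam theory was used, and at higher frequencies shear deformation and rotary inertia (Timoshenko theory) can become significant. The material and section values and the function names are hypothetical.

```python
import math

def bending_wavenumber(freq_hz, E, I, rho, A):
    """Flexural wavenumber k (rad/m) of an Euler-Bernoulli beam, from omega = k**2 * sqrt(E*I / (rho*A))."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(omega) * (rho * A / (E * I)) ** 0.25

def phase_and_group_velocity(freq_hz, E, I, rho, A):
    """Phase velocity omega/k and group velocity d(omega)/dk = 2*omega/k (both in m/s)."""
    omega = 2.0 * math.pi * freq_hz
    k = bending_wavenumber(freq_hz, E, I, rho, A)
    return omega / k, 2.0 * omega / k

# Hypothetical steel beam, 50 mm wide and 10 mm thick (illustrative values only).
E, rho = 200e9, 7850.0               # Young's modulus (Pa), density (kg/m^3)
b, h = 0.05, 0.01
A, I = b * h, b * h ** 3 / 12.0      # cross-sectional area (m^2), second moment of area (m^4)
for f in (1e3, 10e3, 50e3):
    cp, cg = phase_and_group_velocity(f, E, I, rho, A)
    k = bending_wavenumber(f, E, I, rho, A)
    print(f"{f / 1e3:5.0f} kHz: k = {k:7.2f} rad/m, c_p = {cp:6.1f} m/s, c_g = {cg:6.1f} m/s")
```

Because k grows with the square root of frequency, flexural waves are dispersive (higher frequencies travel faster), which is the behavior a reconstructed dispersion curve should reproduce.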


Assessment of Interventions in Fuel Management Zones Using Remote Sensing

ISPRS International Journal of Geo-Information, Vol 9 (9), pp. 533, 2020. DOI: 10.3390/ijgi9090533. Cited by ~2.

Author(s): Ricardo Afonso, André Neves, Carlos Viegas Damásio, João Moura Pires, Fernando Birra, ...

Keyword(s): Satellite Images, Vegetation Indices, Nearest Neighbors, Machine Learning Algorithms, Support Vector, Fuel Management, K Nearest Neighbors, Management Zones, Vector Machines, Sentinel 2

Every year, wildfires strike the Portuguese territory and are a concern for public entities and the population. To prevent wildfire progression and minimize its impact, Fuel Management Zones (FMZs) have been stipulated by law around buildings and settlements, along national roads, and around other infrastructure. FMZs require monitoring of the vegetation condition so that maintenance and cleaning of these zones can proceed promptly. To improve FMZ monitoring, this paper proposes using satellite images, such as Sentinel-1 and Sentinel-2, together with vegetation indices and extracted temporal characteristics (maximum, minimum, mean, and standard deviation) of the vegetation inside and outside the FMZs, to determine whether the zones were treated. These characteristics feed machine-learning algorithms such as XGBoost, Support Vector Machines, K-nearest neighbors, and Random Forest. The results show that an intervention in an FMZ can be detected with high accuracy, with F1-scores ranging from 90% to 94% and Kappa values ranging from 0.80 to 0.89.
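The temporal characteristics and the two reported accuracy scores can be computed as in the sketch below. It assumes binary labels (1 = treated FMZ, 0 = untreated) and uses the positive-class F1-score and Cohen's kappa; the function names, the vegetation-index series, and the example labels are invented for illustration.

```python
import statistics

def temporal_features(series):
    """[max, min, mean, std] of a vegetation-index time series for one pixel or zone."""
    return [max(series), min(series), statistics.fmean(series), statistics.pstdev(series)]

def f1_and_kappa(y_true, y_pred):
    """Positive-class F1-score and Cohen's kappa for binary labels (1 = treated)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = tp + fp + fn + tn
    f1 = 2 * tp / (2 * tp + fp + fn)
    po = (tp + tn) / n                                            # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # agreement expected by chance
    kappa = (po - pe) / (1 - pe)
    return f1, kappa

ndvi = [0.62, 0.58, 0.31, 0.29]          # hypothetical index series for one zone
print(temporal_features(ndvi))
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(f1_and_kappa(y_true, y_pred))      # F1 = 0.8, kappa = 0.6 (up to floating-point rounding)
```

Kappa discounts the agreement expected by chance, so it is usually lower than raw accuracy or F1 for the same predictions, consistent with the lower numeric range reported for Kappa.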
