A symbolic data-driven technique based on evolutionary polynomial regression

2006 ◽  
Vol 8 (3) ◽  
pp. 207-222 ◽  
Author(s):  
Orazio Giustolisi ◽  
Dragan A. Savic

This paper describes a new hybrid regression method that combines the best features of conventional numerical regression techniques with the genetic programming (GP) symbolic regression technique. The key idea is to employ an evolutionary computing methodology to search for a model of the system/process being modelled and to employ parameter estimation to obtain constants using least squares. The new technique, termed Evolutionary Polynomial Regression (EPR), overcomes shortcomings in the GP process, such as computational performance, the number of evolutionary parameters to tune, and the complexity of the symbolic models. Similarly, it alleviates issues arising from numerical regression, including difficulties in using physical insight and over-fitting problems. This paper demonstrates that EPR performs well both in interpolating data and in scientific knowledge discovery. As an illustration, EPR is used to identify polynomial formulæ with progressively increasing levels of noise, to interpolate the Colebrook-White formula for a pipe resistance coefficient and to discover a formula for a resistance coefficient from experimental data.
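
The two-stage idea in this abstract (search over model structures, then estimate the constants by least squares) can be illustrated with a minimal sketch. It is not the authors' implementation: the evolutionary search is replaced here by exhaustive enumeration over a tiny candidate set, and the exponent range, the synthetic data, and all function names are assumptions made for illustration.

```python
from itertools import combinations, product

def design_matrix(X, structure):
    """One row per sample: a bias column, then one monomial per exponent vector."""
    rows = []
    for x in X:
        row = [1.0]
        for exps in structure:
            term = 1.0
            for xk, e in zip(x, exps):
                term *= xk ** e
            row.append(term)
        rows.append(row)
    return rows

def least_squares(A, y):
    """Solve the normal equations (A^T A) a = A^T y by Gaussian elimination."""
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(n)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a

def sse(A, y, a):
    """Sum of squared errors of the fitted model."""
    return sum((sum(ai * t for ai, t in zip(a, row)) - yi) ** 2
               for row, yi in zip(A, y))

# Synthetic, noise-free data (assumed): y = 2 + 3*x1*x2 + 0.5*x2**2
X = [(float(x1), float(x2)) for x1 in range(1, 5) for x2 in range(1, 5)]
y = [2.0 + 3.0 * x1 * x2 + 0.5 * x2 ** 2 for x1, x2 in X]

# Candidate exponent vectors in {0, 1, 2}^2, excluding the constant term.
candidates = [e for e in product(range(3), repeat=2) if any(e)]

# Exhaustive search over two-term structures stands in for the evolutionary search.
best = None
for structure in combinations(candidates, 2):
    A = design_matrix(X, structure)
    a = least_squares(A, y)
    err = sse(A, y, a)
    if best is None or err < best[0]:
        best = (err, structure, a)

best_err, best_structure, best_coeffs = best
print(best_structure, [round(c, 6) for c in best_coeffs])
```

Because each candidate model is linear in its coefficients, the inner fit is an ordinary linear least-squares problem; only the choice of exponents needs a combinatorial or evolutionary search.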

2009 ◽  
Vol 6 (6) ◽  
pp. 7055-7093 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff process are described and proposed for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both predictive accuracy and uncertainty of the modeling techniques can be evaluated. The implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
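
The resampling procedure described here (twelve realizations, each split into training, cross-validation, and testing subsets) might be sketched as follows. The abstract does not state the split proportions or the sampling details, so the fractions, the seed, and the function name below are assumptions.

```python
import random

def make_realizations(n_samples, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return n_groups random partitions of the sample indices into three subsets.

    The fractions are hypothetical; the remainder after the training and
    cross-validation subsets goes to testing.
    """
    rng = random.Random(seed)
    n_train = int(fractions[0] * n_samples)
    n_cv = int(fractions[1] * n_samples)
    groups = []
    for _ in range(n_groups):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random ordering for each realization
        groups.append({
            "training": idx[:n_train],
            "cross_validation": idx[n_train:n_train + n_cv],
            "testing": idx[n_train + n_cv:],
        })
    return groups

groups = make_realizations(n_samples=100)
print(len(groups), {name: len(ids) for name, ids in groups[0].items()})
```

Fitting the same technique on each of the twelve groups yields twelve test scores per dataset, whose spread gives a rough indication of the uncertainty alongside the average accuracy.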


2010 ◽  
Vol 12 (4) ◽  
pp. 365-379 ◽  
Author(s):  
I. El-Baroudy ◽  
A. Elshorbagy ◽  
S. K. Carey ◽  
O. Giustolisi ◽  
D. Savic

Evapotranspiration is one of the main components of the hydrological cycle as it accounts for more than two-thirds of the precipitation losses at the global scale. Reliable estimates of actual evapotranspiration are crucial for effective watershed modelling and water resource management, yet direct measurements of the evapotranspiration losses are difficult and expensive. This research explores the utility and effectiveness of data-driven techniques in modelling actual evapotranspiration measured by an eddy covariance system. The authors compare the Evolutionary Polynomial Regression (EPR) performance to Artificial Neural Networks (ANNs) and Genetic Programming (GP). Furthermore, this research investigates the effect of previous states (time lags) of the meteorological input variables on characterizing actual evapotranspiration. The models developed using the EPR, based on the two case studies at the Mildred Lake mine, AB, Canada, provided comparable performance to the models of GP and ANNs. Moreover, the EPR provided simpler models than those developed by the other data-driven techniques, particularly in one of the case studies. The inclusion of the previous states of the input variables slightly enhanced the performance of the developed model, which in turn indicates the dynamic nature of the evapotranspiration process.
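
Including previous states (time lags) of an input variable amounts to extending each input row with earlier values of the same series. A minimal sketch, assuming a single variable sampled at regular intervals (the function name and the example values are hypothetical):

```python
def add_lags(series, n_lags):
    """Build rows [x_t, x_(t-1), ..., x_(t-n_lags)] for every t with a full history."""
    return [[series[t - k] for k in range(n_lags + 1)]
            for t in range(n_lags, len(series))]

# Made-up values standing in for one meteorological input variable.
values = [12.1, 13.4, 15.0, 14.2, 11.8]
lagged = add_lags(values, n_lags=2)
print(lagged)
```

The first `n_lags` time steps are dropped because they lack a complete history, so the target series has to be trimmed by the same amount before fitting.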


2012 ◽  
Vol 43 (5) ◽  
pp. 589-602 ◽  
Author(s):  
S. Alvisi ◽  
E. Creaco ◽  
M. Franchini

A data-driven artificial neural network (ANN) model and a data-driven evolutionary polynomial regression (EPR) model are used here to set up two real-time crisp discharge forecasting models whose crisp parameters are estimated through the least-squares criterion. In order to represent the total uncertainty of each model in performing the forecast, their parameters are then considered as grey numbers. Comparison of the results obtained through the application of the two models to a real case study shows that the crisp models based on ANN and EPR provide similar accuracy for short forecasting lead times; for long forecasting lead times, the performance of the EPR model deteriorates with respect to that of the ANN model. As regards the uncertainty bands produced by the grey formulation of the two data-driven models, it is shown that, in the ANN case, these bands are on average narrower than those obtained by using a standard technique such as the Box–Cox transformation of the errors; in the EPR case, these bands are on average larger. These results therefore suggest that the performance of a grey data-driven model depends on its inner structure and that, for the specific models here considered, the ANN is to be preferred.
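
One way to picture a grey parameter is as an interval with known lower and upper bounds, so that a forecast becomes a band rather than a single value. The sketch below propagates such intervals through a model that is linear in its parameters using plain interval arithmetic. It illustrates the general idea only; the formulation used in the paper may differ, and the function name and numbers are assumptions.

```python
def grey_linear_forecast(inputs, grey_params):
    """Propagate interval-valued parameters through y = sum(p_i * x_i).

    grey_params holds (lower, upper) pairs; the result is the (lower, upper)
    band of the forecast for the given crisp inputs.
    """
    lo = hi = 0.0
    for x, (p_lo, p_hi) in zip(inputs, grey_params):
        a, b = p_lo * x, p_hi * x
        # A negative input swaps which parameter bound gives the lower end.
        lo += min(a, b)
        hi += max(a, b)
    return lo, hi

band = grey_linear_forecast([2.0, -1.0], [(1.0, 2.0), (0.5, 1.5)])
print(band)  # (0.5, 3.5)
```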


2010 ◽  
Vol 14 (10) ◽  
pp. 1931-1941 ◽  
Author(s):  
A. Elshorbagy ◽  
G. Corzo ◽  
S. Srinivasulu ◽  
D. P. Solomatine

Abstract. A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff process are described and proposed, in the second paper, for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both prediction accuracy and uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.


2009 ◽  
Vol 11 (3-4) ◽  
pp. 225-236 ◽  
Author(s):  
O. Giustolisi ◽  
D. A. Savic

Evolutionary Polynomial Regression (EPR) is a recently developed hybrid regression method that combines the best features of conventional numerical regression techniques with the genetic programming/symbolic regression technique. The original version of EPR works with formulae based on true or pseudo-polynomial expressions using a single-objective genetic algorithm. Therefore, to obtain a set of formulae with a variable number of pseudo-polynomial coefficients, the sequential search is performed in the formulae space. This article presents an improved EPR strategy that uses a multi-objective genetic algorithm instead. We demonstrate that the multi-objective approach is a more feasible instrument for data analysis and model selection. Moreover, we show that EPR can also allow for simple uncertainty analysis (since it returns polynomial structures that are linear with respect to the estimated coefficients). The methodology is tested and the results are reported in a case study relating groundwater level predictions to total monthly rainfall.
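
A multi-objective search returns a set of trade-off models rather than a single best formula. Assuming the two objectives are the number of polynomial terms and the fitting error (the abstract does not list the objectives explicitly), the non-dominated set can be extracted as in the sketch below. The candidate values are made up for illustration.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (all objectives minimized)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical (number_of_terms, error) pairs for five candidate formulae.
models = [(1, 0.9), (2, 0.5), (2, 0.7), (3, 0.5), (4, 0.2)]
front = pareto_front(models)
print(front)  # [(1, 0.9), (2, 0.5), (4, 0.2)]
```

Model selection then reduces to choosing a point on this front, for example the simplest formula whose error is acceptable.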


2007 ◽  
Vol 4 (1) ◽  
pp. 189-210 ◽  
Author(s):  
X. Liu ◽  
P. Coulibaly ◽  
N. Evora

Abstract. This study investigates dynamically different data-driven methods, specifically a statistical downscaling model (SDSM), a time lagged feedforward neural network (TLFN), and an evolutionary polynomial regression (EPR) technique for downscaling numerical weather ensemble forecasts generated by a medium range forecast (MRF) model. Given the coarse resolution (about 200-km grid spacing) of the MRF model, an optimal use of the weather forecasts at the local or watershed scale requires appropriate downscaling techniques. The selected methods are applied for downscaling ensemble daily precipitation and temperature series for the Chute-du-Diable basin located in northeastern Canada. The downscaling results show that the TLFN and EPR have similar performance in downscaling ensemble daily precipitation as well as daily maximum and minimum temperature series, regardless of the season. Both the TLFN and EPR are more efficient downscaling techniques than SDSM for both the ensemble daily precipitation and temperature.


2008 ◽  
Vol 12 (2) ◽  
pp. 615-624 ◽  
Author(s):  
X. Liu ◽  
P. Coulibaly ◽  
N. Evora

Abstract. This study investigates dynamically different data-driven methods, specifically a statistical downscaling model (SDSM), a time lagged feedforward neural network (TLFN), and an evolutionary polynomial regression (EPR) technique for downscaling numerical weather ensemble forecasts generated by a medium range forecast (MRF) model. Given the coarse resolution (about 200-km grid spacing) of the MRF model, an optimal use of the weather forecasts at the local or watershed scale requires appropriate downscaling techniques. The selected methods are applied for downscaling ensemble daily precipitation and temperature series for the Chute-du-Diable basin located in northeastern Canada. The downscaling results show that the TLFN and EPR have similar performance in downscaling ensemble daily precipitation as well as daily maximum and minimum temperature series, regardless of the season. Both the TLFN and EPR are more efficient downscaling techniques than SDSM for both the ensemble daily precipitation and temperature.

