Accounting for high-dimensional predictors in RFA with MARS 

Author(s):  
Amina Msilini ◽  
Pierre Masselot ◽  
Taha B.M.J. Ouarda

Hydrological processes and phenomena are naturally complex and nonlinear. Many physiographical variables, such as those describing drainage network characteristics, may influence streamflow characteristics and should be considered in regional frequency analysis (RFA). These variables hence have a significant impact on the effectiveness of flood quantile estimation techniques. Although many statistical tools for estimating flood quantiles at ungauged sites are considered in the hydrological literature, little attention has been given to the nonlinearity and high dimensionality of the physio-meteorological variable space. In this study, the multivariate adaptive regression splines (MARS) approach is introduced in RFA. This model makes it possible to account simultaneously for nonlinearity and interactions between variables hidden in high-dimensional data. MARS is applied here to two datasets of 151 hydrometric stations located in the southern part of the province of Quebec (Canada): a standard dataset (STA) including commonly used variables and an extended dataset (EXTD) combining STA with additional variables dealing with drainage network characteristics. It is then compared to generalized additive models (GAM), a state-of-the-art method for regional estimation. Numerical results show that MARS outperforms GAM, especially with the extended dataset EXTD. The study suggests that MARS may be a promising tool to take into account the complexity of the hydrological phenomena involved and the increasing number of variables used in RFA.
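As a rough illustration of how a MARS-type model can represent both nonlinearity and interactions, the sketch below builds hinge basis functions and one product (interaction) term, then fits their coefficients by least squares. It is not the authors' implementation: the data are synthetic and noise-free, the variable names (`x1`, `x2`) and knot locations (4.0, 6.0) are hypothetical, and the knots are fixed by hand, whereas MARS proper selects terms and knots adaptively through a forward pass followed by backward pruning.

```python
import numpy as np


def hinge(x, knot, sign=1):
    """Hinge basis function: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return np.maximum(0.0, sign * (x - knot))


rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0.0, 10.0, n)  # hypothetical predictor 1
x2 = rng.uniform(0.0, 10.0, n)  # hypothetical predictor 2

# Synthetic, noise-free response: piecewise-linear in x1 plus an x1-x2 interaction.
true_coef = np.array([2.0, 1.5, -0.8, 0.5])
basis = np.column_stack([
    np.ones(n),                          # intercept
    hinge(x1, 4.0),                      # max(0, x1 - 4)
    hinge(x1, 4.0, sign=-1),             # max(0, 4 - x1)
    hinge(x1, 4.0) * hinge(x2, 6.0),     # interaction: product of two hinges
])
y = basis @ true_coef

# With the basis fixed, the coefficients follow from ordinary least squares.
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print(np.round(coef, 3))
```

Because the response is generated from the same basis without noise, the least-squares fit recovers the coefficients used to build it; with real data, the choice of basis terms is the hard part and is what the adaptive MARS procedure addresses.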

2015 ◽  
Vol 12 (10) ◽  
pp. 11083-11127 ◽  
Author(s):  
J. E. Shortridge ◽  
S. D. Guikema ◽  
B. F. Zaitchik

Abstract. In the past decade, certain methods for empirical rainfall–runoff modeling have seen extensive development and been proposed as a useful complement to physical hydrologic models, particularly in basins where data to support process-based models are limited. However, the majority of research has focused on a small number of methods, such as artificial neural networks, despite the development of multiple other approaches for non-parametric regression in recent years. Furthermore, this work has generally evaluated model performance based on predictive accuracy alone, while not considering broader objectives, such as model interpretability and uncertainty, that are important if such methods are to be used for planning and management decisions. In this paper, we use multiple regression and machine-learning approaches to simulate monthly streamflow in five highly seasonal rivers in the highlands of Ethiopia and compare their performance in terms of predictive accuracy, error structure and bias, model interpretability, and uncertainty when faced with extreme climate conditions. While the relative predictive performance of models differed across basins, data-driven approaches were able to achieve reduced errors when compared to physical models developed for the region. Methods such as random forests and generalized additive models may have advantages in terms of visualization and interpretation of model structure, which can be useful in providing insights into physical watershed function. However, the uncertainty associated with model predictions under climate change should be carefully evaluated, since certain models (especially generalized additive models and multivariate adaptive regression splines) became highly variable when faced with high temperatures.


2016 ◽  
Vol 20 (7) ◽  
pp. 2611-2628 ◽  
Author(s):  
Julie E. Shortridge ◽  
Seth D. Guikema ◽  
Benjamin F. Zaitchik

Abstract. In the past decade, machine learning methods for empirical rainfall–runoff modeling have seen extensive development and been proposed as a useful complement to physical hydrologic models, particularly in basins where data to support process-based models are limited. However, the majority of research has focused on a small number of methods, such as artificial neural networks, despite the development of multiple other approaches for non-parametric regression in recent years. Furthermore, this work has often evaluated model performance based on predictive accuracy alone, while not considering broader objectives, such as model interpretability and uncertainty, that are important if such methods are to be used for planning and management decisions. In this paper, we use multiple regression and machine learning approaches (including generalized additive models, multivariate adaptive regression splines, artificial neural networks, random forests, and M5 cubist models) to simulate monthly streamflow in five highly seasonal rivers in the highlands of Ethiopia and compare their performance in terms of predictive accuracy, error structure and bias, model interpretability, and uncertainty when faced with extreme climate conditions. While the relative predictive performance of models differed across basins, data-driven approaches were able to achieve reduced errors when compared to physical models developed for the region. Methods such as random forests and generalized additive models may have advantages in terms of visualization and interpretation of model structure, which can be useful in providing insights into physical watershed function. However, the uncertainty associated with model predictions under extreme climate conditions should be carefully evaluated, since certain models (especially generalized additive models and multivariate adaptive regression splines) become highly variable when faced with high temperatures.


2020 ◽  
Vol 21 (12) ◽  
pp. 2777-2792 ◽  
Author(s):  
A. Msilini ◽  
P. Masselot ◽  
T. B. M. J. Ouarda

Abstract. Hydrological systems are naturally complex and nonlinear. A large number of variables, many of which are not yet well considered in regional frequency analysis (RFA), have a significant impact on hydrological dynamics and consequently on flood quantile estimates. Despite the increasing number of statistical tools used to estimate flood quantiles at ungauged sites, little attention has been dedicated to the development of new regional estimation (RE) models accounting for both nonlinear links and interactions between hydrological and physio-meteorological variables. The aim of this paper is to simultaneously take into account nonlinearity and interactions between variables by introducing the multivariate adaptive regression splines (MARS) approach in RFA. The predictive performances of MARS are compared with those obtained by one of the most robust RE models: the generalized additive model (GAM). Both approaches are applied to two datasets covering 151 hydrometric stations in the province of Quebec (Canada): a standard dataset (STA) containing commonly used variables and an extended dataset (EXTD) combining STA with additional variables dealing with drainage network characteristics. Results indicate that RE models using MARS with the EXTD slightly outperform RE models using GAM. Thus, MARS seems to allow for a better representation of the hydrological process and an increased predictive power in RFA.

