Probabilistic Hydrological Post-Processing at Scale: Why and How to Apply Machine-Learning Quantile Regression Algorithms

Water ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 2126 ◽  
Author(s):  
Georgia Papacharalampous ◽  
Hristos Tyralis ◽  
Andreas Langousis ◽  
Amithirigala W. Jayawardena ◽  
Bellie Sivakumar ◽  
...  

We conduct a large-scale benchmark experiment aimed at advancing the use of machine-learning quantile regression algorithms for probabilistic hydrological post-processing “at scale” within operational contexts. The experiment uses 34-year-long daily time series of precipitation, temperature, evapotranspiration and streamflow for 511 catchments over the contiguous United States. Point hydrological predictions are obtained using the Génie Rural à 4 paramètres Journalier (GR4J) hydrological model and exploited as predictor variables within quantile regression settings. Six machine-learning quantile regression algorithms and their equal-weight combiner are applied to predict conditional quantiles of the hydrological model errors. The individual algorithms are quantile regression, generalized random forests for quantile regression, generalized random forests for quantile regression emulating quantile regression forests, gradient boosting machine, model-based boosting with linear models as base learners, and quantile regression neural networks. The conditional quantiles of the hydrological model errors are transformed into conditional quantiles of daily streamflow, which are finally assessed using proper performance scores and benchmarking. The assessment covers various levels of predictive quantiles and central prediction intervals, and it is made both independently of the flow magnitude and conditionally on it. Key aspects of the developed methodological framework are highlighted, and practical recommendations are formulated. In technical hydro-meteorological applications, the algorithms should preferably be applied in a way that maximizes the benefits and reduces the risks of their use. As our large-scale results indicate, this can be achieved by (i) combining algorithms (e.g., by averaging their predictions) and (ii) integrating algorithms within systematic frameworks (i.e., by using the algorithms according to their identified skills).
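The abstract names three computational steps: an equal-weight combiner of the algorithms' predicted error quantiles, the transformation of error quantiles into streamflow quantiles, and assessment with proper scores. The following is a minimal Python sketch of those steps, not the authors' code. It assumes the error is defined as observed minus GR4J-simulated flow (so a streamflow quantile is the point prediction plus the error quantile), and it assumes the quantile (pinball) score and the standard interval score as the proper scores. The function names, the toy data and the truncation of negative flows at zero are illustrative assumptions.

```python
from statistics import fmean


def pinball_loss(y_obs, q_pred, tau):
    """Quantile (pinball) score of a single predicted tau-quantile."""
    diff = y_obs - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def mean_quantile_score(obs, q_preds, tau):
    """Average pinball loss over a series (lower is better)."""
    return fmean(pinball_loss(y, q, tau) for y, q in zip(obs, q_preds))


def combine_equal_weight(*algorithm_quantiles):
    """Equal-weight combiner: average the algorithms' predicted quantiles day by day."""
    return [fmean(day) for day in zip(*algorithm_quantiles)]


def error_to_streamflow_quantiles(point_preds, error_quantiles):
    """Assumes error = observed - simulated, so Q_tau(flow) = simulated + Q_tau(error).
    Truncating negative flows at zero is an illustrative assumption."""
    return [max(0.0, s + e) for s, e in zip(point_preds, error_quantiles)]


def interval_score(y_obs, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval (lower is better)."""
    score = upper - lower
    if y_obs < lower:
        score += (2.0 / alpha) * (lower - y_obs)
    elif y_obs > upper:
        score += (2.0 / alpha) * (y_obs - upper)
    return score


# Hypothetical data: GR4J point predictions, observations, and the 0.9 error
# quantiles predicted by two algorithms for three days.
gr4j = [10.0, 12.0, 8.0]
observed = [11.0, 15.0, 7.0]
combined = combine_equal_weight([2.0, 4.0, 1.0], [3.0, 2.0, 0.0])  # [2.5, 3.0, 0.5]
q90 = error_to_streamflow_quantiles(gr4j, combined)               # [12.5, 15.0, 8.5]
print(mean_quantile_score(observed, q90, 0.9))                    # about 0.1
```

Averaging the quantiles themselves, as done here, is the simplest reading of "equal-weight combiner". Because the pinball loss is convex in the predicted quantile, the combined quantile's score can be no worse than the average of the individual algorithms' scores.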

2019 ◽  
Vol 577 ◽  
pp. 123957 ◽  
Author(s):  
Hristos Tyralis ◽  
Georgia Papacharalampous ◽  
Apostolos Burnetas ◽  
Andreas Langousis

2020 ◽  
Author(s):  
Florian Dupuy ◽  
Olivier Mestre ◽  
Léo Pfitzner

Cloud cover is crucial information for many applications, such as planning land observation missions from space. However, cloud cover remains a challenging variable to forecast, and Numerical Weather Prediction (NWP) models suffer from significant biases, which justifies the use of statistical post-processing techniques. In our application, the ground truth is a gridded cloud cover product derived from satellite observations over Europe, and the predictors are spatial fields of various variables produced by ARPEGE (the Météo-France global NWP model) at the corresponding lead time.

In this study, ARPEGE cloud cover is post-processed using a convolutional neural network (CNN). CNNs are the most popular machine learning tool for dealing with images; in our case, the CNN makes it possible to integrate the spatial information contained in NWP outputs. We show that a simple U-Net architecture produces significant improvements over Europe. Compared with the raw ARPEGE forecasts, MAE drops from 25.1 % to 17.8 % and RMSE decreases from 37.0 % to 31.6 %. Considering the specific needs of Earth observation, special attention was paid to forecasts under low cloud cover conditions (< 10 %). For this nebulosity class, we show that the hit rate jumps from 40.6 to 70.7 (the order of magnitude that can be achieved with classical machine learning algorithms such as random forests), while false alarms decrease from 38.2 to 29.9. This is an excellent result, since improving hit rates by means of random forests usually also results in a slight increase in false alarms.
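The verification figures quoted above can be computed from paired forecast and satellite fields with a few lines of code. The following is a minimal sketch, not the authors' evaluation code. It assumes cloud cover in percent and treats "cloud cover < 10 %" as the event of interest. It also reads "false alarm" as the false alarm ratio (false alarms divided by all forecast events), which is an assumption because the abstract does not say which false alarm measure was used.

```python
import math


def mae(pred, obs):
    """Mean absolute error (same units as the inputs, here % cloud cover)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def low_cloud_scores(pred, obs, threshold=10.0):
    """Hit rate and false alarm ratio (both in %) for the event 'cloud cover < threshold'."""
    hits = misses = false_alarms = 0
    for p, o in zip(pred, obs):
        forecast_event, observed_event = p < threshold, o < threshold
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
    hit_rate = 100.0 * hits / (hits + misses) if hits + misses else math.nan
    false_alarm_ratio = (100.0 * false_alarms / (hits + false_alarms)
                         if hits + false_alarms else math.nan)
    return hit_rate, false_alarm_ratio


# Hypothetical flattened grid cells: forecast vs satellite-derived cloud cover (%).
forecast = [5.0, 20.0, 8.0, 50.0]
satellite = [3.0, 5.0, 40.0, 60.0]
print(mae(forecast, satellite), low_cloud_scores(forecast, satellite))  # 14.75 (50.0, 50.0)
```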


2007 ◽  
Vol 135 (6) ◽  
pp. 2365-2378 ◽  
Author(s):  
P. Friederichs ◽  
A. Hense

Abstract. A statistical downscaling approach for extremes using censored quantile regression is presented. Conditional quantiles of station data (e.g., daily precipitation sums) in Germany are estimated from the large-scale circulation as represented by the NCEP reanalysis data. It is shown that a mixed discrete–continuous response variable, such as a daily precipitation sum, can be statistically modeled as a censored variable. Furthermore, a conditional quantile skill score is formulated to assess the relative gain of a quantile forecast compared with a reference forecast. Just as multiple regression does for expected values, quantile regression provides a tool for formulating a model output statistics system for extremal quantiles.
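Modeling daily precipitation as a censored variable means that the predicted quantile cannot fall below the point mass at zero, i.e., q = max(0, x'β). The sketch below evaluates this censored check loss for given linear predictors, in the style of Powell's censored quantile regression objective. It also computes a skill score of the form 1 − S/S_ref, which is an assumed form for the conditional quantile skill score. It does not fit the coefficients β, because censored quantile estimators need a dedicated optimizer (the objective is non-convex). All data are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def censored_quantile_score(y_obs, linear_preds, tau, censor_at=0.0):
    """Mean check loss of quantiles censored from below: q = max(censor_at, x'beta).
    Dry days (y = 0) are handled because the predicted quantile cannot go below 0."""
    losses = [check_loss(y - max(censor_at, eta), tau) for y, eta in zip(y_obs, linear_preds)]
    return sum(losses) / len(losses)


def quantile_skill_score(score_forecast, score_reference):
    """Relative gain over a reference forecast: 1 is perfect, 0 is no gain, < 0 is worse."""
    return 1.0 - score_forecast / score_reference


# Hypothetical daily precipitation sums (mm), including two dry days.
precip = [0.0, 0.0, 5.0, 12.0]
model_eta = [-2.0, 1.0, 4.0, 10.0]   # linear predictors x'beta from large-scale circulation
climatology = [3.0, 3.0, 3.0, 3.0]   # reference: constant climatological 0.9-quantile
qs_model = censored_quantile_score(precip, model_eta, 0.9)   # about 0.7
qs_ref = censored_quantile_score(precip, climatology, 0.9)   # about 2.625
print(quantile_skill_score(qs_model, qs_ref))                # about 0.733
```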


2021 ◽  
Vol 596 ◽  
pp. 126086
Author(s):  
Jenny Sjåstad Hagen ◽  
Etienne Leblois ◽  
Deborah Lawrence ◽  
Dimitri Solomatine ◽  
Asgeir Sorteberg

2020 ◽  
Author(s):  
Andrew Bennett ◽  
Bart Nijssen ◽  
Yifan Cheng ◽  
Adi Stein ◽  
Marketa McGuire

Water resources studies often rely on simulated streamflow from hydrologic models. However, model-based streamflow estimates are often not directly usable, because all models, no matter how well calibrated, contain systematic errors. Water resources studies use simulated streamflow as input to compute reservoir releases and diversions, and they do not function well if those inputs are significantly biased in time and/or space. Post-processing is therefore used to reduce these systematic errors in model outputs. This post-processing step is typically referred to as bias correction, and it often affects the entire distribution of flows rather than just the mean.

Existing post-processing techniques typically have three shortcomings. First, simulated streamflow at individual locations is often bias-corrected independently, disregarding the connections between locations that are imposed by the river network. This destroys the spatial consistency of streamflow across the river network. Second, bias-correction methods often rely on simple, time-invariant mappings between observed and simulated streamflow, without regard for the different hydrological processes that drive streamflow. For example, a hydrological model may have different systematic errors in representing snowmelt than in representing soil drainage, necessitating different corrections. Third, the application of a bias-correction method is often restricted to locations where both observed and simulated streamflow exist, even though these locations represent only a small subset of the streamflow input locations of a water resources model.

We present a post-processing method for streamflow that addresses all three of these shortcomings. The method accounts for the spatial relations imposed by the river network, allows for the incorporation of process information, and applies the bias correction to all reaches in a stream network. We develop a mapping from the modeled output at gages with flow observations, which we use as the basis for training a machine learning (ML) model to perform site-specific bias correction. We then apply the ML model to the local streamflow contributions of each river segment, including segments without flow observations. Finally, we combine the local bias corrections across the stream network to create accumulated bias-corrected streamflow time series that are spatially consistent across the network. We demonstrate our method for daily streamflow in a river basin in the western United States.
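The final step, combining local bias corrections across the stream network, amounts to a traversal from headwaters to outlet, so every segment's accumulated flow is the sum of corrected contributions upstream of it. The following is a minimal Python sketch, not the authors' implementation. It assumes each segment drains into exactly one downstream segment and ignores routing delays, so accumulation is instantaneous. The `correct(segment, flow)` callable is a hypothetical stand-in for the trained ML model, and the network and correction factors are invented for illustration.

```python
from graphlib import TopologicalSorter


def accumulate_bias_corrected(local_flow, downstream, correct):
    """Bias-correct each segment's local inflow, then accumulate it downstream.

    local_flow: {segment: local (incremental) simulated flow}
    downstream: {segment: the segment it drains into, or None at the outlet}
    correct:    hypothetical stand-in for the trained ML model, correct(segment, flow)
    Routing delays are ignored: accumulation is instantaneous (an assumption).
    """
    upstream = {seg: set() for seg in local_flow}
    for seg, down in downstream.items():
        if down is not None:
            upstream[down].add(seg)
    accumulated = {}
    # static_order() yields every segment after all of its upstream segments.
    for seg in TopologicalSorter(upstream).static_order():
        accumulated[seg] = correct(seg, local_flow[seg]) + sum(accumulated[u] for u in upstream[seg])
    return accumulated


# Hypothetical network: headwater segments A and B join at outlet segment C.
local = {"A": 2.0, "B": 3.0, "C": 1.0}
network = {"A": "C", "B": "C", "C": None}
factors = {"A": 0.5, "B": 1.0, "C": 2.0}  # hypothetical per-segment ML corrections
print(accumulate_bias_corrected(local, network, lambda s, q: factors[s] * q))
```

Correcting local contributions before summing, rather than correcting each gauge's total flow separately, is what keeps the corrected flow at C equal to the corrected flows arriving from A and B plus C's own corrected inflow.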


2020 ◽  
Vol 590 ◽  
pp. 125206
Author(s):  
Shuyu Yang ◽  
Dawen Yang ◽  
Jinsong Chen ◽  
Jerasorn Santisirisomboon ◽  
Weiwei Lu ◽  
...  

2018 ◽  
Vol 22 (7) ◽  
pp. 3601-3617 ◽  
Author(s):  
Diana Lucatero ◽  
Henrik Madsen ◽  
Jens C. Refsgaard ◽  
Jacob Kidmose ◽  
Karsten H. Jensen

Abstract. In the present study, we analyze the effect of bias adjustments in both meteorological and streamflow forecasts on the skill and statistical consistency of monthly streamflow and yearly minimum daily flow forecasts. Both raw and preprocessed meteorological seasonal forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) are used as inputs to a spatially distributed, coupled surface–subsurface hydrological model based on the MIKE SHE code. Streamflow predictions are then generated up to 7 months in advance. In addition, we post-process the streamflow predictions using an empirical quantile mapping technique. Bias, skill and statistical consistency are the qualities evaluated across the forecast-generating strategies, and we analyze where the different strategies fall short in order to improve them. ECMWF System 4-based streamflow forecasts tend to show a lower level of accuracy than those generated with an ensemble of historical observations, a method commonly known as ensemble streamflow prediction (ESP). This is particularly true at longer lead times, for the dry season and for streamflow stations that exhibit low hydrological model errors. Biases in the mean are better removed by post-processing, which in turn is reflected in a higher level of statistical consistency. However, in general, the reduction of these biases is not sufficient to ensure a higher level of accuracy than the ESP forecasts. This is true for both monthly mean and yearly minimum streamflow forecasts. We discuss the importance of a better estimation of the initial state of the catchment, which may increase the capability of the system to forecast streamflow at longer lead times.


Author(s):  
Yuhang Zhang ◽  
Aizhong Ye

Abstract. Obtaining high-quality quantitative precipitation forecasts is a key precondition for hydrological forecast systems. Owing to multisource uncertainties (e.g., initial conditions, model structures and parameters), raw forecasts are subject to systematic biases; hence, statistical post-processing is often required to reduce these errors before the forecasts can be used in hydrological applications. Machine learning (ML) algorithms are canonical statistical models and are diverse in type and variation, so it is important to verify and compare their performance in the same scenario (e.g., precipitation post-processing). In this paper, we conduct a large-scale comparison of the major ML models, with diverse model structures and regularization strategies, as post-processors for improving the quality of precipitation forecasts. Specifically, we compare the efficiency and effectiveness of 21 ML algorithms on this task. Daily reforecast precipitation with lead times of up to 8 days from the Global Ensemble Forecast System, together with the corresponding observations, is used to determine the usability of the different models in the Yalong River basin in China. The performance of each model is validated through a group of carefully designed experiments and statistical metrics. The results reveal that improvements in model structure are more effective than regularization strategies. Among these algorithms, the optimized extra-trees regressor exhibits the best performance: it effectively reduces overestimation and achieves the best skill in forecasting precipitation. Using eleven ensemble members and a 2-day time window as predictors yields the best model performance. The systematic experiments and findings also offer useful guidelines for other related studies.
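The abstract reports that eleven ensemble members and a 2-day time window work best as predictors, but it does not describe the predictor layout. The sketch below shows one plausible construction, not the paper's. It assumes that "2-day window" means the member forecasts valid on the target day and the preceding day, for a single lead time, and it uses the observation on the target day as the regression target. The resulting `X` and `y` could then be passed to a regressor such as scikit-learn's `ExtraTreesRegressor`, which is left out here so the sketch stays dependency-free. The function name and toy data are hypothetical.

```python
def build_predictors(member_fc, obs, window=2):
    """Stack ensemble-member forecasts over a trailing window of days.

    member_fc[t][m]: precipitation forecast of member m valid on day t (one lead time)
    obs[t]:          observed precipitation on day t (the regression target)
    Returns X (one row per target day with window * n_members columns) and y.
    """
    X, y = [], []
    for t in range(window - 1, len(obs)):
        row = []
        for day in range(t - window + 1, t + 1):  # days t-window+1 .. t
            row.extend(member_fc[day])
        X.append(row)
        y.append(obs[t])
    return X, y


# Hypothetical toy data: 11 members, 4 days, every member forecasting 1.0 mm.
fc = [[1.0] * 11 for _ in range(4)]
X, y = build_predictors(fc, [0.0, 2.0, 0.5, 3.0])
print(len(X), len(X[0]), y)  # 3 22 [2.0, 0.5, 3.0]
```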


Author(s):  
Parthiban Loganathan ◽  
Amit Baburao Mahindrakar

Abstract. An intercomparison of streamflow simulation and discharge prediction using various well-known machine learning techniques was performed. A daily streamflow discharge model was developed for 35 observation stations located in the Cauvery, a large-scale river basin. Various hydrological indices were calculated for observed and predicted discharges to compare and evaluate how well local hydrological conditions are replicated. The model variance and bias of the proposed extreme gradient boosting decision tree model were less than 15%, and these were compared with those of the other machine learning techniques considered in this study. The model's Nash–Sutcliffe efficiency and coefficient of determination values are above 0.7 for both the training and testing phases, which demonstrates the effectiveness of the model. The comparison of monthly observed and model-predicted discharges during the validation period illustrates the model's ability to represent the peaks and falls in high-, medium- and low-flow zones. The assessment and comparison of hydrological indices between observed and predicted discharges illustrate the model's ability to represent baseflow, high-spell and low-spell statistics. Simulating streamflow and predicting discharge are essential for water resource planning and management, especially in large-scale river basins. The proposed machine learning technique significantly improves model efficiency by reducing variance and bias, which, in turn, improves the replicability of local-scale hydrology.
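The goodness-of-fit measures named above have standard definitions, sketched below in Python. This is an illustration, not the study's code. It assumes that the coefficient of determination is the squared Pearson correlation between simulated and observed discharge, and it uses percent bias as one common bias measure, because the abstract does not say how its bias and variance percentages were computed. The sample series are hypothetical.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def r_squared(sim, obs):
    """Squared Pearson correlation between simulated and observed series."""
    n = len(obs)
    mean_sim, mean_obs = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov ** 2 / (var_sim * var_obs)


def percent_bias(sim, obs):
    """Percent bias: positive values mean the model overestimates total flow."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


# Hypothetical discharges: the simulation is shifted up by a constant 1.0.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(nse(simulated, observed), r_squared(simulated, observed), percent_bias(simulated, observed))
```

The example illustrates why the two criteria are reported together. A constant offset leaves R² at 1.0, whereas the NSE drops to about 0.2 because it penalizes bias as well as timing errors.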

