The skill of seasonal ensemble low flow forecasts for four different hydrological models
Abstract. This paper investigates the skill of 90-day low flow forecasts using two conceptual hydrological models and two data-driven models based on Artificial Neural Networks (ANNs) for the Moselle River. One data-driven model, ANN-Indicator (ANN-I), requires historical inputs on precipitation (P), potential evapotranspiration (PET), groundwater (G) and observed discharge (Q), whereas the other data-driven model, ANN-Ensemble (ANN-E), and the two conceptual models, HBV and GR4J, use forecasted meteorological inputs (P and PET), for which we employ ensemble seasonal meteorological forecasts. We compared low flow forecasts produced without any meteorological forecasts as input (ANN-I) with forecasts from the other three models (GR4J, HBV and ANN-E) under five different cases of seasonal meteorological forcing: (1) ensemble P and PET forecasts; (2) ensemble P forecasts and observed climate mean PET; (3) observed climate mean P and ensemble PET forecasts; (4) observed climate mean P and PET; and (5) zero P and ensemble PET forecasts. The ensemble P and PET forecasts, each consisting of 40 members, reveal the forecast ranges due to the model inputs. The five cases are compared for a lead time of 90 days based on model output ranges, whereas the four models are compared based on their skill of low flow forecasts for varying lead times up to 90 days. Before forecasting, the hydrological models are calibrated and validated over periods of 30 and 20 years, respectively. The smallest difference between calibration and validation performance is found for HBV, whereas the largest difference is found for ANN-E. The results indicate that all models are prone to over-predict low flows when using ensemble seasonal meteorological forcing. The largest range for 90-day low flow forecasts is found for the GR4J model when using ensemble seasonal meteorological forecasts as input.
GR4J, HBV and ANN-E under-predicted 90-day-ahead low flows in the very dry year 2003 when run without precipitation data, whereas ANN-I predicted the magnitude of the low flows better than the other three models. The results of the comparison of forecast skills with varying lead times show that GR4J is less skilful than ANN-E and HBV. Furthermore, the hit rate of ANN-E is higher than that of the two conceptual models for most lead times. However, ANN-I is not successful in distinguishing between low flow events and non-low flow events. Overall, the uncertainty from ensemble P forecasts has a larger effect on seasonal low flow forecasts than the uncertainty from ensemble PET forecasts and initial model conditions.
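To illustrate how a hit rate for low flow events can be obtained from an ensemble forecast, the following minimal Python sketch builds a 2x2 contingency table and derives the hit rate and the false alarm rate from it. It is not necessarily the procedure used in this study. The function name `contingency_scores`, the low flow threshold of 100 m3/s, the rule that an event is forecast when at least half of the members fall below the threshold (`min_fraction`), and the toy data are all assumptions made for illustration only.

```python
import numpy as np


def contingency_scores(obs, ens_forecast, threshold, min_fraction=0.5):
    """Hit rate and false alarm rate for low flow events.

    obs          : observed discharge, shape (n_times,)
    ens_forecast : ensemble forecast discharge, shape (n_times, n_members)
    threshold    : discharge below which a low flow event occurs (assumed)
    min_fraction : fraction of members below the threshold needed to
                   forecast an event (assumed decision rule)
    """
    obs = np.asarray(obs, dtype=float)
    ens = np.asarray(ens_forecast, dtype=float)

    observed_event = obs < threshold
    # Fraction of ensemble members forecasting a low flow at each time step
    fraction_below = (ens < threshold).mean(axis=1)
    forecast_event = fraction_below >= min_fraction

    hits = np.sum(observed_event & forecast_event)
    misses = np.sum(observed_event & ~forecast_event)
    false_alarms = np.sum(~observed_event & forecast_event)
    correct_negatives = np.sum(~observed_event & ~forecast_event)

    n_events = hits + misses
    n_non_events = false_alarms + correct_negatives
    # Hit rate: share of observed events that were forecast
    hit_rate = hits / n_events if n_events else float("nan")
    # False alarm rate: share of observed non-events forecast as events
    false_alarm_rate = (
        false_alarms / n_non_events if n_non_events else float("nan")
    )
    return float(hit_rate), float(false_alarm_rate)


# Toy data (hypothetical): four forecast dates, four ensemble members
obs = [50.0, 120.0, 40.0, 200.0]
ens = [
    [60.0, 80.0, 110.0, 90.0],
    [90.0, 95.0, 130.0, 80.0],
    [150.0, 160.0, 90.0, 170.0],
    [210.0, 190.0, 220.0, 180.0],
]
hit_rate, false_alarm_rate = contingency_scores(obs, ens, threshold=100.0)
print(hit_rate, false_alarm_rate)
```

A forecast system that cannot distinguish events from non-events has a hit rate close to its false alarm rate, which is why the two scores are usually reported together.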