Generalization of the Discrete Brier and Ranked Probability Skill Scores for Weighted Multimodel Ensemble Forecasts

2007 · Vol 135 (7) · pp. 2778–2785
Author(s): Andreas P. Weigel, Mark A. Liniger, Christof Appenzeller

Abstract This note describes how the widely used Brier and ranked probability skill scores (BSS and RPSS, respectively) can be correctly applied to quantify the potential skill of probabilistic multimodel ensemble forecasts. It builds upon the study of Weigel et al., where a revised RPSS, the so-called discrete ranked probability skill score (RPSS_D), was derived, circumventing the known negative bias of the RPSS for small ensemble sizes. Since the BSS is a special case of the RPSS, a debiased discrete Brier skill score (BSS_D) could be formulated in the same way. Here, the approach of Weigel et al., which was so far only applicable to single-model ensembles, is generalized to weighted multimodel ensemble forecasts. By introducing an “effective ensemble size” characterizing the multimodel, the new generalized RPSS_D can be expressed such that its structure becomes equivalent to the single-model case. This is of practical importance for multimodel assessment studies, where the consequences of varying effective ensemble size need to be clearly distinguished from the true benefits of multimodel combination. The performance of the new generalized RPSS_D formulation is illustrated in examples of weighted multimodel ensemble forecasts, both in a synthetic random forecasting context and with real seasonal forecasts of operational models. A central conclusion of this study is that, for small ensemble sizes, multimodel assessment studies should not be carried out on the basis of the classical RPSS alone, since true changes in predictability may be hidden by bias effects, a deficiency that can be overcome with the new generalized RPSS_D.
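
As a rough illustration of the "effective ensemble size" idea, a minimal Python sketch follows. The variance-based definition used here (matching the sampling variance of the weighted ensemble probability to that of an equally weighted ensemble) is a common convention and an assumption on our part, not necessarily the paper's exact derivation; the weights and member counts are invented:

```python
import numpy as np

# Hypothetical weighted multimodel: per-model weights w_i (assumed to sum
# to 1) and per-model ensemble sizes m_i.
w = np.array([0.5, 0.3, 0.2])   # model weights (invented)
m = np.array([9, 40, 15])       # members per model (invented)

# The sampling variance of the weighted multimodel probability scales with
# sum(w_i^2 / m_i); equating it to 1/M_eff of a single equally weighted
# ensemble gives one common definition of the effective ensemble size:
M_eff = 1.0 / np.sum(w**2 / m)
print(f"Effective ensemble size: {M_eff:.1f} (total members: {m.sum()})")
```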

2005 · Vol 18 (10) · pp. 1513–1523
Author(s): W. A. Müller, C. Appenzeller, F. J. Doblas-Reyes, M. A. Liniger

Abstract The ranked probability skill score (RPSS) is a widely used measure to quantify the skill of ensemble forecasts. The underlying score is defined by the quadratic norm and is comparable to the mean squared error (MSE), but it is applied in probability space. It is sensitive to the shape and the shift of the predicted probability distributions. However, as recently shown, the RPSS has a negative bias for ensemble systems with small ensemble size. Here, two strategies are explored to tackle this flaw of the RPSS. First, the RPSS is examined for different norms L (RPSS_L). It is shown that the RPSS_{L=1}, based on the absolute rather than the squared difference between the forecast and observed cumulative probability distributions, is unbiased; RPSS_L variants defined with higher-order norms show a negative bias. However, the RPSS_{L=1} is not strictly proper in a statistical sense. A second approach is then investigated, which is based on the quadratic norm but accounts for the sampling errors of the climatological probabilities in the reference forecasts. This technique is based on strictly proper scores and results in an unbiased skill score, denoted hereafter as the debiased ranked probability skill score (RPSS_D). Both newly defined skill scores are independent of the ensemble size, whereas the associated confidence intervals are functions of the ensemble size and the number of forecasts. The RPSS_{L=1} and the RPSS_D are then applied to the winter mean [December–January–February (DJF)] near-surface temperature predictions of the ECMWF Seasonal Forecast System 2. The overall structures of the RPSS_{L=1} and the RPSS_D are more consistent and largely independent of the ensemble size, unlike those of the RPSS_{L=2}. Furthermore, the minimum ensemble size required to predict a climate anomaly with a known signal-to-noise ratio is determined by employing the new skill scores. For a hypothetical setup comparable to the ECMWF hindcast system (40 members and 15 hindcast years), statistically significant skill scores were only found for signal-to-noise ratios larger than ∼0.3.
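
A minimal sketch of the norm-generalized ranked probability score discussed here, where L = 2 recovers the standard quadratic score and L = 1 the absolute-difference variant; the tercile forecast and observed category are invented toy values:

```python
import numpy as np

def rps_norm(p_fcst, obs_cat, L=2):
    """Ranked probability score under norm L.

    p_fcst : forecast probabilities per category (sums to 1)
    obs_cat: index of the observed category
    L = 2 gives the standard quadratic RPS; L = 1 the absolute variant.
    """
    p_obs = np.zeros(len(p_fcst))
    p_obs[obs_cat] = 1.0
    # compare the cumulative distributions category by category
    F_fcst = np.cumsum(p_fcst)
    F_obs = np.cumsum(p_obs)
    return np.sum(np.abs(F_fcst - F_obs) ** L)

# Toy example: three categories (below/near/above normal)
p = np.array([0.2, 0.3, 0.5])
print(rps_norm(p, obs_cat=2, L=2))  # quadratic RPS
print(rps_norm(p, obs_cat=2, L=1))  # absolute-difference variant
```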


2013 · Vol 141 (10) · pp. 3477–3497
Author(s): Mingyue Chen, Wanqiu Wang, Arun Kumar

Abstract An analysis of lagged ensemble seasonal forecasts from the National Centers for Environmental Prediction (NCEP) Climate Forecast System, version 2 (CFSv2), is presented. The focus of the analysis is on the construction of lagged ensemble forecasts with increasing lead time (thus allowing use of larger ensemble sizes) and its influence on seasonal prediction skill. Predictions of seasonal means of sea surface temperature (SST), 200-hPa height (z200), precipitation, and 2-m air temperature (T2m) over land are analyzed. Measures of prediction skill include deterministic ones (anomaly correlation and mean squared error) and a probabilistic one [the ranked probability skill score (RPSS)]. The results show that for a fixed lead time, and as one would expect, the skill of the seasonal forecast improves as the ensemble size increases, while for a fixed ensemble size the forecast skill decreases as the lead time becomes longer. However, when a forecast is based on a lagged ensemble, there exists an optimal lagged ensemble time (OLET) at which the positive influence of increasing ensemble size and the negative influence of increasing lead time combine to yield a maximum in seasonal prediction skill. The OLET is shown to depend on the geographical location and the variable. For precipitation and T2m, the OLET is relatively longer and the skill gain is larger than for SST and tropical z200. The OLET also depends on the skill measure, with the RPSS having the longest OLET. The results of this analysis will be useful in providing guidelines on the design of, and in understanding the relative merits of, different configurations of seasonal prediction systems.
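
To make the lagged-ensemble construction concrete, a hedged toy sketch (the array shapes, noise growth with lag, and choice of anomaly correlation as the measure are all invented, not the paper's setup). Members from successively earlier initializations are pooled, so the ensemble grows while the older members get noisier, producing the trade-off behind the OLET:

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_lags, n_members = 20, 8, 4

# Hypothetical hindcasts: members initialized 0..7 steps before the target
# season; forecast noise grows with lag.
signal = rng.standard_normal(n_years)
fcst = (signal[:, None, None]
        + 0.5 * (1 + np.arange(n_lags))[None, :, None] ** 0.5
        * rng.standard_normal((n_years, n_lags, n_members)))
obs = signal + 0.3 * rng.standard_normal(n_years)

def anomaly_correlation(x, y):
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

# Lagged ensemble of size lag * n_members: pool initializations 0..lag-1
for lag in range(1, n_lags + 1):
    ens_mean = fcst[:, :lag, :].reshape(n_years, -1).mean(axis=1)
    print(f"lags pooled: {lag:2d}  AC = {anomaly_correlation(ens_mean, obs):.3f}")
```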


2005 · Vol 18 (15) · pp. 2963–2978
Author(s): T. E. LaRow, S. D. Cocke, D. W. Shin

Abstract A six-member multicoupled-model ensemble is created by using six state-of-the-art deep atmospheric convective schemes. The six convective schemes are used inside a single model and make up the ensemble. This six-member ensemble is compared against a multianalysis ensemble, which is created by varying the initial start dates of the atmospheric component of the coupled model. Both ensembles were integrated for seven months (November–May) over a 12-yr period from 1987 to 1998. Examination of the sea surface temperature and precipitation shows that while the deterministic skill scores are slightly better for the multicoupled-model ensemble, the probabilistic skill scores favor the multimodel approach. Combining the two ensembles into a single larger ensemble increases the probabilistic skill score relative to the multimodel alone. This altered-physics approach to creating a multimodel ensemble is seen as an easy way for small modeling centers to generate ensembles with better reliability than by varying the initial conditions alone.
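
A hedged toy sketch of the pooling step (the ensembles, the binary event, and all sizes are invented; this is not the authors' experimental setup): combining two small ensembles reduces the sampling error of the forecast probabilities, which can improve a probabilistic score such as the Brier score:

```python
import numpy as np

rng = np.random.default_rng(1)
n_fcsts = 200
# Hypothetical ensembles: one from varied convection schemes, one from
# lagged analyses, both predicting the same binary event.
truth = rng.random(n_fcsts) < 0.33
ens_physics = rng.random((n_fcsts, 6)) < (0.25 + 0.25 * truth[:, None])
ens_analysis = rng.random((n_fcsts, 6)) < (0.25 + 0.25 * truth[:, None])

def brier(ens, truth):
    p = ens.mean(axis=1)              # event probability from member count
    return np.mean((p - truth) ** 2)

print("physics-only  :", brier(ens_physics, truth))
print("analysis-only :", brier(ens_analysis, truth))
# Pooling the two ensembles doubles the ensemble size and typically
# reduces the sampling error of the forecast probabilities.
pooled = np.concatenate([ens_physics, ens_analysis], axis=1)
print("pooled (12)   :", brier(pooled, truth))
```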


2012 · Vol 27 (1) · pp. 3–27
Author(s): K. P. Sooraj, H. Annamalai, Arun Kumar, Hui Wang

Abstract The 15-member ensemble hindcasts performed with the National Centers for Environmental Prediction Climate Forecast System (CFS) for the period 1981–2005, as well as real-time forecasts for the period 2006–09, are assessed for seasonal prediction skill over the tropics from deterministic (anomaly correlation), categorical (Heidke skill score), and probabilistic (ranked probability skill score) perspectives. In addition, persistence, signal-to-noise ratio, and root-mean-square error analyses are performed. The CFS demonstrates high skill in forecasting El Niño–Southern Oscillation (ENSO) related sea surface temperature (SST) anomalies during developing and mature phases, including those of different types of El Niño. During ENSO, the space–time evolution of anomalous SST, 850-hPa wind, and rainfall along the equatorial Pacific, as well as the mechanisms involved in the teleconnection to the tropical Indian Ocean, are also well represented. During ENSO phase transitions and in the summer, the skill in forecasting Pacific SST anomalies is modest. An examination of the ability of the CFS to forecast seasonal rainfall anomalies over the U.S. Affiliated Pacific Islands (USAPI) indicates that forecasting the persistence of dryness from El Niño winter into the following spring/summer is skillful at leads > 3 months. During strong El Niño years the persistence is predicted by all members with a 6-month lead time. The model is also skillful in predicting regional rainfall responses during different types of El Niño. Since both the deterministic and probabilistic skill scores converge, the forecasts can be considered useful. The model’s skill in the real-time forecasts for the period 2006–09 is also discussed. The results suggest that a dynamical-system-based seasonal prediction of precipitation over the USAPI is feasible.
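
For the categorical measure used here, a minimal sketch of the Heidke skill score computed from a contingency table; the table values are invented for illustration:

```python
import numpy as np

def heidke_skill_score(table):
    """Heidke skill score from a K x K contingency table
    (rows: forecast category, columns: observed category)."""
    n = table.sum()
    correct = np.trace(table)                          # hits on the diagonal
    # number correct expected by chance, from the marginal totals
    expected = np.sum(table.sum(axis=1) * table.sum(axis=0)) / n
    return (correct - expected) / (n - expected)

# Invented contingency table for below/near/above-normal forecasts
table = np.array([[30, 10,  5],
                  [12, 25, 13],
                  [ 6, 11, 28]])
print(f"HSS = {heidke_skill_score(table):.3f}")
```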


2020 · Vol 148 (6) · pp. 2591–2606
Author(s): Luying Ji, Xiefei Zhi, Clemens Simmer, Shoupeng Zhu, Yan Ji

Abstract We analyzed 24-h accumulated precipitation forecasts over the 4-month period from 1 May to 31 August 2013 for an East Asian domain (15.05°–58.95°N, 70.15°–139.95°E), generated with the ensemble prediction systems (EPS) from ECMWF, NCEP, UKMO, JMA, and CMA contained in the TIGGE dataset. The forecasts are first evaluated with the Method for Object-Based Diagnostic Evaluation (MODE). Then a multimodel ensemble (MME) forecast technique based on weights derived from object-based scores is investigated and compared with the equally weighted MME and the traditional gridpoint-based MME forecast, whose weights are derived from the point-to-point metric, the mean absolute error (MAE). The object-based evaluation revealed that the attributes of objects derived from the ensemble members of the five individual EPS forecasts differ consistently from those of the observations. For instance, the predicted centroid locations are displaced southwestward, the predicted shapes are more circular, and the predicted orientations are more meridional than in the observations. The sensitivity of the number of objects and their attributes to methodological parameters is also investigated. An MME prediction technique based on weights computed from the object-based scores, the median of maximum interest, and the object-based threat score is explored, and the results are compared with the ensemble forecasts of the individual EPS, the equally weighted MME forecast, and the traditional superensemble forecast. When using MODE statistics for the forecast evaluation, the object-based MME prediction outperforms all other predictions, mainly because of a better prediction of the objects’ centroid locations. When using the precipitation-based fractions skill score, which is not used in either of the weighted MME forecasts, the object-based MME forecasts are slightly better than the equally weighted MME forecasts but are inferior to the traditional superensemble forecast based on weights derived from the point-to-point metric MAE.
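
Since the object-based MME is benchmarked with the fractions skill score (FSS), a minimal FSS sketch may be useful. The precipitation fields below are random stand-ins, and the threshold and neighborhood size are arbitrary choices, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, window):
    """Fractions skill score for one threshold and neighborhood size."""
    # fraction of gridpoints exceeding the threshold in each neighborhood
    f = uniform_filter((fcst >= threshold).astype(float), size=window)
    o = uniform_filter((obs >= threshold).astype(float), size=window)
    mse = np.mean((f - o) ** 2)
    mse_ref = np.mean(f ** 2) + np.mean(o ** 2)   # no-skill reference
    return 1.0 - mse / mse_ref

rng = np.random.default_rng(2)
obs = rng.gamma(shape=0.5, scale=8.0, size=(200, 200))   # mm/day stand-in
fcst = np.roll(obs, shift=5, axis=1) + rng.normal(0, 1, obs.shape)
print(f"FSS(>= 10 mm, 25-gridpoint window) = {fss(fcst, obs, 10.0, 25):.3f}")
```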


2017 · Vol 21 (8) · pp. 4103–4114
Author(s): Naze Candogan Yossef, Rens van Beek, Albrecht Weerts, Hessel Winsemius, Marc F. P. Bierkens

Abstract. In this study we assess the skill of seasonal streamflow forecasts with the global hydrological forecasting system Flood Early Warning System (FEWS)-World, which has been set up within the European Commission 7th Framework Programme Project Global Water Scarcity Information Service (GLOWASIS). FEWS-World incorporates the distributed global hydrological model PCR-GLOBWB (PCRaster Global Water Balance). We produce ensemble forecasts of monthly discharges for 20 large rivers of the world, with lead times of up to 6 months, forcing the system with bias-corrected seasonal meteorological forecast ensembles from the European Centre for Medium-Range Weather Forecasts (ECMWF) System 3 (S3) and with probabilistic meteorological ensembles obtained following the ensemble streamflow prediction (ESP) procedure. Here, the ESP ensembles, which contain no actual information on the weather, serve as a benchmark to assess the additional skill that may be obtained using ECMWF seasonal forecasts. We use the Brier skill score (BSS) to quantify the skill of the system in forecasting high and low flows, defined as discharges higher than the 75th and lower than the 25th percentiles for a given month, respectively. We determine the theoretical skill by comparing the results against model simulations, and the actual skill by comparing against discharge observations. We calculate the ratios of actual to theoretical skill in order to quantify the percentage of the potential skill that is achieved. The results suggest that the performance of ECMWF S3 forecasts is close to that of the ESP forecasts. While better meteorological forecasts could potentially lead to an improvement in hydrological forecasts, this cannot yet be achieved using the ECMWF S3 dataset.
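
A hedged sketch of the scoring logic described here (the discharge data and ensemble construction are synthetic inventions; only the verification structure follows the text): the Brier skill score for high flows is computed against both model simulations (theoretical skill) and observations (actual skill), and their ratio is reported:

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_members = 30, 15
# Hypothetical monthly discharge: observations, a model simulation driven
# by observed meteorology (the model-world "truth"), and ensemble forecasts.
obs = rng.gamma(2.0, 500.0, n_years)
sim = obs * rng.lognormal(0.0, 0.2, n_years)
ens = sim[:, None] * rng.lognormal(0.0, 0.3, (n_years, n_members))

def bss_high_flow(ens, truth, clim_p=0.25):
    """Brier skill score for exceeding the 75th percentile of `truth`."""
    thresh = np.percentile(truth, 75)
    event = truth > thresh
    p = (ens > thresh).mean(axis=1)          # forecast event probability
    bs = np.mean((p - event) ** 2)
    bs_clim = np.mean((clim_p - event) ** 2)  # climatological reference
    return 1.0 - bs / bs_clim

theoretical = bss_high_flow(ens, sim)   # verified against simulations
actual = bss_high_flow(ens, obs)        # verified against observations
print(f"theoretical BSS = {theoretical:.2f}, actual BSS = {actual:.2f}, "
      f"ratio = {actual / max(theoretical, 1e-9):.2f}")
```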


2009 · Vol 137 (4) · pp. 1460–1479
Author(s): Andreas P. Weigel, Mark A. Liniger, Christof Appenzeller

Abstract Multimodel ensemble combination (MMEC) has become an accepted technique for improving probabilistic forecasts on short- to long-range time scales. MMEC techniques typically widen ensemble spread, thus improving the dispersion characteristics and the reliability of the forecasts. This raises the question as to whether the same effect could be achieved in a potentially cheaper way by rescaling single-model ensemble forecasts a posteriori such that they become reliable. In this study a climate conserving recalibration (CCR) technique is derived and compared with MMEC. With a simple stochastic toy model it is shown that both CCR and MMEC successfully improve forecast reliability. The difference between the two methods is that CCR conserves resolution but inevitably dilutes the potentially predictable signal, while MMEC is, in the ideal case, able to fully retain the predictable signal and to improve resolution. MMEC is therefore conceptually to be preferred, particularly since the effect of CCR depends on the length of the data record and on distributional assumptions. In reality, however, multimodels consist of only a finite number of participating single models, and the model errors are often correlated. Under such conditions, and depending on the skill metric applied, CCR-corrected single models can on average have skill comparable to that of multimodel ensembles, particularly when the potential model predictability is low. Using seasonal near-surface temperature and precipitation forecasts of three models of the Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER) dataset, it is shown that the conclusions drawn from the toy-model experiments hold equally in a real multimodel ensemble prediction system. All in all, it is not possible to make a general statement on whether CCR or MMEC is the better method. Rather, it seems that optimum forecasts can be obtained by a combination of both methods, but only if MMEC is applied first and CCR second. The opposite order, first CCR and then MMEC, is shown to have little effect, at least in the context of seasonal forecasts.
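
A simplified sketch of the CCR idea (our reading of the technique, not the paper's exact equations): shrink the ensemble-mean signal by its regression onto the observations, and inflate the ensemble spread so that the recalibrated forecast climatology has the observed variance. All data below are a toy invention:

```python
import numpy as np

rng = np.random.default_rng(4)
n_years, n_members = 40, 9
signal = rng.standard_normal(n_years)
obs = signal + rng.standard_normal(n_years)
fcst = (1.5 * signal[:, None]                # overconfident toy forecast
        + 0.5 * rng.standard_normal((n_years, n_members)))

def ccr(fcst, obs):
    """Simplified climate conserving recalibration (a sketch)."""
    mu = fcst.mean(axis=1)                            # ensemble mean
    rho = np.corrcoef(mu, obs)[0, 1]
    alpha = rho * obs.std() / mu.std()                # shrink the signal
    eps = fcst - mu[:, None]                          # ensemble anomalies
    # inflate spread so total variance matches the observed climatology:
    # alpha^2 var(mu) + beta^2 var(eps) = var(obs)
    beta = obs.std() * np.sqrt(max(1 - rho**2, 0)) / eps.std()
    return obs.mean() + alpha * (mu - mu.mean())[:, None] + beta * eps

cal = ccr(fcst, obs)
print("fcst climatological std:", fcst.std().round(2),
      "-> recalibrated:", cal.std().round(2),
      "| obs std:", obs.std().round(2))
```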


2019 · Vol 58 (8) · pp. 1709–1723
Author(s): Dian Nur Ratri, Kirien Whan, Maurice Schmeits

Abstract Dynamical seasonal forecasts are afflicted with biases, including the seasonal ensemble precipitation forecasts from the new ECMWF seasonal forecast system 5 (SEAS5). In this study, biases have been corrected using empirical quantile mapping (EQM) bias correction (BC). We bias-correct SEAS5 24-h rainfall accumulations at seven monthly lead times over the period 1981–2010 in Java, Indonesia. For the observations, we have used a new high-resolution (0.25°) land-only gridded rainfall dataset [Southeast Asia observations (SA-OBS)]. A comparative verification of both raw and bias-corrected reforecasts is performed using several verification metrics. In this verification, the daily rainfall data were aggregated to monthly accumulated rainfall. We focus on July, August, and September because these are agriculturally important months; if the rainfall accumulation exceeds 100 mm, farmers may decide to grow a third rice crop. For these months, the first 2-month lead times show improved and mostly positive continuous ranked probability skill scores after BC. According to the Brier skill score (BSS), the BC reforecasts improve upon the raw reforecasts for the lower precipitation thresholds at the 1-month lead time. Reliability diagrams show that the BC reforecasts have good reliability for events exceeding the agriculturally relevant 100-mm threshold. A cost/loss analysis, comparing the potential economic value of the raw and BC reforecasts for this same threshold, shows that the value of the BC reforecasts is larger than that of the raw ones, and that the BC reforecasts have value for a wider range of users at 1- to 7-month lead times.
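
A minimal sketch of empirical quantile mapping as a bias correction: each forecast value is mapped through the reference-period forecast distribution onto the reference-period observed distribution. The rainfall distributions and sample sizes below are invented, and access to the SEAS5 and SA-OBS data is not shown:

```python
import numpy as np

def eqm_bias_correct(fcst, fcst_ref, obs_ref, n_quantiles=100):
    """Empirical quantile mapping: map forecast values through the
    reference-period forecast CDF onto the reference-period observed CDF."""
    q = np.linspace(0, 100, n_quantiles)
    fcst_q = np.percentile(fcst_ref, q)   # model quantiles (e.g. 1981-2010)
    obs_q = np.percentile(obs_ref, q)     # observed quantiles (e.g. SA-OBS)
    return np.interp(fcst, fcst_q, obs_q)

rng = np.random.default_rng(5)
obs_ref = rng.gamma(2.0, 40.0, 360)                 # monthly rainfall stand-in (mm)
fcst_ref = 0.6 * rng.gamma(2.0, 40.0, 360) + 10.0   # biased model climate
new_fcst = 0.6 * rng.gamma(2.0, 40.0, 12) + 10.0    # forecast to correct
print("raw :", np.round(new_fcst[:4], 1))
print("EQM :", np.round(eqm_bias_correct(new_fcst, fcst_ref, obs_ref)[:4], 1))
```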


2004 · Vol 85 (6) · pp. 853–872
Author(s): T. N. Palmer, A. Alessandri, U. Andersen, P. Cantelaube, M. Davey, …

A multimodel ensemble-based system for seasonal-to-interannual prediction has been developed in a joint European project known as DEMETER (Development of a European Multimodel Ensemble Prediction System for Seasonal to Interannual Prediction). The DEMETER system comprises seven global atmosphere–ocean coupled models, each running from an ensemble of initial conditions. Comprehensive hindcast evaluation demonstrates the enhanced reliability and skill of the multimodel ensemble over a more conventional single-model ensemble approach. In addition, innovative examples of the application of seasonal ensemble forecasts to malaria and crop yield prediction are discussed. The strategy followed in DEMETER addresses important problems such as communication across disciplines, downscaling of climate simulations, and the use of probabilistic forecast information in the applications sector, illustrating the economic value of seasonal-to-interannual prediction for society as a whole.


2007 · Vol 135 (1) · pp. 118–124
Author(s): Andreas P. Weigel, Mark A. Liniger, Christof Appenzeller

Abstract The Brier skill score (BSS) and the ranked probability skill score (RPSS) are widely used measures to describe the quality of categorical probabilistic forecasts. They quantify the extent to which a forecast strategy improves predictions with respect to a (usually climatological) reference forecast. The BSS can thereby be regarded as the special case of an RPSS with two forecast categories. From the work of Müller et al., it is known that the RPSS is negatively biased for ensemble prediction systems with small ensemble sizes, and that a debiased version, the RPSS_D, can be obtained quasi-empirically by random resampling from the reference forecast. In this paper, an analytical formula is derived to directly calculate the RPSS bias correction for any ensemble size and combination of probability categories, thus allowing an easy implementation of the RPSS_D. The correction term itself is identified as the “intrinsic unreliability” of the ensemble prediction system. The performance of this new formulation of the RPSS_D is illustrated in two examples. First, it is applied to a synthetic random white noise climate, and then, using the ECMWF Seasonal Forecast System 2, to seasonal predictions of near-surface temperature in several regions of different predictability. In both examples, the skill score is independent of the ensemble size, while the associated confidence thresholds decrease as the number of ensemble members and forecast/observation pairs increases.
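
A sketch of the debiasing construction. The correction term D below, the sampling variance of the cumulative category frequencies of an M-member ensemble drawn from climatology, is our reading of the "intrinsic unreliability" and should be checked against the paper's analytical formula; the tercile forecasts are invented:

```python
import numpy as np

def rps(p_fcst, obs_cat):
    """Standard (quadratic) ranked probability score."""
    p_obs = np.zeros_like(p_fcst)
    p_obs[obs_cat] = 1.0
    return np.sum((np.cumsum(p_fcst) - np.cumsum(p_obs)) ** 2)

def rpss_debiased(fcst_probs, obs_cats, clim_probs, n_members):
    """Debiased RPSS (a sketch of the idea, not a verbatim transcription
    of the paper's formula): add to the reference score the sampling
    variance D of an M-member ensemble drawn from climatology."""
    rps_fcst = np.mean([rps(p, o) for p, o in zip(fcst_probs, obs_cats)])
    rps_clim = np.mean([rps(clim_probs, o) for o in obs_cats])
    P = np.cumsum(clim_probs)[:-1]           # cumulative category thresholds
    D = np.sum(P * (1 - P)) / n_members      # assumed "intrinsic unreliability"
    return 1.0 - rps_fcst / (rps_clim + D)

# Toy usage: tercile forecasts from a 9-member system, invented numbers
clim = np.array([1/3, 1/3, 1/3])
fcsts = [np.array([2/9, 3/9, 4/9]), np.array([5/9, 3/9, 1/9])]
obs = [2, 0]
print(f"RPSS_D = {rpss_debiased(fcsts, obs, clim, n_members=9):.3f}")
```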

