The Discrete Brier and Ranked Probability Skill Scores

2007 ◽  
Vol 135 (1) ◽  
pp. 118-124 ◽  
Author(s):  
Andreas P. Weigel ◽  
Mark A. Liniger ◽  
Christof Appenzeller

Abstract The Brier skill score (BSS) and the ranked probability skill score (RPSS) are widely used measures to describe the quality of categorical probabilistic forecasts. They quantify the extent to which a forecast strategy improves predictions with respect to a (usually climatological) reference forecast. The BSS can thereby be regarded as the special case of an RPSS with two forecast categories. From the work of Müller et al., it is known that the RPSS is negatively biased for ensemble prediction systems with small ensemble sizes, and that a debiased version, the RPSS_D, can be obtained quasi empirically by random resampling from the reference forecast. In this paper, an analytical formula is derived to directly calculate the RPSS bias correction for any ensemble size and combination of probability categories, thus allowing an easy implementation of the RPSS_D. The correction term itself is identified as the “intrinsic unreliability” of the ensemble prediction system. The performance of this new formulation of the RPSS_D is illustrated in two examples. First, it is applied to a synthetic random white noise climate, and then, using the ECMWF Seasonal Forecast System 2, to seasonal predictions of near-surface temperature in several regions of different predictability. In both examples, the skill score is independent of ensemble size while the associated confidence thresholds decrease as the number of ensemble members and forecast/observation pairs increase.
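The abstract describes, but does not reproduce, the analytical bias correction. The sketch below is a minimal illustration of how a conventional and a debiased ranked probability skill score might be computed from categorical ensemble counts; it assumes the correction term takes the form D = (1/M) Σ_k P_k(1 − P_k), where P_k are the cumulative climatological category probabilities and M is the ensemble size, which is one reading of the paper's result rather than a verbatim implementation. Function names and data layout are illustrative.

```python
import numpy as np

def rps(cum_fcst, cum_obs):
    """Ranked probability score: squared distance between cumulative
    forecast and observed category probabilities."""
    return np.sum((cum_fcst - cum_obs) ** 2)

def rpss_debiased(ens_counts, obs_cat, clim_probs):
    """Sketch of a conventional and a debiased RPSS (assumed correction, see above).

    ens_counts : (n_forecasts, n_categories) ensemble member counts per category
    obs_cat    : (n_forecasts,) index of the observed category
    clim_probs : (n_categories,) climatological category probabilities
    """
    ens_counts = np.asarray(ens_counts, dtype=float)
    n_fcst, n_cat = ens_counts.shape
    m = ens_counts[0].sum()                      # ensemble size M
    cum_clim = np.cumsum(clim_probs)

    rps_fcst = rps_clim = 0.0
    for i in range(n_fcst):
        cum_f = np.cumsum(ens_counts[i] / m)     # forecast CDF over categories
        cum_o = np.cumsum(np.eye(n_cat)[obs_cat[i]])  # observed step-function CDF
        rps_fcst += rps(cum_f, cum_o)
        rps_clim += rps(cum_clim, cum_o)
    rps_fcst /= n_fcst
    rps_clim /= n_fcst

    # Assumed form of the "intrinsic unreliability" correction term.
    d = np.sum(cum_clim * (1.0 - cum_clim)) / m

    return 1.0 - rps_fcst / rps_clim, 1.0 - rps_fcst / (rps_clim + d)
```

For tercile forecasts, `clim_probs` would be `[1/3, 1/3, 1/3]` and each row of `ens_counts` the number of members falling into the three categories.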

2019 ◽  
Vol 32 (3) ◽  
pp. 957-972 ◽  
Author(s):  
Takeshi Doi ◽  
Swadhin K. Behera ◽  
Toshio Yamagata

This paper explores the merits of 100-member ensemble simulations from a single dynamical seasonal prediction system by evaluating differences in skill scores between ensemble predictions with few (~10) and many (~100) members. Such a 100-member retrospective seasonal forecast experiment for 1983–2015 is beyond current operational capability. Prediction of extremely strong El Niño–Southern Oscillation (ENSO) and Indian Ocean dipole (IOD) events is significantly improved in the larger ensemble. This indicates that an ensemble size of 10 members, as used in some operational systems, is not adequate for capturing the occurrence of extreme climate events in the 15% tails of the distribution, because only about 1 or 2 members (approximately 15% of 12) will agree with the observations. We also show that an ensemble size of about 50 members may be adequate for predicting extreme El Niño and positive IOD events, at least in the present prediction system. Although running a large-ensemble prediction system is costly, the improved prediction of disastrous extreme events is valuable for minimizing the risk of human and economic losses.
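The ensemble-size argument can be illustrated with a simple binomial calculation (not taken from the paper): for a 15% tail event, both the expected number of members sampling the event and the chance that several members do so grow with ensemble size.

```python
from scipy.stats import binom

p_tail = 0.15                       # climatological probability of the 15% tail event
for n_members in (10, 50, 100):
    expected = n_members * p_tail   # expected number of members falling in the tail
    # Chance that at least 5 members signal the event (an arbitrary illustrative count).
    p_at_least_5 = binom.sf(4, n_members, p_tail)
    print(f"{n_members:3d} members: {expected:4.1f} expected in tail, "
          f"P(>=5 members) = {p_at_least_5:.2f}")
```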


2014 ◽  
Vol 21 (1) ◽  
pp. 19-39 ◽  
Author(s):  
L. H. Baker ◽  
A. C. Rudd ◽  
S. Migliorini ◽  
R. N. Bannister

Abstract. In this paper ensembles of forecasts (of up to six hours) are studied from a convection-permitting model with a representation of model error due to unresolved processes. The ensemble prediction system (EPS) used is an experimental convection-permitting version of the UK Met Office's 24-member Global and Regional Ensemble Prediction System (MOGREPS). The method of representing model error variability, which perturbs parameters within the model's parameterisation schemes, has been modified and we investigate the impact of applying this scheme in different ways. These are: a control ensemble where all ensemble members have the same parameter values; an ensemble where the parameters are different between members, but fixed in time; and ensembles where the parameters are updated randomly every 30 or 60 min. The choice of parameters and their ranges of variability have been determined from expert opinion and parameter sensitivity tests. A case of frontal rain over the southern UK has been chosen, which has a multi-banded rainfall structure. The consequences of including model error variability in the case studied are mixed and are summarised as follows. The multiple banding, evident in the radar, is not captured for any single member. However, the single band is positioned in some members where a secondary band is present in the radar. This is found for all ensembles studied. Adding model error variability with fixed parameters in time does increase the ensemble spread for near-surface variables like wind and temperature, but can actually decrease the spread of the rainfall. Perturbing the parameters periodically throughout the forecast does not further increase the spread and exhibits "jumpiness" in the spread at times when the parameters are perturbed. Adding model error variability gives an improvement in forecast skill after the first 2–3 h of the forecast for near-surface temperature and relative humidity. For precipitation skill scores, adding model error variability has the effect of improving the skill in the first 1–2 h of the forecast, but then of reducing the skill after that. Complementary experiments were performed where the only difference between members was the set of parameter values (i.e. no initial condition variability). The resulting spread was found to be significantly less than the spread from initial condition variability alone.
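As a rough illustration of the perturbed-parameter configurations compared here (fixed-in-time versus periodically refreshed parameters), the sketch below draws parameter values uniformly from prescribed ranges; the parameter names, ranges, and refresh intervals are hypothetical and not those used in MOGREPS.

```python
import numpy as np

# Hypothetical parameters and plausible ranges (illustrative only).
PARAM_RANGES = {
    "entrainment_rate_factor": (0.5, 2.0),
    "cloud_ice_fallspeed_factor": (0.8, 1.2),
}

def draw_parameters(rng):
    """Draw one set of parameter values uniformly from their ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def parameter_schedule(member_seed, forecast_minutes, refresh_every=None):
    """Return a {time_minutes: parameters} schedule for one ensemble member.

    refresh_every=None     -> parameters fixed in time (drawn once per member)
    refresh_every=30 or 60 -> parameters redrawn every 30 or 60 minutes
    """
    rng = np.random.default_rng(member_seed)
    if refresh_every is None:
        return {0: draw_parameters(rng)}
    return {t: draw_parameters(rng)
            for t in range(0, forecast_minutes + 1, refresh_every)}

# Example: a 6-hour member with parameters refreshed every 60 minutes.
schedule = parameter_schedule(member_seed=3, forecast_minutes=360, refresh_every=60)
```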


2017 ◽  
Vol 145 (10) ◽  
pp. 3913-3928 ◽  
Author(s):  
N. Vigaud ◽  
A. W. Robertson ◽  
M. K. Tippett

Probabilistic forecasts of weekly and week 3–4 averages of precipitation are constructed using extended logistic regression (ELR) applied to three models (ECMWF, NCEP, and CMA) from the Subseasonal-to-Seasonal (S2S) project. Individual and multimodel ensemble (MME) forecasts are verified over the common period 1999–2010. The regression parameters are fitted separately at each grid point and lead time for the three ensemble prediction system (EPS) reforecasts with starts during January–March and July–September. The ELR produces tercile category probabilities for each model that are then averaged with equal weighting. The resulting MME forecasts are characterized by good reliability but low sharpness. A clear benefit of multimodel ensembling is to largely remove negative skill scores present in individual forecasts. The forecast skill of weekly averages is higher in winter than summer and decreases with lead time, with steep decreases after one and two weeks. Week 3–4 forecasts have more skill along the U.S. East Coast and the southwestern United States in winter, as well as over west/central U.S. regions and the Intra-Americas Sea/east Pacific during summer. Skill is also enhanced when the regression parameters are fit using spatially smoothed observations and forecasts. The skill of week 3–4 precipitation outlooks has a modest, but statistically significant, relation with ENSO and the MJO, particularly in winter over the southwestern United States.
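The sketch below illustrates the general extended-logistic-regression idea of fitting a single regression in which the category threshold enters as a predictor, so that consistent tercile probabilities follow from the cumulative probabilities; the toy data, the use of the ensemble mean as the sole predictor, and the library choice are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
ens_mean = rng.gamma(2.0, 2.0, n)              # toy ensemble-mean precipitation
obs = ens_mean + rng.normal(0.0, 2.0, n)       # toy verifying observations
terciles = np.quantile(obs, [1 / 3, 2 / 3])    # climatological tercile thresholds

# Extended logistic regression: one model, with the threshold q as an extra
# predictor, fitted to the binary events "obs <= q" stacked over both thresholds.
X = np.column_stack([np.repeat(ens_mean, 2), np.tile(terciles, n)])
y = (np.repeat(obs, 2) <= np.tile(terciles, n)).astype(int)
elr = LogisticRegression(max_iter=1000).fit(X, y)

# Cumulative probabilities P(obs <= q) for a new forecast, then tercile categories.
# With a positive fitted coefficient on q, the cumulative probabilities are
# non-decreasing in q, so the category probabilities stay non-negative.
x_new = np.array([[3.0, terciles[0]], [3.0, terciles[1]]])
p_low, p_low_mid = elr.predict_proba(x_new)[:, 1]
tercile_probs = np.array([p_low, p_low_mid - p_low, 1.0 - p_low_mid])
```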


2009 ◽  
Vol 137 (4) ◽  
pp. 1460-1479 ◽  
Author(s):  
Andreas P. Weigel ◽  
Mark A. Liniger ◽  
Christof Appenzeller

Abstract Multimodel ensemble combination (MMEC) has become an accepted technique to improve probabilistic forecasts from short- to long-range time scales. MMEC techniques typically widen ensemble spread, thus improving the dispersion characteristics and the reliability of the forecasts. This raises the question as to whether the same effect could be achieved in a potentially cheaper way by rescaling single-model ensemble forecasts a posteriori such that they become reliable. In this study a climate conserving recalibration (CCR) technique is derived and compared with MMEC. With a simple stochastic toy model it is shown that both CCR and MMEC successfully improve forecast reliability. The difference between these two methods is that CCR conserves resolution but inevitably dilutes the potentially predictable signal, while MMEC is in the ideal case able to fully retain the predictable signal and to improve resolution. Therefore, MMEC is conceptually to be preferred, particularly since the effect of CCR depends on the length of the data record and on distributional assumptions. In reality, however, multimodels consist of only a finite number of participating single models, and the model errors are often correlated. Under such conditions, and depending on the skill metric applied, CCR-corrected single models can on average have skill comparable to that of multimodel ensembles, particularly when the potential model predictability is low. Using seasonal near-surface temperature and precipitation forecasts of three models of the Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER) dataset, it is shown that the conclusions drawn from the toy-model experiments hold equally in a real multimodel ensemble prediction system. All in all, it is not possible to make a general statement on whether CCR or MMEC is the better method. Rather, it seems that optimum forecasts can be obtained by a combination of both methods, but only if MMEC is applied first and CCR second. The opposite order (first CCR, then MMEC) is shown to have little effect, at least in the context of seasonal forecasts.
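The abstract does not spell out the recalibration formulas. The sketch below is one plausible reading of a climate-conserving-style recalibration: the ensemble-mean signal is scaled by its correlation with observations and the spread is inflated so that the recalibrated climatological mean and variance match the observed ones. Treat the coefficients as assumptions rather than the paper's CCR derivation.

```python
import numpy as np

def ccr_like_recalibration(fcst, obs):
    """Sketch of a climate-conserving-style recalibration (assumed form, see above).

    fcst : (n_years, n_members) raw ensemble forecasts from a training period
    obs  : (n_years,) verifying observations from the same period
    """
    fm = fcst.mean(axis=1)                    # ensemble mean per year
    anom_sig = fm - fm.mean()                 # predictable-signal anomalies
    anom_spr = fcst - fm[:, None]             # member deviations (noise)
    rho = np.corrcoef(fm, obs)[0, 1]          # correlation skill of the ensemble mean
    s_obs = obs.std()

    # Scale the signal by rho and inflate the noise so that the recalibrated
    # climatological variance matches the observed one (rho^2 + (1 - rho^2) = 1).
    signal = rho * s_obs * anom_sig / anom_sig.std()
    noise = np.sqrt(max(1.0 - rho ** 2, 0.0)) * s_obs * anom_spr / anom_spr.std()
    return obs.mean() + signal[:, None] + noise
```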


2005 ◽  
Vol 18 (10) ◽  
pp. 1513-1523 ◽  
Author(s):  
W. A. Müller ◽  
C. Appenzeller ◽  
F. J. Doblas-Reyes ◽  
M. A. Liniger

Abstract The ranked probability skill score (RPSS) is a widely used measure to quantify the skill of ensemble forecasts. The underlying score is defined by the quadratic norm and is comparable to the mean squared error (mse), but it is applied in probability space. It is sensitive to the shape and the shift of the predicted probability distributions. However, the RPSS shows a negative bias for ensemble systems with small ensemble size, as recently shown. Here, two strategies are explored to tackle this flaw of the RPSS. First, the RPSS is examined for different norms L (RPSS_L). It is shown that the RPSS_L=1, based on the absolute rather than the squared difference between the forecast and observed cumulative probability distributions, is unbiased; RPSS_L scores defined with higher-order norms show a negative bias. However, the RPSS_L=1 is not strictly proper in a statistical sense. A second approach is then investigated, which is based on the quadratic norm but with sampling errors in the climatological probabilities considered in the reference forecasts. This technique is based on strictly proper scores and results in an unbiased skill score, denoted hereafter as the debiased ranked probability skill score (RPSS_D). Both newly defined skill scores are independent of the ensemble size, whereas the associated confidence intervals are a function of the ensemble size and the number of forecasts. The RPSS_L=1 and the RPSS_D are then applied to the winter mean [December–January–February (DJF)] near-surface temperature predictions of the ECMWF Seasonal Forecast System 2. The overall structures of the RPSS_L=1 and the RPSS_D are more consistent and largely independent of the ensemble size, unlike the RPSS_L=2. Furthermore, the minimum ensemble size required to predict a climate anomaly given a known signal-to-noise ratio is determined by employing the new skill scores. For a hypothetical setup comparable to the ECMWF hindcast system (40 members and 15 hindcast years), statistically significant skill scores were found only for a signal-to-noise ratio larger than ∼0.3.
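The negative bias of the quadratic-norm RPSS for small ensembles can be reproduced with a short Monte Carlo experiment in which forecasts and observations are drawn from the same climatology, so the true skill is zero; the configuration below is illustrative, not the paper's experiment. Replacing the squared difference with an absolute difference gives the L = 1 variant discussed in the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cat, n_fcst, n_trials = 3, 200, 500
clim = np.full(n_cat, 1.0 / n_cat)
cum_clim = np.cumsum(clim)

def mean_rps(cum_fcst, obs_cats):
    """Mean quadratic ranked probability score over all forecast/observation pairs."""
    cum_obs = np.cumsum(np.eye(n_cat)[obs_cats], axis=1)
    return np.mean(np.sum((cum_fcst - cum_obs) ** 2, axis=1))

for m in (5, 10, 50):                                   # ensemble size M
    rpss_vals = []
    for _ in range(n_trials):
        obs = rng.integers(0, n_cat, n_fcst)            # observations from climatology
        ens = rng.multinomial(m, clim, size=n_fcst) / m  # no-skill ensemble forecasts
        rps_f = mean_rps(np.cumsum(ens, axis=1), obs)
        rps_c = mean_rps(np.tile(cum_clim, (n_fcst, 1)), obs)
        rpss_vals.append(1.0 - rps_f / rps_c)
    print(f"M = {m:3d}: mean RPSS of a no-skill ensemble = {np.mean(rpss_vals):+.3f}")
```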


2021 ◽  
Author(s):  
Peter Schaumann ◽  
Reinhold Hess ◽  
Martin Rempel ◽  
Ulrich Blahak ◽  
Volker Schmidt

In this talk we present a new statistical method for the seamless combination of two different ensemble precipitation forecasts (nowcasting and NWP) using neural networks (NNs), see [1]. The method generates probabilistic forecasts for the exceedance of a set of predetermined thresholds (from 0.1 mm up to 5 mm). The aim of the combination model is to produce seamless and calibrated forecasts which outperform both input forecasts at all lead times and which are consistent across the considered thresholds. First, the hyper-parameters of the NNs are chosen by a hyper-parameter optimization algorithm (not to be confused with the training of the NNs itself) on a 3-month dataset (dataset A). Then, the resulting NNs are tested via a rolling-origin validation scheme on two 3-month datasets (datasets B and C) with different input forecasts each. Datasets A and B contain forecasts from DWD's RadVOR, a radar-based nowcasting system, and from Ensemble-MOS, a post-processing system (20 km horizontal resolution) for NWP ensemble forecasts from COSMO-DE-EPS, the predecessor of ICON-D2-EPS. Ensemble-MOS forecasts were provided for up to +6 h, while RadVOR forecasts were available up to +2 h. For dataset C, forecasts with a grid size of 2.2 km are used from STEPS-DWD, a new DWD implementation of the Short-Term Ensemble Prediction System (STEPS), and from ICON-D2-EPS as the NWP ensemble system; these forecasts were made up to +6 h. In both validation datasets (B and C), the forecasts show the well-known behavior that the nowcasting systems RadVOR and STEPS are superior at short lead times, while the NWP-based forecasts (Ensemble-MOS and ICON-D2-EPS) outperform them at later lead times. Based on the comparison of several validation scores (bias, Brier skill score, reliability, and reliability diagrams), we can show that the combination is indeed calibrated, consistent, and outperforms both input forecasts at all lead times. It should be noted that the combination works on dataset C, although the hyper-parameters were chosen based on dataset A, which contains different input forecasts at a different grid size.

[1] P. Schaumann, R. Hess, M. Rempel, U. Blahak, and V. Schmidt: A calibrated and consistent combination of probabilistic forecasts for the exceedance of several precipitation thresholds using neural networks. Weather and Forecasting (in press).
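As an illustration only, the toy model below combines the two sets of input exceedance probabilities with a small neural network whose outputs are non-increasing across thresholds by construction (a cumulative product of per-threshold factors), which is one simple way to obtain the kind of threshold consistency described above. The architecture, the threshold set, and the monotonicity device are assumptions, not the published method of [1].

```python
import torch
import torch.nn as nn

THRESHOLDS = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]   # mm; illustrative threshold set

class SeamlessCombiner(nn.Module):
    """Toy combiner of nowcast and NWP exceedance probabilities.

    Exceedance probabilities are forced to be non-increasing with the threshold
    by taking a cumulative product of per-threshold factors in (0, 1).
    """

    def __init__(self, n_thresh=len(THRESHOLDS), hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * n_thresh + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_thresh),
        )

    def forward(self, p_nowcast, p_nwp, lead_time_hours):
        x = torch.cat([p_nowcast, p_nwp, lead_time_hours], dim=-1)
        return torch.cumprod(torch.sigmoid(self.net(x)), dim=-1)

# Training against observed 0/1 exceedance indicators could use, for example,
#   loss = nn.functional.binary_cross_entropy(model(p_now, p_nwp, lead), obs_exceed)
```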


2010 ◽  
Vol 25 (1) ◽  
pp. 303-322 ◽  
Author(s):  
Binbin Zhou ◽  
Jun Du

Abstract A new multivariable-based diagnostic fog-forecasting method has been developed at NCEP. The selection of these variables, their thresholds, and the influences on fog forecasting are discussed. With the inclusion of the algorithm in the model postprocessor, the fog forecast can now be provided centrally as direct NWP model guidance. The method can be easily adapted to other NWP models. Currently, knowledge of how well fog forecasts based on operational NWP models perform is lacking. To verify the new method and assess fog forecast skill, as well as to account for forecast uncertainty, this fog-forecasting algorithm is applied to a multimodel-based Mesoscale Ensemble Prediction System (MEPS). MEPS consists of 10 members using two regional models [the NCEP Nonhydrostatic Mesoscale Model (NMM) version of the Weather Research and Forecasting (WRF) model and the NCAR Advanced Research version of WRF (ARW)] with 15-km horizontal resolution. Each model has five members (one control and four perturbed members) using the breeding technique to perturb the initial conditions and was run once per day out to 36 h over eastern China for seven months (February–September 2008). Both deterministic and probabilistic forecasts were produced based on individual members, a one-model ensemble, and two-model ensembles. A case study and statistical verification, using both deterministic and probabilistic measuring scores, were performed against fog observations from 13 cities in eastern China. The verification was focused on the 12- and 36-h forecasts. By applying the various approaches, including the new fog detection scheme, ensemble technique, multimodel approach, and the increase in ensemble size, the fog forecast accuracy was steadily and dramatically improved in each of the approaches: from basically no skill at all [equitable threat score (ETS) = 0.063] to a skill level equivalent to that of warm-season precipitation forecasts of the current NWP models (0.334). Specifically, 1) the multivariable-based fog diagnostic method has a much higher detection capability than the liquid water content (LWC)-only based approach. Reasons why the multivariable approach works better than the LWC-only method were also illustrated. 2) The ensemble-based forecasts are, in general, superior to a single control forecast measured both deterministically and probabilistically. The case study also demonstrates that the ensemble approach could provide more societal value than a single forecast to end users, especially for low-probability significant events like fog. Deterministically, a forecast close to the ensemble median is particularly helpful. 3) The reliability of probabilistic forecasts can be effectively improved by using a multimodel ensemble instead of a single-model ensemble. For a small ensemble such as the one in this study, the increase in ensemble size is also important in improving probabilistic forecasts, although this effect is expected to decrease with the increase in ensemble size.
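For reference, the equitable threat score quoted above is computed from the 2 × 2 contingency table of event forecasts and observations; the implementation below is the standard definition, not code from the study, and the example counts are invented.

```python
def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """Equitable threat score (Gilbert skill score) from a 2x2 contingency table."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)

# Example with invented counts for a fog forecast verified at station locations.
ets = equitable_threat_score(hits=30, misses=25, false_alarms=40, correct_negatives=905)
```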


Author(s):  
Xubin Zhang

Abstract This study examines the case dependence of the multiscale characteristics of initial condition (IC) and model physics (MO) perturbations and their interactions in a convection-permitting ensemble prediction system (CPEPS), focusing on 12-h forecasts of precipitation perturbation energy. The case dependence of the forecast performance of various ensemble configurations is also examined to provide guidance for CPEPS design. Heavy-rainfall cases over southern China during the Southern China Monsoon Rainfall Experiment (SCMREX) in May 2014 were separated into strongly and weakly forced events in terms of synoptic-scale forcing, with 10 cases in each group. In the weakly forced cases, MO perturbations had a larger influence, while the enhancement of convective activity relative to the control member by the IC perturbations was less evident, leading to a smaller reduction in dispersion when MO perturbations were added to the IC perturbations. This dispersion reduction was more sensitive to the IC perturbation method in the weakly forced cases and to the MO perturbation method in the strongly forced cases. The dispersion reduction improved the probabilistic forecasts of precipitation, with more evident improvements in the weakly forced cases. To realize these benefits, CPEPS design should carefully account for the case dependence of dispersion reduction, especially its differing sensitivity to the different-source perturbation methods across cases.
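As an illustration of the kind of diagnostic discussed, one generic definition of precipitation perturbation energy is the domain-mean squared difference between each perturbed member and the control, averaged over members; this definition and the array layout are assumptions, not necessarily the paper's.

```python
import numpy as np

def precip_perturbation_energy(members, control):
    """Domain-mean squared precipitation difference, averaged over members.

    members : (n_members, ny, nx) precipitation from perturbed members
    control : (ny, nx) precipitation from the control member
    """
    return float(np.mean((members - control[None, :, :]) ** 2))

# Comparing this energy between an IC-only configuration and an IC+MO
# configuration gives one way to quantify the dispersion change discussed above.
```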


2011 ◽  
Vol 26 (5) ◽  
pp. 664-676 ◽  
Author(s):  
Thierry Dupont ◽  
Matthieu Plu ◽  
Philippe Caroff ◽  
Ghislain Faure

Abstract Several tropical cyclone forecasting centers issue uncertainty information alongside their official track forecasts, generally based on the climatological distribution of position errors. However, such methods cannot convey uncertainty information that depends on the particular situation. The purpose of the present study is to assess the skill of the Ensemble Prediction System (EPS) from the European Centre for Medium-Range Weather Forecasts (ECMWF) at measuring the uncertainty of up to 3-day track forecasts issued by the Regional Specialized Meteorological Centre (RSMC) La Réunion in the southwestern Indian Ocean. The dispersion of cyclone positions in the EPS is extracted and translated to the RSMC forecast position. The verification relies on existing methods for probabilistic forecasts, adapted here to a cyclone-position metric. First, the probability distribution of forecast positions is compared to the climatological distribution using Brier scores. The probabilistic forecasts have better scores than climatology, particularly after applying a simple calibration scheme. Second, uncertainty circles are built by fixing the probability at 75%. Their skill at detecting small and large error values is assessed. The circles have some skill for large errors up to the 3-day forecast (and perhaps beyond), but the detection of small errors is skillful only up to 2-day forecasts. The applied methodology may be used to assess and compare the skill of different probabilistic forecasting systems for cyclone position.
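The uncertainty-circle construction can be sketched as a percentile radius: recentre the EPS member positions on the official forecast position and take the 75th percentile of the great-circle distances to it. The recentring choice and the haversine helper below are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance via the haversine formula (inputs in degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def uncertainty_circle_radius(member_lats, member_lons, rsmc_lat, rsmc_lon, prob=0.75):
    """Radius around the official (RSMC) forecast position containing `prob`
    of the EPS member positions, after recentring the member cloud on it."""
    lat_c = member_lats - np.mean(member_lats) + rsmc_lat
    lon_c = member_lons - np.mean(member_lons) + rsmc_lon
    dists = great_circle_km(lat_c, lon_c, rsmc_lat, rsmc_lon)
    return float(np.percentile(dists, 100 * prob))
```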

