Sampling distributions of the Brier score and Brier skill score under serial dependence

2010 ◽  
Vol 136 (653) ◽  
pp. 2109-2118 ◽  
Author(s):  
D.S. Wilks

2008 ◽  
Vol 23 (5) ◽  
pp. 992-1006 ◽  
Author(s):  
A. Allen Bradley ◽  
Stuart S. Schwartz ◽  
Tempei Hashino

Abstract For probability forecasts, the Brier score and Brier skill score are commonly used verification measures of forecast accuracy and skill. Using sampling theory, analytical expressions are derived to estimate their sampling uncertainties. The Brier score is an unbiased estimator of the accuracy, and an exact expression defines its sampling variance. The Brier skill score (with climatology as a reference forecast) is a biased estimator, and approximations are needed to estimate its bias and sampling variance. The uncertainty estimators depend only on the moments of the forecasts and observations, so it is easy to routinely compute them at the same time as the Brier score and skill score. The resulting uncertainty estimates can be used to construct error bars or confidence intervals for the verification measures, or perform hypothesis testing. Monte Carlo experiments using synthetic forecasting examples illustrate the performance of the expressions. In general, the estimates provide very reliable information on uncertainty. However, the quality of an estimate depends on both the sample size and the occurrence frequency of the forecast event. The examples also illustrate that with infrequently occurring events, verification sample sizes of a few hundred forecast–observation pairs are needed to establish that a forecast is skillful because of the large uncertainties that exist.
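A minimal sketch of the quantities involved, assuming independent forecast–observation pairs: the Brier score, the Brier skill score with the sample climatology as reference, and a naive large-sample standard error for the Brier score. This is illustrative only and does not reproduce the analytical variance and bias expressions derived in the paper.

```python
# Illustrative sketch only: a generic large-sample approximation, not the
# paper's exact analytical expressions for sampling variance and bias.
import numpy as np

def brier_score(p, o):
    """Brier score for probability forecasts p and binary outcomes o."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_skill_score(p, o):
    """Brier skill score with the sample climatology as the reference forecast."""
    o = np.asarray(o, float)
    obar = o.mean()
    bs_clim = np.mean((obar - o) ** 2)   # equals obar * (1 - obar)
    return 1.0 - brier_score(p, o) / bs_clim

def brier_score_stderr(p, o):
    """Naive standard error of the Brier score, treating the squared errors as
    independent draws (ignores serial dependence and the skill-score bias)."""
    sq_err = (np.asarray(p, float) - np.asarray(o, float)) ** 2
    return sq_err.std(ddof=1) / np.sqrt(len(sq_err))

# Example: 500 synthetic forecast-observation pairs for an infrequent event
rng = np.random.default_rng(0)
p = rng.beta(1, 9, size=500)             # forecast probabilities, mean ~0.1
o = (rng.random(500) < p).astype(float)  # outcomes consistent with the forecasts
bs, se = brier_score(p, o), brier_score_stderr(p, o)
print(f"BS = {bs:.4f} +/- {1.96 * se:.4f} (95% CI), BSS = {brier_skill_score(p, o):.3f}")
```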


2012 ◽  
Vol 8 (2) ◽  
pp. 953-986 ◽  
Author(s):  
B. Kurnik ◽  
L. Kajfež-Bogataj ◽  
A. Ceglar

Abstract. We corrected monthly precipitation from eight regional climate models using statistical bias correction. All models were corrected against observations, and the bias-correction parameters were obtained separately for each model in every grid cell over the European domain, using data from 1961 to 1990. The bias correction was validated for the period 1991–2010 using the RMSE, the Brier score, and the Brier skill score. The results are encouraging, as both the mean and the extremes were corrected effectively. After applying the correction, large biases over the Alps, along the eastern Adriatic coast, on the west coast of Norway, and at the eastern edge of the domain were removed. The RMSE of the corrected precipitation was lower than that of the simulated precipitation over 85% of the European area, and the correction failed for all models over only 1.5% of the area. Extremes were also corrected effectively: according to the Brier skill score, the probability of dry months was corrected over more than 52% of the European area, and heavy precipitation events were corrected over almost 90% of the area. All validation measures suggest that the correction of monthly precipitation was successful, and we therefore argue that the corrected precipitation fields will improve the results of climate impact models.
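The abstract does not specify the form of the correction, so as an illustration only, here is a simple empirical quantile-mapping correction fitted per grid cell on a reference period and applied to a validation period; all data and variable names are hypothetical stand-ins.

```python
# Hypothetical sketch of a simple statistical bias correction (empirical
# quantile mapping) for one grid cell; the paper's actual transfer functions
# and fitted parameters may differ.
import numpy as np

def fit_quantile_map(model_ref, obs_ref, n_q=50):
    """Fit an empirical quantile map from model to observed precipitation
    using a common reference period (e.g. 1961-1990)."""
    q = np.linspace(0.01, 0.99, n_q)
    return np.quantile(model_ref, q), np.quantile(obs_ref, q)

def apply_quantile_map(model_new, model_q, obs_q):
    """Map each simulated value onto the observed distribution."""
    return np.interp(model_new, model_q, obs_q)

# Synthetic monthly precipitation for one grid cell (mm/month)
rng = np.random.default_rng(1)
obs_ref   = rng.gamma(2.0, 40.0, 360)   # "observed" reference period
model_ref = rng.gamma(2.0, 55.0, 360)   # wet-biased model, same period
model_val = rng.gamma(2.0, 55.0, 240)   # validation period, e.g. 1991-2010

mq, oq = fit_quantile_map(model_ref, obs_ref)
corrected = apply_quantile_map(model_val, mq, oq)
print(f"raw bias: {model_val.mean() - obs_ref.mean():+.1f} mm/month, "
      f"corrected bias: {corrected.mean() - obs_ref.mean():+.1f} mm/month")
```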


2007 ◽  
Vol 22 (6) ◽  
pp. 1287-1303 ◽  
Author(s):  
Huiling Yuan ◽  
Xiaogang Gao ◽  
Steven L. Mullen ◽  
Soroosh Sorooshian ◽  
Jun Du ◽  
...  

Abstract A feed-forward neural network is configured to calibrate the bias of a high-resolution probabilistic quantitative precipitation forecast (PQPF) produced by a 12-km version of the NCEP Regional Spectral Model (RSM) ensemble forecast system. Twice-daily forecasts during the 2002–2003 cool season (1 November–31 March, inclusive) are run over four U.S. Geological Survey (USGS) hydrologic unit regions of the southwest United States. Calibration is performed via a cross-validation procedure, where four months are used for training and the excluded month is used for testing. The PQPFs before and after the calibration over a hydrological unit region are evaluated by comparing the joint probability distribution of forecasts and observations. Verification is performed on the 4-km stage IV grid, which is used as “truth.” The calibration procedure improves the Brier score (BrS), conditional bias (reliability) and forecast skill, such as the Brier skill score (BrSS) and the ranked probability skill score (RPSS), relative to the sample frequency for all geographic regions and most precipitation thresholds. However, the procedure degrades the resolution of the PQPFs by systematically producing more forecasts with low nonzero forecast probabilities that drive the forecast distribution closer to the climatology of the training sample. The problem of degrading the resolution is most severe over the Colorado River basin and the Great Basin for relatively high precipitation thresholds where the sample of observed events is relatively small.
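A minimal, hypothetical sketch of the general idea: a small feed-forward network (here scikit-learn's MLPClassifier) maps raw ensemble exceedance probabilities to calibrated probabilities, with a leave-one-month-out split standing in for the paper's cross-validation. The predictors, network architecture, and data are illustrative stand-ins, not the configuration used for the RSM ensemble.

```python
# Minimal illustrative sketch: calibrating raw ensemble exceedance
# probabilities with a small feed-forward network and leave-one-month-out
# cross-validation. Not the paper's actual configuration.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(2)
n, members = 4000, 20
month = rng.integers(0, 5, size=n)               # 5 cool-season months
truth_p = rng.beta(1.5, 6.0, size=n)             # latent event probability
obs = (rng.random(n) < truth_p).astype(int)      # observed exceedance ("truth")
# Raw ensemble relative frequency, deliberately over-forecast (biased)
raw_prob = rng.binomial(members, np.clip(truth_p * 1.4, 0, 1)) / members

X = np.column_stack([raw_prob])
bs_raw, bs_cal = [], []
for m in range(5):                                # leave-one-month-out split
    train, test = month != m, month == m
    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    net.fit(X[train], obs[train])
    cal_prob = net.predict_proba(X[test])[:, 1]
    bs_raw.append(brier_score_loss(obs[test], raw_prob[test]))
    bs_cal.append(brier_score_loss(obs[test], cal_prob))
print(f"mean Brier score  raw: {np.mean(bs_raw):.4f}  calibrated: {np.mean(bs_cal):.4f}")
```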


2009 ◽  
Vol 10 (3) ◽  
pp. 807-819 ◽  
Author(s):  
F. Pappenberger ◽  
A. Ghelli ◽  
R. Buizza ◽  
K. Bódis

Abstract A methodology for evaluating ensemble forecasts, taking into account observational uncertainties for catchment-based precipitation averages, is introduced. Probability distributions for mean catchment precipitation are derived with the Generalized Likelihood Uncertainty Estimation (GLUE) method. The observation uncertainty includes errors in the measurements, uncertainty as a result of the inhomogeneities in the rain gauge network, and representativeness errors introduced by the interpolation methods. The closeness of the forecast probability distribution to the observed fields is measured using the Brier skill score, rank histograms, relative entropy, and the ratio between the ensemble spread and the error of the ensemble-median forecast (spread–error ratio). Four different methods have been used to interpolate observations on the catchment regions. Results from a 43-day period (20 July–31 August 2002) show little sensitivity to the interpolation method used. The rank histograms and the relative entropy better show the effect of introducing observation uncertainty, although this effect on the Brier skill score and the spread–error ratio is not very large. The case study indicates that overall observation uncertainty should be taken into account when evaluating forecast skill.
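For illustration, a sketch of two of the diagnostics mentioned above, the rank histogram and the spread–error ratio, applied to synthetic catchment-mean precipitation; the GLUE-based treatment of observation uncertainty is not reproduced here.

```python
# Sketch of a rank histogram and spread-error ratio for an ensemble of
# catchment-mean precipitation; synthetic data, observation uncertainty
# reduced to a simple additive perturbation.
import numpy as np

def rank_histogram(ens, obs):
    """Count of observation ranks within the sorted ensemble (0..n_members)."""
    ranks = (ens < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

def spread_error_ratio(ens, obs):
    """Ratio of mean ensemble spread (std) to RMSE of the ensemble median."""
    spread = ens.std(axis=1, ddof=1).mean()
    rmse = np.sqrt(np.mean((np.median(ens, axis=1) - obs) ** 2))
    return spread / rmse

# Synthetic catchment-mean precipitation: 43 days x 51 ensemble members
rng = np.random.default_rng(3)
truth = rng.gamma(2.0, 3.0, size=43)
ens = truth[:, None] + rng.normal(0, 2.0, size=(43, 51))   # well-dispersed ensemble
obs = truth + rng.normal(0, 0.5, size=43)                   # "observed" values with error
print("rank histogram:", rank_histogram(ens, obs))
print(f"spread-error ratio: {spread_error_ratio(ens, obs):.2f}")
```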


2008 ◽  
Vol 136 (4) ◽  
pp. 1505-1510 ◽  
Author(s):  
Ian T. Jolliffe ◽  
David B. Stephenson

Abstract Verification is an important part of any forecasting system. It is usually achieved by computing the value of some measure or score that indicates how good the forecasts are. Many possible verification measures have been proposed, and to choose between them a number of desirable properties have been defined. For probability forecasts of a binary event, two of the best known of these properties are propriety and equitability. A proof that the two properties are incompatible for a wide class of verification measures is given in this paper, after briefly reviewing the two properties and some recent attempts to improve the properties of the well-known Brier skill score.
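As a reminder of what propriety means in the binary case, the standard textbook argument that the Brier score is strictly proper is sketched below; this is background only, not the paper's incompatibility proof.

```latex
% Background: for a binary event O (O = 1 with true probability q) and issued
% probability p, the expected Brier score is
\[
  \mathbb{E}_q\bigl[(p - O)^2\bigr] \;=\; q\,(p-1)^2 + (1-q)\,p^2
  \;=\; (p-q)^2 + q(1-q),
\]
% which is uniquely minimized at p = q, so the Brier score is strictly proper:
% the forecaster's best strategy is to issue the true event probability.
```

Equitability instead constrains the expected scores of unskilled (constant or purely random) forecasts, and the paper shows that the two requirements cannot both hold for a wide class of measures.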


2015 ◽  
Vol 143 (2) ◽  
pp. 471-490 ◽  
Author(s):  
Paul J. Roebber

Abstract An ensemble forecast method using evolutionary programming, including various forms of genetic exchange, disease, mutation, and the training of solutions within ecological niches, is presented. A 2344-member ensemble generated in this way is tested for 60-h minimum temperature forecasts for Chicago, Illinois. The ensemble forecasts are superior in both ensemble average root-mean-square error and Brier skill score to those obtained from a 21-member operational ensemble model output statistics (MOS) forecast. While both ensembles are underdispersive, spread calibration produces greater gains in probabilistic skill for the evolutionary program ensemble than for the MOS ensemble. When a Bayesian model combination calibration is used, the skill advantage for the evolutionary program ensemble relative to the MOS ensemble increases for root-mean-square error, but decreases for Brier skill score. Further improvement in root-mean-square error is obtained when the raw evolutionary program and MOS forecasts are pooled, and a new Bayesian model combination ensemble is produced. Future extensions to the method are discussed, including those capable of producing more complex forms, those involving 1000-fold increases in training populations, and adaptive methods.
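A toy sketch of evolutionary programming in this spirit, under purely hypothetical assumptions: a population of linear correction rules for a minimum-temperature forecast is evolved by Gaussian mutation and truncation selection on training RMSE. It omits the genetic exchange, disease, and ecological-niche mechanisms of the actual method.

```python
# Toy evolutionary-programming sketch: evolve simple linear correction rules
# for a temperature forecast by mutation and selection on training RMSE.
# Far simpler than the paper's algorithm (no genetic exchange, disease,
# or ecological niches).
import numpy as np

rng = np.random.default_rng(4)
n_train, n_pop, n_gen = 400, 100, 200

# Synthetic training data: raw model forecast with a state-dependent bias
raw = rng.normal(0.0, 8.0, n_train)                      # raw 60-h T_min forecast (deg C)
obs = 0.8 * raw - 2.0 + rng.normal(0.0, 1.5, n_train)    # verifying observation

pop = rng.normal(0.0, 1.0, size=(n_pop, 2))              # each member: (slope, intercept)

def rmse(member):
    pred = member[0] * raw + member[1]
    return np.sqrt(np.mean((pred - obs) ** 2))

for _ in range(n_gen):
    # Each parent produces one mutated offspring; keep the better half
    offspring = pop + rng.normal(0.0, 0.1, size=pop.shape)
    combined = np.vstack([pop, offspring])
    fitness = np.array([rmse(m) for m in combined])
    pop = combined[np.argsort(fitness)[:n_pop]]

best = pop[0]
print(f"evolved rule: T_cal = {best[0]:.2f} * T_raw {best[1]:+.2f}, "
      f"training RMSE = {rmse(best):.2f} deg C")
```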


2010 ◽  
Vol 138 (9) ◽  
pp. 3387-3399 ◽  
Author(s):  
Steven V. Weijs ◽  
Ronald van Nooijen ◽  
Nick van de Giesen

Abstract This paper presents a score that can be used for evaluating probabilistic forecasts of multicategory events. The score is a reinterpretation of the logarithmic score or ignorance score, now formulated as the relative entropy or Kullback–Leibler divergence of the forecast distribution from the observation distribution. Using the information-theoretical concepts of entropy and relative entropy, a decomposition into three components is presented, analogous to the classic decomposition of the Brier score. The information-theoretical twins of the components uncertainty, resolution, and reliability provide diagnostic information about the quality of forecasts. The overall score measures the information conveyed by the forecast. As was shown recently, information theory provides a sound framework for forecast verification. The analogous decomposition of the Brier score has proven very useful and is widely used; the new decomposition can likewise help acceptance of the logarithmic score in meteorology.
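A sketch of the binary-event special case, assuming forecasts take a small set of distinct values: the ignorance (logarithmic) score decomposed into uncertainty, resolution, and reliability terms built from entropy and Kullback–Leibler divergence. The paper treats the general multicategory case; this only illustrates the structure of the decomposition.

```python
# Sketch of the logarithmic (ignorance) score for binary forecasts and its
# uncertainty / resolution / reliability decomposition, analogous to the
# classic Brier-score decomposition. Simplified to a binary event with a
# small set of distinct forecast values.
import numpy as np

def entropy(p):
    """Binary entropy in bits."""
    p = np.clip(np.asarray(p, float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def kl(p, q):
    """Binary Kullback-Leibler divergence D(p||q) in bits."""
    p = np.clip(np.asarray(p, float), 1e-12, 1 - 1e-12)
    q = np.clip(np.asarray(q, float), 1e-12, 1 - 1e-12)
    return p * np.log2(p / q) + (1 - p) * np.log2((1 - p) / (1 - q))

def divergence_score_decomposition(f, o):
    """Return (score, uncertainty, resolution, reliability) with
    score = uncertainty - resolution + reliability."""
    f, o = np.asarray(f, float), np.asarray(o, float)
    n, obar = len(o), o.mean()
    unc, res, rel = entropy(obar), 0.0, 0.0
    for fk in np.unique(f):                    # group forecasts with equal value
        idx = f == fk
        ok = o[idx].mean()                     # conditional observed frequency
        res += idx.sum() / n * kl(ok, obar)
        rel += idx.sum() / n * kl(ok, fk)
    return unc - res + rel, unc, res, rel

# Example: forecasts restricted to a few probability values
rng = np.random.default_rng(5)
f = rng.choice([0.1, 0.3, 0.5, 0.8], size=2000)
o = (rng.random(2000) < f).astype(float)       # calibrated outcomes -> small reliability term
ds, unc, res, rel = divergence_score_decomposition(f, o)
print(f"DS = {ds:.3f} = UNC {unc:.3f} - RES {res:.3f} + REL {rel:.3f} bits")
```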

