Sampling Uncertainty and Confidence Intervals for the Brier Score and Brier Skill Score

2008 ◽  
Vol 23 (5) ◽  
pp. 992-1006 ◽  
Author(s):  
A. Allen Bradley ◽  
Stuart S. Schwartz ◽  
Tempei Hashino

Abstract For probability forecasts, the Brier score and Brier skill score are commonly used verification measures of forecast accuracy and skill. Using sampling theory, analytical expressions are derived to estimate their sampling uncertainties. The Brier score is an unbiased estimator of the accuracy, and an exact expression defines its sampling variance. The Brier skill score (with climatology as a reference forecast) is a biased estimator, and approximations are needed to estimate its bias and sampling variance. The uncertainty estimators depend only on the moments of the forecasts and observations, so it is easy to routinely compute them at the same time as the Brier score and skill score. The resulting uncertainty estimates can be used to construct error bars or confidence intervals for the verification measures, or perform hypothesis testing. Monte Carlo experiments using synthetic forecasting examples illustrate the performance of the expressions. In general, the estimates provide very reliable information on uncertainty. However, the quality of an estimate depends on both the sample size and the occurrence frequency of the forecast event. The examples also illustrate that with infrequently occurring events, verification sample sizes of a few hundred forecast–observation pairs are needed to establish that a forecast is skillful because of the large uncertainties that exist.
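As a rough illustration of the quantities named in this abstract, the sketch below computes the Brier score, a standard error for it, and the Brier skill score against a climatological reference. It is a minimal sketch under stated assumptions, not the authors' estimators: the function names are hypothetical, the forecast-observation pairs are assumed independent, the standard error is the ordinary sample standard error of the mean squared error, and the reference forecast is taken to be the sample frequency of the event. The paper's bias and variance approximations for the skill score are not reproduced here.

```python
from math import sqrt


def squared_errors(forecasts, outcomes):
    """Per-pair squared differences between forecast probability and 0/1 outcome."""
    return [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]


def brier_score(forecasts, outcomes):
    """Mean of the squared errors (lower is better)."""
    errors = squared_errors(forecasts, outcomes)
    return sum(errors) / len(errors)


def brier_score_std_error(forecasts, outcomes):
    """Sample standard error of the Brier score.

    Assumption: pairs are independent, so the variance of the mean is the
    sample variance of the squared errors divided by n.
    """
    errors = squared_errors(forecasts, outcomes)
    n = len(errors)
    mean = sum(errors) / n
    variance = sum((e - mean) ** 2 for e in errors) / (n - 1)
    return sqrt(variance / n)


def brier_skill_score(forecasts, outcomes):
    """Skill relative to a constant forecast equal to the sample event frequency.

    Assumption: climatology is estimated from the same verification sample.
    """
    n = len(outcomes)
    base_rate = sum(outcomes) / n
    reference = base_rate * (1.0 - base_rate)  # Brier score of the constant forecast
    if reference == 0.0:
        raise ValueError("event frequency is 0 or 1; skill score is undefined")
    return 1.0 - brier_score(forecasts, outcomes) / reference


if __name__ == "__main__":
    f = [0.9, 0.1, 0.8, 0.2]
    o = [1, 0, 1, 0]
    bs = brier_score(f, o)
    se = brier_score_std_error(f, o)
    print(f"BS = {bs:.4f}, SE = {se:.4f}, BSS = {brier_skill_score(f, o):.4f}")
    # Approximate 95% error bars, assuming a normal sampling distribution.
    print(f"approx. 95% interval: [{bs - 1.96 * se:.4f}, {bs + 1.96 * se:.4f}]")
```

With only a handful of pairs, as in the toy data here, the normal-approximation interval is not trustworthy; the abstract's point that a few hundred pairs may be needed for infrequent events applies.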

2012 ◽  
Vol 8 (2) ◽  
pp. 953-986 ◽  
Author(s):  
B. Kurnik ◽  
L. Kajfež-Bogataj ◽  
A. Ceglar

Abstract. We corrected monthly precipitation from 8 regional climate models using statistical bias correction. All models were corrected according to observations, and parameters for bias correction were obtained for each model separately in every grid cell over the European domain, using data between 1961 and 1990. Bias correction was validated in the period between 1991 and 2010 with RMSE, Brier score and Brier skill score. The results are encouraging, as mean and extremes were effectively corrected. After applying the correction, large biases over the Alps, at the East Adriatic coast, the west coast of Norway and at the east end of the domain were removed. RMSE of corrected precipitation was lower than RMSE of simulated precipitation in 85% of the European area, and correction for all models failed in only 1.5% of the European area. Extremes were also effectively corrected. According to the Brier skill score, the probability for dry months was corrected in more than 52% of the European area and heavy precipitation events were corrected in almost 90% of the area. All validation measures suggest the correction of monthly precipitation was successful, and therefore we can argue that the corrected precipitation fields will improve results of the climate impact models.


2019 ◽  
Vol 34 (6) ◽  
pp. 1965-1977 ◽  
Author(s):  
Shouwen Zhang ◽  
Hua Jiang ◽  
Hui Wang

Abstract Based on historical forecasts of four individual forecasting systems, we conducted multimodel ensembles (MME) to predict the sea surface temperature anomaly (SSTA) variability and assessed these methods from a deterministic and probabilistic point of view. To investigate the advantages and drawbacks of different deterministic MME methods, we used simple averaged MME with equal weights (SCM) and the stepwise pattern projection method (SPPM). We measured the probabilistic forecast accuracy by Brier skill score (BSS) combined with its two components: reliability (Brel) and resolution (Bres). The results indicated that SCM showed a high predictability in the tropical Pacific Ocean, with a correlation exceeding 0.8 at a 6-month lead time. In general, the SCM outperformed the SPPM in the tropics, while the SPPM tended to show some positive effect on the correction at long lead times. Corrections occurred for the spring predictability barrier of ENSO, in particular for improvements when the correlation was low or the RMSE was large using the SCM method. These qualitative results are not susceptible to the selection of the hindcast periods; it is as a rule rather by chance of these individual systems. Performance of our probabilistic MME was better than the Climate Forecast System version 2 (CFSv2) forecasts in forecasting COLD, NEUTRAL, and WARM SSTA categories for most regions, mainly due to the contribution of Brel, indicating that the ensemble construction strategies of the MME system are more adequate than those of the CFSv2.
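The split of the Brier score into reliability and resolution terms mentioned in this abstract can be sketched as follows. This is a minimal sketch, assuming the standard Murphy decomposition with forecasts grouped by their distinct probability values; the function name is hypothetical, and the abstract does not say exactly how Brel and Bres are defined or normalized in the paper, so the terms below may differ from the authors' by a scaling.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Assumption: forecasts take a small set of distinct values, so grouping
    by exact value is meaningful and the identity
    Brier score = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the observed outcomes for each distinct forecast probability.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        observed_freq = sum(obs) / len(obs)
        # Reliability: how far the forecast value is from the observed frequency.
        reliability += len(obs) * (f - observed_freq) ** 2
        # Resolution: how far the observed frequency is from the base rate.
        resolution += len(obs) * (observed_freq - base_rate) ** 2

    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


if __name__ == "__main__":
    f = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]
    o = [1, 1, 1, 0, 0, 0, 0, 1]
    rel, res, unc = murphy_decomposition(f, o)
    print(f"reliability={rel:.4f} resolution={res:.4f} uncertainty={unc:.4f}")
    # Skill against climatology follows from the same three terms.
    print(f"BSS = {(res - rel) / unc:.4f}")
```

A smaller reliability term and a larger resolution term both raise the skill score, which is why an improvement can be attributed mainly to one component, as the abstract does for Brel.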


2005 ◽  
Vol 20 (1) ◽  
pp. 82-100 ◽  
Author(s):  
A. J. M. Jacobs ◽  
N. Maat

Abstract Numerical guidance methods for decision making support of aviation meteorological forecasters are presented. The methods have been developed to enhance the usefulness of numerical weather prediction (NWP) model data and local and upstream observations in the production of terminal aerodrome forecasts (TAFs) and trend-type forecasts (TRENDs) for airports. In this paper two newly developed methods are described and it is shown how they are used to derive numerical guidance products for aviation. The first is a combination of statistical and physical postprocessing of NWP model data and in situ observations. This method is used to derive forecasts for all aviation-related meteorological parameters at the airport. The second is a high-resolution wind transformation method, a technique used to derive local wind at airports from grid-box-averaged NWP model winds. For operational use of the numerical guidance products encoding software is provided for automatic production of an alphanumeric TAF and TREND code. A graphical user interface with an integrated code editor enables the forecaster to modify the suggested automatic codes. For aviation, the most important parameters in the numerical guidance are visibility and cloud-base height. Both have been subjected to a statistical verification analysis, together with their automatically produced codes. The results in terms of skill score are compared to the skill of the forecasters’ TAF and TREND code. The statistical measures suggest that the guidance has the best skill at lead times of +4 h and more. For the short term, mainly trend-type forecasts, the persistence forecast based on recent observations is difficult to beat. Verification has also shown that the wind transformation method, which has been applied to generate 10-m winds at Amsterdam Airport Schiphol, reduces the mean error in the (grid box averaged) NWP model wind significantly. 
Among the potential benefits of these numerical guidance methods is increasing forecast accuracy. As a result, the related numerical guidance products and encoding software have been integrated in the operational environment for the production of TAFs and TRENDs.


2007 ◽  
Vol 22 (6) ◽  
pp. 1287-1303 ◽  
Author(s):  
Huiling Yuan ◽  
Xiaogang Gao ◽  
Steven L. Mullen ◽  
Soroosh Sorooshian ◽  
Jun Du ◽  
...  

Abstract A feed-forward neural network is configured to calibrate the bias of a high-resolution probabilistic quantitative precipitation forecast (PQPF) produced by a 12-km version of the NCEP Regional Spectral Model (RSM) ensemble forecast system. Twice-daily forecasts during the 2002–2003 cool season (1 November–31 March, inclusive) are run over four U.S. Geological Survey (USGS) hydrologic unit regions of the southwest United States. Calibration is performed via a cross-validation procedure, where four months are used for training and the excluded month is used for testing. The PQPFs before and after the calibration over a hydrological unit region are evaluated by comparing the joint probability distribution of forecasts and observations. Verification is performed on the 4-km stage IV grid, which is used as “truth.” The calibration procedure improves the Brier score (BrS), conditional bias (reliability) and forecast skill, such as the Brier skill score (BrSS) and the ranked probability skill score (RPSS), relative to the sample frequency for all geographic regions and most precipitation thresholds. However, the procedure degrades the resolution of the PQPFs by systematically producing more forecasts with low nonzero forecast probabilities that drive the forecast distribution closer to the climatology of the training sample. The problem of degrading the resolution is most severe over the Colorado River basin and the Great Basin for relatively high precipitation thresholds where the sample of observed events is relatively small.


Water ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 2631
Author(s):  
Xinchi Chen ◽  
Xiaohong Chen ◽  
Dong Huang ◽  
Huamei Liu

Precipitation is one of the most important factors affecting the accuracy and uncertainty of hydrological forecasting. Considerable progress has been made in numerical weather prediction after decades of development, but the forecast products still cannot be used directly for hydrological forecasting. This study used the ensemble pre-processor (EPP) to post-process the Global Ensemble Forecast System (GEFS) and Climate Forecast System version 2 (CFSv2) with four designed schemes, and then integrated them to investigate the forecast accuracy at longer time scales based on the best scheme. Several indices, such as the correlation coefficient, Nash efficiency coefficient, rank histogram, and continuous ranked probability skill score, were used to evaluate different aspects of the results. The results show that EPP can significantly improve the accuracy of the raw forecast, and that the scheme considering cumulative forecast precipitation is better than the one that considers only the single-day forecast. Moreover, the scheme that considers some observed precipitation helps to improve the accuracy and reduce the uncertainty. For medium- and long-term forecasts, the integrated forecast based on post-processed GEFS and CFSv2 would be significantly better than CFSv2. The results of this study demonstrate how to remove the deviation of ensemble forecasts and improve the accuracy of hydrological forecasting at different time scales.


1999 ◽  
Vol 25 (6) ◽  
pp. 803-828 ◽  
Author(s):  
J. Bryan Fuller ◽  
Kim Hester

An extensive comparison of the sample-weighted method (Hunter & Schmidt, 1990) and a newer unweighted method (Osburn & Callender, 1992) of meta-analysis is presented using actual data. Several of the advantages of the unweighted method predicted by Osburn and Callender’s simulation research did not always hold in actual application. Specifically, the unweighted method did not always produce larger estimates of observed variance, credibility intervals, and confidence intervals than the sample-weighted method when large sample outliers were present. Also, Osburn and Callender’s research on mean sampling variance formulae did not generalize to meta-analysis using the average correlation estimator to measure sample error variance. Finally, results show that while both methods may generate similar parameter and variance estimates in primary meta-analysis, they may lead researchers to reach different substantive conclusions in the analysis of moderators.


2009 ◽  
Vol 10 (3) ◽  
pp. 807-819 ◽  
Author(s):  
F. Pappenberger ◽  
A. Ghelli ◽  
R. Buizza ◽  
K. Bódis

Abstract A methodology for evaluating ensemble forecasts, taking into account observational uncertainties for catchment-based precipitation averages, is introduced. Probability distributions for mean catchment precipitation are derived with the Generalized Likelihood Uncertainty Estimation (GLUE) method. The observation uncertainty includes errors in the measurements, uncertainty as a result of the inhomogeneities in the rain gauge network, and representativeness errors introduced by the interpolation methods. The closeness of the forecast probability distribution to the observed fields is measured using the Brier skill score, rank histograms, relative entropy, and the ratio between the ensemble spread and the error of the ensemble-median forecast (spread–error ratio). Four different methods have been used to interpolate observations on the catchment regions. Results from a 43-day period (20 July–31 August 2002) show little sensitivity to the interpolation method used. The rank histograms and the relative entropy better show the effect of introducing observation uncertainty, although this effect on the Brier skill score and the spread–error ratio is not very large. The case study indicates that overall observation uncertainty should be taken into account when evaluating forecast skill.

