Calibrated Surface Temperature Forecasts from the Canadian Ensemble Prediction System Using Bayesian Model Averaging

2007 ◽  
Vol 135 (4) ◽  
pp. 1364-1385 ◽  
Author(s):  
Laurence J. Wilson ◽  
Stephane Beauregard ◽  
Adrian E. Raftery ◽  
Richard Verret

Abstract Bayesian model averaging (BMA) has recently been proposed as a way of correcting underdispersion in ensemble forecasts. BMA is a standard statistical procedure for combining predictive distributions from different sources. The output of BMA is a probability density function (pdf), which is a weighted average of pdfs centered on the bias-corrected forecasts. The BMA weights reflect the relative contributions of the component models to the predictive skill over a training sample. The variance of the BMA pdf is made up of two components, the between-model variance, and the within-model error variance, both estimated from the training sample. This paper describes the results of experiments with BMA to calibrate surface temperature forecasts from the 16-member Canadian ensemble system. Using one year of ensemble forecasts, BMA was applied for different training periods ranging from 25 to 80 days. The method was trained on the most recent forecast period, then applied to the next day’s forecasts as an independent sample. This process was repeated through the year, and forecast quality was evaluated using rank histograms, the continuous rank probability score, and the continuous rank probability skill score. An examination of the BMA weights provided a useful comparative evaluation of the component models, both for the ensemble itself and for the ensemble augmented with the unperturbed control forecast and the higher-resolution deterministic forecast. Training periods around 40 days provided a good calibration of the ensemble dispersion. Both full regression and simple bias-correction methods worked well to correct the bias, except that the full regression failed to completely remove seasonal trend biases in spring and fall. Simple correction of the bias was sufficient to produce positive forecast skill out to 10 days with respect to climatology, which was improved by the BMA. 
The addition of the control forecast and the full-resolution model forecast to the ensemble produced modest improvement in the forecasts for ranges out to about 7 days. Finally, BMA produced significantly narrower 90% prediction intervals compared to a simple Gaussian bias correction, while achieving similar overall accuracy.
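The BMA predictive distribution described in this abstract can be sketched as a weighted mixture of kernels centered on the bias-corrected member forecasts. Below is a minimal Python illustration under stated assumptions: Gaussian kernels with a single common spread `sigma`, and the function name `bma_pdf` are illustrative choices, not code from the study.

```python
import numpy as np

def bma_pdf(y, forecasts, weights, sigma, biases=None):
    """Evaluate a BMA predictive density at value y.

    The density is a weighted average of Gaussian kernels centered on the
    (optionally bias-corrected) member forecasts; the weights and the
    common spread sigma would be estimated from a training sample.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    if biases is not None:
        forecasts = forecasts - np.asarray(biases, dtype=float)
    # One Gaussian kernel per ensemble member, sharing the spread sigma
    kernels = np.exp(-0.5 * ((y - forecasts) / sigma) ** 2) / (
        sigma * np.sqrt(2.0 * np.pi)
    )
    return float(np.dot(weights, kernels))
```

With weights summing to one, the mixture integrates to one, so quantiles of this pdf give the calibrated prediction intervals the abstract refers to.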

2010 ◽  
Vol 138 (11) ◽  
pp. 4199-4211 ◽  
Author(s):  
Maurice J. Schmeits ◽  
Kees J. Kok

Abstract Using a 20-yr ECMWF ensemble reforecast dataset of total precipitation and a 20-yr dataset of a dense precipitation observation network in the Netherlands, a comparison is made between the raw ensemble output, Bayesian model averaging (BMA), and extended logistic regression (LR). A previous study indicated that BMA and conventional LR are successful in calibrating multimodel ensemble forecasts of precipitation for a single forecast projection. However, a more elaborate comparison between these methods has not yet been made. This study compares the raw ensemble output, BMA, and extended LR for single-model ensemble reforecasts of precipitation; namely, from the ECMWF ensemble prediction system (EPS). The raw EPS output turns out to be generally well calibrated up to 6 forecast days, if compared to the area-mean 24-h precipitation sum. Surprisingly, BMA is less skillful than the raw EPS output from forecast day 3 onward. This is due to the bias correction in BMA, which applies model output statistics to individual ensemble members. As a result, the spread of the bias-corrected ensemble members is decreased, especially for the longer forecast projections. Here, an additive bias correction is applied instead and the equation for the probability of precipitation in BMA is also changed. These modifications to BMA are referred to as “modified BMA” and lead to a significant improvement in the skill of BMA for the longer projections. If the area-maximum 24-h precipitation sum is used as a predictand, both modified BMA and extended LR improve the raw EPS output significantly for the first 5 forecast days. However, the difference in skill between modified BMA and extended LR does not seem to be statistically significant. Yet, extended LR might be preferred, because incorporating predictors that are different from the predictand is straightforward, in contrast to BMA.
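The additive bias correction that distinguishes "modified BMA" shifts every ensemble member by the same mean error, so the ensemble spread is preserved rather than shrunk as under member-wise regression. A hedged sketch of that idea (array shapes and names are assumptions; the paper's actual implementation may differ):

```python
import numpy as np

def additive_bias_correction(train_members, train_obs, new_members):
    """Remove the mean ensemble bias estimated over a training period.

    train_members: (days, members) training-period member forecasts
    train_obs:     (days,) matching verifying observations
    A single additive shift, unlike member-wise regression, leaves the
    spacing between members (the ensemble spread) unchanged.
    """
    train_members = np.asarray(train_members, dtype=float)
    train_obs = np.asarray(train_obs, dtype=float)
    # Mean error of the ensemble mean over the training sample
    bias = float(np.mean(train_members.mean(axis=1) - train_obs))
    return np.asarray(new_members, dtype=float) - bias
```

Because only a constant is subtracted, the corrected members retain their original dispersion, which the study found important at longer forecast projections.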


2021 ◽  
Vol 893 (1) ◽  
pp. 012028
Author(s):  
Robi Muharsyah ◽  
Dian Nur Ratri ◽  
Damiana Fitria Kussatiti

Abstract Prediction of Sea Surface Temperature (SST) in the Niño3.4 region (170°W–120°W; 5°S–5°N) is important as a valuable indicator for identifying El Niño–Southern Oscillation (ENSO) conditions, i.e., El Niño, La Niña, and Neutral, in the coming months. More accurate predictions of Niño3.4 SST can be used to determine the response of rainfall over the Indonesian region to the ENSO phenomenon. SST predictions are routinely released by meteorological institutions such as the European Centre for Medium-Range Weather Forecasts (ECMWF). However, SST predictions taken directly from the raw output (RAW) of global models such as the ECMWF seasonal forecast suffer from biases that degrade prediction quality and thereby increase the potential for errors in predicting ENSO events. This study uses SST from the Ensemble Prediction System (EPS) output of the ECMWF seasonal forecast, SEAS5. SEAS5 SST is downloaded from the Copernicus Climate Change Service (C3S) for the period 1993–2020. One value representing SST over the Niño3.4 region is calculated for each lead time (LT), LT0–LT6. Bayesian Model Averaging (BMA) is selected as the post-processing method to improve the prediction quality of SEAS5-RAW. The advantage of BMA over other post-processing methods is its ability to quantify the uncertainty in the EPS, which is expressed as a predictive probability density function (PDF). It was found that the BMA calibration process reaches optimal performance using a 160-month training window. The results show that the prediction quality of the BMA output for Niño3.4 SST is superior to SEAS5-RAW, especially for LT0, LT1, and LT2. In terms of deterministic prediction, BMA shows a lower Root Mean Square Error (RMSE) and a higher Proportion of Correct (PC). In terms of probabilistic prediction, the error rate of BMA, as measured by the Brier Score, is lower than that of RAW. Moreover, BMA shows a good ability to discriminate ENSO events, as indicated by an ROC AUC close to the perfect score.



2011 ◽  
Vol 29 (7) ◽  
pp. 1295-1303 ◽  
Author(s):  
I. Soltanzadeh ◽  
M. Azadi ◽  
G. A. Vakili

Abstract. Using Bayesian Model Averaging (BMA), an attempt was made to obtain calibrated probabilistic numerical forecasts of 2-m temperature over Iran. The ensemble employs three limited-area models (WRF, MM5, and HRM), with WRF used in five different configurations. Initial and boundary conditions for MM5 and WRF are obtained from the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS); for HRM, the initial and boundary conditions come from analyses of the Global Model Europe (GME) of the German Weather Service. The resulting seven-member ensemble was run for a period of 6 months (from December 2008 to May 2009) over Iran. The 48-h raw ensemble outputs were calibrated using the BMA technique for 120 days, with a 40-day training sample of forecasts and the corresponding verification data. The calibrated probabilistic forecasts were assessed using rank histograms and attributes diagrams. Results showed that applying BMA improved the reliability of the raw ensemble. Using the weighted ensemble mean forecast as a deterministic forecast, it was found that the deterministic-style BMA forecasts usually performed better than the best member's deterministic forecast.


2015 ◽  
Vol 143 (9) ◽  
pp. 3628-3641 ◽  
Author(s):  
Jiangshan Zhu ◽  
Fanyou Kong ◽  
Lingkun Ran ◽  
Hengchi Lei

Abstract To study the impact of training sample heterogeneity on the performance of Bayesian model averaging (BMA), two BMA experiments were performed on probabilistic quantitative precipitation forecasts (PQPFs) in the northern China region in July and August of 2010 generated from an 11-member short-range ensemble forecasting system. One experiment, as in many conventional BMA studies, used an overall training sample that consisted of all available cases in the training period, while the second experiment used stratified sampling BMA by first dividing all available training cases into subsamples according to their ensemble spread, and then performing BMA on each subsample. The results showed that ensemble spread is a good criterion to divide ensemble precipitation cases into subsamples, and that the subsamples have different statistical properties. Pooling the subsamples together forms a heterogeneous overall sample. Conventional BMA is incapable of interpreting heterogeneous samples, and produces unreliable PQPF. It underestimates the forecast probability at high-threshold PQPF and local rainfall maxima in BMA percentile forecasts. BMA with stratified sampling according to ensemble spread overcomes the problem reasonably well, producing sharper predictive probability density functions and BMA percentile forecasts, and more reliable PQPF than the conventional BMA approach. The continuous ranked probability scores, Brier skill scores, and reliability diagrams of the two BMA experiments were examined for all available forecast days, along with a logistic regression experiment. Stratified sampling BMA outperformed the raw ensemble and conventional BMA in all verifications, and also showed better skill than logistic regression in low-threshold forecasts.
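The stratified-sampling step can be sketched as binning training cases on ensemble-spread quantiles so that BMA parameters are then fit separately within each stratum. A minimal illustration (quantile binning is one plausible way to form equally sized subsamples; the study's exact stratification criterion may differ):

```python
import numpy as np

def spread_strata(ensemble_fcsts, n_strata=3):
    """Assign each training case to a stratum by its ensemble spread.

    ensemble_fcsts: (cases, members) array of member forecasts.
    Cases are binned on spread quantiles so each stratum receives a
    similar number of training cases; a separate BMA model would then
    be trained on each stratum and selected at forecast time by the
    new case's own ensemble spread.
    """
    spreads = np.std(ensemble_fcsts, axis=1)
    edges = np.quantile(spreads, np.linspace(0.0, 1.0, n_strata + 1))
    # Interior edges only: np.digitize then yields labels 0..n_strata-1
    return np.digitize(spreads, edges[1:-1]), edges
```

At forecast time, the stored `edges` place a new case into its stratum, and that stratum's BMA parameters generate the predictive PDF.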


2012 ◽  
Vol 20 (3) ◽  
pp. 271-291 ◽  
Author(s):  
Jacob M. Montgomery ◽  
Florian M. Hollenbach ◽  
Michael D. Ward

We present ensemble Bayesian model averaging (EBMA) and illustrate its ability to aid scholars in the social sciences to make more accurate forecasts of future events. In essence, EBMA improves prediction by pooling information from multiple forecast models to generate ensemble predictions similar to a weighted average of component forecasts. The weight assigned to each forecast is calibrated via its performance in some validation period. The aim is not to choose some “best” model, but rather to incorporate the insights and knowledge implicit in various forecasting efforts via statistical postprocessing. After presenting the method, we show that EBMA increases the accuracy of out-of-sample forecasts relative to component models in three applied examples: predicting the occurrence of insurgencies around the Pacific Rim, forecasting vote shares in U.S. presidential elections, and predicting the votes of U.S. Supreme Court Justices.
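The weight calibration over a validation period that this abstract describes is commonly carried out with an expectation-maximization (EM) loop. A simplified sketch for Gaussian component densities with a fixed spread (the function name, the fixed `sigma`, and the Gaussian assumption are illustrative; EBMA for other outcome types uses other component densities):

```python
import numpy as np

def em_bma_weights(forecasts, obs, sigma, n_iter=200):
    """Estimate ensemble-BMA weights by EM on a validation sample.

    forecasts: (cases, models) component-model predictions
    obs:       (cases,) observed outcomes
    Each model's predictive density is a Gaussian of fixed spread sigma
    centered on its forecast; assumes every case has nonzero likelihood
    under at least one model.
    """
    n_cases, n_models = forecasts.shape
    w = np.full(n_models, 1.0 / n_models)
    # Gaussian kernels per (case, model); the constant factor cancels in EM
    lik = np.exp(-0.5 * ((obs[:, None] - forecasts) / sigma) ** 2)
    for _ in range(n_iter):
        # E-step: responsibility of each model for each case
        z = w * lik
        z /= z.sum(axis=1, keepdims=True)
        # M-step: new weights are the mean responsibilities
        w = z.mean(axis=0)
    return w
```

Models that track the validation-period outcomes closely accumulate responsibility and hence weight, which is the sense in which the ensemble rewards, rather than selects, the better component forecasts.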


2018 ◽  
Vol 33 (2) ◽  
pp. 369-388 ◽  
Author(s):  
Peter Vogel ◽  
Peter Knippertz ◽  
Andreas H. Fink ◽  
Andreas Schlueter ◽  
Tilmann Gneiting

Abstract Accumulated precipitation forecasts are of high socioeconomic importance for agriculturally dominated societies in northern tropical Africa. In this study, the performance of nine operational global ensemble prediction systems (EPSs) is analyzed relative to climatology-based forecasts for 1–5-day accumulated precipitation based on the monsoon seasons during 2007–14 for three regions within northern tropical Africa. To assess the full potential of raw ensemble forecasts across spatial scales, state-of-the-art statistical postprocessing methods were applied in the form of Bayesian model averaging (BMA) and ensemble model output statistics (EMOS), and results were verified against station and spatially aggregated, satellite-based gridded observations. Raw ensemble forecasts are uncalibrated and unreliable, and often underperform relative to climatology, independently of region, accumulation time, monsoon season, and ensemble. The differences between raw ensemble and climatological forecasts are large and partly stem from poor prediction for low precipitation amounts. BMA and EMOS postprocessed forecasts are calibrated, reliable, and strongly improve on the raw ensembles but, somewhat disappointingly, typically do not outperform climatology. Most EPSs exhibit slight improvements over the period 2007–14, but overall they have little added value compared to climatology. The suspicion is that parameterization of convection is a potential cause for the sobering lack of ensemble forecast skill in a region dominated by mesoscale convective systems.

