Ensemble Averaging and the Curse of Dimensionality

2018 ◽  
Vol 31 (4) ◽  
pp. 1587-1596 ◽  
Author(s):  
Bo Christiansen

When comparing climate models to observations, it is often observed that the mean over many models has smaller errors than most or all of the individual models. This paper will show that a general consequence of the nonintuitive geometric properties of high-dimensional spaces is that the ensemble mean often outperforms the individual ensemble members. This also explains why the ensemble mean often has an error that is 30% smaller than the median error of the individual ensemble members. The only assumption that needs to be made is that the observations and the models are independently drawn from the same distribution. An important and relevant property of high-dimensional spaces is that independent random vectors are almost always orthogonal. Furthermore, while the lengths of random vectors are large and almost equal, the ensemble mean is special, as it is located near the otherwise vacant center. The theory is first explained by an analysis of Gaussian- and uniformly distributed vectors in high-dimensional spaces. A subset of 17 models from the CMIP5 multimodel ensemble is then used to demonstrate the validity and robustness of the theory in realistic settings.
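The geometric argument is easy to check numerically. Below is a minimal sketch (our illustration, not code from the paper) that draws an "observation" and an ensemble of "models" independently from the same high-dimensional Gaussian, confirms the near-orthogonality of independent members, and recovers an ensemble-mean error close to 1/√2 ≈ 0.71 of the median member error, i.e., roughly 30% smaller:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_members = 10_000, 17   # dimension and ensemble size (17 echoes the CMIP5 subset)

obs = rng.standard_normal(d)               # "observations"
ens = rng.standard_normal((n_members, d))  # "models": independent, same distribution

# Independent high-dimensional random vectors are almost orthogonal:
cos = ens[0] @ ens[1] / (np.linalg.norm(ens[0]) * np.linalg.norm(ens[1]))
print(f"cosine between two members: {cos:+.3f}   (close to 0)")

member_err = np.linalg.norm(ens - obs, axis=1)     # errors of individual members
mean_err = np.linalg.norm(ens.mean(axis=0) - obs)  # error of the ensemble mean

print(f"median member error: {np.median(member_err):.1f}")
print(f"ensemble-mean error: {mean_err:.1f}")
print(f"ratio: {mean_err / np.median(member_err):.2f}   (close to 1/sqrt(2) = 0.71)")
```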

2014 ◽  
Vol 142 (3) ◽  
pp. 1093-1105 ◽  
Author(s):  
Madalina Surcel ◽  
Isztar Zawadzki ◽  
M. K. Yau

Abstract The mean (ENM) of an ensemble of precipitation forecasts is generally more skillful than any of the members as verified against observations. A major reason is that the averaging filters out nonpredictable features on which the members disagree. Previous research showed that the nonpredictable features occur at small scales, in both numerical forecasts and Lagrangian persistence nowcasts. Hence, it is plausible that the unpredictable features filtered out through ensemble averaging also occur at small scales. In this study, the exact range of scales affected by averaging is determined by comparing the statistical properties of precipitation fields between the ENM and the individual members from a Storm-Scale Ensemble Forecasting (SSEF) system run during NOAA’s 2008 Hazardous Weather Testbed (HWT) Spring Experiment. The filtering effect of ensemble averaging results in a low-intensity bias for the ENM forecasts. It has been previously proposed to correct the ENM forecasts by recalibrating the intensities in the ENM using the probability density function (PDF) of rainfall values from the ensemble members. This procedure, probability matching (PM), leads to a new ensemble mean, the probability matched mean (PMM). Past studies have shown that the PMM appears more realistic and yields better skill as evaluated using traditional scores. However, the authors demonstrate here that although the PMM has the same PDF of rainfall intensities as the ensemble members, the spectral structure and the spatial distribution of its precipitation field differ from those of the members. It is the lower variability of the PMM fields at small scales that causes the better scores of the PMM relative to the ensemble members.
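The probability-matching recipe is compact enough to sketch. The following illustration (our construction; array shapes are hypothetical) builds the PMM by assigning the pooled, rank-ordered intensities of all members to grid points ordered by the rank of the ensemble-mean field:

```python
import numpy as np

def probability_matched_mean(ens):
    """PMM of an ensemble of fields; ens has shape (n_members, ny, nx), hypothetical layout."""
    n_members = ens.shape[0]
    enm = ens.mean(axis=0)                   # plain ensemble mean (ENM)
    # Pool all member values; keeping every n-th sorted value yields one
    # field's worth of intensities with the PDF of the full ensemble.
    pooled = np.sort(ens.ravel())[::n_members]
    # Rank-match: the wettest ENM grid point receives the largest pooled value, etc.
    order = np.argsort(enm.ravel())
    pmm = np.empty(enm.size)
    pmm[order] = pooled
    return pmm.reshape(enm.shape)

rng = np.random.default_rng(1)
ens = rng.gamma(shape=0.5, scale=4.0, size=(20, 64, 64))   # toy "rain" ensemble
pmm = probability_matched_mean(ens)
print(pmm.max(), ens.mean(axis=0).max())   # PMM retains intensities the ENM smooths out
```

Because the pooled PDF retains the members' heavy tail, the PMM keeps peak intensities that plain averaging smooths away, which is exactly the low-intensity bias being corrected.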


2017 ◽  
Vol 8 (2) ◽  
pp. 429-438 ◽  
Author(s):  
Francine J. Schevenhoven ◽  
Frank M. Selten

Abstract. Weather and climate models have improved steadily over time, as witnessed by objective skill scores, although significant model errors remain. Given these imperfect models, predictions might be improved by combining them dynamically into a so-called supermodel. In this paper a new training scheme to construct such a supermodel is explored using a technique called cross pollination in time (CPT). In the CPT approach the models exchange states during the prediction. The number of possible predictions grows quickly with time, and a strategy to retain only a small number of predictions, called pruning, needs to be developed. The method is explored using low-order dynamical systems and applied to a global atmospheric model. The results indicate that the CPT training is efficient and leads to a supermodel with improved forecast quality as compared to the individual models. Due to its computational efficiency, the technique is suited for application to state-of-the-art high-dimensional weather and climate models.
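A toy version of CPT training can be written in a few lines. The sketch below (our illustration under simplified assumptions: two imperfect Lorenz-63 models with perturbed parameters, a crude forward-Euler integrator, and a synthetic truth) propagates every candidate state through every model each segment and prunes back to the states closest to the truth; the surviving-branch fractions suggest how strongly each model should contribute to the supermodel:

```python
import numpy as np

def l63_step(x, rho, dt=0.005, sigma=10.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 system (crude but adequate here)."""
    dx = sigma * (x[1] - x[0])
    dy = x[0] * (rho - x[2]) - x[1]
    dz = x[0] * x[1] - beta * x[2]
    return x + dt * np.array([dx, dy, dz])

def run(x, rho, n):
    for _ in range(n):
        x = l63_step(x, rho)
    return x

rng = np.random.default_rng(2)
rho_truth, rhos = 28.0, (26.0, 30.0)   # perfect parameter vs two imperfect "models"
n_keep, seg, n_segs = 8, 40, 50        # pruning size, steps per segment, segments

truth = np.array([1.0, 1.0, 25.0])
candidates = [truth + 0.01 * rng.standard_normal(3)]
counts = np.zeros(len(rhos))           # how often each model's branch survives pruning

for _ in range(n_segs):
    truth = run(truth, rho_truth, seg)
    # Cross pollination: every candidate state continues under every model.
    branches = [(run(c, rho, seg), i) for c in candidates for i, rho in enumerate(rhos)]
    # Pruning: keep only the branches that stay closest to the truth.
    branches.sort(key=lambda b: np.linalg.norm(b[0] - truth))
    branches = branches[:n_keep]
    candidates = [b[0] for b in branches]
    for _, i in branches:
        counts[i] += 1

print("surviving-branch fraction per model:", counts / counts.sum())
```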


2019 ◽  
Vol 147 (5) ◽  
pp. 1699-1712 ◽  
Author(s):  
Bo Christiansen

Abstract In weather and climate sciences, ensemble forecasts have become an acknowledged community standard. It is often found that the ensemble mean not only has a low error relative to the typical error of the ensemble members but also outperforms all of the individual members. We analyze ensemble simulations based on a simple statistical model that allows for bias and that has different variances for the observations and the model ensemble. Using generic simplifying geometric properties of high-dimensional spaces, we obtain analytical results for the error of the ensemble mean. These results include a closed form for the rank of the ensemble mean among the ensemble members and depend on two quantities: the ensemble variance and the bias, both normalized by the variance of the observations. The analytical results are used to analyze the GEFS reforecast, where the variances and bias depend on lead time. For intermediate lead times, between 20 and 100 h, the two terms are both around 0.5 and the ensemble mean is only slightly better than the individual ensemble members. For lead times larger than 240 h, the variance term is close to 1 and the bias term is near 0.5. For these lead times the ensemble mean outperforms almost all of the individual ensemble members, and its relative error comes close to −30%. These results are in excellent agreement with the theory. The simplifying properties of high-dimensional spaces can be applied not only to the ensemble mean but also to, for example, the ensemble spread.
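The dependence on the two normalized quantities can be explored with a small Monte Carlo experiment (our construction; the paper's exact normalization and the GEFS numbers are not reproduced here). It prints the average number of members beating the ensemble mean and the mean's error relative to the median member error:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_members, n_trials = 2_000, 11, 200   # hypothetical dimension and ensemble size

def rank_and_error(var_term, bias_term):
    """Ensemble variance and squared bias, both normalized by the obs variance (= 1)."""
    s, b = np.sqrt(var_term), np.sqrt(bias_term)
    beating, rel = [], []
    for _ in range(n_trials):
        obs = rng.standard_normal(d)
        ens = b + s * rng.standard_normal((n_members, d))
        err = np.linalg.norm(ens - obs, axis=1)
        mean_err = np.linalg.norm(ens.mean(axis=0) - obs)
        beating.append(np.sum(err < mean_err))       # members better than the mean
        rel.append(mean_err / np.median(err) - 1.0)  # relative error of the mean
    return np.mean(beating), np.mean(rel)

for vt, bt in [(0.5, 0.5), (1.0, 0.5)]:   # variance/bias terms at two "lead times"
    n_beat, e = rank_and_error(vt, bt)
    print(f"variance {vt}, bias {bt}: {n_beat:.1f}/{n_members} members beat the mean, "
          f"relative error {e:+.0%}")
```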


2014 ◽  
Vol 14 (19) ◽  
pp. 27195-27231 ◽  
Author(s):  
C. R. MacIntosh ◽  
K. P. Shine ◽  
W. J. Collins

Abstract. Multi-model ensembles are frequently used to assess understanding of the response of ozone and methane lifetime to changes in emissions of ozone precursors such as NOx, VOC and CO. When these ozone changes are used to calculate radiative forcing (RF) (and climate metrics such as the global warming potential (GWP) and global temperature potential (GTP)) there is a methodological choice, determined partly by the available computing resources, as to whether the mean ozone (and methane lifetime) changes are input to the radiation code, or whether each model's ozone and methane changes are used as input, with the average RF computed from the individual model RFs. We use data from the Task Force on Hemispheric Transport of Air Pollution Source-Receptor global chemical transport model ensemble to assess the impact of this choice for emission changes in four regions (East Asia, Europe, North America and South Asia). We conclude that using the multi-model mean ozone and methane responses is accurate for calculating the mean RF, with differences of up to 0.6% for CO, 0.7% for VOC and 2% for NOx. Differences of up to 60% for NOx, 7% for VOC and 3% for CO are introduced into the 20 year GWP as a result of the exponential decay terms, with similar values for the 20 year GTP. However, estimates of the standard deviation (SD) calculated from the ensemble-mean input fields (where the SD at each point on the model grid is added to or subtracted from the mean field) are almost always substantially larger, in RF, GWP and GTP metrics, than the true SD, and can be larger than the model range for short-lived ozone RF and for the 20 and 100 year GWP and 100 year GTP. We find that the effect is generally most marked for NOx emissions, where the net effect is a small residual of terms of opposing signs. For example, the SD for the 20 year GWP is two to three times larger using the ensemble-mean fields than using the individual models to calculate the RF. Hence, while averages of multi-model fields are appropriate for calculating mean RF, GWP and GTP, they are not a reliable method for calculating the uncertainty in these fields, and in general overestimate it.
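The role of the exponential decay terms can be illustrated with a toy metric (hypothetical numbers, not the ensemble data): because a GWP-like quantity is nonlinear in the per-model decay timescale, applying the metric to averaged inputs differs from averaging the per-model metrics:

```python
import numpy as np

rng = np.random.default_rng(4)
H = 20.0                              # time horizon in years (20 year GWP)
a = rng.normal(1.0, 0.3, size=12)     # per-model forcing-like amplitudes (made up)
tau = rng.normal(10.0, 3.0, size=12)  # per-model decay timescales in years (made up)

def gwp_like(a, tau):
    # Integrated forcing of an exponentially decaying perturbation over horizon H.
    return a * tau * (1.0 - np.exp(-H / tau))

print("metric of averaged inputs:", gwp_like(a.mean(), tau.mean()))
print("average of per-model metrics:", gwp_like(a, tau).mean())
```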


2017 ◽  
Vol 30 (23) ◽  
pp. 9773-9782 ◽  
Author(s):  
Anson H. Cheung ◽  
Michael E. Mann ◽  
Byron A. Steinman ◽  
Leela M. Frankcombe ◽  
Matthew H. England ◽  
...  

In a comment on a 2017 paper by Cheung et al., Kravtsov states that the results of Cheung et al. are invalidated by errors in the method used to estimate internal variability in historical surface temperatures, which involves using the ensemble mean of simulations from phase 5 of the Coupled Model Intercomparison Project (CMIP5) to estimate the forced signal. Kravtsov claims that differences between the forced signals in the individual models and as defined by the multimodel ensemble mean lead to errors in the assessment of internal variability in both model simulations and the instrumental record. Kravtsov proposes a different method, which instead uses CMIP5 models with at least four realizations to define the forced component. Here, it is shown that the conclusions of Cheung et al. are valid regardless of whether the method of Cheung et al. or that of Kravtsov is applied. Furthermore, many of the points raised by Kravtsov are discussed in Cheung et al., and the disagreements of Kravtsov appear to be mainly due to a misunderstanding of the aims of Cheung et al.
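The methodological difference at issue reduces to how the forced signal is defined before taking residuals. A minimal sketch of the two choices (hypothetical shapes and names; both published methods involve further detail, such as scaling of the forced component per model) is:

```python
import numpy as np

def internal_variability(runs, per_model=False):
    """Residual internal variability; runs[model] has shape (n_realizations, n_years)."""
    if per_model:
        # Kravtsov-style: forced signal from each model's own ensemble mean,
        # restricted to models with several realizations (here at least 4).
        return {m: r - r.mean(axis=0) for m, r in runs.items() if r.shape[0] >= 4}
    # Cheung et al.-style: forced signal from the multimodel ensemble mean.
    forced = np.mean([r.mean(axis=0) for r in runs.values()], axis=0)
    return {m: r - forced for m, r in runs.items()}
```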


2015 ◽  
Vol 15 (7) ◽  
pp. 3957-3969 ◽  
Author(s):  
C. R. MacIntosh ◽  
K. P. Shine ◽  
W. J. Collins

Abstract. Multi-model ensembles are frequently used to assess understanding of the response of ozone and methane lifetime to changes in emissions of ozone precursors such as NOx, VOCs (volatile organic compounds) and CO. When these ozone changes are used to calculate radiative forcing (RF) (and climate metrics such as the global warming potential (GWP) and global temperature-change potential (GTP)) there is a methodological choice, determined partly by the available computing resources, as to whether the mean ozone (and methane) concentration changes are input to the radiation code, or whether each model's ozone and methane changes are used as input, with the average RF computed from the individual model RFs. We use data from the Task Force on Hemispheric Transport of Air Pollution source–receptor global chemical transport model ensemble to assess the impact of this choice for emission changes in four regions (East Asia, Europe, North America and South Asia). We conclude that using the multi-model mean ozone and methane responses is accurate for calculating the mean RF, with differences of up to 0.6% for CO, 0.7% for VOCs and 2% for NOx. Differences of up to 60% for NOx, 7% for VOCs and 3% for CO are introduced into the 20 year GWP. The differences for the 20 year GTP are smaller than for the GWP for NOx, and similar for the other species. However, estimates of the standard deviation calculated from the ensemble-mean input fields (where the standard deviation at each point on the model grid is added to or subtracted from the mean field) are almost always substantially larger, in RF, GWP and GTP metrics, than the true standard deviation, and can be larger than the model range for short-lived ozone RF and for the 20 and 100 year GWP and 100 year GTP. The order of averaging has most impact on the metrics for NOx, as the net values for these quantities are the residual of a sum of terms of opposing signs. For example, the standard deviation for the 20 year GWP is 2–3 times larger using the ensemble-mean fields than using the individual models to calculate the RF. This effect arises largely from the construction of the input ozone fields, which overestimate the true ensemble spread. Hence, while averages of multi-model fields are normally appropriate for calculating mean RF, GWP and GTP, they are not a reliable method for calculating the uncertainty in these fields, and in general overestimate it.
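The mechanism behind the inflated standard deviation can be demonstrated with a toy calculation (our construction): adding the pointwise SD to the mean field everywhere at once treats the inter-model deviations as perfectly spatially coherent, so any spatially integrating quantity such as an RF sees a spread far larger than the true ensemble spread:

```python
import numpy as np

rng = np.random.default_rng(5)
n_models, n_grid = 14, 1_000
fields = 1.0 + 0.2 * rng.standard_normal((n_models, n_grid))  # toy ozone-change fields

def rf(field):
    return field.sum()   # stand-in for a (near-linear) radiative transfer calculation

true_sd = np.std([rf(f) for f in fields])               # SD over per-model RFs
mean, sd = fields.mean(axis=0), fields.std(axis=0)
apparent_sd = 0.5 * (rf(mean + sd) - rf(mean - sd))     # SD from mean±SD input fields
print(f"true SD: {true_sd:.1f}   SD from mean±SD fields: {apparent_sd:.1f}")
```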


2021 ◽  
Author(s):  
Stephen Jewson ◽  
Gabriele Messori ◽  
Giuliana Barbato ◽  
Paola Mercogliano ◽  
Jaroslav Mysiak ◽  
...  

Abstract Users of ensemble climate projections have choices as to how they interpret and apply the ensemble. A simplistic approach is to consider just the ensemble mean and ignore the individual ensemble members. A more thorough approach is to consider every ensemble member, although for complex impact models this may be infeasible. Building on previous work in ensemble weather forecasting, we explore an approach in between these two extremes, in which the ensemble is represented by the mean and a reasonable worst case. The reasonable worst case is calculated using Directional Component Analysis (DCA), a simple statistical method that gives a robust estimate of the worst case for a given linear impact metric and that has various advantages over alternative definitions of the worst case. We present new mathematical results that clarify the interpretation of DCA and illustrate DCA with an extensive set of synthetic examples. We then apply the mean and worst-case method based on DCA to EURO-CORDEX projections of future precipitation in Europe, with two different impact metrics. We conclude that the mean and worst-case method based on DCA is suitable for climate projection users who wish to explore the implications of the uncertainty around the ensemble mean without having to calculate the impacts of every ensemble member.
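For Gaussian anomalies, our reading of the DCA construction is the classic constrained-maximization result: among all patterns x with a given linear impact w'x = s, the most probable one is x* = Cw·s/(w'Cw), where C is the anomaly covariance. The sketch below (an illustration under that assumption, with a hypothetical toy ensemble and made-up impact weights) computes such a pattern at an illustrative impact level:

```python
import numpy as np

rng = np.random.default_rng(6)
ens = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 5))  # toy ensemble, 30 x 5
w = np.array([0.4, 0.3, 0.2, 0.05, 0.05])   # hypothetical linear impact weights

anom = ens - ens.mean(axis=0)
C = np.cov(anom, rowvar=False)              # anomaly covariance across the ensemble

impact = anom @ w
s = np.percentile(impact, 95)               # an illustrative "reasonable worst case"
dca_pattern = C @ w * (s / (w @ C @ w))     # most probable pattern with impact s

print("impact of pattern:", dca_pattern @ w, "  target:", s)
```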


2013 ◽  
Vol 26 (14) ◽  
pp. 4910-4929 ◽  
Author(s):  
Sharon C. Delcambre ◽  
David J. Lorenz ◽  
Daniel J. Vimont ◽  
Jonathan E. Martin

Abstract The present study focuses on diagnosing the intermodel variability of nonzonally averaged NH winter jet stream portrayal in 17 global climate models (GCMs) from phase 3 of the Coupled Model Intercomparison Project (CMIP3). Relative to the reanalysis, the ensemble-mean 300-hPa Atlantic jet in the GCMs is too zonally extended and located too far equatorward. The Pacific jet varies significantly between modeling groups, with large biases in the vicinity of the jet exit region that cancel in the ensemble mean. After seeking relationships between twentieth-century model wind biases and 1) the internal modes of jet variability or 2) tropical sea surface temperatures (SSTs), it is found that biases in the upper-level winds are strongly related to an ENSO-like pattern in winter-mean tropical Pacific Ocean SST biases. The spatial structure of the leading modes of variability of the upper-level jet in the twentieth century is found to be accurately modeled in all 17 GCMs. Also, it is shown that Pacific model biases in the longitudes of EOFs 1 and 2 are strongly linked to the modeled longitude of the Pacific jet exit, indicating that an improved characterization of the mean state of the Pacific jet may positively impact the modeled variability. This work suggests that improvements in the portrayal of the tropical Pacific mean state may significantly advance the portrayal of the mean states of the Pacific and Atlantic jets and consequently improve the modeled jet stream variability in the Pacific. To complement these findings, a companion paper examines the twenty-first-century GCM projections of the nonzonally averaged NH jet streams.
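The leading modes referred to here are standard EOFs. A generic sketch (illustrative only; the paper's preprocessing, area weighting and domain choices are not reproduced) obtains them from the SVD of the anomaly matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
n_years, n_grid = 50, 400
u300 = rng.standard_normal((n_years, n_grid))   # stand-in winter-mean 300-hPa winds

anom = u300 - u300.mean(axis=0)                 # remove the climatology
U, svals, Vt = np.linalg.svd(anom, full_matrices=False)
eofs = Vt[:2]                                   # EOFs 1 and 2: spatial patterns
pcs = U[:, :2] * svals[:2]                      # principal-component time series
print("variance fraction of EOFs 1-2:", (svals**2 / np.sum(svals**2))[:2])
```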


2017 ◽  
Vol 145 (7) ◽  
pp. 2555-2574 ◽  
Author(s):  
Benjamin W. Green ◽  
Shan Sun ◽  
Rainer Bleck ◽  
Stanley G. Benjamin ◽  
Georg A. Grell

Monthlong hindcasts of the Madden–Julian oscillation (MJO) from the atmospheric Flow-following Icosahedral Model coupled with an icosahedral-grid version of the Hybrid Coordinate Ocean Model (FIM-iHYCOM), and from the coupled Climate Forecast System, version 2 (CFSv2), are evaluated over the 12-yr period 1999–2010. Two sets of FIM-iHYCOM hindcasts are run to test the impact of using Grell–Freitas (FIM-CGF) versus simplified Arakawa–Schubert (FIM-SAS) deep convection parameterizations. Each hindcast set consists of four time-lagged ensemble members initialized weekly every 6 h from 1200 UTC Tuesday to 0600 UTC Wednesday. The ensemble means of FIM-CGF, FIM-SAS, and CFSv2 produce skillful forecasts of a variant of the Real-time Multivariate MJO (RMM) index out to 19, 17, and 17 days, respectively; this is consistent with FIM-CGF having the lowest root-mean-square errors (RMSEs) for zonal winds at both 850 and 200 hPa. FIM-CGF and CFSv2 exhibit similar RMSEs in RMM, and their multimodel ensemble mean extends skillful RMM prediction out to 21 days. Conversely, adding FIM-SAS—with much higher RMSEs—to CFSv2 (as a multimodel ensemble) or FIM-CGF (as a multiphysics ensemble) yields either little benefit, or even a degradation, compared to the better single-model ensemble mean. This suggests that multiphysics/multimodel ensemble mean forecasts may only add value when the individual models possess similar skill and error. An atmosphere-only version of FIM-CGF loses skill after 11 days, highlighting the importance of ocean coupling. Further examination reveals some sensitivity in skill and error metrics to the choice of MJO index.
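Skill of RMM forecasts is conventionally measured with the bivariate correlation between forecast and observed (RMM1, RMM2) components, with forecasts deemed skillful while the correlation stays at or above 0.5. A sketch of that verification (standard definitions; the authors verify a variant of the RMM index, which is not reproduced here):

```python
import numpy as np

def bivariate_corr(f1, f2, a1, a2):
    """Bivariate correlation over cases between forecast (f) and analysis (a) RMM1/RMM2."""
    num = np.sum(a1 * f1 + a2 * f2)
    den = np.sqrt(np.sum(a1**2 + a2**2) * np.sum(f1**2 + f2**2))
    return num / den

def skillful_lead(fcst, anal, threshold=0.5):
    """fcst, anal: (n_cases, n_leads, 2) arrays of RMM1/RMM2; returns days of skill."""
    for lead in range(fcst.shape[1]):
        c = bivariate_corr(fcst[:, lead, 0], fcst[:, lead, 1],
                           anal[:, lead, 0], anal[:, lead, 1])
        if c < threshold:
            return lead   # first lead (in days) that is no longer skillful
    return fcst.shape[1]
```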

