Extreme Metrics and Large Ensembles
Abstract. We consider the problem of estimating the ensemble sizes required to characterize the forced component and the internal variability of a range of extreme metrics. While we exploit existing large ensembles contributed to the CLIVAR Large Ensemble Project, our perspective is that of a modeling center wanting to estimate such sizes a priori on the basis of an existing small ensemble (we use five members here). We therefore ask whether such a small ensemble is sufficient to estimate the population variance accurately enough to apply a well-established formula that quantifies the expected error as a function of n (the ensemble size). We find that we can indeed anticipate errors in the estimation of the forced component for temperature and precipitation extreme metrics as a function of n by applying the population variance derived from five members in the formula. For a range of spatial and temporal scales, forcing levels (we use RCP8.5 simulations), and both models considered here as our proof of concept, CESM1-CAM5 and CanESM2, it appears that an ensemble size of 20 or 25 members can provide estimates of the forced component for the extreme metrics considered that remain within small absolute and percentage errors. Additional members beyond 20 or 25 add only marginal precision to the estimate, and this remains true when extreme value analysis is used. We then ask about the ensemble size required to estimate the ensemble variance (a measure of internal variability) along the length of the simulation and, importantly, about the ensemble size required to detect significant changes in such variance along the simulation with increased external forcings. When an F-test is applied to the ratio of the two variances, one estimated on the basis of only 5 or 10 ensemble members and the other estimated using the full ensemble (up to 50 members in our study), we do not obtain significant results even when the analysis is conducted at the grid-point scale.
While we recognize that there will always exist applications and metric definitions requiring greater statistical power and therefore larger ensemble sizes, our results suggest that for a wide range of analysis targets and scales an effective estimate of both the forced component and the internal variability can be achieved with sizes below 30 members. This invites consideration of exploring additional sources of uncertainty, such as physics parameter settings, when designing ensemble simulations.
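The abstract does not spell out the formula relating expected error to n. As an illustration only, the sketch below assumes it takes the familiar form of the half-width of a normal-theory confidence interval for a mean, z * sigma / sqrt(n), with sigma estimated from a five-member ensemble. The function name `expected_error`, the value z = 1.96, and the five sample values are hypothetical and are not taken from the study.

```python
import math
import statistics


def expected_error(sample_sd, n, z=1.96):
    """Approximate half-width of the confidence interval of an n-member
    ensemble mean, assuming the form z * sigma / sqrt(n) (an assumption,
    not necessarily the exact formula used in the paper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return z * sample_sd / math.sqrt(n)


# Hypothetical five-member ensemble of an extreme metric at one location
# and year (e.g. annual maximum temperature, in degrees C). Made-up values.
five_members = [31.2, 30.4, 32.1, 30.9, 31.6]

# Sample standard deviation (n - 1 denominator) as the estimate of sigma.
sd_hat = statistics.stdev(five_members)

# Anticipated error for several candidate ensemble sizes.
errors = {n: expected_error(sd_hat, n) for n in (5, 10, 20, 25, 50)}
for n, err in errors.items():
    print(f"n={n:2d}  expected error ~ {err:.3f}")
```

Under this assumed form the error scales as 1/sqrt(n): going from 5 to 20 members halves it, whereas doubling again from 25 to 50 members reduces it only by a further factor of about 0.71. This is consistent with the diminishing returns beyond 20 or 25 members described above.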