Model dependence in multi-model climate ensembles: weighting, sub-selection and out-of-sample testing

2019 ◽  
Vol 10 (1) ◽  
pp. 91-105 ◽  
Author(s):  
Gab Abramowitz ◽  
Nadja Herger ◽  
Ethan Gutmann ◽  
Dorit Hammerling ◽  
Reto Knutti ◽  
...  

Abstract. The rationale for using multi-model ensembles in climate change projections and impacts research is often based on the expectation that different models constitute independent estimates; therefore, a range of models allows a better characterisation of the uncertainties in the representation of the climate system than a single model. However, it is known that research groups share literature, ideas for representations of processes, parameterisations, evaluation data sets and even sections of model code. Thus, nominally different models might have similar biases because of similarities in the way they represent a subset of processes, or even be near-duplicates of others, weakening the assumption that they constitute independent estimates. If there are near-replicates of some models, then treating all models equally is likely to bias the inferences made using these ensembles. The challenge is to establish the degree to which this might be true for any given application. While this issue is recognised by many in the community, quantifying and accounting for model dependence in anything other than an ad hoc way is challenging. Here we present a synthesis of the range of disparate attempts to define, quantify and address model dependence in multi-model climate ensembles in a common conceptual framework, and provide guidance on how users can test the efficacy of approaches that move beyond the equally weighted ensemble. In the upcoming Coupled Model Intercomparison Project phase 6 (CMIP6), several new models that are closely related to existing models are anticipated, as well as large ensembles from some models. We argue that quantitatively accounting for dependence in addition to model performance, and thoroughly testing the effectiveness of the approach used, will be key to a sound interpretation of the CMIP ensembles in future scientific studies.
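The bias introduced by near-replicates is easy to demonstrate numerically. Below is a toy sketch (not from the paper; all numbers are synthetic) showing how duplicating one model pulls an equally weighted ensemble mean toward that model's estimate:

```python
# Toy illustration (synthetic data, not any method from the paper): how
# near-replicate models bias an equally weighted ensemble mean.
import numpy as np

rng = np.random.default_rng(0)
truth = 0.0
# Five genuinely independent model estimates of some climate quantity.
independent = rng.normal(loc=truth, scale=1.0, size=5)

# Add three near-replicates of the first model (tiny perturbations only).
replicates = independent[0] + rng.normal(scale=0.05, size=3)
ensemble = np.concatenate([independent, replicates])

print("mean of independent models:", independent.mean())
print("mean with near-replicates: ", ensemble.mean())
# The second mean is pulled toward model 0, narrowing the apparent
# uncertainty without adding genuinely new information.
```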


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Anthony Finch ◽  
Alexander Crowell ◽  
Yung-Chieh Chang ◽  
Pooja Parameshwarappa ◽  
Jose Martinez ◽  
...  

Abstract. Objective: Attention networks learn an intelligent weighted-averaging mechanism over a series of entities, providing increases in both performance and interpretability. In this article, we propose a novel time-aware transformer-based network and compare it to another leading model with similar characteristics. We also decompose model performance along several critical axes and examine which features contribute most to our model's performance. Materials and Methods: Using data sets representing patient records obtained between 2017 and 2019 by the Kaiser Permanente Mid-Atlantic States medical system, we construct four attentional models with varying levels of complexity on two targets (patient mortality and hospitalization). We examine how incorporating transfer learning and demographic features contributes to model success. We also test the performance of a model proposed in recent medical modeling literature. We compare these models on out-of-sample data using the area under the receiver operating characteristic curve (AUROC) and average precision as measures of performance. We also analyze the attentional weights assigned by these models to patient diagnoses. Results: We found that our model significantly outperformed the alternative on a mortality prediction task (91.96% AUROC against 73.82% AUROC). Our model also outperformed on the hospitalization task, although the models were significantly more competitive in that space (82.41% AUROC against 80.33% AUROC). Furthermore, we found that demographic and transfer-learning features, which are frequently omitted from new models proposed in the EMR modeling space, contributed significantly to the success of our model. Discussion: We proposed an original construction of deep learning electronic medical record models which achieved very strong performance. We found that our unique model construction outperformed a leading literature alternative on several tasks, even when the input data were held constant between them. We obtained further improvements by incorporating several methods that are frequently overlooked in new model proposals, suggesting that it will be useful to explore these options further in the future.
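For readers who want to reproduce this style of comparison, the sketch below computes the two out-of-sample measures named in the abstract, AUROC and average precision, with scikit-learn; the labels and model scores are hypothetical placeholders, not the study's data:

```python
# Schematic out-of-sample comparison of two classifiers on AUROC and
# average precision; y_true and the score arrays are synthetic stand-ins.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)            # held-out labels
scores_a = y_true * 0.6 + rng.random(1000) * 0.7  # proposed model (stronger signal)
scores_b = y_true * 0.2 + rng.random(1000) * 0.9  # literature alternative

for name, s in [("model A", scores_a), ("model B", scores_b)]:
    print(name,
          "AUROC=%.4f" % roc_auc_score(y_true, s),
          "AP=%.4f" % average_precision_score(y_true, s))
```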


2015 ◽  
Vol 28 (6) ◽  
pp. 2332-2348 ◽  
Author(s):  
G. Abramowitz ◽  
C. H. Bishop

Abstract. Obtaining multiple estimates of future climate for a given emissions scenario is key to understanding the likelihood and uncertainty associated with climate-related impacts. This is typically done by collating model estimates from different research institutions internationally, with the assumption that they constitute independent samples. Heuristically, however, several factors undermine this assumption: shared treatment of processes between models, shared observed data for evaluation, and even shared model code. Here, a "perfect model" approach is used to test whether a previously proposed ensemble dependence transformation (EDT) can improve twenty-first-century Coupled Model Intercomparison Project (CMIP) projections. In these tests, where twenty-first-century model simulations are used as out-of-sample "observations," the mean-square difference between the transformed ensemble mean and "observations" is on average 30% less than for the untransformed ensemble mean. In addition, the variance of the transformed ensemble matches the variance of the ensemble mean about the "observations" much better than in the untransformed ensemble. Results show that the EDT has a significant effect on twenty-first-century projections of both surface air temperature and precipitation. It changes projected global average temperature increases by as much as 16% (0.2°C for the B1 scenario), regional average temperatures by as much as 2.6°C (RCP8.5 scenario), and regional average annual rainfall by as much as 410 mm (RCP6.0 scenario). In some regions, however, the effect is minimal. It is also found that the EDT causes changes to temperature projections that differ in sign for different emissions scenarios. This may be as much a function of the makeup of the ensembles as of the nature of the forcing conditions.
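A minimal sketch of the "perfect model" protocol described above: each model in turn serves as out-of-sample "observations" and the remaining ensemble mean is scored against it. The EDT itself is not reproduced here; `transform` is a placeholder where such a transformation could be slotted in:

```python
# Perfect-model evaluation loop: hold one model out as pseudo-observations,
# score the rest of the ensemble against it, and average over hold-outs.
import numpy as np

def perfect_model_mse(ensemble, transform=None):
    """ensemble: (n_models, n_points) array of 21st-century simulations."""
    n = ensemble.shape[0]
    errors = []
    for i in range(n):
        pseudo_obs = ensemble[i]               # held-out "observations"
        rest = np.delete(ensemble, i, axis=0)  # remaining ensemble
        if transform is not None:
            rest = transform(rest)             # e.g. the EDT would go here
        errors.append(np.mean((rest.mean(axis=0) - pseudo_obs) ** 2))
    return np.mean(errors)

ensemble = np.random.default_rng(2).normal(size=(10, 500))
print("untransformed MSE:", perfect_model_mse(ensemble))
```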


2020 ◽  
Author(s):  
Christopher O'Reilly ◽  
Daniel Befort ◽  
Antje Weisheimer

In this study, methods of calibrating the output of large single-model ensembles are examined. The methods broadly involve fitting seasonal ensemble data to observations over a reference period and scaling the ensemble signal and spread so as to optimize the fit over that period. These calibration methods are then applied to the future (or out-of-sample) projections. The calibration methods tested give indistinguishable results, so the simplest of them, homogeneous Gaussian regression, is selected. An extension to this method, applying it to dynamically decomposed data (in which the underlying data are separated into dynamical and residual components), is found to improve the reliability of the calibrated projections. The calibration methods were tested and verified using an "imperfect model" approach based on the historical/RCP8.5 simulations from the CMIP5 archive. The verification indicates that this relatively straightforward calibration produces more reliable and accurate projections than the uncalibrated (bias-corrected) ensemble for projections of future climate over Europe. When the two large ensembles are calibrated against observational data, the 2041-2060 climate projections for Europe under the RCP8.5 scenario are more consistent between the two ensembles, with a slight reduction in warming but an increase in the uncertainty of the projected changes.
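The signal-and-spread scaling idea can be sketched as follows; this is a simplified stand-in, assuming a linear fit of observations on the ensemble mean plus a spread rescaling, not the authors' exact homogeneous Gaussian regression:

```python
# Simplified signal-and-spread calibration sketch: fit obs = a + b * mean
# over a reference period, rescale member spread to the residual spread,
# and apply both to out-of-sample projections.
import numpy as np

def calibrate(ens_ref, obs_ref, ens_future):
    """ens_*: (n_members, n_times); obs_ref: (n_times,)."""
    mean_ref = ens_ref.mean(axis=0)
    b, a = np.polyfit(mean_ref, obs_ref, 1)   # signal scaling and offset
    residual_sd = np.std(obs_ref - (a + b * mean_ref))
    spread_sd = ens_ref.std(axis=0).mean()
    s = residual_sd / spread_sd               # spread inflation/deflation
    mean_fut = ens_future.mean(axis=0)
    # Scale the signal; scale member deviations to match residual spread.
    return a + b * mean_fut + s * (ens_future - mean_fut)

rng = np.random.default_rng(5)
ens = rng.normal(size=(20, 40))
obs = 0.5 + 1.2 * ens[:, :30].mean(axis=0) + rng.normal(scale=0.1, size=30)
print(calibrate(ens[:, :30], obs, ens[:, 30:]).shape)   # (20, 10)
```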


2020 ◽  
Author(s):  
Matt Amos ◽  
Paul J. Young ◽  
J. Scott Hosking ◽  
Jean-François Lamarque ◽  
N. Luke Abraham ◽  
...  

Abstract. The current method for averaging model ensembles, calculating a multi-model mean, assumes model independence and equal model skill. Sharing of model components amongst families of models and research centres, compounded by growing ensemble size, means model independence cannot be assumed and is hard to quantify. We present a methodology to produce a weighted model ensemble projection, accounting for model performance and model independence. Model weights are calculated by comparing model hindcasts to a selection of metrics chosen for their physical relevance to the process or phenomenon of interest. This weighting methodology is applied to the Chemistry-Climate Model Initiative (CCMI) ensemble to investigate Antarctic ozone depletion and subsequent recovery. The weighted mean projects ozone recovery to 1980 levels by 2056, with a 95 % confidence interval of 2052–2060, 4 years earlier than the most recent study. Perfect-model testing and out-of-sample testing validate the results and show greater projective skill than a standard multi-model mean. Interestingly, the construction of a weighted mean also provides insight into model performance and dependence between the models. This weighting methodology is robust to both model and metric choices and therefore has potential applications throughout the climate and chemistry-climate modelling communities.
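A sketch of performance-and-independence weighting in the spirit described above (the Gaussian shape parameters and the distance definitions are illustrative assumptions, not the paper's calibrated values):

```python
# Performance-and-independence weights: skill raises a model's weight,
# redundancy with other models lowers it.
import numpy as np

def ensemble_weights(D, S, sigma_d=0.5, sigma_s=0.5):
    """D: (n,) model-to-observation distances over the chosen metrics;
    S: (n, n) model-to-model distances; returns normalised weights."""
    performance = np.exp(-(D / sigma_d) ** 2)
    similarity = np.exp(-(S / sigma_s) ** 2)
    np.fill_diagonal(similarity, 0.0)
    # A model surrounded by similar models is down-weighted as redundant.
    dependence = 1.0 + similarity.sum(axis=1)
    w = performance / dependence
    return w / w.sum()

D = np.array([0.3, 0.35, 0.9])
S = np.array([[0.0, 0.1, 1.0],
              [0.1, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
print(ensemble_weights(D, S))  # the near-duplicate pair shares its weight
```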


2021 ◽  
pp. 1-59
Author(s):  
Benjamin D. Santer ◽  
Stephen Po-Chedley ◽  
Carl Mears ◽  
John C. Fyfe ◽  
Nathan Gillett ◽  
...  

Abstract. We compare atmospheric temperature changes in satellite data and in model ensembles performed under phases 5 and 6 of the Coupled Model Intercomparison Project (CMIP5 and CMIP6). In the lower stratosphere, multi-decadal stratospheric cooling during the period of strong ozone depletion is smaller in newer CMIP6 simulations than in CMIP5 or satellite data. In the troposphere, however, despite forcing and climate sensitivity differences between the two CMIP ensembles, their ensemble-average global warming over 1979-2019 is very similar. We also examine four properties of tropical behavior governed by basic physical processes. The first three are ratios between trends in water vapor (WV) and trends in sea surface temperature (SST), lower-tropospheric temperature (TLT), and mid- to upper-tropospheric temperature (TMT). The fourth property is the ratio between TMT and SST trends. All four ratios are tightly constrained in CMIP simulations but diverge markedly in observations. Model trend ratios between WV and temperature are closest to observed ratios when the latter are calculated with data sets exhibiting larger tropical warming of the ocean surface and troposphere. For the TMT/SST ratio, model-data consistency depends on the combination of observations used to estimate TMT and SST trends. If model expectations of these four covariance relationships are realistic, our findings reflect either a systematic low bias in satellite tropospheric temperature trends or an overestimate of the observed atmospheric moistening signal. It is currently difficult to determine which interpretation is more credible. Nevertheless, our analysis reveals anomalous covariance behavior in several observational data sets and illustrates the diagnostic power of simultaneously considering multiple complementary variables.
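The trend ratios discussed here are simple to compute once the time series are in hand; the sketch below uses hypothetical tropical-mean series for WV and SST:

```python
# Illustrative trend-ratio computation: least-squares trends of two
# synthetic tropical-mean series and their ratio, e.g. WV vs. SST.
import numpy as np

def trend(series, years):
    return np.polyfit(years, series, 1)[0]   # slope per year

years = np.arange(1979, 2020)
rng = np.random.default_rng(3)
sst = 0.015 * (years - years[0]) + rng.normal(scale=0.1, size=years.size)
wv = 0.11 * (years - years[0]) + rng.normal(scale=0.5, size=years.size)

print("WV/SST trend ratio: %.2f" % (trend(wv, years) / trend(sst, years)))
```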


2020 ◽  
Vol 11 (3) ◽  
pp. 807-834 ◽  
Author(s):  
Anna Louise Merrifield ◽  
Lukas Brunner ◽  
Ruth Lorenz ◽  
Iselin Medhaug ◽  
Reto Knutti

Abstract. Multi-model ensembles can be used to estimate uncertainty in projections of regional climate, but this uncertainty often depends on the constituents of the ensemble. The dependence of uncertainty on ensemble composition is clear when single-model initial-condition large ensembles (SMILEs) are included within a multi-model ensemble. SMILEs allow for the quantification of internal variability, a non-negligible component of uncertainty on regional scales, but may also serve to inappropriately narrow uncertainty by giving a single model many additional votes. In advance of the mixed multi-model and SMILE Coupled Model Intercomparison Project Phase 6 (CMIP6) ensemble, we investigate weighting approaches to incorporate 50 members of the Community Earth System Model (CESM1.2.2-LE), 50 members of the Canadian Earth System Model (CanESM2-LE), and 100 members of the MPI Grand Ensemble (MPI-GE) into an 88-member Coupled Model Intercomparison Project Phase 5 (CMIP5) ensemble. The weights assigned are based on the ability to reproduce observed climate (performance) and scaled by a measure of redundancy (dependence). Surface air temperature (SAT) and sea level pressure (SLP) predictors are used to determine the weights, and relationships between present and future predictor behavior are discussed. The estimated residual thermodynamic trend is proposed as an alternative predictor to replace 50-year regional SAT trends, which are more susceptible to internal variability. Uncertainty in estimates of northern European winter and Mediterranean summer end-of-century warming is assessed in a CMIP5 and a combined SMILE–CMIP5 multi-model ensemble. Five different weighting strategies to account for the mix of initial-condition (IC) ensemble members and individually represented models within the multi-model ensemble are considered. Allowing all multi-model ensemble members to receive either equal weight or solely a performance weight (based on the root mean square error (RMSE) between members and observations over nine predictors) is shown to lead to uncertainty estimates that are dominated by the presence of SMILEs. A more suitable approach includes a dependence assumption, scaling either by 1/N, the number of constituents representing a "model", or by the same RMSE distance metric used to define model performance. SMILE contributions to the weighted ensemble are smallest (<10 %) when a model is defined as an IC ensemble and increase slightly (<20 %) when the definition of a model expands to include members from the same institution and/or development stream. SMILE contributions increase further when dependence is defined by RMSE (over nine predictors) amongst members, because RMSEs between SMILE members can be as large as RMSEs between SMILE members and other models. We find that an alternative RMSE distance metric, derived from global SAT and hemispheric SLP climatology, is better able to identify IC members in general and SMILE members in particular as members of the same model. Further, more subtle dependencies associated with resolution differences and component similarities are also identified by the global predictor set.
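The 1/N dependence scaling is the most transparent of these strategies; a minimal sketch, assuming each member carries a "model" label:

```python
# 1/N dependence scaling: each member's weight is divided by the number
# of members sharing its "model" label, so a 50-member SMILE collectively
# counts like one CMIP model.
from collections import Counter

def one_over_n_weights(model_labels):
    counts = Counter(model_labels)
    w = [1.0 / counts[m] for m in model_labels]
    total = sum(w)
    return [wi / total for wi in w]

labels = ["CESM1.2.2-LE"] * 50 + ["CanESM2-LE"] * 50 + ["MPI-GE"] * 100 \
         + [f"CMIP5-{i}" for i in range(88)]
weights = one_over_n_weights(labels)
print("total weight on CESM1.2.2-LE: %.3f" % sum(weights[:50]))  # 1/91
```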


2019 ◽  
Vol 23 (10) ◽  
pp. 4323-4331 ◽  
Author(s):  
Wouter J. M. Knoben ◽  
Jim E. Freer ◽  
Ross A. Woods

Abstract. A traditional metric used in hydrology to summarize model performance is the Nash–Sutcliffe efficiency (NSE). Increasingly an alternative metric, the Kling–Gupta efficiency (KGE), is used instead. When NSE is used, NSE = 0 corresponds to using the mean flow as a benchmark predictor. The same reasoning is applied in various studies that use KGE as a metric: negative KGE values are viewed as bad model performance, and only positive values are seen as good model performance. Here we show that using the mean flow as a predictor does not result in KGE = 0, but instead KGE = 1 − √2 ≈ −0.41. Thus, KGE values greater than −0.41 indicate that a model improves upon the mean flow benchmark – even if the model's KGE value is negative. NSE and KGE values cannot be directly compared, because their relationship is non-unique and depends in part on the coefficient of variation of the observed time series. Therefore, modellers who use the KGE metric should not let their understanding of NSE values guide them in interpreting KGE values and instead develop new understanding based on the constitutive parts of the KGE metric and the explicit use of benchmark values to compare KGE scores against. More generally, a strong case can be made for moving away from ad hoc use of aggregated efficiency metrics and towards a framework based on purpose-dependent evaluation metrics and benchmarks that allows for more robust model adequacy assessment.
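The result follows directly from the KGE decomposition, KGE = 1 − √((r−1)² + (α−1)² + (β−1)²): a constant mean-flow prediction gives α = 0 (no variability) and β = 1 (no bias), and taking r = 0 (correlation with a constant series is undefined) yields 1 − √2. A minimal sketch:

```python
# KGE (Gupta et al. 2009 decomposition) and its mean-flow benchmark.
import numpy as np

def kge(sim, obs):
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    # Correlation with a constant simulation is undefined; use 0 here.
    r = 0.0 if sim.std() == 0 else np.corrcoef(sim, obs)[0, 1]
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
print(kge(np.full_like(obs, obs.mean()), obs))  # -> 1 - sqrt(2) ≈ -0.414
```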


2021 ◽  
Author(s):  
Ali Abdolali ◽  
Andre van der Westhuysen ◽  
Zaizhong Ma ◽  
Avichal Mehra ◽  
Aron Roland ◽  
...  

Abstract. Various uncertainties exist in a hindcast due to the inability of numerical models to resolve all the complicated atmosphere-sea interactions, and the lack of certain ground-truth observations. Here, a comprehensive analysis of an atmospheric model's performance in hindcast mode (the Hurricane Weather Research and Forecasting model, HWRF) and its 40 ensemble members during severe events is conducted, evaluating the model's accuracy and uncertainty for hurricane track parameters and for wind speed collected along satellite altimeter tracks and at stationary source point observations. Subsequently, the downstream spectral wave model WAVEWATCH III is forced by two sets of wind field data, each including 40 members. The first set is randomly extracted from the original HWRF simulations and the second is based on the spread of best-track parameters. The atmospheric model spread and wave model error along satellite altimeter tracks and at stationary source point observations are estimated. The study on Hurricane Irma reveals that wind and wave observations during this extreme event lie within the ensemble spreads. While both models have wide spreads over areas with landmass, the maximum uncertainty in the atmospheric model is at the hurricane eye, in contrast to the wave model.
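One simple way to check the claim that observations lie within the ensemble spread is an envelope test; the sketch below uses hypothetical station data, not the study's HWRF or WAVEWATCH III output:

```python
# Envelope test: fraction of (synthetic) station observations falling
# inside a 40-member ensemble's min-max spread at each time/location.
import numpy as np

rng = np.random.default_rng(4)
members = rng.normal(loc=20.0, scale=3.0, size=(40, 200))  # wind speed, m/s
obs = rng.normal(loc=20.0, scale=2.5, size=200)            # station obs

lo, hi = members.min(axis=0), members.max(axis=0)
inside = np.mean((obs >= lo) & (obs <= hi))
print("fraction of observations within ensemble spread: %.2f" % inside)
```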

