Towards improved and more routine Earth system model evaluation in CMIP
Abstract. The Coupled Model Intercomparison Project (CMIP) has successfully provided the climate community with a rich collection of simulation output from Earth system models (ESMs) that can be used to understand past climate changes and to make projections and uncertainty estimates of the future. Confidence in ESMs can be gained because the models are based on physical principles and reproduce many important aspects of observed climate. Scientifically, more research is required to identify the processes that are most responsible for systematic biases and for the magnitude and uncertainty of future projections, so that more relevant performance tests can be developed. At the same time, many aspects of ESM evaluation are well established and considered an essential part of systematic evaluation, yet they are currently implemented ad hoc with little community coordination. Given the diversity and complexity of ESM analysis, we argue that the CMIP community has reached a critical juncture at which many baseline aspects of model evaluation need to be performed much more efficiently to enable a systematic, open and rapid performance assessment of the large and diverse set of models that will participate in current and future phases of CMIP. Accomplishing this could also free up valuable resources, as many scientists frequently "re-invent the wheel" by rewriting analysis routines for well-established analysis methods. A more systematic approach for the community would be to develop evaluation tools that are well suited for routine use and that provide a wide range of diagnostics and performance metrics to comprehensively characterize model behaviour as soon as the output is published to the Earth System Grid Federation (ESGF).
The CMIP infrastructure enforces data standards and conventions for model output accessible via ESGF, and it additionally publishes observations (obs4MIPs) and reanalyses (ana4MIPs) for Model Intercomparison Projects using the same data structure and organization. This greatly facilitates routine evaluation of the models, but to process the data automatically alongside ESGF, the infrastructure needs to be extended with processing capabilities at the ESGF data nodes, where the evaluation tools can be executed on a routine basis. Efforts are already underway to develop community-based evaluation tools, and we encourage experts to provide additional diagnostic codes that would enhance this capability for CMIP. Likewise, we encourage the community to contribute observations for model evaluation to the obs4MIPs archive. The intention is to produce, through ESGF, a widely accepted quasi-operational evaluation framework for climate models that would routinely execute a series of standardized evaluation tasks. Over time, as the capability matures, we expect to produce an increasingly systematic characterization of models that, compared with earlier phases of CMIP, will more quickly and openly identify the strengths and weaknesses of the simulations. This will also reveal whether long-standing model errors persist in newer models and will help modelling groups improve their models. The framework will be designed to readily incorporate updates, including new observations and additional diagnostics and metrics as they become available from the research community.