Towards improved and more routine Earth system model evaluation in CMIP

2016 ◽  
Vol 7 (4) ◽  
pp. 813-830 ◽  
Author(s):  
Veronika Eyring ◽  
Peter J. Gleckler ◽  
Christoph Heinze ◽  
Ronald J. Stouffer ◽  
Karl E. Taylor ◽  
...  

Abstract. The Coupled Model Intercomparison Project (CMIP) has successfully provided the climate community with a rich collection of simulation output from Earth system models (ESMs) that can be used to understand past climate changes and make projections and uncertainty estimates of the future. Confidence in ESMs can be gained because the models are based on physical principles and reproduce many important aspects of observed climate. More research is required to identify the processes that are most responsible for systematic biases and the magnitude and uncertainty of future projections so that more relevant performance tests can be developed. At the same time, there are many aspects of ESM evaluation that are well established and considered an essential part of systematic evaluation but have been implemented ad hoc with little community coordination. Given the diversity and complexity of ESM analysis, we argue that the CMIP community has reached a critical juncture at which many baseline aspects of model evaluation need to be performed much more efficiently and consistently. Here, we provide a perspective and viewpoint on how a more systematic, open, and rapid performance assessment of the large and diverse number of models that will participate in current and future phases of CMIP can be achieved, and announce our intention to implement such a system for CMIP6. Accomplishing this could also free up valuable resources as many scientists are frequently "re-inventing the wheel" by re-writing analysis routines for well-established analysis methods. A more systematic approach for the community would be to develop and apply evaluation tools that are based on the latest scientific knowledge and observational references, are well suited for routine use, and provide a wide range of diagnostics and performance metrics that comprehensively characterize model behaviour as soon as the output is published to the Earth System Grid Federation (ESGF). The CMIP infrastructure enforces data standards and conventions for model output and documentation accessible via the ESGF, additionally publishing observations (obs4MIPs) and reanalyses (ana4MIPs) for model intercomparison projects using the same data structure and organization as the ESM output. This largely facilitates routine evaluation of the ESMs, but to be able to process the data automatically alongside the ESGF, the infrastructure needs to be extended with processing capabilities at the ESGF data nodes where the evaluation tools can be executed on a routine basis. Efforts are already underway to develop community-based evaluation tools, and we encourage experts to provide additional diagnostic codes that would enhance this capability for CMIP. At the same time, we encourage the community to contribute observations and reanalyses for model evaluation to the obs4MIPs and ana4MIPs archives. The intention is to produce through the ESGF a widely accepted quasi-operational evaluation framework for CMIP6 that would routinely execute a series of standardized evaluation tasks. Over time, as this capability matures, we expect to produce an increasingly systematic characterization of models which, compared with early phases of CMIP, will more quickly and openly identify the strengths and weaknesses of the simulations. This will also reveal whether long-standing model errors remain evident in newer models and will assist modelling groups in improving their models. This framework will be designed to readily incorporate updates, including new observations and additional diagnostics and metrics as they become available from the research community.
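
As one concrete example of the kind of baseline performance metric such a framework could execute routinely, the sketch below computes an area-weighted root-mean-square error between a model field and an observational reference on a shared latitude-longitude grid. This is our illustration of the general technique, not code from the paper; the toy fields and grid are placeholders.

```python
import numpy as np

def area_weighted_rmse(model, obs, lat):
    """Area-weighted RMSE between two fields on a regular lat-lon grid.

    model, obs : 2-D arrays with shape (nlat, nlon), already on the
                 same grid (e.g. a seasonal-mean surface temperature).
    lat        : 1-D array of latitudes in degrees.
    """
    # Grid-cell area on a regular lat-lon grid scales with cos(latitude).
    weights = np.broadcast_to(np.cos(np.deg2rad(lat))[:, np.newaxis],
                              model.shape)
    return np.sqrt(np.average((model - obs) ** 2, weights=weights))

# Hypothetical example: compare a toy model climatology against a toy
# observational reference on a 1-degree grid.
lat = np.linspace(-89.5, 89.5, 180)
lon = np.linspace(0.5, 359.5, 360)
rng = np.random.default_rng(0)
obs = 288.0 + 30.0 * np.cos(np.deg2rad(lat))[:, None] + 0 * lon
model = obs + rng.normal(0.0, 1.0, size=obs.shape)
print(f"area-weighted RMSE: {area_weighted_rmse(model, obs, lat):.2f} K")
```

Applying one and the same metric implementation to every model as its output arrives on the ESGF is precisely what makes the resulting numbers comparable across the multi-model ensemble.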



2020 ◽  
Vol 13 (7) ◽  
pp. 2945-2958 ◽  
Author(s):  
Duane Waliser ◽  
Peter J. Gleckler ◽  
Robert Ferraro ◽  
Karl E. Taylor ◽  
Sasha Ames ◽  
...  

Abstract. The Observations for Model Intercomparison Project (Obs4MIPs) was initiated in 2010 to facilitate the use of observations in climate model evaluation and research, with a particular target being the Coupled Model Intercomparison Project (CMIP), a major initiative of the World Climate Research Programme (WCRP). To this end, Obs4MIPs (1) targets observed variables that can be compared to CMIP model variables; (2) utilizes dataset formatting specifications and metadata requirements closely aligned with CMIP model output; (3) provides brief technical documentation for each dataset, designed for nonexperts and tailored towards relevance for model evaluation, including information on uncertainty, dataset merits, and limitations; and (4) disseminates the data through the Earth System Grid Federation (ESGF) platforms, making the observations searchable and accessible via the same portals as the model output. Taken together, these characteristics of the organization and structure of obs4MIPs should entice a more diverse community of researchers to engage in the comparison of model output with observations and to contribute to a more comprehensive evaluation of the climate models. At present, the number of obs4MIPs datasets has grown to about 80; many are undergoing updates, with another 20 or so in preparation, and more than 100 are proposed and under consideration. A partial list of current global satellite-based datasets includes humidity and temperature profiles; a wide range of cloud and aerosol observations; ocean surface wind, temperature, height, and sea ice fraction; surface and top-of-atmosphere longwave and shortwave radiation; and ozone (O3), methane (CH4), and carbon dioxide (CO2) products. A partial list of proposed products expected to be useful in analyzing CMIP6 results includes the following: alternative products for the above quantities, additional products for ocean surface flux and chlorophyll products, a number of vegetation products (e.g., FAPAR, LAI, burned area fraction), ice sheet mass and height, carbon monoxide (CO), and nitrogen dioxide (NO2). While most existing obs4MIPs datasets consist of monthly-mean gridded data over the global domain, products with higher time resolution (e.g., daily) and/or regional products are now receiving more attention. Along with an increasing number of datasets, obs4MIPs has implemented a number of capability upgrades including (1) an updated obs4MIPs data specifications document that provides additional search facets and generally improves congruence with CMIP6 specifications for model datasets, (2) a set of six easily understood indicators that help guide users as to a dataset's maturity and suitability for application, and (3) an option to supply supplemental information about a dataset beyond what can be found in the standard metadata. With the maturation of the obs4MIPs framework, the dataset inclusion process, and the dataset formatting guidelines and resources, the scope of the observations being considered is expected to grow to include gridded in situ datasets as well as datasets with a regional focus, and the ultimate intent is to judiciously expand this scope to any observation dataset that has applicability for evaluation of the types of Earth system models used in CMIP.
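
To give a feel for what the alignment of observational datasets with CMIP output conventions buys in practice, here is a minimal, hypothetical check that opens a NetCDF file with xarray and reports whether a few commonly required global attributes are present. The attribute list is an assumption for illustration; the authoritative list lives in the obs4MIPs data specifications document.

```python
import xarray as xr

# Global attributes of the kind required by CMIP-style data specifications.
# Treat this selection as illustrative only; obs4MIPs defines the exact
# list in its data specifications document.
EXPECTED_ATTRS = ["source_id", "institution_id", "frequency", "variable_id"]

def report_metadata(path):
    """Open a NetCDF file and report presence of expected global attributes."""
    ds = xr.open_dataset(path)
    for attr in EXPECTED_ATTRS:
        status = "present" if attr in ds.attrs else "MISSING"
        print(f"{attr:>16}: {status}")
    # Uniform metadata lets the same evaluation code locate and read the
    # variable in model output and in the observational reference alike.
    return ds

# report_metadata("ta_obs4MIPs_example.nc")  # file name is hypothetical
```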


2021 ◽  
Vol 166 (1-2) ◽  
Author(s):  
Charlie Wilson ◽  
Céline Guivarch ◽  
Elmar Kriegler ◽  
Bas van Ruijven ◽  
Detlef P. van Vuuren ◽  
...  

Abstract. Process-based integrated assessment models (IAMs) project long-term transformation pathways in energy and land-use systems under what-if assumptions. IAM evaluation is necessary to improve the models’ usefulness as scientific tools applicable in the complex and contested domain of climate change mitigation. We contribute the first comprehensive synthesis of process-based IAM evaluation research, drawing on a wide range of examples across six different evaluation methods including historical simulations, stylised facts, and model diagnostics. For each evaluation method, we identify progress and milestones to date, and draw out lessons learnt as well as challenges remaining. We find that each evaluation method has distinctive strengths, as well as constraints on its application. We use these insights to propose a systematic evaluation framework combining multiple methods to establish the appropriateness, interpretability, credibility, and relevance of process-based IAMs as useful scientific tools for informing climate policy. We also set out a programme of evaluation research to be mainstreamed both within and outside the IAM community.


2020 ◽  
Vol 13 (7) ◽  
pp. 3383-3438 ◽  
Author(s):  
Veronika Eyring ◽  
Lisa Bock ◽  
Axel Lauer ◽  
Mattia Righi ◽  
Manuel Schlund ◽  
...  

Abstract. The Earth System Model Evaluation Tool (ESMValTool) is a community diagnostics and performance metrics tool designed to improve comprehensive and routine evaluation of Earth system models (ESMs) participating in the Coupled Model Intercomparison Project (CMIP). It has undergone rapid development since the first release in 2016 and is now a well-tested tool that provides end-to-end provenance tracking to ensure reproducibility. It consists of (1) an easy-to-install, well-documented Python package providing the core functionalities (ESMValCore) that performs common preprocessing operations and (2) a diagnostic part that includes tailored diagnostics and performance metrics for specific scientific applications. Here we describe large-scale diagnostics of the second major release of the tool that supports the evaluation of ESMs participating in CMIP Phase 6 (CMIP6). ESMValTool v2.0 includes a large collection of diagnostics and performance metrics for atmospheric, oceanic, and terrestrial variables for the mean state, trends, and variability. ESMValTool v2.0 also successfully reproduces figures from the evaluation and projections chapters of the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) and incorporates updates from targeted analysis packages, such as the NCAR Climate Variability Diagnostics Package for the evaluation of modes of variability, the Thermodynamic Diagnostic Tool (TheDiaTo) to evaluate the energetics of the climate system, as well as parts of AutoAssess, which contains a mix of top-down performance metrics. The tool has been fully integrated into the Earth System Grid Federation (ESGF) infrastructure at the Deutsches Klimarechenzentrum (DKRZ) to provide evaluation results from CMIP6 model simulations shortly after the output is published to the CMIP archive. A result browser has been implemented that enables advanced monitoring of the evaluation results by a broad user community on much faster timescales than was possible in CMIP5.
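
The division of labour between common preprocessing (ESMValCore) and science-specific diagnostics can be pictured with a short xarray sketch. This is a conceptual illustration of the pattern only, not ESMValCore's actual API; file and variable names are hypothetical.

```python
import xarray as xr

def preprocess(path, variable, start="1995-01-01", end="2014-12-31"):
    """Emulate a common preprocessing chain: load, subset, climatology.

    ESMValCore provides operations of this kind (and many more, such as
    regridding and masking) behind a declarative recipe; this function
    only illustrates the idea of a reusable preprocessing stage.
    """
    da = xr.open_dataset(path)[variable]
    da = da.sel(time=slice(start, end))      # fixed evaluation period
    return da.groupby("time.month").mean()   # monthly climatology

def bias_diagnostic(model_clim, obs_clim):
    """A minimal 'diagnostic': the annual-mean bias field."""
    return (model_clim - obs_clim).mean("month")

# File names are hypothetical; in the real tool, datasets are declared
# in a YAML recipe and located automatically.
# bias = bias_diagnostic(preprocess("model_tas.nc", "tas"),
#                        preprocess("obs_tas.nc", "tas"))
```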


2021 ◽  
Author(s):  
Bouwe Andela ◽  
Fakhereh Alidoost ◽  
Lukas Brunner ◽  
Jaro Camphuijsen ◽  
Bas Crezee ◽  
...  

The Earth System Model Evaluation Tool (ESMValTool) is a free and open-source community diagnostic and performance metrics tool for the evaluation of Earth system models such as those participating in the Coupled Model Intercomparison Project (CMIP). Version 2 of the tool (Righi et al. 2020, www.esmvaltool.org) features a brand-new design composed of a core that finds and processes data according to a ‘recipe’ and an extensive collection of ready-to-use recipes and associated diagnostic codes for reproducing results from published papers. Development and discussion of the tool (mostly) take place in public on https://github.com/esmvalgroup, and anyone with an interest in climate model evaluation is welcome to join there.

Since the initial release of version 2 in the summer of 2020, many improvements have been made to the tool. It is now more user friendly, with extensive documentation available on docs.esmvaltool.org and a step-by-step online tutorial. Regular releases, currently planned three times a year, ensure that recent contributions become available quickly while still ensuring a high level of quality control. The tool can be installed via conda, and portable Docker and Singularity containers are also available.

Recent new features include a more user-friendly command-line interface, citation information per figure including CMIP6 data citation using ES-DOC, more and faster preprocessor functions that require less memory, automatic corrections for a larger number of CMIP6 datasets, support for more observational and reanalysis datasets, and more recipes and diagnostics.

The tool is now also more reliable, with improved automated testing through more unit tests for the core, as well as a recipe testing service running at DKRZ that tests the scientific recipes and diagnostics bundled into the tool. The community maintaining and developing the tool is growing, making the project less dependent on individual contributors. There are now technical and scientific review teams that review new contributions for technical quality and for scientific correctness and relevance, respectively; two new principal investigators to build a larger support base in the community; and a newly created user engagement team that takes care of improving the overall user experience.
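
For readers unfamiliar with the ‘recipe’ concept, the following schematic mimics the shape of a recipe as a plain Python structure. Real recipes are YAML files, and the keys shown here approximate published examples rather than the authoritative schema.

```python
# A schematic stand-in for an ESMValTool recipe, written as a Python
# dict for illustration. Real recipes are YAML files; the key names
# below approximate published examples, not the authoritative schema.
recipe = {
    "documentation": {"description": "Near-surface air temperature bias"},
    "datasets": [
        # Hypothetical model entry; the core resolves such entries to files.
        {"dataset": "EXAMPLE-ESM", "project": "CMIP6",
         "exp": "historical", "ensemble": "r1i1p1f1"},
    ],
    "preprocessors": {
        "clim": {
            "regrid": {"target_grid": "2x2", "scheme": "linear"},
            "climate_statistics": {"operator": "mean"},
        },
    },
    "diagnostics": {
        "tas_bias": {
            "variables": {"tas": {"preprocessor": "clim"}},
            "scripts": {"plot": {"script": "examples/diagnostic.py"}},
        },
    },
}
```

Keeping the data selection and preprocessing declarative is what allows the same diagnostic script to be reused across models, experiments, and observational references.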


2021 ◽  
Author(s):  
Jerome Servonnat ◽  
Eric Guilyardi ◽  
Zofia Stott ◽  
Kim Serradell ◽  
Axel Lauer ◽  
...  

Developing an Earth system model evaluation tool for a broad user community is a real challenge, as potential users do not necessarily have the same needs or expectations. While many evaluation tasks across user communities include common steps, significant differences are also apparent, not least the investment by institutions and individuals in bespoke tools. A key question is whether there is sufficient common ground to pursue a community tool with broad appeal and application.

We present the main results of a survey carried out by Assimila for the H2020 IS-ENES3 project to review the model evaluation needs of European Earth system modelling communities. Based on interviews with approximately 30 participants from several European institutions, the survey targeted a broad range of users, including model developers, model users, evaluation data providers, and infrastructure providers. The output of the study provides an analysis of requirements focusing on key technical, standards, and governance aspects.

The study used ESMValTool as the current benchmark among European evaluation tools. It is a community diagnostics and performance metrics tool for the evaluation of Earth system models that allows for comparison of single or multiple models, either against predecessor versions or against observations. The tool is being developed in such a way that additional analyses can be added. As a community effort open to both users and developers, it encourages open exchange of diagnostic source code and evaluation results. It is currently used in Coupled Model Intercomparison Projects as well as for the development and testing of “new” models.

A key result of the survey is the widespread support for ESMValTool amongst users, developers, and even those who have taken or promote other approaches. The results of the survey identify priorities and opportunities in the further development of ESMValTool to ensure its long-term adoption by a broad community.


2020 ◽  
Author(s):  
Valeriu Predoi ◽  
Bouwe Andela ◽  
Lee De Mora ◽  
Axel Lauer

The Earth System Model Evaluation Tool (ESMValTool) is a powerful community-driven diagnostics and performance metrics tool. It is used for the evaluation of Earth system models (ESMs) and allows for routine comparisons of either multiple model versions or observational datasets. ESMValTool's design is highly modular and flexible so that additional analyses can easily be added; in fact, this is essential to encourage the community-based approach to its scientific development. A set of standardized recipes for each scientific topic reproduces specific diagnostics or performance metrics that have demonstrated their importance in ESM evaluation in the peer-reviewed literature. Scientific themes include selected Essential Climate Variables; a range of known systematic biases common to ESMs, such as coupled tropical climate variability, monsoons, Southern Ocean processes, continental dry biases, and soil hydrology-climate interactions; as well as atmospheric CO2 budgets, tropospheric and stratospheric ozone, and tropospheric aerosols. We will outline the main functional characteristics of ESMValTool version 2; we will also introduce the reader to the current set of diagnostics and the methods they can use to contribute to its development.


2017 ◽  
Author(s):  
Karthik Kumarasamy ◽  
Patrick Belmont

Abstract. Watershed-scale models simulating hydrology and water quality have advanced rapidly in sophistication, process representation, flexibility in model structure, and input data. Given the importance of these models in supporting decision-making for a wide range of environmental issues, the hydrology community is compelled to improve the metrics used to evaluate model performance. More targeted and comprehensive metrics will facilitate better and more efficient calibration and will help demonstrate that a model is useful for its intended purpose. Here we introduce a suite of new tools for model evaluation, packaged as an open-source Hydrologic Model Evaluation (HydroME) Toolbox. Specifically, we demonstrate the use of box plots to illustrate the full distribution of common model performance metrics such as R², and the use of Euclidean distance, empirical quantile-quantile (Q-Q) plots, and flow duration curves as simple metrics to identify and localize errors in model simulations. Further, we demonstrate the use of magnitude-squared coherence to compare the frequency content of observed and modeled streamflow and of wavelet coherence to localize frequency mismatches in time. We provide a rationale for a hierarchical selection of parameters to adjust during calibration and recommend that modelers progress from the parameters with the most uncertainty to those with the least, namely starting with pure calibration parameters, followed by derived parameters, and finally measured parameters. We apply these techniques in the calibration and evaluation of models of two watersheds, the Le Sueur River Basin (2880 km²) and the Root River Basin (4300 km²) in southern Minnesota, USA.
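
To illustrate one of the frequency-domain comparisons described above, the sketch below estimates the magnitude-squared coherence between an observed and a simulated streamflow series using scipy.signal.coherence. It is a minimal sketch with synthetic series standing in for gauge observations and model output, not the HydroME Toolbox code itself.

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(42)
t = np.arange(3650)  # ten years of daily time steps

# Synthetic stand-ins: a seasonal cycle plus noise for the "observed"
# flow, and a model that captures the annual signal but adds its own noise.
observed = 10 + 5 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 1, t.size)
modeled = 10 + 5 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 2, t.size)

# Magnitude-squared coherence: values near 1 indicate that the model
# reproduces the observed variability at that frequency.
f, Cxy = coherence(observed, modeled, fs=1.0, nperseg=730)
annual = np.argmin(np.abs(f - 1 / 365.25))
print(f"coherence at the annual frequency: {Cxy[annual]:.2f}")
```

A single aggregate score such as R² would hide which timescales the model gets right; the coherence spectrum makes such frequency-dependent errors visible.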


2020 ◽  
Author(s):  
Ngan Thi Dong ◽  
Megha Khosla

Abstract. Motivation: A variety of machine learning based approaches have been applied to predicting miRNA-disease associations. Although promising, the evaluation setup used to measure prediction performance is inconsistent, making it difficult to assess the actual progress. A more acute problem is that most of the models overlook the problem of data leakage due to the use of precomputed miRNA and disease similarity features. Results: We unearth a crucial problem of data leakage in the evaluation of machine learning models for miRNA-disease association prediction. In particular, information from the test set, in the form of precomputed input features for miRNAs and diseases, is used during training of the model. Moreover, we point out problems in the performance metrics widely used for model evaluation. While resolving the issues of data leakage and model evaluation, we perform an in-depth study of 3 recent models along with our 9 proposed variants of these models. Our proposed variants improve Average Precision scores over the original models by approximately 287.7% and 36.7% on the HMDDv2.0 (AP: 0.504) and HMDDv3.0 (AP: 0.216) datasets, respectively. Availability and implementation: We release a unified evaluation framework including all models and datasets at https://git.l3s.uni-hannover.de/dong/simplifying_mirna_disease.
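
The leakage pattern described here generalizes: any feature precomputed over the full dataset can carry test-set label information into training. The sketch below, our illustration rather than the paper's code, shows a leakage-free construction of a similarity feature that uses only training-set positives, scored with average precision; the data and model are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic association data: raw profiles for 300 entities, binary labels.
X_raw = rng.normal(size=(300, 20))
y = (X_raw[:, :3].sum(axis=1) + rng.normal(0, 1, 300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y, test_size=0.3, random_state=0, stratify=y)

def similarity_to_train_positives(X, X_train, y_train):
    """Mean cosine similarity of each row of X to the *training* positives.

    Computing this against all positives (including test ones) would leak
    label information from the test set into the features.
    """
    positives = X_train[y_train == 1]
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Pn = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    return (Xn @ Pn.T).mean(axis=1, keepdims=True)

F_tr = np.hstack([X_tr, similarity_to_train_positives(X_tr, X_tr, y_tr)])
F_te = np.hstack([X_te, similarity_to_train_positives(X_te, X_tr, y_tr)])

clf = LogisticRegression(max_iter=1000).fit(F_tr, y_tr)
ap = average_precision_score(y_te, clf.predict_proba(F_te)[:, 1])
print(f"average precision (leakage-free features): {ap:.3f}")
```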

