A Standard for the FAIR publication of Atmospheric Model Data developed by the AtMoDat Project

Due to the increasing amount of data produced in science, concepts for data reusability are of immense importance. One aspect is publishing data in a way that ensures they are findable, accessible, interoperable and reusable (the FAIR [1] principles). However, putting these principles into practice often causes significant difficulties for researchers. Some repositories therefore accept datasets described only with the minimum metadata required for DOI allocation. Unfortunately, this minimum is not enough to conform to the FAIR principles, and many research data cannot be reused despite having a DOI. In contrast, other repositories aid researchers by providing advice and strictly controlling the data and their metadata. To simplify the process of defining the required metadata and of checking the data and metadata, the AtMoDat [2] (Atmospheric Model Data) project developed a detailed standard for the FAIR publication of atmospheric model data.

For this purpose we have developed a concept for the "ideal" description of atmospheric model data. A prerequisite is publication of the data with a DataCite DOI. The ATMODAT standard [3] was developed to implement this concept. It defines the data format as NetCDF, the mandatory metadata (for the DOI, the landing page and the data header), and the naming conventions used in climate research, the Climate and Forecast conventions (CF conventions [4]). However, many variable names used in urban climate research, for example, are not part of the CF conventions. For these, standard names have to be defined together with the community, and their inclusion in the CF conventions has to be requested. Furthermore, we developed and published Python routines that allow data producers as well as repositories to check model output data against the standard.

The ATMODAT standard will first be applied by the project partners at the two participating universities (Hamburg and Leipzig). Here, climate model data are processed with a post-processor in preparation for publication. Subsequently, the files, including the metadata specified for the DataCite metadata schema, will be published by the World Data Center for Climate [5] (WDCC). Data fulfilling the ATMODAT standard will be marked on the landing page with a special EASYDAB [6] (Earth System Data Branding) logo. EASYDAB is a branding, currently under development, for FAIR and open data from the Earth System Sciences. The logo indicates to future data users that the dataset has been verified and can be easily reused. The standardization of the data and the further steps are easily transferable to data from other disciplines.

[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al.: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
[2] https://www.atmodat.de/
[3] https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0
[4] https://cfconventions.org/
[5] https://cera-www.dkrz.de/WDCC/ui/cerasearch/
[6] https://www.easydab.de/
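The kind of header check mentioned in the abstract above can be pictured with a small sketch. This is not the project's published Python code: the list of required global attributes is a hypothetical placeholder for the mandatory metadata the standard defines, the function name `check_header` is invented for illustration, and the header is represented as plain dictionaries instead of being read from a NetCDF file.

```python
# Hypothetical required global attributes (assumption for illustration only;
# the authoritative list is defined by the ATMODAT standard itself).
REQUIRED_GLOBAL_ATTRIBUTES = ("Conventions", "title", "institution", "source")


def check_header(global_attrs, variables):
    """Return a list of human-readable problems found in a file header.

    global_attrs: dict mapping global attribute name -> value
    variables:    dict mapping variable name -> dict of variable attributes
    """
    problems = []

    # Every required global attribute must be present and non-empty.
    for name in REQUIRED_GLOBAL_ATTRIBUTES:
        value = global_attrs.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing global attribute: {name}")

    # CF-style description: each variable should be identified by a
    # standard_name (or at least a long_name) and should state its units.
    for var_name, attrs in variables.items():
        if "standard_name" not in attrs and "long_name" not in attrs:
            problems.append(f"{var_name}: no standard_name or long_name")
        if "units" not in attrs:
            problems.append(f"{var_name}: no units")

    return problems


if __name__ == "__main__":
    header = {"Conventions": "CF-1.8", "title": "Example run"}
    variables = {
        "tas": {"standard_name": "air_temperature", "units": "K"},
        "foo": {},  # an undocumented variable, flagged by the check
    }
    for problem in check_header(header, variables):
        print(problem)
```

Returning a list of problems, instead of stopping at the first one, lets a data producer fix a file in a single pass before submitting it to a repository.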


EASYDAB (Earth System Data Branding) for FAIR and Open Data

10.5194/egusphere-egu21-2139 ◽

2021 ◽

Author(s):

Anette Ganske ◽

Amandine Kaiser ◽

Angelina Kraft ◽

Daniel Heydebreck ◽

Andrea Lammert ◽

...

Keyword(s):

Open Data ◽

Atmospheric Model ◽

Research Data ◽

Quality Data ◽

Data Repository ◽

Earth System ◽

Model Data ◽

Core Element ◽

The Earth ◽

High Maturity

As in many scientific disciplines, there are a variety of activities in Earth system sciences that address the important aspects of good research data management. What has not been sufficiently investigated and dealt with so far is the easy discoverability and re-use of quality-checked data. This aspect is taken up by the EASYDAB label.

EASYDAB [1] is a branding, currently under development, for FAIR and open data from the Earth System Sciences. The branding can be adopted by institutions running a data repository that stores data from the Earth System Sciences. EASYDAB is always connected to a research data publication with DataCite DOIs. Data published under EASYDAB are characterized by high maturity, extensive metadata and compliance with a comprehensive discipline-specific standard. For these datasets, the EASYDAB logo is added to the landing page of the data repository. In this way, repositories can indicate their efforts to publish data with high maturity.

The first standard made for EASYDAB is the ATMODAT standard [2], which has been developed within the AtMoDat [3] project (Atmospheric Model Data). It incorporates concrete recommendations and requirements related to the maturity, publication and enhanced FAIRness of atmospheric model data. The requirements cover rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages are a core element of the ATMODAT standard and should hold and present discipline-specific metadata at simulation and variable level.

The ATMODAT standard includes checklists for the data producer and the data curator so that both sides can easily achieve compliance with the standard. To facilitate automatic checking of the netCDF file headers, a checker program will also be provided and published with a DOI. Moreover, a checker for compliance with the requirements for the DOI metadata will be developed and made openly available.

The integration of standards from other disciplines in the Earth System Sciences, such as oceanography, into EASYDAB is helpful and desirable to improve the re-use of reviewed, high-quality data.

[1] www.easydab.de
[2] https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0
[3] www.atmodat.de
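A check of DOI metadata, as announced in the abstract above, might start from the properties that the DataCite Metadata Schema marks as mandatory (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType). The sketch below is only an illustration under assumptions: the flat dictionary layout, the key names and the function `missing_mandatory` are hypothetical and do not reproduce the project's checker or the DataCite API, and the additional requirements of the ATMODAT standard are not modelled.

```python
# Keys standing in for the mandatory DataCite properties. The key names and
# the flat record layout are assumptions made for this sketch.
MANDATORY_PROPERTIES = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_mandatory(record):
    """Return the mandatory properties that are absent or empty in record."""
    missing = []
    for key in MANDATORY_PROPERTIES:
        value = record.get(key)
        # Treat None, empty strings and empty containers as "not provided".
        if value is None or value == "" or value == [] or value == {}:
            missing.append(key)
    return missing


if __name__ == "__main__":
    record = {
        "identifier": "10.1234/example",  # placeholder DOI, not a real dataset
        "creators": ["Example, Author"],
        "titles": ["Example atmospheric model dataset"],
        "publicationYear": 2021,
    }
    print(missing_mandatory(record))
```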


Representation of the Community Earth System Model (CESM1) CAM4-chem within the Chemistry-Climate Model Initiative (CCMI)

10.5194/gmd-2015-237 ◽

2016 ◽

Cited By ~ 1

Author(s):

S. Tilmes ◽

J.-F. Lamarque ◽

L. K. Emmons ◽

D. E. Kinnison ◽

D. Marsh ◽

...

Keyword(s):

Climate Model ◽

Evaluation Studies ◽

Surface Ozone ◽

Atmospheric Model ◽

Horizontal Resolution ◽

Earth System Model ◽

System Model ◽

Good Representation ◽

Earth System ◽

Community Earth System Model

Abstract. The Community Earth System Model, CESM1 CAM4-chem, has been used to perform the Chemistry-Climate Model Initiative (CCMI) reference and sensitivity simulations. In this model, the Community Atmospheric Model Version 4 (CAM4) is fully coupled to tropospheric and stratospheric chemistry. Details and specifics of each configuration, including new developments and improvements, are described. CESM1 CAM4-chem is a low-top model that reaches up to approximately 40 km and uses a horizontal resolution of 1.9° latitude and 2.5° longitude. For the specified dynamics experiments, the model is nudged to the Modern-Era Retrospective Analysis For Research And Applications (MERRA) reanalysis. We summarize the performance of the three reference simulations suggested by CCMI, with a focus on the observed period. Comparisons with selected datasets are employed to demonstrate the general performance of the model. We highlight new datasets that are suited for multi-model evaluation studies. The most important improvements of the model are the treatment of stratospheric aerosols and the corresponding adjustments for radiation and optics, the updated chemistry scheme including improved polar chemistry and stratospheric dynamics, and improved dry deposition rates. These updates lead to a very good representation of tropospheric ozone, within 20 % of values from available observations for most regions. In particular, the trend and magnitude of surface ozone have been much improved compared to earlier versions of the model. Furthermore, stratospheric column ozone of the Southern Hemisphere in winter and spring is reasonably well represented. All experiments still underestimate CO, most significantly in Northern Hemisphere spring, and show a significant underestimation of hydrocarbons based on surface observations.


EASYDAB (Earth System Data Branding): Enhancing the Findability and the Reuse of FAIR and Open Data

10.1002/essoar.10509983.1 ◽

2022 ◽

Author(s):

Anette Ganske ◽

Angelika Heil ◽

Andrea Lammert ◽

Hannes Thiemann

Keyword(s):

Open Data ◽

Earth System ◽

System Data


Earth system data cubes unravel global multivariate dynamics

10.5194/esd-2019-62 ◽

2019 ◽

Cited By ~ 2

Author(s):

Miguel D. Mahecha ◽

Fabian Gans ◽

Gunnar Brandt ◽

Rune Christiansen ◽

Sarah E. Cornell ◽

...

Keyword(s):

Data Streams ◽

Multiple Time Scales ◽

Earth System ◽

Model Data ◽

Data Cubes ◽

Intensive Research ◽

Data Intensive ◽

The Earth ◽

System Data ◽

Multiple Variables

Abstract. Understanding Earth system dynamics in the light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing inter-disciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cubes and how to operate on them in a formal way. The idea is that treating multiple data dimensions, such as spatial, temporal, variable, frequency and other grids alike, allows effective application of user-defined functions to co-interpret Earth observations and/or model-data. An implementation of this concept combines analysis-ready data cubes with a suitable analytic interface. In three case studies we demonstrate how the concept and its implementation facilitate the execution of complex workflows for research across multiple variables, spatial and temporal scales: (1) summary statistics for ecosystem and climate dynamics; (2) intrinsic dimensionality analysis on multiple time-scales; and (3) data-model integration. We discuss the emerging perspectives for investigating global interacting and coupled phenomena in observed or simulated data. Latest developments in machine learning, causal inference, and model data integration can be seamlessly implemented in the proposed framework, supporting rapid progress in data-intensive research across disciplinary boundaries.


Earth system data cubes unravel global multivariate dynamics

Earth System Dynamics ◽

10.5194/esd-11-201-2020 ◽

2020 ◽

Vol 11 (1) ◽

pp. 201-234 ◽

Cited By ~ 6

Author(s):

Miguel D. Mahecha ◽

Fabian Gans ◽

Gunnar Brandt ◽

Rune Christiansen ◽

Sarah E. Cornell ◽

...

Keyword(s):

Data Integration ◽

Data Streams ◽

Scale Model ◽

Earth System ◽

Model Data ◽

Data Cubes ◽

Intensive Research ◽

Data Intensive ◽

The Earth ◽

System Data

Abstract. Understanding Earth system dynamics in light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today, many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing interdisciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cubes and how to operate on them in a formal way. The idea is that treating multiple data dimensions, such as spatial, temporal, variable, frequency, and other grids alike, allows effective application of user-defined functions to co-interpret Earth observations and/or model–data integration. An implementation of this concept combines analysis-ready data cubes with a suitable analytic interface. In three case studies, we demonstrate how the concept and its implementation facilitate the execution of complex workflows for research across multiple variables, and spatial and temporal scales: (1) summary statistics for ecosystem and climate dynamics; (2) intrinsic dimensionality analysis on multiple timescales; and (3) model–data integration. We discuss the emerging perspectives for investigating global interacting and coupled phenomena in observed or simulated data. In particular, we see many emerging perspectives of this approach for interpreting large-scale model ensembles. The latest developments in machine learning, causal inference, and model–data integration can be seamlessly implemented in the proposed framework, supporting rapid progress in data-intensive research across disciplinary boundaries.
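The central idea of the abstract, treating all cube dimensions alike so that a user-defined function can be applied along any subset of them, can be sketched in a few lines. This is a toy illustration and not the authors' implementation: the cube is a plain dictionary keyed by coordinate tuples, and the function name `apply_over`, the dimension names and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def apply_over(cube, dims, over, func):
    """Apply func along the dimensions named in `over`, keeping the others.

    cube: dict mapping a coordinate tuple (one entry per name in dims) to a value
    dims: tuple of dimension names, e.g. ("variable", "time", "lat")
    over: collection of dimension names that func consumes
    func: user-defined function taking a list of values and returning one value

    Returns (remaining_dims, reduced_cube).
    """
    # Positions of the dimensions that survive the operation.
    keep = [i for i, name in enumerate(dims) if name not in over]

    # Collect all values that share the same coordinates on the kept dimensions.
    groups = defaultdict(list)
    for coords, value in cube.items():
        groups[tuple(coords[i] for i in keep)].append(value)

    remaining_dims = tuple(dims[i] for i in keep)
    return remaining_dims, {key: func(values) for key, values in groups.items()}


if __name__ == "__main__":
    dims = ("variable", "time", "lat")
    cube = {
        ("t2m", 0, 10): 280.0,
        ("t2m", 1, 10): 282.0,
        ("gpp", 0, 10): 1.0,
        ("gpp", 1, 10): 3.0,
    }
    # Temporal mean for every variable and latitude.
    print(apply_over(cube, dims, {"time"}, mean))
```

Because the function never treats time, space or variable specially, the same call computes a temporal mean, a spatial mean or a statistic across variables simply by changing `over`.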


AtMoDat: Improving the reusability of ATmospheric MOdel DATa with DataCite DOIs paving the path towards FAIR data

10.5194/egusphere-egu2020-8463 ◽

2020 ◽

Author(s):

Daniel Neumann ◽

Anette Ganske ◽

Vivien Voss ◽

Angelina Kraft ◽

Heinke Höck ◽

...

Keyword(s):

Quality Indicator ◽

System Modeling ◽

Atmospheric Model ◽

Atmospheric Modeling ◽

Research Data ◽

Earth System ◽

Model Data ◽

Earth System Modeling ◽

The Earth ◽

Generic Quality

The generation of high-quality research data is expensive. The FAIR principles were established to foster the reuse of such data for the benefit of the scientific community and beyond. Publishing research data with metadata and DataCite DOIs in public repositories makes them findable and accessible (FA of FAIR). However, DOIs and basic metadata do not guarantee that the data are actually reusable without discipline-specific knowledge: this is the case if data are saved in proprietary or undocumented file formats, if detailed discipline-specific metadata are missing, or if quality information on the data and metadata is not provided. In this contribution, we present ongoing work in the AtMoDat project, a consortium of atmospheric scientists and infrastructure providers, which aims at improving the reusability of atmospheric model data.

Consistent standards are necessary to simplify the reuse of research data. Although standardization of file structure and metadata is well established for some subdomains of the earth system modeling community (e.g. CMIP), several other subdomains lack such standardization. Hence, scientists from the Universities of Hamburg and Leipzig and infrastructure operators cooperate in the AtMoDat project in order to advance standardization for model output files in specific subdomains of the atmospheric modeling community. Starting from the demanding CMIP6 standard, the aim is to establish an easy-to-use standard that is at least compliant with the Climate and Forecast (CF) conventions. In parallel, an existing netCDF file convention checker is being extended to check for the new standards. This enhanced checker is designed to support the creation of compliant files and thus lower the hurdle for data producers to comply with the new standard. The transfer of this approach to further sub-disciplines of the earth system modeling community will be supported by a best-practice guide and other documentation. A showcase of a standard for the urban atmospheric modeling community will be presented in this session. The standard is based on the CF conventions and adapts several global attributes and controlled vocabularies from the well-established CMIP6 standard.

Additionally, the AtMoDat project aims to introduce a generic quality indicator into the DataCite metadata schema to foster further reuse of data. This quality indicator should require a discipline-specific implementation of a quality standard linked to the indicator. We will present the concept of the generic quality indicator in general and in the context of urban atmospheric modeling data.


The ATMODAT Standard enhances FAIRness of Atmospheric Model data

10.5194/ems2021-298 ◽

2021 ◽

Author(s):

Angelika Heil ◽

Anette Ganske ◽

Andrea Lammert ◽

Daniel Heydebreck ◽

Hannes Thiemann

Keyword(s):

Climate Models ◽

Urban Climate ◽

Atmospheric Model ◽

Data Repository ◽

Model Data ◽

Core Element ◽

File Formats ◽

Fair Principles ◽

Machine Readable ◽

Wide Scientific Community

Atmospheric model data form the basis for understanding and predicting weather, climate and air quality phenomena. Access to these data is of interest not only to a wide scientific community but also to public services, companies, politicians and citizens. One way to make the data available is to publish them via a data repository. To ensure that datasets in a repository are indeed Findable, Accessible, Interoperable, and Reusable (i.e. FAIR [1]), it is essential that the data are stored together with detailed metadata and that the file structure and metadata follow an established standard. Furthermore, datasets are easier to find and reuse if the corresponding metadata are machine-readable and use a standardised vocabulary. While data standardization is well established in large, internationally coordinated model intercomparison projects (e.g. for climate models in CMIP [2]), joint standards are still lacking in many atmospheric modelling sub-disciplines, such as urban climate or cloud-resolving modelling.

The AtMoDat project (Atmospheric Model Data) [3], led by a team of atmospheric scientists and infrastructure providers, aims to improve the overall FAIRness of atmospheric model data and thus promote their re-use. Within the project, the ATMODAT standard [4] has been developed, which includes precise recommendations to achieve enhanced FAIRness of atmospheric model data in repositories. A prerequisite of this standard is that the data are published with a DataCite DOI [5]. The ATMODAT standard specifies requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages holding discipline-specific metadata are a core element of this standard.

The ATMODAT standard is easy to implement and provides checklists for data curators and data producers. In addition, to facilitate the compliance check with the ATMODAT standard, the atmodat data checker [6] has been developed. A dataset that complies with this standard will follow the FAIR principles and its metadata will be of high quality. If this compliance has been verified by the respective repository, the dataset can be labelled with the Earth System Data Branding (EASYDAB) [7]. This branding makes it easy for users to verify that the data are properly curated and the metadata have been quality assured.

[1] Juckes et al., 2020: https://doi.org/10.5194/gmd-13-201-2020
[2] Eyring et al., 2016: https://doi.org/10.5194/gmd-9-1937-2016
[3] www.atmodat.de
[4] https://doi.org/10.35095/WDCC/atmodat_standard_en_v3_0
[5] https://datacite.org
[6] https://github.com/AtMoDat/atmodat_data_checker
[7] https://easydab.de


Earth System Music: music generated from the first United Kingdom Earth System model

10.5194/egusphere-egu2020-7267 ◽

2020 ◽

Author(s):

Lee de Mora ◽

Alistair Sellar ◽

Andrew Yool ◽

Julien Palmieri ◽

Robin S. Smith ◽

...

Keyword(s):

United Kingdom ◽

Time Series Data ◽

Scientific Data ◽

Earth System Model ◽

System Model ◽

Series Data ◽

Historical Period ◽

System Modelling ◽

Earth System ◽

Model Data

With the ever-growing interest of the general public in understanding climate science, it is becoming increasingly important that we present this information in ways accessible to non-experts. In this pilot study, we use time series data from the first United Kingdom Earth System model (UKESM1) to create six procedurally generated musical pieces and use them to explain the process of modelling the earth system and to engage with the wider community.

Scientific data is almost always represented graphically, either in figures or in videos. By adding audio to the visualisation of model data, the combination of music and imagery provides additional contextual clues to aid in the interpretation. Furthermore, the audiolisation of model data can be employed to generate interesting and captivating music, which can not only reach a wider audience, but also hold the attention of the listeners for extended periods of time.

Each of the six pieces presented in this work was themed around either a scientific principle or a practical aspect of earth system modelling. These pieces demonstrate the concepts of a spin-up, a pre-industrial control run, multiple historical experiments, and the use of several future climate scenarios to a wider audience. They also show the ocean acidification over the historical period, the changes in circulation, the natural variability of the pre-industrial simulations, and the expected rise in sea surface temperature over the 20th century.

Each of these pieces was arranged using a different musical progression, style and tempo. All six pieces were performed by the digital piano synthesizer, TiMidity++, and were published on the lead author's YouTube channel. The videos all show the progression of the data in time with the music, and a brief description of the methodology is posted alongside each video.

To disseminate these works, links to each piece were published on the lead author's personal and professional social media accounts. The reach of these works was also analysed using YouTube's channel monitoring toolkit for content creators, YouTube Studio.
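One simple way to turn a model time series into notes is to rescale the values linearly onto a range of MIDI note numbers (0 to 127, with 60 being middle C). The sketch below illustrates only that general idea; it is not the procedure used for the pieces described above, and the function name `to_midi_pitches` and the default note range are assumptions chosen for illustration.

```python
def to_midi_pitches(series, low=48, high=84):
    """Map a numeric time series linearly onto MIDI note numbers.

    The smallest value becomes `low`, the largest becomes `high`, and values
    in between are rounded to the nearest note. The default range is an
    arbitrary choice for this sketch.
    """
    if not series:
        return []

    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series has no range to rescale: use the middle note.
        return [round((low + high) / 2)] * len(series)

    span = high - low
    return [low + round((x - lo) / (hi - lo) * span) for x in series]


if __name__ == "__main__":
    # Made-up values standing in for an annual-mean temperature series.
    sea_surface_temperature = [17.9, 18.0, 18.1, 18.3, 18.6]
    print(to_midi_pitches(sea_surface_temperature))
```

A real arrangement would also need note durations, tempo and harmony; restricting the output to the notes of a chosen scale is one common refinement.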
