EASYDAB (Earth System Data Branding) for FAIR and Open Data

Author(s):  
Anette Ganske ◽  
Amandine Kaiser ◽  
Angelina Kraft ◽  
Daniel Heydebreck ◽  
Andrea Lammert ◽  
...  

<p>As in many scientific disciplines, there are a variety of activities in the Earth system sciences that address the important aspects of good research data management. What has not been sufficiently investigated and addressed so far is the easy discoverability and re-use of quality-checked data. The EASYDAB label takes up this aspect.</p><p>EASYDAB<sup>1</sup> is a branding for FAIR and open data from the Earth System Sciences that is currently under development. It can be adopted by institutions that run a data repository storing data from the Earth System Sciences. EASYDAB is always connected to a research data publication with a DataCite DOI. Data published under EASYDAB are characterized by high maturity, extensive metadata and compliance with a comprehensive discipline-specific standard. For these datasets, the EASYDAB logo is added to the landing page in the data repository. In this way, repositories can indicate their efforts to publish data of high maturity.</p><p>The first standard made for EASYDAB is the ATMODAT standard<sup>2</sup>, which has been developed within the AtMoDat<sup>3</sup> (Atmospheric Model Data) project. It incorporates concrete recommendations and requirements related to the maturity, publication and enhanced FAIRness of atmospheric model data. The requirements cover rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages are a core element of the ATMODAT standard and should hold and present discipline-specific metadata at the simulation and variable level.</p><p>The ATMODAT standard includes checklists for the data producer and the data curator so that both sides can easily verify compliance with the standard. To facilitate automatic checking of the netCDF file headers, a checker program will also be provided and published with a DOI.
Moreover, a checker for compliance with the requirements for the DOI metadata will be developed and made openly available.</p><p>The integration of standards from other disciplines of the Earth System Sciences, such as oceanography, into EASYDAB is helpful and desirable to improve the re-use of reviewed, high-quality data.</p><p><sup>1</sup>www.easydab.de</p><p><sup>2</sup>https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0</p><p><sup>3</sup>www.atmodat.de</p>
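An automatic header check of the kind described above can be pictured as comparing a file's global attributes against a list of required names. The following is a minimal sketch, not the ATMODAT checker itself: the attribute list is an assumed subset (the global attributes recommended by the CF conventions), and the function name `check_global_attributes` is hypothetical. It operates on a plain dictionary, which in practice could be filled from a netCDF file with the `netCDF4` package, e.g. `{n: ds.getncattr(n) for n in ds.ncattrs()}`.

```python
# Hypothetical subset of required global attributes: the global attributes
# recommended by the CF conventions, NOT the ATMODAT standard's actual list.
REQUIRED_GLOBAL_ATTRS = (
    "Conventions",
    "title",
    "institution",
    "source",
    "history",
    "references",
)


def check_global_attributes(attrs: dict) -> list:
    """Return human-readable problems found in a dict of global attributes."""
    problems = []
    for name in REQUIRED_GLOBAL_ATTRS:
        value = attrs.get(name)
        if value is None:
            problems.append(f"missing global attribute: {name}")
        elif isinstance(value, str) and not value.strip():
            # Present but blank counts as a problem, too.
            problems.append(f"empty global attribute: {name}")
    return problems


if __name__ == "__main__":
    header = {"Conventions": "CF-1.8", "title": "Urban run", "institution": "", "source": "model X"}
    for problem in check_global_attributes(header):
        print(problem)
```

Returning a list of messages rather than a single pass/fail flag lets the same routine serve both the data producer (who needs to know what to fix) and the curator (who only needs an empty list).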

2020 ◽  
Author(s):  
Daniel Neumann ◽  
Anette Ganske ◽  
Vivien Voss ◽  
Angelina Kraft ◽  
Heinke Höck ◽  
...  

<p>The generation of high-quality research data is expensive. The FAIR principles were established to foster the reuse of such data for the benefit of the scientific community and beyond. Publishing research data with metadata and DataCite DOIs in public repositories makes them findable and accessible (the FA of FAIR). However, DOIs and basic metadata do not guarantee that the data are actually reusable without discipline-specific knowledge: reuse is hampered if data are saved in proprietary or undocumented file formats, if detailed discipline-specific metadata are missing, or if no quality information on the data and metadata is provided. In this contribution, we present ongoing work in the AtMoDat project, a consortium of atmospheric scientists and infrastructure providers that aims to improve the reusability of atmospheric model data.<br>  <br>Consistent standards are necessary to simplify the reuse of research data. Although standardization of file structure and metadata is well established for some subdomains of the Earth system modeling community (e.g. CMIP), several other subdomains lack such standardization. Hence, scientists from the Universities of Hamburg and Leipzig and infrastructure operators cooperate in the AtMoDat project to advance the standardization of model output files in specific subdomains of the atmospheric modeling community. Starting from the demanding CMIP6 standard, the aim is to establish an easy-to-use standard that is at least compliant with the Climate and Forecast (CF) conventions. In parallel, an existing netCDF file convention checker is being extended to check for the new standards. This enhanced checker is designed to support the creation of compliant files and thus lower the hurdle for data producers to comply with the new standard. The transfer of this approach to further sub-disciplines of the Earth system modeling community will be supported by a best-practice guide and other documentation.
A showcase of a standard for the urban atmospheric modeling community will be presented in this session. The standard is based on the CF conventions and adapts several global attributes and controlled vocabularies from the well-established CMIP6 standard.<br>  <br>Additionally, the AtMoDat project aims to introduce a generic quality indicator into the DataCite metadata schema to foster further reuse of data. This quality indicator should require a discipline-specific implementation of a quality standard linked to the indicator. We will present the concept of the generic quality indicator in general and in the context of urban atmospheric modeling data.</p>
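Adapting controlled vocabularies from CMIP6 means that certain global attributes may only take values from a fixed list. A minimal sketch of such a check, assuming two hypothetical vocabularies that are small subsets of CMIP6-style `frequency` and `realm` values; the urban standard's actual attributes and permitted values may differ, and the function name is made up for illustration.

```python
# Hypothetical controlled vocabularies: small subsets of CMIP6-style values,
# for illustration only.
CONTROLLED_VOCABULARIES = {
    "frequency": {"1hr", "3hr", "6hr", "day", "mon", "yr", "fx"},
    "realm": {"atmos", "land", "ocean", "seaIce"},
}


def check_controlled_vocabularies(attrs: dict, cvs: dict = CONTROLLED_VOCABULARIES) -> list:
    """Report global attributes that are missing, empty, or outside their vocabulary."""
    problems = []
    for name, allowed in cvs.items():
        if name not in attrs:
            problems.append(f"{name}: missing")
            continue
        # Some attributes (e.g. realm) may hold several space-separated terms.
        terms = str(attrs[name]).split()
        if not terms:
            problems.append(f"{name}: empty")
        for term in terms:
            if term not in allowed:
                problems.append(f"{name}: {term!r} not in controlled vocabulary")
    return problems


if __name__ == "__main__":
    print(check_controlled_vocabularies({"frequency": "hourly", "realm": "atmos land"}))
```

Keeping the vocabularies as data rather than code is what makes it cheap to reuse CMIP6 terms and add community-specific ones later.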


2021 ◽  
Author(s):  
Andrea Lammert ◽  
Anette Ganske ◽  
Amandine Kaiser ◽  
Angelina Kraft

<p>Due to the increasing amount of data produced in science, concepts for data reusability are of immense importance. One aspect is publishing data in a way that makes them findable, accessible, interoperable and reusable (FAIR<sup>1</sup> principles), and therefore also traceable and comparable. However, putting these principles into practice often causes significant difficulties for researchers. Therefore, some repositories accept datasets described only with the minimum metadata required for DOI allocation. Unfortunately, this minimum does not contain enough information to conform to the FAIR principles: many research datasets cannot be reused despite having a DOI. In contrast, other repositories support researchers by providing advice and strictly checking the data and their metadata. To simplify the process of defining the required amount of metadata and of checking the data and metadata, the AtMoDat<sup>2</sup> (Atmospheric Model Data) project developed a detailed standard for the FAIR publication of atmospheric model data.</p><p>For this purpose, we have developed a concept for the “ideal” description of atmospheric model data. A prerequisite for this is data publication with a DataCite DOI. The ATMODAT standard<sup>3</sup> was developed to implement this concept. The standard defines the data format (netCDF), mandatory metadata (for the DOI, the landing page and the file header), and the naming conventions used in climate research, i.e. the Climate and Forecast conventions (CF conventions<sup>4</sup>). However, many variable names used in urban climate research, for example, are not part of the CF conventions. For these, standard names have to be defined together with the community, and their inclusion in the CF standard name list has to be requested. Furthermore, we developed and published Python routines that allow data producers as well as repositories to check model output data against the standard.
</p><p>The ATMODAT standard will first be applied by the project partners at the two participating universities (Universities of Hamburg and Leipzig). There, climate model data are processed with a post-processor in preparation for publication. Subsequently, the files, including the specified metadata for the DataCite metadata schema, will be published by the World Data Center for Climate<sup>5</sup> (WDCC). Data fulfilling the ATMODAT standard will be marked on the landing page by a special EASYDAB<sup>6</sup> (Earth System Data Branding) logo. EASYDAB is a branding for FAIR and open data from the Earth System Sciences that is currently under development. The logo indicates to future data users that the dataset has been verified and can be easily reused. The standardization of the data and the subsequent steps are easily transferable to data from other disciplines.</p><p>1 Wilkinson, M., Dumontier, M., Aalbersberg, I. et al.: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18</p><p>2 https://www.atmodat.de/</p><p>3 https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=atmodat_standard_en_v3_0</p><p>4 https://cfconventions.org/</p><p>5 https://cera-www.dkrz.de/WDCC/ui/cerasearch/</p><p>6 https://www.easydab.de/</p>
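The CF issue described above, variables whose names are not in the CF standard name table, can be detected mechanically. A minimal sketch, assuming the XML layout of the published CF standard name table, in which each name is an `<entry id="...">` element (aliases are ignored here). The embedded table is a simplified two-entry excerpt, and the function names and the variable name `hypothetical_urban_quantity` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified excerpt in the layout of the CF standard name table (assumption).
SAMPLE_TABLE = """<standard_name_table>
  <entry id="air_temperature"><canonical_units>K</canonical_units></entry>
  <entry id="wind_speed"><canonical_units>m s-1</canonical_units></entry>
</standard_name_table>"""


def load_standard_names(xml_text: str) -> set:
    """Collect the ids of all <entry> elements; aliases are not handled."""
    root = ET.fromstring(xml_text)
    return {entry.get("id") for entry in root.iter("entry")}


def unknown_standard_names(variables: dict, known: set) -> dict:
    """Return {variable: standard_name} for names absent from `known`.

    `variables` maps variable names to their standard_name attribute,
    or None if the attribute is missing.
    """
    return {var: sn for var, sn in variables.items() if sn not in known}


if __name__ == "__main__":
    known = load_standard_names(SAMPLE_TABLE)
    print(unknown_standard_names({"ta": "air_temperature", "uhi": "hypothetical_urban_quantity"}, known))
```

The resulting list of unknown names is exactly the input a community would need when proposing new entries for the CF standard name table.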


2021 ◽  
Author(s):  
Angelika Heil ◽  
Anette Ganske ◽  
Andrea Lammert ◽  
Daniel Heydebreck ◽  
Hannes Thiemann

<p>Atmospheric model data form the basis for understanding and predicting weather, climate and air quality phenomena. Access to these data is of interest not only to a wide scientific community but also to public services, companies, politicians and citizens. One way to make the data available is to publish them via a data repository. To ensure that datasets in a repository are indeed <strong>F</strong>indable, <strong>A</strong>ccessible, <strong>I</strong>nteroperable, and <strong>R</strong>eusable (i.e. FAIR<sup>1</sup>), it is essential that the data are stored together with detailed metadata and that the file structure and metadata follow an established standard. Furthermore, datasets are easier to find and reuse if the corresponding metadata are machine-readable and use a standardised vocabulary. While data standardization is well established in large, internationally coordinated model intercomparison projects (e.g. for climate models in CMIP<sup>2</sup>), joint standards are still lacking in many atmospheric modelling sub-disciplines, such as urban climate or cloud-resolving modelling. </p><p>The AtMoDat project (<strong>At</strong>mospheric <strong>Mo</strong>del <strong>Dat</strong>a)<sup>3</sup>, led by a team of atmospheric scientists and infrastructure providers, aims to improve the overall FAIRness of atmospheric model data and thus promote their re-use. Within the project, the ATMODAT standard<sup>4</sup> has been developed, which includes precise recommendations for achieving enhanced FAIRness of atmospheric model data in repositories. A prerequisite of this standard is that the data are published with a DataCite DOI<sup>5</sup>. The ATMODAT standard specifies requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages holding discipline-specific metadata are a core element of this standard.
</p><p>The ATMODAT standard is easy to implement and provides checklists for data curators and data producers. In addition, the <em>atmodat data checker</em><sup>6</sup> has been developed to facilitate checking compliance with the ATMODAT standard. A dataset that complies with this standard follows the FAIR principles and has high-quality metadata. If this compliance has been verified by the respective repository, the dataset can be labelled with the <strong>Ea</strong>rth <strong>Sy</strong>stem <strong>Da</strong>ta <strong>B</strong>randing (EASYDAB)<sup>7</sup>. This branding makes it easy for users to verify that the data are properly curated and that the metadata have been quality-assured.</p><p><sup>1</sup>  Juckes et al., 2020: https://doi.org/10.5194/gmd-13-201-2020 <br><sup>2</sup>  Eyring et al., 2016: https://doi.org/10.5194/gmd-9-1937-2016<br><sup>3</sup>  www.atmodat.de<br><sup>4</sup>  https://doi.org/10.35095/WDCC/atmodat_standard_en_v3_0<br><sup>5</sup>  https://datacite.org<br><sup>6</sup>  https://github.com/AtMoDat/atmodat_data_checker <br><sup>7</sup>  https://easydab.de</p>
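The labelling step described above can be sketched as a gate that combines a DOI metadata check with the file-level check and the curator's verification. The six mandatory properties listed are those of the DataCite Metadata Schema (version 4). The flat dictionary format, the function names and the decision rule itself are assumptions for illustration, not the actual EASYDAB procedure.

```python
# Mandatory properties of the DataCite Metadata Schema (version 4).
DATACITE_MANDATORY = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def missing_datacite_properties(record: dict) -> list:
    """List mandatory properties that are absent or empty in a flat record."""
    return [prop for prop in DATACITE_MANDATORY if not record.get(prop)]


def easydab_eligible(record: dict, header_problems: list, curator_verified: bool) -> bool:
    """Hypothetical rule: DOI metadata complete, file checks clean, curator sign-off."""
    return not missing_datacite_properties(record) and not header_problems and curator_verified


if __name__ == "__main__":
    record = {
        "Identifier": "10.0000/example",  # placeholder DOI, not a real one
        "Creator": ["Doe, Jane"],
        "Title": "Example urban climate simulation",
        "Publisher": "Example repository",
        "PublicationYear": 2021,
        "ResourceType": "Dataset",
    }
    print(easydab_eligible(record, header_problems=[], curator_verified=True))
```

Requiring an explicit curator flag in addition to the automated checks mirrors the text: the branding depends on compliance having been verified by the repository, not merely on a tool passing.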


2013 ◽  
Vol 8 (1) ◽  
pp. 193-203 ◽  
Author(s):  
Sarah Callaghan ◽  
Fiona Murphy ◽  
Jonathan Tedds ◽  
Rob Allan ◽  
John Kunze ◽  
...  

The Peer REview for Publication and Accreditation of Research Data in the Earth sciences (PREPARDE) project is a JISC- and NERC-funded project that aims to investigate the policies and procedures required for the formal publication of research data, ranging from ingestion into a data repository through to formal publication in a data journal. It also addresses key issues arising in the data publication paradigm, including, but not limited to, how to peer review a dataset, what criteria a repository must meet to be considered objectively trustworthy, and how datasets and journal publications can be effectively cross-linked for the benefit of the wider research community. PREPARDE brings together a wide range of experts from the research, academic publishing and data management fields, both within the Earth sciences and in the broader life sciences, with the aim of producing general guidelines applicable to a wide range of scientific disciplines and data publication types. This paper describes the work done in the first half of the project; the project itself will be completed in June 2013.


2017 ◽  
Author(s):  
Antonio Sánchez-Padial

In Spanish. This preprint is a summary of the communication submitted to the II Jornadas de Investigación Agraria para el Desarrollo (II Workshop of Agriculture Research for Development). The work shows the relevance of open data for meeting the challenge of global food security and presents the GODAN (Global Open Data for Agriculture and Nutrition) initiative. It then shows how INIA (the Spanish National Institute for Research and Technology in Agriculture and Food) is partnering with GODAN and briefly introduces INIA's project to develop an agricultural research data repository. It ends with a call for participation in the GODAN initiative.


2020 ◽  
Vol 15 (2) ◽  
pp. 168-170
Author(s):  
Jennifer Kaari

A Review of:
Elsayed, A. M., & Saleh, E. I. (2018). Research data management and sharing among researchers in Arab universities: An exploratory study. IFLA Journal, 44(4), 281–299. https://doi.org/10.1177/0340035218785196

Abstract
Objective – To investigate researchers’ practices and attitudes regarding research data management and data sharing.
Design – Email survey.
Setting – Universities in Egypt, Jordan, and Saudi Arabia.
Subjects – Surveys were sent to 4,086 academic faculty researchers.
Methods – The survey was emailed to faculty at three Arab universities, targeting faculty in the life sciences and engineering. The survey was created using Google Docs and remained open for five months. Participants were asked basic demographic questions, questions about their research data and metadata practices, and questions about their data sharing practices.
Main Results – The authors received 337 responses, a response rate of 8%. The results showed that 48.4% of respondents had a data management plan and that 97% were responsible for preserving their own data. Most respondents stored their research data on personal storage devices. The authors found that 64.4% of respondents reported sharing their research data. Respondents most frequently shared their data by publishing in a data research journal, by sharing through academic social networks such as ResearchGate, and by providing data to peers upon request. Only 5.1% of respondents shared data through an open data repository. Of those who did not share data, data privacy and confidentiality were the most commonly cited reasons. Of the respondents who did share their data, contributing to scientific progress and gaining increased citation and visibility were the primary reasons for doing so. A total of 59.6% of respondents stated that they needed more training in research data management from their universities.
Conclusion – The authors conclude that researchers at Arab universities are still primarily responsible for their own data and that data management planning is still a new concept to most researchers. For the most part, the researchers had a positive attitude toward data sharing, although depositing data in open repositories is not yet a widespread practice. The authors conclude that more training and institutional support are needed to encourage strong data management practices and open data sharing among Arab university researchers.
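As a quick consistency check on the reported figures, the response rate follows directly from the number of responses and the number of surveys sent:

```latex
\text{response rate} = \frac{337}{4086} \approx 0.0825 \approx 8.2\,\%
```

This agrees with the rounded 8% stated in the review.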


2021 ◽  
Vol 3 (1) ◽  
pp. 189-204
Author(s):  
Hua Nie ◽  
Pengcheng Luo ◽  
Ping Fu

Research Data Management (RDM) has become increasingly important for academic institutions. Using the Peking University Open Research Data Repository (PKU-ORDR) project as an example, this paper reviews a library-based, university-wide open research data repository project and the implementation process of related RDM services, including project kickoff, needs assessment, establishment of partnerships, software investigation and selection, software customization, as well as data curation services and training. The paper also discusses and addresses issues that emerged during the stages of the implementation process, such as awareness of research data, demands from data providers and users, data policies and requirements of the home institution, requirements from funding agencies and publishers, collaboration between administrative units and libraries, and concerns of data providers and users. The significance of the study is that it provides an example of creating an Open Data repository and RDM services for other Chinese academic libraries planning to implement RDM services for their home institutions. The authors have also observed that, since the PKU-ORDR and RDM services were implemented in 2015, the Peking University Library (PKUL) has supported numerous researchers throughout the entire research life cycle, enhanced Open Science (OS) practices on campus, and influenced the national OS movement in China through various national events and activities it has hosted.


2019 ◽  
Author(s):  
Miguel D. Mahecha ◽  
Fabian Gans ◽  
Gunnar Brandt ◽  
Rune Christiansen ◽  
Sarah E. Cornell ◽  
...  

Abstract. Understanding Earth system dynamics in the light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing inter-disciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cubes and how to operate on them in a formal way. The idea is that treating multiple data dimensions, such as spatial, temporal, variable, frequency and other grids alike, allows effective application of user-defined functions to co-interpret Earth observations and/or model-data. An implementation of this concept combines analysis-ready data cubes with a suitable analytic interface. In three case studies we demonstrate how the concept and its implementation facilitate the execution of complex workflows for research across multiple variables, spatial and temporal scales: (1) summary statistics for ecosystem and climate dynamics; (2) intrinsic dimensionality analysis on multiple time-scales; and (3) data-model integration. We discuss the emerging perspectives for investigating global interacting and coupled phenomena in observed or simulated data. Latest developments in machine learning, causal inference, and model data integration can be seamlessly implemented in the proposed framework, supporting rapid progress in data-intensive research across disciplinary boundaries.


2020 ◽  
Vol 11 (1) ◽  
pp. 201-234 ◽  
Author(s):  
Miguel D. Mahecha ◽  
Fabian Gans ◽  
Gunnar Brandt ◽  
Rune Christiansen ◽  
Sarah E. Cornell ◽  
...  

Abstract. Understanding Earth system dynamics in light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today, many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing interdisciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cubes and how to operate on them in a formal way. The idea is that treating multiple data dimensions, such as spatial, temporal, variable, frequency, and other grids alike, allows effective application of user-defined functions to co-interpret Earth observations and/or model–data integration. An implementation of this concept combines analysis-ready data cubes with a suitable analytic interface. In three case studies, we demonstrate how the concept and its implementation facilitate the execution of complex workflows for research across multiple variables, and spatial and temporal scales: (1) summary statistics for ecosystem and climate dynamics; (2) intrinsic dimensionality analysis on multiple timescales; and (3) model–data integration. We discuss the emerging perspectives for investigating global interacting and coupled phenomena in observed or simulated data. In particular, we see many emerging perspectives of this approach for interpreting large-scale model ensembles. The latest developments in machine learning, causal inference, and model–data integration can be seamlessly implemented in the proposed framework, supporting rapid progress in data-intensive research across disciplinary boundaries.
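Treating all dimensions alike means that a user-defined function can be applied along any subset of dimensions, whether spatial, temporal or the variable axis. A minimal NumPy sketch of this idea, assuming a plain array with a hypothetical dimension order `(lat, lon, time, variable)`; the function name `map_cube` is made up, and this is not the authors' implementation.

```python
import numpy as np

DIMS = ("lat", "lon", "time", "variable")  # hypothetical dimension order


def map_cube(cube: np.ndarray, dims: tuple, core_dims: tuple, func):
    """Apply `func` to every 1-D slice spanning `core_dims`.

    The core dimensions are moved to the end and flattened into one axis,
    so any dimension (space, time, variable, ...) can serve as input.
    Returns the result array and the names of its remaining dimensions.
    """
    core_axes = [dims.index(d) for d in core_dims]
    other_dims = tuple(d for d in dims if d not in core_dims)
    # Move core axes to the end; the other axes keep their original order.
    moved = np.moveaxis(cube, core_axes, list(range(-len(core_axes), 0)))
    flat = moved.reshape(moved.shape[: len(other_dims)] + (-1,))
    return np.apply_along_axis(func, -1, flat), other_dims


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.normal(size=(4, 5, 12, 2))
    # Per-pixel, per-variable variability over time.
    std_t, d1 = map_cube(cube, DIMS, ("time",), np.std)
    # Spatial mean for every time step and variable.
    mean_xy, d2 = map_cube(cube, DIMS, ("lat", "lon"), np.mean)
    print(d1, std_t.shape, d2, mean_xy.shape)
```

Switching `core_dims` is the only change needed to move from per-pixel time statistics to per-time-step spatial statistics, which is the practical payoff of treating dimensions symmetrically.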


2016 ◽  
Vol 50 (7) ◽  
pp. 623-635
Author(s):  
Angelina Kraft ◽  
Matthias Razum ◽  
Jan Potthoff ◽  
Andrea Porzel ◽  
Thomas Engel ◽  
...  

Abstract: The goal of RADAR is to simplify and establish cross-disciplinary research data management for university libraries and projects. In summer 2016, ‚RADAR – Research Data Repository‘ launches as a service that offers researchers, institutions from various disciplines, and publishers a generic infrastructure for archiving and publishing research data. Its services include long-term availability of the data with a Handle or Digital Object Identifier (DOI), customizable role and access rights management, an optional peer-review function, and access statistics. The business model encourages researchers to include the repository's usage fees in third-party funding proposals and data management plans. Published data are available as open data for reuse, for example through data mining, metadata harvesting, and linking with search portals. This networking enables sustainable research data management and the establishment of data infrastructures such as RADAR.
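Metadata harvesting from repositories is commonly done with the OAI-PMH protocol. Assuming a repository exposes an OAI-PMH endpoint (whether and where RADAR does so is not stated here, and the base URL below is hypothetical), a harvester's `ListRecords` request can be built as follows; the function name is made up for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def oai_list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: follow-up pages
    send only the verb and the token, not the metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    base = "https://repository.example.org/oai"  # hypothetical endpoint
    print(oai_list_records_url(base))
    print(oai_list_records_url(base, resumption_token="abc"))
```

A harvester would fetch the first URL, read the `resumptionToken` from the XML response, and request further pages until no token is returned.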

