PalMod-II Data Management Plan: A FAIR-inspired conceptual framework for data simulation, inter-comparison, sharing and publication  

Author(s):  
Swati Gehlot ◽  
Karsten Peters-von Gehlen ◽  
Andrea Lammert

<p>Large-scale transient climate simulations and their intercomparison with paleo data within the German initiative PalMod (www.palmod.de, currently in phase II) provide a unique example of applying a Data Management Plan (DMP) to conceptualise data workflows within and outside a large multidisciplinary project. PalMod-II data products include the output of three state-of-the-art climate models of varying coupling complexity and spatial resolution, simulating the climate of the past 130,000 years. In addition to these long time series of model data, a comprehensive compilation of paleo-observation data (including a model-observation-comparison toolbox, Baudouin et al., 2021, EGU-CL1.2) is envisaged for validation. </p><p>Owing to the enormous amount of data from models and observations, produced and handled by different groups of scientists spread across various institutions, a dedicated DMP, maintained as a living document, provides a data-workflow framework for the exchange and sharing of data within and outside the PalMod community. The DMP covers the data life cycle within the project, from generation (data formats and standards), through analysis (intercomparison of models and observations), publication (usage, licences) and dissemination (standardised, via ESGF), to archiving after the project lifetime. As an actively and continually updated document, the DMP establishes the ownership of, and responsibility for, the data subsets of the various working groups, along with regulations for sharing and reuse of data between the working groups, in order to ensure sustained progress towards the project goals. </p><p>This contribution discusses the current status and challenges of the DMP for PalMod-II, which covers the data produced by the various working groups, the project-wide workflow strategy for sharing and exchanging data, and the definition of a PalMod-II variable list for standardised ESGF publication.
The FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles play a central role and are proposed for the entire life cycle of PalMod-II data products (model and proxy paleo data), for sharing and reuse during and after the project lifetime.</p>
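Standardised ESGF publication of the kind described above typically rests on a controlled variable list with CF-style metadata. As a rough illustration only (the attribute names and the entry below are hypothetical, not taken from the actual PalMod-II variable list), such an entry and a completeness check might look like:

```python
# Illustrative sketch: a minimal "variable list" entry and a check that a
# dataset's metadata is complete before publication. Names are hypothetical.

REQUIRED_KEYS = {"variable_id", "standard_name", "units", "frequency"}

def validate_entry(entry: dict) -> list:
    """Return a sorted list of missing required attributes (empty if complete)."""
    return sorted(REQUIRED_KEYS - entry.keys())

entry = {
    "variable_id": "tas",                # near-surface air temperature
    "standard_name": "air_temperature",  # CF standard name
    "units": "K",
    "frequency": "mon",                  # monthly means
}

assert validate_entry(entry) == []  # entry is publication-ready
assert validate_entry({"units": "K"}) == ["frequency", "standard_name", "variable_id"]
```

Checks of this kind are what make a shared variable list enforceable across working groups rather than merely advisory.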

Author(s):  
Ewa Deelman ◽  
Ann Chervenak

Scientific applications in astronomy, earthquake science, gravitational-wave physics, and other fields have embraced workflow technologies to do large-scale science. Workflows enable researchers to collaboratively design and manage analyses that involve hundreds of thousands of steps, access terabytes of data, and generate similar amounts of intermediate and final data products. Although workflow systems can facilitate the automated generation of data products, many issues remain to be addressed, and they take different forms across the workflow lifecycle. This chapter describes a workflow lifecycle consisting of a workflow generation phase, where the analysis is defined; a workflow planning phase, where the resources needed for execution are selected; a workflow execution phase, where the actual computations take place; and a phase in which results, metadata, and provenance are stored. The authors discuss the data management issues that arise at each step of this lifecycle. They describe challenge problems and illustrate them in the context of real-life applications, and they discuss the challenges, possible solutions, and open issues faced when mapping and executing large-scale workflows on current cyberinfrastructure. They particularly emphasize issues related to the management of data throughout the workflow lifecycle.
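The four lifecycle phases described above can be sketched as a minimal pipeline. This is an illustrative toy under assumed names, not the API of any real workflow system:

```python
# Toy sketch of the workflow lifecycle: generation -> planning -> execution ->
# storing of results, metadata, and provenance. All names are hypothetical.

def generate(analysis_steps):
    """Workflow generation: define the abstract analysis as ordered steps."""
    return {"steps": list(analysis_steps), "resources": None, "results": None}

def plan(workflow, available_resources):
    """Workflow planning: select a resource for each step (here: the first)."""
    workflow["resources"] = {s: available_resources[0] for s in workflow["steps"]}
    return workflow

def execute(workflow):
    """Workflow execution: run each step; each 'computation' is a stub."""
    workflow["results"] = {s: f"output-of-{s}" for s in workflow["steps"]}
    return workflow

def store(workflow):
    """Provenance storing: record what ran where and what it produced."""
    return [(s, workflow["resources"][s], workflow["results"][s])
            for s in workflow["steps"]]

provenance = store(execute(plan(generate(["calibrate", "mosaic"]), ["cluster-A"])))
assert provenance == [("calibrate", "cluster-A", "output-of-calibrate"),
                      ("mosaic", "cluster-A", "output-of-mosaic")]
```

The data management issues the chapter discusses attach to each of these stages: what the steps read and write, where the selected resources stage that data, and how the provenance records are kept findable afterwards.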


2019 ◽  
Vol 15 (2) ◽  
Author(s):  
Viviane Santos de Oliveira Veiga ◽  
Patricia Henning ◽  
Simone Dib ◽  
Erick Penedo ◽  
Jefferson Da Costa Lima ◽  
...  

ABSTRACT This article discusses the role of data management plans as a tool to facilitate data management throughout the research life cycle. Opening research data is now a priority on scientific agendas, as it increases both the visibility and transparency of investigations and the capacity for reproducibility and reuse of data in new research. In this context, the FAIR principles, an acronym for Findable, Accessible, Interoperable and Reusable, are fundamental, as they establish basic, guiding orientations for the management, curation and preservation of research data aimed at sharing and reuse. This work presents a proposal for a Data Management Plan template, aligned with the FAIR principles, for the Fundação Oswaldo Cruz. The methodology is bibliographical research and documental analysis of several European data management plans. We conclude that the adoption of a data management plan in the scientific practices of universities and research institutions is essential. However, to take full advantage of this activity, all actors involved in the process must participate; moreover, the plan must be machine-actionable. Keywords: Data Management Plan; Research Data; FAIR Principles; Machine-Actionable DMP; Open Science.


Author(s):  
Leif Schulman ◽  
Aino Juslén ◽  
Kari Lahti

The service model of the Global Biodiversity Information Facility (GBIF) is being implemented in an increasing number of national biodiversity (BD) data services. While GBIF already shares more than 10⁹ data points, national initiatives are an essential component: growth in GBIF-mediated data relies on national data mobilisation, and GBIF is not optimised to support local use. The Finnish Biodiversity Information Facility (FinBIF), initiated in 2012 and operational since late 2016, is one of the more recent examples of national BD research infrastructures (RIs), and arguably among the most comprehensive. Here, we describe FinBIF’s development and service integration, and provide a model approach for the construction of all-inclusive national BD RIs. FinBIF integrates a wide array of BD RI approaches under the same umbrella. These include large-scale, multi-technology digitisation of natural history collections; building a national DNA barcode reference library and linking it to species occurrence data; citizen science platforms for recording, managing and sharing observation data; management and sharing of restricted data among authorities; community-driven species identification support; an e-learning environment for species identification; and IUCN Red Listing (Fig. 1). FinBIF’s aims are to accelerate the digitisation, mobilisation, and distribution of biodiversity data and to boost their use in research and education, environmental administration, and the private sector. The core functionalities of FinBIF were built in a 3.5-year project (01/2015–06/2018) by a consortium of four university-based natural history collection facilities led by the Finnish Museum of Natural History Luomus. Close to 30% of the total funding was granted through the Finnish Research Infrastructures programme (FIRI), governed by the national research council and based on scientific excellence. Government funds for productivity enhancement in state administration covered c. 40% of the development, and the rest was self-financed by the implementing consortium of organisations, which have both a research and an education mission. The cross-sectoral scope of FinBIF has led to rapid uptake and a broad user base for its functionalities and services. Not only researchers but also administrative authorities, various enterprises, and a large number of private citizens show significant interest in the RI (Table 1). FinBIF is now in its second construction cycle (2019–2022), funded through the FIRI programme and thus focused on researcher services. The work programme includes the integration of tools for data management in ecological restoration and e-Lab tools for spatial analyses, morphometric analysis of 3D images, species identification from sound recordings, and metagenomics analyses.


2021 ◽  
Vol 16 (1) ◽  
pp. 48
Author(s):  
Robert J. Sandusky ◽  
Suzie Allard ◽  
Lynn Baird ◽  
Leah Cannon ◽  
Kevin Crowston ◽  
...  

DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.


2020 ◽  
Author(s):  
Jon Seddon ◽  
Ag Stephens

<div> <p>The PRIMAVERA project aims to develop a new generation of advanced and well-evaluated high-resolution global climate models. An integral component of PRIMAVERA is a new set of simulations at standard and high resolution from seven different European climate models. The expected data volume is 1.6 petabytes, comparable to the total volume of data in CMIP5.</p> </div><div> <p>A comprehensive Data Management Plan (DMP) was developed to allow the distributed group of scientists to produce and analyse this volume of data within the project’s limited duration. The DMP takes the approach of bringing the analysis to the data. The simulations were run on HPC systems across Europe and the data were transferred to the JASMIN super-data-cluster at the Rutherford Appleton Laboratory. A Data Management Tool (DMT) was developed to catalogue the available data and allow users to search through it using an intuitive web-based interface. The DMT allows users to request that the data they require be restored from tape to disk; users can then perform all their analyses at JASMIN. The DMT also controls the publication of the data to the Earth System Grid Federation, making it available to the global community.</p> </div><div> <p>Here we introduce JASMIN and the PRIMAVERA data management plan. We describe how the DMT allowed the project’s scientists to analyse this multi-model dataset, and how the tools and techniques developed can help future projects.</p> </div>
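The two core DMT operations described above, searching a catalogue and requesting that tape-only data be restored to disk, can be sketched as follows. All names and fields here are hypothetical stand-ins, not the real DMT data model:

```python
# Illustrative sketch of catalogue search and tape-to-disk restore requests.

catalogue = [
    {"id": 1, "model": "model-A", "variable": "tas", "on_disk": False},
    {"id": 2, "model": "model-B", "variable": "pr",  "on_disk": True},
]

def search(variable):
    """Find catalogued datasets for a variable, wherever they are stored."""
    return [d for d in catalogue if d["variable"] == variable]

def request_restore(dataset):
    """Request restoration of a tape-only dataset to disk (stubbed here)."""
    if not dataset["on_disk"]:
        dataset["on_disk"] = True  # stands in for the actual tape recall
    return dataset

hits = search("tas")          # search is independent of where the data lives
request_restore(hits[0])      # bring the tape-only hit onto disk for analysis
assert all(d["on_disk"] for d in search("tas"))
```

Separating the catalogue (always searchable) from the storage tier (tape or disk) is what lets a petabyte-scale archive present all of its holdings while keeping only the actively analysed subset on disk.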


2020 ◽  
Author(s):  
Kate Winfield

<p>Sending data to a secure long-term archive is increasingly a necessity for science projects, owing to funding body and publishing requirements. It is also good practice for long-term scientific aims, enabling the preservation and re-use of valuable research data. The Centre for Environmental Data Analysis (CEDA) hosts a data archive holding vast atmospheric and earth observation data from sources including aircraft campaigns, satellites, pollution measurements, automatic weather stations, and climate models. The CEDA archive currently holds 14 PB of data in over 250 million files, which makes discovering and accessing specific data challenging. Managing this requires the use of standard formats and descriptions of the data. This poster explores best practice in data management at CEDA and shows the tools used to archive and share data.</p>


2016 ◽  
Vol 56 ◽  
pp. 2.1-2.34 ◽  
Author(s):  
W.-K. Tao ◽  
Y. N. Takayabu ◽  
S. Lang ◽  
S. Shige ◽  
W. Olson ◽  
...  

Abstract Yanai and coauthors utilized meteorological data collected from a sounding network to present pioneering work in 1973 on thermodynamic budgets, referred to as the apparent heat source (Q1) and apparent moisture sink (Q2). Latent heating (LH) is one of the most dominant terms in Q1. Yanai’s paper motivated the development of satellite-based LH algorithms and provided a theoretical background for imposing large-scale advective forcing on cloud-resolving models (CRMs). These CRM-simulated LH and Q1 data have been used to generate the look-up tables in Tropical Rainfall Measuring Mission (TRMM) LH algorithms. A set of algorithms developed for retrieving LH profiles from TRMM-based rainfall profiles is described and evaluated, including details of their intrinsic space–time resolutions. The paper includes results from a variety of validation analyses that define the uncertainty of the LH profile estimates, as well as examples of how TRMM-retrieved LH profiles have been used to understand the life cycle of the Madden–Julian oscillation (MJO) and to improve predictions from global weather and climate models, together with comparisons against large-scale analyses. Areas for further improvement of the TRMM products are discussed.
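For reference, the Q1 and Q2 budgets discussed above take the following shape in their standard advective form (this is the textbook formulation, not reproduced from the paper itself; here s = c_p T + gz is dry static energy, q the water-vapour mixing ratio, overbars denote horizontal averages and primes eddy deviations):

```latex
\begin{aligned}
Q_1 &= \frac{\partial \bar{s}}{\partial t}
      + \bar{\mathbf{V}}\cdot\nabla \bar{s}
      + \bar{\omega}\,\frac{\partial \bar{s}}{\partial p}
     = Q_R + L(c - e) - \frac{\partial \overline{s'\omega'}}{\partial p},\\[4pt]
Q_2 &= -L\left(\frac{\partial \bar{q}}{\partial t}
      + \bar{\mathbf{V}}\cdot\nabla \bar{q}
      + \bar{\omega}\,\frac{\partial \bar{q}}{\partial p}\right)
     = L(c - e) + L\,\frac{\partial \overline{q'\omega'}}{\partial p},
\end{aligned}
```

where Q_R is radiative heating, c condensation, and e evaporation. The dominance of the L(c − e) term in Q1 is precisely why latent heating can be targeted by the satellite retrieval algorithms the abstract describes.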


2012 ◽  
Vol 83 ◽  
pp. 188-197
Author(s):  
Ke Chang Lin ◽  
Yi Qing Ni ◽  
Xiao Wei Ye ◽  
Kai Yuan Wong

The data management system (DMS) is an essential part of long-term structural health monitoring (SHM) systems, storing a pool of monitoring data for various applications. A robust database within a DMS is generally used to archive, manage and update life-cycle information on civil structures. However, many applications, especially those for large-scale structures, provide little support for visualizing long-term monitoring data. This paper presents the development of an efficient visualized DMS that integrates four-dimensional (4D) model technology, a nested relational database, and virtual reality (VR) technology. Spatial data of the 4D model are organized in nested tables, while real-time (temporal) monitoring data are linked to the 4D model. The model is then reconstructed using an OpenSceneGraph 3D engine. A user interface is developed to query the database and display the data via the 4D model. To demonstrate its efficiency, the proposed method has been applied to the Canton Tower, a supertall tower-like structure instrumented with a long-term SHM system.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Khin Mar Shwe

Abstract There has been considerable growth in the academic contributions of citizen science: research conducted within the paradigms of different disciplines and through the activities of citizens who collect, process, and analyse data and disseminate results. This research has demonstrated the importance of data management practices, which are essential for carrying out the data life cycle. This study aims to analyse the scientific data contribution of citizen science under the data life cycle approach. It investigates 1,020 citizen science projects within the DataONE life cycle framework, which includes the data management plan, data collection, data quality assurance, data documentation, data discovery, data integration, data preservation, and data analysis. As the major finding, the study shows that data management plans are developed under the leadership of universities, which host the majority of citizen science projects. The processes of data collection, data quality assurance, data documentation, data preservation, and data analysis are well organised with systematic tools in the Information and Communications Technology (ICT) age, while citizen science projects continue to accumulate. Data discovery is mostly linked to SciStarter (a citizen science community site) and Facebook (social media). In data integration, it is found that most projects integrate with global observation. Finally, the study presents the process and procedure of citizen science data management, contributing scientific data and the design of the data life cycle to academic and governmental work.


IFLA Journal ◽  
2017 ◽  
Vol 43 (1) ◽  
pp. 5-21 ◽  
Author(s):  
Pierre-Yves Burgi ◽  
Eliane Blumer ◽  
Basma Makhlouf-Shabou

In this article, the authors report on an ongoing national data life cycle management project in Switzerland, with a major focus on long-term preservation. Based on extensive document analysis as well as semi-structured interviews, the project aims to provide national services that respond to researchers’ most pressing data life cycle management needs, including: guidelines for establishing a data management plan, active data management solutions, long-term preservation storage options, training, and a single point of access and contact for support. In addition to presenting the different working axes of the project, the authors describe a strategic management and lean-startup template for developing new business models, which is key to building viable services.

