Integration of an Active Research Data System with a Data Repository to Streamline the Research Data Lifecycle: Pure-NOMAD Case Study

2018, Vol 12 (2), pp. 210-219
Author(s): Simone Ivan Conte, Federica Fina, Michalis Psalios, Shyam Ryal, Tomas Lebl, ...

Research funders have introduced requirements that expect researchers to properly manage and publicly share their research data, and expect institutions to put in place services that support researchers in meeting these requirements. So far the general focus of these services and systems has been on addressing the final stages of the research data lifecycle (archive, share and re-use), rather than the stages related to the active phase of the cycle (collect/create and analyse). As a result, full integration of active data management systems with data repositories is not yet the norm, making the streamlined transition of data from an active to a published and archived status an important challenge. In this paper we present the integration between an active data management system developed in-house (NOMAD) and Elsevier’s Pure data repository used at our institution, with the aim of offering a simple workflow that facilitates and promotes data deposit. The integration results in a new data management and publication workflow that helps researchers save time, minimizes human errors related to manual file handling, and further promotes data deposit and collaboration across the institution.
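The abstract does not spell out the NOMAD or Pure interfaces, so the following is only a minimal sketch of the kind of deposit workflow it describes: metadata and files captured in an active data system are pushed to an institutional repository as a draft record, so the researcher never handles files manually. All endpoint URLs, field names and the token below are hypothetical, not the actual NOMAD or Pure API.

```python
# Illustrative sketch only: endpoints, payload fields and auth are assumptions,
# not the Pure or NOMAD APIs described in the paper.
import requests

ACTIVE_STORE = "https://nomad.example.ac.uk/api"    # hypothetical active-data endpoint
REPOSITORY = "https://pure.example.ac.uk/ws/api"    # hypothetical repository endpoint
TOKEN = "..."                                       # credential issued by the repository


def deposit_finished_dataset(project_id: str, dataset_id: str) -> str:
    """Push a finalized dataset and its metadata from the active system to the repository."""
    # 1. Reuse metadata captured during the active phase, so nothing is re-typed.
    meta = requests.get(
        f"{ACTIVE_STORE}/projects/{project_id}/datasets/{dataset_id}", timeout=30
    ).json()

    # 2. Create a draft dataset record in the repository.
    record = requests.post(
        f"{REPOSITORY}/datasets",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"title": meta["title"], "description": meta["description"],
              "creators": meta["creators"]},
        timeout=30,
    ).json()

    # 3. Stream files system-to-system, avoiding manual downloads and uploads.
    for f in meta["files"]:
        payload = requests.get(f["download_url"], timeout=300).content
        requests.put(
            f"{REPOSITORY}/datasets/{record['id']}/files/{f['name']}",
            headers={"Authorization": f"Bearer {TOKEN}"},
            data=payload, timeout=300,
        )

    return record["id"]  # the researcher only reviews and publishes the draft
```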

KWALON, 2016, Vol 21 (1)
Author(s): René van Horik

Summary: Nowadays, research without a role for digital data and data analysis tools is barely possible. As a result, we see an increasing interest in research data management, as it enables the replication of research outcomes and the reuse of research data for new research activities. Data management planning outlines how to handle data, both during research and after the research is completed. Trusted data repositories are places where research data are archived and made available for the long term. This article covers the state of the art concerning data management and data repository demands, with a focus on qualitative data sets.


2021, Vol 45 (3-4)
Author(s): Gilbert Mushi

The emergence of data-driven research and demands for the establishment of Research Data Management (RDM) have created interest in academic institutions and research organizations globally. Some libraries, especially in developed countries, have started offering RDM services to their communities. Although lagging behind, some academic libraries in developing countries are at the stage of planning or implementing the service. However, the level of RDM awareness is very low among researchers, librarians and other data practitioners. The objective of this paper is to present available open resources for different data practitioners, particularly researchers and librarians. These include training resources for both researchers and librarians, a Data Management Plan (DMP) tool for researchers, and data repositories where researchers can freely archive and share their research data with local and international communities. A case study with a survey was conducted at the University of Dodoma to identify relevant RDM services so that librarians could assist researchers in making their data accessible to the local and international community. The study findings revealed a low level of RDM awareness among researchers and librarians. Over 50% of the respondents rated their knowledge as poor in the following RDM areas: DMPs, data repositories, long-term digital preservation, funders' RDM mandates, metadata standards for describing data, and general awareness of RDM. This paper therefore presents available open resources for different data practitioners to improve RDM knowledge and boost the confidence of academic and research libraries in establishing the service.


GigaScience, 2020, Vol 9 (10)
Author(s): Daniel Arend, Patrick König, Astrid Junker, Uwe Scholz, Matthias Lange

Background: The FAIR data principles, as a commitment to support long-term research data management, are widely accepted in the scientific community. Although the ELIXIR Core Data Resources and other established infrastructures provide comprehensive and long-term stable services and platforms for FAIR data management, a large quantity of research data is still hidden or at risk of getting lost. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, e.g., time series of images or high-resolution hyperspectral data; the quality of data formatting and annotation, e.g., with regard to the structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. Results: To share these potentially dark data in a FAIR way and master these challenges, the ELIXIR Germany/de.NBI service Plant Genomics and Phenomics Research Data Repository (PGP) implements a “bring the infrastructure to the data” approach, which allows research data to be kept in place and wrapped in a FAIR-aware software infrastructure. This article presents new features of the e!DAL infrastructure software and the PGP repository as a best practice for easily setting up FAIR-compliant and intuitive research data services. Furthermore, the integration of the ELIXIR Authentication and Authorization Infrastructure (AAI) and data discovery services is introduced as a means to lower technical barriers and to increase the visibility of research data. Conclusion: The e!DAL software has matured into a powerful and FAIR-compliant infrastructure, while keeping the focus on flexible setup and integration into existing infrastructures and the daily research process.
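To make the “bring the infrastructure to the data” idea concrete, here is a minimal sketch of registering a dataset in place: files stay on institutional storage while a lightweight layer records FAIR-relevant metadata and fixity information about them. The function and file names are hypothetical illustrations, not the actual e!DAL API.

```python
# Hypothetical sketch: files are left where they were produced; only metadata and
# checksums are recorded so the dataset becomes findable and citable.
# Names below are illustrative, not the e!DAL interface.
import datetime
import hashlib
import json
import pathlib


def register_in_place(dataset_dir: str, title: str, creators: list[str]) -> dict:
    """Index an existing directory without moving its files."""
    root = pathlib.Path(dataset_dir)
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            files.append({
                "path": str(path.relative_to(root)),
                "bytes": path.stat().st_size,
                "sha256": digest,                    # fixity info for long-term access
            })
    record = {
        "title": title,
        "creators": creators,
        "issued": datetime.date.today().isoformat(),
        "location": str(root.resolve()),             # data are kept in place
        "files": files,
    }
    # A real service layer would now mint a DOI and expose this record for harvesting.
    (root / "fair-metadata.json").write_text(json.dumps(record, indent=2))
    return record
```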


Author(s): Johannes Hubert Stigler, Elisabeth Steiner

Research data repositories and data centres are becoming more and more important as infrastructures in academic research. The article introduces the Humanities’ research data repository GAMS, covering everything from the system architecture to its preservation and content policies. Challenges faced by data centres and repositories, and both general and domain-specific approaches and solutions, are outlined. Special emphasis lies on the sustainability and long-term perspective of such infrastructures, not only on the technical but above all on the organisational and financial level.


2017, Vol 12 (1), pp. 88-105
Author(s): Sünje Dallmeier-Tiessen, Varsha Khodiyar, Fiona Murphy, Amy Nurnberger, Lisa Raymond, ...

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process. We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data. We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.


2021, Vol 9
Author(s): Javad Chamanara, Jitendra Gaikwad, Roman Gerlach, Alsayed Algergawy, Andreas Ostrowski, ...

Obtaining fit-for-use data associated with diverse aspects of biodiversity, ecology and the environment is challenging, since such data are often fragmented, sub-optimally managed and available in heterogeneous formats. Recently, with the universal acceptance of the FAIR data principles, the requirements and standards for data publication have changed substantially. Researchers are encouraged to manage data according to the FAIR data principles and to ensure that raw data, metadata, processed data, software, code and associated material are securely stored and that the data are made available upon completion of the research. We have developed BEXIS2 as an open-source, community-driven, web-based research data management system to support the research data management needs of mid- to large-scale research projects with multiple sub-projects and up to several hundred researchers. BEXIS2 is a modular and extensible system providing a range of functions that cover the complete data lifecycle, from data structure design to data collection, data discovery, dissemination, integration, quality assurance and research planning. It is an extensible and customisable system that allows for the development of new functions and the customisation of its various components, from database schemas to the user interface layout, elements, and look and feel. During the development of BEXIS2, we aimed to incorporate key aspects of what is encoded in the FAIR data principles. To investigate the extent to which BEXIS2 conforms to these principles, we conducted a self-assessment using the FAIR indicators, definitions and criteria provided in the FAIR Data Maturity Model. Even though the FAIR Data Maturity Model was initially developed to judge the conformance of datasets, the self-assessment results indicated that BEXIS2 conforms to and supports the FAIR indicators to a remarkable degree. BEXIS2 strongly conforms to the Findability and Accessibility indicators. Interoperability is moderately supported as of now; however, for many of the less-supported facets we have concrete plans for improvement. Reusability (as defined by the FAIR data principles) is partially achieved. This paper also illustrates community deployment examples of BEXIS2 instances as success stories, to exemplify its capacity to meet the biodiversity and ecological data management needs of differently sized projects and to serve as an organisational research data management system.
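As a small illustration of the kind of self-assessment the abstract mentions, the sketch below rolls per-indicator maturity levels up into averages per FAIR area. The indicator IDs, level scale and scores are placeholders, not the actual FAIR Data Maturity Model indicator set or the BEXIS2 assessment results.

```python
# Illustrative only: placeholder indicators and scores, not the published assessment.
from collections import defaultdict

# level scale assumed here: 0 = absent ... 4 = fully implemented
assessment = {
    "F1": 4, "F2": 4, "F3": 3,   # Findability indicators
    "A1": 4, "A2": 3,            # Accessibility indicators
    "I1": 2, "I2": 3, "I3": 1,   # Interoperability indicators
    "R1": 2, "R2": 3,            # Reusability indicators
}


def summarise(scores: dict[str, int]) -> dict[str, float]:
    """Average the indicator levels per FAIR area (F, A, I, R)."""
    groups = defaultdict(list)
    for indicator, level in scores.items():
        groups[indicator[0]].append(level)
    return {area: sum(levels) / len(levels) for area, levels in groups.items()}


print(summarise(assessment))  # roughly: F ≈ 3.67, A = 3.5, I = 2.0, R = 2.5
```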


Author(s): A. V. Vo, D. F. Laefer, M. Trifkovic, C. N. L. Hewage, M. Bertolotto, ...

Abstract. The massive amounts of spatio-temporal information often present in LiDAR data sets make their storage, processing, and visualisation computationally demanding. There is an increasing need for systems and tools that support all the spatial and temporal components and the three-dimensional nature of these datasets for effortless retrieval and visualisation. In response to these needs, this paper presents a scalable, distributed database system that is designed explicitly for retrieving and viewing large LiDAR datasets on the web. The ultimate goal of the system is to provide rapid and convenient access to a large repository of LiDAR data hosted on a distributed computing platform. The system is composed of multiple shared-nothing nodes operating in parallel: each node is autonomous and has a dedicated set of processors and memory, and the nodes communicate with each other via an interconnected network. The data management system presented in this paper is implemented on top of Apache HBase, a distributed key-value datastore within the Hadoop ecosystem. HBase is extended with new data encoding and indexing mechanisms to accommodate both the point cloud and the full-waveform components of LiDAR data. The data can be consumed by any desktop or web application that communicates with the data repository using the HTTP protocol; the communication is enabled by a web servlet. In addition to the command-line tool used for administration tasks, two web applications are presented to illustrate the types of user-facing applications that can be coupled with the data system.
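The paper describes its own encoding and indexing extensions for HBase; as a rough sketch of what a spatio-temporal row key for point records can look like, the snippet below interleaves the x/y coordinates into a Morton (Z-order) code and appends a timestamp, so spatially nearby points end up on adjacent keys. This is a common pattern shown for illustration, not the authors' exact scheme; the commented happybase calls assume an HBase Thrift gateway and are likewise only an example.

```python
# Generic sketch of a spatio-temporal HBase row key for LiDAR point records.
# Not the encoding used by the system in the paper.
import struct


def interleave_bits(a: int, b: int, bits: int = 32) -> int:
    """Interleave the bits of two non-negative integers (2D Morton / Z-order code)."""
    code = 0
    for i in range(bits):
        code |= ((a >> i) & 1) << (2 * i) | ((b >> i) & 1) << (2 * i + 1)
    return code


def row_key(x: float, y: float, t_epoch: int, resolution: float = 0.01) -> bytes:
    """Spatial Morton code first (keeps nearby points adjacent), then acquisition time."""
    xi, yi = int(x / resolution), int(y / resolution)
    return struct.pack(">QI", interleave_bits(xi, yi), t_epoch)


# Example write via the happybase Thrift client (requires a running HBase Thrift server):
# import happybase
# conn = happybase.Connection("hbase-gateway.example.org")
# table = conn.table("lidar_points")
# table.put(row_key(312345.12, 234567.89, 1588000000),
#           {b"pc:z": struct.pack(">f", 87.3), b"pc:intensity": struct.pack(">H", 142)})
```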


2020, Vol 6
Author(s): Christoph Steinbeck, Oliver Koepler, Felix Bach, Sonja Herres-Pawlis, Nicole Jung, ...

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research, to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem, serving the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is the development and maintenance of a national research data infrastructure for the research domain of chemistry in Germany, and the enablement of innovative, easy-to-use services and novel scientific approaches based on the re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia and aims to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions, including data for their experimental and theoretical characterisation. This overarching goal is pursued through a number of key objectives:
Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.
Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata, as well as open data standards, in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry in order to support the FAIR principles for research data, and develop new standards where gaps exist.
Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research, and promote subsequent Research Data Management (RDM) at all levels of academia, beginning in undergraduate curricula.
Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness of and foster the adoption of FAIR data management. Initiate processes to integrate RDM and data science into curricula, and offer a wide range of training opportunities for researchers.
Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.
Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.


Data, 2019, Vol 4 (2), pp. 83
Author(s): Timm Fitschen, Alexander Schlemmer, Daniel Hornung, Henrik tom Wörden, Ulrich Parlitz, ...

We present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data in a FAIR way. Its primary purpose is the management of data from the biomedical sciences, both from simulations and from experiments, across the complete research data lifecycle. An RDMS for this domain faces particular challenges: research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around the scientists' workflows and practices, and thus support changes in workflow and data structure. Nevertheless, it should encourage and support the development of, and adherence to, standards, and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies, while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of its data model, the CaosDB Server, and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it.
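To give a feel for the flexible entity/property style of data model the abstract refers to (typed record types, records that reference other records), here is a plain-Python mock-up. It is a conceptual illustration only, not the CaosDB server, its data model implementation, or its client API.

```python
# Conceptual illustration of a record-type/property data model with references
# between records. Not CaosDB code.
from dataclasses import dataclass, field


@dataclass
class RecordType:
    name: str
    properties: dict[str, type]              # property name -> expected value type


@dataclass
class Record:
    record_type: RecordType
    values: dict[str, object] = field(default_factory=dict)

    def validate(self) -> None:
        """Check that supplied values match the declared property types."""
        for prop, expected in self.record_type.properties.items():
            if prop in self.values and not isinstance(self.values[prop], expected):
                raise TypeError(f"{prop} should be {expected.__name__}")


# A simulation run referencing the sample it was computed for.
sample = RecordType("Sample", {"label": str, "species": str})
simulation = RecordType("Simulation", {"solver": str, "timestep_s": float, "sample": Record})

s = Record(sample, {"label": "S-042", "species": "mouse"})
run = Record(simulation, {"solver": "explicit-euler", "timestep_s": 1e-3, "sample": s})
run.validate()  # schema stays flexible, but records remain machine-checkable
```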

