A Lightweight, Microservice-Based Research Data Management Architecture for Large Scale Environmental Datasets

Author(s):  
Alexander Götz ◽  
Johannes Munke ◽  
Mohamad Hayek ◽  
Hai Nguyen ◽  
Tobias Weber ◽  
...  

LTDS ("Let the Data Sing") is a lightweight, microservice-based Research Data Management (RDM) architecture which augments previously isolated data stores ("data silos") with FAIR research data repositories. The core components of LTDS include a metadata store as well as dissemination services such as a landing page generator and an OAI-PMH server. As these core components were designed to be independent from one another, a central control system has been implemented, which handles data flows between components. LTDS is developed at LRZ (Leibniz Supercomputing Centre, Garching, Germany), with the aim of allowing researchers to make massive amounts of data (e.g. HPC simulation results) on different storage backends FAIR. Such data can often, owing to their size, not easily be transferred into conventional repositories. As a result, they remain "hidden", while only e.g. final results are published - a massive problem for reproducibility of simulation-based science. The LTDS architecture uses open-source and standardized components and follows best practices in FAIR data (and metadata) handling. We present our experience with our first three use cases: the Alpine Environmental Data Analysis Centre (AlpEnDAC) platform, the ClimEx dataset with 400TB of climate ensemble simulation data, and the Virtual Water Value (ViWA) hydrological model ensemble.

2021 ◽  
Vol 9 ◽  
Author(s):  
Javad Chamanara ◽  
Jitendra Gaikwad ◽  
Roman Gerlach ◽  
Alsayed Algergawy ◽  
Andreas Ostrowski ◽  
...  

Obtaining fit-to-use data associated with diverse aspects of biodiversity, ecology and environment is challenging since such data are often fragmented, sub-optimally managed and available in heterogeneous formats. Recently, with the universal acceptance of the FAIR data principles, the requirements and standards of data publications have changed substantially. Researchers are encouraged to manage the data as per the FAIR data principles and ensure that the raw data, metadata, processed data, software, codes and associated material are securely stored and that the data are made available upon completion of the research. We have developed BEXIS2 as an open-source community-driven web-based research data management system to support research data management needs of mid to large-scale research projects with multiple sub-projects and up to several hundred researchers. BEXIS2 is a modular and extensible system providing a range of functions to realise the complete data lifecycle from data structure design to data collection, data discovery, dissemination, integration, quality assurance and research planning. It is an extensible and customisable system that allows for the development of new functions and customisation of its various components from database schemas to the user interface layout, elements and look and feel. During the development of BEXIS2, we aimed to incorporate key aspects of what is encoded in FAIR data principles. To investigate the extent to which BEXIS2 conforms to these principles, we conducted a self-assessment using the FAIR indicators, definitions and criteria provided in the FAIR Data Maturity Model. Even though the FAIR Data Maturity Model was initially developed to judge the conformance of datasets, the self-assessment results indicated that BEXIS2 conforms to and supports the FAIR indicators remarkably well. BEXIS2 strongly conforms to the indicators Findability and Accessibility. 
The indicator Interoperability is moderately supported as of now; however, for many of the less-supported facets, we have concrete plans for improvement. Reusability (as defined by the FAIR data principles) is partially achieved. This paper also illustrates community deployment examples of the BEXIS2 instances as success stories to exemplify its capacity to meet the biodiversity and ecological data management needs of differently sized projects and serve as an organisational research data management system.


2020 ◽  
Vol 6 ◽  
Author(s):  
Christoph Steinbeck ◽  
Oliver Koepler ◽  
Felix Bach ◽  
Sonja Herres-Pawlis ◽  
Nicole Jung ◽  
...  

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem to serve the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is the development and maintenance of a national research data infrastructure for the research domain of chemistry in Germany, and to enable innovative and easy to use services and novel scientific approaches based on re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia. We aim to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions including data for their experimental and theoretical characterisation. This overarching goal is achieved by working towards a number of key objectives:
Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.
Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata as well as open data standards in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry, in order to support the FAIR principles for research data. Finally, develop standards, if there is a lack.
Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research and promote subsequent Research Data Management (RDM) at all levels of academia, beginning in undergraduate studies curricula.
Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness for and foster the adoption of FAIR data management. Initiate processes to integrate RDM and data science into curricula. Offer a wide range of training opportunities for researchers.
Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.
Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.


2020 ◽  
Author(s):  
Ionut Iosifescu-Enescu ◽  
Gian-Kasper Plattner ◽  
Dominik Haas-Artho ◽  
David Hanimann ◽  
Konrad Steffen

EnviDat – www.envidat.ch – is the institutional Environmental Data portal of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL. Launched in 2012 as a small project to explore possible solutions for a generic WSL-wide data portal, it has since evolved into a strategic initiative at the institutional level tackling issues in the broad areas of Open Research Data and Research Data Management. EnviDat demonstrates our commitment to accessible research data in order to advance environmental science.
EnviDat actively implements the FAIR (Findability, Accessibility, Interoperability and Reusability) principles. Core EnviDat research data management services include the registration, integration and hosting of quality-controlled, publication-ready data from a wide range of terrestrial environmental systems, in order to provide unified access to WSL’s environmental monitoring and research data. The registration of research data in EnviDat results in the formal publication with permanent identifiers (EnviDat own PIDs as well as DOIs) and the assignment of appropriate citation information.
Innovative EnviDat features that contribute to the global system of modern documentation and exchange of scientific information include: (i) a DataCRediT mechanism designed for specifying data authorship (Collection, Validation, Curation, Software, Publication, Supervision), (ii) the ability to enhance published research data with additional resources, such as model codes and software, (iii) in-depth documentation of data provenance, e.g., through a dataset description as well as related publications and datasets, (iv) unambiguous and persistent identifiers for authors (ORCIDs) and, in the medium-term, (v) a decentralized “peer-review” data publication process for safeguarding the quality of available datasets in EnviDat.
More recently, the EnviDat development has been moving beyond the set of core features expected from a research data management portal with a built-in publishing repository. This evolution is driven by the diverse set of researchers’ requirements for a specialized environmental data portal that formally cuts across the five WSL research themes forest, landscape, biodiversity, natural hazards, and snow and ice, and that concerns all research units and central IT services.
Examples of such recent requirements for EnviDat include: (i) immediate access to data collected by automatic measurements stations, (ii) metadata and data visualization on charts and maps, with geoservices for large geodatasets, and (iii) progress towards linked open data (LOD) with curated vocabularies and semantics for the environmental domain.
There are many challenges associated with the developments mentioned above. However, they also represent opportunities for further improving the exchange of scientific information in the environmental domain. Especially geospatial technologies have the potential to become a central element for any specialized environmental data portal, triggering the convergence between publishing repositories and geoportals. Ultimately, these new requirements demonstrate the raised expectations that institutions and researchers have towards the future capabilities of research data portals and repositories in the environmental domain. With EnviDat, we are ready to take up these challenges over the years to come.


2019 ◽  
Vol 39 (06) ◽  
pp. 308-314
Author(s):  
Mahdi Salah Mohammed ◽  
Rafea Ibrahim

Research emphasises the fundamental role of research data management (RDM) in enhancing academic and scientific research. This paper set out to examine RDM in Iraqi universities, identify the current challenges of RDM and propose influential RDM practices. Data were collected using self-administered questionnaires distributed to 155 postgraduate students and 20 faculty members from five universities in Iraq. Research findings revealed that there is a lack of proper RDM. Postgraduate students and researchers were managing their own research data. The main challenges to maintaining good RDM include a lack of guidelines on effective RDM practices, insufficient human resources, technological obsolescence, insecure and inefficient infrastructure, a lack of financial resources, the absence of research data management policies, and a lack of support from institutional authorities and researchers, all of which negatively influenced research data management. Postgraduate students and researchers recommend building research data repositories and collaborating with other universities and research organisations.


Author(s):  
Adi Alter ◽  
Eddie Neuwirth ◽  
Dani Guzman

Academic libraries are looking for ways to grow their involvement in and scale-up their support for research activities. The successful transition depends to a large extent on the library's ability to systematically manage data, break down information silos and unify workflows across the library, research office and researchers. Data repositories are at the heart of this challenge, yet often institutional repositories are not built to address the needs of modern research data management due to inability to store all research assets, lack of consistent data models, and insufficient workflows. This chapter will present a new approach to research data management that ensures visibility of research output and data, data coherency, and compliance with open access standards. The authors will discuss a ‘Next-Generation Research Repository' that spans multiple data management activities, including automated data capture, metadata enrichment, dissemination, compliance-related workflows, automated publication to scholarly profiles, as well as open integration with the research ecosystem.


2020 ◽  
Author(s):  
Michael Finkel ◽  
Albrecht Baur ◽  
Tobias K.D. Weber ◽  
Karsten Osenbrück ◽  
Hermann Rügner ◽  
...  

The consistent management of research data is crucial for the success of long-term and large-scale collaborative research. Research data management is the basis for efficiency, continuity, and quality of the research, as well as for maximum impact and outreach, including the long-term publication of data and their accessibility. Both funding agencies and publishers increasingly require this long-term and open access to research data. Joint environmental studies typically take place in a fragmented research landscape of diverse disciplines; researchers involved typically show a variety of attitudes towards and previous experiences with common data policies, and the extensive variety of data types in interdisciplinary research poses particular challenges for collaborative data management. We present organizational measures, data and metadata management concepts, and technical solutions to form a flexible research data management framework that allows for efficiently sharing the full range of data and metadata among all researchers of the project, and smooth publishing of selected data and data streams to publicly accessible sites. The concept is built upon data type-specific and hierarchical metadata using a common taxonomy agreed upon by all researchers of the project. The framework’s concept has been developed along the needs and demands of the scientists involved, and aims to minimize their effort in data management, which we illustrate from the researchers’ perspective describing their typical workflow from the generation and preparation of data and metadata to the long-term preservation of data including their metadata.


2018 ◽  
Vol 42 (2) ◽  
pp. 1-16
Author(s):  
Cristina Ribeiro ◽  
João Rocha da Silva ◽  
João Aguiar Castro ◽  
Ricardo Carvalho Amorim ◽  
João Correia Lopes ◽  
...  

Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain. Concerns with data generated in large projects and well-funded research areas are centered on their exploration and analysis. For data in the long tail, the main issues are still how to get data visible, satisfactorily described, preserved, and searchable. Our work aims to promote data publication in research institutions, considering that researchers are the core stakeholders and need straightforward workflows, and that multi-disciplinary tools can be designed and adapted to specific areas with a reasonable effort. For small groups with interesting datasets but not much time or funding for data curation, we have to focus on engaging researchers in the process of preparing data for publication, while providing them with measurable outputs. In larger groups, solutions have to be customized to satisfy the requirements of more specific research contexts. We describe our experience at the University of Porto in two lines of enquiry. For the work with long-tail groups we propose general-purpose tools for data description and the interface to multi-disciplinary data repositories. For areas with larger projects and more specific requirements, namely wind infrastructure, sensor data from concrete structures and marine data, we define specialized workflows. In both cases, we present a preliminary evaluation of results and an estimate of the kind of effort required to keep the proposed infrastructures running.  The tools available to researchers can be decisive for their commitment. We focus on data preparation, namely on dataset organization and metadata creation. For groups in the long tail, we propose Dendro, an open-source research data management platform, and explore automatic metadata creation with LabTablet, an electronic laboratory notebook. 
For groups demanding a domain-specific approach, our analysis has resulted in the development of models and applications to organize the data and support some of their use cases. Overall, we have adopted ontologies for metadata modeling, keeping in sight metadata dissemination as Linked Open Data.


2019 ◽  
Author(s):  
Heather Andrews ◽  
Marta Teperek ◽  
Jasper van Dijck ◽  
Kees den Heijer ◽  
Robbert Eggermont ◽  
...  

The Data Stewardship project is a new initiative from the Delft University of Technology (TU Delft) in the Netherlands. Its aim is to create mature working practices and policies regarding research data management across all TU Delft faculties. The novelty of this project lies in having a dedicated person, the so-called ‘Data Steward’, embedded in each faculty to approach research data management from a more discipline-specific perspective. It is within this framework that a research data management survey was carried out at the faculties that had a Data Steward in place by July 2018. The goal was to get an overview of the general data management practices, and use its results as a benchmark for the project. The total response rate was 11 to 37% depending on the faculty. Overall, the results show similar trends in all faculties, and indicate a lack of awareness regarding different data management topics such as automatic data backups, data ownership, relevance of data management plans, awareness of FAIR data principles and usage of research data repositories. The results also show great interest towards data management, as more than ~80% of the respondents in each faculty claimed to be interested in data management training and wished to see the summary of survey results. Thus, the survey helped identify the topics the Data Stewardship project is currently focusing on, by carrying out awareness campaigns and providing training at both university and faculty levels.


Author(s):  
Frank Oliver Glöckner ◽  
Michael Diepenbroek

Background: The NFDI process in Germany
The digital revolution is fundamentally transforming research data and methods. Mastering this transformation poses major challenges for stakeholders in the domains of science and policy. The process of digitalisation creates immense opportunities, but it must be structured proactively. To this end, the establishment of effective governance mechanisms for research data management (RDM) is of fundamental importance and will be one key driver for successful research and innovation in the future. In 2016 the German Council for Information Infrastructures (RfII) recommended the establishment of a “Nationale Forschungsdateninfrastruktur” (National Research Data Infrastructure, or NFDI), which will serve as the backbone for research data management in Germany. The NFDI should be implemented as a dynamic national collaborative network that grows over time and is composed of various specialised nodes (consortia). The talk will provide a short overview of the status and objectives of the NFDI. It will commence with a description of the goals of the NFDI4BioDiversity consortium which was established for the targeted support of the biodiversity community with data management.
The NFDI4BioDiversity Consortium: Biodiversity, Ecology & Environmental Data
Biodiversity is more than just the diversity of living species. It includes genetic diversity, functional diversity, interactions and the diversity of whole ecosystems. Mankind continues to dramatically impact the earth’s ecosystem: species are dying out, and genetic diversity as well as whole ecosystems are endangered or already lost. Next to the loss of charismatic species and conspicuous change in ecosystems, we are experiencing a quiet loss of common species, which together have captured high-level policy attention. This has impacts on vital ecosystem services that provide the foundation of human well-being. 
A general understanding of the status, trends and drivers of the biodiversity on earth is urgently needed to devise conservation responses. Besides the fact that data are often scattered across repositories or not accessible at all, the main challenge for integrative studies is the heterogeneity of measurements and observation types, combined with a substantial lack of documentation. This leads to inconsistencies and incompatibilities in data structures, interfaces and semantics and thus hinders the re-usability of data to answer scientifically and socially relevant questions. Synthesis as well as hypothesis generation will only proceed when data are compliant with the FAIR (Findable, Accessible, Interoperable and Re-usable) data principles. Over the last five years these key challenges have been addressed by the DFG funded German Federation for Biological Data (GFBio) project. GFBio encompasses technical, organizational, financial, and community aspects to raise awareness for research data management in biodiversity research and environmental sciences. To foster sustainability across this federated infrastructure the not-for-profit association “Gesellschaft für biologische Daten e.V. (GFBio e.V.)” was set up in 2016 as an independent legal entity. NFDI4BioDiversity builds on the experience and established user community of GFBio and takes advantage of GFBio e.V. GFBio already comprises data centers for nucleotide and environmental data as well as the seven well-established data centers of Germany's largest natural science research facilities, museums and the world’s most diverse microbiological resource collection. The network is now extended to include the network of botanical gardens and the largest collections of crop plants and their wild relatives. All collections together host more than 75% of all museum objects (150 million) in Germany and >80% of all described microbial species. They represent the biggest and internationally relevant data repositories. 
NFDI4BioDiversity will extend its community engagement at the science-society-policy interface by including farm animal biology, crop sciences, biodiversity monitoring and citizen science, as well as systems biology encompassing world-leading tools and collections for FAIR data management. Partners of the German Network for Bioinformatics Infrastructure (de.NBI) provide large scale data analysis and storage capacities in the cloud, as well as extensive continuous training and education experiences. Dedicated personnel will be responsible for the mutual exchange of data and experiences with NFDI4Life-Umbrella, NFDI4Earth, NFDI4Chem, NFDI4Health and beyond. As digitalization and liberation of data proceed, NFDI4BioDiversity will foster community standards, quality management and documentation as well as the harmonization and synthesis of heterogeneous data. It will pro-actively engage the user community to build a coordinated data management platform for all types of biodiversity data as a dedicated added value service for all users of NFDI.


2020 ◽  
Vol 15 (1) ◽  
pp. 18
Author(s):  
Yingshen Huang ◽  
Andrew Cox ◽  
Laura Sbaffi

On April 2, 2018, the State Council of China formally released a national research data management (RDM) policy, “Measures for Managing Scientific Data”. A literature review shows that university libraries have played an important role in supporting Research Data Management at an institutional level in countries in North America, Europe and Australasia. The aim of this paper is to capture the current status of RDM in Chinese universities, in particular how university libraries have been involved in taking the agenda forward. This paper uses mixed methods: a website analysis of university policies and services; a questionnaire for university librarians; and semi-structured interviews. Findings from website analysis and questionnaires indicate that research data services (RDS) at a local level in Chinese universities are in their infancy. On the whole there is more evidence of activity in developing data repositories than support services. Despite the existence of a national policy there remain significant barriers to further service development, such as the lag in the creation of local policy, insufficient funding for technical infrastructure, shortages of staff skills in data curation, and language barriers to international data sharing and open science. RDS in Chinese university libraries are still lagging behind the English-speaking countries and Europe.

