ENVRI knowledge base: A community knowledge base for research, innovation and society

The Horizon 2020 ENVRI-FAIR project brings together 14 European environmental research infrastructures (ENVRI) to develop solutions to improve the FAIRness of their data and services, and eventually to connect the ENVRI community with the European Open Science Cloud (EOSC). It is thus essential to share the reusable solutions while RIs are tackling common challenges in improving their FAIRness, and to continually assess the FAIRness of ENVRI (meta)data services as they are developed.&#160; The FAIRness assessment is, however, far from trivial. On the one hand, the task relies on gathering the required information from RIs, e.g. information about the metadata and data repositories operated by RIs, the kind of metadata standards repositories implement, the use of persistent identifier systems. Such information is gathered using questionnaires whose processing can be time-consuming. On the other hand, to enable efficient querying, processing and analysis, the information needs to be machine-actionable and curated in a knowledge base. Besides acting as a general resource to learn about RIs, the ENVRI knowledge base (KB) supports RI managers in identifying current gaps in their RI&#8217;s implementation of the FAIR Data Principles. For instance, a RI manager can interrogate the KB to discover whether a data repository of the RI uses a persistent identifier service or if the repository is certified according to some scheme. Having identified a gap, the KB can support the RI manager in exploring the solutions implemented by other RIs. By linking questionnaire information to training resources, the KB also supports the discovery of materials that provide hands-on demonstrations for how state-of-the-art technologies can be used and implemented to address FAIR requirements. For instance, if a RI manager discovers that the metadata of one of the RI&#8217;s repositories does not include machine-readable provenance, the ENVRI KB can inform the manager about available training material demonstrating how the PROV Ontology can be used to implement machine-readable provenance in systems. Such demonstrators can be highly actionable as they can be implemented in Jupyter and executed with services such as mybinder. Thus, the KB can seamlessly integrate the state of FAIR implementation in RIs with actionable training material and is therefore a resource that is expected to contribute substantially to improving ENVRI FAIRness. The ENVRI KB is implemented using the W3C Recommendations developed within the Semantic Web Activity, specifically RDF, OWL, and SPARQL. To effectively expose its content to RI communities, ranging from scientists to managers, and other stakeholders, the ENVRI-FAIR KB will need a customisable user interface for context-aware information discovery, visualisation, and content update. The current prototype can be accessed: kb.oil-e.net.&#160;

Download Full-text

Data matters: how earth and environmental scientists determine data relevance and reusability

Collection and Curation ◽

10.1108/cc-11-2018-0023 ◽

2019 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Angela P. Murillo

Keyword(s):

Information Needs ◽

Environmental Science ◽

Data Reuse ◽

Data Repository ◽

Phase 2 ◽

Data Repositories ◽

Content Type ◽

Data Record ◽

Information Professionals ◽

Practical Implications

Purpose The purpose of this study is to examine the information needs of earth and environmental scientists regarding how they determine data reusability and relevance. Additionally, this study provides strategies for the development of data collections and recommendations for data management and curation for information professionals working alongside researchers. Design/methodology/approach This study uses a multi-phase mixed-method approach. The test environment is the DataONE data repository. Phase 1 includes a qualitative and quantitative content analysis of deposited data. Phase 2 consists of a quasi-experiment think-aloud study. This paper reports mainly on Phase 2. Findings This study identifies earth and environmental scientists’ information needs to determine data reusability. The findings include a need for information regarding research methods, instruments and data descriptions when determining data reusability, as well as a restructuring of data abstracts. Additional findings include reorganizing of the data record layout and data citation information. Research limitations/implications While this study was limited to earth and environmental science data, the findings provide feedback for scientists in other disciplines, as earth and environmental science is a highly interdisciplinary scientific domain that pulls from many disciplines, including biology, ecology and geology, and additionally there has been a significant increase in interdisciplinary research in many scientific fields. Practical implications The practical implications include concrete feedback to data librarians, data curators and repository managers, as well as other information professionals as to the information needs of scientists reusing data. The suggestions could be implemented to improve consultative practices when working alongside scientists regarding data deposition and data creation. These suggestions could improve policies for data repositories through direct feedback from scientists. These suggestions could be implemented to improve how data repositories are created and what should be considered mandatory information and secondary information to improve the reusability of data. Social implications By examining the information needs of earth and environmental scientists reusing data, this study provides feedback that could change current practices in data deposition, which ultimately could improve the potentiality of data reuse. Originality/value While there has been research conducted on data sharing and reuse, this study provides more detailed granularity regarding what information is needed to determine reusability. This study sets itself apart by not focusing on social motivators and demotivators, but by focusing on information provided in a data record.

Download Full-text

A study of the roles of metadata standard and data repository in science, technology, engineering and mathematics researchers' data reuse

Online Information Review ◽

10.1108/oir-09-2020-0431 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Youngseek Kim

Keyword(s):

Data Reuse ◽

Data Repository ◽

Research Model ◽

Data Repositories ◽

Scientific Communities ◽

Metadata Standard ◽

Content Type ◽

Science Technology ◽

Metadata Standards ◽

And Mathematics

PurposeThis research investigates how the availabilities of both metadata standards and data repositories influence researchers' data reuse intentions either directly or indirectly as mediated by the norms of data reuse and their attitudes toward data reuse.Design/methodology/approachThe theory of planned behavior (TPB) was employed to develop the research model of researchers' data reuse intentions, focusing on the roles of metadata standards, data repositories and norms of data reuse. The proposed research model was evaluated using the structural equation modeling (SEM) method based on the survey responses received from 811 STEM (science, technology, engineering and mathematics) researchers in the United States.FindingsThis research found that the availabilities of both metadata standards and data repositories significantly affect STEM researchers' norm of data reuse, which influences their data reuse intentions as mediated by their attitudes toward data reuse. This research also found that both the availability of data repositories and the norm of data reuse have a direct influence on data reuse intentions and that norm of data reuse significantly increases the effect of attitude toward data reuse on data reuse intention as a moderator.Research limitations/implicationsThe modified model of TPB provides a new perspective in apprehending the roles of resource facilitating conditions such as the availabilities of metadata standards and data repositories in an individual's attitude, norm and their behavioral intention to conduct a certain behavior.Practical implicationsThis study suggests that scientific communities need to develop more supportive metadata standards and data repositories by considering their roles in enhancing the community norm of data reuse, which eventually lead to data reuse behaviors.Originality/valueThis study sheds light on the mechanism of metadata standard and data repository in researchers' data reuse behaviors through their community norm of data reuse; this can help scientific communities and academic institutions to better support researchers in their data sharing and reuse behaviors.Peer reviewThe peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-09-2020-0431

Download Full-text

Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?

PLoS ONE ◽

10.1371/journal.pone.0246099 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0246099

Author(s):

Felicitas Löffler ◽

Valentin Wesp ◽

Birgitta König-Ries ◽

Friederike Klan

Keyword(s):

Information Needs ◽

Primary Source ◽

Large Data ◽

Heterogeneous Data ◽

Data Reuse ◽

Data Types ◽

Data Repositories ◽

Biodiversity Research ◽

User Query ◽

Metadata Standards

The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that for data seekers in the biodiversity domain environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem are arbitrary keywords utilized in descriptive fields such as title, description or subject. Keywords support scholars in a full text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.

Download Full-text

The Red Queen in the Repository

International Journal of Digital Curation ◽

10.2218/ijdc.v15i1.646 ◽

2020 ◽

Vol 15 (1) ◽

pp. 16

Author(s):

Joakim Philipson

Keyword(s):

General Purpose ◽

Research Data ◽

Data Repository ◽

Red Queen ◽

Changing Environment ◽

Data Repositories ◽

Metadata Standards ◽

And Migration ◽

The Red Queen

One of the grand curation challenges is to secure metadata quality in the ever-changing environment of metadata standards and file formats. As the Red Queen tells Alice in Through the Looking-Glass: “Now, here, you see, it takes all the running you can do, to keep in the same place.” That is, there is some “running” needed to keep metadata records in a research data repository fit for long-term use and put in place. One of the main tools of adaptation and keeping pace with the evolution of new standards, formats – and versions of standards in this ever-changing environment are validation schemas. Validation schemas are mainly seen as methods of checking data quality and fitness for use, but are also important for long-term preservation. We might like to think that our present (meta)data standards and formats are made for eternity, but in reality we know that standards evolve, formats change (some even become obsolete with time), and so do our needs for storage, searching and future dissemination for re-use. Eventually, we come to a point where transformation of our archival records and migration to other formats will be necessary. This could also mean that even if the AIPs, the Archival Information Packages stay the same in storage, the DIPs, the Dissemination Information Packages that we want to extract from the archive are subject to change of format. Further, in order for archival information packages to be self-sustainable, as required in the OAIS model, it is important to take interdependencies between individual files in the information packages into account. This should be done already by the time of ingest and validation of the SIPs, the Submission Information Packages, and along the line at different points of necessary transformation/migration (from SIP to AIP, from AIP to DIP etc.), in order to counter obsolescence. This paper investigates possible validation errors and missing elements in metadata records from three general purpose, multidisciplinary research data repositories – Figshare, Harvard’s Dataverse and Zenodo, and explores the potential effects of these errors on future transformation to AIPs and migration to other formats within a digital archive.

Download Full-text

Research data management and services: Resources for different data practitioners

IASSIST Quarterly ◽

10.29173/iq995 ◽

2021 ◽

Vol 45 (3-4) ◽

Author(s):

Gilbert Mushi

Keyword(s):

Data Management ◽

Developed Countries ◽

Management Plan ◽

Research Data ◽

Data Repository ◽

Data Repositories ◽

Research Libraries ◽

Research Data Management ◽

Metadata Standards ◽

Training Resources

The emergence of data-driven research and demands for the establishment of Research Data Management (RDM) has created interest in academic institutions and research organizations globally. Some of the libraries especially in developed countries have started offering RDM services to their communities. Although lagging behind, some academic libraries in developing countries are at the stage of planning or implementing the service. However, the level of RDM awareness is very low among researchers, librarians and other data practitioners. The objective of this paper is to present available open resources for different data practitioners particularly researchers and librarians. It includes training resources for both researchers and librarians, Data Management Plan (DMP) tool for researchers; data repositories available for researchers to freely archive and share their research data to the local and international communities. A case study with a survey was conducted at the University of Dodoma to identify relevant RDM services so that librarians could assist researchers to make their data accessible to the local and international community. The study findings revealed a low level of RDM awareness among researchers and librarians. Over 50% of the respondent indicated their perceived knowledge as poor in the following RDM knowledge areas; DMP, data repository, long term digital preservation, funders RDM mandates, metadata standards describing data and general awareness of RDM. Therefore, this paper presents available open resources for different data practitioners to improve RDM knowledge and boost the confidence of academic and research libraries in establishing the service.

Download Full-text

Chances and challenges of a long-term data repository in multiple sclerosis: 20th birthday of the German MS registry

Scientific Reports ◽

10.1038/s41598-021-92722-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Lisa-Marie Ohle ◽

David Ellenberger ◽

Peter Flachenecker ◽

Tim Friede ◽

Judith Haas ◽

...

Keyword(s):

Multiple Sclerosis ◽

Sociodemographic Factors ◽

Healthcare Research ◽

Structure Design ◽

Data Repository ◽

Research Projects ◽

Real World Data ◽

Data Repositories ◽

Term Data

AbstractIn 2001, the German Multiple Sclerosis Society, facing lack of data, founded the German MS Registry (GMSR) as a long-term data repository for MS healthcare research. By the establishment of a network of participating neurological centres of different healthcare sectors across Germany, GMSR provides observational real-world data on long-term disease progression, sociodemographic factors, treatment and the healthcare status of people with MS. This paper aims to illustrate the framework of the GMSR. Structure, design and data quality processes as well as collaborations of the GMSR are presented. The registry’s dataset, status and results are discussed. As of 08 January 2021, 187 centres from different healthcare sectors participate in the GMSR. Following its infrastructure and dataset specification upgrades in 2014, more than 196,000 visits have been recorded relating to more than 33,000 persons with MS (PwMS). The GMSR enables monitoring of PwMS in Germany, supports scientific research projects, and collaborates with national and international MS data repositories and initiatives. With its recent pharmacovigilance extension, it aligns with EMA recommendations and helps to ensure early detection of therapy-related safety signals.

Download Full-text

Knowledge base design for environmental research

2005 Richard Tapia Celebration of Diversity in Computing Conference ◽

10.1109/rtcdc.2005.201647 ◽

2005 ◽

Cited By ~ 1

Author(s):

S. Erdogan ◽

T. Shaneyfelt ◽

W. de Smith ◽

Y. Ivanov ◽

A. Honma ◽

...

Keyword(s):

Knowledge Base ◽

Environmental Research ◽

Base Design

Download Full-text

GAMS – An infrastructure for the long-term preservation and publication of research data from the Humanities

Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare ◽

10.31263/voebm.v71i1.1992 ◽

2018 ◽

Vol 71 (1) ◽

pp. 207-216 ◽

Cited By ~ 1

Author(s):

Johannes Hubert Stigler ◽

Elisabeth Steiner

Keyword(s):

System Architecture ◽

Academic Research ◽

Research Data ◽

Data Repository ◽

Data Repositories ◽

Domain Specific ◽

Preservation Policy ◽

Long Term Preservation ◽

Data Centres

Research data repositories and data centres are becoming more and more important as infrastructures in academic research. The article introduces the Humanities’ research data repository GAMS, starting with the system architecture to preservation policy and content policy. Challenges of data centres and repositories and the general and domain-specific approaches and solutions are outlined. Special emphasis lies on the sustainability and long-term perspective of such infrastructures, not only on the technical but above all on the organisational and financial level.

Download Full-text

Improving the Usability of Organizational Data Systems

International Journal of Digital Curation ◽

10.2218/ijdc.v16i1.592 ◽

2021 ◽

Vol 16 (1) ◽

pp. 21

Author(s):

Chung-Yi Hou ◽

Matthew S. Mayernik

Keyword(s):

User Interfaces ◽

User Study ◽

Data Repository ◽

Data Systems ◽

User Interactions ◽

Data Repositories ◽

Web Interfaces ◽

Design And Implementation ◽

Usability Assessment ◽

Assessment Techniques

For research data repositories, web interfaces are usually the primary, if not the only, method that data users have to interact with repository systems. Data users often search, discover, understand, access, and sometimes use data directly through repository web interfaces. Given that sub-par user interfaces can reduce the ability of users to locate, obtain, and use data, it is important to consider how repositories’ web interfaces can be evaluated and improved in order to ensure useful and successful user interactions. This paper discusses how usability assessment techniques are being applied to improve the functioning of data repository interfaces at the National Center for Atmospheric Research (NCAR). At NCAR, a new suite of data system tools is being developed and collectively called the NCAR Digital Asset Services Hub (DASH). Usability evaluation techniques have been used throughout the NCAR DASH design and implementation cycles in order to ensure that the systems work well together for the intended user base. By applying user study, paper prototype, competitive analysis, journey mapping, and heuristic evaluation, the NCAR DASH Search and Repository experiences provide examples for how data systems can benefit from usability principles and techniques. Integrating usability principles and techniques into repository system design and implementation workflows helps to optimize the systems’ overall user experience.

Download Full-text

Connecting Data Publication to the Research Workflow: A Preliminary Analysis

International Journal of Digital Curation ◽

10.2218/ijdc.v12i1.533 ◽

2017 ◽

Vol 12 (1) ◽

pp. 88-105 ◽

Cited By ~ 3

Author(s):

Sünje Dallmeier-Tiessen ◽

Varsha Khodiyar ◽

Fiona Murphy ◽

Amy Nurnberger ◽

Lisa Raymond ◽

...

Keyword(s):

Working Group ◽

Research Data ◽

Data Publishing ◽

Data Repository ◽

Research Activity ◽

Data Repositories ◽

Data Publication ◽

Loosely Coupled ◽

Definition Of ◽

Data Documentation

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process. We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data. We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.

Download Full-text