FAIR Principles and Digital Objects: Accelerating Convergence on a Data Infrastructure

Author(s):  
Erik Schultes ◽  
Peter Wittenburg


2013 ◽  
pp. 74-86 ◽  
Author(s):  
David Giaretta

Preserving digitally encoded information over the long term, following the OAIS Reference Model, requires that the information remain accessible, understandable and usable by a specified Designated Community. These are significant challenges for repositories. It will be argued that the infrastructure needed to support this preservation must be seen in the context of the broader science data infrastructure which international and national funders seek to put in place. Moreover, aspects of the preservation components of this infrastructure must themselves be preservable, resulting in a recursive system which must also be highly adaptable, loosely coupled and asynchronous. Harder still is judging whether any given proposal is actually likely to be effective. From the earliest discussions of concerns about the preservability of digital objects there have been calls for some way of judging the quality of digital repositories. This chapter outlines several interrelated efforts which contribute to solutions for these issues. Evidence about the challenges to be overcome, and the consistency of demands across nations, disciplines and organisations, is presented, based on extensive surveys carried out by the PARSE.Insight project (http://www.parse-insight.eu). The key points of the ongoing revision of the OAIS Reference Model are provided; OAIS supplies many of the concepts which underpin efforts to judge solutions. In recent years the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC) document has been produced, as well as a number of related checklists. These efforts form the background to the international effort (the RAC Working Group, http://wiki.digitalrepositoryauditandcertification.org) to produce a full ISO standard on which an accreditation and certification process can be built. If successful, this standard and its associated processes will give funders an independent evaluation of the effectiveness of the archives they support, and give data producers a basis for deciding which repository to entrust with their valuable data; it could shape the digital preservation market. The CASPAR project (http://www.casparpreserves.eu), an EU part-funded project with a total budget of 16 million euros, aims to faithfully implement almost all aspects of the OAIS Reference Model, in particular its Information Model. The latter involves tools for capturing all types of Representation Information (Structure, Semantics and all other types) and tools for defining the Designated Community. The chapter describes implementations of tools and infrastructure components that support repositories in the long-term preservation of digital resources, including the capture and preservation of digital rights management information and of evidence of authenticity associated with digital objects. To justify their existence, most repositories must also support the contemporaneous use of current as well as "historical" resources; the authors show how the same techniques can support both, and hence link to the fuller science data infrastructure.


2020 ◽  
Vol 2 (1-2) ◽  
pp. 108-121 ◽  
Author(s):  
Carole Goble ◽  
Sarah Cohen-Boulakia ◽  
Stian Soiland-Reyes ◽  
Daniel Garijo ◽  
Yolanda Gil ◽  
...  

Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during the processing of data; and by tracking and recording data provenance. These properties aid data quality assessment and contribute to secondary data usage. Moreover, workflows are digital objects in their own right. This paper argues that FAIR principles for workflows need to address their specific nature in terms of their composition of executable software steps, their provenance, and their development.
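To make the provenance-tracking property concrete, here is a minimal sketch (not taken from the paper; all names and the logging format are illustrative) of a workflow step that creates metadata about its own execution as it processes data:

```python
import hashlib
import json
import time
from functools import wraps

PROVENANCE_LOG = []  # in-memory trace; a real system would persist this


def traced_step(step_name):
    """Decorator that records a provenance entry for each workflow step:
    which step ran, when it ran, and checksums of its input and output."""
    def decorator(func):
        @wraps(func)
        def wrapper(data):
            started = time.time()
            result = func(data)
            digest = lambda obj: hashlib.sha256(
                json.dumps(obj, sort_keys=True).encode()).hexdigest()
            PROVENANCE_LOG.append({
                "step": step_name,
                "started": started,
                "ended": time.time(),
                "input_sha256": digest(data),
                "output_sha256": digest(result),
            })
            return result
        return wrapper
    return decorator


@traced_step("normalise")
def normalise(records):
    # Toy data-preparation step: rescale a measured value.
    return [{**r, "value": r["value"] / 100.0} for r in records]


cleaned = normalise([{"id": "s1", "value": 42}])
print(json.dumps(PROVENANCE_LOG, indent=2))
```

Because each step emits its own metadata, the resulting trace can later support the data quality assessment and secondary use that the abstract describes.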


2021 ◽  
Author(s):  
Núria Queralt-Rosinach ◽  
Rajaram Kaliyaperumal ◽  
César H. Bernabé ◽  
Qinqin Long ◽  
Simone A. Joosten ◽  
...  

Abstract

Background: The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data are collected all over the world and need to be integrated and made available to other researchers quickly. However, the various heterogeneous information systems used in hospitals can fragment health data over multiple data 'silos' that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients cannot be reused efficiently or in a timely manner. Research data management in hospitals needs to be adapted to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR.

Results: In this paper, we present our FAIR approach to transform COVID-19 observational patient data collected in the hospital into machine-actionable digital objects to answer medical doctors' research questions. With this objective, we conducted a coordinated FAIRification among stakeholders based on ontological models for data and metadata, and a FAIR-based architecture that complements the existing data management. We applied FAIR Data Points for metadata exposure, turning investigational parameters into a FAIR dataset. We demonstrated that this dataset is machine actionable by means of three different computational activities: federated querying of patient data alongside existing open knowledge sources across the world through the Semantic Web, implementing Web APIs for data query interoperability, and building applications on top of these FAIR patient data for FAIR data analytics in the hospital.

Conclusions: Our work demonstrates that a FAIR research data management plan based on ontological models for data and metadata, Open Science, Semantic Web technologies, and FAIR Data Points provides a data infrastructure in the hospital for machine-actionable FAIR digital objects. These FAIR data are prepared for reuse in federated analysis, are linkable to other FAIR data such as Linked Open Data, and can be reused to develop software applications for hypothesis generation and knowledge discovery.
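As an illustration of the federated-query activity described above, the sketch below uses the SPARQLWrapper library to join local patient data with a public knowledge source via SPARQL 1.1 federation. The hospital endpoint and the ex: vocabulary are invented for the example; only the SERVICE pattern and the SPARQLWrapper calls reflect standard usage:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical hospital SPARQL endpoint; the project's real endpoint
# and graph layout are internal.
endpoint = SPARQLWrapper("https://fdp.example-hospital.org/sparql")

# A federated query: local prescriptions are joined with an external
# public knowledge source through the SPARQL 1.1 SERVICE keyword.
endpoint.setQuery("""
PREFIX ex:   <https://example-hospital.org/vocab#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?patient ?drug ?drugLabel WHERE {
  ?patient ex:prescribedDrug ?drug .
  SERVICE <https://query.wikidata.org/sparql> {
    ?drug rdfs:label ?drugLabel .
    FILTER (lang(?drugLabel) = "en")
  }
}
LIMIT 10
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["patient"]["value"], row["drugLabel"]["value"])
```

The point of the pattern is that patient data never leave the hospital endpoint; only the query travels to the external source.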


Author(s):  
Viviane Veiga ◽  
Maria Luiza Campos ◽  
Carlos Roberto Lyra da Silva ◽  
Patrícia Henning ◽  
João Moreira

The world is experiencing a pandemic caused by a coronavirus named SARS-CoV-2, the agent of the infectious disease COVID-19. Global actions are being carried out to combat this virus, including the Virus Outbreak Data Network (VODAN), which aims to establish a federated data infrastructure aligned with the FAIR principles that supports the collection of data from the medical records of patients infected with this highly contagious virus. FIOCRUZ, which coordinates the GO FAIR Brazil Health network, took on the coordination of the VODAN BR network in partnership with UNIRIO and UFRJ, with the participation of the Gaffrée and Guinle, Municipal São José, and Israelita Albert Einstein hospitals. This article presents the national infrastructure, aligned with the international network and its guidelines, for collecting and managing data from the records of patients infected with COVID-19 at these hospitals, with the goal of establishing an information model that supports the response to this and other possible pandemics.


2019 ◽  
Vol 1 (1) ◽  
pp. 6-21 ◽  
Author(s):  
Peter Wittenburg

Data-intensive science is a reality in large scientific organisations such as the Max Planck Society, but because our data practices are so inefficient when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. Since, according to surveys, about 80% of the time in data-intensive projects is wasted, we must conclude that we are not fit for the challenges that will come with billions of smart devices producing continuous streams of data: our methods do not scale. Experts worldwide are therefore looking for strategies and methods with potential for the future. The first steps have been made, since there is now wide agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience we can claim that trustworthy PID systems are already in broad use. It is argued, however, that assigning PIDs is just the first step. If we agree to assign PIDs and also use the PID record to store important relationships, such as pointers to the locations where the bit sequences or the different metadata can be accessed, we are close to defining Digital Objects (DOs), which could indeed point towards a solution to some of the basic problems in data management and processing. In addition to standardising the way we assign PIDs, metadata and other state information, we could define a Digital Object Access Protocol as a universal exchange protocol for DOs stored in repositories that use different data models and data organisations. We could also associate with each DO a type and a set of operations permitted on its content, which would pave the way towards automatic processing, identified as the major step for scalability in data science and data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
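A minimal sketch of the Digital Object idea described here, with invented field names, PIDs and URLs: a PID record stores the locations of the bit sequences and metadata, and a type determines which operations are permitted on the content:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DigitalObject:
    """Illustrative Digital Object record: a PID that resolves to state
    information, including where the bits and metadata live, plus a type
    that governs the permitted operations."""
    pid: str                       # persistent identifier, e.g. Handle-style
    do_type: str                   # type governing permitted operations
    bit_locations: List[str]       # where the bit sequences can be accessed
    metadata_locations: List[str]  # where the descriptive metadata live
    checksum: str = ""             # fixity information kept with the record


# A hypothetical type registry mapping DO types to permitted operations.
OPERATIONS: Dict[str, List[str]] = {
    "timeseries": ["subset", "plot", "resample"],
    "image": ["thumbnail", "crop"],
}


def allowed_operations(obj: DigitalObject) -> List[str]:
    # Automatic processing becomes possible because the type, not the
    # repository's internal layout, tells a client what it may do.
    return OPERATIONS.get(obj.do_type, [])


do = DigitalObject(
    pid="21.T11148/abc123",  # invented Handle-style PID
    do_type="timeseries",
    bit_locations=["https://repo.example.org/objects/abc123"],
    metadata_locations=["https://repo.example.org/meta/abc123"],
)
print(allowed_operations(do))  # ['subset', 'plot', 'resample']
```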


2021 ◽  
Author(s):  
Chengshan Wang ◽  
Robert M Hazen ◽  
Qiuming Cheng ◽  
Michael H Stephenson ◽  
Chenghu Zhou ◽  
...  

Abstract Current barriers hindering data-driven discoveries in deep-time Earth (DE) include: substantial volumes of DE data are not digitized; many DE databases do not adhere to the FAIR principles (findable, accessible, interoperable, and reusable); we lack a systematic knowledge graph for DE; existing DE databases are geographically heterogeneous; a significant fraction of DE data is not in open-access formats; and tailored tools are needed. These challenges motivate the Deep-time Digital Earth (DDE) program, initiated by the International Union of Geological Sciences (IUGS) and developed in cooperation with national geological surveys, professional associations, academic institutions, and scientists around the world. DDE's mission is to build on previous research to develop a systematic DE knowledge graph, a FAIR data infrastructure that links existing databases and makes dark data visible, and tailored, universally accessible tools for DE data. DDE aims to harmonize DE data, share global geoscience knowledge, and facilitate data-driven discovery in the understanding of Earth's evolution.


2021 ◽  
Vol 16 (1) ◽  
pp. 16-33
Author(s):  
David M. Weigl ◽  
Tim Crawford ◽  
Aggelos Gkiokas ◽  
Werner Goebl ◽  
Emilia Gómez ◽  
...  

Vast amounts of publicly licensed classical music resources are housed within many different repositories on the Web encompassing richly diverse facets of information—including bibliographical and biographical data, digitized images of music notation, music score encodings, audiovisual performance recordings, derived feature data, scholarly commentaries, and listener reactions. While these varied perspectives ought to contribute to greater holistic understanding of the music objects under consideration, in practice, such repositories are typically minimally connected. The TROMPA project aims to improve this situation by interconnecting and enriching public-domain music repositories. This is achieved, on the one hand, by the application of automated, cutting-edge Music Information Retrieval techniques, and on the other, by the development of contribution mechanisms enabling users to integrate their expertise. Information within established repositories is interrelated with data generated by the project within a data infrastructure whose design is guided by the FAIR principles of data management and stewardship: making music information Findable, Accessible, Interoperable, and Reusable. We provide an overview of challenges of description, identification, representation, contribution, and reliability toward applying the FAIR principles to music information, and outline TROMPA's implementational approach to overcoming these challenges. This approach applies a graph-based data infrastructure to interrelate information hosted in different repositories on the Web within a unifying data model (a 'knowledge graph'). Connections are generated across different representations of music content beyond the catalogue level, for instance connecting note elements within score encodings to corresponding moments in performance time-lines. Contributions of user data are supported via privacy-first mechanisms that retain control of such data with the contributing user. Provenance information is captured throughout, supporting reproducibility and re-use of the data both within and outside the context of the project.
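The sketch below, using rdflib and an invented vocabulary (TROMPA's actual data model is richer), illustrates the kind of cross-repository link described above: a note element inside a score encoding connected to the moment at which it sounds in a performance recording:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

# Hypothetical alignment vocabulary, invented for this example.
EX = Namespace("https://example.org/trompa/vocab#")

g = Graph()

# A note element inside a published score encoding, addressed by URI
# fragment, is linked to the second at which it sounds in one recording.
note = URIRef("https://repo.example.org/scores/op27-2.mei#note-00142")
performance = URIRef("https://repo.example.org/recordings/perf-2019-05-01")
alignment = URIRef("https://example.org/trompa/alignments/a1")

g.add((alignment, RDF.type, EX.ScoreTimeAlignment))
g.add((alignment, EX.alignsNote, note))
g.add((alignment, EX.inPerformance, performance))
g.add((alignment, EX.atSeconds, Literal(12.84, datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```

Because both the note and the performance keep their original URIs, the alignment lives in the knowledge graph without duplicating either repository's content.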


2020 ◽  
Vol 6 ◽  
pp. e281
Author(s):  
Remzi Celebi ◽  
Joao Rebelo Moreira ◽  
Ahmed A. Hassan ◽  
Sandeep Ayyar ◽  
Lars Ridder ◽  
...  

It is essential for the advancement of science that researchers share, reuse and reproduce each other's workflows and protocols. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize the importance of making digital objects findable and reusable by others. The question of how to apply these principles not just to data but also to the workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe a two-fold approach that simultaneously applies the FAIR principles to scientific workflows and to the data involved. We apply and evaluate our approach on the PREDICT workflow, a highly cited drug repurposing workflow. This includes FAIRification of the involved datasets, as well as the application of semantic technologies to represent and store data about the detailed versions of the general protocol, the concrete workflow instructions, and their execution traces. We propose a semantic model to address these specific requirements and evaluate it by answering competency questions. The model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN, and allowed us to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles, and the practicality and usefulness of being able to answer our new competency questions.
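As a rough illustration of how an execution trace can be represented with PROV terms (instance URIs are invented, and the paper's model additionally draws on Workflow4ever, EDAM and BPMN), consider this rdflib sketch:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import PROV, RDF, XSD

# Invented instance namespace for the example.
EX = Namespace("https://example.org/openpredict/")

g = Graph()
run = EX["run-001"]                   # one execution of the workflow
dataset = EX["drug-indications-v1"]   # a versioned input dataset
predictions = EX["predictions-run-001"]

g.add((run, RDF.type, PROV.Activity))
g.add((dataset, RDF.type, PROV.Entity))
g.add((predictions, RDF.type, PROV.Entity))
g.add((run, PROV.used, dataset))                # input dataset
g.add((predictions, PROV.wasGeneratedBy, run))  # output tied to the run
g.add((run, PROV.startedAtTime,
       Literal("2020-01-15T09:00:00", datatype=XSD.dateTime)))

# A competency question such as "which dataset version produced these
# predictions?" becomes a short SPARQL query over the trace.
q = """
SELECT ?dataset WHERE {
  ?predictions <http://www.w3.org/ns/prov#wasGeneratedBy> ?run .
  ?run <http://www.w3.org/ns/prov#used> ?dataset .
}
"""
for row in g.query(q):
    print(row.dataset)
```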


Author(s):  
Sharif Islam

The Distributed System of Scientific Collections (DiSSCo) is a new Research Infrastructure working towards the unification of all European natural science collections under common curation, access policies, and practices (Addink et al. 2019). The physical specimens in the collections, and the vast amount of data derived from and linked to these specimens, are important building blocks for this unification process. Coming primarily from large-scale digitization projects (Blagoderov et al. 2012), along with new types of data collection, curation, and sharing methods (e.g. Kays et al. 2020), these specimens hold data that are critical for diverse scientific endeavours (Cook et al. 2020, Hedrick et al. 2020). It is therefore important that the data infrastructure and the relevant services provide long-term, sustainable and reliable access to these data. To that end, DiSSCo is working to transform a fragmented landscape of natural science collections into an integrated data infrastructure that ensures these data are easily Findable, more Accessible, Interoperable and Reusable; in other words, that they comply with the FAIR Guiding Principles (Wilkinson et al. 2016). A key design decision for this FAIR data infrastructure was to adopt FAIR Digital Objects (Wittenburg and Strawn 2019), which enable the creation of the Digital Specimen: a machine-actionable digital twin of the physical specimen (Lannom et al. 2020). This FAIR Digital Object, by design, ensures the FAIRness of the data (De Smedt et al. 2020) and will thus allow DiSSCo to provide services that are essential for research based on natural science collections. This talk summarises the motivation behind this adoption, showing how design decisions and best practices were influenced by the FAIR data principles, global discussions around FAIR Digital Objects, and outputs from Research Data Alliance (RDA) interest and working groups.
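A minimal sketch of what a Digital Specimen, as a FAIR Digital Object, might carry; the field names, identifiers and URLs below are illustrative, not the DiSSCo specification:

```python
import json

# An illustrative Digital Specimen record: a PID and a machine-resolvable
# type, plus links that tie the digital twin back to the physical specimen
# and to derived data held elsewhere.
digital_specimen = {
    "pid": "20.5000.1025/spec-4f2a",            # invented Handle-style PID
    "type": "DigitalSpecimen",                  # FDO type for clients to act on
    "physicalSpecimenId": "NHM-INS-867435",     # invented id of the physical twin
    "institution": "Example Natural History Museum",
    "attributes": {
        "scientificName": "Apis mellifera",
        "collectionDate": "1932-06-14",
    },
    "links": {
        "images": ["https://medialib.example.org/img/867435-1.jpg"],
        "sequences": [],                        # e.g. links to sequence records
    },
}

print(json.dumps(digital_specimen, indent=2))
```

The essential property is that everything a machine needs, from the specimen's type to the locations of its associated data, is reachable from the PID record itself.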

