Packaging research artefacts with RO-Crate

Data Science ◽  
2022 ◽  
pp. 1-42
Author(s):  
Stian Soiland-Reyes ◽  
Peter Sefton ◽  
Mercè Crosas ◽  
Leyla Jael Castro ◽  
Frederik Coppens ◽  
...  

An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine-readable manner. RO-Crate is based on Schema.org annotations in JSON-LD, aiming to establish best practices to formally describe metadata in an accessible and practical way for their use in a wide variety of situations. An RO-Crate is a structured archive of all the items that contributed to a research outcome, including their identifiers, provenance, relations and annotations. As a general-purpose packaging approach for data and their metadata, RO-Crate is used across multiple areas, including bioinformatics, digital humanities and regulatory sciences. By applying “just enough” Linked Data standards, RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility. An RO-Crate for this article (https://w3id.org/ro/doi/10.5281/zenodo.5146227) is archived at https://doi.org/10.5281/zenodo.5146227.
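
As an illustration of the packaging approach (not code from the article itself), the following Python sketch writes a minimal ro-crate-metadata.json describing one hypothetical data file; the file name, title and licence are assumptions made for the example.

```python
import json

# Minimal RO-Crate 1.1 metadata: a JSON-LD document whose @graph holds
# the metadata descriptor, the root dataset, and one packaged data entity.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # descriptor: declares which RO-Crate spec this crate conforms to
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root dataset: the research outcome being packaged
            "@id": "./",
            "@type": "Dataset",
            "name": "Example analysis results",      # assumed title
            "datePublished": "2022-01-01",
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "hasPart": [{"@id": "results.csv"}],
        },
        {   # one artefact, described with Schema.org properties
            "@id": "results.csv",                    # assumed file name
            "@type": "File",
            "name": "Tabular results",
            "encodingFormat": "text/csv",
        },
    ],
}

with open("ro-crate-metadata.json", "w") as fh:
    json.dump(crate, fh, indent=2)
```

The community's ro-crate-py library offers a higher-level API for the same task; the hand-built dictionary above is only meant to show the shape of the JSON-LD.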

2021 ◽  
Author(s):  
Ashleigh Hawkins

Mass digitisation and the exponential growth of born-digital archives over the past two decades have resulted in an enormous volume of archives and archival data being available digitally. This has produced a valuable but under-utilised source of large-scale digital data ripe for interrogation by scholars and practitioners in the Digital Humanities. However, current digitisation approaches fall short of the requirements of digital humanists for structured, integrated, interoperable, and interrogable data. Linked Data provides a viable means of producing such data, creating machine-readable archival data suited to analysis using digital humanities research methods. While a growing body of archival scholarship and praxis has explored Linked Data, its potential to open up digitised and born-digital archives to the Digital Humanities is under-examined. This article approaches Archival Linked Data from the perspective of the Digital Humanities, extrapolating from both archival and digital humanities Linked Data scholarship to identify the benefits to digital humanists of producing and providing access to Archival Linked Data. It also considers some of the current barriers that prevent digital humanists from experiencing these benefits and from fully utilising archives which have been made available digitally. The article argues for increased collaboration between the two disciplines, challenges individuals and institutions to engage with Linked Data, and suggests incorporating AI and low-barrier tools such as Wikidata into the Linked Data production workflow in order to scale up the production of Archival Linked Data as a means of increasing access to and utilisation of digitised and born-digital archives.
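
To give a sense of the low-barrier Wikidata tooling the article points to, the sketch below reconciles a name string taken from a hypothetical finding aid against Wikidata's public entity-search API. The helper function and the example name are assumptions for illustration, not part of the article.

```python
import requests

def reconcile(label: str):
    """Look up a name string against Wikidata and return candidate QIDs."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": label,
            "language": "en",
            "format": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [(hit["id"], hit.get("description", "")) for hit in resp.json()["search"]]

# e.g. a creator name extracted from an archival description
for qid, description in reconcile("Virginia Woolf"):
    print(qid, description)
```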


GigaScience ◽  
2021 ◽  
Vol 10 (5) ◽  
Author(s):  
Neil Davies ◽  
John Deck ◽  
Eric C Kansa ◽  
Sarah Whitcher Kansa ◽  
John Kunze ◽  
...  

Sampling the natural world and built environment underpins much of science, yet systems for managing material samples and associated (meta)data are fragmented across institutional catalogs, practices for identification, and discipline-specific (meta)data standards. The Internet of Samples (iSamples) is a standards-based collaboration to uniquely, consistently, and conveniently identify material samples, record core metadata about them, and link them to other samples, data, and research products. iSamples extends existing resources and best practices in data stewardship to provide a cross-domain cyberinfrastructure that enables transdisciplinary research, discovery, and reuse of material samples in 21st century natural science.
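
The abstract does not specify a record layout, so the sketch below only illustrates the general idea of a core metadata record for a material sample that carries a persistent identifier and links to a parent sample and related data; every field name and identifier here is an assumption, not an iSamples schema.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class SampleRecord:
    # All field names are illustrative assumptions.
    sample_id: str                      # persistent identifier, e.g. an ARK or IGSN
    label: str
    material: str
    collection_site: str
    collected_by: str
    parent_sample: str | None = None    # the sample this one was subsampled from
    related_data: list[str] = field(default_factory=list)

soil = SampleRecord(
    sample_id="ark:/99999/fk4example",          # assumed identifier
    label="Soil core, segment 2",
    material="soil",
    collection_site="Moorea, French Polynesia",
    collected_by="https://orcid.org/0000-0000-0000-0000",
    parent_sample="ark:/99999/fk4parent",
    related_data=["https://doi.org/10.0000/example-dataset"],
)
print(json.dumps(asdict(soil), indent=2))
```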


Semantic Web ◽  
2020 ◽  
pp. 1-29
Author(s):  
Bettina Klimek ◽  
Markus Ackermann ◽  
Martin Brümmer ◽  
Sebastian Hellmann

In recent years, lexical resources have emerged rapidly in the Semantic Web. Whereas most of the linguistic information in these resources is already machine-readable, we found that morphological information is mostly absent or only contained in semi-structured strings. An integration of morphemic data has not yet been undertaken due to the lack of existing domain-specific ontologies and explicit morphemic data. In this paper, we present the Multilingual Morpheme Ontology, MMoOn Core, which can be regarded as the first comprehensive ontology for the linguistic domain of morphological language data. We describe how crucial concepts like morphs, morphemes, word forms and meanings are represented and interrelated, and how language-specific morpheme inventories can be created as a new kind of morphological dataset. The aim of the MMoOn Core ontology is to serve as a shared semantic model for linguists and NLP researchers alike, enabling the creation, conversion, exchange, reuse and enrichment of morphological language data across different data-dependent language sciences. Various use cases illustrate the cross-disciplinary potential that can be realised with the MMoOn Core ontology in the context of the existing Linguistic Linked Data research landscape.
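
A hedged sketch of how a morpheme inventory entry of this kind might be expressed with rdflib follows; the MMoOn namespace URI and the class and property names used below are assumptions made for illustration and may not match the ontology's actual terms.

```python
from rdflib import Graph, Namespace, Literal, RDF, RDFS

# Namespace and term names below are illustrative assumptions.
MMOON = Namespace("http://mmoon.org/core/")
INV = Namespace("http://example.org/eng/inventory/")

g = Graph()
g.bind("mmoon", MMOON)

# Word form "books", decomposed into a root morph and a plural suffix.
g.add((INV.books, RDF.type, MMOON.Wordform))
g.add((INV.books, RDFS.label, Literal("books", lang="en")))
g.add((INV.books, MMOON.consistsOfMorph, INV.book_morph))
g.add((INV.books, MMOON.consistsOfMorph, INV.s_suffix))

g.add((INV.book_morph, RDF.type, MMOON.Root))
g.add((INV.book_morph, RDFS.label, Literal("book", lang="en")))

g.add((INV.s_suffix, RDF.type, MMOON.Suffix))
g.add((INV.s_suffix, RDFS.label, Literal("-s", lang="en")))
g.add((INV.s_suffix, MMOON.hasMeaning, INV.plural_meaning))

print(g.serialize(format="turtle"))
```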


Author(s):  
Javier D. Fernández ◽  
Nelia Lasierra ◽  
Didier Clement ◽  
Huw Mason ◽  
Ivan Robinson

Author(s):  
Alexandros Ioannidis-Pantopikos ◽  
Donat Agosti

In the landscape of general-purpose repositories, Zenodo was built at the data center of the European Laboratory for Particle Physics (CERN) to facilitate the sharing and preservation of the long tail of research across all disciplines and scientific domains. Despite Zenodo's long tradition of making research artifacts FAIR (Findable, Accessible, Interoperable, and Reusable), there are still challenges in applying these principles effectively when serving the needs of specific research domains. Plazi's biodiversity taxonomic literature processing pipeline liberates data from publications, making it FAIR via extensive metadata, the minting of a DataCite Digital Object Identifier (DOI), a licence, and both human- and machine-readable output provided by Zenodo, accessible via the Biodiversity Literature Repository community at Zenodo. The deposits (e.g., taxonomic treatments, figures) are an example of how local networks of information can be formally linked to explicit resources in the broader context of other platforms like GBIF (Global Biodiversity Information Facility). In the context of biodiversity taxonomic literature data workflows, a general-purpose repository's traditional submission approach is not enough to preserve rich metadata and to capture highly interlinked objects, such as taxonomic treatments and digital specimens. As a prerequisite to serving these use cases and ensuring that the artifacts remain FAIR, Zenodo introduced the concept of custom metadata, which allows enhancing submissions such as figures or taxonomic treatments (see, as an example, the treatment of Eurygyrus peloponnesius) with custom keywords based on terms from common biodiversity vocabularies like Darwin Core and Audubon Core, each with an explicit link to the respective vocabulary term. The aforementioned pipelines and features are designed to be served first and foremost through public Representational State Transfer Application Programming Interfaces (REST APIs) and open web technologies like webhooks. This approach allows researchers and platforms to integrate existing and new automated workflows into Zenodo and thus empowers research communities to create self-sustained cross-platform ecosystems. The BiCIKL project (Biodiversity Community Integrated Knowledge Library) exemplifies how repositories and tools can become building blocks for broader adoption of the FAIR principles. Starting with the literature processing pipeline described above, the underlying concepts and the resulting FAIR data will be explained, with a focus on the custom metadata used to enhance the deposits.
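
As a rough illustration of the REST-first workflow described above, the sketch below creates a draft deposition through Zenodo's public deposit API and attaches Darwin Core-style keywords. ACCESS_TOKEN is a placeholder, the creator and description are example values, and the exact syntax of the custom vocabulary-term field is an assumption rather than documented fact.

```python
import requests

ZENODO = "https://zenodo.org/api"
ACCESS_TOKEN = "..."  # placeholder personal access token

# 1. Create an empty draft deposition.
draft = requests.post(
    f"{ZENODO}/deposit/depositions",
    params={"access_token": ACCESS_TOKEN},
    json={},
).json()

# 2. Describe it. The "custom" block mirrors the idea of attaching Darwin Core
#    terms to a deposit; the field name and value shape are assumptions.
metadata = {
    "metadata": {
        "title": "Treatment of Eurygyrus peloponnesius",   # example from the abstract
        "upload_type": "publication",
        "publication_type": "article",
        "description": "Taxonomic treatment extracted from the literature.",
        "creators": [{"name": "Example, Author"}],
        "keywords": ["taxonomic treatment", "Diplopoda"],
        "custom": {"dwc:genus": ["Eurygyrus"]},             # assumed custom-field syntax
    }
}
requests.put(
    f"{ZENODO}/deposit/depositions/{draft['id']}",
    params={"access_token": ACCESS_TOKEN},
    json=metadata,
)
```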


2020 ◽  
pp. 108-120
Author(s):  
О. Zherebko

The article analyzes forensic activity as one of the forms of activity in the field of legal proceedings. A comprehensive analysis of forensic activity has allowed a number of proposals to be formulated regarding ways and means of improving it. The following ways of improving forensic activity are identified and proposed: increasing the level of technical and forensic support for the disclosure, investigation and prevention of crimes; implementing measures to increase the effectiveness of the participation of specialists from expert services in investigative actions and operational-search measures; improving research activities and introducing new technical and forensic tools, methods and techniques into practice; conducting forensic records as well as analytical and organizational work based on the introduction of modern automated systems and technologies; synthesizing and disseminating best practices and analyzing expert practice; and improving the selection, training and placement of employees of expert units while strengthening official and executive discipline. The article also describes the intensification of interaction between the expert services of the Ministry of Internal Affairs and other departments of the internal affairs bodies, as well as with other law enforcement agencies, including at the interstate level.


2016 ◽  
Vol 60 (4) ◽  
pp. 223 ◽  
Author(s):  
Qiang Jin ◽  
Jim Hahn ◽  
Gretchen Croll

With support from an internal innovation grant from the University of Illinois Library at Urbana-Champaign, researchers transformed and enriched nearly 300,000 e-book records in their library catalog from Machine-Readable Cataloging (MARC) records into Bibliographic Framework (BIBFRAME) linked data resources. Researchers indexed the BIBFRAME resources online and created two search interfaces for the discovery of BIBFRAME linked data. One result of the grant was the incorporation of BIBFRAME resources within an experimental Bento view of the linked library data for e-books. The end goal of this project is to provide enhanced discovery of library data, bringing like sets of content together in contemporary, easy-to-understand views that assist users in locating sets of associated bibliographic metadata.
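
The project's actual conversion pipeline is not described here, so the following is only a simplified sketch, using pymarc and rdflib, of turning MARC bibliographic records into minimal BIBFRAME Work/Instance triples; the input file name and URI base are assumptions, and the title is kept as a plain literal rather than a full bf:Title resource to keep the example short.

```python
from pymarc import MARCReader
from rdflib import Graph, Namespace, Literal, RDF, URIRef

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
BASE = "http://example.org/resources/"          # assumed local URI base

g = Graph()
g.bind("bf", BF)

with open("ebooks.mrc", "rb") as fh:            # assumed input file of MARC records
    for i, record in enumerate(MARCReader(fh)):
        work = URIRef(f"{BASE}work/{i}")
        instance = URIRef(f"{BASE}instance/{i}")
        g.add((work, RDF.type, BF.Work))
        g.add((instance, RDF.type, BF.Instance))
        g.add((instance, BF.instanceOf, work))

        # Title proper from MARC field 245 subfield $a (simplified mapping).
        for field_245 in record.get_fields("245"):
            for title in field_245.get_subfields("a"):
                g.add((instance, BF.title, Literal(title.strip(" /:"))))

print(g.serialize(format="turtle"))
```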


Author(s):  
Christian Bizer ◽  
Tom Heath ◽  
Tim Berners-Lee

The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
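
A small sketch of the dereferencing behaviour that Linked Data relies on: request an HTTP URI with an RDF Accept header and parse what comes back. The DBpedia URI is just a convenient public example, not one taken from the article.

```python
import requests
from rdflib import Graph, URIRef

uri = "http://dbpedia.org/resource/Berlin"   # any Linked Data HTTP URI

# Content negotiation: ask for Turtle instead of the HTML page.
resp = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")

# Outgoing links to other URIs are what weave the Web of Data together.
external = [o for _, _, o in g
            if isinstance(o, URIRef) and "dbpedia.org" not in str(o)]
print(f"{len(g)} triples, {len(external)} links pointing at other data sources")
```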


2018 ◽  
Vol 14 (3) ◽  
pp. 167-183
Author(s):  
Ahmed Ktob ◽  
Zhoujun Li

This article describes how many new technologies have recently been introduced to the web; Linked Data is probably the most important. Individuals and organizations have started publishing their data on the web adhering to a set of best practices. This data is published mostly in English; hence, only English-language agents can consume it. Meanwhile, although the number of Arabic users on the web is immense, few Arabic datasets are published. Publication catalogs are one of the primary sources of Arabic data that is not being exploited. Arabic catalogs provide a significant amount of meaningful data and metadata that are commonly stored in Excel sheets. In this article, an effort has been made to help publishers easily and efficiently share their catalogs' data as Linked Data. Marefa is the first tool that automatically extracts RDF triples from Arabic catalogs, aligns them to the BIBO ontology and links them with the Arabic chapter of DBpedia. An evaluation of the framework was conducted, and some statistical measures were generated during the different phases of the extraction process.
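
Marefa's own implementation is not reproduced here; the sketch below only illustrates the general pattern of lifting rows from an Excel catalogue into BIBO-typed triples with links into the Arabic DBpedia chapter. The spreadsheet name, column layout, URI base and the naive author-linking rule are all assumptions for the example.

```python
from urllib.parse import quote

from openpyxl import load_workbook
from rdflib import Graph, Namespace, Literal, RDF, RDFS, URIRef
from rdflib.namespace import DCTERMS

BIBO = Namespace("http://purl.org/ontology/bibo/")
BASE = "http://example.org/catalog/"            # assumed local URI base

g = Graph()
g.bind("bibo", BIBO)
g.bind("dcterms", DCTERMS)

ws = load_workbook("catalog.xlsx").active        # assumed spreadsheet
# Assumed columns: title | author | year
for i, (title, author, year) in enumerate(ws.iter_rows(min_row=2, values_only=True)):
    book = URIRef(f"{BASE}book/{i}")
    g.add((book, RDF.type, BIBO.Book))
    g.add((book, DCTERMS.title, Literal(title, lang="ar")))
    g.add((book, DCTERMS.issued, Literal(str(year))))

    # Naive linking rule: treat the author label as an Arabic DBpedia resource name.
    author_uri = URIRef("http://ar.dbpedia.org/resource/"
                        + quote(str(author).replace(" ", "_")))
    g.add((book, DCTERMS.creator, author_uri))
    g.add((author_uri, RDFS.label, Literal(author, lang="ar")))

g.serialize("catalog.ttl", format="turtle")
```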

