CoreTrustSeal

Author(s):  
Ingrid Dillo ◽  
Lisa De Leeuw

Open data and data management policies that call for the long-term storage and accessibility of data are becoming more and more commonplace in the research community. With this, the need for trustworthy data repositories to store and disseminate data is growing. CoreTrustSeal, a community-based and non-profit organisation, offers data repositories a core-level certification based on the DSA-WDS Core Trustworthy Data Repositories Requirements catalogue and procedures. This universal catalogue of requirements reflects the core characteristics of trustworthy data repositories. Core certification involves an uncomplicated process whereby data repositories supply evidence that they are sustainable and trustworthy. A repository first conducts an internal self-assessment, which is then reviewed by community peers. Once the self-assessment is found adequate, the CoreTrustSeal Board certifies the repository with a CoreTrustSeal, which is valid for a period of three years. Being a certified repository has several external and internal benefits. For instance, it improves the quality and transparency of internal processes, increases awareness of and compliance with established standards, builds stakeholder confidence, enhances the reputation of the repository, and demonstrates that the repository is following good practices. It also offers a benchmark for comparison and helps to determine the strengths and weaknesses of a repository. In the future we foresee larger uptake across different domains, not least because within the European Open Science Cloud the FAIR principles, and therefore also the certification of the trustworthy digital repositories that hold data, are becoming increasingly important. In addition, the CoreTrustSeal requirements will most probably become a European technical standard that can be used in procurement (currently under review by the European Commission).

Metabolomics ◽  
2019 ◽  
Vol 15 (10) ◽  
Author(s):  
Kevin M. Mendez ◽  
Leighton Pritchard ◽  
Stacey N. Reinke ◽  
David I. Broadhurst

Abstract
Background: A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community, such a framework also needs to be inclusive and intuitive for computational novices and experts alike.
Aim of Review: To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.
Key Scientific Concepts of Review: This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.
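As a concrete illustration of the notebook-based workflow the tutorial describes, the sketch below shows the kind of cell that might live in such a Jupyter notebook; the synthetic data, package choices (pandas, scikit-learn), and variable names are illustrative assumptions rather than code taken from the tutorial itself.

```python
# Minimal notebook-cell sketch: PCA of a synthetic metabolite table.
# Package choices and the fabricated data are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# 20 samples x 50 metabolite features, plus a two-group class label
data = pd.DataFrame(rng.normal(size=(20, 50)),
                    columns=[f"metabolite_{i}" for i in range(50)])
data["group"] = ["control"] * 10 + ["case"] * 10

# Scale the features and project onto the first two principal components
scaled = StandardScaler().fit_transform(data.drop(columns="group"))
scores = PCA(n_components=2).fit_transform(scaled)
print(pd.DataFrame(scores, columns=["PC1", "PC2"])
        .assign(group=data["group"])
        .head())
```

Pushing such a notebook to a public GitHub repository together with a dependency file (for example requirements.txt or environment.yml) makes it launchable in the browser through Binder, typically at a URL of the form https://mybinder.org/v2/gh/&lt;user&gt;/&lt;repository&gt;/HEAD, which is the general mechanism the tutorials build on.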


2019 ◽  
Vol 24 (1) ◽  
pp. 205-220
Author(s):  
Christian Steiner ◽  
Robert Klugseder

Abstract The Digital Humanities project ‘CANTUS NETWORK. Libri ordinarii of the Salzburg metropolitan province’ undertakes research into the liturgy and music of the churches and monasteries of the medieval ecclesiastical province of Salzburg. Key sources are the liturgical ‘prompt books’, called libri ordinarii, which contain an abridged form of more or less the entire rite of a diocese or a monastery. The workflow of the project is set in an environment called GAMS, a research data repository built for the long-term storage and presentation of data from the humanities. Digital editions of the libri ordinarii of the province were generated with the aim of enabling a comparative analysis of the various traditions. As a first step, the books were transcribed with strict rule-based tags in Microsoft Word and transformed to TEI using the community’s XSLT stylesheets and a Java-based script. Subsequently, Semantic Web technologies were deployed to foster graph-based search and analysis of the structured data. Possible future work on the topic is facilitated by the dissemination of content levels as Linked Open Data. Further analysis is conducted with the help of Natural Language Processing methods in order to find text similarities and differences between the libri ordinarii.
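The Word-to-TEI step described above can be pictured with a small, self-contained sketch. The inline stylesheet below is a toy stand-in for the TEI community's docx-to-TEI stylesheets and the project's Java tooling; the element and style names are invented for the example.

```python
# Illustrative sketch of an XSLT transformation from rule-tagged Word-style
# XML to TEI-like markup. The tiny inline stylesheet is a toy stand-in for
# the TEI community's docx-to-TEI stylesheets, not the project's actual code.
from lxml import etree

source = etree.fromstring(
    "<document><para style='Rubric'>Ad missam</para>"
    "<para style='Incipit'>Puer natus est nobis</para></document>")

stylesheet = etree.XSLT(etree.fromstring("""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:template match="/document">
    <tei:body><xsl:apply-templates/></tei:body>
  </xsl:template>
  <xsl:template match="para[@style='Rubric']">
    <tei:head><xsl:value-of select="."/></tei:head>
  </xsl:template>
  <xsl:template match="para[@style='Incipit']">
    <tei:p><xsl:value-of select="."/></tei:p>
  </xsl:template>
</xsl:stylesheet>"""))

# Apply the stylesheet and print the resulting TEI-flavoured XML
print(etree.tostring(stylesheet(source), pretty_print=True).decode())
```

In the actual workflow, the community stylesheets map the full set of rule-based Word styles onto TEI elements before the graph-based and Linked Open Data layers are generated.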


Author(s):  
Abraham Nieva de la Hidalga ◽  
Nicolas Cazenave ◽  
Donat Agosti ◽  
Zhengzhe Wu ◽  
Mathias Dillen ◽  
...  

Digitisation of Natural History Collections (NHC) has evolved from the transcription of specimen catalogues into databases to web portals providing access to data, digital images, and 3D models of specimens. These portals increase global accessibility to specimens and help preserve the physical specimens by reducing their handling. The size of the NHC requires developing high-throughput digitisation workflows, as well as research into novel acquisition systems, image standardisation, curation, preservation, and publishing. Nowadays, herbarium sheet digitisation workflows (and fast digitisation stations) can digitise up to 6,000 specimens per day, and operating those digitisation stations in parallel can increase digitisation capacity further. The high-resolution images obtained from these specimens, and their sheer volume, require substantial bandwidth, disk space and tapes for storing the original digitised materials, as well as computational processing resources for generating derivatives, extracting information, and publishing. While large institutions have dedicated digitisation teams that manage the whole workflow from acquisition to publishing, other institutions cannot dedicate resources to support all digitisation activities, in particular long-term storage. National and European e-infrastructures can provide an alternative solution by supporting different parts of the digitisation workflows. In the context of the Innovation and consolidation for large scale digitisation of natural heritage project (ICEDIG Project 2018), three different e-infrastructures providing long-term storage have been analysed through three pilot studies: EUDAT-CINES, Zenodo, and National Infrastructures. The EUDAT-CINES pilot centred on transferring large digitised herbarium collections from the National Museum of Natural History France (MNHN) to the storage infrastructure provided by the Centre Informatique National de l’Enseignement Supérieur (CINES 2014), a European trusted digital repository. The upload, processing, and access services are supported by a combination of services provided by the European Collaborative Data Infrastructure (EUDAT CDI 2019) and CINES. The Zenodo pilot included the upload of herbarium collections from Meise Botanic Garden (APM) and other European herbaria into the Zenodo repository (Zenodo 2019); the upload, processing, and access services are supported by Zenodo services, accessed by APM. The National Infrastructures pilot facilitated the upload of digital assets derived from specimens of herbarium and entomology collections held at the Finnish Museum of Natural History (LUOMUS) into the Finnish Biodiversity Information Facility (FinBIF 2019). This pilot concentrates on simplifying the integration of digitisation facilities with Finnish national e-infrastructures, using services developed by LUOMUS to access FinBIF resources. The data models employed in the pilots allow data schemas to be defined according to the types of collection and specimen images stored. For EUDAT-CINES, data were composed of the specimen data and its business metadata (those the depositing institution, in this case MNHN, considers relevant for the data objects being stored), enhanced by archiving metadata added during the archiving process (institution, licensing, identifiers, project, archiving date, etc.). EUDAT uses ePIC identifiers (ePIC 2019) to identify each deposit. The Zenodo pilot was designed to allow the definition of specimen data and metadata supporting indexing and access to resources. Zenodo uses DataCite Digital Object Identifiers (DOIs) and the underlying data types as the main identifiers for the resources, augmented with fields based on standard TDWG vocabularies. FinBIF compiles Finnish biodiversity information into one single service for open-access sharing. In FinBIF, HTTP-URI-based identifiers are used for all data, linking the specimen data with other information, such as images. The pilot infrastructure design reports describe the features, capacities, functions, and costs of each model in three specific contexts that are relevant for the implementation of the Distributed Systems of Scientific Collections (DiSSCo 2019) research infrastructure, informing the options for long-term storage and archiving of digitised specimen data. The explored options allow the preservation of assets and support easy access. In a wider context, the results provide a template for service evaluation in the European Open Science Cloud (EOSC 2019), which can guide similar efforts.
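For the Zenodo pilot, the general shape of a programmatic deposit can be sketched against Zenodo's public REST API; the access token, file name, and metadata values below are placeholders, and the real pilot uploads carry richer, TDWG-aligned metadata.

```python
# Sketch of depositing a digitised herbarium sheet image in Zenodo through
# its public REST API. Token, file name, and metadata are placeholders.
import requests

ZENODO = "https://zenodo.org/api"
TOKEN = {"access_token": "YOUR-ZENODO-TOKEN"}  # placeholder credential

# 1. Create an empty deposition and note its file bucket
deposition = requests.post(f"{ZENODO}/deposit/depositions",
                           params=TOKEN, json={}).json()
bucket = deposition["links"]["bucket"]

# 2. Stream the image into the deposition's bucket
with open("herbarium_sheet_0001.tif", "rb") as fp:  # placeholder file
    requests.put(f"{bucket}/herbarium_sheet_0001.tif", data=fp, params=TOKEN)

# 3. Attach minimal descriptive metadata
metadata = {"metadata": {
    "title": "Herbarium sheet 0001 (digitised specimen image)",
    "upload_type": "image",
    "image_type": "photo",
    "description": "High-resolution scan of a herbarium sheet.",
    "creators": [{"name": "Example Herbarium"}]}}
requests.put(f"{ZENODO}/deposit/depositions/{deposition['id']}",
             params=TOKEN, json=metadata)

# 4. Publish the deposition, which registers the DataCite DOI for the record
requests.post(f"{ZENODO}/deposit/depositions/{deposition['id']}/actions/publish",
              params=TOKEN)
```

Publishing is the step at which the DataCite DOI becomes the citable persistent identifier for the digitised specimen image, which is the identifier scheme the Zenodo pilot relies on.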


2019 ◽  
Vol 4 (2) ◽  
pp. 156-172
Author(s):  
Henrique Machado dos Santos

This study discusses the implementation of archival repositories in conformance with the Open Archival Information System (OAIS) and the need to audit them in order to assess their trustworthiness. To this end, a bibliographic survey of previously published material was carried out, selecting: books that address archival science in the digital era and the challenge of trustworthy documentary custody; technical publications such as International Organization for Standardization norms and audit standards; and scientific articles retrieved through the Google Scholar search tool, with thematic searches on the preservation of digital archival records, trusted digital repositories, information audit, and archival audit. The archival repository is the prism of the discussion, while the comparison between audit standards is the guiding category, resulting in a non-systematic review article. The following audit standards are thus analysed: Trustworthy Repository Audit & Certification: Criteria and Checklist; the Catalogue of Criteria for Trusted Digital Repositories of the Network of Expertise in long-term STORage; the Digital Repository Audit Method Based on Risk Assessment; and Audit and Certification of Trustworthy Digital Repositories. Finally, the comparison between the standards shows that Audit and Certification of Trustworthy Digital Repositories is the most suitable for auditing digital archival repositories.


2021 ◽  
Vol 16 (1) ◽  
pp. 16
Author(s):  
Amy Currie ◽  
William Kilbride

Digital preservation is a fast-moving and growing community of practice of ubiquitous relevance, but one in which capability is unevenly distributed. Within the open science and research data communities, digital preservation has a close alignment with the FAIR principles and is delivered through a complex specialist infrastructure comprising technology, staff and policy. However, capacity erodes quickly, establishing a need for ongoing examination and review to ensure that skills, technology, and policy remain fit for changing purpose. To address this challenge, the Digital Preservation Coalition (DPC) conducted the FAIR Forever study, commissioned by the European Open Science Cloud (EOSC) Sustainability Working Group and funded by the EOSC Secretariat Project in 2020, to assess the current strengths, weaknesses, opportunities and threats to the preservation of research data across EOSC, and the feasibility of establishing shared approaches, workflows and services that would benefit EOSC stakeholders. This paper draws from the FAIR Forever study to document and explore its key findings on the identified strengths, weaknesses, opportunities, and threats to the preservation of FAIR data in EOSC, and to the preservation of research data more broadly. It begins with the background of the study and an overview of the methodology employed, which involved a desk-based assessment of the emerging EOSC vision, interviews with representatives of EOSC stakeholders, and focus groups with digital preservation specialists and data managers in research organizations. It summarizes key findings on the need for clarity on digital preservation in the EOSC vision and for elucidation of roles, responsibilities, and accountabilities to mitigate risks of data loss and threats to reputation and sustainability. It then outlines the recommendations provided in the final report presented to the EOSC Sustainability Working Group. To better ensure that research data can be FAIRer for longer, the recommendations of the study are presented with discussion of how they can be extended and applied to various research data stakeholders in and outside of EOSC, and of ways to bring together the research data curation, management, and preservation communities to better ensure FAIRness now and in the long term.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1858
Author(s):  
Nomi L. Harris ◽  
Peter J.A. Cock ◽  
Brad Chapman ◽  
Christopher J. Fields ◽  
Karsten Hokamp ◽  
...  

The Bioinformatics Open Source Conference (BOSC) is a meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. The 18th annual BOSC (http://www.open-bio.org/wiki/BOSC_2017) took place in Prague, Czech Republic in July 2017. The conference brought together nearly 250 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, open and reproducible science, and this year’s theme, open data. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community, called the OBF Codefest.


Author(s):  
Юрий Иванович Шокин ◽  
Нурлан Муханович Темирбеков ◽  
Олег Львович Жижимов ◽  
Алмас Нурланович Темирбеков ◽  
Досан Ракимгалиевич Байгереев

A model of an integrated distributed library information system is described that brings together the digitised book collection and the scientific works of the Kazakhstan Engineering Technological University and a number of research institutes located in the Akademgorodok of Almaty. The capabilities and needs of all participants in the scientific and educational cluster were analysed in order to design an optimal architecture for the distributed information system, and the digital repository subsystems of the integrated distributed library information system of the Akademgorodok of Almaty are described together with the results of their implementation. The purpose of the work is to create an integrated distributed library information system that preserves the results of the intellectual activity of Kazakhstan Engineering Technological University and a number of research institutes located in the Akademgorodok of Almaty in an up-to-date form and provides access to them based on Web technologies. The relevance of the work stems from the fact that the significant amounts of information produced by the research of these institutions, their continuous growth, and their heterogeneous storage and distribution, together with the lack of unified access to them, create significant problems for their effective use. These problems make it necessary to find new approaches and solutions for creating a repository of information resources, organizing them, and providing users with means of access. In addition, the work is consistent with the goals set within the framework of the State Program “Digital Kazakhstan” for 2017-2020. The following results were obtained in the present work: 1. The capabilities and needs of all participants of the scientific and educational cluster were analysed for building the optimal architecture of the distributed information system. 2. The model of the integrated distributed library information system of the Almaty Akademgorodok was developed. 3. The results of the implementation of its subsystems are presented. As a result of the research, it was concluded that this system corresponds to the needs of the participants in the scientific and educational cluster both in terms of information content and support for sectoral and linguistic specifics, since it addresses the main tasks of such systems: providing reliable long-term storage of digital (electronic) documents that preserves all semantic and functional characteristics of the source documents; ensuring “transparent” search and user access to documents, both for reading and for analysing the facts they contain; and organizing the harvesting of information from remote digital repositories that support the OAI-PMH, SRW/SRU, and Z39.50 protocols.
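As an illustration of the harvesting task mentioned last, the sketch below walks an OAI-PMH endpoint with the standard ListRecords verb and resumption tokens; the endpoint URL is a hypothetical placeholder, not the address of the system described here.

```python
# Minimal OAI-PMH harvesting sketch (protocol mechanics only; the endpoint
# URL is a hypothetical placeholder).
import requests
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://repository.example.org/oai"  # placeholder endpoint
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
while True:
    # Fetch one page of records and parse the OAI-PMH response
    root = ET.fromstring(requests.get(OAI_ENDPOINT, params=params).content)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(".//oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        print(identifier, title)
    # Follow the resumption token until the repository reports no more pages
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    if not token:
        break
    params = {"verb": "ListRecords", "resumptionToken": token}
```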


2019 ◽  
Vol 15 (2) ◽  
Author(s):  
Claudio José Silva Ribeiro

ABSTRACT: Open access to the research outcomes and data produced in Science and Technology has driven debates about the adoption and use of tools for sharing scientific production. This study proposes a maturity model that treats trusted document repositories and data repositories in an integrated way, forming the notion of an integrated digital repository, in order to position an institution at a maturity level with respect to these tools for sharing scientific production. It presents a study of maturity models as a form of quality assessment for repositories, adopting the CMMI model as a reference and relating its respective levels. It also presents a study of trustworthy repositories and evaluation criteria, and concludes the proposal by incorporating the FAIR principles into the evaluation process. Using a combination of research methods, it proposes the evaluation of the quality of repositories in the state of Rio de Janeiro according to the model developed, and the consolidation of the results into an evaluation scheme structured in maturity levels, including an evaluation method for integrated repositories.
Keywords: Maturity Model; Digital Repository; Quality Management; Open Science; Trustworthy Repository; FAIR Principles.
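Purely as an illustration of how CMMI-style levels and trust- and FAIR-related criteria might be combined in such an assessment, the sketch below maps hypothetical per-criterion scores onto the five CMMI maturity levels; the criteria names and thresholds are invented and are not the model proposed in the article.

```python
# Hypothetical sketch: averaging per-criterion scores (0-1) and mapping the
# result onto CMMI-style maturity levels. Criteria and thresholds are invented.
LEVELS = ["1 Initial", "2 Managed", "3 Defined",
          "4 Quantitatively Managed", "5 Optimizing"]

def maturity_level(criteria: dict[str, float]) -> str:
    """Average the per-criterion scores and map them onto five levels."""
    mean = sum(criteria.values()) / len(criteria)
    return LEVELS[min(4, int(mean * 5))]

assessment = {  # placeholder criteria blending trust and FAIR dimensions
    "organisational infrastructure": 0.8,
    "digital object management": 0.6,
    "FAIR: findability": 0.7,
    "FAIR: reusability": 0.4,
}
print(maturity_level(assessment))  # mean 0.625 -> "4 Quantitatively Managed"
```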


1970 ◽  
Vol 12 (2) ◽  
pp. 177-195 ◽  
Author(s):  
Alastair Dunning ◽  
Madeleine De Smaele ◽  
Jasmin Böhmer

This practice paper describes an ongoing research project to test the effectiveness and relevance of the FAIR Data Principles. Simultaneously, it analyses how easy it is for data archives to adhere to the principles. The research took place from November 2016 to January 2017, and will be underpinned by feedback from the repositories. The FAIR Data Principles feature 15 facets corresponding to the four letters of FAIR: Findable, Accessible, Interoperable, Reusable. These principles have already gained traction within the research world. The European Commission has recently expanded its demand for research to produce open data, and the relevant guidelines (1) are explicitly written in the context of the FAIR Data Principles. Given that an increasing number of researchers will have exposure to the guidelines, understanding their viability, and suggesting where there may be room for modification and adjustment, is of vital importance. This practice paper is connected to a dataset (Dunning et al., 2017) containing the original overview of the sample group statistics and graphs in an Excel spreadsheet. Over the course of two months, the web interfaces, help pages and metadata records of over 40 data repositories were examined in order to score each data repository against the FAIR principles and facets. The traffic-light rating system enables colour-coding according to compliance and vagueness. The statistical analysis provides overall, categorised, principle-focused, and facet-focused results. The analysis includes the statistical and descriptive evaluation, followed by elaborations on elements of the FAIR Data Principles, subject-specific or repository-specific differences, and what repositories can do to improve their information architecture. (1) H2020 Guidelines on FAIR Data Management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
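The traffic-light idea can be pictured with a small sketch that scores one hypothetical repository against a handful of FAIR facets and tallies the results per principle; the facet labels and scores below are invented for illustration and do not come from the underlying dataset.

```python
# Illustrative traffic-light scoring of one hypothetical repository against
# FAIR facets, summarised per principle. Labels and scores are invented.
from collections import Counter

GREEN, AMBER, RED = "compliant", "partial/vague", "non-compliant"

scores = {  # hypothetical facet scores for one repository
    "F1 persistent identifier": GREEN,
    "F2 rich metadata": AMBER,
    "A1 retrievable by identifier": GREEN,
    "A2 metadata outlive data": RED,
    "I1 formal knowledge representation": AMBER,
    "R1.1 clear usage licence": GREEN,
}

# Summarise by principle (facet codes start with F, A, I or R)
for principle in "FAIR":
    tally = Counter(v for k, v in scores.items() if k.startswith(principle))
    print(principle, dict(tally))
```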


Publications ◽  
2019 ◽  
Vol 7 (1) ◽  
pp. 9 ◽  
Author(s):  
Juliana Raffaghelli ◽  
Stefania Manca

In the landscape of Open Science, Open Data (OD) plays a crucial role, as data are one of the most basic components of research, despite their diverse formats across scientific disciplines. Opening up data is a recent concern for policy makers and researchers, as the basis for good Open Science practices. The common factor underlying these new practices—the relevance of promoting Open Data circulation and reuse—is mostly a social form of knowledge sharing and construction. However, while data sharing is being strongly promoted by policy making and is becoming a frequent practice in some disciplinary fields, Open Data sharing is much less developed in the Social Sciences and in educational research. In this study, practices of OD publication and sharing in the field of Educational Technology are explored. The aim is to investigate Open Data sharing in a selection of Open Data repositories, as well as in the academic social network site ResearchGate. The 23 Open Datasets selected across five OD platforms were analysed in terms of (a) the metrics offered by the platforms and the affordances for social activity; (b) the type of OD published; (c) compliance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles; and (d) the extent of presence and related social activity on ResearchGate. The results show very low social activity on the platforms and very few matches on ResearchGate, highlighting a limited social life surrounding Open Datasets. Future research perspectives, as well as limitations of the study, are considered in the discussion.

