pISA-tree - a data management framework for life science research projects using a standardised directory tree

2021
Author(s): Marko Petek, Maja Zagorscak, Andrej Blejec, Ziva Ramsak, Anna Coll, ...

We have developed pISA-tree, a straightforward and flexible data management solution for organising life science project-associated research data and metadata. It enables on-the-fly creation of an enriched directory tree structure (project/Investigation/Study/Assay) via a series of sequential batch files, in a standardised manner based on the ISA metadata framework. The system supports reproducible research and is in line with the Open Science initiative and the FAIR principles. Compared with similar frameworks, it does not require any system administration or maintenance, as it can be run on a personal computer or network drive. It is complemented by two R packages, pisar and seekr: the former facilitates integration of pISA-tree datasets into bioinformatic pipelines, and the latter enables synchronisation with the FAIRDOMHub public repository using the SEEK API. Source code and detailed documentation of pISA-tree and its supporting R packages are available from https://github.com/NIB-SI/pISA-tree.
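The layered directory structure is the core of the approach. As a minimal illustration of that layout (pISA-tree itself creates the directories with batch files; the layer prefixes and metadata file name used here are assumptions for the sketch, not the tool's exact conventions):

```python
# Illustrative sketch only: pISA-tree generates these directories with batch
# files on Windows; this Python version just shows the nested
# project/Investigation/Study/Assay layout. The "_p_", "_I_", "_S_", "_A_"
# prefixes and the metadata file name are assumptions for illustration.
from pathlib import Path

def create_layer(parent: Path, prefix: str, name: str) -> Path:
    """Create one pISA layer and drop a minimal metadata file into it."""
    layer = parent / f"{prefix}{name}"
    layer.mkdir(parents=True, exist_ok=True)
    (layer / "metadata.txt").write_text(f"Title:\t{name}\n")
    return layer

project = create_layer(Path("."), "_p_", "ExampleProject")
investigation = create_layer(project, "_I_", "StressResponse")
study = create_layer(investigation, "_S_", "FieldTrial2020")
assay = create_layer(study, "_A_", "RNAseq")
```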

2019
Vol 15 (2)
Author(s): Viviane Santos de Oliveira Veiga, Patricia Henning, Simone Dib, Erick Penedo, Jefferson Da Costa Lima, ...

ABSTRACT This article discusses the role of data management plans as an instrument for facilitating data management throughout the research life cycle. Opening research data is a priority on scientific agendas, since it increases both the visibility and transparency of investigations and the capacity for reproducing and reusing data in new research. In this context, the FAIR principles, an acronym for Findable, Accessible, Interoperable and Reusable, are fundamental, as they establish basic guidelines for the management, curation and preservation of research data oriented towards sharing and reuse. This work presents a proposal for a Data Management Plan template, aligned with the FAIR principles, for the Fundação Oswaldo Cruz. The methodology is bibliographic research and documentary analysis of several European data management plans. We conclude that the adoption of a data management plan in the scientific practices of universities and research institutions is essential; however, to take full advantage of this activity, all actors involved in the process must participate, and the plan must be machine-actionable. Keywords: Data Management Plan; Research Data; FAIR Principles; Machine-Actionable DMP; Open Science.


2017
Vol 12 (1)
pp. 22-35
Author(s): Tomasz Miksa, Andreas Rauber, Roman Ganguly, Paolo Budroni

Data management plans are free-form text documents describing the data used and produced in scientific experiments. The complexity of data-driven experiments requires precise descriptions of the tools and datasets used in computations to enable their reproducibility and reuse; data management plans fall short of these requirements. In this paper, we propose machine-actionable data management plans that cover the same themes as standard data management plans, but in which particular sections are filled with information obtained from existing tools. We present a mapping of tools from the domains of digital preservation, reproducible research, open science, and data repositories to data management plan sections. In doing so, we identify the requirements for a good solution and its limitations. We also propose a machine-actionable data model that enables information integration; the model uses ontologies and is based on existing standards.
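As an illustrative sketch of the general idea (hypothetical field names only; this is neither the authors' ontology-based model nor any particular standard), a machine-actionable plan can have sections filled programmatically by tools rather than written as free text:

```python
# Illustrative sketch only: a minimal "machine-actionable DMP" fragment in which
# one section (the software environment) is filled automatically from the running
# system rather than typed as free text. Field names are hypothetical.
import json
import platform
from importlib import metadata

def software_environment():
    """Collect tool/library versions programmatically (one example of
    information a tool can contribute to a DMP section)."""
    packages = ["numpy", "pandas"]  # packages assumed to be used by the experiment
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {"python": platform.python_version(), "packages": versions}

dmp = {
    "project": "Example experiment",              # free-text section, as in classic DMPs
    "dataset": {
        "title": "simulated_counts.csv",
        "license": "CC-BY-4.0",
        "identifier": "doi:10.xxxx/placeholder",  # placeholder, not a real DOI
    },
    "software_environment": software_environment(),  # machine-filled section
}

print(json.dumps(dmp, indent=2))
```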


F1000Research
2020
Vol 9
pp. 1398
Author(s): Susanne Hollmann, Andreas Kremer, Špela Baebler, Christophe Trefois, Kristina Gruden, ...

Today, academic researchers benefit from the changes driven by digital technologies and the enormous growth of knowledge and data, from globalisation and the enlargement of the scientific community, and from closer links between different scientific communities and society. To benefit fully from this development, however, information needs to be shared openly and transparently. Digitalisation plays a major role here because it permeates all areas of business, science and society and is one of the key drivers of innovation and international cooperation. To address the resulting opportunities, the EU, through its European strategy for Open Science (OS), promotes the development and use of collaborative ways to produce and share knowledge and data as early as possible in the research process, while also appropriately protecting results. It is now widely recognised that making research results more accessible to all societal actors contributes to more effective and efficient science; it also serves as a boost for innovation in the public and private sectors. However, for research data to be findable, accessible, interoperable and reusable, the use of standards is essential. At the metadata level, considerable standardisation efforts have already been made (e.g. data management plans and the FAIR principles), whereas for raw data these fundamental efforts remain fragmented and in some cases are missing altogether. The CHARME consortium, funded by the European Cooperation in Science and Technology (COST) Agency, has identified needs and gaps in the field of standardisation in the life sciences and discussed potential hurdles for the implementation of standards in current practice. Here, the authors suggest four measures in response to current challenges to ensure a high quality of life science research data and their re-usability for research and innovation.


2021
Author(s): Yujun Xu, Ulrich Mansmann

Abstract Reproducibility is not only essential for the integrity of scientific research but is also a prerequisite for model validation and refinement for future applications of (predictive) algorithms. However, reproducible research is becoming increasingly challenging, particularly in high-dimensional genomic data analyses with complex statistical or algorithmic techniques. Given that most biomedical and statistical journals have no mandatory requirement to provide the original data, analytical source code, or other relevant materials at publication, access to these supplements naturally suggests greater credibility of the published work. In this study, we performed a reproducibility assessment of the notable paper by Gerstung et al. published in Nature Genetics (2017) by rerunning the analysis using their original code and data, which are publicly accessible. Despite a perfect open science setting, it was challenging to reproduce the entire research project; reasons included coding errors, suboptimal code legibility, incomplete documentation, intensive computations, and an R computing environment that could no longer be re-established. We learn that availability of code and data does not guarantee transparency and reproducibility of a study; rather, the source code remains liable to error and obsolescence, essentially due to methodological complexity, a lack of editorial reproducibility checks at submission, and updates of software and operating environments. Building on the experience gained, we propose practical criteria for the conduct and reporting of reproducibility studies for future researchers.
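One inexpensive practice that mitigates the "computing environment that could no longer be re-established" problem is archiving an environment manifest next to the results. The study concerned R, where sessionInfo() plays this role; the following is only a minimal, language-agnostic sketch in Python:

```python
# Minimal sketch (not from the study, which used R): record the computing
# environment alongside analysis outputs so that a later reader can at least
# see which interpreter and package versions produced the results.
import json
import platform
import sys
from importlib import metadata

def environment_manifest():
    """Return a snapshot of the interpreter and installed packages,
    roughly analogous to R's sessionInfo()."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": sorted(
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
        ),
    }

if __name__ == "__main__":
    with open("environment_manifest.json", "w") as fh:
        json.dump(environment_manifest(), fh, indent=2)
```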


2018
Vol 13 (1)
pp. 35-46
Author(s): Carolyn Hank, Bradley Wade Bishop

For open science to flourish, data and any related digital outputs should be discoverable and re-usable by a variety of potential consumers. The recent FAIR Data Principles produced by the Future of Research Communication and e-Scholarship (FORCE11) collective provide a compilation of considerations for making data findable, accessible, interoperable, and re-usable. The principles serve as guideposts to 'good' data management and stewardship for data and/or metadata. On a conceptual level, the principles codify best practices that data managers and stewards would agree with, that appear in other data quality metrics, and that they already implement. This paper reports on a secondary purpose of the principles: to inform assessment of data's FAIR-ness or, put another way, data's fitness for use. Assessment of FAIR-ness likely requires more stratification across data types and among various consumer communities, as how data are found, accessed, interoperated, and re-used differs depending on types and purposes. This paper's purpose is to present a method for qualitatively measuring the FAIR Data Principles by operationalising findability, accessibility, interoperability, and re-usability from a re-user's perspective. The findings may inform assessments that could also be used to develop situationally relevant fitness-for-use frameworks.
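To make the notion of operationalisation concrete, a hypothetical sketch (not the authors' instrument) of a re-user-facing rubric with a few yes/no questions per FAIR facet and a simple per-facet summary might look like this:

```python
# Illustrative sketch only (not the authors' instrument): operationalise each
# FAIR facet as a small set of yes/no questions answered from a re-user's
# perspective, then summarise per facet. Questions are hypothetical examples.
FAIR_QUESTIONS = {
    "Findable": [
        "Does the dataset have a persistent identifier (e.g. a DOI)?",
        "Is it indexed in a searchable catalogue or repository?",
    ],
    "Accessible": [
        "Can the data be retrieved via a standard protocol (e.g. HTTPS)?",
        "Are access conditions and licensing stated explicitly?",
    ],
    "Interoperable": [
        "Are community formats and vocabularies used?",
        "Are links to related data and metadata provided?",
    ],
    "Reusable": [
        "Is provenance documented well enough to repeat the analysis?",
        "Is the metadata rich enough to judge fitness for a new use?",
    ],
}

def summarise(answers: dict[str, list[bool]]) -> dict[str, str]:
    """Report, per facet, how many questions a re-user answered 'yes'."""
    return {
        facet: f"{sum(values)}/{len(values)} criteria met"
        for facet, values in answers.items()
    }

# Example: a re-user's answers for one dataset (order matches FAIR_QUESTIONS).
example = {
    "Findable": [True, True],
    "Accessible": [True, False],
    "Interoperable": [False, False],
    "Reusable": [True, False],
}
print(summarise(example))
```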


GigaScience
2020
Vol 9 (6)
Author(s): Jaqueline J Brito, Jun Li, Jason H Moore, Casey S Greene, Nicole A Nogoy, ...

Abstract Biomedical research depends increasingly on computational tools, but mechanisms ensuring open data, open software, and reproducibility are variably enforced by academic institutions, funders, and publishers. Publications may present software for which source code or documentation are or become unavailable; this compromises the role of peer review in evaluating technical strength and scientific contribution. Incomplete ancillary information for an academic software package may bias or limit subsequent work. We provide 8 recommendations to improve reproducibility, transparency, and rigor in computational biology—precisely the values that should be emphasized in life science curricula. Our recommendations for improving software availability, usability, and archival stability aim to foster a sustainable data science ecosystem in life science research.


2020
Author(s): Massimo Cocco, Daniele Bailo, Keith G. Jeffery, Rossana Paciello, Valerio Vinciarelli, ...

Interoperability has long been an objective for research infrastructures dealing with research data, in order to foster open access and open science. More recently, the FAIR principles (Findability, Accessibility, Interoperability and Reusability) have been proposed and are now the reference criteria for promoting and evaluating the openness of scientific data. FAIRness is considered a necessary target for research infrastructures in different scientific domains at the European and global level.

Solid Earth research infrastructures (RIs) have long been committed to engaging the scientific communities involved in data collection, standardisation and quality management, as well as to providing metadata and services for qualification, storage and accessibility. They are working to adopt the FAIR principles, thus addressing the onerous task of turning these principles into practices. To make the FAIR principles a reality in terms of service provision for data stewardship, some RI implementers in EPOS have proposed a FAIR-adoption process built on a four-stage roadmap that reorganises the FAIR principles to better fit the mindset of scientists and RI implementers. The roadmap treats the FAIR principles as requirements in the software development life cycle and reorganises them into data, metadata, access services and use services. Both the implementation and the assessment of the “FAIRness” level, by means of a questionnaire and metrics, are thereby made simpler and closer to scientists' day-to-day work.

FAIR data and service management is demanding: it requires resources and skills and, more importantly, sustainable IT resources. For this reason, turning the FAIR principles into reality through viable and sustainable practices is challenging for many research infrastructures and data providers. FAIR data management also includes implementing services to access data, as well as to visualise, process, analyse and model them to generate new scientific products and discoveries.

FAIR data management is also challenging for Earth scientists because it depends on their perception of finding, accessing and using data and scientific products: in other words, the perception of data sharing. The sustainability of FAIR data and service management is not limited to financial sustainability and funding; it also includes legal, governance and technical issues that concern the scientific communities.

In this contribution, we present and discuss some of the main challenges that need to be urgently tackled in order to run and operate FAIR data services in the long term, as also envisaged by the European Open Science Cloud initiative: a) the sustainability of the IT solutions and resources needed to support practices for FAIR data management (i.e., PID usage and preservation, including the costs of operating the associated IT services); b) re-usability, which on one hand requires clear and tested methods to manage heterogeneous metadata and provenance, and on the other hand can be considered a frontier research field; c) FAIR service provision, which raises many open questions about applying the FAIR principles to services for data stewardship and to services that create data products from FAIR raw data, for which it is not clear how the FAIRness of the resulting products can still be guaranteed.


2021
Author(s): Sylvain Prigent, Cesar Augusto Valades-Cruz, Ludovic Leconte, Léo Maury, Jean Salamero, ...

Open science and FAIR principles have become major topics in the field of bioimaging. This is due both to new data acquisition technologies that generate large datasets and to new analysis approaches that automate data mining with high accuracy. Nevertheless, data are rarely shared and rigorously annotated, because doing so requires many manual and tedious management tasks and software packaging. We present BioImageIT, an open-source framework for integrating data management, according to FAIR principles, with data processing.
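The integration described here pairs each processing step with data-management bookkeeping. A minimal sketch of that general pattern (hypothetical helper names; not BioImageIT's actual API):

```python
# Minimal sketch of the general idea (not BioImageIT's actual API): wrap a
# processing step so that every output is written together with a metadata
# record describing its provenance (input file, tool name, parameters).
import json
from datetime import datetime, timezone
from pathlib import Path

def run_with_provenance(tool_name, func, input_path, output_path, **params):
    """Run `func` on an input file and write a JSON sidecar describing
    how the output was produced."""
    func(input_path, output_path, **params)
    sidecar = Path(str(output_path) + ".meta.json")
    sidecar.write_text(json.dumps({
        "tool": tool_name,
        "input": str(input_path),
        "output": str(output_path),
        "parameters": params,
        "created": datetime.now(timezone.utc).isoformat(),
    }, indent=2))

# Hypothetical processing step used only for illustration.
def threshold_image(input_path, output_path, level=0.5):
    Path(output_path).write_text(f"thresholded({input_path}, level={level})")

run_with_provenance("threshold", threshold_image, "cells.tif", "cells_mask.tif", level=0.3)
```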

