Putting the INGV data policy into practice: considerations after the first-year experience

Author(s):  
Mario Locati ◽  
Francesco Mariano Mele ◽  
Vincenzo Romano ◽  
Placido Montalto ◽  
Valentino Lauciani ◽  
...  

The Istituto Nazionale di Geofisica e Vulcanologia (INGV) has a long tradition of sharing scientific data, dating from well before the Open Science paradigm was conceived. Over the last thirty years, a great deal of geophysical data generated by research projects and monitoring activities has been published on the Internet, though encoded in multiple formats and made accessible through various technologies.

To organise this complex scenario, a working group (PoliDat) charged with implementing an institutional data policy operated from 2015 to 2018. PoliDat published three documents: in 2016, the data policy principles; in 2017, the rules for scientific publications; and in 2018, the rules for scientific data management. These documents are available online in Italian and English (https://data.ingv.it/docs/).

A preliminary data survey performed between 2016 and 2017 identified nearly 300 different types of INGV-owned data. In the survey, the compilers were asked to declare all available scientific data, differentiated by the level of intellectual contribution: level 0 identifies raw data generated by fully automated procedures, level 1 identifies data products generated by semi-automated procedures, level 2 relates to data resulting from scientific investigations, and level 3 is associated with integrated data resulting from complex analysis.

A Data Management Office (DMO) was established in November 2018 to put the data policy into practice. The DMO's first goal was to design and establish a Data Registry able to satisfy the highly differentiated requirements of both internal and external users, at both scientific and managerial levels. The Data Registry is defined as a metadata catalogue, i.e., a container of data descriptions, not of the data themselves. In addition, the DMO supports other activities dealing with scientific data, such as checking contracts, advising the legal office in cases of litigation, interacting with the INGV Data Transparency Office, and, in more general terms, supporting the adoption of Open Science principles.

An extensive set of metadata has been identified to accommodate multiple metadata standards. First, a preliminary set of metadata describing each dataset is compiled by the authors using a web-based interface; the metadata are then validated by the DMO; and finally, a DataCite DOI is minted for each dataset, if one is not already present. The Data Registry is publicly accessible via a dedicated web portal (https://data.ingv.it). A pilot phase aimed at testing the Data Registry was carried out in 2019 and involved a limited number of contributors. To this end, a top-priority data subset was identified according to the relevance of the data to the mission of INGV and the completeness of the information already available. The Directors of the Departments of Earthquakes, Volcanoes, and Environment supervised the selection of the data subset.

The pilot phase helped to test and adjust the decisions made and procedures adopted during the planning phase, and allowed us to fine-tune the data management tools. During the next year, the Data Registry will enter its production phase and will be open to contributions from all INGV employees.
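The compile-validate-mint workflow described above can be sketched in a few lines of Python. This is a minimal illustration only, not INGV's actual implementation: the record fields, the `validate` checks, and the `needs_doi` helper are all hypothetical names chosen for the example; only the four contribution levels and the "mint a DOI unless one already exists" rule come from the abstract.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional

class ContributionLevel(IntEnum):
    """Levels of intellectual contribution used in the INGV data survey."""
    RAW = 0         # raw data generated by fully automated procedures
    PRODUCT = 1     # data products generated by semi-automated procedures
    RESULT = 2      # data resulting from scientific investigations
    INTEGRATED = 3  # integrated data resulting from complex analysis

@dataclass
class DatasetRecord:
    """A minimal metadata record; the field names are illustrative only."""
    title: str
    authors: List[str]
    level: ContributionLevel
    doi: Optional[str] = None  # set once a DataCite DOI has been minted

def validate(record: DatasetRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.title:
        problems.append("missing title")
    if not record.authors:
        problems.append("missing authors")
    return problems

def needs_doi(record: DatasetRecord) -> bool:
    # A DOI is minted only for datasets that do not already have one
    return record.doi is None
```

In this sketch, a record compiled by the authors would pass through `validate` (standing in for the DMO review) before `needs_doi` decides whether a DataCite DOI should be requested.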

BioScience ◽  
2020 ◽  
Author(s):  
Jocelyn P Colella ◽  
Ryan B Stephens ◽  
Mariel L Campbell ◽  
Brooks A Kohli ◽  
Danielle J Parsons ◽  
...  

Abstract: The open-science movement seeks to increase transparency, reproducibility, and access to scientific data. As primary data, preserved biological specimens represent records of global biodiversity critical to research, conservation, national security, and public health. However, a recent decrease in specimen preservation in public biorepositories is a major barrier to open biological science. As such, there is an urgent need for a cultural shift in the life sciences that normalizes specimen deposition in museum collections. Museums embody an open-science ethos and provide long-term research infrastructure through curation, data management and security, and community-wide access to samples and data, thereby ensuring scientific reproducibility and extension. We propose that a paradigm shift from specimen ownership to specimen stewardship can be achieved through increased open-data requirements among scientific journals and institutional requirements for specimen deposition by funding and permitting agencies, and through explicit integration of specimens into existing data management plan guidelines and annual reporting.


2019 ◽  
Author(s):  
Anthony Etuk ◽  
Felix Shaw ◽  
Alejandra Gonzalez-Beltran ◽  
David Johnson ◽  
Marie-Angélique Laporte ◽  
...  

Abstract: Scientific innovation is increasingly reliant on data and computational resources. Much of today’s life science research involves generating, processing, and reusing heterogeneous datasets that are growing exponentially in size. Demand for technical experts (data scientists and bioinformaticians) to process these data is at an all-time high, but these experts are not typically trained in good data management practices. That said, we have come a long way in the last decade, with funders, publishers, and researchers themselves making the case for open, interoperable data as a key component of an open science philosophy. In response, recognition of the FAIR Principles (that data should be Findable, Accessible, Interoperable and Reusable) has become commonplace. However, both technical and cultural challenges for the implementation of these principles still exist when storing, managing, analysing and disseminating both legacy and new data. COPO is a computational system that attempts to address some of these challenges by enabling scientists to describe their research objects (raw or processed data, publications, samples, images, etc.) using community-sanctioned metadata sets and vocabularies, and then use public or institutional repositories to share them with the wider scientific community. COPO encourages data generators to adhere to appropriate metadata standards when publishing research objects, using semantic terms to add meaning to them and specify relationships between them. This allows data consumers, be they people or machines, to find, aggregate, and analyse data which would otherwise be private or invisible, building upon existing standards to push the state of the art in scientific data dissemination whilst minimising the burden of data publication and sharing.

Availability: COPO is entirely open source and freely available on GitHub at https://github.com/collaborative-open-plant-omics. A public instance of the platform for use by the community, as well as more information, can be found at copo-project.org.
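The idea of research objects carrying standard metadata terms and semantic relationships can be sketched abstractly. The class below is not COPO's API; the identifiers, the `dc:`/`prov:` terms, and the `relate` method are illustrative assumptions showing how typed links between objects let consumers trace relationships such as provenance.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ResearchObject:
    """Illustrative stand-in for a research object: an identifier plus
    community-standard metadata terms and typed links to other objects."""
    identifier: str
    metadata: Dict[str, str] = field(default_factory=dict)
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (predicate, target id)

    def relate(self, predicate: str, target: "ResearchObject") -> None:
        # Record a semantic relationship (e.g. provenance) to another object
        self.relations.append((predicate, target.identifier))

# A sample and the sequencing run derived from it, linked by a PROV-style term
sample = ResearchObject("sample-001", {"dc:title": "Leaf tissue sample"})
reads = ResearchObject("run-042", {"dc:title": "RNA-seq raw reads"})
reads.relate("prov:wasDerivedFrom", sample)
```

Because each relation carries a predicate drawn from a shared vocabulary, a machine consumer can aggregate otherwise-disconnected objects, e.g. by following every `prov:wasDerivedFrom` link back to the originating sample.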


2020 ◽  
Vol 25 (2) ◽  
pp. 38
Author(s):  
Konrad Lang ◽  
Sarah Stryeck ◽  
David Bodruzic ◽  
Manfred Stepponat ◽  
Slave Trajanoski ◽  
...  

Life sciences (LS) are advanced in research data management, since LS have established disciplinary tools for data archiving as well as metadata standards for data reuse. However, there is a lack of tools supporting the active research process in terms of data management and data analytics. This leads to tedious and demanding work to ensure that research data before and after publication are FAIR (findable, accessible, interoperable and reusable) and that analyses are reproducible. The CyVerse US initiative from the University of Arizona, US, supports all processes from data generation, management, sharing and collaboration to analytics. Within the presented project, we deployed an independent instance of CyVerse in Graz, Austria (CAT), in the frame of the BioTechMed association. CAT helped to enhance and simplify collaborations between the three main universities in Graz. The main steps were (i) creating a distributed computational and data management architecture (iRODS-based), (ii) identifying and incorporating relevant data from researchers in LS, and (iii) identifying and hosting relevant tools, including analytics software, to ensure reproducible analytics using Docker technology for the researchers taking part in the initiative. This initiative supports research-related processes, including data management and analytics, for LS researchers. It also holds the potential to serve other disciplines and offers Austrian universities an opportunity to integrate their infrastructure into the European Open Science Cloud.


Author(s):  
Shanda L. Hunt ◽  
Caitlin J. Bakker

Objectives: The University of Minnesota (UMN) Health Sciences Libraries conducted a needs assessment of public health researchers as part of a multi-institutional study led by Ithaka S+R. The aims of the study were to capture the evolving needs, opportunities, and challenges of public health researchers in the current environment and to provide actionable recommendations. This paper reports on the data collected at the UMN site.

Methods: Participants (n=24) were recruited through convenience sampling. One-on-one interviews, held from November 2016 to January 2017, were audio-recorded. Qualitative analyses were conducted using NVivo 11 Pro and were based on the principles of grounded theory.

Results: The data revealed a broad range of skill levels among participants (e.g., in literature searching) and areas of misunderstanding (e.g., the current publishing landscape, open access options). Overall, data management was an afterthought. Few participants were fully aware of the breadth of librarian knowledge and skill sets, although many did express a desire for further skill development in information science.

Conclusions: Libraries can engage more public health researchers by utilizing targeted and individualized marketing regarding services. We can promote open science by educating researchers on publication realities and enhancing our data visualization skills. Libraries might take an institution-wide leadership role on matters of data management and data policy compliance. Finally, as team science emerges as a research priority, we can offer our networking expertise. These support services may reduce the stresses that public health researchers feel in the current research environment.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 495 ◽  
Author(s):  
Felix Shaw ◽  
Anthony Etuk ◽  
Alice Minotto ◽  
Alejandra Gonzalez-Beltran ◽  
David Johnson ◽  
...  

Scientific innovation is increasingly reliant on data and computational resources. Much of today’s life science research involves generating, processing, and reusing heterogeneous datasets that are growing exponentially in size. Demand for technical experts (data scientists and bioinformaticians) to process these data is at an all-time high, but these experts are not typically trained in good data management practices. That said, we have come a long way in the last decade, with funders, publishers, and researchers themselves making the case for open, interoperable data as a key component of an open science philosophy. In response, recognition of the FAIR Principles (that data should be Findable, Accessible, Interoperable and Reusable) has become commonplace. However, both technical and cultural challenges for the implementation of these principles still exist when storing, managing, analysing and disseminating both legacy and new data. COPO is a computational system that attempts to address some of these challenges by enabling scientists to describe their research objects (raw or processed data, publications, samples, images, etc.) using community-sanctioned metadata sets and vocabularies, and then use public or institutional repositories to share them with the wider scientific community. COPO encourages data generators to adhere to appropriate metadata standards when publishing research objects, using semantic terms to add meaning to them and specify relationships between them. This allows data consumers, be they people or machines, to find, aggregate, and analyse data which would otherwise be private or invisible, building upon existing standards to push the state of the art in scientific data dissemination whilst minimising the burden of data publication and sharing.

