A Standard for the Scholarly Citation of Archaeological Data as an Incentive to Data Sharing

2017 ◽  
Author(s):  
Ben Marwick ◽  
Suzanne E. Pilaar Birch

How do archaeologists share their research data, if at all? We review what data are, according to current influential definitions, and previous work on the benefits, costs and norms of data sharing in the sciences broadly. To understand data sharing in archaeology, we present the results of three pilot studies: requests for data by email, review of data availability in published articles, and analysis of archaeological datasets deposited in repositories. We find that archaeologists are often willing to share, but discipline-wide sharing is patchy and ad hoc. Legislation and mandates are effective at increasing data sharing, but editorial policies at journals lack adequate enforcement. Although most of the data available at repositories are licensed to enable flexible reuse, only a small proportion of the data are stored in structured formats for easy reuse. We present some suggestions for improving the state of data sharing in archaeology; among these is a standard for citing datasets to ensure that researchers making their data publicly available receive appropriate credit.

2018 ◽  
Vol 6 (2) ◽  
pp. 125-143 ◽  
Author(s):  
Ben Marwick ◽  
Suzanne E. Pilaar Birch

How do archaeologists share their research data, if at all? We review what data are, according to current influential definitions, and previous work on the benefits, costs, and norms of data sharing in the sciences broadly. To understand data sharing in archaeology, we present the results of three pilot studies: requests for data by e-mail, review of data availability in published articles, and analysis of archaeological datasets deposited in repositories. We find that archaeologists are often willing to share but that discipline-wide sharing is patchy and ad hoc. Legislation and mandates are effective at increasing data sharing, but editorial policies at journals lack adequate enforcement. Although most of the data available at repositories are licensed to enable flexible reuse, only a small proportion of the data are stored in structured formats for easy reuse. We present some suggestions for improving the state of data sharing in archaeology; among these is a standard for citing datasets to ensure that researchers making their data publicly available receive appropriate credit.
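As a rough illustration of what such a citation standard could enable, the sketch below renders a machine-readable dataset record as a citation string, loosely modelled on the DataCite recommended citation format. Every name, identifier, and field choice here is a hypothetical placeholder, not the standard the authors propose.

```python
# Hedged sketch: formatting a dataset citation from structured metadata,
# loosely modelled on the DataCite recommended citation format.
# All values below are hypothetical placeholders, not a real dataset.

def format_data_citation(rec: dict) -> str:
    """Render a human-readable citation string from a metadata record."""
    return (
        f"{rec['creator']} ({rec['year']}). {rec['title']} "
        f"(Version {rec['version']}) [Data set]. {rec['repository']}. "
        f"https://doi.org/{rec['doi']}"
    )

example = {
    "creator": "Doe, J.",                                   # hypothetical author
    "year": 2017,
    "title": "Lithic artefact measurements, hypothetical site A",
    "version": "1.0",
    "repository": "tDAR",            # an archaeological data repository
    "doi": "10.0000/example-doi",    # placeholder DOI
}

print(format_data_citation(example))
```

Structured records like this make it possible for indexers to count dataset citations the way article citations are counted, which is the credit mechanism the abstract argues for.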


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0250887
Author(s):  
Luke A. McGuinness ◽  
Athena L. Sheppard

Objective: To determine whether medRxiv data availability statements describe open or closed data (that is, whether the data used in the study are openly available without restriction) and to examine whether this changes on publication depending on journal data-sharing policy; additionally, to examine whether data availability statements are sufficient to capture code availability declarations.
Design: Observational study, following a pre-registered protocol, of preprints posted on the medRxiv repository between 25 June 2019 and 1 May 2020, and their published counterparts.
Main outcome measures: Distribution of preprinted data availability statements across nine categories, determined by a prespecified classification system; change in the percentage of data availability statements describing open data between the preprinted and published versions of the same record, stratified by journal sharing policy; and the number of code availability declarations reported in the full-text preprint that were not captured in the corresponding data availability statement.
Results: 3938 medRxiv preprints with an applicable data availability statement were included in our sample, of which 911 (23.1%) were categorized as describing open data. 379 (9.6%) preprints were subsequently published, and of these published articles, only 155 contained an applicable data availability statement. As at the preprint stage, only a minority (59; 38.1%) of these published data availability statements described open data. Of the 151 records eligible for the comparison between the preprinted and published stages, 57 (37.7%) were published in journals that mandated open data sharing. Data availability statements more frequently described open data on publication when the journal mandated data sharing (open at preprint: 33.3%; open at publication: 61.4%) than when it did not (open at preprint: 20.2%; open at publication: 22.3%).
Conclusion: Requiring that authors submit a data availability statement is a good first step but is insufficient to ensure data availability. Strict editorial policies that mandate data sharing (where appropriate) as a condition of publication appear to be effective in making research data available. We strongly encourage all journal editors to examine whether their data availability policies are sufficiently stringent and consistently enforced.
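The authors classified statements using a prespecified system. Purely as an illustration of how such a classification can be operationalized (the categories and keyword patterns below are simplified assumptions, not the study's actual protocol), a keyword-based first pass might look like this:

```python
import re

# Illustrative keyword rules for a first-pass classification of data
# availability statements (DAS). These categories and patterns are
# simplified assumptions, not the authors' prespecified system.
RULES = [
    ("open",          r"\b(github|zenodo|osf|dryad|figshare|openly available)\b"),
    ("upon request",  r"\b(upon|on)\s+(reasonable\s+)?request\b"),
    ("in manuscript", r"\b(within|in)\s+the\s+(article|manuscript|supplement)\b"),
    ("not available", r"\b(not\s+(publicly\s+)?available|cannot\s+be\s+shared)\b"),
]

def classify_das(statement: str) -> str:
    """Return the first matching category, else 'unclassified'."""
    text = statement.lower()
    for category, pattern in RULES:
        if re.search(pattern, text):
            return category
    return "unclassified"

print(classify_das("Data are openly available on Zenodo."))      # -> open
print(classify_das("Data available from the corresponding "
                   "author upon reasonable request."))           # -> upon request
```

In practice, borderline statements would still need manual review, which is why the study relied on a human-applied classification system rather than automated matching.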


2020 ◽  
Author(s):  
Luke A McGuinness ◽  
Athena Louise Sheppard

Objective: To determine whether medRxiv data availability statements describe open or closed data (that is, whether the data used in the study are openly available without restriction) and to examine whether this changes on publication depending on journal data-sharing policy; additionally, to examine whether data availability statements are sufficient to capture code availability declarations.
Design: Observational study, following a pre-registered protocol, of preprints posted on the medRxiv repository between 25 June 2019 and 1 May 2020, and their published counterparts.
Main outcome measures: Distribution of preprinted data availability statements across eight categories, determined by a prespecified classification system; change in the percentage of data availability statements describing open data between the preprinted and published versions of the same record, stratified by journal sharing policy; and the number of code availability declarations reported in the full-text preprint that were not captured in the corresponding data availability statement.
Results: 4101 medRxiv preprints were included in our sample, of which 911 (22.2%) were categorized as describing open data, 3027 (73.8%) as describing closed data, and 163 (4.0%) as not applicable (e.g., editorial, protocol). 379 (9.2%) preprints were subsequently published, and of these published articles, only 159 (42.0%) contained a data availability statement. As at the preprint stage, most published data availability statements described closed data (59 (37.1%) open, 96 (60.4%) closed, 4 (2.5%) not applicable). Of the 151 records eligible for the comparison between the preprinted and published stages, 57 (37.7%) were published in journals that mandated open data sharing. Data availability statements more frequently described open data on publication when the journal mandated data sharing (open at preprint: 33.3%; open at publication: 61.4%) than when it did not (open at preprint: 20.2%; open at publication: 22.3%).
Conclusion: Requiring that authors submit a data availability statement is a good first step, but is insufficient to ensure data availability. Strict editorial policies that require data sharing (where appropriate) as a condition of publication appear to be effective in making research data available. We strongly encourage all journal editors to examine whether their data availability policies are sufficiently stringent and consistently enforced.


2021 ◽  
Author(s):  
Iain Hrynaszkiewicz ◽  
James Harney ◽  
Lauren Cadwallader

PLOS has long supported Open Science. One of the ways in which we do so is via our stringent data availability policy, established in 2014. Despite this policy, and despite more data sharing policies being introduced by other organizations, best practices for data sharing are adopted by only a minority of researchers in their publications. Problems with effective research data sharing persist, and previous research has attributed these problems to a lack of time, resources, incentives, and/or skills to share data. In this study we built on that research by investigating the importance of tasks associated with data sharing and researchers' satisfaction with their ability to complete these tasks. By investigating these factors we aimed to better understand opportunities for new or improved solutions for sharing data. In May and June 2020 we surveyed researchers from Europe and North America, asking them to rate tasks associated with data sharing on (i) their importance and (ii) their satisfaction with their ability to complete them. We received 728 completed and 667 partial responses. We calculated mean importance and satisfaction scores to highlight potential opportunities for new solutions and to compare different cohorts.

Tasks relating to research impact, funder compliance, and credit had the highest importance scores. 52% of respondents reuse research data, but the average satisfaction score for obtaining data for reuse was relatively low. Tasks associated with sharing data were rated somewhat important, and respondents were reasonably well satisfied with their ability to accomplish them. Notably, this included tasks associated with best data sharing practice, such as use of data repositories. However, the most common method for sharing data was in fact via supplemental files with articles, which is not considered best practice.

We presume that researchers are unlikely to seek new solutions to a problem or task that they are satisfied in their ability to accomplish, even if many do not attempt that task. This implies there are few opportunities for new solutions or tools to meet these researcher needs. Publishers can likely meet these needs by working to seamlessly integrate existing solutions that reduce the effort or behaviour change involved in some tasks, and by focusing on advocacy and education around the benefits of sharing data. There may, however, be opportunities (unmet researcher needs) in relation to better supporting data reuse, which could be met in part by strengthening the data sharing policies of journals and publishers and by improving the discoverability of data associated with published articles.
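As a hedged sketch of how paired importance and satisfaction scores can surface opportunities, the snippet below ranks tasks with an Ulwick-style opportunity score (importance plus the importance-satisfaction gap). The task names and scores are invented for illustration and are not the survey's results or the authors' exact method.

```python
# Hedged sketch: ranking survey tasks by a simple opportunity score.
# Task names and scores below are hypothetical, on an assumed 1-10 scale.

tasks = {
    # task: (mean importance, mean satisfaction)
    "demonstrate research impact":  (8.1, 6.5),
    "comply with a funder mandate": (7.9, 6.8),
    "obtain data for reuse":        (7.0, 4.9),
    "deposit data in a repository": (6.2, 6.0),
}

def opportunity(importance: float, satisfaction: float) -> float:
    """Ulwick-style opportunity score: importance plus any unmet-need gap."""
    return importance + max(importance - satisfaction, 0.0)

ranked = sorted(tasks.items(), key=lambda kv: opportunity(*kv[1]), reverse=True)
for task, (imp, sat) in ranked:
    print(f"{task:30s} opportunity = {opportunity(imp, sat):.1f}")
```

Under this heuristic, an important task with low satisfaction ("obtain data for reuse" in the toy data) rises to the top, mirroring the paper's conclusion that data reuse is where the unmet needs lie.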


2021 ◽  
Author(s):  
Lili Zhang ◽  
Himanshu Vashisht ◽  
Andrey Totev ◽  
Nam Trinh ◽  
Tomas Ward

Deep learning models, especially RNN models, are potentially powerful tools for representing the complex learning processes and decision-making strategies used by humans. Such neural network models make fewer assumptions about the underlying mechanisms, providing experimental flexibility in terms of applicability. However, this comes at the cost of a larger number of tunable parameters, requiring significantly more (and more representative) training data for effective learning. This presents practical challenges given that most computational modelling experiments involve relatively small numbers of subjects, which, while adequate for conventional modelling using low-dimensional parameter spaces, leads to sub-optimal model training when adopting deeper neural network approaches. Laboratory collaboration is a natural way of increasing data availability; however, data sharing barriers among laboratories, as necessitated by data protection regulations, encourage us to seek alternative methods for collaborative data science. Distributed learning, especially federated learning, which supports the preservation of data privacy, is a promising method for addressing this issue. To verify the reliability and feasibility of applying federated learning to train the neural network models used in the characterisation of human decision making, we conducted experiments on a real-world, many-labs data pool including experimentally significant datasets from ten independent studies. The performance of single models trained on single laboratory datasets was poor, especially for those with small numbers of subjects. This unsurprising finding supports the need for larger and more diverse datasets to train more generalised and reliable models. To that end we evaluated four collaborative approaches. The first represents conventional centralized data sharing (CL-based) and is the optimal approach, but it requires complete sharing of data, which we wish to avoid. Its results, however, establish a benchmark for the other three distributed approaches: federated learning (FL-based), incremental learning (IL-based), and cyclic incremental learning (CIL-based). We evaluate these approaches in terms of prediction accuracy and capacity to characterise human decision-making strategies in the context of the computational modelling experiments considered here. The results demonstrate that the FL-based model achieves performance closest to that of the centralized data sharing approach. This demonstrates that federated learning has value in scaling data science methods to data collected in computational modelling contexts where data sharing is not convenient, practical, or permissible.
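For readers unfamiliar with the FL-based approach, the following is a minimal sketch of federated averaging (FedAvg) on a toy linear model: each "laboratory" trains locally on data that never leave it, and a server averages the resulting weights. The model, data, and hyperparameters are simplified stand-ins for illustration, not the RNN setup used in the study.

```python
import numpy as np

# Minimal federated-averaging (FedAvg) sketch on a toy linear model.
# Each lab's data stay local; only model weights are shared and averaged.

rng = np.random.default_rng(0)

def local_train(w, X, y, lr=0.1, epochs=5):
    """A few epochs of full-batch gradient descent on one lab's data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w = w - lr * grad
    return w

# Ten "laboratories", each with a small private dataset (never pooled).
d = 3
w_true = rng.normal(size=d)
labs = []
for _ in range(10):
    n = int(rng.integers(20, 60))              # small per-lab sample sizes
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    labs.append((X, y))

w_global = np.zeros(d)
for _ in range(20):                            # communication rounds
    local_ws, sizes = [], []
    for X, y in labs:
        local_ws.append(local_train(w_global.copy(), X, y))
        sizes.append(len(y))
    # Server step: size-weighted average of local models (FedAvg).
    w_global = np.average(local_ws, axis=0, weights=sizes)

print("error vs true weights:", np.linalg.norm(w_global - w_true))
```

The incremental (IL) and cyclic incremental (CIL) baselines differ mainly in the server step: instead of averaging, the model is passed from lab to lab and trained sequentially, once or repeatedly.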


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Leho Tedersoo ◽  
Rainer Küngas ◽  
Ester Oras ◽  
Kajar Köster ◽  
Helen Eenmaa ◽  
...  

Data sharing is one of the cornerstones of modern science that enables large-scale analyses and reproducibility. We evaluated data availability in research articles across nine disciplines in Nature and Science magazines and recorded corresponding authors' concerns, requests, and reasons for declining data sharing. Although data sharing has improved in the last decade, and particularly in recent years, data availability and willingness to share data still differ greatly among disciplines. We observed that statements of data availability upon (reasonable) request are inefficient and should not be allowed by journals. To improve data sharing at the time of manuscript acceptance, researchers should be better motivated to release their data with real benefits, such as recognition or bonus points in grant and job applications. We recommend that data management costs be covered by funding agencies, that publicly available research data be included in the evaluation of applications, and that surveillance of data sharing be enforced by both academic publishers and funders. These cross-discipline survey data are available from the PlutoF repository.


2021 ◽  
Vol 6 ◽  
pp. 355
Author(s):  
Helen Buckley Woods ◽  
Stephen Pinfield

Background: Numerous mechanisms exist to incentivise researchers to share their data. This scoping review aims to identify and summarise evidence of the efficacy of different interventions to promote open data practices, and to provide an overview of current research. Methods: This scoping review draws on data identified from Web of Science and LISTA, limited to the period 2016 to 2021. A total of 1128 papers were screened, of which 38 items were included. Items were selected if they focused on designing or evaluating an intervention, or on presenting an initiative, to incentivise sharing. Items comprised a mixture of research papers, opinion pieces, and descriptive articles. Results: Seven major themes in the literature were identified: publisher/journal data sharing policies, metrics, software solutions, research data sharing agreements in general, open science 'badges', funder mandates, and initiatives. Conclusions: Key messages for data sharing include: the need to build on existing cultures and practices, meeting people where they are and tailoring interventions to support them; the importance of publicising and explaining the policy/service widely; the need for disciplinary data champions to model good practice and drive cultural change; the requirement to resource interventions properly; and the imperative to provide robust technical infrastructure and protocols, such as labelling of data sets, use of DOIs, data standards, and use of data repositories.


Author(s):  
Kimberlyn McGrail ◽  
Michael Burgess ◽  
Kieran O'Doherty ◽  
Colene Bentley ◽  
Jack Teng

Introduction: Research using linked data sets can lead to new insights and discoveries that positively impact society. However, the use of linked data raises concerns relating to illegitimate use, privacy, and security (e.g., identity theft, marginalization of some groups). It is increasingly recognized that the public needs to be consulted to develop data access systems that consider both the potential benefits and risks of research. Indeed, there are examples of data sharing projects being derailed because of backlash in the absence of adequate consultation (e.g., care.data in the UK).
Objectives and methods: This talk will describe the results of public deliberations held in Vancouver, British Columbia in April 2018 and the fall of 2019. The purpose of these events was to develop informed and civic-minded public advice regarding the use and sharing of linked data for research in the context of rapidly evolving data availability and researcher aspirations.
Results: In the first deliberation, participants developed and voted on 19 policy-relevant statements. Taken together, these statements provide a broad view of public support and concerns regarding the use of linked data sets for research and offer guidance on measures that can be taken to improve the trustworthiness of policies and processes around data sharing and use. The second deliberation will focus on the interplay between public and private sources of data and the role of individual and collective or community consent in the future.
Conclusion: Generally, participants were supportive of research using linked data because of the value such uses can provide to society. Participants expressed a desire to see the data access request process made more efficient to facilitate more research, as long as there are adequate protections in place around the security and privacy of the data. These protections include both physical and process-related safeguards as well as a high degree of transparency.

