Sharing research data to comply with a journal policy: Experience of a first-time depositor

Background: Journals in the health sciences increasingly require or recommend that authors deposit their research data in open repositories. The rationale for publicly available data is well understood, but many researchers lack the time, knowledge, and skills to share data well, if at all. The literature contains few descriptions of the practical process a researcher undertakes to complete an open data deposit. When my manuscript reporting a mixed methods study was accepted by a journal that required shared data as a condition of publication, I proceeded to comply despite uncertainty about the process.
Purpose: The purpose of this work is to describe how an information science researcher and first-time data depositor completed an open data deposit. The narrative illustrates the questions encountered and the choices made along the way.
Process Methods: To begin the data deposit process, I consulted the accepting journal's policy and its rationale for requiring shared data. A checklist of practical steps from an open repository provided a framework for outlining and organizing the process. The steps included organizing data files, preparing documentation, determining rights and licensing, and determining sharing and permissions. Choices and decisions included which data versions to share, how much data to share, which repository to use, and how to name files. Processes and decisions differed between the quantitative and the qualitative data.
Results: Two datasets, each with its documentation, were deposited in the Figshare open repository. This met the journal policy requirement to deposit sufficient data and documentation to replicate the results reported in the article, and also met the deadline for including a Data Availability Statement in the published article.
Conclusion: This experience illustrated some practical data sharing issues faced by a librarian author seeking to comply with a journal's data sharing requirement for publication of an accepted manuscript. Both novice data depositors and data librarians may find this individual account useful for their own work and for the advice they give to others.

Data Dentistry: How Data Are Changing Clinical Care and Research

Journal of Dental Research ◽

10.1177/00220345211020265 ◽

2021 ◽

pp. 002203452110202

Author(s):

F. Schwendicke ◽

J. Krois

Keyword(s):

Health Care ◽

Data Sharing ◽

Clinical Care ◽

Open Data ◽

User Interaction ◽

Data Availability ◽

Related Data ◽

Data User ◽

Regulatory Data ◽

Consumer Data

Data are a key resource for modern societies and are expected to improve the quality, accessibility, affordability, safety, and equity of health care. Dental care and research are currently transforming into what we term data dentistry, with 3 main applications: 1) Medical data analysis uses deep learning, allowing one to master unprecedented amounts of data (language, speech, imagery) and put them to productive use. 2) Data-enriched clinical care integrates data at the individual level (e.g., demographic, social, clinical, and omics data, consumer data), the setting level (e.g., geospatial, environmental, and provider-related data), and the systems level (payer or regulatory data characterizing the input, throughput, output, and outcomes of health care) to provide a comprehensive, continuous, real-time assessment of biologic perturbations, individual behaviors, and context. Such care may contribute to a deeper understanding of health and disease and to more precise, personalized, predictive, and preventive care. 3) Data for research include open research data and data sharing, allowing one to appraise, benchmark, pool, replicate, and reuse data. Concerns about, and confidence in, data-driven applications and the capabilities of stakeholders and systems, along with a lack of data standardization and harmonization, currently limit the development and implementation of data dentistry. Aspects of bias and data-user interaction require attention. Action items for the dental community center on increasing data availability, refinement, and usage; demonstrating the safety, value, and usefulness of applications; educating the dental workforce and consumers; providing performant, standardized infrastructure and processes; and incentivizing and adopting open data and data sharing.

Open Data Policies among Library and Information Science Journals

Publications ◽

10.3390/publications9020025 ◽

2021 ◽

Vol 9 (2) ◽

pp. 25

Author(s):

Brian Jackson

Keyword(s):

Information Science ◽

Open Data ◽

Research Data ◽

Data Availability ◽

Academic Publishing ◽

Library And Information Science ◽

Data Archiving ◽

Open Research ◽

Open Access Journals ◽

Public Data

Journal publishers play an important role in the open research data ecosystem. Through open data policies that include public data archiving mandates and data availability statements, journal publishers help promote transparency in research and wider access to a growing scholarly record. The library and information science (LIS) discipline has a unique relationship with both open data initiatives and academic publishing and may be well positioned to adopt rigorous open data policies. This study examines the information provided on the public-facing websites of LIS journals in order to describe the extent and nature of the open data guidance provided to prospective authors. Open access journals in the discipline have disproportionately adopted detailed, strict open data policies. Commercial publishers, which account for the largest share of publishing in the discipline, have largely adopted weaker policies. Rigorous policies, adopted by a minority of journals, describe the rationale, application, and expectations for open research data, while most journals that provide guidance on the matter use hesitant and vague language. Recommendations are provided for strengthening journal open data policies.

A descriptive analysis of the data availability statements accompanying medRxiv preprints and a comparison with their published counterparts

PLoS ONE ◽

10.1371/journal.pone.0250887 ◽

2021 ◽

Vol 16 (5) ◽

pp. e0250887

Author(s):

Luke A. McGuinness ◽

Athena L. Sheppard

Keyword(s):

Data Sharing ◽

Descriptive Analysis ◽

Open Data ◽

System Change ◽

Research Data ◽

Data Availability ◽

Published Data ◽

Editorial Policies ◽

Journal Editors ◽

Closed Data

Objective: To determine whether medRxiv data availability statements describe open or closed data (that is, whether the data used in the study are openly available without restriction) and to examine whether this changes on publication, depending on the journal's data-sharing policy. Additionally, to examine whether data availability statements are sufficient to capture code availability declarations.
Design: Observational study, following a pre-registered protocol, of preprints posted on the medRxiv repository between 25th June 2019 and 1st May 2020 and their published counterparts.
Main outcome measures: Distribution of preprinted data availability statements across nine categories, determined by a prespecified classification system. Change in the percentage of data availability statements describing open data between the preprinted and published versions of the same record, stratified by journal sharing policy. Number of code availability declarations reported in the full-text preprint that were not captured in the corresponding data availability statement.
Results: 3938 medRxiv preprints with an applicable data availability statement were included in our sample, of which 911 (23.1%) were categorized as describing open data. 379 (9.6%) preprints were subsequently published, and of these published articles, only 155 contained an applicable data availability statement. As at the preprint stage, a minority (59; 38.1%) of these published data availability statements described open data. Of the 151 records eligible for the comparison between preprinted and published stages, 57 (37.7%) were published in journals that mandated open data sharing. Data availability statements more frequently described open data on publication when the journal mandated data sharing (open at preprint: 33.3%; open at publication: 61.4%) than when the journal did not (open at preprint: 20.2%; open at publication: 22.3%).
Conclusion: Requiring authors to submit a data availability statement is a good first step but is insufficient to ensure data availability. Strict editorial policies that mandate data sharing (where appropriate) as a condition of publication appear to be effective in making research data available. We strongly encourage all journal editors to examine whether their data availability policies are sufficiently stringent and consistently enforced.
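The percentages in these Results follow directly from the reported counts. The Python sketch below recomputes them, assuming simple proportions rounded to one decimal place; the rounding rule is an assumption (the abstract does not state it), and the helper name `pct` is hypothetical.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# (numerator, denominator, percentage stated in the abstract)
reported = {
    "preprints describing open data": (911, 3938, 23.1),
    "preprints subsequently published": (379, 3938, 9.6),
    "published statements describing open data": (59, 155, 38.1),
    "comparison records in mandating journals": (57, 151, 37.7),
}

for label, (num, den, stated) in reported.items():
    print(f"{label}: {num}/{den} = {pct(num, den)}% (stated {stated}%)")
```

Each recomputed value matches the stated one, which is a quick consistency check a reader can apply to any abstract that reports both counts and percentages.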

Paving the Way to Open Data

Data Intelligence ◽

10.1162/dint_a_00021 ◽

2019 ◽

Vol 1 (4) ◽

pp. 368-380 ◽

Cited By ~ 1

Author(s):

Yan Wu ◽

Elizabeth Moylan ◽

Hope Inman ◽

Chris Graf

Keyword(s):

Data Sharing ◽

Open Data ◽

Data Availability ◽

Peer Reviews ◽

Open Research ◽

Research Communities ◽

The Way

It is easy to argue that open data are critical to enabling faster and more effective research discovery. In this article, we describe the approach we have taken at Wiley to support open data and to start enabling more data to be FAIR data (Findable, Accessible, Interoperable and Reusable) with the implementation of four data policies: “Encourages”, “Expects”, “Mandates” and “Mandates and Peer Reviews Data”. We describe the rationale for these policies and levels of adoption so far. In the coming months we plan to measure and monitor the implementation of these policies via the publication of data availability statements and data citations. With this information, we'll be able to celebrate adoption of data-sharing practices by the research communities we work with and serve, and we hope to showcase researchers from those communities leading in open research.

Initiating FAIR geothermal data in Indonesia

10.5194/egusphere-egu21-14438 ◽

2021 ◽

Author(s):

Dasapta Erwin Irawan

Keyword(s):

Data Sharing ◽

Data Exchange ◽

Open Data ◽

Data Reuse ◽

Data Availability ◽

Scientific Development ◽

For Profit ◽

Deep Well ◽

Data Schema ◽

Corporate Social

One of the main keys to scientific development is data availability. Data must not only be easy to discover and download but also easy to reuse. Geothermal researchers, research institutions, and industries are the three main stakeholders in fostering data sharing and data reuse. Costly deep-well datasets and advanced logging datasets are very important not only for exploitation but also for the wider community, e.g., for regional planning or shared environmental analyses. Data sharing follows the four FAIR principles. Principle 1, Findable: data are uploaded to an open repository with proper documentation and a data schema. Principle 2, Accessible: access restrictions such as user IDs and passwords are removed to allow easy downloads; for data from commercial entities, an embargo is permitted, provided the embargo duration and the data request procedure are clearly stated. Principle 3, Interoperable: all data must be prepared for straightforward exchange between platforms. Principle 4, Reusable: all data must be submitted in common, conventional file formats, preferably text-based files (e.g., `csv` or `txt`), so that they can be analyzed with a variety of software and hardware. Because the geothermal industry is profit-driven and capital-intensive, it has even more reason to embrace data sharing, which would be a good way for companies to demonstrate their role in supporting society. Contributions from multiple stakeholders are the most essential part of scientific development. In the commercial context, data sharing is a form of corporate social responsibility (CSR), which should not be defined only as providing funding to support local communities.
Keywords: open data, FAIR data, data sharing
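Principles 1 and 4 above ask for a documented data schema and a text-based format such as `csv`. The Python sketch below illustrates one way to do this with the standard `csv` module: a data file plus a small data dictionary describing each column. The well-log column names, units, and values are hypothetical placeholders, not data from the abstract.

```python
import csv
import io

# Hypothetical well-log records; column names and values are illustrative only.
records = [
    {"well_id": "W-01", "depth_m": 500.0, "temperature_c": 85.2},
    {"well_id": "W-01", "depth_m": 1000.0, "temperature_c": 142.7},
]

# A minimal data dictionary (schema) documenting each column of the data file.
data_dictionary = [
    {"column": "well_id", "unit": "", "description": "Hypothetical well identifier"},
    {"column": "depth_m", "unit": "m", "description": "Measured depth"},
    {"column": "temperature_c", "unit": "degC", "description": "Logged temperature"},
]


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


data_csv = to_csv(records, ["well_id", "depth_m", "temperature_c"])
dictionary_csv = to_csv(data_dictionary, ["column", "unit", "description"])

# Round trip: any CSV-aware tool can read the file back; values come back as strings.
parsed = list(csv.DictReader(io.StringIO(data_csv)))
print(parsed[1]["temperature_c"])  # 142.7
```

For a real deposit, the same `to_csv` output would be written to files (for example with `open(path, "w", newline="")`, as the `csv` documentation recommends) and uploaded alongside the dictionary file.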

Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition

10.31222/osf.io/39cfb ◽

2018 ◽

Cited By ~ 3

Author(s):

Tom Elis Hardwicke ◽

Maya B Mathur ◽

Kyle Earl MacDonald ◽

Gustav Nilsonne ◽

George Christopher Banks ◽

...

Keyword(s):

Data Sharing ◽

Open Data ◽

Interrupted Time Series ◽

Data Availability ◽

Quality Of Data ◽

Critical Feature ◽

Target Values ◽

Data Policy ◽

Access To Data ◽

The Impact

Access to data is a critical feature of an efficient, progressive, and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data (“analytic reproducibility”). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles all target values were reproduced, but 11 of these required author assistance. For 13 articles at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification, and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.
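The whole-number percentages in this abstract can be rechecked from their counts. The Python sketch below assumes rounding with Python's built-in `round`; the authors' rounding rule is not stated, and the helper name `whole_pct` is hypothetical. Note that 85/136 is exactly 62.5%, so the stated 62% is consistent with round-half-to-even (which `round` uses) or with truncation, but not with rounding half up. The last lines check that the 22 fully reproduced and 13 not fully reproduced articles account for the 35 articles attempted.

```python
def whole_pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to a whole number with Python's round (half-to-even)."""
    return round(100 * numerator / denominator)


# (label, numerator, denominator, percentage stated in the abstract)
checks = [
    ("data available, pre-policy", 104, 417, 25),
    ("data available, post-policy", 136, 174, 78),
    ("reusable, pre-policy", 23, 104, 22),
    ("reusable, post-policy", 85, 136, 62),
]

for label, num, den, stated in checks:
    print(f"{label}: {num}/{den} -> {whole_pct(num, den)}% (stated {stated}%)")

# Articles with reusable data that were checked: fully reproduced + not fully reproduced
fully_reproduced, not_fully_reproduced = 22, 13
print(fully_reproduced + not_fully_reproduced)  # 35
```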

Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition

Royal Society Open Science ◽

10.1098/rsos.180448 ◽

2018 ◽

Vol 5 (8) ◽

pp. 180448 ◽

Cited By ~ 41

Author(s):

Tom E. Hardwicke ◽

Maya B. Mathur ◽

Kyle MacDonald ◽

Gustav Nilsonne ◽

George C. Banks ◽

...

Keyword(s):

Data Sharing ◽

Open Data ◽

Interrupted Time Series ◽

Data Availability ◽

Quality Of Data ◽

Critical Feature ◽

Target Values ◽

Data Policy ◽

Access To Data ◽

The Impact

Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data (‘analytic reproducibility’). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles all target values were reproduced, but 11 of these required author assistance. For 13 articles at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.

Data Reuse and the Social Capital of Open Science

10.1101/093518 ◽

2016 ◽

Author(s):

Bradly Alicea

Keyword(s):

Game Theory ◽

Social Capital ◽

Data Sharing ◽

Open Data ◽

Open Science ◽

Payoff Matrix ◽

Reciprocal Relationship ◽

Data Reuse ◽

Data Set ◽

Shared Data

Participation in open data initiatives requires two semi-independent actions: the sharing of data produced by a researcher or group, and the use of those data by a consumer. Consumers of shared data range from people interested in validating the results of a given study to people who actively transform the available data. These data transformers are of particular interest because they add value to the shared data set through the discovery of new relationships and information, which can in turn be shared with the same community. The complex and often reciprocal relationship between producers and consumers can be better understood using game theory, namely three variations of the Prisoners’ Dilemma (PD): a classical PD payoff matrix, a simulation of the n-person iterated PD model that tests three hypotheses, and an Ideological Game Theory (IGT) model used to formulate how sharing strategies might be implemented in a specific institutional culture. To motivate these analyses, data sharing is presented as a trade-off between economic and social payoffs. This is demonstrated as a series of payoff matrices describing situations ranging from ubiquitous acceptance of Open Science principles to a community standard of complete non-cooperation. Further context is provided through the IGT model, which allows for the modeling of cultural biases and beliefs that influence open science decision-making. A vision for building a CC-BY economy is then discussed using an approach called econosemantics, which complements the treatment of data sharing as a complex system of transactions enabled by social capital.
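To make the payoff-matrix idea concrete, the Python sketch below encodes a classical two-player Prisoners’ Dilemma and plays an iterated version. It is not the author's model: the payoff values (T=5, R=3, P=1, S=0) are the common textbook choice, the mapping of "cooperate" to sharing data and "defect" to withholding it is an illustrative assumption, and the strategies shown (tit-for-tat, always-defect) are standard examples rather than the n-person or IGT models described in the abstract.

```python
from typing import Callable, List, Tuple

# Textbook PD payoffs, assumed for illustration: temptation, reward, punishment, sucker.
T, R, P, S = 5, 3, 1, 0

# "C" = cooperate (e.g., share data), "D" = defect (e.g., withhold data).
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

# A strategy sees the opponent's past moves and returns its next move.
Strategy = Callable[[List[str]], str]


def always_defect(opponent_history: List[str]) -> str:
    return "D"


def tit_for_tat(opponent_history: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play an iterated two-player PD and return cumulative scores."""
    history_a: List[str] = []
    history_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(history_b), b(history_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


# The defining PD inequalities: T > R > P > S and 2R > T + S.
assert T > R > P > S and 2 * R > T + S

print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(tit_for_tat, always_defect))    # (9, 14)
print(play(always_defect, always_defect))  # (10, 10)
```

In a single round, defecting always pays more for the individual, yet mutual cooperation (30 each over ten rounds) beats mutual defection (10 each), which is the tension between individual and collective payoffs that the abstract uses to frame data sharing.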

Is useful research data usually shared? An investigation of genome-wide association study summary statistics

10.1101/622795 ◽

2019 ◽

Author(s):

Mike A. Thelwall ◽

Marcus Munafò ◽

Amalia Mas Bleda ◽

Emma Stuart ◽

Meiko Makita ◽

...

Keyword(s):

Data Sharing ◽

Association Studies ◽

Data Availability ◽

Genome Wide Association ◽

Exact Nature ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Standard Format ◽

Shared Data ◽

Genome Wide

Primary data collected during a research study is increasingly shared and may be re-used for new studies. To assess the extent of data sharing in favourable circumstances and whether such checks can be automated, this article investigates the summary statistics of primary human genome-wide association studies (GWAS). This type of data is highly suitable for sharing because it is a standard research output, is straightforward to use in future studies (e.g., for secondary analysis), and may be already stored in a standard format for internal sharing within multi-site research projects. Manual checks of 1799 articles from 2010 and 2017 matching a simple PubMed query for molecular epidemiology GWAS were used to identify 330 primary human GWAS papers. Of these, only 10.6% reported the location of a complete set of GWAS summary data, increasing from 4.3% in 2010 to 16.8% in 2017. Whilst information about whether data was shared was usually located clearly within a data availability statement, the exact nature of the shared data was usually unspecified. Thus, data sharing is the exception even in suitable research fields with relatively strong norms regarding data sharing. Moreover, the lack of clear data descriptions within data sharing statements greatly complicates the task of automatically characterising shared data sets.

Open science: The open clinical trials data journey

Clinical Trials ◽

10.1177/1740774519865512 ◽

2019 ◽

Vol 16 (5) ◽

pp. 539-546 ◽

Cited By ~ 2

Author(s):

Frank Rockhold ◽

Christina Bromley ◽

Erin K Wagner ◽

Marc Buyse

Keyword(s):

Data Sharing ◽

Data Privacy ◽

Resource Constraints ◽

Open Data ◽

Data Access ◽

Open Science ◽

Current Data ◽

Shared Data ◽

Privacy Issues ◽

And Training

Open data sharing and access have the potential to promote transparency and reproducibility in research, contribute to education and training, and prompt innovative secondary research. Yet there are many reasons why researchers don’t share their data. These include, among others, time and resource constraints, patient data privacy issues, lack of access to appropriate funding, insufficient recognition of the data originators’ contribution, and the concern that commercial or academic competitors may benefit from analyses based on shared data. Nevertheless, there is positive interest within and across the research and patient communities in creating shared data resources. In this perspective, we try to describe the spectrum of “openness” and “data access” that currently exists, highlight the strengths and weaknesses of current data access platforms, present current examples of data sharing platforms, and propose guidelines for revising data sharing practices going forward.
