Achieving human and machine accessibility of cited data in scholarly publications

Reproducibility and reusability of research results is an important concern in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data. However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity. Nor do they permit comprehensive exploitation by modern Web technologies. This has led to several authoritative studies recommending uniform direct citation of data archived in persistent repositories. Data are to be considered as first-class scholarly objects, and treated similarly in many ways to cited and archived scientific and scholarly literature. Here we briefly review the most current and widely agreed set of principle-based recommendations for scholarly data citation, the Joint Declaration of Data Citation Principles (JDDCP). We then present a framework for operationalizing the JDDCP; and a set of initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data. The main target audience for the common implementation guidelines in this article consists of publishers, scholarly organizations, and persistent data repositories, including technical staff members in these organizations. But ordinary researchers can also benefit from these recommendations. The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly/scientific data.
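
The identifier resolution and machine actionability mentioned in this abstract can be illustrated with a small sketch. It builds, without sending, an HTTP request that asks the public DOI resolver for machine-readable metadata through content negotiation. The helper names `normalize_doi` and `metadata_request` are hypothetical, and the choice of the CSL JSON media type is an assumption: whether a given DOI returns that format depends on its registration agency and is not stated in the article.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy
# Assumed default media type; support varies by registration agency.
CSL_JSON = "application/vnd.citationstyles.csl+json"

_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI (10.xxxx/suffix) from a URL or 'doi:' form."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def metadata_request(raw_doi: str, media_type: str = CSL_JSON) -> Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    doi = normalize_doi(raw_doi)
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": media_type})


if __name__ == "__main__":
    req = metadata_request("doi:10.7287/peerj.preprints.697v3")
    print(req.full_url)
    print(req.get_header("Accept"))
```

Passing the returned object to `urllib.request.urlopen` would perform the actual lookup. The sketch stops short of that so it runs without network access.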

Achieving human and machine accessibility of cited data in scholarly publications

10.7287/peerj.preprints.697v3 ◽

2015 ◽

Author(s):

Joan Starr ◽

Eleni Castro ◽

Mercè Crosas ◽

Michel Dumontier ◽

Robert R. Downs ◽

...

Keyword(s):

Scientific Data ◽

Primary Data ◽

Data Repositories ◽

Staff Members ◽

Joint Declaration ◽

Scholarly Publications ◽

Comprehensive Exploitation ◽

Data Citation ◽

Persistent Data ◽

Scholarly Data

Achieving human and machine accessibility of cited data in scholarly publications

10.7287/peerj.preprints.697v1 ◽

2014 ◽

Author(s):

Joan Starr ◽

Eleni Castro ◽

Mercè Crosas ◽

Michel Dumontier ◽

Robert R. Downs ◽

...

Keyword(s):

Short Article ◽

Joint Declaration ◽

Cross Domain ◽

Widespread Adoption ◽

Scholarly Publications ◽

Data Citation ◽

Scholarly Data ◽

Operational Guidance

This short article provides operational guidance on implementing scholarly data citation and data deposition, in conformance with the Joint Declaration of Data Citation Principles (JDDCP, http://force11.org/datacitation) to help achieve widespread, uniform human and machine accessibility of deposited data. The JDDCP is the outcome of a cross-domain effort to establish core principles around cited data in scholarly publications. It deals with important issues in identification, deposition, description, accessibility, persistence, and evidential status of cited data. Eighty-five scholarly, governmental, and funding institutions have now endorsed the JDDCP. The purpose of this article is to provide the necessary guidance for JDDCP-endorsing organizations to implement these principles and to achieve their widespread adoption.

A Data Citation Roadmap for Scholarly Data Repositories

10.1101/097196 ◽

2016 ◽

Cited By ~ 15

Author(s):

Martin Fenner ◽

Mercè Crosas ◽

Jeffrey Grethe ◽

David Kennedy ◽

Henning Hermjakob ◽

...

Keyword(s):

Science Policy ◽

Expert Group ◽

Data Repositories ◽

Data Publication ◽

Link Type ◽

Joint Declaration ◽

Data Citation ◽

Scholarly Data ◽

Three Phases

This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE (https://biocaddie.org) program. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories.

Achieving human and machine accessibility of cited data in scholarly publications

10.7287/peerj.preprints.697v2 ◽

2014 ◽

Cited By ~ 1

Author(s):

Joan Starr ◽

Eleni Castro ◽

Mercè Crosas ◽

Michel Dumontier ◽

Robert R. Downs ◽

...

Keyword(s):

Short Article ◽

Joint Declaration ◽

Cross Domain ◽

Widespread Adoption ◽

Scholarly Publications ◽

Data Citation ◽

Scholarly Data ◽

Operational Guidance

This short article provides operational guidance on implementing scholarly data citation and data deposition, in conformance with the Joint Declaration of Data Citation Principles (JDDCP, http://force11.org/datacitation) to help achieve widespread, uniform human and machine accessibility of deposited data. The JDDCP is the outcome of a cross-domain effort to establish core principles around cited data in scholarly publications. It deals with important issues in identification, deposition, description, accessibility, persistence, and evidential status of cited data. Eighty-five scholarly, governmental, and funding institutions have now endorsed the JDDCP. The purpose of this article is to provide the necessary guidance for JDDCP-endorsing organizations to implement these principles and to achieve their widespread adoption.

Beyond article publishing - support and opportunities for researchers in FAIR data sharing

10.5194/egusphere-egu2020-17073 ◽

2020 ◽

Author(s):

Graham Smith ◽

Andrew Hufton

Keyword(s):

Data Sharing ◽

San Francisco ◽

Scientific Data ◽

Research Data ◽

Research Assessment ◽

Environmental Sciences ◽

Data Repositories ◽

The Earth ◽

Data Citation ◽

The Impact

Researchers are increasingly expected by funders and journals to make their data available for reuse as a condition of publication. At Springer Nature, we feel that publishers must support researchers in meeting these additional requirements, and must recognise the distinct opportunities data holds as a research output. Here, we outline some of the varied ways that Springer Nature supports research data sharing and report on key outcomes. Our staff and journals are closely involved with community-led efforts, like the Enabling FAIR Data initiative and the COPDESS 2014 Statement of Commitment [1-4]. The Enabling FAIR Data initiative, which was endorsed in January 2019 by Nature and Scientific Data, and by Nature Geoscience in January 2020, establishes a clear expectation that Earth and environmental sciences data should be deposited in FAIR [5] Data-aligned community repositories, when available (and in general purpose repositories otherwise). In support of this endorsement, Nature and Nature Geoscience require authors to share and deposit their Earth and environmental science data, and Scientific Data has committed to progressively updating its list of recommended data repositories to help authors comply with this mandate. In addition, we offer a range of research data services, with various levels of support available to researchers in terms of data curation, expert guidance on repositories and linking research data and publications. We appreciate that researchers face potentially challenging requirements in terms of the 'what', 'where' and 'how' of sharing research data. This can be particularly difficult for researchers to negotiate given the huge diversity of policies across different journals. We have therefore developed a series of standardised data policies, which have now been adopted by more than 1,600 Springer Nature journals. We believe that these initiatives make important strides in challenging the current replication crisis and addressing the economic [6] and societal consequences of data unavailability. They also offer an opportunity to drive change in how academic credit is measured, through the recognition of a wider range of research outputs than articles and their citations alone. As a signatory of the San Francisco Declaration on Research Assessment [7], Nature Research is committed to improving the methods of evaluating scholarly research. Research data in this context offers new mechanisms to measure the impact of all research outputs. To this end, Springer Nature supports the publication of peer-reviewed data papers through journals like Scientific Data. Analysis of citation patterns demonstrates that data papers can be well-cited, and offer a viable way for researchers to receive credit for data sharing through traditional citation metrics. Springer Nature is also working hard to improve support for direct data citation. In 2018 a data citation roadmap developed by the Publishers Early Adopters Expert Group was published in Scientific Data [8], outlining practical steps for publishers to work with data citations and associated benefits in transparency and credit for researchers. Using examples from this roadmap, its implementation and supporting services, we outline how a FAIR-led data approach from publishers can help researchers in the Earth and environmental sciences to capitalise on new expectations around data sharing.

1. https://doi.org/10.1038/d41586-019-00075-3
2. https://doi.org/10.1038/s41561-019-0506-4
3. https://copdess.org/enabling-fair-data-project/commitment-statement-in-the-earth-space-and-environmental-sciences/
4. https://copdess.org/statement-of-commitment/
5. https://www.force11.org/group/fairgroup/fairprinciples
6. https://op.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
7. https://sfdora.org/read/
8. https://doi.org/10.1038/sdata.2018.259

Data reuse and scholarly reward: understanding practice and building infrastructure

10.7287/peerj.preprints.14v1 ◽

2013 ◽

Cited By ~ 1

Author(s):

Todd J Vision ◽

Heather A Piwowar

Keyword(s):

Data Reuse ◽

Data Availability ◽

Primary Data ◽

Funding Agency ◽

Reliable Estimate ◽

Data Repositories ◽

Data Archiving ◽

Research Software ◽

Data Citation ◽

Total Impact

Recently introduced funding agency policies seek to increase the availability of data from individual published studies for reuse by the research community at large. The success of such policies can be measured both by data input (“is useful data being made available?”) and research output (“are these data being reused by others?”). A key determinant of data input is the extent to which data producers receive adequate professional credit for making data available. One of us (HP) previously reported a large citation difference for published microarray studies with and without data available in a public repository. Analysis of a much larger sample, with more covariates, provides a more reliable estimate of this citation boost, as well as additional insights into patterns of reuse and how the availability of data affects publication impact. A more recent study tracking the reuse of 100 datasets from each of ten different primary data repositories reveals large variation in patterns of reuse and citation. Our findings (a) illuminate ways in which the reuses of archived data tend to differ in purpose from that of the original producers; (b) inform data archiving policy, such as how long data embargoes need to be in order to protect the proprietary interests of producers; (c) and allow us to answer the vexing question of what the return on investment is for data archiving. In conducting these studies, we have become aware of gaps in data citation practice and infrastructure that limit the extent to which researchers receive credit for their contributions. We describe early efforts to bake good data citation and usage tracking into cyberinfrastructure as part of DataONE, the Data Observation Network for Earth. Finally, we introduce total-impact, a tool that allows researchers to track the diverse impacts of all their research outputs, including data, and empowers them to be recognized for their scholarly work on their own terms. 
Software and Data Availability: Research software and data: https://github.com/hpiwowar (CCZero for data where possible, MIT for code); Dryad: new BSD license: http://code.google.com/p/dryad; DataONE: Apache license: http://www.dataone.org/developer-resources; total-impact: MIT license: https://github.com/total-impact. This is an abstract that was submitted to the iEvoBio 2012 conference, held on July 10-11, 2012, in Ottawa, Canada.

Data Management and Data Sharing in Psychological Science: Revision of the DGPs Recommendations

10.31234/osf.io/24ncs ◽

2020 ◽

Cited By ~ 1

Author(s):

Mario Gollwitzer ◽

Andrea Abele-Brehm ◽

Christian Fiebach ◽

Roland Ramthun ◽

Anne M. Scheel ◽

...

Keyword(s):

Data Management ◽

Data Sharing ◽

Data Protection ◽

Scientific Practice ◽

Data Access ◽

Scientific Data ◽

Research Data ◽

Primary Data ◽

Practical Implementation ◽

Data Repositories

Providing access to research data collected as part of scientific publications and publicly funded research projects is now regarded as a central aspect of an open and transparent scientific practice and is increasingly being called for by funding institutions and scientific journals. To this end, researchers should strive to comply with the so-called FAIR principles (of scientific data management), that is, research data should be findable, accessible, interoperable, and reusable. Systematic data management supports these goals and, at the same time, makes it possible to achieve them efficiently. With these revised recommendations on data management and data sharing, which also draw on feedback from a 2018 survey of its members, the German Psychological Society (Deutsche Gesellschaft für Psychologie; DGPs) specifies important basic principles of data management in psychology. Initially, based on discipline-specific definitions of raw data, primary data, secondary data, and metadata, we provide recommendations on the degree of data processing necessary when publishing data. We then discuss data protection as well as aspects of copyright and data usage before defining the qualitative requirements for trustworthy research data repositories. This is followed by a detailed discussion of pragmatic aspects of data sharing, such as the differences between Type 1 and Type 2 data publications, restrictions on use (embargo period), the definition of "scientific use" by secondary users of shared data, and recommendations on how to resolve potential disputes. Particularly noteworthy is the new recommendation of distinct "access categories" for data, each with different requirements in terms of data protection or research ethics. 
These range from completely open data without usage restrictions ("access category 0") to data shared under a set of standardized conditions (e.g., reuse restricted to scientific purposes; "access category 1"), individualized usage agreements ("access category 2"), and secure data access under strictly controlled conditions (e.g., in a research data center; "access category 3"). The practical implementation of this important innovation, however, will require data repositories to provide the necessary technical functionalities. In summary, the revised recommendations aim to present pragmatic guidelines for researchers to handle psychological research data in an open and transparent manner, while addressing structural challenges to data sharing solutions that are beneficial for all involved parties.
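
The four access categories described in this abstract lend themselves to a small data structure. The sketch below is only an illustration of how a repository might encode them. The class, member, and function names are hypothetical. Treating the category number as an ordering of restrictiveness is an assumption drawn from the "range from ... to" wording, not something the recommendations are confirmed to specify.

```python
from enum import IntEnum


class AccessCategory(IntEnum):
    """Access categories 0 to 3 as summarized in the abstract."""

    OPEN = 0                  # completely open, no usage restrictions
    STANDARD_CONDITIONS = 1   # standardized conditions, e.g. scientific use only
    INDIVIDUAL_AGREEMENT = 2  # individualized usage agreement
    SECURE_ACCESS = 3         # strictly controlled, e.g. research data center


def most_restrictive(categories):
    """Return the highest category, assuming higher means more restrictive."""
    return max(categories)


def needs_individual_review(category: AccessCategory) -> bool:
    """Hypothetical policy: categories 2 and 3 need case-by-case handling."""
    return category >= AccessCategory.INDIVIDUAL_AGREEMENT


if __name__ == "__main__":
    files = [AccessCategory.OPEN, AccessCategory.INDIVIDUAL_AGREEMENT]
    print(most_restrictive(files).name)
    print(needs_individual_review(AccessCategory.STANDARD_CONDITIONS))
```

An integer-backed enumeration keeps the numeric labels used in the text while allowing comparisons, which is convenient when a dataset mixes files of different categories.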

Data reuse and scholarly reward: understanding practice and building infrastructure

10.7287/peerj.preprints.14 ◽

2013 ◽

Author(s):

Todd J Vision ◽

Heather A Piwowar

Keyword(s):

Data Reuse ◽

Data Availability ◽

Primary Data ◽

Funding Agency ◽

Reliable Estimate ◽

Data Repositories ◽

Data Archiving ◽

Research Software ◽

Data Citation ◽

Total Impact

Recently introduced funding agency policies seek to increase the availability of data from individual published studies for reuse by the research community at large. The success of such policies can be measured both by data input (“is useful data being made available?”) and research output (“are these data being reused by others?”). A key determinant of data input is the extent to which data producers receive adequate professional credit for making data available. One of us (HP) previously reported a large citation difference for published microarray studies with and without data available in a public repository. Analysis of a much larger sample, with more covariates, provides a more reliable estimate of this citation boost, as well as additional insights into patterns of reuse and how the availability of data affects publication impact. A more recent study tracking the reuse of 100 datasets from each of ten different primary data repositories reveals large variation in patterns of reuse and citation. Our findings (a) illuminate ways in which the reuses of archived data tend to differ in purpose from that of the original producers; (b) inform data archiving policy, such as how long data embargoes need to be in order to protect the proprietary interests of producers; (c) and allow us to answer the vexing question of what the return on investment is for data archiving. In conducting these studies, we have become aware of gaps in data citation practice and infrastructure that limit the extent to which researchers receive credit for their contributions. We describe early efforts to bake good data citation and usage tracking into cyberinfrastructure as part of DataONE, the Data Observation Network for Earth. Finally, we introduce total-impact, a tool that allows researchers to track the diverse impacts of all their research outputs, including data, and empowers them to be recognized for their scholarly work on their own terms. 
Software and Data Availability: Research software and data: https://github.com/hpiwowar (CCZero for data where possible, MIT for code); Dryad: new BSD license: http://code.google.com/p/dryad; DataONE: Apache license: http://www.dataone.org/developer-resources; total-impact: MIT license: https://github.com/total-impact. This is an abstract that was submitted to the iEvoBio 2012 conference, held on July 10-11, 2012, in Ottawa, Canada.
