Examination of data citation guidelines in style manuals and data repositories

2020 ◽  
Author(s):  
JungWon Yoon ◽  
EunKyung Chung ◽  
Janet Schalk ◽  
Jihyun Kim


2016 ◽  
Author(s):  
Martin Fenner ◽  
Mercè Crosas ◽  
Jeffrey Grethe ◽  
David Kennedy ◽  
Henning Hermjakob ◽  
...  

This article presents a practical roadmap for scholarly data repositories to implement data citation in accordance with the Joint Declaration of Data Citation Principles, a synopsis and harmonization of the recommendations of major science policy bodies. The roadmap was developed by the Repositories Expert Group, as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE (https://biocaddie.org) program. The roadmap makes 11 specific recommendations, grouped into three phases of implementation: a) required steps needed to support the Joint Declaration of Data Citation Principles, b) recommended steps that facilitate article/data publication workflows, and c) optional steps that further improve data citation support provided by data repositories.


2015 ◽  
Author(s):  
Joan Starr ◽  
Eleni Castro ◽  
Mercè Crosas ◽  
Michel Dumontier ◽  
Robert R. Downs ◽  
...  

Reproducibility and reusability of research results is an important concern in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data. However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity. Nor do they permit comprehensive exploitation by modern Web technologies. This has led to several authoritative studies recommending uniform direct citation of data archived in persistent repositories. Data are to be considered as first-class scholarly objects, and treated similarly in many ways to cited and archived scientific and scholarly literature. Here we briefly review the most current and widely agreed set of principle-based recommendations for scholarly data citation, the Joint Declaration of Data Citation Principles (JDDCP). We then present a framework for operationalizing the JDDCP; and a set of initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data. The main target audience for the common implementation guidelines in this article consists of publishers, scholarly organizations, and persistent data repositories, including technical staff members in these organizations. But ordinary researchers can also benefit from these recommendations. The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly/scientific data.


Author(s):  
Gail M. Thornton ◽  
Ali Shiri

Introduction: Open health data provides healthcare professionals, biomedical researchers and the general public with access to health data which has the potential to improve healthcare delivery and policy. The challenge for data providers is to create and implement appropriate metadata, or structured data about the data, to ensure that data are easy to discover, access and re-use. The goal of this study is to identify, evaluate and compare Canadian open health data repositories for their searching, browsing and navigation functionalities, the richness of their metadata description practices, and their metadata-based filtering mechanisms. Methods: Metadata-based search and browsing were evaluated in addition to the number and nature of metadata elements. Canadian open health data repositories across national, provincial and institutional levels were evaluated. Data collected using verbatim text recording were evaluated using an analytical framework based on the 2019 Dataverse North Metadata Best Practices guide and 2019 Data Citation Implementation Project roadmap. Results: All six repositories required filtering to access “open health data”. All six repositories included subject facets for filtering, and title and description on the Results List. Inconsistencies suggest that improvements should address advanced search, health-specific search terms, records for all repositories and links to related publications. Discussion: Consistent use of title and description suggests that an interoperable interface is possible. Records indicate the need for explicit, easy-to-find mechanisms to access metadata in repositories. The analytical framework represents first draft guidelines for metadata creation and implementation to improve organization, discoverability and access to Canadian open health data.


2020 ◽  
Author(s):  
Graham Smith ◽  
Andrew Hufton

Researchers are increasingly expected by funders and journals to make their data available for reuse as a condition of publication. At Springer Nature, we feel that publishers must support researchers in meeting these additional requirements, and must recognise the distinct opportunities data holds as a research output. Here, we outline some of the varied ways that Springer Nature supports research data sharing and report on key outcomes.

Our staff and journals are closely involved with community-led efforts, like the Enabling FAIR Data initiative and the COPDESS 2014 Statement of Commitment [1-4]. The Enabling FAIR Data initiative, which was endorsed in January 2019 by Nature and Scientific Data, and by Nature Geoscience in January 2020, establishes a clear expectation that Earth and environmental sciences data should be deposited in FAIR [5] Data-aligned community repositories, when available (and in general purpose repositories otherwise). In support of this endorsement, Nature and Nature Geoscience require authors to share and deposit their Earth and environmental science data, and Scientific Data has committed to progressively updating its list of recommended data repositories to help authors comply with this mandate.

In addition, we offer a range of research data services, with various levels of support available to researchers in terms of data curation, expert guidance on repositories and linking research data and publications.

We appreciate that researchers face potentially challenging requirements in terms of the ‘what’, ‘where’ and ‘how’ of sharing research data. This can be particularly difficult for researchers to negotiate given the huge diversity of policies across different journals. We have therefore developed a series of standardised data policies, which have now been adopted by more than 1,600 Springer Nature journals.

We believe that these initiatives make important strides in challenging the current replication crisis and addressing the economic [6] and societal consequences of data unavailability. They also offer an opportunity to drive change in how academic credit is measured, through the recognition of a wider range of research outputs than articles and their citations alone. As signatories of the San Francisco Declaration on Research Assessment [7], Nature Research is committed to improving the methods of evaluating scholarly research. Research data in this context offers new mechanisms to measure the impact of all research outputs. To this end, Springer Nature supports the publication of peer-reviewed data papers through journals like Scientific Data. Analysis of citation patterns demonstrates that data papers can be well-cited, and offer a viable way for researchers to receive credit for data sharing through traditional citation metrics. Springer Nature is also working hard to improve support for direct data citation. In 2018 a data citation roadmap developed by the Publishers Early Adopters Expert Group was published in Scientific Data [8], outlining practical steps for publishers to work with data citations and associated benefits in transparency and credit for researchers. Using examples from this roadmap, its implementation and supporting services, we outline how a FAIR-led data approach from publishers can help researchers in the Earth and environmental sciences to capitalise on new expectations around data sharing.

1. https://doi.org/10.1038/d41586-019-00075-3
2. https://doi.org/10.1038/s41561-019-0506-4
3. https://copdess.org/enabling-fair-data-project/commitment-statement-in-the-earth-space-and-environmental-sciences/
4. https://copdess.org/statement-of-commitment/
5. https://www.force11.org/group/fairgroup/fairprinciples
6. https://op.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
7. https://sfdora.org/read/
8. https://doi.org/10.1038/sdata.2018.259


Author(s):  
Martin Fenner ◽  
Daniella Lowenberg ◽  
Matt Jones ◽  
Paul Needham ◽  
Dave Vieglais ◽  
...  

The Code of Practice for Research Data Usage Metrics standardizes the generation and distribution of usage metrics for research data, enabling for the first time the consistent and credible reporting of research data usage. This is the first release of the Code of Practice and the recommendations are aligned as much as possible with the COUNTER Code of Practice Release 5 that standardizes usage metrics for many scholarly resources, including journals and books. With the Code of Practice for Research Data Usage Metrics, data repositories and platform providers can report usage metrics following common best practices and using a standard report format. This is an essential step towards realizing usage metrics as a critical component in our understanding of how publicly available research data are being reused. This complements ongoing work on establishing best practices and services for data citation.


2017 ◽  
Author(s):  
Sarala M. Wimalaratne ◽  
Nick Juty ◽  
John Kunze ◽  
Greg Janée ◽  
Julie A. McMurry ◽  
...  

Most biomedical data repositories issue locally unique accession numbers, but do not provide globally unique, machine-resolvable, persistent identifiers for their datasets, as required by publishers wishing to implement data citation in accordance with widely accepted principles. Local accessions may however be prefixed with a namespace identifier, providing global uniqueness. Such “compact identifiers” have been widely used in biomedical informatics to support global resource identification with local identifier assignment. We report here on our project to provide robust support for machine-resolvable, persistent compact identifiers in biomedical data citation, by harmonizing the Identifiers.org and N2T.net (Name-To-Thing) meta-resolvers and extending their capabilities. Identifiers.org services hosted at the European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), and N2T.net services hosted at the California Digital Library (CDL), can now resolve any given identifier from over 600 source databases to its original source on the Web, using a common registry of prefix-based redirection rules. We believe these services will be of significant help to publishers and others implementing persistent, machine-resolvable citation of research data.
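
The compact identifier idea described in this abstract (a namespace prefix joined to a local accession, handed to a meta-resolver) can be illustrated with a minimal Python sketch. It is not taken from the article: the function names, the validation pattern and the example accessions are assumptions chosen for illustration, and the sketch only builds the resolver URLs without contacting either service or checking that a prefix is actually registered.

```python
import re

# Base URLs of the two meta-resolvers named in the abstract.
RESOLVERS = {
    "identifiers.org": "https://identifiers.org/",
    "n2t.net": "https://n2t.net/",
}

# Assumed, simplified shape of a compact identifier: "prefix:accession".
# The real registries define per-prefix rules; this pattern is illustrative.
_COMPACT_ID = re.compile(r"^(?P<prefix>[A-Za-z0-9_.]+):(?P<accession>\S+)$")


def parse_compact_id(compact_id):
    """Split a compact identifier into (prefix, accession).

    The prefix is lowercased; the accession is returned unchanged.
    """
    match = _COMPACT_ID.match(compact_id.strip())
    if match is None:
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return match.group("prefix").lower(), match.group("accession")


def resolver_urls(compact_id):
    """Build one candidate URL per meta-resolver for a compact identifier."""
    prefix, accession = parse_compact_id(compact_id)
    return {
        name: f"{base}{prefix}:{accession}"
        for name, base in RESOLVERS.items()
    }


if __name__ == "__main__":
    for name, url in resolver_urls("taxonomy:9606").items():
        print(name, url)
```

Because both resolvers accept the same prefix-plus-accession string, a publisher could store only the compact identifier in a citation and choose the resolver at display time.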


2013 ◽  
Author(s):  
Todd J Vision ◽  
Heather A Piwowar

Recently introduced funding agency policies seek to increase the availability of data from individual published studies for reuse by the research community at large. The success of such policies can be measured both by data input (“is useful data being made available?”) and research output (“are these data being reused by others?”). A key determinant of data input is the extent to which data producers receive adequate professional credit for making data available. One of us (HP) previously reported a large citation difference for published microarray studies with and without data available in a public repository. Analysis of a much larger sample, with more covariates, provides a more reliable estimate of this citation boost, as well as additional insights into patterns of reuse and how the availability of data affects publication impact. A more recent study tracking the reuse of 100 datasets from each of ten different primary data repositories reveals large variation in patterns of reuse and citation. Our findings (a) illuminate ways in which the reuses of archived data tend to differ in purpose from that of the original producers; (b) inform data archiving policy, such as how long data embargoes need to be in order to protect the proprietary interests of producers; and (c) allow us to answer the vexing question of what the return on investment is for data archiving. In conducting these studies, we have become aware of gaps in data citation practice and infrastructure that limit the extent to which researchers receive credit for their contributions. We describe early efforts to bake good data citation and usage tracking into cyberinfrastructure as part of DataONE, the Data Observation Network for Earth. Finally, we introduce total-impact, a tool that allows researchers to track the diverse impacts of all their research outputs, including data, and empowers them to be recognized for their scholarly work on their own terms.
Software and Data Availability: Research software and data: https://github.com/hpiwowar (CCZero for data where possible, MIT for code); Dryad: new BSD license: http://code.google.com/p/dryad; DataONE: Apache license: http://www.dataone.org/developer-resources; total-impact: MIT license: https://github.com/total-impact. This is an abstract that was submitted to the iEvoBio 2012 conference, held on July 10-11, 2012, in Ottawa, Canada.

