Crossref: The sustainable source of community-owned scholarly metadata

2020 ◽  
Vol 1 (1) ◽  
pp. 414-427 ◽  
Author(s):  
Ginny Hendricks ◽  
Dominika Tkaczyk ◽  
Jennifer Lin ◽  
Patricia Feeney

This paper describes the scholarly metadata collected and made available by Crossref, as well as its importance in the scholarly research ecosystem. Containing over 106 million records and expanding at an average rate of 11% a year, Crossref’s metadata has become one of the major sources of scholarly data for publishers, authors, librarians, funders, and researchers. The metadata set consists of 13 content types, including not only traditional types, such as journals and conference papers, but also data sets, reports, preprints, peer reviews, and grants. The metadata is not limited to basic publication metadata, but can also include abstracts and links to full text, funding and license information, citation links, and information about corrections, updates, retractions, etc. This scale and breadth make Crossref a valuable source for research in scientometrics, including measuring the growth and impact of science and understanding new trends in scholarly communications. The metadata is available through a number of APIs, including a REST API and OAI-PMH. In this paper, we describe the kind of metadata that Crossref provides and how it is collected and curated. We also look at Crossref’s role in the research ecosystem and trends in metadata curation over the years, including the evolution of its citation data provision. We summarize research that has used Crossref’s metadata and describe plans that will improve metadata quality and retrieval in the future.
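As an illustration of the REST API mentioned above, the snippet below extracts basic fields from a Crossref-style work record. The record is a hand-written fragment following the documented `/works/{doi}` message shape; the DOI, license URL and citation count are placeholders, not real values.

```python
import json

# Hand-written fragment in the shape of a Crossref REST API /works/{doi}
# response; the DOI, license URL and citation count are placeholders.
sample = json.loads("""
{
  "message": {
    "DOI": "10.5555/example",
    "type": "journal-article",
    "title": ["Crossref: The sustainable source of community-owned scholarly metadata"],
    "author": [{"given": "Ginny", "family": "Hendricks"}],
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
    "is-referenced-by-count": 0
  }
}
""")

work = sample["message"]
title = work["title"][0]                      # titles arrive as a list
authors = [f'{a["given"]} {a["family"]}' for a in work.get("author", [])]
print(title)
print(authors)
```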

2020 ◽  
Vol 1 (1) ◽  
pp. 428-444 ◽  
Author(s):  
Silvio Peroni ◽  
David Shotton

OpenCitations is an infrastructure organization for open scholarship dedicated to the publication of open citation data as Linked Open Data using Semantic Web technologies, thereby providing a disruptive alternative to traditional proprietary citation indexes. Open citation data are valuable for bibliometric analysis, increasing the reproducibility of large-scale analyses by enabling publication of the source data. Following brief introductions to the development and benefits of open scholarship and to Semantic Web technologies, this paper describes OpenCitations and its data sets, tools, services, and activities. These include the OpenCitations Data Model; the SPAR (Semantic Publishing and Referencing) Ontologies; OpenCitations’ open software of generic applicability for searching, browsing, and providing REST APIs over resource description framework (RDF) triplestores; Open Citation Identifiers (OCIs) and the OpenCitations OCI Resolution Service; the OpenCitations Corpus (OCC), a database of open downloadable bibliographic and citation data made available in RDF under a Creative Commons public domain dedication; and the OpenCitations Indexes of open citation data, of which the first and largest is COCI, the OpenCitations Index of Crossref Open DOI-to-DOI Citations, which currently contains over 624 million bibliographic citations and is receiving considerable usage by the scholarly community.
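As an illustration of the identifier scheme mentioned above, an Open Citation Identifier joins a citing and a cited numeral with a hyphen, each numeral beginning with a supplier prefix (e.g. 020 for Crossref-derived entries in COCI). A minimal parser, with a hypothetical OCI value:

```python
def parse_oci(oci: str) -> tuple[str, str]:
    """Split an OCI into its citing and cited numerals.

    An OCI has the shape 'oci:<citing>-<cited>'; each numeral begins
    with a supplier prefix (e.g. '020' for Crossref-derived entries).
    """
    body = oci[4:] if oci.startswith("oci:") else oci
    citing, cited = body.split("-", 1)
    return citing, cited

# Hypothetical OCI, not a real citation:
print(parse_oci("oci:0200101-0200102"))
```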


2018 ◽  
Vol 38 (6) ◽  
pp. 378 ◽  
Author(s):  
Adian Fatchur Rochim ◽  
Abdul Muis ◽  
Riri Fitri Sari

The H-index has been widely used as a bibliometric method for measuring researchers’ performance. However, the H-index is unfair to authors who have a high number of citations but a small number of papers (perfectionist researchers) and to researchers who have many papers but few citations (productive researchers). The main objective of this article is to improve the H-index so that it accommodates the impact of both perfectionist and productive researchers, based on Jain’s fairness index algorithm and Lotka’s Law. To this end, the RA-index is proposed. To evaluate the proposed method, 1,710 citation data sets of top-cited researchers from Scopus, based on the author name list from the Webometrics site, are used. The RA-index achieves an average fairness of 91 per cent, higher than the 80 per cent fairness found for the H-index.
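The fairness comparison above rests on Jain’s fairness index, a standard measure that equals 1 when all values are equal and 1/n when a single value dominates. A minimal sketch (the citation counts are hypothetical):

```python
def jains_fairness(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(values)
    s = sum(values)
    return (s * s) / (n * sum(v * v for v in values))

# Hypothetical citation counts: evenly spread vs. concentrated.
print(jains_fairness([10, 10, 10, 10]))  # 1.0 (perfectly even)
print(jains_fairness([100, 0, 0, 0]))    # 0.25 (all citations on one paper)
```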


Author(s):  
Tarek Saier ◽  
Michael Färber ◽  
Tornike Tsereteli

Abstract Citation information in scholarly data is an important source of insight into the reception of publications and the scholarly discourse. Outcomes of citation analyses and the applicability of citation-based machine learning approaches heavily depend on the completeness of such data. One particular shortcoming of scholarly data nowadays is that non-English publications are often not included in data sets, or that language metadata is not available. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on over one million English papers, spanning three scientific disciplines and a time span of three decades. Our investigation covers differences between cited languages and disciplines, trends over time, and the usage characteristics as well as impact of cross-lingual citations. Among our findings are an increasing rate of citations to publications written in Chinese, citations being primarily to local non-English languages, and consistency in citation intent between cross- and monolingual citations. To facilitate further research, we make our collected data and source code publicly available.
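As a sketch of the kind of tabulation such an analysis involves, the snippet below computes a cross-lingual citation rate and the distribution of cited languages from (citing, cited) language pairs; the pairs shown are hypothetical, not the paper’s data:

```python
from collections import Counter

# Hypothetical (citing language, cited language) pairs from citation
# links; ISO 639-1 codes.
pairs = [("en", "en"), ("en", "zh"), ("en", "de"), ("en", "en"), ("en", "zh")]

cross = [cited for citing, cited in pairs if citing != cited]
rate = len(cross) / len(pairs)          # share of cross-lingual citations
by_language = Counter(cross)            # cited-language distribution
print(rate, by_language.most_common())
```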


2019 ◽  
Vol 121 (2) ◽  
pp. 1213-1228 ◽  
Author(s):  
Ivan Heibi ◽  
Silvio Peroni ◽  
David Shotton

Abstract In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (http://opencitations.net/index/coci). COCI is the first open citation index created by OpenCitations, in which we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived from the data available in Crossref. These citations are described using the resource description framework by means of the newly extended version of the OpenCitations Data Model (OCDM). We introduce the workflow we have developed for creating these data, and also show the additional services that facilitate the access to and querying of these data via different access points: a SPARQL endpoint, a REST API, bulk downloads, Web interfaces, and direct access to the citations via HTTP content negotiation. Finally, we present statistics regarding the use of COCI citation data, and we introduce several projects that have already started to use COCI data for different purposes.
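As a sketch of the REST access point mentioned above, the helper below builds a request URL following the COCI API’s documented pattern; the endpoint path and the placeholder DOI should be verified against opencitations.net before use:

```python
# Sketch of building a request URL for the COCI REST API; the endpoint
# path follows the documented pattern and the DOI is a placeholder.
COCI_API = "https://opencitations.net/index/coci/api/v1"

def coci_citations_url(doi: str) -> str:
    """URL listing the citations that point to the given (cited) DOI."""
    return f"{COCI_API}/citations/{doi}"

print(coci_citations_url("10.1000/example"))
```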


Author(s):  
José Augusto Salim ◽  
Antonio Saraiva

For biologists and biodiversity data managers who are unfamiliar with the data standardization practices of information science, the use of complex software to assist in the creation of standardized datasets can be a barrier to sharing data. Since the ratification of the Darwin Core Standard (DwC) (Darwin Core Task Group 2009) by the Biodiversity Information Standards (TDWG) in 2009, many datasets have been published and shared through a variety of data portals. In the early stages of biodiversity data sharing, the protocol Distributed Generic Information Retrieval (DiGIR), progenitor of DwC, and later the protocols BioCASe and TDWG Access Protocol for Information Retrieval (TAPIR) (De Giovanni et al. 2010) were introduced for discovery, search and retrieval of distributed data, simplifying data exchange between information systems. Although these protocols are still in use, they are known to be inefficient for transferring large amounts of data (GBIF 2017). Because of that, in 2011 the Global Biodiversity Information Facility (GBIF) introduced the Darwin Core Archive (DwC-A), which allows more efficient data transfer, and has become the preferred format for publishing data in the GBIF network. DwC-A is a structured collection of text files, which makes use of the DwC terms to produce a single, self-contained dataset. Many tools for assisting data sharing using DwC-A have been introduced, such as the Integrated Publishing Toolkit (IPT) (Robertson et al. 2014), the Darwin Core Archive Assistant (GBIF 2010) and the Darwin Core Archive Validator. Despite promoting and facilitating data sharing, many users have difficulties using such tools, mainly because of the lack of training in information science in the biodiversity curriculum (Convention on Biological Diversity 2012, Enke et al. 2012).
Most users, however, are very familiar with spreadsheets for storing and organizing their data, but adopting the available solutions requires data transformation and training in information science and, more specifically, biodiversity informatics. For an example of how spreadsheets can simplify data sharing see Stoev et al. (2016). In order to provide a more "familiar" approach to data sharing using DwC-A, we introduce a new tool as a Google Sheet Add-on. The Add-on, called the Darwin Core Archive Assistant Add-on, can be installed in the user's Google Account from the G Suite Marketplace and used in conjunction with the Google Sheets application. The Add-on assists the mapping of spreadsheet columns/fields to DwC terms (Fig. 1), similar to IPT, but with the advantage that it does not require the user to export the spreadsheet and import it into another software. Additionally, the Add-on facilitates the creation of a star schema in accordance with DwC-A, by the definition of a "CORE_ID" (e.g. occurrenceID, eventID, taxonID) field between sheets of a document (Fig. 2). The Add-on also provides an Ecological Metadata Language (EML) (Jones et al. 2019) editor (Fig. 3) with minimal fields to be filled in (i.e., mandatory fields required by IPT), and helps users to generate and share DwC-Archives stored in the user's Google Drive, which can be downloaded as a DwC-A or automatically uploaded to another public storage resource like a user's Zenodo account (Fig. 4). We expect that the Google Sheet Add-on introduced here, in conjunction with IPT, will promote biodiversity data sharing in a standardized format, as it requires minimal training and simplifies the process of data sharing from the user's perspective, mainly for those users not familiar with IPT but who have historically worked with spreadsheets.
Although the DwC-A generated by the Add-on still needs to be published using IPT, it does provide a simpler interface (i.e., a spreadsheet) for mapping data sets to DwC than IPT does. Even though the IPT includes many more features than the Darwin Core Assistant Add-on, we expect that the Add-on can be a "starting point" for users unfamiliar with biodiversity informatics before they move on to more advanced data publishing tools. On the other hand, Zenodo integration allows users to share and cite their standardized data sets without publishing them via IPT, which can be useful for users without access to an IPT installation. Additionally, we are working on new features, and future releases will include the automatic generation of globally unique identifiers for shared records, the possibility of adding additional data standards and DwC extensions, and integration with the GBIF REST API and the IPT REST API.
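The archive format the Add-on targets can be sketched directly: a DwC-A is a zip containing delimited data files plus a meta.xml descriptor that maps columns to Darwin Core term URIs. The two-column core below is a minimal illustration, not a validated archive:

```python
# Minimal sketch of assembling a Darwin Core Archive: a zip containing
# a core data file plus a meta.xml descriptor mapping columns to DwC
# term URIs. The fields and values are illustrative, not a full schema.
import io
import zipfile

occurrence_tsv = (
    "occurrenceID\tscientificName\n"
    "occ-1\tPuma concolor\n"
)

meta_xml = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("occurrence.txt", occurrence_tsv)
    z.writestr("meta.xml", meta_xml)

print(sorted(zipfile.ZipFile(buf).namelist()))
```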


2020 ◽  
Author(s):  
Tamer S. Abu-Alam ◽  
Per Pippin Aspaas ◽  
Leif Longva ◽  
Karl Magnus Nilsen ◽  
Obiajulu Odu

Data from the Polar Regions are of critical importance to modern polar research. Regardless of their disciplinary and institutional affiliations, researchers rely heavily on the comparison of existing data with new data sets to assess the changes that are taking effect. However, in a recent survey of 113 major polar data providers, we found that an estimated 60% of the existing polar research data is unfindable through common search engines and can only be accessed through institutional webpages. Moreover, a study by Johnson et al. (2019) showed that in social science and indigenous knowledge, the findability gap is around 84%. These findings point to the need for the scientific community to harvest metadata related to the Polar Regions, collect it in a homogeneous, seamless database, and make that database available to researchers, students and the public through a single search platform. This contribution describes the progress of an ongoing project, Open Polar (https://site.uit.no/open-polar/), started in 2019 at UiT The Arctic University of Norway. The project aims to collect metadata about all the open-access scholarly data and documents related to the Polar Regions in a homogeneous and seamless database. The suggested service will include three parts: 1) harvesting metadata; 2) enriching and filtering the harvested metadata relevant to the Polar Regions; and 3) making the collected records available and searchable to end-users through an interactive user interface. The service will make polar-related research data and documents more visible and searchable, thereby reducing the findability gap.
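As an illustration of the filtering step (part 2), a harvested record could be kept when its latitude falls within the Polar Regions; the thresholds, record shape and titles below are hypothetical:

```python
# Hypothetical filtering step for harvested metadata: keep records whose
# latitude lies in the polar regions (Arctic Circle northward, or the
# Antarctic). Thresholds, record shape and titles are illustrative.
ARCTIC, ANTARCTIC = 66.56, -60.0

records = [
    {"title": "Svalbard permafrost cores", "lat": 78.2},
    {"title": "Tropical reef survey", "lat": -8.5},
    {"title": "Ross Ice Shelf radar", "lat": -81.5},
]

polar = [r for r in records if r["lat"] >= ARCTIC or r["lat"] <= ANTARCTIC]
print([r["title"] for r in polar])
```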


2018 ◽  
Author(s):  
Cheryl E. Ball

Vega is a new, open-source academic publishing system, built as a collaboration between US-based library publishers and an Oslo-based design studio, Sanity (née Bengler). Vega was built with a $1m Andrew W. Mellon Foundation grant to fill a need in academic publishing for an open-source, easy-to-use editorial management system that could highlight the publication of multimedia artifacts in a born-digital publishing workflow. Vega facilitates authoring, editing, and publishing academic content as reusable structured data and facilitates innovation in indexing and presentation of this content. This short presentation will focus on the design goals underpinning Vega, specifically the goal of providing an intuitive, scholar-friendly way of editing academic documents as semantically clear, structured data while also featuring the inclusion of multimedia assets as part of the scholarly record.
The Vega publishing platform incorporates collaborative digital authoring and editing platforms and a fully customizable front end for a branded reader experience. Vega is built on an open-source data store and includes strong yet flexible editorial workflows that train editors in digital publishing best practices, especially with multimedia. For instance, the editorial dashboard features peer review options for different tracks/sections in publications, including the traditional double-anonymous review process as well as collaborative reviews and fully open peer reviews, depending on each venue's and each text's needs. The UX in Vega also allows editors to see what's going on in their venues at a glance, through a visualization that tracks each text in each stage of the editorial and production process and allows editors to engage directly with that text with a double click, regardless of where it is in the process.
This presentation will highlight how Vega facilitates multiple workflows for multimodal publications (i.e., PDF, website, data, interactive experience), and how it facilitates archivability and indexability. Vega will change the scholarly communications landscape and is now available for use through free download or hosting options.


2018 ◽  
Author(s):  
Rachael Lammey ◽  
Dom Mitchell ◽  
Fiona Counsell

Over the last five years, scholarly publishing has turned its attention towards metadata, increasingly recognizing it as a strategic priority for digital development. In a 2017 Imbue Partners study based on interviews with industry leaders, metadata was identified as of the highest importance in digital transformation. With an understanding that metadata improvements can be most efficiently made in collaboration, scholarly communications communities are now convening to find ways to jointly improve metadata.
Metadata 2020 is a project-driven collaboration of over 120 individuals across scholarly communications, convening to find ways to improve metadata that is connected, reusable, and open for all research outputs. Having launched in September 2017, the collaboration now includes 5 community groups and 6 projects. We include publishers, librarians, service providers, data publishers and repositories, researchers and funders in our work to advance metadata for scholarly communications, to ensure consideration of all stakeholder needs.
In this presentation, we will outline some of the key observations of our community groups around the metadata challenges they face, and explain how these observations evolved into the formation of our 6 cross-community projects. The projects that we will briefly outline include:
- Researcher Communications
- Metadata Recommendations and Element Mappings
- Defining the Terms We Use About Metadata
- Incentives for Improving Metadata Quality
- Shared Best Practice and Principles
- Metadata Evaluation and Guidance
Giving a brief overview of the remit and progress to date of each project, we will invite participation from attendees, and provide information about when key resources and guidance will be made available to the communities we serve.


2021 ◽  
Vol 37 (2) ◽  
pp. 119-132
Author(s):  
Claudia C. Delgado-Carreón ◽  
Juan D. Machin-Mastromatteo ◽  
José Refugio Romo-González ◽  
Josmel Pacheco-Mendoza

Purpose This work studied the influence of creativity-related traits on university professors’ scientific productivity. Design/methodology/approach A survey, applied to 120 university professors, included closed-ended questions for participants to rate 33 items derived from the specialized literature and classified into five dimensions (novelty; flexibility-fluidity; achievements-dedication; confidence; and problem-solving). After the survey was applied, data were merged with three other data sets: bibliometric data (Scopus), Altmetrics (Dimensions) and peer reviews and editorial management (Publons) for the period from 2013 to 2018. Descriptive, correlational and inferential statistical analyses were conducted on the data collected. Findings There was little relationship between professors’ creativity scores and their bibliometric and Altmetric indicators. The highest-rated creativity dimension was flexibility-fluidity and the most prominent creativity-related trait was “I perform my activities with dedication” (belonging to the achievements-dedication dimension). During the period studied, professors published 379 documents, but there were large gaps among their indicators; for instance, only 61 professors published in journals indexed in Scopus during the period. The inferential analysis implied that the professors with the best indicators did not present substantial differences in their creativity scores when compared to their colleagues with fewer or no indicators. However, descriptive and correlational insights may aid in fostering the aspects that can positively influence creativity and the indicators studied. Originality/value Although there is a wealth of literature about the study of creativity, and part of it tackles creativity and scientific research at a theoretical level, the authors did not find other empirical studies that analyzed the relationship between creativity and scientific production.
It might be important for librarians to be familiar with user studies such as the present one, as they may consider studying these kinds of aspects in their users. Moreover, this study can be interesting because librarians have increasingly been involved in the evaluation of scientific production and in training processes for enhancing it within their institutions. Here, information professionals have found opportunities to improve users’ knowledge, performance and experiences on digital scientific ecosystems and their indicators.
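The correlational analysis described above can be sketched with a plain Pearson’s r; the paired values below are hypothetical, not the study’s data:

```python
# Plain Pearson correlation coefficient (no external dependencies).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical paired observations: creativity score vs. documents published.
creativity = [3.8, 4.1, 2.9, 4.5, 3.2]
documents = [2, 7, 0, 5, 1]
print(round(pearson(creativity, documents), 3))
```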


2021 ◽  
pp. 016555152110277
Author(s):  
Alfonso Quarati

Open Government Data (OGD) have the potential to support social and economic progress. However, this potential can be frustrated if these data remain unused. Although the literature suggests that OGD data sets’ metadata quality is one of the main factors affecting their use, to the best of our knowledge, no quantitative study provided evidence of this relationship. Considering about 400,000 data sets of 28 national, municipal and international OGD portals, we have programmatically analysed their usage, their metadata quality and the relationship between the two. Our analysis has highlighted three main findings. First, regardless of their size, the software platform adopted, and their administrative and territorial coverage, most OGD data sets are underutilised. Second, OGD portals pay varying attention to the quality of their data sets’ metadata. Third, we did not find clear evidence that data sets’ usage is positively correlated to better metadata publishing practices. Finally, we have considered other factors, such as data sets’ category, and some demographic characteristics of the OGD portals, and analysed their relationship with data sets’ usage, obtaining partially affirmative answers.
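As an illustration of one way metadata quality can be scored programmatically, the snippet below computes a simple completeness ratio over a fixed set of expected keys; the key list and record are hypothetical, not the study’s actual quality model:

```python
# Hypothetical completeness score: the fraction of expected metadata
# keys that carry a non-empty value. The key list is illustrative.
EXPECTED = ["title", "description", "license", "modified", "publisher"]

def completeness(metadata: dict) -> float:
    present = sum(1 for k in EXPECTED if metadata.get(k))
    return present / len(EXPECTED)

print(completeness({"title": "Air quality", "license": "CC-BY", "publisher": "City"}))  # 0.6
```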

