Data Curation in Practice: Extract Tabular Data from PDF Files Using a Data Analytics Tool

2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Allis J. Choi ◽  
Xuying Xin

Data curation is the process of managing data to make it available for reuse and preservation and to allow FAIR (findable, accessible, interoperable, reusable) uses. It is an important part of the research lifecycle, as researchers are often either required by funders or generally encouraged to preserve their datasets and make them discoverable and reusable. This has been especially important as the Open Access (OA) policy is being implemented in many institutions across the nation. An efficient data repository and its data curation practices play key roles in facilitating research data discovery and reuse. In this article, we briefly discuss the local institutional repository at Penn State University and the general data curation practices we adopt for the deposited files and datasets, then we focus on a data analytics tool that has recently been applied to extract tabular data from PDF files. This is an enhancement to the existing data curation practices, as it adds reusable tabular data to deposits containing PDF files, where tables are often embedded and not easily reused.
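The abstract does not name the data analytics tool, so the following is only a minimal sketch of the general idea, not the authors' workflow. It assumes the third-party pdfplumber library as a stand-in extractor; the helper names (extract_tables, clean_table, table_to_csv) and the sample rows are hypothetical.

```python
import csv
import io


def extract_tables(pdf_path):
    """Return every table found in the PDF as a list of rows (lists of cells)."""
    # Assumption: pdfplumber stands in for the unnamed tool. It is imported
    # lazily so the helpers below can be used without it installed.
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


def clean_table(table):
    """Trim whitespace, turn missing cells into '', and drop fully empty rows."""
    cleaned = []
    for row in table:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def table_to_csv(table):
    """Serialize one cleaned table as CSV text that can be deposited with the PDF."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(clean_table(table))
    return buffer.getvalue()


# Hypothetical rows shaped like extractor output (None marks an empty cell).
sample = [["Site", " Year "], [None, None], ["A", "2021"]]
csv_text = table_to_csv(sample)
```

In a curation setting, each table returned by extract_tables would be reviewed by a curator before the resulting CSV is added to the deposit, since automated extraction from PDFs is rarely exact.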

2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Helenmary Sheridan ◽  
Anthony J. Dellureficio ◽  
Melissa A. Ratajeski ◽  
Sara Mannheimer ◽  
Terrie R. Wheeler

Institutional data repositories are the acknowledged gold standard for data curation platforms in academic libraries. But not every institution can sustain a repository, and not every dataset can be archived due to legal, ethical, or authorial constraints. Data catalogs—metadata-only indices of research data that provide detailed access instructions and conditions for use—are one potential solution, and may be especially suitable for "challenging" datasets. This article presents the strengths of data catalogs for increasing the discoverability and accessibility of research data. The authors argue that data catalogs are a viable alternative or complement to data repositories, and provide examples from their institutions' experiences to show how their data catalogs address specific curatorial requirements. The article also reports on the development of a community of practice for data catalogs and data discovery initiatives.


2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Alexandra Cooper ◽  
Michael Steeleworthy ◽  
Ève Paquette-Bigras ◽  
Erin Clary ◽  
Erin MacPherson ◽  
...  

Purpose: This paper introduces the Portage Network’s Dataverse Curation Guide and the new bilingual curation framework developed to support it. Brief Description: Canadian academic institutions and national organizations have been building infrastructure, staffing, and programming to support research data management. Amidst this work, a notable gap emerged between requirements for data curation in general repositories like Dataverse and the requisite workflows and guidance materials needed by curators to meet them. In response, Portage, a national network of data experts, organized a working group to develop a Dataverse curation guide built upon the Data Curation Network’s CURATED workflow. To create a bilingual resource, the original CURATE(D) acronym was modified to CURATION—which has the same meaning in both French and English—and steps were augmented with Dataverse-specific guidance and mapped to three conceptualized levels of curation to assist curators in prioritizing curation actions. Methods: An environmental scan of relevant deposit and curation guidance materials from Canadian and international institutions identified the need for a comprehensive Dataverse Curation Guide, as most existing resources were either depositor-focused or contained only partial workflows. The resulting Guide synthesized these guidance materials into the CURATION steps and mapped actions to various theoretical levels of data repository services and levels of curation. Resources: The following documents are supplemental to the Dataverse Curation Guide: the Portage Dataverse North Metadata Best Practices Guide, the Scholars Portal Dataverse Guide, and the Data Curation Network CURATED Workflow and Data Curation Primers.


2009 ◽  
Vol 4 (2) ◽  
pp. 12-27 ◽  
Author(s):  
Karen S. Baker ◽  
Lynn Yarmey

Scientific researchers today frequently package measurements and associated metadata as digital datasets in anticipation of storage in data repositories. Through the lens of environmental data stewardship, we consider the data repository as an organizational element central to data curation. One aspect of non-commercial repositories, their distance-from-origin of the data, is explored in terms of near and remote categories. Three idealized repository types are distinguished – local, center, and archive – paralleling research, resource, and reference collection categories respectively. Repository type characteristics such as scope, structure, and goals are discussed. Repository similarities in terms of roles, activities and responsibilities are also examined. Data stewardship is related to care of research data and responsible scientific communication supported by an infrastructure that coordinates curation activities; data curation is defined as a set of repeated and repeatable activities focusing on tending data and creating data products within a particular arena. The concept of “sphere-of-context” is introduced as an aid to distinguishing repository types. Conceptualizing a “web-of-repositories” accommodates a variety of repository types and represents an ecologically inclusive approach to data curation.


2021 ◽  
pp. e1232

Data Soup is a collaboration between the Journal of eScience Librarianship (JeSLIB) and the Data Curation Network to host a series of community-focused webinars and discussions in which data curators exchange practices for curating research data of different formats or subject areas. The lineup of the inaugural webinar includes the following speakers and topics from the recent JeSLIB Special Issue, Data Curation in Practice: “Creating Guidance for Canadian Dataverse Curators: Portage Network’s Dataverse Curation Guide” by Alexandra Cooper, Michael Steeleworthy, Ève Paquette-Bigras, Erin Clary, Erin MacPherson, Louise Gillis, and Jason Brodeur, https://escholarship.umassmed.edu/jeslib/vol10/iss3/2; “Active Curation of Large Longitudinal Surveys: A Case Study” by Inna Kouper, Karen L. Tucker, Kevin Tharp, Mary Ellen van Booven, and Ashley Clark, https://doi.org/10.7191/jeslib.2021.1210; “Data Curation through Catalogs: A Repository-Independent Model for Data Discovery” by Helenmary Sheridan, Anthony J. Dellureficio, Melissa A. Ratajeski, Sara Mannheimer, and Terrie R. Wheeler, https://doi.org/10.7191/jeslib.2021.1203.


Publications ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 25
Author(s):  
Clara Turp ◽  
Lee Wilson ◽  
Julienne Pascoe ◽  
Alex Garnett

The Federated Research Data Repository (FRDR), developed through a partnership between the Canadian Association of Research Libraries’ Portage initiative and the Compute Canada Federation, improves research data discovery in Canada by providing a single search portal for research data stored across Canadian governmental, institutional, and discipline-specific data repositories. While this national discovery layer helps to de-silo Canadian research data, challenges in data discovery remain due to a lack of standardized metadata practices across repositories. In recognition of this challenge, a Portage task group, drawn from a national network of experts, has engaged in a project to map subject keywords to the Online Computer Library Center’s (OCLC) Faceted Application of Subject Terminology (FAST) using the open source OpenRefine software. This paper will describe the task group’s project, discuss the various approaches undertaken by the group, and explore how this work improves data discovery and may be adopted by other repositories and metadata aggregators to support metadata standardization.
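The keyword-mapping idea can be illustrated with a minimal sketch. It does not use OpenRefine or any FAST service, which are what the task group worked with; instead it assumes a tiny hypothetical vocabulary list (illustrative headings, not verified against FAST) and uses approximate string matching from the Python standard library to show how free-text keywords might be reconciled to controlled terms.

```python
from difflib import get_close_matches

# Hypothetical stand-in for a controlled vocabulary; these headings are
# illustrative only and have not been checked against FAST.
VOCABULARY = ["Climatic changes", "Oceanography", "Public health", "Machine learning"]


def normalize(keyword):
    """Lowercase a keyword and collapse runs of whitespace."""
    return " ".join(keyword.lower().split())


def map_keyword(keyword, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest controlled term, or None when nothing is similar enough."""
    lookup = {normalize(term): term for term in vocabulary}
    candidates = get_close_matches(normalize(keyword), list(lookup), n=1, cutoff=cutoff)
    return lookup[candidates[0]] if candidates else None


# Depositor-supplied keywords with stray whitespace, a typo, and an unmapped term.
mapped = {kw: map_keyword(kw) for kw in ["  oceanography ", "Public Helth", "glaciers"]}
```

Keywords that return None are the ones a curator would need to review by hand; the cutoff value is an assumption and would need tuning against real repository metadata.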


Author(s):  
João Rocha da Silva ◽  
Cristina Ribeiro ◽  
João Correia Lopes

This chapter presents a solution for the management of research data at a higher education and research institution. The chapter is based on a small-scale data audit study, which included contacts with researchers and yielded some preliminary requirements and use cases. These requirements led to the design of a data curation workflow involving the researcher, the curator, and a data repository. The authors describe the features of the data repository prototype, which is an extension to the widely used DSpace repository platform and introduces a set of features mentioned by the majority of the interviewed researchers as relevant for a data repository. The data repository platform contributes to the curation workflow at the university, with XML technology at its core: data is stored as XML documents, which can be systematically processed and queried, unlike their original-format counterparts. This system is capable of indexing, querying, and retrieving, in whole or in part, datasets represented in tabular form. There is also the possibility of using elements from domain-specific XML schemas for the cataloguing process, improving the interoperability and quality of the deposited data.
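The idea of storing tabular data as XML so that it can be retrieved in part can be sketched as follows. This is not the DSpace extension described in the chapter; the element names (dataset, row, cell), the helper functions, and the sample values are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def table_to_xml(name, header, rows):
    """Encode a table as XML: one <row> per record, one <cell> per value."""
    dataset = ET.Element("dataset", {"name": name})
    for values in rows:
        row = ET.SubElement(dataset, "row")
        for column, value in zip(header, values):
            cell = ET.SubElement(row, "cell", {"column": column})
            cell.text = str(value)
    return dataset


def select_rows(dataset, column, value):
    """Return the rows (as dicts) whose cell in `column` equals `value`."""
    matches = []
    for row in dataset.findall("row"):
        record = {cell.get("column"): cell.text for cell in row.findall("cell")}
        if record.get(column) == value:
            matches.append(record)
    return matches


# Hypothetical sample table.
header = ["site", "year", "temperature_c"]
rows = [("A", 2008, 11.5), ("B", 2008, 9.8), ("A", 2009, 12.1)]
dataset = table_to_xml("stream_temps", header, rows)

# Partial retrieval: only the records for site A.
site_a = select_rows(dataset, "site", "A")
```

Because every value sits in a labelled element, a subset of a deposited table can be returned without opening the original file format, which is the property the chapter highlights.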


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
SiZhe Xiao ◽  
Tsz Yan Ng ◽  
Tao T. Yang

Purpose: The purpose of this paper is to look at the journey and experience of the University of Hong Kong (HKU) Research Data Management (RDM) practice in responding to the needs of researchers in an academic library. Design/methodology/approach: The research data services (RDS) practice is based on the FAIR data principles, and the authors designed the RDM Stewardship framework to implement the RDS step by step. Findings: The HKU Libraries developed and implemented a set of RDS under a research data stewardship framework in response to the recent evolving research needs for RDM amongst the academic communities. The services cover policy and procedure settings for research data planning, research data infrastructure establishment, data curation services and provision of online resources and instructional guidelines. Originality/value: This study provides an example of how academic libraries can start RDS, including the data policy, data repository, data librarianship and data curation.

