Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework

Author(s):  
Hagen Peukert

Handling heterogeneous data at minimal cost can be perceived as a classic management problem. The approach at hand applies established managerial theorizing to the field of data curation. It is argued, however, that data curation cannot merely be treated as a standard case of applying management theory in a traditional sense. Rather, the practice of curating humanities research data and the specifications and adjustments of the model suggested here reveal an intertwined process, in which knowledge of both strategic management and solid information technology has to be considered. Thus, two contributions are put forward: suggestions on the strategic positioning of research data, which can be used as an analytical tool to understand the proposed workflow mechanisms, and a definition of workflow modules, which can be flexibly combined when designing new standard workflows to configure research data repositories.
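As a hedged illustration of what such configurable workflow modules might look like in practice, the following Python sketch composes small, reusable steps into one "standard" ingest workflow. The module names, record fields, and identifier scheme are hypothetical and are not taken from the paper.

from typing import Callable, Dict, List

# Each workflow module takes a submission record and returns an updated one.
WorkflowStep = Callable[[Dict], Dict]

def validate_metadata(record: Dict) -> Dict:
    # Minimal check: the record must name a title and a creator.
    record["valid"] = bool(record.get("title")) and bool(record.get("creator"))
    return record

def normalize_file_names(record: Dict) -> Dict:
    # Normalize file names so downstream modules see a uniform format.
    record["files"] = [f.lower() for f in record.get("files", [])]
    return record

def assign_identifier(record: Dict) -> Dict:
    # Placeholder identifier; a real repository would mint a handle or DOI.
    record["identifier"] = f"hdl:0000/{abs(hash(record.get('title', ''))) % 10000}"
    return record

def run_workflow(record: Dict, steps: List[WorkflowStep]) -> Dict:
    for step in steps:
        record = step(record)
    return record

# A "standard workflow" is simply one ordering of reusable modules;
# a repository can be reconfigured by recombining them.
standard_ingest = [validate_metadata, normalize_file_names, assign_identifier]
print(run_workflow({"title": "Corpus A", "creator": "Doe", "files": ["Text.XML"]},
                   standard_ingest))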

2017 ◽  
Vol 12 (1) ◽  
pp. 88-105 ◽  
Author(s):  
Sünje Dallmeier-Tiessen ◽  
Varsha Khodiyar ◽  
Fiona Murphy ◽  
Amy Nurnberger ◽  
Lisa Raymond ◽  
...  

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process. We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data. We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.
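To make the idea of providing "robust metadata earlier" concrete, here is a minimal Python sketch of an upstream completeness check that a repository or publishing platform could run before ingest. The required fields loosely follow DataCite-style mandatory properties and are illustrative only; they are not prescribed by the working group.

# Required fields are illustrative; real workflows would use the schema of the
# chosen repository or publisher.
REQUIRED_FIELDS = ["identifier", "creators", "title", "publisher",
                   "publication_year", "resource_type"]

def fairness_gaps(metadata: dict) -> list:
    """Return the required fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]

# Hypothetical record documented 'upstream', close to data collection.
record = {
    "title": "Ocean temperature profiles 2015-2016",
    "creators": ["Raymond, L."],
    "publisher": "Example Data Repository",
}
print("Still missing before ingest:", fairness_gaps(record))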


2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Cynthia Hudson Vitale ◽  
Jake R. Carlson ◽  
Hannah Hadley ◽  
Lisa Johnston

Research data curation is a set of scientific communication processes and activities that support the ethical reuse of research data and uphold research integrity. Data curators act as key collaborators with researchers to enrich the scholarly value and potential impact of their data by preparing it to be shared with others and preserved for the long term. This special issue focuses on practical data curation workflows and tools that have been developed and implemented within data repositories, scholarly societies, research projects, and academic institutions.


2020 ◽  
Vol 15 (1) ◽  
pp. 14
Author(s):  
Cynthia Hudson-Vitale ◽  
Hannah Hadley ◽  
Jennifer Moore ◽  
Lisa Johnston ◽  
Wendy Kozlowski ◽  
...  

Niche and proprietary data formats used in cutting-edge research and technology have specific curation considerations and challenges. The increased demand for subject liaisons, library archivists, and digital curators to curate the variety of data types created locally at an institution or organization poses difficulties. Subject liaisons possess discipline knowledge and expertise for a given domain or discipline, and digital curation experts know how to properly steward data assets in general. Yet a gap often exists between the expertise available within the organization and local curation needs. While many institutions and organizations have expertise in certain domains and areas, the heterogeneous data types received for deposit often extend beyond this expertise. Additionally, evolving research methods and new, cutting-edge technology used in research often result in unfamiliar and niche data formats received for deposit. Knowing how to get started in curating these file types and formats can be a particular challenge. To address this need, the data curation community has been developing a new set of tools: data curation primers. These primers are evolving documents that detail a specific subject, disciplinary area, or curation task, and that can be used as a reference or jump-start for curating research data. This paper will provide background on the data curation primers and their content, detail the process of their development, highlight the data curation primers published to date, emphasize how curators can incorporate these resources into workflows, and show curators how they can get involved and share their own expertise.
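One hedged sketch of how primers could be worked into a deposit workflow: route each incoming file format to the relevant primer, falling back to a prompt to contribute a new one. The extension-to-primer mapping below is purely illustrative and does not reproduce the actual primer catalog.

from pathlib import Path

# Hypothetical mapping from file extensions to primer topics.
PRIMER_INDEX = {
    ".nc": "netCDF primer",
    ".mat": "MATLAB primer",
    ".shp": "GIS shapefile primer",
}

def suggest_primers(filenames):
    """Point the curator at a primer for each deposited file, if one exists."""
    return {name: PRIMER_INDEX.get(Path(name).suffix.lower(),
                                   "no primer yet - consider contributing one")
            for name in filenames}

print(suggest_primers(["survey.mat", "coastline.shp", "notes.xyz"]))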


2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Helenmary Sheridan ◽  
Anthony J. Dellureficio ◽  
Melissa A. Ratajeski ◽  
Sara Mannheimer ◽  
Terrie R. Wheeler

Institutional data repositories are the acknowledged gold standard for data curation platforms in academic libraries. But not every institution can sustain a repository, and not every dataset can be archived due to legal, ethical, or authorial constraints. Data catalogs—metadata-only indices of research data that provide detailed access instructions and conditions for use—are one potential solution, and may be especially suitable for "challenging" datasets. This article presents the strengths of data catalogs for increasing the discoverability and accessibility of research data. The authors argue that data catalogs are a viable alternative or complement to data repositories, and provide examples from their institutions' experiences to show how their data catalogs address specific curatorial requirements. The article also reports on the development of a community of practice for data catalogs and data discovery initiatives.
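A minimal sketch of what a metadata-only catalog record might contain, assuming illustrative field names rather than any particular catalog's schema: discovery metadata plus access instructions and conditions of use, with no data files held by the catalog itself.

from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogRecord:
    # A metadata-only entry: no data files are stored, only discovery metadata
    # plus instructions and conditions for obtaining access.
    title: str
    creators: List[str]
    description: str
    access_instructions: str   # e.g., whom to contact, application process
    conditions_of_use: str     # e.g., ethics approval, data use agreement
    keywords: List[str] = field(default_factory=list)

record = CatalogRecord(
    title="Clinical imaging cohort (restricted)",
    creators=["Doe, J."],
    description="MRI scans with linked outcomes; files remain with the originating lab.",
    access_instructions="Request access through the lab's data access committee (hypothetical).",
    conditions_of_use="Signed data use agreement and ethics approval required.",
)
print(record.title, "->", record.access_instructions)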


2009 ◽  
Vol 4 (2) ◽  
pp. 12-27 ◽  
Author(s):  
Karen S. Baker ◽  
Lynn Yarmey

Scientific researchers today frequently package measurements and associated metadata as digital datasets in anticipation of storage in data repositories. Through the lens of environmental data stewardship, we consider the data repository as an organizational element central to data curation. One aspect of non-commercial repositories, their distance from the origin of the data, is explored in terms of near and remote categories. Three idealized repository types are distinguished (local, center, and archive), paralleling research, resource, and reference collection categories, respectively. Repository type characteristics such as scope, structure, and goals are discussed. Repository similarities in terms of roles, activities, and responsibilities are also examined. Data stewardship is related to the care of research data and responsible scientific communication supported by an infrastructure that coordinates curation activities; data curation is defined as a set of repeated and repeatable activities focusing on tending data and creating data products within a particular arena. The concept of "sphere-of-context" is introduced as an aid to distinguishing repository types. Conceptualizing a "web-of-repositories" accommodates a variety of repository types and represents an ecologically inclusive approach to data curation.
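As a small illustrative aid (not taken from the paper), the taxonomy can be encoded as a lookup pairing each repository type with the collection category it parallels; the distance-from-origin values extrapolate the near/remote distinction and are assumptions.

REPOSITORY_TYPES = {
    "local": {"parallels": "research collection", "distance_from_origin": "near"},
    "center": {"parallels": "resource collection", "distance_from_origin": "intermediate"},
    "archive": {"parallels": "reference collection", "distance_from_origin": "remote"},
}

for repo_type, traits in REPOSITORY_TYPES.items():
    print(f"{repo_type}: parallels a {traits['parallels']} "
          f"({traits['distance_from_origin']} to the data's origin)")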


2016 ◽  
Vol 10 (2) ◽  
pp. 176-192 ◽  
Author(s):  
Line Pouchard

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in an unprecedented way. In parallel, research data repositories have been built to host research data in response to sponsors' requirements that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate, and preserve the data produced in the course of publicly funded research. As librarians and data managers develop the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on data management and curation. We use the methodological framework of the data life cycle model, assess two models developed in the context of Big Data projects, and find them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity. We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high-performance computing centers, and reproducibility in computational science. We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
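The mapping described in the last two sentences can be sketched directly: cross each of the four characteristics with each life cycle activity to generate the question set. The example question wording below is illustrative, not the paper's exact text.

CHARACTERISTICS = ["volume", "variety", "velocity", "veracity"]
ACTIVITIES = ["planning", "acquiring", "preparing", "analyzing",
              "preserving", "discovering"]

def lifecycle_questions():
    """Cross each Big Data characteristic with each life cycle activity."""
    questions = {}
    for activity in ACTIVITIES:
        for characteristic in CHARACTERISTICS:
            questions[(activity, characteristic)] = (
                f"During {activity}, how does data {characteristic} affect "
                f"description and quality assurance?"
            )
    return questions

questions = lifecycle_questions()
print(len(questions), "questions, e.g.:", questions[("preserving", "volume")])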


Animals ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 1235
Author(s):  
Theresa Tschoner

Evaluating and assessing the level of pain that calves are experiencing is important, as the experience of pain (e.g., due to routine husbandry procedures) severely affects the welfare of calves. Studies about the recognition of pain in calves, and especially about pain management during and after common procedures such as castration, dehorning, and disbudding, have been published. This narrative review discusses and summarizes the existing literature about methods for pain assessment in calves. First, it deals with the definition of pain and the challenges associated with the recognition of pain in calves. It then outlines the different options and methods for subjective and objective pain assessment in calves, as described in the literature. Research data show that there are several tools suitable for the assessment of pain in calves, at least for research purposes. Finally, it concludes that for research purposes various variables for the assessment of pain in calves are used in combination; however, no single variable can be used exclusively for the assessment of pain in calves. Further research is also needed to describe biomarkers or variables that are easily accessible in field practice.


GigaScience ◽  
2021 ◽  
Vol 10 (2) ◽  
Author(s):  
Guilhem Sempéré ◽  
Adrien Pétel ◽  
Magsen Abbé ◽  
Pierre Lefeuvre ◽  
Philippe Roumagnac ◽  
...  

Background: Efficiently managing large, heterogeneous data in a structured yet flexible way is a challenge for research laboratories working with genomic data. Specifically regarding both shotgun- and metabarcoding-based metagenomics, while online reference databases and user-friendly tools exist for running various types of analyses (e.g., Qiime, Mothur, Megan, IMG/VR, Anvi'o, Qiita, MetaVir), scientists lack comprehensive software for easily building scalable, searchable, online data repositories on which they can rely during their ongoing research. Results: metaXplor is a scalable, distributable, fully web-interfaced application for managing, sharing, and exploring metagenomic data. Being based on a flexible NoSQL data model, it has few constraints regarding dataset contents and thus proves useful for handling outputs from both shotgun and metabarcoding techniques. By supporting incremental data feeding and providing means to combine filters on all imported fields, it allows for exhaustive content browsing as well as rapid narrowing to find specific records. The application also features various interactive data visualization tools, ways to query contents by BLASTing external sequences, and an integrated pipeline to enrich assignments with phylogenetic placements. The project home page provides the URL of a live instance allowing users to test the system on public data. Conclusion: metaXplor allows efficient management and exploration of metagenomic data. Its availability as a set of Docker containers, making it easy to deploy on academic servers, in the cloud, or even on personal computers, will facilitate its adoption.
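The following toy Python sketch is not metaXplor's actual API; it only illustrates the underlying idea of combining filters over schema-flexible (NoSQL-style) records, where imported fields may differ from one dataset to another. The sample records and field names are invented for the example.

# Schema-flexible records: fields may differ between datasets, as in a NoSQL store.
records = [
    {"sample": "S1", "taxon": "Begomovirus", "method": "metabarcoding", "reads": 1200},
    {"sample": "S2", "taxon": "Mastrevirus", "method": "shotgun", "reads": 45000,
     "host": "maize"},
    {"sample": "S3", "taxon": "Begomovirus", "method": "shotgun", "reads": 300},
]

def combine_filters(records, **criteria):
    """Keep only the records that match every field=value criterion given."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

print(combine_filters(records, taxon="Begomovirus", method="shotgun"))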


2021 ◽  
pp. 016555152199863
Author(s):  
Ismael Vázquez ◽  
María Novo-Lourés ◽  
Reyes Pavón ◽  
Rosalía Laza ◽  
José Ramón Méndez ◽  
...  

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure that the results can be reproduced and compared with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study has analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following demanding functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities, and (4) providing protection mechanisms for licencing issues and user rights. To demonstrate the introduced functionality, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at https://rdata.4spam.group to facilitate understanding of this study.
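A hedged sketch (not STRep's implementation) of the idea behind points (2) and (3): record the pre-processing pipeline alongside the derived dataset so that the same steps can be re-run and compared outside the repository. The step names, sample text, and export format are hypothetical.

import json

# Two toy pre-processing steps; a real repository would ship richer ones.
def strip_headers(text):
    return text.split("\n\n", 1)[-1]

def lowercase(text):
    return text.lower()

STEPS = {"strip_headers": strip_headers, "lowercase": lowercase}

def apply_pipeline(text, step_names):
    for name in step_names:
        text = STEPS[name](text)
    return text

pipeline = ["strip_headers", "lowercase"]
raw = "Subject: WIN NOW\n\nClaim Your PRIZE today!"
print(apply_pipeline(raw, pipeline))
# The pipeline description can be exported with the derived dataset so the
# same steps are reproducible without the repository's own tooling.
print(json.dumps({"dataset": "spam-corpus-v1", "preprocessing": pipeline}))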


2009 ◽  
Vol 27 (24) ◽  
pp. 4014-4020 ◽  
Author(s):  
Elizabeth Goss ◽  
Michael P. Link ◽  
Suanna S. Bruinooge ◽  
Theodore S. Lawrence ◽  
Joel E. Tepper ◽  
...  

Purpose: The American Society of Clinical Oncology (ASCO) Cancer Research Committee designed a qualitative research project to assess the attitudes of cancer researchers and compliance officials regarding compliance with the US Privacy Rule and to identify potential strategies for eliminating perceived or real barriers to achieving compliance. Methods: A team of three interviewers asked 27 individuals (13 investigators and 14 compliance officials) from 13 institutions to describe the anticipated approach of their institutions to Privacy Rule compliance in three hypothetical research studies. Results: The interviews revealed that although researchers and compliance officials share the view that patients' cancer diagnoses should enjoy a high level of privacy protection, there are significant tensions between the two groups related to the proper standards for compliance necessary to protect patients. The disagreements are seen most clearly with regard to the appropriate definition of a "future research use" of protected health information in biospecimen and data repositories and the standards for a waiver of authorization for disclosure and use of such data. Conclusion: ASCO believes that disagreements related to compliance and the resulting delays in certain projects and abandonment of others might be eased by additional institutional training programs and consultation on Privacy Rule issues during study design. ASCO also proposes the development of best practices documents to guide 1) creation of data repositories, 2) disclosure and use of data from such repositories, and 3) the design of survivorship and genetics studies.

