The Red Queen in the Repository

One of the grand challenges of curation is to secure metadata quality in the ever-changing environment of metadata standards and file formats. As the Red Queen tells Alice in Through the Looking-Glass: “Now, here, you see, it takes all the running you can do, to keep in the same place.” That is, some “running” is needed to keep metadata records in a research data repository fit for long-term use and in place. Among the main tools for adapting to, and keeping pace with, new standards, formats and versions of standards in this ever-changing environment are validation schemas. Validation schemas are mainly seen as a means of checking data quality and fitness for use, but they are also important for long-term preservation. We might like to think that our present (meta)data standards and formats are made for eternity, but in reality we know that standards evolve, formats change (some even become obsolete with time), and so do our needs for storage, searching and future dissemination for re-use. Eventually, we come to a point where transformation of our archival records and migration to other formats will be necessary. This could also mean that even if the AIPs (Archival Information Packages) stay the same in storage, the DIPs (Dissemination Information Packages) that we want to extract from the archive are subject to changes of format. Further, in order for archival information packages to be self-sustainable, as required by the OAIS model, it is important to take interdependencies between individual files in the information packages into account. This should be done as early as the ingest and validation of the SIPs (Submission Information Packages), and again at each point of necessary transformation or migration (from SIP to AIP, from AIP to DIP, etc.), in order to counter obsolescence.
This paper investigates possible validation errors and missing elements in metadata records from three general-purpose, multidisciplinary research data repositories (Figshare, Harvard’s Dataverse and Zenodo) and explores the potential effects of these errors on future transformation to AIPs and migration to other formats within a digital archive.

GAMS – An infrastructure for the long-term preservation and publication of research data from the Humanities

Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare ◽

10.31263/voebm.v71i1.1992 ◽

2018 ◽

Vol 71 (1) ◽

pp. 207-216 ◽

Cited By ~ 1

Author(s):

Johannes Hubert Stigler ◽

Elisabeth Steiner

Keyword(s):

System Architecture ◽

Academic Research ◽

Research Data ◽

Data Repository ◽

Data Repositories ◽

Domain Specific ◽

Preservation Policy ◽

Long Term Preservation ◽

Data Centres

Research data repositories and data centres are becoming more and more important as infrastructures in academic research. The article introduces the humanities research data repository GAMS, covering its system architecture, preservation policy and content policy. Challenges facing data centres and repositories are outlined, along with general and domain-specific approaches and solutions. Special emphasis lies on the sustainability and long-term perspective of such infrastructures, not only on the technical but above all on the organisational and financial level.

Research data management and services: Resources for different data practitioners

IASSIST Quarterly ◽

10.29173/iq995 ◽

2021 ◽

Vol 45 (3-4) ◽

Author(s):

Gilbert Mushi

Keyword(s):

Data Management ◽

Developed Countries ◽

Management Plan ◽

Research Data ◽

Data Repository ◽

Data Repositories ◽

Research Libraries ◽

Research Data Management ◽

Metadata Standards ◽

Training Resources

The emergence of data-driven research and demands for the establishment of Research Data Management (RDM) have created interest in academic institutions and research organizations globally. Some libraries, especially in developed countries, have started offering RDM services to their communities. Although lagging behind, some academic libraries in developing countries are at the stage of planning or implementing the service. However, the level of RDM awareness is very low among researchers, librarians and other data practitioners. The objective of this paper is to present available open resources for different data practitioners, particularly researchers and librarians. These include training resources for both researchers and librarians, a Data Management Plan (DMP) tool for researchers, and data repositories in which researchers can freely archive and share their research data with the local and international communities. A case study with a survey was conducted at the University of Dodoma to identify relevant RDM services so that librarians could assist researchers in making their data accessible to the local and international community. The study findings revealed a low level of RDM awareness among researchers and librarians. Over 50% of the respondents rated their perceived knowledge as poor in the following RDM knowledge areas: DMPs, data repositories, long-term digital preservation, funders’ RDM mandates, metadata standards for describing data, and general awareness of RDM. Therefore, this paper presents available open resources for different data practitioners to improve RDM knowledge and boost the confidence of academic and research libraries in establishing the service.

Chances and challenges of a long-term data repository in multiple sclerosis: 20th birthday of the German MS registry

Scientific Reports ◽

10.1038/s41598-021-92722-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Lisa-Marie Ohle ◽

David Ellenberger ◽

Peter Flachenecker ◽

Tim Friede ◽

Judith Haas ◽

...

Keyword(s):

Multiple Sclerosis ◽

Sociodemographic Factors ◽

Healthcare Research ◽

Structure Design ◽

Data Repository ◽

Research Projects ◽

Real World Data ◽

Data Repositories ◽

Term Data

Abstract: In 2001, the German Multiple Sclerosis Society, facing a lack of data, founded the German MS Registry (GMSR) as a long-term data repository for MS healthcare research. By the establishment of a network of participating neurological centres of different healthcare sectors across Germany, GMSR provides observational real-world data on long-term disease progression, sociodemographic factors, treatment and the healthcare status of people with MS. This paper aims to illustrate the framework of the GMSR. Structure, design and data quality processes as well as collaborations of the GMSR are presented. The registry’s dataset, status and results are discussed. As of 08 January 2021, 187 centres from different healthcare sectors participate in the GMSR. Following its infrastructure and dataset specification upgrades in 2014, more than 196,000 visits have been recorded relating to more than 33,000 persons with MS (PwMS). The GMSR enables monitoring of PwMS in Germany, supports scientific research projects, and collaborates with national and international MS data repositories and initiatives. With its recent pharmacovigilance extension, it aligns with EMA recommendations and helps to ensure early detection of therapy-related safety signals.

The on-premise data sharing infrastructure e!DAL: Foster FAIR data for faster data acquisition

GigaScience ◽

10.1093/gigascience/giaa107 ◽

2020 ◽

Vol 9 (10) ◽

Cited By ~ 1

Author(s):

Daniel Arend ◽

Patrick König ◽

Astrid Junker ◽

Uwe Scholz ◽

Matthias Lange

Keyword(s):

Data Management ◽

Data Storage ◽

Best Practice ◽

Research Process ◽

Research Data ◽

Primary Data ◽

Data Repository ◽

Plant Genomics ◽

Quality Of Data

Abstract. Background: The FAIR data principle as a commitment to support long-term research data management is widely accepted in the scientific community. Although the ELIXIR Core Data Resources and other established infrastructures provide comprehensive and long-term stable services and platforms for FAIR data management, a large quantity of research data is still hidden or at risk of getting lost. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, e.g., time series of images or high-resolution hyper-spectral data; the quality of data formatting and annotation, e.g., with regard to structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. Results: To share these potentially dark data in a FAIR way and master these challenges, the ELIXIR Germany/de.NBI service Plant Genomic and Phenomics Research Data Repository (PGP) implements a “bring the infrastructure to the data” approach, which allows research data to be kept in place and wrapped in a FAIR-aware software infrastructure. This article presents new features of the e!DAL infrastructure software and the PGP repository as a best practice on how to easily set up FAIR-compliant and intuitive research data services. Furthermore, the integration of the ELIXIR Authentication and Authorization Infrastructure (AAI) and data discovery services are introduced as means to lower technical barriers and to increase the visibility of research data. Conclusion: The e!DAL software has matured into a powerful and FAIR-compliant infrastructure, while keeping the focus on flexible setup and integration into existing infrastructures and into the daily research process.

Connecting Data Publication to the Research Workflow: A Preliminary Analysis

International Journal of Digital Curation ◽

10.2218/ijdc.v12i1.533 ◽

2017 ◽

Vol 12 (1) ◽

pp. 88-105 ◽

Cited By ~ 3

Author(s):

Sünje Dallmeier-Tiessen ◽

Varsha Khodiyar ◽

Fiona Murphy ◽

Amy Nurnberger ◽

Lisa Raymond ◽

...

Keyword(s):

Working Group ◽

Research Data ◽

Data Publishing ◽

Data Repository ◽

Research Activity ◽

Data Repositories ◽

Data Publication ◽

Loosely Coupled ◽

Definition Of ◽

Data Documentation

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process. We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data. 
We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.

A Content Analysis of Indian Research Data Repositories: Prospects and Possibilities

DESIDOC Journal of Library & Information Technology ◽

10.14429/djlit.39.06.15137 ◽

2019 ◽

Vol 39 (06) ◽

pp. 280-289 ◽

Cited By ~ 1

Author(s):

Raj Kumar Bhardwaj

Keyword(s):

Application Programming Interface ◽

Research Data ◽

Identification System ◽

Microsoft Excel ◽

Data Repositories ◽

Data Formats ◽

Author Identification ◽

Metadata Standards ◽

Application Programming ◽

Content Coverage

The study aims to trace the development of Indian research data repositories (RDRs) and explore their content with a view to identifying prospects and possibilities. Further, it analyses the distribution of data repositories on the basis of content coverage, types of content, author identification system followed, software and application programming interface used, subject-wise number of repositories, etc. The study is based on data repositories listed in the registry of data repositories accessible at http://www.re3data.org. The dataset was exported in Microsoft Excel format for analysis. A simple percentage method was followed in the data analysis, and results are presented through tables and figures. The study found a total of 2829 data repositories in existence worldwide. Further, it was seen that 1526 (53.9 %) are open and 924 (32.4 %) are restricted data repositories. Also, there are embargoed data repositories numbering 225 (8.0 %) and closed ones numbering 154 (5.4 %). There are 2829 RDRs covering 72 countries in the world. The study found that out of a total of 45 Indian RDRs, only 30 (67 %) are open, while 12 (27 %) are restricted and 3 (6 %) are closed. The majority of Indian RDRs (20) were developed in the year 2014. The study found that the majority of Indian RDRs (17) are ‘disciplinary’. Further, the study also revealed that statistical data formats are available in a maximum of 31 (68.9 %) Indian RDRs. It was also seen that the majority of Indian RDRs (28) have datasets relating to ‘Life Sciences’. It was identified that only 20% of data repositories use metadata standards; the remaining 80% do not use any standard in metadata entry. This study covered only the research data repositories in India registered in the registry of data repositories. RDRs not listed in the registry of data repositories are left out.

A study of Open Access research data repositories developed by BRICS countries

Digital Library Perspectives ◽

10.1108/dlp-02-2020-0012 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Safat Mushtaq Misgar ◽

Ajra Bhat ◽

Zahid Ashraf Wani

Keyword(s):

Open Access ◽

Emerging Economies ◽

Research Policy ◽

Research Output ◽

Research Data ◽

Data Repository ◽

Research Activity ◽

Data Repositories ◽

Content Type ◽

Subject Coverage

Purpose: In the present era, research data is a concern for researchers, as they are trying to find new ways to communicate their research findings and conclusions to other researchers in order to increase visibility and credibility. BRICS nations are fast-emerging economies and contribute significantly to research output. This study makes an effort to analyze and explore the role of BRICS nations in open access research data repositories registered with the Registry of Research Data Repositories. Design/methodology/approach: The data were gathered from the re3data repository, and the search was limited to BRICS nations. The data were further analyzed and tabulated as per set parameters, namely, country-wise distribution, types of contents, subject coverage and language diversity. Findings: The findings show that India has the highest number of data repositories, thereby ranking first among BRICS nations, and South Africa has the fewest, whereas in terms of content type and subject coverage, India again leads among BRICS nations. The English language is used by repositories as the main language of the interface. Practical implications: The study helps to understand the development of research data repositories by BRICS nations. The study is further beneficial to researchers, as the Registry of Research Data Repositories provides a single platform to access repositories from various disciplines. Readily available data saves researchers time, money and effort and helps them complete their research activity in a very short span of time. Originality/value: The paper has investigated open access data repositories of BRICS nations, which has not been attempted earlier. This gives readers a comprehensive overview of research data repositories developed in fast-emerging economies of the globe.
The paper can be very helpful for information managers, OA promoters and education and research policy makers in devising plans and policy, bearing in mind the evolving research channels in emerging economies.

Mechanisms of gene death in the Red Queen race revealed by the analysis of de novo microRNAs

10.1101/349217 ◽

2018 ◽

Author(s):

Guang-An Lu ◽

Yixin Zhao ◽

Ao Lan ◽

Zhongqi Liufu ◽

Haijun Wen ◽

...

Keyword(s):

De Novo ◽

Mirna Gene ◽

Red Queen ◽

Fitness Effects ◽

New Genes ◽

Knockout Mutants ◽

Laboratory Populations ◽

The Many ◽

The Red Queen

Abstract: The prevalence of de novo coding genes is controversial due to the length and coding constraints. Non-coding genes, especially small ones, are freer to evolve de novo by comparison. The best examples are microRNAs (miRNAs), a large class of regulatory molecules ~22 nt in length. Here, we study 6 de novo miRNAs in Drosophila which, like most new genes, are testis-specific. We ask how and why de novo genes die because gene death must be sufficiently frequent to balance the many new births. By knocking out each miRNA gene, we could analyze their contributions to each of the 9 components of male fitness (sperm production, length, competitiveness etc.). To our surprise, the knockout mutants often perform better in some components, and slightly worse in others, than the wildtype. When two of the younger miRNAs are assayed in long-term laboratory populations, their total fitness contributions are found to be essentially zero. These results collectively suggest that adaptive de novo genes die regularly, not due to the loss of functionality, but due to the canceling-out of positive and negative fitness effects, which may be characterized as “quasi-neutrality”. Since de novo genes often emerge adaptively and become lost later, they reveal ongoing period-specific adaptations, reminiscent of the “Red-Queen” metaphor for long term evolution.

Introduction to the Special JeSLIB Issue on Data Curation in Practice

Journal of eScience Librarianship ◽

10.7191/jeslib.2021.1222 ◽

2021 ◽

Vol 10 (3) ◽

Author(s):

Cynthia Hudson Vitale ◽

Jake R. Carlson ◽

Hannah Hadley ◽

Lisa Johnston

Keyword(s):

Research Integrity ◽

Scientific Communication ◽

Research Data ◽

Data Curation ◽

Research Projects ◽

Academic Institutions ◽

Data Repositories ◽

Communication Processes ◽

Potential Impact

Research data curation is a set of scientific communication processes and activities that support the ethical reuse of research data and uphold research integrity. Data curators act as key collaborators with researchers to enrich the scholarly value and potential impact of their data by preparing it to be shared with others and preserved for the long term. This special issue focuses on practical data curation workflows and tools that have been developed and implemented within data repositories, scholarly societies, research projects, and academic institutions.

Data management planning and repository demands for qualitative research

KWALON ◽

10.5117/2016.021.001.006 ◽

2016 ◽

Vol 21 (1) ◽

Author(s):

René van Horik

Keyword(s):

Data Management ◽

Research Data ◽

Digital Data ◽

Data Repository ◽

Data Sets ◽

Data Repositories ◽

Management Planning ◽

Research Activities ◽

Trusted Data ◽

New Research

Summary: Nowadays, research without a role for digital data and data analysis tools is barely possible. As a result, we see an increasing interest in research data management, as this enables the replication of research outcomes and the reuse of research data for new research activities. Data management planning outlines how to handle data, both during research and after the research is completed. Trusted data repositories are places where research data are archived and made available for the long term. This article covers the state of the art concerning data management and data repository demands with a focus on qualitative data sets.
