An analysis of file format control in institutional repositories
Purpose – The purpose of this paper is to analyze the file formats of the digital objects stored in two of the largest open-access repositories in Spain, DDUB and TDX, and determines the implications of these formats for long-term preservation, focussing in particular on the different versions of PDF. Design/methodology/approach – To be able to study the two repositories, the authors harvested all the files corresponding to every digital object and some of their associated metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and Open Archives Initiative Object Reuse and Exchange (OAI-ORE) protocols. The file formats were analyzed with DROID software and some additional tools. Findings – The results show that there is no alignment between the preservation policies declared by institutions, the technical tools available, and the actual stored files. Originality/value – The results show that file controls currently applied to institutional repositories do not suffice to grant their stated mission of long-term preservation of scientific literature.