Towards the Development of a Test Corpus of Digital Objects for the Evaluation of File Format Identification Tools and Signatures

2012
Vol 7 (1)
pp. 16-26
Author(s):
Andrew Fetherston
Tim Gollins

The digital preservation community currently utilises a number of tools and automated processes to identify and validate digital objects. The identification of digital objects is a vital first step in their long-term preservation, but the results returned by tools used for this purpose are lacking in transparency, and are not easily tested or verified. This paper suggests that a test corpus of digital objects is one way of providing this verification and validation, ultimately improving trust in the tools, and providing further stimulus to their development. Issues to be considered are outlined, and attention is drawn to particular examples of existing digital corpora which could conceivably provide a useable framework or starting point for our own community's needs. This paper does not seek to answer all questions in this area, but merely attempts to set out areas for consideration in any next step that is taken.

Author(s):  
Sheila Morrissey
John Meyer
Sushil Bhattarai
Sachin Kurdikar
Jie Ling
...  

In the problem space of long-term preservation of digital objects, the disciplined use of XML affords a reasonable solution to many of the issues associated with ensuring the interpretability and renderability of at least some digital artifacts. This paper describes the experience of Portico, a digital preservation service that preserves scholarly literature in electronic form. It describes some of the challenges and practices entailed in processing and producing XML for the archive, including issues of syntax, semantics, linking, versioning, and prospective issues of scale, variety of formats, and the larger infrastructure of tools and practices required for the use of XML for the long haul.


2018
Vol 35 (2)
pp. 8-12
Author(s):  
Sneha Tripathi

Purpose: This study investigates the concepts of digital preservation and traditional preservation and discusses various issues related to long-term preservation in a digital environment. Design/methodology/approach: The study looks into various aspects of preservation, especially in the context of digital objects (born-digital or digitized). Bundling of a digital object, digital storage, quality control and risk preparedness are among the aspects studied to give an overall picture of long-term preservation of an object. Findings: Various methods are suggested for dealing with the issues related to long-term preservation of an object; these can be used to frame an organization’s policy for long-term preservation. Originality/value: The study emphasizes collective measures incorporating traditional and digital means to ensure long-term preservation. It provides checklists for dealing with various issues pertinent to long-term digital preservation.


Author(s):  
Gareth Kay
Libor Coufal
Mark Pearson

This article introduces the National Library of Australia’s Digital Preservation Knowledge Base which helps the Library to manage digital objects from its collections over the long term. The Knowledge Base includes information on file formats, rendering software, operating systems, hardware and, most importantly, the relationships between them. Most of the work on the Knowledge Base over the last few years has been focused on the mapping of functional relationships between file formats, their versions and software applications. The information is gathered through unique empirical research and is initially being recorded in a multiple-worksheet Excel file in a semi-structured format, though development of a prototype graph database is underway.


2009
Vol 4 (2)
pp. 4-7
Author(s):  
Andrew Wilson

Long-term preservation of digital objects requires curators to be able to guarantee the archival authenticity of the objects in a digital repository. In IJDC 4(1), Ronald Jantz suggested that a digital certificate is sufficient to ensure authenticity. The letter writer takes issue with this view and points out some of the archival misinterpretations in the Jantz article. He maintains that the archival literature is a rich source for discussions of authenticity that has been ignored to the detriment of Jantz’s arguments.


2018
pp. 218-233
Author(s):  
Mayank Yuvaraj

When planning an institutional repository, digital library collection or digital preservation service, it is essential to draft file format policies in order to ensure long-term digital preservation, accessibility and compatibility. Sincere efforts have been made to encourage the adoption of standard formats, yet digital preservation policies vary from library to library. Against this background, the present paper aims to give the digital preservation community a common understanding of the file formats commonly used in digital libraries and institutional repositories. The paper discusses both open and proprietary file formats for several media.


2013
pp. 74-86
Author(s):  
David Giaretta

To preserve digitally encoded information over a long term following the OAIS Reference Model requires that the information remains accessible, understandable and usable by a specified Designated Community. These are significant challenges for repositories. It will be argued that infrastructure which is needed to support this preservation must be seen in the context of the broader science data infrastructure which international and national funders seek to put in place. Moreover aspects of the preservation components of this infrastructure must themselves be preservable, resulting in a recursive system which must also be highly adaptable, loosely coupled and asynchronous. Even more difficult is to be able to judge whether any proposal is actually likely to be effective. From the earliest discussions of concerns about the preservability of digital objects there have been calls for some way of judging the quality of digital repositories. In this chapter several interrelated efforts which contribute to solutions for these issues will be outlined. Evidence about the challenges which must be overcome and the consistency of demands across nations, disciplines and organisations will be presented, based on extensive surveys which have been carried out by the PARSE.Insight project (http://www.parse-insight.eu). The key points about the revision of the OAIS Reference Model which is underway will be provided; OAIS provides many of the key concepts which underpin the efforts to judge solutions. In the past few years the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC) document has been produced, as well as a number of related checklists. These efforts provide the background of the international effort (the RAC Working Group http://wiki.digitalrepositoryauditandcertification.org) to produce a full ISO standard on which an accreditation and certification process can be built. 
If successful, this standard and its associated processes will allow funders to have an independent evaluation of the effectiveness of the archives they support, and data producers to have a basis for deciding which repository to entrust with their valuable data. It could shape the digital preservation market. The CASPAR project (http://www.casparpreserves.eu) is a part-EU-funded project with a total spend of 16 million euros which is trying to faithfully implement almost all aspects of the OAIS Reference Model, in particular the Information Model. The latter involves tools for capturing all types of Representation Information (Structure, Semantics and all Other types), and tools for defining the Designated Community. This chapter will describe implementations of tools and infrastructure components to support repositories in their task of long-term preservation of digital resources, including the capture and preservation of digital rights management and evidence of authenticity associated with digital objects. In order to justify their existence, most repositories must also support contemporaneous use of contemporary as well as “historical” resources; the authors will show how the same techniques can support both, and hence link to the fuller science data infrastructure.


2013
Vol 8 (1)
pp. 120-130
Author(s):  
Ross Spencer

To preserve digital information it is vital that the format of that information can be identified in perpetuity. This is the major focus of research within the field of Digital Preservation. The National Archives of the UK called for the Digital Preservation and Digital Curation communities to develop a test corpus of digital objects to help further develop tools to aid this purpose. Following that call, an attempt has been made to develop the suite. This paper initially outlines a methodology to generate a skeleton corpus using simple user-generated digital objects. It then explores the lessons learnt in the generation of a corpus using scripting language techniques from the file format signatures described in The National Archives' PRONOM technical registry. It will also discuss the use of the digital signature for this purpose and the benefits of developing a test corpus using this technique. Finally, this paper will outline a methodology for future research before exploring how the community can best make use of the output of this project and how this project needs to be taken forward to completion.
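
The idea of generating a skeleton file from a format signature can be illustrated with a minimal Python sketch. It is not the method used in the paper: it assumes a simplified signature made of a fixed beginning-of-file byte sequence and an optional end-of-file sequence, written as hex strings, and it ignores the wildcards and variable offsets that real registry signatures can contain. The function name `build_skeleton` and its parameters are hypothetical.

```python
def build_skeleton(bof_hex: str, eof_hex: str = "",
                   bof_offset: int = 0, padding: int = 16) -> bytes:
    """Build the bytes of a skeleton file from a simplified signature.

    Assumption: the signature is a fixed byte sequence anchored at
    bof_offset from the start of the file, plus an optional fixed
    sequence at the very end. Everything else is zero padding.
    """
    bof = bytes.fromhex(bof_hex)   # whitespace between byte pairs is allowed
    eof = bytes.fromhex(eof_hex)
    return b"\x00" * bof_offset + bof + b"\x00" * padding + eof


# Illustrative signature: the ASCII header "%PDF-1.4" and trailer "%%EOF".
pdf_skeleton = build_skeleton("25 50 44 46 2D 31 2E 34", "25 25 45 4F 46")

# The eight-byte PNG header, with no end-of-file sequence.
png_skeleton = build_skeleton("89 50 4E 47 0D 0A 1A 0A")
```

Files built this way contain only the bytes an identification tool matches on, so they are useful for checking signatures but are not valid, renderable instances of the format.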


2009
Vol 4 (3)
pp. 146-155
Author(s):
Dirk Von Suchodoletz
Jeffrey Van der Hoeven

Emulation used as a long-term preservation strategy offers the potential to keep digital objects in their original condition and experience them within their original computer environment. However, having just an emulator in place is not enough. To apply emulation as a fully fledged strategy, an automated and user-friendly approach is required. This cannot be done without knowledge and contextual information of the original software. This paper combines the existing concept of a view path, which captures the contextual information of software, together with new insights into improving the concept with extra metadata. It provides regularly updated instructions for archival management to preserve and access its artefacts. The view-path model requires extensions to the metadata set of the primary object of interest and depends on additionally stored secondary objects for environment recreation like applications or operating systems. This article also addresses a strategy of rendering digital objects by running emulation processes remotely. The advantage of this strategy is that it improves user convenience while maximizing emulation capability.

