Examining Attributes of Open Standard File Formats for Long-term Preservation and Open Access

2012, Vol 31 (4), pp. 46
Author(s): Eun G Park, Sam Oh

This study examines the attributes that have been used to assess file formats in the literature and compiles the most frequently used of them in order to establish selection criteria for open standard file formats. A comprehensive literature review was undertaken to identify current knowledge regarding file format selection criteria. The findings indicate that the most common criteria can be grouped into five major categories: functionality, metadata, openness, interoperability and independence. These attributes appear to be closely related. Additional attributes include presentation, authenticity, adoption, protection, preservation, reference and others.

2018, ahead-of-print
Author(s): Roland Erwin Suri, Mohamed El-Saad

Purpose: Changes in file format specifications challenge the long-term preservation of digital documents. Digital archives therefore often focus on specific file formats that are well suited to long-term preservation, such as PDF/A. Because few customers submit PDF/A files, digital archives may consider converting submitted files to PDF/A. The paper aims to discuss these issues.
Design/methodology/approach: The authors evaluated three software tools for batch conversion of common file formats to PDF/A-1b: LuraTech PDF Compressor, Adobe Acrobat XI Pro and 3-Heights™ Document Converter by PDF Tools. The test set consisted of 80 files, 10 each of eight file types: JPEG, MS PowerPoint, PDF, PNG, MS Word, MS Excel, MSG and “web page”.
Findings: Batch processing was sometimes halted by stops that required manual intervention; depending on the tool, three to four such stops occurred while processing the 80 test files. Furthermore, the tools sometimes failed to produce output even for supported file formats: between three (Adobe Pro) and seven (LuraTech and 3-Heights™) PDF/A-1b files were not produced. Since Adobe Pro does not convert e-mails, a total of 213 PDF/A-1b files were produced across the three tools. The faithfulness of each conversion was assessed by comparing the on-screen appearance of each input document with that of the resulting PDF/A-1b document. Meticulous visual inspection revealed that conversion to PDF/A-1b impaired the information content of 24 of the 213 converted files (11 percent). These reproducibility errors included loss of links, loss of other document content (unreadable characters, missing text, missing document parts), updated fields (reflecting the time and folder of conversion), vector graphics issues and spelling errors.
Originality/value: These results indicate that large-scale batch conversion of heterogeneous files to PDF/A-1b causes complex issues that must be addressed file by file. Even with considerable effort, some information loss seems unavoidable when large numbers of files from heterogeneous sources are migrated to PDF/A-1b.
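The reported totals can be cross-checked with a few lines of arithmetic. This is a minimal sketch under one reading of the abstract. It assumes that each tool received all 80 files and that Adobe Pro skipped the 10 MSG e-mails. It also assumes the reported failures are per tool: 3 for Adobe Pro and 7 each for LuraTech and 3-Heights™. Under that reading, the counts reproduce both the 213 produced files and the 11 percent impairment rate. The variable names are illustrative only.

```python
# Cross-check of the counts reported in the abstract (one plausible reading).
FILES_PER_TYPE = 10
FILE_TYPES = 8  # JPEG, PowerPoint, PDF, PNG, Word, Excel, MSG, web page
TEST_SET = FILES_PER_TYPE * FILE_TYPES  # 80 files per tool

# Conversions that produced no output file, per tool, as reported.
failures = {"LuraTech": 7, "Adobe Pro": 3, "3-Heights": 7}

# Assumption: Adobe Pro never attempts the 10 MSG e-mails.
attempted = {tool: TEST_SET for tool in failures}
attempted["Adobe Pro"] -= FILES_PER_TYPE

produced = {tool: attempted[tool] - failures[tool] for tool in failures}
total_produced = sum(produced.values())

impaired = 24  # files whose information content was impaired
impaired_share = impaired / total_produced

print(produced)                 # {'LuraTech': 73, 'Adobe Pro': 67, '3-Heights': 73}
print(total_produced)           # 213
print(f"{impaired_share:.1%}")  # 11.3%
```

That the numbers line up exactly (73 + 67 + 73 = 213) suggests this is how the 213 total was reached. The abstract itself does not spell this out.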


2018, pp. 218-233
Author(s): Mayank Yuvaraj

When planning an institutional repository, a digital library collection or a digital preservation service, drafting file format policies is essential to ensure long-term digital preservation, accessibility and compatibility. Although considerable efforts have been made to encourage the adoption of standard formats, digital preservation policies still vary from library to library. Against this background, this paper aims to give the digital preservation community a common understanding of the file formats commonly used in digital libraries and institutional repositories. It discusses both open and proprietary file formats for several media types.
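Enforcing a file format policy usually begins with identifying what format a submitted file actually is, rather than trusting its extension. Below is a minimal sketch that matches a file's leading "magic bytes" against a small, hand-picked set of well-known signatures. The signature list and the function names (`identify`, `identify_file`) are illustrative assumptions and are not taken from the paper. A real repository would consult a full signature registry rather than this subset.

```python
from pathlib import Path

# Illustrative subset of well-known file signatures ("magic bytes").
# Not exhaustive: a production system would use a full signature registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container (e.g. OOXML or ODF documents)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy Office, MSG)"),
]


def identify(header: bytes) -> str:
    """Return a coarse format label based on the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: Path) -> str:
    """Read enough bytes to cover the longest signature above (8 bytes)."""
    with path.open("rb") as fh:
        return identify(fh.read(8))


if __name__ == "__main__":
    print(identify(b"%PDF-1.7\n"))  # PDF
    print(identify(b"GIF89a"))      # unknown (not in this subset)
```

Container signatures such as ZIP and OLE2 only narrow the format down to a family. Telling a `.docx` from an `.xlsx`, for example, requires looking inside the container.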


1999, Vol 71 (8), pp. 1549-1556
Author(s): Peter Lampen, Jörg Lambert, R. J. Lancashire, R. S. McDonald, P. S. McIntyre, ...

Version 5.00 of the JCAMP-DX specification was published for NMR and mass spectrometry file formats in Appl. Spectrosc. 47, 1093-1099 (1993) and Appl. Spectrosc. 48, 1545-1552 (1994). Since the publication of these protocols, developments in spectroscopy have led to a large number of requests for additions covering applications that were not originally addressed. After careful consideration, it has become apparent that a few minor modifications will significantly increase the range of possible applications. In addition, new data labels have been introduced to ensure that files are year-2000 compliant and to allow conformity with good laboratory practice (GLP). These modifications are detailed in this publication, together with examples of the official NTUPLE JCAMP-DX definition as applied to NMR data.
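JCAMP-DX files are plain text built from labelled data records of the form `##LABEL=value`. This structure lets a revision add new data labels without changing the overall file layout. Below is a minimal sketch of reading those records from a single simple block. It is not a conforming parser: it ignores NTUPLES structures, nested blocks and compressed (ASDF) data. Its label normalization reflects our reading of the specification's label rules and should be checked against the standard. These rules treat labels as case-insensitive and ignore spaces, hyphens, slashes and underscores. The function names are hypothetical.

```python
import re


def normalize_label(label: str) -> str:
    # Assumption: labels are compared case-insensitively, ignoring spaces,
    # hyphens, slashes and underscores (so "DATA TYPE" == "DATATYPE").
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_ldrs(text: str) -> dict:
    """Collect ##LABEL=value records from one simple JCAMP-DX block."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()  # drop $$ comments
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            current = normalize_label(label)
            records[current] = value.strip()
        elif current is not None and line:
            # Continuation line (e.g. a row of tabular data) extends the
            # value of the most recent record.
            prev = records[current]
            records[current] = f"{prev}\n{line}" if prev else line
    return records
```

For example, a header line `##DATA TYPE=NMR SPECTRUM` yields the key `DATATYPE`. The data rows following `##XYDATA=(X++(Y..Y))` are kept as one multi-line value for a later decoding step.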


2019
Author(s): Niels Hulstaert, Timo Sachsenberg, Mathias Walzer, Harald Barsnes, Lennart Martens, ...

The field of computational proteomics is approaching the big data age, driven both by continuous growth in the number of samples analysed per experiment and by the growing amount of data obtained in each analytical run. To process these large amounts of data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures. Unfortunately, the vast majority of cross-platform proteomics tools cannot operate directly on the proprietary formats generated by the diverse range of mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability, and to improve integration with popular workflow systems such as Galaxy or Nextflow, we have also packaged ThermoRawFileParser for Conda and BioContainers. In addition, we implemented a user-friendly graphical interface (ThermoRawFileParserGUI) for users not familiar with command-line tools. Finally, we benchmarked ThermoRawFileParser against msconvert to verify that the converted mzML files yield reliable quantitative results.
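MGF, one of the open output formats named above, is a simple text format. Each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=value` parameter lines (such as `TITLE`, `PEPMASS` or `CHARGE`) followed by lines of m/z and intensity values. Below is a minimal reader sketch that assumes well-formed input. It is not ThermoRawFileParser's code. The comment characters it skips are an assumption based on common MGF conventions, and a production parser would handle more edge cases.

```python
def parse_mgf(text: str) -> list:
    """Parse MGF text into spectra, each a dict of params and (m/z, intensity) peaks."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with # ; ! or /).
        if not line or line[0] in "#;!/":
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global parameters outside a block are ignored here
        elif "=" in line and line[0].isalpha():
            # Split on the first "=" only, so values may themselves contain "=".
            key, _, value = line.partition("=")
            current["params"][key.upper()] = value
        else:
            fields = line.split()
            mz = float(fields[0])
            intensity = float(fields[1]) if len(fields) > 1 else 0.0
            current["peaks"].append((mz, intensity))
    return spectra
```

Splitting parameter lines on the first `=` only keeps titles such as `TITLE=scan=1` intact. Peak lines are recognized simply as lines that do not start with a letter.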

