Proceedings of the International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML
Latest Publications


Total documents: 8 (five years: 0)
H-index: 3 (five years: 0)

Published By Mulberry Technologies, Inc.

ISBN: 9781935958024

Author(s):  
Georg Rehm ◽  
Oliver Schonefeld ◽  
Thorsten Trippel ◽  
Andreas Witt

Data providers, users, and funders alike want and need sustainability of language resources (e.g., language corpora and grammars); sustainability requires making the resources available according to defined processes, platforms, or archives in a reproducible and reliable way. A three-year project on sustainability of linguistic resources conducted at Tübingen, Hamburg, and Potsdam illuminates some of the difficulties: the prevalence of stand-off markup (requiring a layer of specialized tools atop the XML stack), machine-generated XML of low clarity, ad hoc non-standard tag sets, discoverability, and selection criteria for long-term archiving. XML and other standards are necessary but not sufficient ingredients in the mix.


Author(s):  
Liam R. E. Quin

When documents are stored for any significant length of time, or when they are used, whether continuously or occasionally, over an extended period, the original people and culture and context associated with their creation become unavailable. If the documents are to remain useful, it is necessary to retain sufficient knowledge about how they can be used that the future people involved can still gain value from them. This document is a position paper for discussion.


Author(s):  
Cathy Moran Hajo

Scholarly editions must be used for generations; by nature they require a stable long-term publication format. Some editors have eagerly embraced digital editing and XML, but many more editors remain unconvinced that digital publications can last as long as printed books. Community standards and DTDs for editions have not been widely adopted and editors lack consensus about what a digital edition should be. XML's stability and sustainability are critical to efforts to go beyond “the book,” and to develop new ways of presenting texts and scholarly commentary. To build 21st-century editions, we need tools to make XML encoding easier, to encourage collaboration, to exploit social media, and to separate transcriptions of texts from the editorial scholarship applied to them.


Author(s):  
Andrew Dombrowski ◽  
Quinn Dombrowski

Previous literature characterizing XML semantics (Sperberg-McQueen et al. 2000, Renear et al. 2002, Piez 2002) takes reasonably syntactically and semantically plausible markup and/or schemas as a starting point. In contrast, for this paper we aim to work towards such a schema as an idealized end goal, by characterizing the necessary, if not sufficient, semantic constraints that differentiate a schema intended for archival use from nonsense and implausible schemas, as well as schemas that fail to sufficiently take semantics into account. In addition to the goal of providing a novel approach to the perennially thorny problem of XML semantics, we are particularly concerned with the interaction between the goals of archival purposes and XML semantics.


Author(s):  
Joshua Lubell

Product data can be usefully defined as structured information about objects that are produced by industrial and business processes. In terms of information types, data formats, usage, and lifespan, product data is both complex and diverse, encompassing 3D image modeling information, dimensions, tolerances, and other model annotations, supplementary material such as test analysis, videos, datasets, and human-readable documentation. Although the metadata issues in this problem space present some unique challenges, there are valuable lessons to be learned from the library metadata and packaging standards and how they relate to product metadata. Extending the library standards to represent subsets of information from emerging product lifecycle management standards could help tame the complexity of long-term archival of product data.


Author(s):  
Laine Ruus

Traditional quantitative social science data analysis requires three ingredients: the raw data, metadata (what we used to call a codebook), and software. Software changes all the time, within some limits. Raw data without metadata is useless: it might as well be generated by a random number generator. And metadata without data is like the index to a periodical the last remaining copy of which was sent for recycling last month. Over time, metadata have been expected to support many different functions, and microsolutions have never quite satisfied many, much less all, of those functions. Until recently, that is: a roughly 25-year process of historical evolution has led to DDI, the Data Documentation Initiative, which unites several levels of metadata in one emerging standard.


Author(s):  
Jeff Beck

PubMed Central (PMC) is an XML-based archive of life sciences journal literature at the U.S. National Institutes of Health that allows public access to full-text journal articles. The archive was created in 2000 and has grown steadily to over 2 million records. The project has been successful in part because of the strict XML control and the flexibility that PMC gives its submitters. This paper gives an overview of the PMC data evaluation process; the XML processing model; the PMC philosophy toward XML use, including use of the NLM DTD, XML Tagging Style, usability or reusability of the XML, public XML tools, and our people; and some challenges we continue to face maintaining the archive.


Author(s):  
Sheila Morrissey ◽  
John Meyer ◽  
Sushil Bhattarai ◽  
Sachin Kurdikar ◽  
Jie Ling ◽  
...  

In the problem space of long-term preservation of digital objects, the disciplined use of XML affords a reasonable solution to many of the issues associated with ensuring the interpretability and renderability of at least some digital artifacts. This paper describes the experience of Portico, a digital preservation service that preserves scholarly literature in electronic form. It describes some of the challenges and practices entailed in processing and producing XML for the archive, including issues of syntax, semantics, linking, versioning, and prospective issues of scale, variety of formats, and the larger infrastructure of tools and practices required for the use of XML for the long haul.

