Proceedings of the Symposium on Cultural Heritage Markup
Published by Mulberry Technologies, Inc.
ISBN: 9781935958123

Author(s):
Alexei Lavrentiev, Yann Leydier, Dominique Stutzmann

This paper presents the experience of specifying and implementing an XML format for text-to-image alignment at word and character level within the TEI framework. The format in question is a supplementary markup layer applied to heterogeneous transcriptions of medieval Latin and French manuscripts encoded using different “flavors” of the TEI (normalized for critical editions, diplomatic, or palaeographic transcriptions). One of the problems that had to be solved was identifying “non-alignable” spans in the various kinds of transcription. Originally designed in the framework of a research project on the ontology of letter forms in medieval Latin and vernacular (mostly French) manuscripts and inscriptions, the format can be of use to any project that involves fine-grained alignment of transcriptions with zones on digital images.
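
To make the idea concrete, a minimal TEI-style sketch of such an alignment layer might look as follows; this is a generic illustration, not the project's actual schema, and every identifier, coordinate, and word here is invented.

  <facsimile>
    <surface xml:id="fol1r" ulx="0" uly="0" lrx="2400" lry="3200">
      <graphic url="fol1r.jpg"/>
      <!-- one zone per aligned word; coordinates are pixel positions on the image -->
      <zone xml:id="z_w1" ulx="210" uly="340" lrx="395" lry="392"/>
      <zone xml:id="z_w2" ulx="410" uly="338" lrx="520" lry="390"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <p>
        <!-- @facs points from each transcribed word to its zone on the image -->
        <w facs="#z_w1">Incipit</w>
        <w facs="#z_w2">liber</w>
      </p>
    </body>
  </text>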


Author(s):  
Joshua D. Sosin

IDEs aims to provide core infrastructure for the field of Greek epigraphy (the study of texts carved on stone) by supporting annotation across an array of disparate digital resources. Epigraphy was born in the early to mid 19th century and has been productive ever since. Perhaps a million Latin and Greek inscriptions are known today. These objects are often badly preserved, physically removed from their original context or even lost; many are repeatedly re-published, emended, joined to other fragments, re-dated, re-provenanced, and not only do they lack a single unambiguous identification system, but many thousands are known by multiple, competing and badly controlled bibliographic shorthands. They are unstable in many senses. Print publication of inscriptions in the late 19th century and throughout the 20th is marked by considerable descriptive rigor. In the generation straddling the 20th and 21st centuries, scholars developed a rich variety of digital epigraphy tools. But in all cases these were descendants of previous print resources and entailed significant suppression of the semantic richness that was the (albeit loosely controlled) norm in print publication. In a way, then, much of our effort is devoted to creating a framework for allowing users to re-infuse a suite of late 20th-century tools with the 19th-century scholarly sensibility (and even the very data!) that long informed print epigraphy.


Author(s):
Nathan P. Gibson, Winona Salesky, David A. Michelson

One of the major digital challenges of the Syriaca.org research project has been to encode and visualize personal names of authors in Middle Eastern languages (especially Syriac and Arabic). TEI-XML and HTML are digital standards for the encoding and visualization of cultural heritage data and have features for encoding names and displaying Middle Eastern languages. Because these formats were developed primarily for Western cultural data, however, representing our non-Western data in these formats has required complex adaptation, particularly in regard to marking up name parts, customizing search algorithms, displaying bidirectional text, and displaying Syriac text with embedded fonts. These requirements have led us to develop small-scale systems that may be of use to other cultural heritage preservation projects involving names for ancient and, especially, non-Western entities.
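
As a rough illustration of the name-part markup at issue, a TEI sketch along the following lines is conceivable; the element choices and the sample name forms are assumptions for illustration only and are not taken from the Syriaca.org encoding itself.

  <person xml:id="person-ephrem">
    <!-- parallel name forms, one per language; xml:lang drives script, font choice, and text direction -->
    <persName xml:lang="syr">ܐܦܪܝܡ</persName>
    <persName xml:lang="ar">أفرام</persName>
    <persName xml:lang="en">
      <forename>Ephrem</forename>
      <addName type="epithet">the Syrian</addName>
    </persName>
  </person>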


Author(s):  
Pietro Maria Liuzzo

EAGLE, the Europeana network for Ancient Greek and Latin Epigraphy, a project co-funded by the European Commission, has as its sole aim to harmonize and aggregate data for Europeana, “the trusted source of cultural heritage”. This is very easy to say, but it is no easy task in practice. It is extremely challenging for the aggregated content to earn the trust that users place in the original resources.


Author(s):  
Robert Walpole

This paper describes the context and rationale for developing a new metadata vocabulary for digital records at the UK National Archives as part of the Digital Records Infrastructure project. It describes the specific requirements for metadata in relation to digital records and the evolution of an approach to modelling this metadata that is based on Dublin Core Metadata Initiative (DCMI) Metadata Terms, with RDF/XML as the markup solution. It will demonstrate not only how this solution meets the archival requirements but also how it enables powerful new ways of searching records and linking them to other information sources.
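
For orientation, a minimal RDF/XML sketch using DCMI Metadata Terms for a single digital record might look like the following; the subject URIs, property selection, and values are hypothetical and do not reflect The National Archives' actual vocabulary.

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dcterms="http://purl.org/dc/terms/">
    <!-- one description per digital record; the record URI below is invented -->
    <rdf:Description rdf:about="http://example.org/record/ABC-1-23">
      <dcterms:title>Minutes of the advisory committee</dcterms:title>
      <dcterms:created>2003-06-17</dcterms:created>
      <dcterms:isPartOf rdf:resource="http://example.org/series/ABC-1"/>
      <dcterms:rightsHolder rdf:resource="http://example.org/agent/department-x"/>
    </rdf:Description>
  </rdf:RDF>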


Author(s):  
Raffaele Viglianti

The Shelley-Godwin Archive uses TEI to encode manuscript text from two perspectives: one focused on the document and one focused on the text. This short presentation addresses issues that arise in adopting stand-off markup as a technique for meeting the project's encoding goals.
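
In outline, the stand-off approach might look like the following TEI-flavoured sketch, with a documentary transcription in <sourceDoc> and a separate textual layer that points back into it rather than duplicating the text; the elements, identifiers, and placeholder content are illustrative assumptions, not the Archive's actual encoding.

  <sourceDoc>
    <surface xml:id="page1">
      <zone type="main">
        <!-- placeholder content standing in for the transcribed manuscript lines -->
        <line xml:id="l1">first line as written on the page</line>
        <line xml:id="l2">second line as written on the page</line>
      </zone>
    </surface>
  </sourceDoc>
  <text>
    <body>
      <!-- the reading text is assembled by pointing at document lines, not by copying them -->
      <p><ptr target="#l1"/> <ptr target="#l2"/></p>
    </body>
  </text>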


Author(s):  
Robin La Fontaine

Cultural heritage markup can quickly become complex because of the need to represent multiple, and even overlapping, hierarchical structures. It can therefore become very difficult to maintain correctly. This talk suggests that a better approach is now possible: markup that is designed to represent different aspects of a text could be handled separately for the purposes of checking and maintenance, and only combined into a single document when needed, e.g. for some kind of analysis. Advances in comparison and merge tools for XML make this a possibility.
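
A deliberately simple, schema-less illustration of the idea (the element names and the sample text are invented for this sketch): the same passage is marked up twice, once for its physical pages and once for its verse lines, each layer maintained and validated on its own, and a merged, milestone-based document is generated only when an analysis needs both hierarchies at once.

  <!-- layer 1: physical structure only -->
  <doc>
    <page n="1">Sing, O goddess, the</page>
    <page n="2">anger of Achilles</page>
  </doc>

  <!-- layer 2: verse structure only, over the same text -->
  <doc>
    <line n="1">Sing, O goddess, the anger</line>
    <line n="2">of Achilles</line>
  </doc>

  <!-- merged on demand; milestones sidestep the overlap between line 1 and the page break -->
  <doc>
    <pb n="1"/><lb n="1"/>Sing, O goddess, the <pb n="2"/>anger <lb n="2"/>of Achilles
  </doc>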


Author(s):  
Amir Zeldes

This paper briefly discusses markup, metadata and evaluation issues that arise when projects do not include a critical edition adjudicating between different variants, but instead incorporate multiple, full diplomatic transcriptions. When used naively, such corpora will produce duplicate results that are hard to discern in quantitative studies, and in cases of incomplete, inexact or fragmentary parallel witnesses they substantially complicate the decision about what users actually want to retrieve. Using a case study on Coptic manuscripts, the paper suggests that, as a provisional strategy, documents should be partitioned as finely as necessary so that each section's parallel-witness status is encoded, and that for each parallel set it can be useful to define a redundancy metadatum identifying the 'best' candidate for quantitative study among the available choices.
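
By way of illustration, such sectioning plus a redundancy flag might be encoded roughly as follows; the element names, attributes, and witness labels are invented for this sketch and are not the paper's actual schema.

  <!-- the same passage as preserved in two fragmentary witnesses; the
       'preferred' flag marks which copy quantitative queries should count -->
  <section xml:id="passage-12-A" witness="A" parallelGroup="12" preferred="true">
    <p>text of the passage as preserved in witness A</p>
  </section>
  <section xml:id="passage-12-B" witness="B" parallelGroup="12" preferred="false">
    <p>text of the same passage as preserved in witness B</p>
  </section>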


Author(s):  
Laura Randall

In these times of electronic journal publishing, adopting a continuous publication model is easy: open an issue, publish articles electronically as they flow through the pipeline, close the issue. Even print journals offer this quick access to content, publishing online before issuing the printed publication. The goal is clear: provide access to the information as soon as possible. These models of quick electronic access offer clear benefits to the community, so it's no wonder they are so widely adopted. But such models aren't new to the digital age, and they're not exclusive to electronic publishing. Almost 200 years ago, at least one journal publisher was facing the same struggle of how to get information to their readers quickly. In the editor's words, from January 1828, "We only ask that those printed sheets which lie from one to thirteen weeks in the printing-office...may appear...half-monthly.... To those who startle at innovation, we put forth this plain question:—Can there be any objection, that each packet...of this Journal should go forth to those who wish to have it every fifteen days...?" This publication model, familiar as it is, presents its own set of challenges to our modern system. The journal is being digitized as part of a National Library of Medicine (NLM) and Wellcome Library project to digitize NLM's collection and make it available to the public through PubMed Central (https://www.nlm.nih.gov/news/welcome_library_agreement.html). So our challenge now is this: how do we integrate a 200-year-old publication model into current vocabularies when we've re-invented the same model in a different medium?


Author(s):  
Hugh A. Cayless

Cultural heritage materials are remarkable for their complexity and heterogeneity. This often means that when you’ve solved one problem, you’ve solved one problem. Arrayed against this difficulty, we have a nice big pile of tools and technologies with an alphabet soup of names like XML, TEI, RDF, OAIS, SIP, DIP, XIP, AIP, and BIBFRAME, coupled with a variety of programming languages and storage and publishing systems. All of our papers today address in some way the question of how you deal with messy, complex, human data using the available toolsets, and how those toolsets have to be adapted to cope with our data. How do you avoid having your solution dictated by the tools available? How do you know when you’re doing it right? Our speakers are all trying, in various ways, to reconfigure their tools or push past those tools’ limitations, and they are going to tell us how they’re doing it.

