Proceedings of Balisage: The Markup Conference 2008

Published By Mulberry Technologies, Inc.

ISBN: 0982434405 (ISBN-13: 9780982434406)

Author(s):  
Jonathan Robie

XML is widely used for messaging applications. Message-oriented middleware (MOM) is a natural fit for XML messaging, but it has been plagued by a lack of standards: each vendor's system uses its own proprietary protocols, so clients from one system generally cannot communicate with servers from another. Developers who are drawn to XML because it is simple, open, interoperable, language independent, and platform independent often use REST for messaging because it shares the same virtues. But when XML developers need high performance, guaranteed delivery, transactions, security, management, asynchronous notification, or direct support for common messaging paradigms like point-to-point, broadcast, request/response, and publish/subscribe, they have been forced to sacrifice some of the virtues that drew them to XML in the first place. JMS is an API defined only for Java, and it does not define a wire protocol that would allow applications running on different platforms or written in different languages to interoperate. SOAP and Web Services offer interoperability if all parties use the same underlying protocols and the same WS-I profile, but at the cost of more complexity than a MOM system. And as the basic components of enterprise messaging have been added piece by piece to the original specifications, Web Services have become complex, defined in a large number of overlapping specifications, without a coherent and simple architecture. The new Advanced Message Queuing Protocol (AMQP) is an open, language-independent, platform-independent standard for enterprise messaging. It provides precisely the coherent and simple architecture that has been missing for sophisticated messaging applications. Red Hat Enterprise MRG includes a multi-language, multi-platform, open source implementation of AMQP; we develop the messaging component as part of the upstream Apache Qpid project. To meet the needs of XML messaging systems, we contributed the Apache Qpid XML Exchange, which provides XQuery-based routing for XML content and message properties. Together, AMQP, Apache Qpid, and the Qpid XML Exchange provide a solid foundation for mission-critical XML messaging applications.
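
To make the XML Exchange concrete, the sketch below (mine, not from the paper) shows XQuery-based routing using the Apache Qpid Python client (qpid.messaging) and its address-string syntax; the exchange name, binding key, and query are illustrative assumptions, so consult the Qpid documentation for the authoritative details.

    # Sketch: XQuery-based routing with the Qpid XML Exchange, via the
    # Apache Qpid Python client (qpid.messaging). Exchange name, binding
    # key, and query are illustrative assumptions.
    from qpid.messaging import Connection, Message

    conn = Connection("localhost:5672")
    conn.open()
    session = conn.session()

    # Subscribe via an exchange of type "xml"; the xquery in the binding
    # arguments decides which messages reach this receiver.
    receiver = session.receiver("""xml.topic; {
        create: always,
        node: { type: topic, x-declare: { type: xml } },
        link: { x-bindings: [{ exchange: 'xml.topic', key: 'weather',
                 arguments: { xquery: "./weather/station = 'Berlin'" } }] }
    }""")

    sender = session.sender("xml.topic/weather")
    sender.send(Message(content="<weather><station>Berlin</station></weather>"))
    print(receiver.fetch(timeout=5).content)   # only matching XML is routed
    conn.close()

A message whose content fails the query (a different station, say) is simply not delivered to this subscription, which is the point: routing decisions live in the broker, expressed in XQuery rather than in client code.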


Author(s):  
Yu Wu
Qi Zhang
Zhiqiang Yu
Jianhui Li

XML plays a crucial role in web services, databases, and document representation and processing. However, XML processing is widely regarded as a major performance bottleneck, especially for very large XML data. At the same time, multi-core processors are increasingly common in both desktop and server machines. To take full advantage of multiple cores, we present a novel hybrid parallel XML processing model that combines data-parallel and pipeline processing. It first partitions the XML document into chunks and processes them in data-parallel fashion for both parsing and schema validation, then organizes and executes the two phases as a two-stage pipeline to exploit further parallelism. Our experimental results show that the hybrid model delivers a substantial overall performance advantage on multi-core platforms.
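
As a toy illustration of the model's shape (not the authors' implementation), the sketch below parses chunks in a data-parallel worker pool while a second pipeline stage consumes the parsed results for validation; both phases are stubbed, and real chunking must of course respect well-formedness boundaries.

    # Toy sketch: data-parallel parsing of XML chunks (stage 1) feeding a
    # pipelined validation stage (stage 2). Parsing and validation are
    # stubbed; a real system must split chunks on well-formed boundaries.
    from concurrent.futures import ThreadPoolExecutor
    from threading import Thread
    from queue import Queue

    def parse_chunk(chunk):
        return ("tree", chunk)        # stand-in for a real chunk parser

    def validate(tree):
        return True                   # stand-in for schema validation

    def process(chunks):
        results = Queue()
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(parse_chunk, c) for c in chunks]  # stage 1

            def stage2():
                for f in futures:                 # document order preserved
                    results.put(validate(f.result()))

            t = Thread(target=stage2)             # stage 2 runs concurrently
            t.start()
            t.join()
        return [results.get() for _ in range(len(chunks))]

    print(process(["<a/>", "<b/>", "<c/>"]))      # -> [True, True, True]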


Author(s):  
B. Tommie Usdin

True versus Useful, or True versus Likely-to-be-useful, are tradeoffs we find ourselves making all the time in document modeling and many other markup-related situations. But Cool versus Useful is a far more difficult tradeoff, especially since our world now includes a number of very cool techniques, tools, and specifications. Cool toys can have a lot of gravitational pull, attracting attention, users, projects, and funding. Unfortunately, there is sometimes a disconnect between the appeal of a particular tool or technology and its applicability in a particular circumstance.


Author(s):  
C. M. Sperberg-McQueen
Claus Huitfeldt

That the textual phenomena of interest for markup are not always hierarchically arranged is well known and widely discussed. Less frequently discussed is the fact that they are also not always contiguous, so that the units of our analysis cannot always correspond to single elements in the document. Various notations for discontinuous elements exist, but the mapping from those notations to data structures has not been well analysed or understood. And as far as we know, there are no standard mechanisms for validating discontinuous elements. We propose a data structure (a modification of the GODDAG structure) to better handle discontinuous elements: we relax the rule that every pair of elements, one of which contains the other, be related by a path of parent/child links. Parent/child links are then not an automatic result of containment. We conclude with a brief sketch of the issues involved in extending current validation mechanisms to handle discontinuity.
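
To picture the relaxation, here is a toy structure of my own (not the authors' formalism): containment is recorded as a character range, parent/child links are recorded separately, and the two need not coincide for discontinuous elements.

    # Toy illustration of the relaxed rule: containment (tracked as a
    # character range) is separate from parent/child links, so a node may
    # contain another by range without being linked to it as a parent.
    class Node:
        def __init__(self, label, start, end):
            self.label = label
            self.start, self.end = start, end   # character range covered
            self.children = []                  # explicit parent/child links

        def contains(self, other):
            return self.start <= other.start and other.end <= self.end

    # A quotation interrupted by a stage direction: q1 and q2 are the two
    # parts of one discontinuous <q> element inside a speech.
    speech = Node("sp", 0, 50)
    q1, stage, q2 = Node("q", 0, 20), Node("stage", 20, 30), Node("q", 30, 50)
    speech.children = [q1, stage, q2]

    # speech contains stage AND is linked to it as a parent:
    assert speech.contains(stage) and stage in speech.children
    # a virtual <q> spanning 0-50 contains stage by range, but we do NOT
    # add a parent/child link: containment no longer implies one.
    virtual_q = Node("q", 0, 50)
    virtual_q.children = [q1, q2]
    assert virtual_q.contains(stage) and stage not in virtual_q.children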


Author(s):  
Tomasz Müldner
Robin McNeill
Jan Krzysztof Miziołek

The popularity of social networks is growing rapidly, and secure publishing is an important implementation technique for these networks. At the same time, recent implementations of access control policies (ACPs) for sharing fragments of XML documents have moved from distributing numerous sanitized sub-documents to users toward disseminating a single document multi-encrypted with multiple cryptographic keys, in such a way that the stated ACPs are enforced. Any application that uses this implementation of ACPs incurs the high cost of generating keys separately for each document. However, most such applications, including secure publishing, use similar documents, i.e. documents based on a selected schema. This paper describes role-based access control defined at the schema level (SRBAC) and the generation of the minimum number of keys at the schema level. The main advantage of our approach is that for any application using a fixed number of schemas, keys can be generated (or even pre-generated) only once and then reused for all documents valid for the given schema. While key generation at the schema level has to be pessimistic in general, our approach tries to minimize the number of generated keys. Incoming XML documents are efficiently encrypted using single-pass SAX parsing in such a way that the original structure of these documents is completely hidden. We also describe distributing to each user only the keys needed for decrypting accessible nodes, and applying the minimal number of encryption operations to an XML document required to satisfy the protection requirements of the policy.
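
As a rough illustration of the key-reuse idea (my own sketch, not the paper's SRBAC algorithm), one key can be generated per distinct role-set permitted to read a schema node type, and that key is then reused for matching nodes in every document valid for the schema; the policy, node types, and roles below are invented for the example.

    # Rough sketch: one key per distinct role-set allowed to read a node
    # type at the schema level; all documents valid for the schema reuse
    # the keys. Illustrative only, not the paper's key-minimization.
    from cryptography.fernet import Fernet

    # Assumed schema-level policy: node type -> roles allowed to read it.
    policy = {
        "patient/name":      frozenset({"doctor", "nurse"}),
        "patient/diagnosis": frozenset({"doctor"}),
    }

    # Generate one key per distinct role-set, once per schema.
    keys = {rs: Fernet.generate_key() for rs in set(policy.values())}

    def encrypt_node(node_type, text):
        return Fernet(keys[policy[node_type]]).encrypt(text.encode())

    def keys_for_user(roles):
        # Each user receives only the keys for role-sets they belong to.
        return {rs: k for rs, k in keys.items() if roles & rs}

    token = encrypt_node("patient/diagnosis", "<diagnosis>flu</diagnosis>")
    assert policy["patient/diagnosis"] not in keys_for_user({"nurse"})
    doctor = keys_for_user({"doctor"})    # holds keys for both node types
    print(Fernet(doctor[policy["patient/diagnosis"]]).decrypt(token))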


Author(s):  
Jerome McDonough

The past decade has seen the rise of both the XML standard and a variety of XML-based structural metadata schemas within the digital library community. Both XML itself and the metadata schemas developed by the digital library community can be considered sociotechnical artifacts: constructions that bear within them their designers' worldview of how people should appropriate and use their technology. If we examine the metadata schemas produced by the digital library community, we find that the designers' inscription strongly favors local control over encoding practice rather than ensuring interoperability between institutions. If the goal of digital library interoperability is to be realized, schema designers will need to acknowledge the tension between local control and external connection using markup languages, and adjust their standards development efforts accordingly.


Author(s):  
Jon Bosak
Mavis Cournane
Patrick Durusau
James David Mason
David Orchard
...

Markup standards and projects are created, managed, and sometimes destroyed through group process. While this process is often a bit bumpy, there are occasions when it goes spectacularly badly. Tales of these committee disasters can be not only entertaining but also (and more importantly) informative. Panelists will spend a maximum of 10 minutes each describing a committee or working-group disaster of some sort, including what went wrong, how it could have been prevented, and how it could have been (or how it was) resolved. Participants may anonymize their tales of woe, provided they assure us that the events they describe actually occurred and that they were actually involved.


Author(s):  
D. Matthew Kelleher
Albert J. Klein
James David Mason

When a product cannot be tested as a finished unit, its warranty, as it were, depends on extensive testing of its component parts and assemblies. For many years, the record for products of the Y-12 National Security Complex has taken the form of lengthy paper documents. Recently we have begun a process to capture some of this information in XML documents. However, this is not simply another XML publishing project. Because our products have a potentially very long shelf life, and we cannot foresee the computing environment of their distant future, we must take extraordinary measures to document not only the products themselves but also the environment in which the documentation has been prepared. Adding complexity to this documentation challenge is a parallel effort to capture the output data of test equipment and wrap it in XML. While this project is very much a work in progress, we can see that one major component of its possible success will be the coordination of complex metadata.


Author(s):  
Christo Dichev
Darina Dicheva
Boriana Ditcheva
Mike Moran

This paper addresses the issue of sharing and integrating data across RDF and Topic Maps representations. The novel aspect of our approach to the RDF/Topic Maps interoperability problem is the attempt to identify the right balance among three key aspects: (i) semantics-preserving data translation; (ii) completeness of the translation; and (iii) pragmatics and usability of the translation. The proposed strategy for achieving this goal is based on exploiting the ontological correspondence between RDF and Topic Maps. The design focus is on a translation that respects both the meaning and the readability of the result. The paper analyzes the feasibility of the interoperability task, presents some requirements derived from this analysis, and proposes a method for RDF-to-Topic Maps translation. The proposed method is implemented as a plug-in for the TM4L topic maps editor.
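
To give a flavor of the ontological correspondence (a toy mapping of my own, not TM4L's), RDF statements with literal objects map naturally to Topic Maps occurrences, while statements with resource objects map to binary associations typed by the predicate; plain dictionaries stand in for a real Topic Maps engine.

    # Toy RDF -> Topic Maps mapping: literal-valued statements become
    # occurrences, resource-valued statements become binary associations.
    # Uses rdflib for the RDF side; the Topic Maps side is simulated.
    from rdflib import Graph, Literal, URIRef

    g = Graph()
    ex = "http://example.org/"
    g.add((URIRef(ex + "Mozart"), URIRef(ex + "born"), Literal("1756")))
    g.add((URIRef(ex + "Mozart"), URIRef(ex + "composed"), URIRef(ex + "Requiem")))

    topics, occurrences, associations = set(), [], []
    for s, p, o in g:
        topics.add(str(s))
        if isinstance(o, Literal):
            # (topic, occurrence type, value)
            occurrences.append((str(s), str(p), str(o)))
        else:
            topics.add(str(o))
            # binary association typed by the predicate
            associations.append((str(p), str(s), str(o)))

    print(occurrences)    # [('...Mozart', '...born', '1756')]
    print(associations)   # [('...composed', '...Mozart', '...Requiem')]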


Author(s):  
Hugh A. Cayless

This paper presents the results of ongoing experimentation with linking manuscript images to TEI transcriptions. The method being tested involves the automated conversion of images containing text to SVG, using open source tools. Once the text has been converted to SVG paths, these can be grouped in the document to mark the words therein, and the groups can then be linked using standard methods to tokenized versions of the transcriptions. The goal of these experiments is to achieve a much more fine-grained linking and annotation mechanism than is so far possible with available tools, e.g. the Image Markup Tool and TEI P5 facsimile markup, both of which annotate only rectangular sections of an image. The method envisioned here produces a legible tracing of each word, expressed in XML, to which transcripts and annotations can be attached and which can be superimposed on the original image.
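
A minimal sketch of the linking step under my own assumptions (potrace for the bitmap-to-SVG tracing, lxml for grouping; file names, the token id, and which paths form a word are all invented for the example): trace the page image to SVG, wrap the paths belonging to one word in a group, and point that group at the corresponding token in the TEI transcription.

    # Sketch: trace a page image to SVG with potrace (assumed on PATH),
    # then wrap the paths of one word in a <g> that points at a token in
    # the TEI transcription. Deciding which traced paths belong to which
    # word is stubbed here; that grouping is the hard part.
    import subprocess
    from lxml import etree

    subprocess.run(["potrace", "--svg", "page.pbm", "-o", "page.svg"],
                   check=True)

    SVG = "{http://www.w3.org/2000/svg}"
    XLINK = "{http://www.w3.org/1999/xlink}"
    svg = etree.parse("page.svg").getroot()
    paths = svg.findall(".//" + SVG + "path")

    # Assume the first two traced paths form the first word (stub).
    word = etree.SubElement(svg, SVG + "g",
                            {XLINK + "href": "transcript.xml#w1"})
    for p in paths[:2]:
        word.append(p)    # moves the path into the word group

    etree.ElementTree(svg).write("linked.svg", xml_declaration=True,
                                 encoding="UTF-8")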

