Proceedings of Balisage: The Markup Conference 2019

As is typical for many society publishers, OSA (The Optical Society) has both a journal program and a conference program. Integrating journal articles and conference papers within a single data source opens a pathway to business intelligence analysis over the entire corpus of published research material, which can benefit both programs and advance the society’s mission. In 2017, having successfully completed a project to convert almost 100 years of its journal legacy material to JATS XML, OSA decided to convert its conference content as well, tag it in a JATS-compatible way, and combine both content segments in a single MarkLogic database. While it is well accepted that JATS and BITS cover the markup needs for journal and book content, respectively, it is less clear which Tag Set is most suitable for tagging conference proceedings. Even though we thought “we had seen it all” in converting journal content, in the course of the project we learned that handling conference metadata and journal metadata presents very different challenges. In this paper, we share our experience with using BITS for marking up individual conference papers and describe how our business decisions shaped the structure of the XML. We demonstrate that because BITS was explicitly designed to enable the construction of books composed of units that could be part of many collections, the BITS metadata model is well suited to representing conference papers’ nested collections, both event-related and publication-related. To ensure data quality, we have built workflows, designed XML tools (e.g., Tag Subset, Schematron), and instituted visual QC procedures that allowed us to achieve our objective. We conclude our paper with lessons learned from this project and the new opportunities its successful completion has opened up.

Do we really want to see markup?

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.mason01 ◽

2019 ◽

Cited By ~ 1

Author(s):

James David Mason

Markup fanatics have long cried, “We need to see the markup!” Yet since the earliest stages of developing the SGML standard, there has been an urge even among standards developers to avoid having to write tags everywhere. The recent urge to create “Invisible XML” is but the latest symptom of a smoldering disease, from which I too suffer.

Beyond OmniMark

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.wilmott01 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sam Wilmott

Keyword(s):

Programming Languages ◽

Programming Language ◽

Markup Language ◽

Application Area

The field of programming languages is in continual flux: new languages come along every few years. In the field of text and markup processing languages, things have rather settled down, with XSLT dominating and a few other languages like OmniMark filling in the gaps, but this field is no more exempt from change than any other application area. Whether change is always improvement is not a certainty, but we must always be striving for improvement, so one hopes that some improvement is possible in our text and markup language field. This paper starts with an overview of some existing text and markup processing languages, and concludes with an outline and examples of a new programming language that I hope will make text and markup processing easier than is now the case, or at least provide thoughts as to which directions things can go.

XProc 3.0

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.walsh02 ◽

2019 ◽

Cited By ~ 1

Author(s):

Norman Walsh ◽

Achim Berndzen

Keyword(s):

Control Structures ◽

Modern Control

XProc 3.0 is an XML pipeline language for constructing markup-centric workflows. With a rich vocabulary of steps and modern control structures, it allows the author to easily build complex pipelines.

Accessibility: Not Just a Good Idea

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.perera01 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chandi Perera

Keyword(s):

Developing Countries ◽

Visual Impairment ◽

Learning Disability ◽

Positive Impact ◽

Developed Countries ◽

Permanent Disability ◽

The World ◽

Legal Responsibilities ◽

Content Accessibility ◽

Global Population

Around 15% of the global population has a permanent disability, including approximately 285 million people with a visual impairment and an estimated 700 million people with dyslexia, the most common form of learning disability. The World Blind Union estimates that less than 10% of published works are made into accessible formats in developed countries, a figure that drops to less than 1% in developing countries. As markup professionals and content model experts, there is a lot we can do to make more content accessible. This session will look at accessibility; our social, ethical, and legal responsibilities around content accessibility; and what we can do to make content more accessible.

We Created Document Dysfunction

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.paoli01 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jean Paoli

Keyword(s):

Information Management ◽

Text Messages ◽

Small Data ◽

Human Beings ◽

Business Information ◽

Management Problems ◽

The Future

Some of us building software need to take a hard look in the mirror. For years, we have promised that technology would solve the world’s information management problems, but 85% of business information is still “dark data,” with potentially useful insights lost in a rising tide of disconnected documents, emails, Slack conversations, voice-to-text messages, etc. We need an effective approach to documents and want to start a public conversation about these issues. We believe that effective solutions should be based on: Declarative Markup; AI sympathetic to “Small Data”; a focus on company-specific documents; applying AI to documents as a whole; and solutions that do not disrupt existing workflows or require massive investment. The future is not about AI making human beings obsolete; the future is about AI making human beings and companies more productive, effective, and creative.

SCAP Composer

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.lubell01 ◽

2019 ◽

Cited By ~ 2

Author(s):

Joshua Lubell

Keyword(s):

Data Stream ◽

Extensible Markup Language ◽

Markup Language ◽

Software Application ◽

Incremental Approach ◽

Source Data ◽

Extensible Markup ◽

Element Type

The Security Content Automation Protocol (SCAP) schema for source data stream collections standardizes the requirements for packaging Extensible Markup Language (XML) security content into bundles for easy deployment. SCAP bundles must be self-contained, such that each bundle contains all necessary information without external references, and reversible, such that XML components are unmodified when unbundled and re-bundled into new collections. These requirements (along with the need for very long, globally unique identifiers) make authoring and bundling the content a challenge. SCAP Composer, a software application that uses a Darwin Information Typing Architecture (DITA) specialized element type for source data stream collections, makes the authoring process easier. SCAP Composer takes an incremental approach to aiding SCAP content authors: it helps only with creating source data stream collections; it does not offer any help with creating the XML resources encapsulated in a data stream collection. SCAP Composer is implemented using the DITA Open Toolkit and can be used with any DITA authoring software that includes the Toolkit, or with a standalone Toolkit.

Rules for the Rulemakers

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.beck01 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jeffrey Beck

Keyword(s):

Working Group ◽

Best Practice ◽

Guidance Document ◽

Practice Recommendations ◽

The Way

Maximal flexibility of rules, or ease of reuse — choose one. The tighter the rules, the more consistent documents will be and the easier it will be to reuse them, but only if the rules are reasonable enough to be adopted. (If all the data creators ignore the rules, reuse doesn’t get easier.) JATS4R (JATS for Reuse) is a NISO working group devoted to optimizing the reusability of scholarly content by developing best-practice recommendations for tagging content in JATS XML. The group has devoted particular attention to the flexibility/reuse tradeoff for rules on attribute use and controlled values, and we eventually decided that we needed some rules for ourselves, on how to write rules for attributes in our recommendations. In the process of developing our guidance document for writing rules for attribute values in our recommendations, we learned (or at least articulated) some things along the way.

Aparecium

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.sperberg-mcqueen01 ◽

2019 ◽

Cited By ~ 2

Author(s):

C. M. Sperberg-McQueen

Keyword(s):

Xml Data ◽

Technical Issues

Aparecium is an XQuery / XSLT library for reading non-XML data as XML, under the control of an “invisible XML” grammar describing the structure of the input. The use of the library is illustrated with an application, and some technical issues in the development of the library are discussed.

Thinking, wishing, saying

Proceedings of Balisage: The Markup Conference 2019 ◽

10.4242/balisagevol23.sperberg-mcqueen02 ◽

2019 ◽

Author(s):

C. M. Sperberg-McQueen

Can we have rules for our documents we cannot write down in a schema language? If a conformance requirement is not mechanically checkable, is it a conformance requirement? If a rule is not testable, is it a rule?

Proceedings of Balisage: The Markup Conference 2019
Latest Publications

Published By Mulberry Technologies, Inc.

Using BITS for conference paper conversion

Do we really want to see markup?

Beyond OmniMark

XProc 3.0

Accessibility: Not Just a Good Idea

We Created Document Dysfunction

SCAP Composer

Rules for the Rulemakers

Aparecium

Thinking, wishing, saying
