As is typical of many society publishers, OSA–The Optical Society has both a journal program and a conference program. Integrating journal articles and conference papers within a single data source makes it possible to conduct business intelligence analysis over the entire corpus of published research material, which can benefit both programs and advance the society’s mission.
In 2017, having successfully completed a project to convert almost 100 years of its journal legacy material to JATS XML, OSA decided to convert its conference content as well, to tag it in a JATS-compatible way, and to combine both content segments in a single MarkLogic database. While it is widely accepted that JATS and BITS cover the markup needs for journal and book content, respectively, it is less clear which Tag Set is most suitable for tagging conference proceedings.
Even though we thought “we had seen it all” in converting journal content, in the course of the project we learned that conference metadata and journal metadata present very different challenges. In this paper, we share our experience with using BITS to mark up individual conference papers and describe how our business decisions shaped the structure of the XML. We demonstrate that because BITS was explicitly designed to enable the construction of books composed of units that could be part of many collections, the BITS metadata model is well suited to representing the nested collections, both event- and publication-related, to which a conference paper belongs. To ensure data quality, we built workflows, designed XML tools (e.g., Tag Subset, Schematron), and instituted visual QC procedures, which together allowed us to achieve our objective. We conclude with lessons learned from this project and the new opportunities that its successful completion has opened up.