Merging Multi-Version Texts: a Generic Solution to the Overlap Problem
Multi-Version Documents (MVDs), as described in Schmidt and Colomb (Schm09), provide a simple format for representing overlapping structures in digital text. They permit the reuse of existing technologies, such as XML, to encode the content of individual versions, while allowing overlapping hierarchies (separate, partial or conditional) and textual variation (insertions, deletions, alternatives and transpositions) to exist within the same document. Most desired operations on MVDs may be performed by simple algorithms in linear time. Creating and editing MVDs, however, is much harder, and resembles the multiple-sequence alignment problem in biology. Including the transposition operation in the alignment process makes this a hard problem, with no known solution that is both optimal and practical. A suitable heuristic algorithm, based in part on the most recent biological alignment programs, can nevertheless be devised; its time complexity is quadratic in the worst case, and it is often much faster. The results are satisfactory in terms of both speed and alignment quality. MVDs can therefore be considered a practical and editable format suitable for representing many cases of overlapping structure in digital text.