String Comparison in XSLT with tan:diff()
Classical models of string comparison have been difficult to implement in XSLT, in part because those models are designed for imperative, stateful programming. In this article I introduce tan:diff(), an XSLT function built upon a different approach to string comparison, one more conducive to a declarative, stateless language. tan:diff() is efficient and fast, even on pairs of very long strings (100K to 1M characters), in part because of its staggered-sample approach and in part because of its strategies for optimizing the comparison of enormous strings (> 1M characters). Its results are normally optimal: in most cases the function returns a minimal diff (shortest edit script). As an open-source function, tan:diff() enables developers to incorporate robust text comparison directly into XML applications.