In this paper we consider two sequence tagging tasks for medieval Latin:
part-of-speech tagging and lemmatization. Both are basic yet
foundational preprocessing steps in applications such as text re-use detection.
Nevertheless, they are generally complicated by the considerable orthographic
variation which is typical of medieval Latin. In Digital Classics, these tasks
are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion.
For example, a lexicon is used to generate all the potential lemma-tag pairs
for a token, after which a context-aware PoS tagger selects the most
appropriate lemma-tag pair. Apart from the problems with out-of-lexicon items,
error percolation is a major downside of such approaches. In this paper we
explore the possibility of solving these tasks elegantly with a single,
integrated approach. To this end, we use a layered neural network
architecture from the field of deep representation learning.
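
As a rough illustration of the cascaded, lexicon-dependent setup described
above (not the integrated approach explored in this paper), the following toy
sketch first looks up candidate lemma-tag pairs in a lexicon and then lets a
context-aware step choose among them. The lexicon entries, the transition
scores, and the function names are all invented for illustration; the tag
labels merely follow common conventions. A real tagger would use a trained
model in the second step.

```python
# Toy sketch of a cascaded, lexicon-dependent pipeline.
# All data and names below are hypothetical and for illustration only.

# Step 1 resource: a lexicon mapping word forms to candidate (lemma, tag) pairs.
LEXICON = {
    "da": [("do", "VERB")],
    "mihi": [("ego", "PRON")],
    "librum": [("liber", "NOUN")],
    "rex": [("rex", "NOUN")],
    "regina": [("regina", "NOUN")],
    "cum": [("cum", "ADP"), ("cum", "SCONJ")],
}

# Step 2 resource: invented preference scores for a tag given the previous tag.
TRANSITION = {
    ("NOUN", "ADP"): 0.6,
    ("NOUN", "SCONJ"): 0.4,
}


def candidates(token):
    """Return every (lemma, tag) pair the lexicon lists for this form."""
    return LEXICON.get(token.lower(), [])


def select(prev_tag, cands):
    """Pick the candidate whose tag scores best after the previous tag."""
    return max(cands, key=lambda pair: TRANSITION.get((prev_tag, pair[1]), 0.0))


def tag_sentence(tokens):
    """Run the two steps in sequence and return (token, lemma, tag) triples."""
    output = []
    prev_tag = "<START>"
    for token in tokens:
        cands = candidates(token)
        if not cands:
            # Out-of-lexicon form: the second step has nothing to choose from,
            # and the following token loses its left context as well.
            output.append((token, None, None))
            prev_tag = "<UNK>"
            continue
        lemma, tag = select(prev_tag, cands)
        output.append((token, lemma, tag))
        prev_tag = tag
    return output


if __name__ == "__main__":
    print(tag_sentence(["rex", "cum", "regina"]))
    print(tag_sentence(["da", "michi", "librum"]))
```

In the second sentence, the medieval spelling "michi" (for "mihi") is missing
from the toy lexicon, so it receives no lemma or tag at all, and the next
token is disambiguated without a usable left context. This is the kind of
out-of-lexicon failure and error percolation that the cascaded design is
prone to.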