The Electronic Text Corpus of Sumerian Literature
With invaluable help from, and in close co-operation with, colleagues around the world, the Electronic Text Corpus of Sumerian Literature project at the University of Oxford has compiled, lemmatised and made publicly available a large body of Sumerian literature. Building a corpus of literary compositions originally written on clay tablets in the cuneiform script, and dating back nearly four thousand years, poses special challenges, not least with regard to mark-up and automatic processing of the data. This paper discusses some of these challenges, together with issues arising from the fact that Sumerian is a language isolate and lacks resources that are taken for granted when working with other languages, modern or extinct, such as a standardised sign list and a comprehensive dictionary.