Automatic Identification of Spelling Variation in Historical Texts
Languages in earlier stages of development differ from their modern counterparts, reflecting syntactic, semantic and morphological changes over time. The study of these and other phenomena is the central concern of historical linguistics. The development of literacy and advances in technology mean that human language has often been preserved in physical form. Although such artefacts increasingly include video and sound recordings, the lifeblood of historical linguistics is still text. The written word is the de facto source of evidence for earlier stages of languages, and "the first-order witnesses to the more distant linguistic past are written texts" (Lass, 1997). This dissertation proposes and evaluates a method for identifying spelling variants in historical documents.