Computational stylistics and authorship attribution: what it measures and why it works
This thesis concerns computational methods for measuring authorial style and algorithms for authorship attribution.

The first aim was to separate, in a quantifiable way, different layers of authorial style (here, the lexical and the grammatical layers) in order to estimate their influence on the results of a chosen method of authorship attribution. To this end I compared the distance known as Burrows's Delta between a pair of English novels by two chosen authors and automatically generated texts whose statistical distributions of parts of speech were borrowed from one author and whose vocabulary was borrowed from the other; additionally, in the artificial texts I retained the first author's words wherever they belonged to a particular part of speech. This procedure made it possible to create a hybrid text that was attributed to the first author even though the majority of its lexical items came from the second.

The second aim was to identify the influence of the style and language of the original on the style of the translation. This part of the research involved, among other things, adapting Polish and English part-of-speech tag sets to form a common translatorial tag set. Besides making a few simple observations about the distributions and co-occurrences of parts of speech in the two languages, I determined some features of the selected translatorial corpus that lie on the fringes of what appears to be the norm for Polish.

The third aim was to test the accuracy of state-of-the-art unsupervised clustering methods for automatically grouping texts by author.
The results show that these methods recognise authorship less accurately than known supervised machine learning methods.

In the thesis I used corpora totalling around 550 digitised English-language novels and 100 Polish ones, as well as a parallel corpus of 39 novels by a single English author together with their translations by a single Polish translator. The research drew on existing part-of-speech taggers (for both English and Polish), authorship attribution programmes, and programmes for graph clustering.
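
To illustrate the distance measure used in the first part, the following is a minimal Python sketch of Burrows's Delta in its commonly described form: the relative frequencies of the most frequent words are standardised (z-scored) across the corpus, and Delta is the mean absolute difference between the z-scores of two texts. It is not the attribution programme used in the thesis. The function names, the toy corpus, the use of the population standard deviation, and the choice to skip words whose frequency does not vary are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def burrows_delta(corpus, text_a, text_b, n_words=100):
    """Mean absolute difference of z-scored word frequencies of two texts.

    corpus maps a text name to its list of tokens (hypothetical layout).
    """
    freqs = {name: relative_frequencies(tokens) for name, tokens in corpus.items()}

    # Pick the n_words most frequent words over the whole corpus.
    pooled = Counter()
    for tokens in corpus.values():
        pooled.update(tokens)
    vocabulary = [word for word, _ in pooled.most_common(n_words)]

    differences = []
    for word in vocabulary:
        values = [freqs[name].get(word, 0.0) for name in corpus]
        sigma = pstdev(values)  # assumption: population standard deviation
        if sigma == 0:
            continue  # assumption: drop words with identical frequency everywhere
        mu = mean(values)
        z_a = (freqs[text_a].get(word, 0.0) - mu) / sigma
        z_b = (freqs[text_b].get(word, 0.0) - mu) / sigma
        differences.append(abs(z_a - z_b))

    return mean(differences) if differences else 0.0


# Toy corpus, invented for illustration only.
corpus = {
    "a1": "the cat sat on the mat and the dog sat too".split(),
    "a2": "the cat sat on the rug and the cat sat still".split(),
    "b1": "a dog ran to a park and a dog ran home".split(),
}

same_style = burrows_delta(corpus, "a1", "a2")
different_style = burrows_delta(corpus, "a1", "b1")
```

In this toy corpus the two texts that share most of their frequent words end up closer to each other than to the third text. With real novels, the corpus would hold many texts and `n_words` would typically be set to a few hundred of the most frequent words.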