Computational Stylistics for Authorship Attribution - based on early Modern Korean Novels

2015 ◽  
Vol null (170) ◽  
pp. 207-240
Author(s):  
김일환 ◽  
이도길
Author(s):  
David L Hoover

Abstract An authorship attribution investigation ideally begins with a well-defined set of possible authors and an adequate number of firmly attributed, roughly contemporaneous long texts in the same genre by those authors. Many significant or intriguing problems, however, suffer from deficiencies or limitations that reduce the effectiveness or validity of some kinds of analysis and make others impossible. These problematic situations can be approached by creating simulations that are designed to overcome or mitigate the difficulties of the problems. The results of the simulations can be used to suggest at least tentative solutions. Here, simulations are used to investigate four difficult problems. The first involves fewer and shorter texts than would be ideal, texts that are also chronologically earlier than the known texts by the target author. The second involves too small a number of well-attributed texts by the authors in question, and initial uncertainty about the genres of the texts, the number of authors involved, and their genders. The third is a tricky case of co-authorship with only relatively vague and uncertain evidence about the nature and extent of each author’s contribution; here simulations with sections of well-attributed texts by the two authors are used to test Rolling Classify. The fourth addresses the sparsity of well-attributed and confidently dated Early Modern plays, using simulations to evaluate Brian Vickers’ rare n-gram approach to the attribution of such plays.


2021 ◽  
Author(s):  
Mária Timári ◽  
Tímea Borbála Bajzát ◽  
Gábor Palkó

In the field of computational stylistics, it is widely assumed that each person’s language use follows a unique pattern, the so-called authorial fingerprint (Baayen et al. 2002). Identifying the authorial fingerprint can serve as the basis for applying quantitative text similarity studies to authorship attribution. The metaphor of the “fingerprint”, however, may give the false impression that this pattern can be read from the author’s texts in an objective way. Modelling the “fingerprint” is in fact a creative digital humanities task: it constructs a pattern from a selection and combination of dozens of linguistic features that can be interpreted statistically, and only in comparison with other authorial texts. Given the size of the corpora and the complexity of the text characteristics and similarity calculations, the method cannot be carried out without computer algorithms. Some existing software tools (e.g. R-Stylo, JGAAP, Websty) offer researchers an apparently accessible way to analyze texts, although they limit the process of searching for patterns; alternatively, the calculations can be implemented in custom program code. In our research we endeavour to find the most efficient measures in authorship attribution for Hungarian texts.
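
As an illustration of what a custom similarity calculation can look like, the following minimal Python sketch implements Burrows's Delta, a commonly used distance measure in computational stylistics. The abstract does not say which measures the authors evaluate, so the choice of Delta is an assumption made here for illustration only, and all function names (`tokenize`, `relative_frequencies`, `burrows_delta`) are hypothetical. The sketch also assumes simple word-level tokenization, which is a strong simplification for a morphologically rich language such as Hungarian.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word in `text`."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in vocabulary]


def burrows_delta(known, disputed, n_words=50):
    """Return {label: Delta distance} between `disputed` and each known text.

    known: dict mapping a label (e.g. an author name) to a text.
    Illustrative sketch only, not the method of the cited study.
    """
    # 1. Feature selection: the most frequent words of the known corpus.
    corpus_counts = Counter()
    for text in known.values():
        corpus_counts.update(tokenize(text))
    vocabulary = [word for word, _ in corpus_counts.most_common(n_words)]

    # 2. Relative-frequency profile for every known text and the disputed one.
    profiles = {label: relative_frequencies(text, vocabulary)
                for label, text in known.items()}
    target = relative_frequencies(disputed, vocabulary)

    # 3. Per-word mean and (population) standard deviation over known texts.
    columns = list(zip(*profiles.values()))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
            for col, m in zip(columns, means)]

    # Words with zero variance carry no discriminating information.
    usable = [i for i, s in enumerate(stds) if s > 0]
    if not usable:
        raise ValueError("known texts do not differ on any selected word")

    def z_scores(profile):
        return [(profile[i] - means[i]) / stds[i] for i in usable]

    # 4. Delta: mean absolute difference of z-scores.
    target_z = z_scores(target)
    return {
        label: sum(abs(a - b) for a, b in zip(target_z, z_scores(profile)))
        / len(usable)
        for label, profile in profiles.items()
    }
```

A smaller Delta value means that the disputed text's word-frequency profile is closer to that of the candidate. This reflects the point made in the abstract that such a pattern is only interpretable in comparison with other texts: the z-scores depend entirely on which known texts are included.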

