Simulations and difficult problems
Abstract
An authorship attribution investigation ideally begins with a well-defined set of possible authors and an adequate number of firmly attributed, roughly contemporaneous, long texts in the same genre by those authors. Many significant or intriguing problems, however, suffer from deficiencies or limitations that reduce the effectiveness or validity of some kinds of analysis and make others impossible. These problematic situations can be approached by creating simulations designed to overcome or mitigate the difficulties. The results of the simulations can then be used to suggest at least tentative solutions. Here, simulations are used to investigate four difficult problems. The first involves fewer and shorter texts than would be ideal, texts that are also chronologically earlier than the known texts by the target author. The second involves too few well-attributed texts by the authors in question, along with initial uncertainty about the genres of the texts, the number of authors involved, and their genders. The third is a tricky case of co-authorship with only relatively vague and uncertain evidence about the nature and extent of each author’s contribution; here simulations with sections of well-attributed texts by the two authors are used to test Rolling Classify. The fourth addresses the sparsity of well-attributed and confidently dated Early Modern plays, using simulations to evaluate Brian Vickers’ rare n-gram approach to the attribution of such plays.