Numerals in authorial Turkish-language texts and the stylometric analysis

2021 ◽  
Vol 270 ◽  
pp. 01038
Author(s):  
Andrei Zenkov ◽  
Eugene Zenkov ◽  
Miroslav Zenkov ◽  
Larisa Sazanova

Two approaches to the statistical analysis of texts are suggested, both based on the study of the occurrence of numerals in coherent texts. The first approach studies the frequency distribution of the leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 strongly dominates, and the incidence of subsequent digits usually decreases monotonically. The frequencies of occurrence of the digit 1 and, to a lesser extent, of the digits 2 and 3 are usually a characteristic feature of an author’s style, manifested in all (sufficiently long) texts of that author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited for advanced discourse analysis. This paper deals with the application of the second approach to literary texts in Turkish. We have analysed almost the whole corpus of works by O. Pamuk and Y. Kemal, two of Turkey’s most prominent novelists. Hierarchical cluster analysis based on the occurrence of numerals in the texts by Pamuk and Kemal shows author, genre, and chronology differences in the usage of numerals in the literary texts of these authors.
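The second approach can be sketched roughly as follows. The abstract does not say how numerals are extracted, which distance measure is used, or which linkage rule drives the clustering, so everything here is an assumption made for illustration: numerals are taken to be digit strings, profiles are compared by Euclidean distance, clusters are merged by average linkage, and the four "texts" and the names `numeral_profile` and `average_linkage` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def numeral_profile(text, vocabulary):
    """Relative frequency of each numeral in `vocabulary` among those found in `text`."""
    # Assumption: numerals appear as digit strings; number words are ignored.
    counts = Counter(re.findall(r"\d+", text))
    total = sum(counts[v] for v in vocabulary)
    return [counts[v] / total if total else 0.0 for v in vocabulary]


def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""

    def dist(a, b):
        # Average pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(euclidean(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Made-up toy "texts": A1/A2 stand for one author, B1/B2 for another.
texts = {
    "A1": "1 1 1 2 2 3 10",
    "A2": "1 1 1 2 2 3 3 10",
    "B1": "2 3 3 3 10 10 10 40",
    "B2": "2 3 3 10 10 10 40 40",
}
vocabulary = ["1", "2", "3", "10", "40"]
profiles = {name: numeral_profile(t, vocabulary) for name, t in texts.items()}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

In this toy data the two texts of each hypothetical author merge first, and the two author clusters merge last; with real corpora the merge order is what one would read off the dendrogram.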

Stats ◽  
2021 ◽  
Vol 4 (4) ◽  
pp. 1051-1068
Author(s):  
Andrei V. Zenkov

We suggest two approaches to the statistical analysis of texts, both based on the study of the occurrence of numerals in literary texts. The first approach is related to Benford’s Law and analyses the frequency distribution of the leading digits of numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford’s Law and can reach 50 percent. The frequencies of occurrence of the digit 1 and, to a lesser extent, of the digits 2 and 3 are usually a characteristic feature of an author’s style, manifested in all (sufficiently long) literary texts of that author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited for advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of literary texts in English and Russian.
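As a minimal illustration of the first approach, the sketch below compares observed leading-digit shares with the shares given by Benford’s Law, P(d) = log10(1 + 1/d). It assumes numerals are written with digits (number words such as "twelve" are not counted, although the paper may well include them). The sample sentence and the function names are invented for the example.

```python
import math
import re
from collections import Counter


def leading_digit_shares(text):
    """Share of each leading digit 1-9 among the digit-string numerals in `text`."""
    # Assumption: a numeral is a run of digits, optionally with . or , separators.
    numerals = re.findall(r"\d+(?:[.,]\d+)*", text)
    leading = []
    for n in numerals:
        first_nonzero = re.search(r"[1-9]", n)  # skip leading zeros, e.g. 0.5 -> 5
        if first_nonzero:
            leading.append(int(first_nonzero.group()))
    counts = Counter(leading)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


def benford_shares():
    """Leading-digit shares prescribed by Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Invented sample sentence, for illustration only.
sample = ("In 1912 he sold 14 horses, 120 sheep and 3 carts for 1500 roubles; "
          "27 years later, 2 remained.")
observed = leading_digit_shares(sample)
expected = benford_shares()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```

A sentence this short says nothing about style; the abstract stresses that the digit shares become characteristic only in sufficiently long texts.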


2021 ◽  
Vol 93 ◽  
pp. 03026
Author(s):  
Andrei Zenkov ◽  
Eugene Zenkov ◽  
Ansgar Belke

Two approaches to the statistical analysis of texts are suggested, both based on the study of numerals occurring in literary texts. The first approach studies the frequency distribution of the leading digits of numerals occurring in the text. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach studies the frequencies of the numerals themselves. It yields information about the author and about the stylistic and genre peculiarities of the texts, and is suited for advanced study of authorial texts. The hypothesis that I. Ilf and E. Petrov were not the real authors of the novels "The Twelve Chairs" and "The Little Golden Calf", which were instead ghostwritten by M. Bulgakov, is checked. Neither the frequency distribution of numerals nor its cluster analysis confirms this hypothesis.


2017 ◽  
Vol 15 (1) ◽  
pp. 253-287 ◽  
Author(s):  
Georgios Ioannou

This is a corpus-based study of the development of the verb pleróo in Ancient Greek, originally meaning fill, from the 6th c. bce in Classical Greek, up to the end of the 3rd c. bce in Hellenistic Koiné. It implements a hierarchical cluster analysis and a multiple correspondence analysis of the sum of the attested instances of pleróo of that period, divided by century. It explores the gains following a syncretism between two methodological strands: earlier introspective analyses postulating variant construals over intuitively grasped schematic configurations such as image schemas, and strictly inductive methods based on statistical analyses of correlations between co-occurring formal and semantic features. Thus, it examines the relevance of the container image-schema to the architecture of the schematic construction corresponding to the prototypical and historically preceding sense of pleróo, fill. Consequently, it observes how shifts in the featural configurations detected through statistical analysis, leading to the emergence of new senses, correspond to successive shifts on the perspectival salience of elements in the schematic construction of the verb.


2009 ◽  
Vol 102 (3) ◽  
pp. 1911-1920 ◽  
Author(s):  
Bruno B. Averbeck ◽  
Alexandra Battaglia-Mayer ◽  
Carla Guglielmo ◽  
Roberto Caminiti

Considerable information has been gathered on the anatomical connectivity within the parieto-frontal network of the primate brain. To examine the statistical regularities in this connectivity, we carried out hierarchical cluster analysis and found statistically significant clusters of areas: four in the parietal and six in the frontal lobe. Clusters were based on patterns of inputs from all cortical areas. Both parietal and frontal clusters were composed of sets of spatially contiguous architectonic areas. The four parietal clusters were composed of sets of anterior (somatosensory), dorsal, inferior, and medio-lateral parietal cortical areas. The six frontal clusters were composed of sets of dorsal premotor, ventral premotor, primary motor, cingulate motor, and dorsal and ventral prefrontal cortical areas. Furthermore, connectivity between frontal and parietal clusters was topographic and reciprocal. Thus we found substantial statistical structure and organization in the parieto-frontal network that gives a simplified but accurate description of this system.


2018 ◽  
Vol 16 (2) ◽  
pp. 348-398
Author(s):  
Francisco Gonzálvez-García ◽  
Christopher S. Butler

This article builds on the work reported in Butler and Gonzálvez-García (2014), in which 16 functional and/or cognitive/constructionist theories were compared on the basis of questionnaires completed by experts and a reading of the literature on each approach. The aim is to extend this work to cover Valency Theory (VT henceforth), arguably the most widely used approach to the study of German syntax. We first report on a statistical analysis (correlation, multidimensional scaling and hierarchical cluster analysis) of the data from the questionnaires completed by two VT experts, in relation to those completed by experts in other approaches. We then present an analysis of each item in the questionnaire in relation to VT, leading to a positive or negative evaluation for each questionnaire item. The results are again analysed statistically. The picture that emerges is of a theory which, though distinctive, has clear relationships with a broad group of cognitively-oriented approaches.


2017 ◽  
Vol 9 (1) ◽  
pp. 137 ◽  
Author(s):  
José Manuel Castellano ◽  
Efstathios Stefos ◽  
Lisa Gaye Williams Goodrich

The objective of the study is to examine the educational level and the social profile of the indigenous people of Ecuador by means of a descriptive and multidimensional statistical analysis of this sector of the Ecuadorian population, based on data from the National Survey of Employment, Unemployment and Underemployment from 2015. The descriptive analysis shows the frequencies and percentages of the variables used in the investigation, while the multidimensional statistical analysis is used to show the principal and most important criteria of differentiation and classification among the groups of people investigated. These methods involve a factorial analysis of multiple correspondences, which demonstrates the criteria of differentiation, and a hierarchical cluster analysis to define groups of people according to their common traits.


2017 ◽  
Vol 10 (6) ◽  
pp. 51
Author(s):  
Olga Elizabeth Minchala Buri ◽  
Efstathios Stefos

The objective of this study is to examine the social profile of students who are enrolled in Basic General Education in Ecuador. A descriptive and a multidimensional statistical analysis were carried out, based on the data provided by the National Survey of Employment, Unemployment and Underemployment in 2015. The descriptive analysis shows the frequencies and percentages of the variables used in the investigation, and the multidimensional statistical analysis demonstrates the principal and most important criteria of differentiation and classification among the clusters of students who were investigated. These methods involve a factorial analysis of multiple correspondences, which demonstrates the criteria of differentiation, and a hierarchical cluster analysis to define clusters of students according to their common traits.


Author(s):  
Miranda G. Capra

Software and product designers use card sorting to understand item groups and relationships. In the usability community, a common method of formal statistical analysis for open card sort data is hierarchical cluster analysis, which results in a tree of the items sorted into distinct, nested clusters. Hierarchical cluster analysis is appropriate for highly structured settings, like software menus. However, many situations call for softer clusters, such as designing websites where multiple pages link to the same target page. Factor analysis summarizes the categories created in card sorts and generates clusters that can overlap. This paper explains how to prepare card sort data for statistical analysis, describes the results of factor analysis and how to interpret them, and discusses when hierarchical cluster analysis and factor analysis are appropriate.
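One common way to prepare open card sort data for cluster analysis is to turn the participants' groupings into pairwise distances between items. The sketch below is an assumption about that preparation step, not the procedure from the paper: the distance between two cards is taken as one minus the share of participants who placed them in the same group, and the card names, the three sorts, and the function name `cooccurrence_distance` are hypothetical.

```python
from itertools import combinations


def cooccurrence_distance(sorts, items):
    """Map each unordered item pair to 1 - (share of participants grouping them together).

    `sorts` holds one entry per participant; each entry is a list of groups (sets of items).
    Every item in a group is assumed to appear in `items`.
    """
    together = {pair: 0 for pair in combinations(sorted(items), 2)}
    for groups in sorts:
        for group in groups:
            # Count every pair of cards that this participant put in the same group.
            for pair in combinations(sorted(group), 2):
                together[pair] += 1
    n = len(sorts)
    return {pair: 1 - count / n for pair, count in together.items()}


# Hypothetical sorts from three participants.
items = ["Login", "Profile", "Cart", "Checkout"]
sorts = [
    [{"Login", "Profile"}, {"Cart", "Checkout"}],
    [{"Login", "Profile", "Checkout"}, {"Cart"}],
    [{"Login"}, {"Profile"}, {"Cart", "Checkout"}],
]
distances = cooccurrence_distance(sorts, items)
for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 3))
```

A distance table of this kind could feed a hierarchical clustering routine. Factor analysis, the alternative the abstract discusses for overlapping clusters, would need a different input and is not sketched here.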

