Zipf's law
Recently Published Documents


TOTAL DOCUMENTS

405
(FIVE YEARS 85)

H-INDEX

38
(FIVE YEARS 3)

2021 ◽  
Author(s):  
Natalia Levshina

Zipf’s law of abbreviation, which posits a negative correlation between word frequency and length, is one of the most famous and robust cross-linguistic generalizations. At the same time, it has been shown that contextual informativity (average surprisal given previous context) can be more strongly correlated with word length, although this tendency is not observed consistently and depends on several methodological choices. The present study, which examines a more diverse sample of languages than previous studies (Arabic, Finnish, Hungarian, Indonesian, Russian, Spanish and Turkish), reveals intriguing cross-linguistic differences, which can be explained by typological properties of the languages. I use large web-based corpora from the Leipzig Corpora Collection to estimate word lengths in UTF-8 characters, as well as word frequency, informativity given the previous word and informativity given the next word, applying different methods of bigram processing. The results show consistent cross-linguistic differences in the size of correlations between word length and the corpus-based measures. I argue that these differences can be explained by the properties of noun phrases in a language, most importantly, the order of heads and modifiers and their relative morphological complexity, as well as by orthographic conventions.
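The three corpus-based measures named in this abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the function name `word_measures`, the toy sentence, and the use of `len()` (which counts Unicode code points) for word length are all assumptions made for illustration. Informativity given the previous word is computed as the average of -log2 P(word | previous word) over a word's occurrences.

```python
import math
from collections import Counter, defaultdict

def word_measures(tokens):
    """Return {word: (length, frequency, informativity)} for a token list.

    Informativity is the average surprisal of a word given the previous
    word: the mean of -log2 P(word | previous) over its occurrences.
    Words that only ever appear as the first token have no context and
    are left out.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each word serves as the left-hand context of a bigram.
    contexts = Counter(tokens[:-1])

    surprisal_sum = defaultdict(float)
    occurrences = Counter()
    for (prev, word), n in bigrams.items():
        p = n / contexts[prev]               # P(word | prev)
        surprisal_sum[word] += n * -math.log2(p)
        occurrences[word] += n

    return {
        w: (len(w), unigrams[w], surprisal_sum[w] / occurrences[w])
        for w in occurrences
    }

# Hypothetical toy corpus; a real study would use millions of tokens.
measures = word_measures("the cat saw the dog and the cat ran".split())
```

Informativity given the next word can be obtained from the same function by reversing the token list before calling it. Correlating each measure with word length (for example with a rank correlation) would be a separate step.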


2021 ◽  
Author(s):  
Babak Ravandi ◽  
Valentina Concu

Hierarchies are the backbones of complex systems, and their analysis allows for a deeper understanding of the structure of these systems and how they evolve. We also consider languages to be complex adaptive systems. Hence, we analyzed the hierarchical organization of historical syntactic networks from German that were created from a corpus of texts from the 11th to 17th centuries. We tracked the emergence of syntactic structures in these networks and mapped them to specific communicative needs. We named these emerging structures communicative hierarchies. We hypothesise that the communicative needs of speakers are the organizational force of syntax. We propose that the emergence of these multiple communicative hierarchies is what shapes syntax, and that these hierarchies are the prerequisite to Zipf's law. The emergence of communicative hierarchies indicates that the objective of language evolution is not only to increase the efficiency of transferring information. Language is also evolving to increase our capacity to communicate more sophisticated abstractions as we advance as a species.


REGIONOLOGY ◽  
2021 ◽  
Vol 29 (3) ◽  
pp. 642-665
Author(s):  
Irina A. Sekushina

Introduction. In modern economics, one of the most common and simplest methods of analyzing the balance of urban settlement systems is to assess their compliance with Zipf's law, or the rank–size rule. This pattern rests on the relationship between a town's population and its place in the hierarchy of towns ranked in descending order of size. Based on the results of the study conducted, the article assesses the balance of the urban settlement system of the European North of Russia, one of the country's regions, by analyzing its compliance with Zipf’s law. Materials and Methods. The official data from the Federal State Statistics Service on the population of towns in the European North of Russia for 1959, 1989 and 2019 were used as materials of the study. The method of constructing a linear regression between the logarithm of the actual population and the logarithm of the rank of the town was used to verify Zipf's law for the urban network of the region in a given period. In order to substantiate the conclusions drawn, an analysis of the dynamics of the number of towns and the share of the population living in them was carried out. The monographic method, as well as the methods of tabular and graphical data visualization, was used to interpret the results of the calculations. Results. Based on the analysis of data on the application of the rank–size rule for the towns in the European North of Russia, it has been found that Zipf’s law was not fully observed in any time period, which indicates the imbalance of the existing urban settlement system. In the period from 1959 to 2019, there was an increase in the concentration of the population in the major cities of the region. The imbalance is also caused by the growing number of small towns with a population that does not correspond to the optimal value according to Zipf's law. Discussion and Conclusion. Based on the calculations, the author has come to the conclusion that the cities of Arkhangelsk and Cherepovets have the potential for growth, as well as some others with a population of up to 100 thousand people. The practical significance of the study lies in the possibility of using the results obtained to forecast the population of towns in the European North of Russia when planning the location of production facilities, as well as transport and social infrastructure in the region.
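The regression described under Materials and Methods can be sketched as follows. This is an illustration only: the function name `zipf_exponent` and the population figures are hypothetical, and regressing log population on log rank by ordinary least squares is an assumption about the direction of the fit. Under a strict rank–size rule the slope is close to -1, and deviations from that value point to the kind of imbalance the abstract discusses.

```python
import math

def zipf_exponent(populations):
    """Fit log(population) = a + b * log(rank) by ordinary least squares.

    Returns (a, b). Under a strict rank-size rule, b is close to -1.
    """
    sizes = sorted(populations, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical populations that follow the rank-size rule exactly:
# the town of rank r has 600,000 / r inhabitants.
ideal = [600_000 / rank for rank in range(1, 11)]
intercept, slope = zipf_exponent(ideal)
```

For this idealized input the fitted slope is -1 and `exp(intercept)` recovers the population of the largest town; real census data would give a slope that departs from -1 to some degree.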


2021 ◽  
Vol 127 (12) ◽  
Author(s):  
Onofrio Mazzarisi ◽  
Amanda de Azevedo-Lopes ◽  
Jeferson J. Arenzon ◽  
Federico Corberi

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Keith A. Burghardt ◽  
Zihao He ◽  
Allon G. Percus ◽  
Kristina Lerman

Research institutions provide the infrastructure for scientific discovery, yet their role in the production of knowledge is not well characterized. To address this gap, we analyze interactions of researchers within and between institutions from millions of scientific papers. Our analysis reveals that collaborations densify as each institution grows, but at different rates (heterogeneous densification). We also find that the number of institutions scales with the number of researchers as a power law (Heaps’ law) and institution sizes approximate Zipf’s law. These patterns can be reproduced by a simple model in which researchers are preferentially hired by large institutions, while new institutions complementarily generate more new institutions. Finally, new researchers form triadic closures with collaborators. This model reveals an economy of scale in research: larger institutions grow faster and amplify collaborations. Our work deepens the understanding of emergent behavior in research institutions and their role in facilitating collaborations.


2021 ◽  
Author(s):  
Jonathan Wren ◽  
Constantin Georgescu

Although citations are used as a quantifiable, objective metric of academic influence, references could be added to a paper to inflate the perceived influence of a body of research. This reference list manipulation (RLM) could take place during the peer-review process, or prior to it. Surveys have estimated how many people may have been affected by coercive RLM at one time or another, but it is not known how many authors engage in RLM, nor to what degree. By examining a subset of active, highly published authors (n = 20,803) in PubMed, we find the frequency of non-self citations (NSC) to one author coming from one paper approximates Zipf’s law, permitting the task to be approached statistically. Framed as an anomaly detection problem, higher confidence is gained the more outlier status is correlated across dimensions relative to non-outliers. We find the NSC Gini Index correlates highly with anomalous patterns across multiple RLM-related distributions. Between 81 (FDR < 0.05) and 231 (FDR < 0.10) authors are outliers on the curve, suggestive of chronic, repeated RLM. Approximately 16% of all authors may have engaged in RLM to some degree. Authors who use 18% or more of their references for self-citation are significantly more likely to have NSC Gini distortions, suggesting a potential willingness to coerce others to cite them.
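The abstract does not say how the NSC Gini Index is computed, so the following is only a sketch of the standard Gini index applied to hypothetical per-paper citation counts for one author. The function name `gini` and the example counts are assumptions. A value near 0 means an author's non-self citations are spread evenly across citing papers; a value near 1 means a few papers account for most of them.

```python
def gini(counts):
    """Gini index of non-negative counts (0 = perfectly even)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending values x_1..x_n:
    # G = sum((2*i - n - 1) * x_i) / (n * total)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)

# Hypothetical counts of citations to one author from each citing paper.
even_spread = gini([5, 5, 5, 5])      # every paper cites the author equally
concentrated = gini([0, 0, 0, 10])    # one paper supplies all the citations
```

With only four papers the largest attainable value is 0.75, which is why comparisons of this kind are more meaningful across authors with many citing papers.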


2021 ◽  
Vol 106 ◽  
pp. 105460
Author(s):  
Xiangdong Sun ◽  
Ouyang Yuan ◽  
Zhao Xu ◽  
Yanhui Yin ◽  
Qian Liu ◽  
...  
