Power Law (Zipf's Law)

Author(s):  
Dov Greenbaum
2002 ◽  
Vol 05 (01) ◽  
pp. 1-6 ◽  
Author(s):  
RAMON FERRER i CANCHO ◽  
RICARD V. SOLÉ

Random-text models have been proposed as an explanation for the power-law relationship between word frequency and rank, the so-called Zipf's law. They are generally regarded as null hypotheses rather than models in the strict sense. In this context, recent theories of language emergence and evolution take this law as a priori information that requires no explanation. Here, random texts and real texts are compared through (a) the so-called lexical spectrum and (b) the distribution of words having the same length. It is shown that real texts fill the lexical spectrum much more efficiently, regardless of word length, suggesting that the meaningfulness of Zipf's law is high.
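
As an illustration of the comparison described above, the following Python sketch builds a simple random-text ("monkey typing") null model and computes the rank-frequency list and lexical spectrum that one would contrast with a real corpus. It is a minimal sketch under assumed parameters (26 letters, a uniform space probability, a placeholder real_text.txt), not the authors' actual procedure.

```python
# Illustrative sketch (not the authors' code): compare the rank-frequency
# curve and lexical spectrum of a "monkey-typing" random text with a real text.
# Assumptions: 26 letters plus a space, uniform character probabilities,
# and a plain-text file real_text.txt supplied by the user.
import random
from collections import Counter

def random_text_words(n_chars, alphabet="abcdefghijklmnopqrstuvwxyz", p_space=0.2):
    """Generate a random text by emitting a space with probability p_space,
    otherwise a uniformly chosen letter, then split on spaces."""
    chars = [
        " " if random.random() < p_space else random.choice(alphabet)
        for _ in range(n_chars)
    ]
    return "".join(chars).split()

def rank_frequency(words):
    """Return word frequencies sorted from most to least frequent."""
    return sorted(Counter(words).values(), reverse=True)

def lexical_spectrum(words):
    """Map frequency f -> number of distinct words occurring exactly f times."""
    return Counter(Counter(words).values())

if __name__ == "__main__":
    random_words = random_text_words(200_000)
    print("random text, top-10 frequencies:", rank_frequency(random_words)[:10])
    print("random text, spectrum (f -> #types):",
          dict(sorted(lexical_spectrum(random_words).items())[:10]))
    # For a real text, load your own corpus, e.g.:
    # real_words = open("real_text.txt").read().lower().split()
    # and compare rank_frequency(real_words) / lexical_spectrum(real_words).
```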


2020 ◽  
Vol 24 ◽  
pp. 275-293
Author(s):  
Aristides V. Doumas ◽  
Vassilis G. Papanicolaou

The origin of power-law behavior (also known as Zipf's law) has been a topic of debate in the scientific community for more than a century. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography, and the social sciences. In a highly cited article, Mark Newman [Contemp. Phys. 46 (2005) 323–351] reviewed some of the empirical evidence for the existence of power-law forms, but underscored that, even though many distributions do not follow a power law exactly, many of the quantities that scientists measure are close to a Zipf law and are therefore of interest. In this paper we connect a variant of Zipf's law with a general urn problem. A collector wishes to collect m complete sets of N distinct coupons. The draws from the population are independent and identically distributed with replacement, and the probability that a type-j coupon is drawn is denoted by p_j, j = 1, 2, …, N. Let T_m(N) be the number of trials needed for this problem. We present the asymptotics for the expectation (five terms plus an error), the second rising moment (six terms plus an error), and the variance of T_m(N) (leading term) as N → ∞, when p_j = a_j / ∑_{j=2}^{N+1} a_j, where a_j = (ln j)^{-p}, p > 0. Moreover, we prove that T_m(N), appropriately normalized, converges in distribution to a Gumbel random variable. These "log-Zipf" classes of coupon probabilities are not covered by the existing literature, and the present paper fills this gap. In the spirit of a recent paper of ours [ESAIM: PS 20 (2016) 367–399], we enlarge the classes for which the Dixie cup problem is solved with respect to its moments, variance, and distribution.
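
A Monte Carlo sketch of the quantity studied, T_m(N), can make the setup concrete. The code below draws coupons with the "log-Zipf" weights a_j = (ln j)^{-p} and estimates E[T_m(N)] by simulation; the parameter values are arbitrary, and the sketch does not reproduce the paper's asymptotic expansions.

```python
# Illustrative Monte Carlo sketch (not the authors' derivation): estimate
# E[T_m(N)], the number of i.i.d. draws needed to collect m complete sets of
# N coupons, when coupon j is drawn with probability proportional to the
# "log-Zipf" weight (ln j)^(-p) (indices shifted so the logarithm is positive).
import math
import random

def draws_for_m_sets(N, m, p, rng):
    """Simulate one run of the generalized coupon collector problem."""
    weights = [math.log(j) ** (-p) for j in range(2, N + 2)]  # a_2 .. a_{N+1}
    counts = [0] * N
    draws = 0
    # Keep drawing until every coupon type has been seen at least m times.
    while min(counts) < m:
        j = rng.choices(range(N), weights=weights)[0]
        counts[j] += 1
        draws += 1
    return draws

if __name__ == "__main__":
    rng = random.Random(0)
    N, m, p, runs = 200, 2, 1.0, 50  # arbitrary illustrative parameters
    samples = [draws_for_m_sets(N, m, p, rng) for _ in range(runs)]
    print(f"estimated E[T_m(N)] for N={N}, m={m}, p={p}: {sum(samples) / runs:.1f}")
```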


2021 ◽  
Vol 145 ◽  
pp. 104324
Author(s):  
Juan C Quiroz ◽  
Liliana Laranjo ◽  
Catalin Tufanaru ◽  
Ahmet Baki Kocaballi ◽  
Dana Rezazadegan ◽  
...  

Fractals ◽  
2004 ◽  
Vol 12 (01) ◽  
pp. 49-53 ◽  
Author(s):  
TAISEI KAIZOJI ◽  
MASAHIDE NUKI

We show power-law scaling behavior for fluctuations in share volume, which no previous study has reported. After analyzing a database of the daily transactions for all securities listed on the Tokyo Stock Exchange, we selected 1050 large companies that each had an unbroken series of daily trading activity from January 1975 to January 2002. We found that the cumulative distributions of daily fluctuations in share volume are well described by a power-law decay, and that the cumulative distributions for almost all of the companies can be characterized by an exponent within the stable Lévy domain 0 < α < 2. Furthermore, more than 35% of the cumulative distributions can be well approximated by Zipf's law, i.e. they have an exponent close to unity.
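
The tail-exponent estimation implied here can be sketched generically: fit a line to the empirical rank-ordered (cumulative) distribution in log-log coordinates and read α off the slope. The snippet below does this on synthetic Pareto data, since the Tokyo Stock Exchange volume data are not reproduced; it is not the authors' exact estimation procedure.

```python
# Illustrative sketch (not the authors' estimation procedure): fit the tail
# exponent alpha of an empirical cumulative (rank-ordered) distribution by a
# least-squares line in log-log coordinates. Synthetic Pareto draws stand in
# for the share-volume fluctuations.
import math
import random

def tail_exponent(samples, tail_fraction=0.1):
    """Regress log(rank) on log(value) over the largest tail_fraction of the
    sample; the negative slope estimates the cumulative-distribution exponent."""
    ordered = sorted(samples, reverse=True)
    k = max(10, int(len(ordered) * tail_fraction))
    xs = [math.log(v) for v in ordered[:k]]
    ys = [math.log(rank + 1) for rank in range(k)]
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

if __name__ == "__main__":
    rng = random.Random(1)
    true_alpha = 1.0  # Zipf-like tail
    data = [rng.paretovariate(true_alpha) for _ in range(50_000)]
    print(f"estimated alpha: {tail_exponent(data):.2f} (true: {true_alpha})")
```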


Author(s):  
Dariusz Skotarek

Zipf’s Law states that within a given text the frequency of any word is inversely proportional to its rank in the frequency table of the words used in that text. It is a power-law statistical regularity that occurs ubiquitously in language: so far, every language that has been tested has been found to display the Zipfian distribution. Toki Pona is an experimental artificial language spoken by hundreds of users. It is extremely minimalistic: its vocabulary consists of a mere 120 words. A comparative statistical analysis of two parallel texts in French and Toki Pona showed that even a language with such a scarce vocabulary adheres to Zipf’s Law just like natural languages.
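
Zipf's inverse-proportionality claim is easy to check directly: if frequency ∝ 1/rank, then frequency × rank should be roughly constant across ranks. The sketch below performs this check for any tokenized text; the file names are placeholders, not the study's actual corpora.

```python
# Minimal sketch, not tied to the study's corpora: verify that frequency x rank
# is roughly constant across ranks for a tokenized text.
from collections import Counter

def zipf_check(words, ranks=(1, 2, 5, 10, 20, 50, 100)):
    """Print frequency, rank, and their product for selected ranks."""
    freqs = [count for _, count in Counter(words).most_common()]
    for r in ranks:
        if r <= len(freqs):
            print(f"rank {r:>4}: freq {freqs[r - 1]:>7}  freq*rank {freqs[r - 1] * r:>8}")

if __name__ == "__main__":
    # Placeholder paths; substitute the French and Toki Pona parallel texts.
    for path in ("french_text.txt", "toki_pona_text.txt"):
        try:
            words = open(path, encoding="utf-8").read().lower().split()
        except FileNotFoundError:
            print(f"{path} not found; supply your own corpus")
            continue
        print(f"--- {path} ---")
        zipf_check(words)
```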


Glottotheory ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 113-129
Author(s):  
Victor Davis

Abstract Heaps’ Law (Heaps, H. S. 1978, Information Retrieval: Computational and Theoretical Aspects, Academic Press, https://dl.acm.org/citation.cfm?id=539986) states that in a large enough text corpus, the number of types as a function of tokens grows as N = K·M^β for some free parameters K and β. Much has been written about how this result and various generalizations can be derived from Zipf’s Law (Zipf, George 1949, Human Behavior and the Principle of Least Effort, Addison-Wesley, http://dx.doi.org/10.1037/h0052442); see Font-Clos, Francesc 2013, A scaling law beyond Zipf’s law and its relation to Heaps’ law, New Journal of Physics 15, 093033, http://iopscience.iop.org/article/10.1088/1367-2630/15/9/093033; Bernhardsson, S., da Rocha, L. E. C. and Minnhagen, P. 2009, The meta book and size-dependent properties of written language, New Journal of Physics 11, 123015, http://iopscience.iop.org/article/10.1088/1367-2630/11/12/123015; Bernhardsson, S., Baek, S. K. and Minnhagen, P. 2011, A paradoxical property of the monkey book, Journal of Statistical Mechanics: Theory and Experiment 2011, P07013, http://iopscience.iop.org/article/10.1088/1742-5468/2011/07/P07013; Milička, Jiří 2009, Type-token & Hapax-token Relation: A Combinatorial Model, Glottotheory 2(1), 99–110, http://milicka.cz/kestazeni/type-token_relation.pdf; and Petersen, Alexander 2012, Languages cool as they expand: Allometric scaling and the decreasing need for new words, Scientific Reports 2, 943, https://www.nature.com/articles/srep00943. Here we derive from first principles a completely novel expression for the type-token curve and prove its superior accuracy on real text. This expression naturally generalizes to equally accurate estimates for counting hapaxes and higher n-legomena.
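
For reference, the classical Heaps-law baseline and the n-legomena counts mentioned above can be computed as follows. This sketch fits N = K·M^β by log-log least squares and counts hapaxes and dis legomena; it does not reproduce the paper's novel type-token expression, only the baseline it improves on, and the corpus path is a placeholder.

```python
# Sketch of the classical Heaps-law fit N = K * M^beta and of counting
# n-legomena (types occurring exactly n times). Baseline only; not the
# paper's novel expression.
import math
from collections import Counter

def type_token_curve(words, step=1000):
    """Return (tokens, types) pairs sampled every `step` tokens."""
    seen, points = set(), []
    for m, w in enumerate(words, start=1):
        seen.add(w)
        if m % step == 0:
            points.append((m, len(seen)))
    return points

def fit_heaps(points):
    """Least-squares fit of log N = log K + beta * log M."""
    xs = [math.log(m) for m, _ in points]
    ys = [math.log(n) for _, n in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    K = math.exp(mean_y - beta * mean_x)
    return K, beta

def n_legomena(words, n):
    """Number of types occurring exactly n times (n=1 gives hapaxes)."""
    return sum(1 for c in Counter(words).values() if c == n)

if __name__ == "__main__":
    # Placeholder corpus; any sufficiently long tokenized text will do.
    words = open("corpus.txt", encoding="utf-8").read().lower().split()
    K, beta = fit_heaps(type_token_curve(words))
    print(f"Heaps fit: K={K:.1f}, beta={beta:.3f}")
    print("hapaxes:", n_legomena(words, 1), "dis legomena:", n_legomena(words, 2))
```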


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Giordano De Marzo ◽  
Andrea Gabrielli ◽  
Andrea Zaccaria ◽  
Luciano Pietronero

2021 ◽  
Vol 7 (s3) ◽  
Author(s):  
Matthew Stave ◽  
Ludger Paschen ◽  
François Pellegrino ◽  
Frank Seifart

Abstract Zipf’s Law of Abbreviation and Menzerath’s Law both make predictions about the length of linguistic units, based on corpus frequency and the length of the carrier unit. Each contributes to the efficiency of languages: for Zipf, units are more likely to be reduced when they are highly predictable, due to their frequency; for Menzerath, units are more likely to be reduced when there are more sub-units to contribute to the structural information of the carrier unit. However, it remains unclear how the two laws work together in determining unit length at a given level of linguistic structure. We examine this question regarding the length of morphemes in spoken corpora of nine typologically diverse languages drawn from the DoReCo corpus, showing that Zipf’s Law is a stronger predictor, but that the two laws interact with one another. We also explore how this is affected by specific typological characteristics, such as morphological complexity.
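
A much-simplified illustration of how the two laws can be examined together: correlate morpheme length with log corpus frequency (Law of Abbreviation) and with the size of the carrier word in morphemes (Menzerath's Law). The input format and file name below are assumptions, and this is not the DoReCo analysis pipeline, which models the two predictors jointly.

```python
# Simplified illustration (not the DoReCo analysis): relate morpheme length
# to (a) log corpus frequency and (b) the number of morphemes in the carrier
# word via Pearson correlations. Assumed input: one word per line, morphemes
# separated by "-".
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")

def analyze(segmented_words):
    """segmented_words: list of words, each a list of morpheme strings."""
    morph_freq = Counter(m for word in segmented_words for m in word)
    lengths, log_freqs, carrier_sizes = [], [], []
    for word in segmented_words:
        for morph in word:
            lengths.append(len(morph))
            log_freqs.append(math.log(morph_freq[morph]))
            carrier_sizes.append(len(word))
    print("length ~ log frequency (abbreviation):", round(pearson(lengths, log_freqs), 3))
    print("length ~ carrier size (Menzerath):    ", round(pearson(lengths, carrier_sizes), 3))

if __name__ == "__main__":
    # Placeholder file: one word per line, morphemes separated by "-".
    words = [line.strip().split("-")
             for line in open("segmented_corpus.txt", encoding="utf-8")
             if line.strip()]
    analyze(words)
```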

