Zipf’s Law, unbounded complexity and open-ended evolution

2018 ◽  
Vol 15 (149) ◽  
pp. 20180395 ◽  
Author(s):  
Bernat Corominas-Murtra ◽  
Luís F. Seoane ◽  
Ricard Solé

A major problem for evolutionary theory is understanding the so-called open-ended nature of evolutionary change, from its definition to its origins. Open-ended evolution (OEE) refers to the unbounded increase in complexity that seems to characterize evolution on multiple scales. This property seems to be a characteristic feature of biological and technological evolution and is strongly tied to the generative potential associated with combinatorics, which allows the system to grow and expand its available state space. Interestingly, many complex systems presumably displaying OEE, from language to proteins, share a common statistical property: the presence of Zipf’s Law. Given an inventory of basic items (such as words or protein domains) required to build more complex structures (sentences or proteins), Zipf’s Law tells us that most of these elements are rare whereas a few of them are extremely common. Using algorithmic information theory, in this paper we provide a fundamental definition of open-endedness, formulated as a set of postulates. Its statistical counterpart, based on standard Shannon information theory, has the structure of a variational problem which is shown to lead to Zipf’s Law as the expected consequence of an evolutionary process displaying OEE. We further explore the problem of information conservation through an OEE process and conclude that statistical information (standard Shannon information) is not conserved, resulting in the paradoxical situation in which the increase of information content has the effect of erasing itself. We prove that this paradox is resolved if we consider non-statistical forms of information. This last result implies that standard information theory may not be a suitable theoretical framework to explore the persistence and increase of the information content in OEE systems.
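The rank–frequency pattern the abstract describes can be illustrated with a toy corpus (this is a generic illustration, not the paper's formalism): most types occur once, while a few dominate.

```python
from collections import Counter

def rank_frequencies(tokens):
    """Return type frequencies sorted from most to least common."""
    return [n for _, n in Counter(tokens).most_common()]

# Toy corpus: under Zipf's Law the r-th most frequent type has
# frequency roughly proportional to 1/r, so a handful of types
# are extremely common and most are rare.
corpus = ("the cat sat on the mat the dog sat on the log "
          "the cat saw the dog").split()
freqs = rank_frequencies(corpus)
rare_types = sum(1 for n in freqs if n == 1)  # hapax count
```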

Author(s):  
Yizhen Wu ◽  
Mingyue Jiang ◽  
Zhijian Chang ◽  
Yuanqing Li ◽  
Kaifang Shi

Currently, whether urban development in China satisfies Zipf’s law across different scales is still unclear. Thus, this study explored whether China’s urban development satisfies Zipf’s law across different scales using the National Polar-Orbiting Partnership’s Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) nighttime light data. First, the NPP-VIIRS data were corrected. Then, based on the Zipf’s law model, the corrected NPP-VIIRS data were used to evaluate China’s urban development at multiple scales. The results showed that the corrected NPP-VIIRS data could effectively reflect the state of urban development in China. Additionally, the Zipf index (q) values, which express the degree of urban development, decreased overall from 2012 to 2018 in all provinces, prefectures, and counties. Since the value of q was relatively close to 1 with an R2 value > 0.70, the development of the provinces and prefectures was close to the ideal Zipf’s law state. In all counties, q > 1 with an R2 value > 0.70, which showed that the primate county had a relatively strong monopoly capacity. Where q < 1 and declined continuously, as in the top 2000 counties, the top 250 prefectures, and the top 20 provinces, development was in equilibrium, with little difference in scale across levels (R2 > 0.90). The results enrich our understanding of urban development in terms of Zipf’s law and have valuable implications for relevant decision-makers and stakeholders.
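The Zipf index q in rank–size studies of this kind is commonly estimated by an ordinary least-squares fit of log size against log rank; a minimal sketch of that estimator follows (the NPP-VIIRS correction and data themselves are not reproduced here):

```python
import math

def zipf_index(sizes):
    """Estimate the Zipf index q from city sizes via OLS on the
    rank-size rule  log S_r = c - q * log r."""
    s = sorted(sizes, reverse=True)
    xs = [math.log(r) for r in range(1, len(s) + 1)]
    ys = [math.log(v) for v in s]
    n = len(s)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # q > 1: stronger primate-city monopoly; q < 1: more even sizes

# A ranking that follows the ideal rank-size rule S_r = 1000/r gives q = 1.
sizes = [1000 / r for r in range(1, 51)]
q = zipf_index(sizes)
```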


2020 ◽  
Vol 6 (1) ◽  
pp. 114
Author(s):  
Saeid Maadani ◽  
Gholam Reza Mohtashami Borzadaran ◽  
Abdol Hamid Rezaei Roknabadi

The variance of the Shannon information of a random variable \(X\), called varentropy, measures how the information content of \(X\) is scattered around its entropy; it has various applications in information theory, computer science, and statistics. In this paper, we introduce a new generalized varentropy based on the Tsallis entropy and obtain some results and bounds for it. We compare the varentropy with the Tsallis varentropy. Moreover, we study the Tsallis varentropy of order statistics, analyse this concept for residual (past) lifetime distributions, and then introduce two new classes of distributions based on them.
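For a discrete distribution, the (Shannon) varentropy of the abstract is simply \(\mathrm{Var}[-\log p(X)]\); a short sketch of that baseline quantity (the paper's Tsallis generalization is not reproduced here):

```python
import math

def entropy_and_varentropy(p):
    """Shannon entropy H = E[-log p(X)] and varentropy
    V = Var[-log p(X)] of a discrete distribution, in nats."""
    probs = [pi for pi in p if pi > 0]
    info = [-math.log(pi) for pi in probs]   # information content of each outcome
    h = sum(pi * i for pi, i in zip(probs, info))
    v = sum(pi * (i - h) ** 2 for pi, i in zip(probs, info))
    return h, v

# A uniform distribution has zero varentropy: every outcome carries
# exactly the same information content, so nothing is "scattered".
h, v = entropy_and_varentropy([0.25] * 4)
```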


2013 ◽  
Vol 10 (2) ◽  
pp. 2029-2065 ◽  
Author(s):  
S. V. Weijs ◽  
N. van de Giesen ◽  
M. B. Parlange

Abstract. When inferring models from hydrological data or calibrating hydrological models, we might be interested in the information content of those data to quantify how much can potentially be learned from them. In this work we take a perspective from (algorithmic) information theory (AIT) to discuss some underlying issues regarding this question. In the information-theoretical framework, there is a strong link between information content and data compression. We exploit this by using data compression performance as a time series analysis tool and highlight the analogy to information content, prediction, and learning (understanding is compression). The analysis is performed on time series of a set of catchments, searching for the mechanisms behind compressibility. We discuss the deeper foundation from algorithmic information theory, some practical results, and the inherent difficulties in answering the question: "How much information is contained in these data?". The conclusion is that the answer to this question can only be given once the following counter-questions have been answered: (1) Information about which unknown quantities? (2) What is your current state of knowledge/beliefs about those quantities? Quantifying the information content of hydrological data is closely linked to the question of separating aleatoric and epistemic uncertainty and quantifying maximum possible model performance, as addressed in the current hydrological literature. The AIT perspective teaches us that it is impossible to answer this question objectively, without specifying prior beliefs. These beliefs are related to the maximum complexity one is willing to accept as a law and what is considered as random.


2013 ◽  
Vol 17 (8) ◽  
pp. 3171-3187 ◽  
Author(s):  
S. V. Weijs ◽  
N. van de Giesen ◽  
M. B. Parlange

Abstract. When inferring models from hydrological data or calibrating hydrological models, we are interested in the information content of those data to quantify how much can potentially be learned from them. In this work we take a perspective from (algorithmic) information theory, (A)IT, to discuss some underlying issues regarding this question. In the information-theoretical framework, there is a strong link between information content and data compression. We exploit this by using data compression performance as a time series analysis tool and highlight the analogy to information content, prediction and learning (understanding is compression). The analysis is performed on time series of a set of catchments. We discuss the deeper foundation from algorithmic information theory, some practical results, and the inherent difficulties in answering the following question: "How much information is contained in this data set?". The conclusion is that the answer to this question can only be given once the following counter-questions have been answered: (1) information about which unknown quantities? and (2) what is your current state of knowledge/beliefs about those quantities? Quantifying the information content of hydrological data is closely linked to the question of separating aleatoric and epistemic uncertainty and quantifying maximum possible model performance, as addressed in the current hydrological literature. The AIT perspective teaches us that it is impossible to answer this question objectively without specifying prior beliefs.
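The "understanding is compression" idea can be sketched with a general-purpose compressor used as a crude proxy for information content (this is an illustration of the principle, not the authors' analysis or data): a structured series compresses far better than a pseudo-random one of the same length.

```python
import random
import zlib

def compressed_fraction(series):
    """Compressed size / raw size of a numeric series. Lower values
    indicate more structure, i.e. more that can be predicted."""
    raw = ",".join(f"{x:.3f}" for x in series).encode()
    return len(zlib.compress(raw, 9)) / len(raw)

# A constant 'streamflow' record is highly predictable and compresses
# almost completely; an i.i.d. random record leaves little to learn
# and compresses poorly.
random.seed(0)
constant = [1.0] * 1000
noisy = [random.random() for _ in range(1000)]
ratio_constant = compressed_fraction(constant)
ratio_noisy = compressed_fraction(noisy)
```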


Glottotheory ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 113-129
Author(s):  
Victor Davis

Abstract Heaps’ Law (Heaps, H. S. 1978, Information Retrieval: Computational and Theoretical Aspects, Academic Press; https://dl.acm.org/citation.cfm?id=539986) states that in a large enough text corpus, the number of types \(N\) grows as a function of the number of tokens \(M\) as \(N = KM^{\beta}\) for some free parameters \(K, \beta\). Much has been written about how this result and various generalizations can be derived from Zipf’s Law (Zipf, George 1949, Human behavior and the principle of least effort, Addison-Wesley; http://dx.doi.org/10.1037/h0052442): Font-Clos, Francesc 2013, A scaling law beyond Zipf’s law and its relation to Heaps’ law, New Journal of Physics 15, 093033 (http://iopscience.iop.org/article/10.1088/1367-2630/15/9/093033); Bernhardsson, S., da Rocha, L. E. C. and Minnhagen, P. 2009, The meta book and size-dependent properties of written language, New Journal of Physics 11, 123015 (http://iopscience.iop.org/article/10.1088/1367-2630/11/12/123015); Bernhardsson, S., Ki Baek and Minnhagen 2011, A paradoxical property of the monkey book, Journal of Statistical Mechanics: Theory and Experiment 2011 (http://iopscience.iop.org/article/10.1088/1742-5468/2011/07/P07013); Milička, Jiří 2009, Type-token & Hapax-token Relation: A Combinatorial Model, Glottotheory 2(1), 99–110 (http://milicka.cz/kestazeni/type-token_relation.pdf); Petersen, Alexander 2012, Languages cool as they expand: Allometric scaling and the decreasing need for new words, Scientific Reports 2, 943 (https://www.nature.com/articles/srep00943). Here we derive from first principles a completely novel expression of the type-token curve and prove its superior accuracy on real text. This expression naturally generalizes to equally accurate estimates for counting hapaxes and higher n-legomena.
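The objects the abstract studies, the type-token curve and the n-legomena counts, are straightforward to compute from a corpus; a minimal sketch (the paper's novel closed-form expression is not reproduced here):

```python
from collections import Counter

def type_token_curve(tokens):
    """Cumulative number of distinct types N after each of M tokens;
    Heaps' Law predicts N ~ K * M**beta for free parameters K, beta."""
    seen, curve = set(), []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve

def legomena(tokens, n):
    """Types occurring exactly n times (n = 1 gives the hapaxes)."""
    return sorted(t for t, c in Counter(tokens).items() if c == n)

tokens = "a b a c d a b e".split()
curve = type_token_curve(tokens)
hapaxes = legomena(tokens, 1)
```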


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Giordano De Marzo ◽  
Andrea Gabrielli ◽  
Andrea Zaccaria ◽  
Luciano Pietronero

2021 ◽  
Vol 7 (s3) ◽  
Author(s):  
Matthew Stave ◽  
Ludger Paschen ◽  
François Pellegrino ◽  
Frank Seifart

Abstract Zipf’s Law of Abbreviation and Menzerath’s Law both make predictions about the length of linguistic units, based on corpus frequency and the length of the carrier unit. Each contributes to the efficiency of languages: for Zipf, units are more likely to be reduced when they are highly predictable, due to their frequency; for Menzerath, units are more likely to be reduced when there are more sub-units to contribute to the structural information of the carrier unit. However, it remains unclear how the two laws work together in determining unit length at a given level of linguistic structure. We examine this question regarding the length of morphemes in spoken corpora of nine typologically diverse languages drawn from the DoReCo corpus, showing that Zipf’s Law is a stronger predictor, but that the two laws interact with one another. We also explore how this is affected by specific typological characteristics, such as morphological complexity.
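Zipf’s Law of Abbreviation, as described above, predicts a negative association between a unit's corpus frequency and its length; a toy sketch of that check (a generic correlation on made-up tokens, not the DoReCo corpora or the authors' models):

```python
from collections import Counter

def abbreviation_correlation(tokens):
    """Pearson correlation between a type's corpus frequency and its
    length in characters. Zipf's Law of Abbreviation predicts a
    negative value: frequent units tend to be shorter."""
    counts = Counter(tokens)
    xs = list(counts.values())          # frequency of each type
    ys = [len(t) for t in counts]       # length of each type
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Short function words dominate; the long rare word appears once.
corpus = ("of of of of the the the in in "
          "antidisestablishmentarianism").split()
r = abbreviation_correlation(corpus)
```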


1987 ◽  
Vol 23 (3) ◽  
pp. 171-182 ◽  
Author(s):  
Ye-Sho Chen ◽  
Ferdinand F. Leimkuhler
