A French Word-Frequency Distribution Curve

Language ◽  
1944 ◽  
Vol 20 (4) ◽  
pp. 231 ◽  
Author(s):  
J. Richard Reid
2016 ◽  
Vol 55 (1) ◽  
pp. 61-69
Author(s):  
Neringa Bružaitė ◽  
Tomas Rekašius

The paper examines Lithuanian texts by different authors and of different genres. The main points of interest are the number of words, the number of different words, and word frequencies. The structural type distribution and Zipf’s law are applied to describe the frequency distribution of words in a text. The lexical diversity of any text can be characterized by the set of different words used in it, also called its vocabulary. It is shown that the information contained in a reduced vocabulary is enough to divide the texts analyzed in this article into groups by genre and author using a hierarchical clustering method. Distances between clusters are measured using the Jaccard distance, and clusters are aggregated using the Ward method.
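The quantities named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the tokenizer, the choice of "reduced vocabulary" as the `top_k` most frequent words, and all function names are assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    # Assumption: words are runs of letters, compared case-insensitively.
    return re.findall(r"[^\W\d_]+", text.lower())


def rank_frequency(text):
    # (rank, frequency) pairs, most frequent word first; Zipf's law predicts
    # that frequency falls off roughly as a power of rank.
    counts = Counter(tokenize(text))
    return [(rank, freq)
            for rank, (_, freq) in enumerate(counts.most_common(), start=1)]


def reduced_vocabulary(text, top_k):
    # Assumption: "reduced vocabulary" is taken to be the top_k most frequent words.
    counts = Counter(tokenize(text))
    return {word for word, _ in counts.most_common(top_k)}


def jaccard_distance(a, b):
    # 1 - |A intersect B| / |A union B|; two empty sets are treated as identical.
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(vocabularies):
    # Symmetric matrix of pairwise Jaccard distances between vocabularies.
    n = len(vocabularies)
    return [[jaccard_distance(vocabularies[i], vocabularies[j])
             for j in range(n)] for i in range(n)]
```

The resulting matrix is the kind of input a hierarchical clustering routine would consume; the Ward aggregation step itself is not shown here.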


2012 ◽  
Vol 6 (8) ◽  
pp. 1161-1169
Author(s):  
I.A. Sajid ◽  
M.M. Ahmed ◽  
S.G. Ziavras

Nature ◽  
1956 ◽  
Vol 178 (4545) ◽  
pp. 1308-1308 ◽  
Author(s):  
A. F. PARKER-RHODES ◽  
T. JOYCE

Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 179 ◽  
Author(s):  
Álvaro Corral ◽  
Montserrat García del Muro

The word-frequency distribution provides the fundamental building blocks that generate discourse in natural language. It is well known, from empirical evidence, that the word-frequency distribution of almost any text is described by Zipf’s law, at least approximately. Following Stephens and Bialek (2010), we interpret the frequency of any word as arising from the interaction potentials between its constituent letters. Indeed, Jaynes’ maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of the all-to-all pairwise (two-letter) potentials. The so-called improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. We considerably extend Stephens and Bialek’s results, applying this formalism to words with length of up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus. We find that the model is able to reproduce Zipf’s law, but with some limitations: the general Zipf’s power-law regime is obtained, but the probability of individual words shows considerable scattering. In this way, a pure statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
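The form of the model described in this abstract can be sketched for a toy case. The code below is an illustration only, not the authors' implementation: it assumes the sign convention P(w) ∝ exp(−E(w)), enumerates all strings by brute force (feasible only for tiny alphabets), takes the potentials as given rather than fitting them by improved iterative scaling, and uses hypothetical names throughout.

```python
import itertools
import math


def boltzmann_word_probs(alphabet, length, potentials):
    """Probabilities of all strings of a fixed length under pairwise potentials.

    potentials[(i, j)][(a, b)] is the energy contribution of letter a at
    position i together with letter b at position j (i < j); missing
    entries count as zero. Assumed convention: P(w) = exp(-E(w)) / Z.
    """
    pairs = list(itertools.combinations(range(length), 2))

    def energy(word):
        # Sum of all-to-all two-letter potentials for this string.
        return sum(potentials.get((i, j), {}).get((word[i], word[j]), 0.0)
                   for i, j in pairs)

    weights = {"".join(w): math.exp(-energy(w))
               for w in itertools.product(alphabet, repeat=length)}
    z = sum(weights.values())  # partition function
    return {w: x / z for w, x in weights.items()}


def pair_marginal(probs, i, j):
    # Two-letter marginal of the model for positions i and j; in the
    # maximum-entropy fit these are matched to the empirical marginals.
    marginal = {}
    for word, p in probs.items():
        key = (word[i], word[j])
        marginal[key] = marginal.get(key, 0.0) + p
    return marginal
```

With all potentials set to zero the model reduces to a uniform distribution over strings; lowering the energy of one letter pair raises the probability of every string containing that pair at those positions.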


The Lancet ◽  
1966 ◽  
Vol 288 (7456) ◽  
pp. 185-187 ◽  
Author(s):  
Ronald Finn ◽  
P.O. Jones ◽  
M.C.K. Tweedie ◽  
Sybil M. Hall ◽  
Olive F. Dinsdale ◽  
...  

1952 ◽  
Vol 25 (2) ◽  
pp. 315-320
Author(s):  
M. van den Tempel

1. The use of visible light in determining the average particle size or the particle-size distribution in Hevea latex renders the results meaningless, as only about 40 per cent of the particles have a diameter of more than 0.2 µ.
2. In considering the size-frequency distribution curve as determined by Lucas, it could be assumed that, actually, the number of particles having a diameter of less than 0.12 µ might be very much larger than has been indicated by him. The agreement with the determination of the number of particles by van Gils, however, may be taken as evidence in favor of the correctness of the curve as given by Lucas.
3. An expression has been given which describes the size-frequency distribution curve, as found by Lucas, with a high degree of accuracy. It is necessary to assume that no particles larger than 5.2 µ are present in the latex.
4. Attention is directed to the considerable difference existing between the various average diameters, caused by the strongly asymmetrical shape of the size-frequency distribution curve.

This work forms part of the program of fundamental research on latex problems undertaken by the Research Department of the Rubber-Stichting, Delft, under the management of H. C. J. de Decker.
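The point about differing average diameters can be illustrated with moment-ratio means. This is a sketch under assumptions: the abstract does not say which averages were compared, the moment-ratio definition used here is a common convention rather than the paper's, and the histogram values are invented purely for illustration, not taken from Lucas or van den Tempel.

```python
def moment_mean_diameter(diameters, counts, p, q):
    """Moment-ratio mean D[p, q] = (sum n*d^p / sum n*d^q)^(1/(p-q)), p > q.

    D[1, 0] is the number-average diameter; D[3, 2] weights particles by
    surface area and D[4, 3] by volume.
    """
    num = sum(n * d ** p for d, n in zip(diameters, counts))
    den = sum(n * d ** q for d, n in zip(diameters, counts))
    return (num / den) ** (1.0 / (p - q))


# Hypothetical, strongly skewed histogram: many small particles, few large ones.
example_diameters = [0.1, 0.5, 2.0]   # illustrative values only
example_counts = [900, 90, 10]        # illustrative values only

number_mean = moment_mean_diameter(example_diameters, example_counts, 1, 0)
surface_mean = moment_mean_diameter(example_diameters, example_counts, 3, 2)
volume_mean = moment_mean_diameter(example_diameters, example_counts, 4, 3)
```

For a skewed distribution of this kind the higher-moment means are dominated by the few large particles, so the three values differ widely; for particles of a single size they coincide.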

