A Theory of Word-Frequency Distribution

Nature ◽

10.1038/1781308a0 ◽

1956 ◽

Vol 178 (4545) ◽

p. 1308 ◽

Author(s):

A. F. PARKER-RHODES ◽

T. JOYCE

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Word Frequency Distribution

Statistical Analysis of Word Frequency Distribution in Lithuanian Texts of Different Genres

Lietuvos statistikos darbai ◽

10.15388/ljs.2016.13868 ◽

2016 ◽

Vol 55 (1) ◽

pp. 61-69

Author(s):

Neringa Bružaitė ◽

Tomas Rekašius

Keyword(s):

Statistical Analysis ◽

Word Frequency ◽

Frequency Distribution ◽

Hierarchical Clustering ◽

Distance Measure ◽

Structural Type ◽

Clustering Method ◽

Jaccard Distance ◽

Word Frequencies ◽

Word Frequency Distribution

The paper examines Lithuanian texts by different authors and in different genres. The main points of interest are the number of words, the number of different words, and the word frequencies. The structural type distribution and Zipf’s law are applied to describe the frequency distribution of words in a text. The lexical diversity of any text can be characterized by the different words used in it, also called its vocabulary. It is shown that the information contained in a reduced vocabulary is enough to divide the texts analyzed in this article into groups by genre and author using a hierarchical clustering method. In this case, distances between clusters are measured using the Jaccard distance measure, and clusters are aggregated using the Ward method.
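The quantities named in this abstract (number of words, number of different words, word frequencies, Jaccard distance between vocabularies) can be illustrated with a minimal Python sketch. It is not the authors' code: the tokenizer, the function names, and the two toy texts are assumptions made for illustration, and the Ward aggregation step is left out because it needs a clustering library.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first.

    Ties in count are broken alphabetically so the output is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


def jaccard_distance(vocab_a, vocab_b):
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two vocabularies (sets of words)."""
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # convention chosen here: two empty vocabularies are identical
    return 1.0 - len(vocab_a & vocab_b) / len(union)


# Hypothetical toy texts, standing in for the corpus used in the paper.
text_a = "the cat sat on the mat"
text_b = "the dog sat on the log"

tokens_a = tokenize(text_a)
tokens_b = tokenize(text_b)
vocab_a, vocab_b = set(tokens_a), set(tokens_b)

print(len(tokens_a), len(vocab_a))         # number of words, number of different words
print(rank_frequency(tokens_a)[:3])        # top of the rank-frequency list
print(jaccard_distance(vocab_a, vocab_b))  # distance between the two vocabularies
```

Under Zipf’s law, the counts in the rank-frequency list would fall off roughly as a power of the rank; a pairwise matrix of such Jaccard distances is the kind of input a hierarchical clustering routine takes.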

From Boltzmann to Zipf through Shannon and Jaynes

Entropy ◽

10.3390/e22020179 ◽

2020 ◽

Vol 22 (2) ◽

pp. 179 ◽

Author(s):

Álvaro Corral ◽

Montserrat García del Muro

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Statistical Physics ◽

Maximum Entropy Principle ◽

Building Blocks ◽

Boltzmann Distribution ◽

Zipf’s Law ◽

Entropy Principle ◽

Word Frequency Distribution

The word-frequency distribution provides the fundamental building blocks that generate discourse in natural language. It is well known, from empirical evidence, that the word-frequency distribution of almost any text is described by Zipf’s law, at least approximately. Following Stephens and Bialek (2010), we interpret the frequency of any word as arising from the interaction potentials between its constituent letters. Indeed, Jaynes’ maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of the all-to-all pairwise (two-letter) potentials. The so-called improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. We considerably extend Stephens and Bialek’s results, applying this formalism to words with a length of up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus. We find that the model is able to reproduce Zipf’s law, but with some limitations: the general Zipf’s power-law regime is obtained, but the probability of individual words shows considerable scattering. In this way, a purely statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
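The construction described in this abstract can be written out as a short sketch. The notation is an assumption, not taken from the paper: a word of length $K$ is a letter string $(\ell_1,\dots,\ell_K)$, $V_{ij}$ is the pairwise potential between positions $i$ and $j$, $Z$ is the normalization constant, and $\rho_{ij}$ is the empirical two-letter marginal.

```latex
% Maximum-entropy (Boltzmann) form with pairwise letter potentials
P(\ell_1,\dots,\ell_K)
  = \frac{1}{Z}\,
    \exp\!\Big[-\sum_{1 \le i < j \le K} V_{ij}(\ell_i,\ell_j)\Big],
\qquad
Z = \sum_{\ell_1,\dots,\ell_K}
    \exp\!\Big[-\sum_{1 \le i < j \le K} V_{ij}(\ell_i,\ell_j)\Big].

% Constraints: every two-letter marginal of P matches the empirical one
\sum_{\{\ell_k \,:\, k \neq i,j\}} P(\ell_1,\dots,\ell_K)
  = \rho_{ij}(\ell_i,\ell_j)
\quad \text{for all } 1 \le i < j \le K.
```

The potentials $V_{ij}$ play the role of Lagrange multipliers for the marginal constraints, which is why they have to be found numerically from the empirical marginals.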

Diversity of vocabulary and the harmonic series law of word-frequency distribution

The Psychological Record ◽

10.1007/bf03393224 ◽

1938 ◽

Vol 2 (16) ◽

pp. 379-386 ◽

Author(s):

J. B. Carroll

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Harmonic Series ◽

Word Frequency Distribution

Random texts exhibit Zipf's-law-like word frequency distribution

IEEE Transactions on Information Theory ◽

10.1109/18.165464 ◽

1992 ◽

Vol 38 (6) ◽

pp. 1842-1845 ◽

Author(s):

W. Li

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Zipf’s Law ◽

Word Frequency Distribution

A French Word-Frequency Distribution Curve

Language ◽

10.2307/410122 ◽

1944 ◽

Vol 20 (4) ◽

pp. 231 ◽

Author(s):

J. Richard Reid

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Distribution Curve ◽

French Word ◽

Frequency Distribution Curve ◽

Word Frequency Distribution

Analysis of Native and Non-native Speakers' English Compositions based on Word-frequency Distribution and Text Statistics

Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval - NLPIR 2019 ◽

10.1145/3342827.3342856 ◽

2019 ◽

Author(s):

Hajime Tsubaki

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Native Speakers ◽

Word Frequency Distribution

Parameter estimation for a word frequency distribution based on occupancy theory

Communications in Statistics - Theory and Methods ◽

10.1080/03610928608829161 ◽

1986 ◽

Vol 15 (3) ◽

pp. 935-949 ◽

Author(s):

H. S. Sichel

Keyword(s):

Parameter Estimation ◽

Word Frequency ◽

Frequency Distribution ◽

Word Frequency Distribution

The Small-World of ‘Le Petit Prince’: Revisiting the Word Frequency Distribution

Digital Scholarship in the Humanities ◽

10.1093/llc/fqw005 ◽

2016 ◽

pp. fqw005

Author(s):

Daniel Gamermann ◽

Carmen Moret-Tatay ◽

Esperanza Navarro-Pardo ◽

Pedro Fernandez de Córdoba Castellá

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Small World ◽

Word Frequency Distribution

An Improved TF-IDF algorithm based on word frequency distribution information and category distribution information

Proceedings of the 3rd International Conference on Intelligent Information Processing - ICIIP '18 ◽

10.1145/3232116.3232152 ◽

2018 ◽

Author(s):

Haoying Wu ◽

Na Yuan

Keyword(s):

Word Frequency ◽

Frequency Distribution ◽

Word Frequency Distribution

Bayesian analysis of Word frequency distribution in context of Indian literature

Journal of Ultra Scientist of Physical Sciences Section A ◽

10.22147/jusps-a/300503 ◽

2018 ◽

Vol 30 (05) ◽

pp. 283-290

Author(s):

VASTOSHPATI SHASTRI ◽

RAKESH RANJAN ◽

PRAVEEN KUMAR TRIPATHI ◽

S. K. UPADHYAY ◽

...

Keyword(s):

Bayesian Analysis ◽

Word Frequency ◽

Frequency Distribution ◽

Indian Literature ◽

Word Frequency Distribution
