Speech tested for Zipfian fit using rigorous statistical techniques

Paul De Palma; Leon Antonio Garcia-Camargo; Jeb Kilfoyle; Mark Vandam; Joseph Stover

doi:10.3765/plsa.v6i1.4975

Proceedings of the Linguistic Society of America ◽

10.3765/plsa.v6i1.4975 ◽

2021 ◽

Vol 6 (1) ◽

pp. 394

Author(s):

Paul De Palma ◽

Leon Antonio Garcia-Camargo ◽

Jeb Kilfoyle ◽

Mark Vandam ◽

Joseph Stover

Keyword(s):

Error Function ◽

Likelihood Estimation ◽

Statistical Techniques ◽

Power Law Distribution ◽

Zipf's Law ◽

Complementary Error Function ◽

Physics Literature ◽

Kolmogorov Smirnov ◽

Fitting In

Zipf’s law describes the relationship between the frequencies of words in a corpus and their rank. Its most basic form is a simple series, indicating that the frequency of a word is inversely proportional to its rank: 1/2, 1/3, 1/4, ... The past two decades have seen the emergence of usage-based and cognitive approaches to language study. A key observation of these approaches, along with the importance of frequency, is that speech differs in substantial and structural ways from writing. Yet, except for a few older analyses performed on very small corpora, most studies of Zipf’s law have been done on written corpora. Further, the judgement of Zipfianness in much of this work is based on loose and informal criteria. In fact, sophisticated statistical techniques for curve fitting have been developed in recent years in the mathematics and physics literature. These include the use of the Kolmogorov-Smirnov statistic, along with maximum likelihood estimation, to generate p-values, and the use of the complementary error function for normal distributions. The latter helps determine whether a corpus that fails a Zipfian fit might be better described by another distribution. In this paper, we will: (1) show, using rigorous statistical techniques, that three corpora of recorded speech (Buckeye, Santa Barbara, MiCase) follow a power law distribution; (2) describe preliminary results showing that the techniques outlined in this paper may be useful in the diagnosis of conditions that can include disordered speech; (3) explain how to do the analyses described in this paper; (4) explain how to download and use the R/Python code we have written and packaged as the Zipf Tool Kit.
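The fitting step mentioned in this abstract (maximum likelihood estimation plus the Kolmogorov-Smirnov statistic) can be sketched roughly as follows. This is not the Zipf Tool Kit and not necessarily the authors' procedure: the function names are hypothetical, the model is assumed to be a Zipf law truncated at the observed vocabulary size, and the p-value step (usually obtained by comparing the statistic against values from synthetic samples drawn from the fitted law) is left out.

```python
import math
from collections import Counter

def rank_frequencies(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)

def log_likelihood(s, freqs):
    """Log-likelihood of exponent s under p(r) proportional to r**-s, r = 1..len(freqs)."""
    n = sum(freqs)
    log_norm = math.log(sum(r ** -s for r in range(1, len(freqs) + 1)))
    weighted_log_ranks = sum(f * math.log(r) for r, f in enumerate(freqs, start=1))
    return -s * weighted_log_ranks - n * log_norm

def fit_exponent(freqs, lo=0.01, hi=5.0, tol=1e-6):
    """Maximum likelihood estimate of s by ternary search.

    The search bounds are an assumption; the log-likelihood is concave in s,
    so ternary search converges to the maximum inside [lo, hi].
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, freqs) < log_likelihood(m2, freqs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def ks_statistic(freqs, s):
    """Largest gap between the empirical and fitted cumulative distributions over ranks."""
    n = sum(freqs)
    weights = [r ** -s for r in range(1, len(freqs) + 1)]
    norm = sum(weights)
    empirical = fitted = gap = 0.0
    for f, w in zip(freqs, weights):
        empirical += f / n
        fitted += w / norm
        gap = max(gap, abs(empirical - fitted))
    return gap
```

In use, a tokenized transcript would be passed to `rank_frequencies`, the result to `fit_exponent`, and both to `ks_statistic`; a small statistic indicates that the fitted curve tracks the observed rank-frequency distribution closely.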

Estimating the Cost of Car Warranty in Indonesia using the Gertsbakh-Kordonsky Method

InPrime: Indonesian Journal of Pure and Applied Mathematics ◽

10.15408/inprime.v2i1.14556 ◽

2020 ◽

Vol 2 (1) ◽

pp. 33-40

Author(s):

Anggis Sagitarisman ◽

Aceng Komarudin Mutaqin

Keyword(s):

Maximum Likelihood ◽

Likelihood Estimation ◽

Type A ◽

Claim Data ◽

Likelihood Method ◽

Selling Price ◽

One Dimensional ◽

Warranty Period ◽

Kolmogorov Smirnov ◽

Warranty Costs

Abstract Car manufacturers in Indonesia need to set reasonable warranty costs that burden neither companies nor consumers. Several statistical approaches have been developed to analyze warranty costs. One of them is the Gertsbakh-Kordonsky method, which reduces the two-dimensional warranty problem to a one-dimensional one. In this research, we apply the Gertsbakh-Kordonsky method to estimate the warranty cost for car type A at XYZ company. The distribution of the reduced one-dimensional data is checked with the Kolmogorov-Smirnov goodness-of-fit test, and the parameters of the distribution are estimated by the maximum likelihood method. Three approaches are used to estimate the parameters; they differ in how mileage is calculated for units that make no claim within the warranty period. As an application, we use claim data for car type A. Data exploration indicates that failures of car type A are mostly due to the age of the vehicle. The Kolmogorov-Smirnov test shows that the most appropriate distribution for the reduced data is the three-parameter Weibull. The estimate obtained with the Gertsbakh-Kordonsky method puts the warranty cost for car type A at around 3.54% of the selling price of the unit without warranty, i.e. around Rp. 4,248,000 per unit.

Keywords: warranty costs; the Gertsbakh-Kordonsky method; maximum likelihood estimation; Kolmogorov-Smirnov test.

Types, Tokens, and Hapaxes: A New Heap’s Law

Glottotheory ◽

10.1515/glot-2018-0014 ◽

2019 ◽

Vol 9 (2) ◽

pp. 113-129

Author(s):

Victor Davis

Keyword(s):

First Principles ◽

Scaling Law ◽

Written Language ◽

Academic Press ◽

Zipf's Law ◽

New Words ◽

Size Dependent ◽

Link Type ◽

Combinatorial Model

Abstract Heap’s Law (Heaps 1978) states that in a large enough text corpus, the number of types as a function of tokens grows as N = K M^β for some free parameters K, β. Much has been written (Font-Clos 2013; Bernhardsson, da Rocha and Minnhagen 2009; Bernhardsson, Baek and Minnhagen 2011; Milička 2009; Petersen 2012) about how this result and various generalizations can be derived from Zipf’s Law (Zipf 1949). Here we derive from first principles a completely novel expression of the type-token curve and prove its superior accuracy on real text. This expression naturally generalizes to equally accurate estimates for counting hapaxes and higher n-legomena.

References:

Heaps, H S 1978 Information Retrieval: Computational and Theoretical Aspects (Academic Press). https://dl.acm.org/citation.cfm?id=539986

Font-Clos, Francesc 2013 A scaling law beyond Zipf’s law and its relation to Heaps’ law (New Journal of Physics 15 093033). http://iopscience.iop.org/article/10.1088/1367-2630/15/9/093033

Bernhardsson S, da Rocha L E C and Minnhagen P 2009 The meta book and size-dependent properties of written language (New Journal of Physics 11 123015). http://iopscience.iop.org/article/10.1088/1367-2630/11/12/123015

Bernhardsson S, Ki Baek and Minnhagen 2011 A paradoxical property of the monkey book (Journal of Statistical Mechanics: Theory and Experiment, Volume 2011). http://iopscience.iop.org/article/10.1088/1742-5468/2011/07/P07013

Milička, Jiří 2009 Type-token & Hapax-token Relation: A Combinatorial Model (Glottotheory. International Journal of Theoretical Linguistics 2 (1), 99–110). http://milicka.cz/kestazeni/type-token_relation.pdf

Petersen, Alexander 2012 Languages cool as they expand: Allometric scaling and the decreasing need for new words (Scientific Reports volume 2, Article number: 943). https://www.nature.com/articles/srep00943

Zipf, George 1949 Human behavior and the principle of least effort (Reading: Addison-Wesley). http://dx.doi.org/10.1037/h0052442
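For orientation, the classic relation quoted in this abstract, with N types after M tokens growing as N = K M^β, can be fitted with a few lines of Python. This sketch illustrates only that baseline law, not the new expression the paper derives; the function names are hypothetical, and an ordinary least-squares fit in log-log space is assumed as the estimation method.

```python
import math

def type_token_curve(tokens):
    """Return (M, N) pairs: after reading M tokens, N distinct types have been seen."""
    seen = set()
    curve = []
    for m, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((m, len(seen)))
    return curve

def fit_heaps(curve):
    """Least-squares fit of log N = log K + beta * log M; returns (K, beta)."""
    xs = [math.log(m) for m, _ in curve]
    ys = [math.log(n) for _, n in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return math.exp(mean_y - beta * mean_x), beta
```

Calling `fit_heaps(type_token_curve(tokens))` on a tokenized text gives estimates of K and β. Log-log regression weights the early, noisy part of the curve heavily, so the estimates are best read as rough.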

Algorithm 181: complementary error function—large X

Communications of the ACM ◽

10.1145/366604.366638 ◽

1963 ◽

Vol 6 (6) ◽

pp. 315

Author(s):

Henry C. Thacher

Keyword(s):

Error Function ◽

Complementary Error Function

The Distribution of City Sizes in Turkey: A Failure of Zipf’s Law Due to Concavity

Regional Science Policy & Practice ◽

10.1111/rsp3.12449 ◽

2021 ◽

Author(s):

Hasan Engin Duran ◽

Andrzej Cieślik

Keyword(s):

Zipf's Law

Dynamical approach to Zipf's law

Physical Review Research ◽

10.1103/physrevresearch.3.013084 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Giordano De Marzo ◽

Andrea Gabrielli ◽

Andrea Zaccaria ◽

Luciano Pietronero

Keyword(s):

Zipf's Law ◽

Dynamical Approach

Optimization of morpheme length: a cross-linguistic assessment of Zipf’s and Menzerath’s laws

Linguistics Vanguard ◽

10.1515/lingvan-2019-0076 ◽

2021 ◽

Vol 7 (s3) ◽

Author(s):

Matthew Stave ◽

Ludger Paschen ◽

François Pellegrino ◽

Frank Seifart

Keyword(s):

Structural Information ◽

Unit Length ◽

Zipf's Law ◽

Linguistic Structure ◽

Morphological Complexity ◽

Linguistic Assessment ◽

Linguistic Units

Abstract Zipf’s Law of Abbreviation and Menzerath’s Law both make predictions about the length of linguistic units, based on corpus frequency and the length of the carrier unit. Each contributes to the efficiency of languages: for Zipf, units are more likely to be reduced when they are highly predictable, due to their frequency; for Menzerath, units are more likely to be reduced when there are more sub-units to contribute to the structural information of the carrier unit. However, it remains unclear how the two laws work together in determining unit length at a given level of linguistic structure. We examine this question regarding the length of morphemes in spoken corpora of nine typologically diverse languages drawn from the DoReCo corpus, showing that Zipf’s Law is a stronger predictor, but that the two laws interact with one another. We also explore how this is affected by specific typological characteristics, such as morphological complexity.
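One simple way to probe the frequency-length relationship described in this abstract is a rank correlation between unit frequency and unit length. The sketch below is an illustration only: it computes Spearman's coefficient, which is not stated to be the method used in the study, and the function names are hypothetical. A clearly negative value would be consistent with more frequent units being shorter.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold corpus frequencies of morphemes and `ys` their lengths in segments or characters, one pair per morpheme type.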

Analysis of Zipf's law: An index approach

Information Processing & Management ◽

10.1016/0306-4573(87)90002-1 ◽

1987 ◽

Vol 23 (3) ◽

pp. 171-182 ◽

Cited By ~ 15

Author(s):

Ye-Sho Chen ◽

Ferdinand F. Leimkuhler

Keyword(s):

Zipf's Law ◽

Index Approach

Emergence of Zipf’s law in the evolution of communication

Physical Review E ◽

10.1103/physreve.83.036115 ◽

2011 ◽

Vol 83 (3) ◽

Cited By ~ 31

Author(s):

Bernat Corominas-Murtra ◽

Jordi Fortuny ◽

Ricard V. Solé

Keyword(s):

Zipf's Law ◽

Evolution Of Communication

A Time-Series Audit of Zipf's Law as a Measure of Terrane Endowment and Maturity in Mineral Exploration

Economic Geology ◽

10.2113/econgeo.106.2.241 ◽

2011 ◽

Vol 106 (2) ◽

pp. 241-259 ◽

Cited By ~ 20

Author(s):

P. Guj ◽

M. Fallon ◽

T. C. McCuaig ◽

R. Fagan

Keyword(s):

Time Series ◽

Mineral Exploration ◽

Zipf's Law

Leveraging Zipf’s Law to Analyze Statistical Distribution of Chinese Corpus

2021 IEEE International Conference on Software Engineering and Artificial Intelligence (SEAI) ◽

10.1109/seai52285.2021.9477550 ◽

2021 ◽

Author(s):

Qing Lei ◽

Haifeng Li ◽

Rongbin Wei

Keyword(s):

Statistical Distribution ◽

Zipf's Law
