On the Mistaken Use of the Chi-Square Test in Benford’s Law

Stats ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 419-453
Author(s):  
Alex Ely Kossovsky

Benford’s Law predicts that the first significant digit on the leftmost side of numbers in real-life data is distributed among the possible digits 1 to 9 approximately as LOG(1 + 1/digit), with LOG the base-10 logarithm, so that low digits occur much more frequently than high digits in the first position. Researchers, data analysts, and statisticians typically rush to apply the chi-square test to verify compliance with, or deviation from, this statistical law. In almost all cases of real-life data this approach is mistaken and lacks a mathematical-statistics basis, yet applying the chi-square test to whatever data set the researcher is considering, regardless of its true applicability, has become a dogma, or rather an impulsive ritual, in the field of Benford’s Law. The mistaken use of the chi-square test has led to much confusion and many errors, and has done much to undermine trust and confidence in the whole discipline of Benford’s Law. This article is an attempt to correct course and bring rationality and order to a field that has demonstrated harmony and consistency in all of its results, manifestations, and explanations. The first research question of this article demonstrates, by examining how several real-life data sets are formed and obtained, that such data sets typically do not arise from random and independent selections of data points from some larger universe of parental data, as the chi-square approach supposes. The second research question demonstrates that the chi-square approach is actually about the reasonableness of the random selection process and the Benford status of that parental universe of data, not solely about the Benford status of the data set under consideration, since the chi-square test focuses exclusively on whether the entire data-selection process was probable or too rare.
In addition, a comparison of the chi-square statistic with the Sum of Squared Deviations (SSD) measure of distance from Benford is explored in this article, pitting one measure against the other, and concluding with a strong preference for the SSD measure.
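A minimal Python sketch of the contrast drawn above, assuming the first-digit counts are already tallied and assuming the SSD is computed on percentage points (observed % minus Benford %, squared and summed over the nine digits). The counts are hypothetical. Because the chi-square statistic multiplies the squared proportion gaps by the sample size N, identical digit proportions give a statistic that grows linearly with N, whereas the SSD depends only on the proportions.

```python
import math

# Benford first-digit probabilities P(d) = log10(1 + 1/d); index 0 is digit 1
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square(counts):
    """Pearson chi-square statistic for nine first-digit counts against Benford."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, BENFORD))


def ssd(counts):
    """Sum of squared deviations from Benford, in percentage points (assumed scale)."""
    n = sum(counts)
    return sum((100 * c / n - 100 * p) ** 2 for c, p in zip(counts, BENFORD))


# Hypothetical first-digit counts for digits 1..9 (N = 1000)
counts = [290, 180, 120, 100, 80, 70, 60, 55, 45]

# Same proportions at larger N: chi-square scales with N, SSD stays fixed
for factor in (1, 10, 100):
    scaled = [factor * c for c in counts]
    print(sum(scaled), round(chi_square(scaled), 2), round(ssd(scaled), 2))
```

In the printed rows the chi-square column grows by a factor of 10 per row while the SSD column does not change, which is one way to see why a large data set can "fail" the chi-square test on deviations that are practically negligible.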

Author(s):  
Arno Berger ◽  
T. P. Hill

This chapter embarks on a brief discussion of the mathematical theory of Benford's law. This law is the observation that in many collections of numbers, be they mathematical tables, real-life data, or combinations thereof, the leading significant digits are not uniformly distributed, as might be expected, but are heavily skewed toward the smaller digits. More specifically, Benford's law states that the significant digits in many data sets follow a very particular logarithmic distribution. The chapter lays out the basic theory of Benford's law before highlighting its more specific components: the significant digits and the significand (function), as well as the Benford property and its four characterizations. Finally, the chapter presents the basic theory of Benford's law in the context of deterministic and random processes.
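The significand and first-digit functions mentioned above can be illustrated with a short Python sketch, together with a classic deterministic example: the powers of 2, whose leading digits follow Benford’s law because log10 2 is irrational. This is a floating-point illustration, not the chapter’s formal definitions, and the function names are hypothetical.

```python
import math
from collections import Counter


def significand(x):
    """S(x) in [1, 10) with |x| = S(x) * 10**k for an integer k (nonzero x only).

    Floating-point sketch: values extremely close to a power of 10 may round.
    """
    if x == 0:
        raise ValueError("this sketch handles nonzero x only")
    t = math.log10(abs(x))
    return 10 ** (t - math.floor(t))


def first_digit(x):
    """D1(x) = floor(S(x)), read from a 17-significant-digit decimal string
    so that values such as 3.0 are not misread as 2.999... after rounding."""
    return int(f"{abs(x):.16e}"[0])


# Deterministic example: leading digits of 2**n for n = 0..N-1 (exact integer strings)
N = 1000
counts = Counter(int(str(2 ** n)[0]) for n in range(N))
for d in range(1, 10):
    print(d, counts[d] / N, round(math.log10(1 + 1 / d), 4))
```

Each printed row pairs the observed frequency of a leading digit among the first 1000 powers of 2 with the Benford probability log10(1 + 1/d).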


Author(s):  
Muhammad H. Tahir ◽  
Muhammad Adnan Hussain ◽  
Gauss Cordeiro ◽  
Mahmoud El-Morshedy ◽  
Mohammed S. Eliwa

For the bounded unit interval, we propose a new Kumaraswamy generalized (G) family of distributions from a new generator, which could be an alternative to the Kumaraswamy-G family proposed earlier by Cordeiro and de Castro in 2011. This new generator can also be used to develop alternative G-classes such as beta-G, McDonald-G, Topp-Leone-G, Marshall-Olkin-G and Transmuted-G for the bounded unit interval. Some mathematical properties of the new family are obtained, and the maximum likelihood method is used to estimate the family parameters. We investigate the properties of one special model, called the new Kumaraswamy-Weibull (NKwW) distribution. Parameter estimation is addressed, and the maximum likelihood estimators are assessed through a simulation study. Two real-life data sets are analyzed to illustrate the importance and flexibility of this distribution. In fact, this model outperforms several generalized Weibull models, such as the Kumaraswamy-Weibull, McDonald-Weibull, beta-Weibull, exponentiated-generalized Weibull, gamma-Weibull, odd log-logistic-Weibull, Marshall-Olkin-Weibull, transmuted-Weibull and exponentiated-Weibull distributions, as well as the Weibull distribution, when applied to these data sets. A bivariate extension of the family is proposed and the estimation of its parameters is given. The usefulness of the bivariate NKwW model is illustrated empirically by means of a real-life data set.
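The new generator itself is not specified in this abstract, so the Python sketch below shows only the earlier Kumaraswamy-G construction of Cordeiro and de Castro (2011) that it is offered as an alternative to, F(x) = 1 - [1 - G(x)^a]^b, applied to a Weibull baseline G. This yields the Kumaraswamy-Weibull comparator named above, not the NKwW model; the Weibull parameterization (shape, scale) is an assumption.

```python
import math


def weibull_cdf(x, shape, scale):
    """Baseline Weibull CDF G(x) = 1 - exp(-(x/scale)**shape), x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_pdf(x, shape, scale):
    """Baseline Weibull density g(x) = (shape/scale) z**(shape-1) exp(-z**shape), z = x/scale."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def kw_weibull_cdf(x, a, b, shape, scale):
    """Kumaraswamy-G CDF F = 1 - (1 - G**a)**b with a Weibull baseline G."""
    G = weibull_cdf(x, shape, scale)
    return 1.0 - (1.0 - G ** a) ** b


def kw_weibull_pdf(x, a, b, shape, scale):
    """Density f = a*b*g*G**(a-1)*(1 - G**a)**(b-1), the derivative of the CDF above."""
    if x <= 0:
        return 0.0
    G = weibull_cdf(x, shape, scale)
    g = weibull_pdf(x, shape, scale)
    return a * b * g * G ** (a - 1) * (1.0 - G ** a) ** (b - 1)
```

Setting a = b = 1 recovers the Weibull baseline, which gives a quick sanity check on an implementation of any Kumaraswamy-type extension.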


Mathematics ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 1989
Author(s):  
Muhammad H. Tahir ◽  
Muhammad Adnan Hussain ◽  
Gauss M. Cordeiro ◽  
M. El-Morshedy ◽  
M. S. Eliwa

For the bounded unit interval, we propose a new Kumaraswamy generalized (G) family of distributions through a new generator, which could be an alternative to the Kumaraswamy-G family proposed earlier by Cordeiro and de Castro in 2011. This new generator can also be used to develop alternative G-classes such as beta-G, McDonald-G, Topp-Leone-G, Marshall-Olkin-G, and Transmuted-G for the bounded unit interval. Some mathematical properties of the new family are obtained, and the maximum likelihood method is used to estimate the G-family parameters. We investigate the properties of one special model, called the new Kumaraswamy-Weibull (NKwW) distribution. The parameters of the NKwW model are estimated by the maximum likelihood method, and the performance of these estimators is assessed through a simulation study. Two real-life data sets are analyzed to illustrate the importance and flexibility of the proposed model. In fact, this model outperforms several generalized Weibull models, such as the Kumaraswamy-Weibull, McDonald-Weibull, beta-Weibull, exponentiated-generalized Weibull, gamma-Weibull, odd log-logistic-Weibull, Marshall-Olkin-Weibull, transmuted-Weibull and exponentiated-Weibull distributions, when applied to these data sets. A bivariate extension of the family is also proposed, and the estimation of its parameters is addressed. The usefulness of the bivariate NKwW model is illustrated empirically by means of a real-life data set.
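A simulation study of maximum likelihood estimators needs random draws from the model. As a hedged sketch, and again using the Cordeiro and de Castro (2011) Kumaraswamy-G construction with a Weibull baseline (the Kumaraswamy-Weibull comparator, not the NKwW model, whose generator is not given here), inverting F(x) = 1 - [1 - G(x)^a]^b gives G(x) = [1 - (1 - u)^(1/b)]^(1/a). Uniform draws u can therefore be mapped to samples by inverse-transform sampling. The parameter values below are hypothetical.

```python
import math
import random


def kw_weibull_cdf(x, a, b, shape, scale):
    """Kumaraswamy-G CDF 1 - (1 - G**a)**b with Weibull baseline G (assumed parameterization)."""
    if x <= 0:
        return 0.0
    G = 1.0 - math.exp(-((x / scale) ** shape))
    return 1.0 - (1.0 - G ** a) ** b


def kw_weibull_quantile(u, a, b, shape, scale):
    """Inverse CDF: solve for the baseline value G, then invert the Weibull baseline."""
    G = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    G = min(G, 1.0 - 2.0 ** -53)  # keep log(1 - G) finite when u is extremely close to 1
    return scale * (-math.log(1.0 - G)) ** (1.0 / shape)


def kw_weibull_sample(n, a, b, shape, scale, seed=None):
    """Draw n values by inverse-transform sampling with u ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    return [kw_weibull_quantile(rng.random(), a, b, shape, scale) for _ in range(n)]


sample = kw_weibull_sample(5000, a=2.0, b=1.5, shape=1.2, scale=3.0, seed=1)
```

In a full simulation study, each such sample would be refitted by maximum likelihood and the estimates compared with the generating parameters (bias, mean squared error).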


2021 ◽  
Vol 19 (1) ◽  
pp. 2-20
Author(s):  
Piyush Kant Rai ◽  
Alka Singh ◽  
Muhammad Qasim

This article introduces calibration estimators under different distance measures, based on two auxiliary variables, in stratified sampling. The theory of the calibration estimator is presented, and the calibrated weights based on the different distance functions are derived. A simulation study has been carried out to judge the performance of the proposed estimators using the minimum relative root mean squared error criterion. A real-life data set is also used to confirm the superiority of the proposed method.
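As one concrete instance of a distance function, the Python sketch below uses the chi-square distance of Deville and Särndal with unit scale factors, for which the calibrated weights have the closed form w_k = d_k(1 + x_k'λ). It calibrates design weights d_k within a single stratum to known totals of two auxiliary variables; in stratified sampling the same step would be applied stratum by stratum. The paper’s other distance functions are not reproduced, and the data are hypothetical.

```python
def calibrate_two_aux(d, x1, x2, total1, total2):
    """Chi-square-distance calibration weights w_k = d_k * (1 + lam1*x1_k + lam2*x2_k)
    chosen so that sum(w*x1) == total1 and sum(w*x2) == total2."""
    # T = sum_k d_k x_k x_k' (a symmetric 2x2 matrix)
    t11 = sum(dk * a * a for dk, a in zip(d, x1))
    t12 = sum(dk * a * b for dk, a, b in zip(d, x1, x2))
    t22 = sum(dk * b * b for dk, b in zip(d, x2))
    # Gap between known totals and the design-weighted (Horvitz-Thompson) estimates
    g1 = total1 - sum(dk * a for dk, a in zip(d, x1))
    g2 = total2 - sum(dk * b for dk, b in zip(d, x2))
    det = t11 * t22 - t12 * t12
    if det == 0:
        raise ValueError("auxiliary variables are collinear in this sample")
    # lambda = T^{-1} g, solved explicitly for the 2x2 case
    lam1 = (t22 * g1 - t12 * g2) / det
    lam2 = (t11 * g2 - t12 * g1) / det
    return [dk * (1 + lam1 * a + lam2 * b) for dk, a, b in zip(d, x1, x2)]


d = [10.0, 10.0, 10.0, 10.0]   # hypothetical design weights
x1 = [2.0, 4.0, 6.0, 8.0]      # first auxiliary variable
x2 = [1.0, 3.0, 2.0, 5.0]      # second auxiliary variable
y = [5.0, 9.0, 12.0, 18.0]     # study variable
w = calibrate_two_aux(d, x1, x2, total1=210.0, total2=120.0)
y_total_cal = sum(wk * yk for wk, yk in zip(w, y))  # calibrated estimate of the y total
```

When the known totals already equal the design-weighted estimates, λ is zero and the design weights are returned unchanged.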


2020 ◽  
Vol 27 (4) ◽  
pp. 1349-1359
Author(s):  
Fernando Antonio Ignacio González

Purpose: This paper aims to detect anomalous data in income reports from Argentina, including personal income (from a sample of households) and the statements of public officials.
Design/methodology/approach: Benford’s Law, a widely known technique in forensic accounting, is used, with the chi-square test and the mean absolute deviation used for verification. The databases consulted include the income declared by households in the Permanent Household Survey for the 2003-2017 period and the asset declarations of high-ranking public officials for the 1999-2017 period.
Findings: The results suggest that income reported in the Encuesta Permanente de Hogares does not follow a Benford distribution, and that its degree of conformity decreases significantly between 2007 and 2015, coinciding with the period during which the Instituto Nacional de Estadística y Censos was under intervention. The asset declarations of public officials show an acceptable level of compliance with Benford’s law, especially those of the legislative branch (in more than 90% of cases), although to a lesser extent those of officials of the executive branch.
Practical implications: The results suggest that income reports from the Permanent Household Survey for the period 2007-2015 should be used with reservations because of their possible manipulation.
Originality/value: During the intervention of Argentina’s official statistics institute (2007-2015), the view that its reports lacked credibility became widespread. To date, however, there has been no empirical evidence on income to support it.
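A first-digit conformity check of the kind described above can be sketched in Python as follows. It computes the chi-square statistic, compared here with the 5% critical value of the chi-square distribution with 8 degrees of freedom (about 15.507), and the mean absolute deviation (MAD) between observed and Benford first-digit proportions. This is a generic illustration, not the author’s code; the input values are hypothetical, and any MAD conformity cut-offs would have to be taken from the Benford literature.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CHI2_CRIT_8DF_5PCT = 15.507  # chi-square critical value, 8 degrees of freedom, alpha = 0.05


def first_digit(x):
    """Leading significant digit, read from a 17-significant-digit decimal string."""
    return int(f"{abs(x):.16e}"[0])


def benford_first_digit_test(values):
    """Chi-square statistic and MAD of first digits against Benford (zeros skipped)."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    observed = {d: counts.get(d, 0) / n for d in range(1, 10)}
    mad = sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9
    chi2 = sum((counts.get(d, 0) - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
               for d in range(1, 10))
    return {"n": n, "mad": mad, "chi2": chi2, "reject_at_5pct": chi2 > CHI2_CRIT_8DF_5PCT}


# Hypothetical reported amounts
print(benford_first_digit_test([1520, 2300, 1180, 970, 4410, 1325, 610, 1890, 2750, 1100]))
```

Reporting the MAD alongside the chi-square statistic is useful because the MAD, unlike chi-square, does not grow with the number of records.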


Author(s):  
Mohamed Ibrahim Mohamed ◽  
Laba Handique ◽  
Subrata Chakraborty ◽  
Nadeem Shafique Butt ◽  
Haitham M. Yousof

In this article, an attempt is made to introduce a new extension of the Fréchet model, called the Xgamma Fréchet model. Some of its properties are derived. Parameter estimation via different methods is discussed. The performance of the proposed estimation methods is investigated through simulations as well as real-life data sets. The potential of the proposed model is established through the modelling of two real-life data sets. The results show a clear preference for the proposed model over several known competing models.


2020 ◽  
Vol 18 (2) ◽  
pp. 2-13
Author(s):  
Oyebayo Ridwan Olaniran ◽  
Mohd Asrul Affendi Abdullah

A new Bayesian estimation procedure for the extended Cox model with a time-varying covariate is presented. The prior is determined using a bootstrapping technique within the framework of parametric empirical Bayes. The efficiency of the proposed method was examined using Monte Carlo simulation of the extended Cox model with time-varying covariates under varying scenarios. The validity of the proposed method was also ascertained using the real-life Stanford heart transplant data set. Comparison of the proposed method with its competitor showed an appreciable advantage for the proposed method.
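For orientation, a Bayesian fit of the extended Cox model combines the partial likelihood with a prior on the regression coefficient. The Python sketch below evaluates the Cox partial log-likelihood for one time-varying covariate in counting-process (start, stop] form, the layout commonly used for the Stanford heart transplant data, and adds a normal log-prior. The paper’s bootstrap-based empirical Bayes prior is not reproduced: prior_mean and prior_sd are hypothetical inputs, ties use the Breslow approximation, and the records are made up.

```python
import math


def partial_loglik(beta, records):
    """Cox partial log-likelihood for a single time-varying covariate.

    records: (start, stop, event, x) rows; each subject has one row per interval
    (start, stop] on which its covariate x is constant, with event = 1 only on
    the row ending at the subject's event time. Ties: Breslow approximation.
    """
    ll = 0.0
    for _, t, event, x_i in records:
        if event != 1:
            continue
        # Risk set at t: every row whose interval (start, stop] contains t
        denom = sum(math.exp(beta * x_j) for s, e, _, x_j in records if s < t <= e)
        ll += beta * x_i - math.log(denom)
    return ll


def log_posterior(beta, records, prior_mean, prior_sd):
    """Partial log-likelihood plus a normal log-prior (up to an additive constant)."""
    return partial_loglik(beta, records) - 0.5 * ((beta - prior_mean) / prior_sd) ** 2


# Hypothetical records: x = 1 after transplant, 0 before
records = [
    (0, 50, 0, 0), (50, 100, 1, 1),   # subject A: transplanted at 50, dies at 100
    (0, 30, 1, 0),                    # subject B: dies at 30 without transplant
    (0, 80, 0, 0), (80, 120, 0, 1),   # subject C: transplanted at 80, censored at 120
]

# Crude grid search for the posterior mode (illustration only)
grid = [i / 100 for i in range(-300, 301)]
beta_map = max(grid, key=lambda b: log_posterior(b, records, prior_mean=0.0, prior_sd=1.0))
```

Splitting each subject into intervals is what lets the covariate change over time: subject A is counted as untransplanted in the risk set at t = 30 and as transplanted at t = 100.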


The use of calibration estimation techniques in survey sampling has been found to improve the precision of estimators. This paper adopts the calibration approach, under the assumption that the population median of the auxiliary variable is known, to obtain a more efficient ratio-type estimator of the population median in stratified sampling. Conditions necessary for efficiency comparison have been obtained, which show that the proposed estimator will always perform better than the existing asymptotically unbiased separate estimators in stratified random sampling. Numerical evaluations have been carried out through simulation and real-life data to complement the theoretical claims. Results from the simulation study, carried out under three distributional assumptions (the chi-square, lognormal and Cauchy distributions) with different sample settings, showed that the new estimator provided better estimates of the median with greater gains in efficiency. In addition, results from the real-life data further support the superiority of the proposed estimator over the existing ones considered in this study.
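A basic ratio-type estimator of a population median can be sketched as M_hat = m_y · M_X / m_x, where m_y and m_x are the sample medians of the study and auxiliary variables and M_X is the known population median of the auxiliary variable. Treating this as the general form the abstract starts from is an assumption: the sketch applies it within a single stratum, the paper’s calibrated stratified estimator is not reproduced, and the data are hypothetical.

```python
import statistics


def ratio_median_estimate(y_sample, x_sample, pop_median_x):
    """Ratio-type estimator of the population median of y: m_y * M_X / m_x."""
    m_y = statistics.median(y_sample)
    m_x = statistics.median(x_sample)
    if m_x == 0:
        raise ValueError("sample median of x must be nonzero")
    return m_y * pop_median_x / m_x


x = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical auxiliary values in the sample
y = [30.0, 41.0, 22.0, 55.0, 36.0]  # hypothetical study-variable values
estimate = ratio_median_estimate(y, x, pop_median_x=16.0)
```

The estimator gains precision when y is roughly proportional to x: if y = c·x exactly, it returns c·M_X with no sampling error in the ratio.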


Author(s):  
Adebisi Ade Ogunde ◽  
Gbenga Adelekan Olalude ◽  
Donatus Osaretin Omosigho

In this paper we introduce the Gompertz Gumbel II (GG II) distribution, which generalizes the Gumbel II distribution. The new distribution is a flexible exponential-type distribution that can be used to model real-life data with varying degrees of asymmetry. Unlike the Gumbel II distribution, which exhibits a monotone decreasing failure rate, the new distribution is useful for modeling unimodal (upside-down bathtub-shaped) failure rates, which sometimes characterize real-life data. Structural properties of the new distribution, namely the density function, hazard function, moments, quantile function, moment generating function, order statistics, stochastic ordering and Rényi entropy, were obtained. For the main formulas related to our model, we present numerical studies that illustrate the practicality of computational implementation using statistical software. We also present a Monte Carlo simulation study to evaluate the performance of the maximum likelihood estimators for the GG II model. Three life data sets were used in applications to illustrate the flexibility of the new model.
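The abstract does not give the construction, so the following Python sketch assumes, as the name suggests, that GG II applies the Gompertz-G generator F(x) = 1 - exp{(θ/γ)[1 - (1 - G(x))^(-γ)]} (θ, γ > 0) to the Gumbel type II baseline G(x) = exp(-b x^(-a)), x > 0. Under that assumption, the hazard simplifies to h(x) = θ g(x)[1 - G(x)]^(-γ-1), which is convenient for inspecting failure-rate shapes numerically. All parameter values are hypothetical.

```python
import math


def gumbel2_cdf(x, a, b):
    """Gumbel type II CDF G(x) = exp(-b * x**(-a)), x > 0."""
    return math.exp(-b * x ** (-a))


def gumbel2_pdf(x, a, b):
    """Gumbel type II density g(x) = a*b*x**(-a-1)*exp(-b*x**(-a))."""
    return a * b * x ** (-a - 1) * math.exp(-b * x ** (-a))


def gg2_cdf(x, theta, gamma, a, b):
    """Assumed Gompertz-G CDF with Gumbel II baseline."""
    G = gumbel2_cdf(x, a, b)
    return 1.0 - math.exp((theta / gamma) * (1.0 - (1.0 - G) ** (-gamma)))


def gg2_pdf(x, theta, gamma, a, b):
    """Density: theta*g*(1-G)**(-gamma-1) times the survival function."""
    G = gumbel2_cdf(x, a, b)
    survival = math.exp((theta / gamma) * (1.0 - (1.0 - G) ** (-gamma)))
    return theta * gumbel2_pdf(x, a, b) * (1.0 - G) ** (-gamma - 1) * survival


def gg2_hazard(x, theta, gamma, a, b):
    """Hazard h = f / (1 - F), which reduces to theta*g*(1-G)**(-gamma-1)."""
    G = gumbel2_cdf(x, a, b)
    return theta * gumbel2_pdf(x, a, b) * (1.0 - G) ** (-gamma - 1)


# Hazard on a grid, to inspect its shape for one hypothetical parameter set
for x in (0.5, 1.0, 1.5, 2.0, 3.0, 5.0):
    print(x, round(gg2_hazard(x, theta=0.8, gamma=0.5, a=2.0, b=1.0), 4))
```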

