A data transformation process for using Benford’s Law with bounded data

2021 ◽  
Vol 3 ◽  
pp. 29
Author(s):  
Daniel McCarville

Benford’s Law is an empirical observation about the frequency of digits in a variety of naturally occurring data sets. Auditors and forensic scientists have used Benford’s Law to detect erroneous data in accounting and legal settings. One well-known limitation is that Benford’s Law fails when data have clear minimum and maximum values. Many kinds of education data, including assessment scores, typically include hard maximums and therefore do not meet the parametric assumptions of Benford’s Law. This paper implements a transformation procedure which allows assessment data to be compared to Benford’s Law. As a case study, the procedure is applied in a data quality assessment of oral language scores from the Early Childhood Longitudinal Study, Kindergarten (ECLS-K), and higher-risk data segments are detected. The same method could be used to evaluate other concerns, such as test fraud, or other bounded datasets.

Author(s):  
Lawrence Leemis

This chapter switches from the traditional analysis of Benford's law using data sets to a search for probability distributions that obey Benford's law. It begins by briefly discussing the origins of Benford's law through the independent efforts of Simon Newcomb (1835–1909) and Frank Benford, Jr. (1883–1948), both of whom made their discoveries through empirical data. Although Benford's law applies to a wide variety of data sets, none of the popular parametric distributions, such as the exponential and normal distributions, agree exactly with Benford's law. The chapter thus highlights the failures of several of these well-known probability distributions in conforming to Benford's law, considers what types of probability distributions might produce data that obey Benford's law, and looks at some of the geometry associated with these probability distributions.
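The claim that the exponential distribution does not agree exactly with Benford's law can be checked numerically. The following is a minimal sketch, not the chapter's own derivation: the rate parameter of 1, the function name, and the range of decades summed over are all assumptions chosen for illustration.

```python
import math

def exponential_first_digit_probability(digit, rate=1.0, k_min=-20, k_max=5):
    """P(first significant digit == digit) for an Exponential(rate) variable.

    Sums P(digit * 10**k <= X < (digit + 1) * 10**k) over decades k,
    using the survival function P(X >= x) = exp(-rate * x).
    The decade range k_min..k_max is an illustrative truncation.
    """
    total = 0.0
    for k in range(k_min, k_max + 1):
        low = digit * 10.0 ** k
        high = (digit + 1) * 10.0 ** k
        total += math.exp(-rate * low) - math.exp(-rate * high)
    return total

# First-digit probabilities under the exponential model and under Benford's law.
exponential = {d: exponential_first_digit_probability(d) for d in range(1, 10)}
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: exponential {exponential[d]:.4f}  Benford {benford[d]:.4f}")
```

With a rate of 1, digit 1 comes out at roughly 0.33 under the exponential model against roughly 0.30 under Benford's law: close, but not an exact match.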


2004 ◽  
Vol 33 (1) ◽  
pp. 229-246 ◽  
Author(s):  
Christina Lynn Geyer ◽  
Patricia Pepple Williamson

2021 ◽  
Vol 16 (1) ◽  
pp. 73-79
Author(s):  
Vitor Hugo Moreau

Reporting of daily new cases and deaths from COVID-19 is one of the main tools to understand and manage the pandemic. However, governments and health authorities worldwide follow divergent procedures when registering and reporting their data. Most of the bias in those procedures is influenced by economic and political pressures and may lead to intentional or unintentional data corruption, which can mask crucial information. Benford’s law is a statistical phenomenon extensively used to detect data corruption in large data sets. Here, we used Benford’s law to screen for and detect inconsistencies in data on daily new cases of COVID-19 reported by 80 countries. Data from 26 countries display severe nonconformity to Benford’s law (p < 0.01), which may suggest data corruption or manipulation.
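The abstract does not say which conformity test was used. As an illustration only, the sketch below assumes a Pearson chi-square goodness-of-fit test on first digits, compared against the 1% critical value for 8 degrees of freedom (20.090). The function names and both sample series are hypothetical and are not data from the study.

```python
import math
from collections import Counter

# Upper 1% point of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_DF8_P01 = 20.090

def first_digit(n):
    """Leading digit of a nonzero integer count."""
    return int(str(abs(int(n)))[0])

def benford_chi_square(counts):
    """Pearson chi-square statistic of first digits against Benford's law."""
    digits = [first_digit(n) for n in counts if int(n) != 0]
    observed = Counter(digits)
    total = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat

# Hypothetical series: powers of two track Benford's law closely,
# while the integers 100..999 have uniformly distributed first digits.
conforming = [2 ** k for k in range(200)]
nonconforming = list(range(100, 1000))

for name, series in [("conforming", conforming), ("nonconforming", nonconforming)]:
    stat = benford_chi_square(series)
    verdict = "reject at p < 0.01" if stat > CHI2_CRITICAL_DF8_P01 else "no rejection"
    print(f"{name}: chi-square = {stat:.2f} ({verdict})")
```

A rejection here only signals that the first digits depart from the Benford proportions. As the abstract notes, that may suggest corruption or manipulation but does not establish it.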


2009 ◽  
Vol 28 (2) ◽  
pp. 305-324 ◽  
Author(s):  
Mark J. Nigrini ◽  
Steven J. Miller

SUMMARY: Auditors are required to use analytical procedures to identify the existence of unusual transactions, events, and trends. Benford's Law gives the expected patterns of the digits in numerical data, and has been advocated as a test for the authenticity and reliability of transaction level accounting data. This paper describes a new second-order test that calculates the digit frequencies of the differences between the ordered (ranked) values in a data set. These digit frequencies approximate the frequencies of Benford's Law for most data sets. The second-order test is applied to four sets of transactional data. The second-order test detected errors in data downloads, rounded data, data generated by statistical procedures, and the inaccurate ordering of data. The test can be applied to any data set and nonconformity usually signals an unusual issue related to data integrity that might not have been easily detectable using traditional analytical procedures.
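The core of the second-order test as described above (sort the values, take differences between neighbours, tabulate the digits of those differences) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses first digits only, drops zero differences, and the function names and sample amounts are hypothetical.

```python
from collections import Counter

def leading_digit(x):
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):e}"[0])

def second_order_digits(values):
    """Leading digits of the positive differences between adjacent ordered values."""
    ordered = sorted(values)
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    return [leading_digit(d) for d in diffs if d > 0]

def digit_frequencies(digits):
    """Relative frequency of each digit 1..9 in a list of digits."""
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

# Hypothetical transaction amounts; the duplicate 3.0 yields a zero difference.
amounts = [12.5, 3.0, 47.25, 3.0, 19.0]
digits = second_order_digits(amounts)
print(digits)
print(digit_frequencies(digits))
```

On a real data set the resulting frequencies would then be compared with the Benford proportions. Five values are far too few for that comparison and serve only to show the mechanics.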


Author(s):  
Susan D'Agostino

“Act natural, because of Benford’s Law” explains how and why large data sets generated as a result of human behavior concerning health records, population counts, tax returns, stock prices, national debts, election data, and more, have numbers whose first digits are unevenly distributed, with Benford’s Law offering percentages. When an individual tampers with a naturally generated data set, they often introduce fake numbers whose first digits are (more or less) evenly distributed from one to nine. Often, a subsequent investigation reveals that someone has tampered with the data set. Mathematics students and enthusiasts are encouraged to act natural so as to avoid looking like a fraudulent data set that does not observe Benford’s Law. At the chapter’s end, readers may check their understanding by working on a problem. A solution is provided.
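The percentages that Benford’s Law assigns to each first digit follow from the standard formula P(d) = log10(1 + 1/d) for d = 1, ..., 9. The short sketch below tabulates them next to the even split of 1/9 that fabricated numbers tend toward; the function name is an illustrative choice.

```python
import math

def benford_first_digit_probabilities():
    """Return {digit: probability} for leading digits 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probabilities()
uniform_share = 1 / 9  # share per digit if first digits were evenly distributed

for digit, p in probs.items():
    print(f"{digit}: Benford {p:.1%}  even split {uniform_share:.1%}")
```

Digit 1 leads about 30% of the time under Benford’s Law and digit 9 under 5% of the time, which is why an evenly spread set of first digits stands out.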


Author(s):  
David Hoyle

This chapter focuses on the occurrence of Benford's law within the natural sciences, emphasizing that Benford's law is to be expected within many scientific data sets. This is a consequence of the reasonable assumption that a particular scientific process is scale invariant, or nearly scale invariant. The chapter reviews previous work from many fields showing a number of data sets that conform to Benford's law. In each case the underlying scale invariance, or mechanism that leads to scale invariance, is identified. Having established that Benford's law is to be expected for many data sets in the natural sciences, the second half of the chapter highlights generic potential applications of Benford's law. Finally, direct applications of Benford's law are highlighted, whereby the Benford distribution is used in a constructive way rather than simply assessing an already existing data set.
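The link between scale invariance and Benford's law can be illustrated with a small numerical experiment. The sketch below is an assumption-laden toy, not taken from the chapter: it rescales a log-uniform sample and a uniform sample by the same factor (2.54, as in an inches-to-centimetres conversion) and measures how far the first-digit frequencies move. All names and both samples are hypothetical.

```python
from collections import Counter

def leading_digit(x):
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):e}"[0])

def first_digit_frequencies(values):
    """Relative frequencies of leading digits 1..9, as a list."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]

def max_shift(values, factor):
    """Largest change in any first-digit frequency after multiplying by factor."""
    before = first_digit_frequencies(values)
    after = first_digit_frequencies([v * factor for v in values])
    return max(abs(a - b) for a, b in zip(before, after))

# Log-uniform over one decade (close to Benford) versus the integers 1..999.
log_uniform = [10 ** (k / 1000) for k in range(1000)]
uniform = list(range(1, 1000))
FACTOR = 2.54  # arbitrary change of units

print(f"log-uniform sample: max shift = {max_shift(log_uniform, FACTOR):.4f}")
print(f"uniform sample:     max shift = {max_shift(uniform, FACTOR):.4f}")
```

The log-uniform sample keeps essentially the same first-digit frequencies after the change of units, while the uniform sample does not.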


1988 ◽  
Vol 62 (3) ◽  
pp. 967-971 ◽  
Author(s):  
Theodore P. Hill

To what extent do individuals “absorb” the empirical regularities of their environment and reflect them in behavior? A widely-accepted empirical observation called the First Digit Phenomenon or Benford's Law says that in collections of miscellaneous tables of data (such as physical constants, almanacs, newspaper articles, etc.), the first significant digit is much more likely to be a low number than a high number. In this study, an analysis of the frequencies of the first and second digits of “random” six-digit numbers guessed by people suggests that people's responses share some of the properties of Benford's Law: first digit 1 occurs much more frequently than expected; first digit 8 or 9 occurs much less frequently; and the second digits are much more uniformly distributed than the first.


2015 ◽  
Vol 14 (6) ◽  
pp. 829 ◽  
Author(s):  
Stephan Kienle

Leading digits often follow a distribution described by Newcomb (1881) and Benford (1938). We apply this phenomenon, known as Benford’s Law, to cover assets provided by issuers of German covered bonds. The main finding of the empirical analysis is that leading digits of these assets seem to follow the Benford distribution. Standard statistical evidence, however, might be misleading due to effects of large data sets. Consequently, the present paper also provides an example of how to deal with large data sets when a Benford distribution is assumed.
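One way to see why standard tests can mislead on large data sets: for fixed observed proportions, the chi-square statistic grows linearly with the sample size, so even a tiny deviation is eventually rejected. The sketch below demonstrates this and contrasts it with the mean absolute deviation, which does not depend on sample size. The abstract does not say which remedy the paper uses, so the choice of mean absolute deviation, the perturbation, and all names are assumptions for illustration.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CHI2_CRITICAL_DF8_P05 = 15.507  # upper 5% point, 8 degrees of freedom

def chi_square(proportions, n):
    """Pearson chi-square statistic for observed proportions at sample size n."""
    return n * sum((p - b) ** 2 / b for p, b in zip(proportions, BENFORD))

def mean_absolute_deviation(proportions):
    """Average absolute gap between observed and Benford proportions."""
    return sum(abs(p - b) for p, b in zip(proportions, BENFORD)) / 9

# Hypothetical data: Benford proportions with 0.5 percentage points
# moved from digit 1 to digit 9.
observed = list(BENFORD)
observed[0] -= 0.005
observed[8] += 0.005

for n in (1_000, 1_000_000):
    stat = chi_square(observed, n)
    flag = "reject" if stat > CHI2_CRITICAL_DF8_P05 else "no rejection"
    print(f"n = {n:>9,}: chi-square = {stat:8.2f} ({flag}), "
          f"MAD = {mean_absolute_deviation(observed):.5f}")
```

The same small deviation passes at a thousand records and is rejected at a million, while the mean absolute deviation is identical in both cases.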


2016 ◽  
Vol 23 (4) ◽  
pp. 798-805 ◽  
Author(s):  
Pedro Carreira ◽  
Carlos Gomes da Silva

Purpose: The purpose of this paper is to propose a methodology to estimate the number of records that were omitted from a data set, and to assess its effectiveness.
Design/methodology/approach: The procedure to estimate the number of records that were omitted from a data set is based on Benford’s law. Empirical experiments are performed to illustrate the application of the procedure. In detail, two simulated Benford-conforming data sets are distorted and the procedure is then used to recover the original patterns of the data sets.
Findings: The effectiveness of the procedure seems to increase with the degree of conformity of the original data set with Benford’s law.
Practical implications: This work can be useful in auditing and economic crime detection, namely in identifying tax evasion.
Originality/value: This work is the first to propose Benford’s law as a tool to detect data evasion.

