digit frequencies
Recently Published Documents



PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0260395
Author(s):  
Michael S. Bradshaw ◽  
Samuel H. Payne

Fraud is a pervasive problem and can occur as fabrication, falsification, plagiarism, or theft. The scientific community is not exempt from this universal problem, and several studies have recently been caught manipulating or fabricating data. Current measures to prevent and deter scientific misconduct come in the form of the peer-review process and on-site clinical trial auditors. As recent advances in high-throughput omics technologies have moved biology into the realm of big data, fraud detection methods must be updated for sophisticated computational fraud. In the financial sector, machine learning and digit frequencies are successfully used to detect fraud. Drawing from these sources, we develop methods of fabrication detection in biomedical research and show that machine learning can be used to detect fraud in large-scale omic experiments. Using the gene copy-number data as input, machine learning models correctly predicted fraud with 58–100% accuracy. With digit frequency as input features, the models detected fraud with 82–100% accuracy. All of the data and analysis scripts used in this project are available at https://github.com/MSBradshaw/FakeData.
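The idea of using digit frequencies as input features can be illustrated with a minimal sketch. It is not the authors' pipeline (their scripts are in the repository linked above); the function name `digit_frequencies`, the fixed-point formatting, and the `decimals` parameter are assumptions made for illustration only.

```python
from collections import Counter


def digit_frequencies(values, decimals=6):
    """Return the relative frequency of each digit 0-9 over all values.

    Assumption of this sketch: every value is rendered in fixed-point
    notation with `decimals` places, and the sign and decimal point are
    ignored. A real study would have to choose this encoding carefully.
    """
    counts = Counter()
    for v in values:
        text = f"{abs(v):.{decimals}f}"
        counts.update(ch for ch in text if ch.isdigit())
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 10
    return [counts[str(d)] / total for d in range(10)]


# Hypothetical sample row; the resulting 10-element vector could be fed
# to any classifier in place of the raw measurements.
features = digit_frequencies([0.125, 3.5], decimals=3)
```

The appeal of such a representation is that the feature vector has the same length regardless of how many measurements a sample contains.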


Author(s):  
Charlene Kalle ◽  
Marta Maggioni

In this paper, we employ a random dynamical systems approach to study generalized Lüroth series expansions of numbers in the unit interval. We prove that for each [Formula: see text] with [Formula: see text] Lebesgue almost all numbers in [Formula: see text] have uncountably many universal generalized Lüroth series expansions with digits less than or equal to [Formula: see text], that is, expansions in which each possible block of digits occurs. In particular, this means that Lebesgue almost all [Formula: see text] have uncountably many universal generalized Lüroth series expansions using finitely many digits only. For [Formula: see text] we show that typically the speed of convergence to an irrational number [Formula: see text] of the corresponding sequence of Lüroth approximants is equal to that of the standard Lüroth approximants. For other rational values of [Formula: see text] we use stationary measures to study the typical speed of convergence of the approximants and the digit frequencies.
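For orientation, the standard (not generalized) Lüroth expansion that serves as the baseline above can be sketched as follows. The convention used here is an assumption: digits a ≥ 2 with x in [1/a, 1/(a-1)), and the map T(x) = a(a-1)x - (a-1). Other texts index the digits differently. The function names are hypothetical, and the random, generalized systems studied in the paper are not reproduced.

```python
from fractions import Fraction
import math


def luroth_digits(x, max_digits=20):
    """Standard Lüroth digits of a rational x with 0 < x < 1.

    Exact rational arithmetic is used, so the expansion of a rational
    number either terminates (T(x) = 0) or is cut off at max_digits.
    """
    x = Fraction(x)
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    digits = []
    while x != 0 and len(digits) < max_digits:
        a = math.ceil(1 / x)            # x lies in [1/a, 1/(a-1))
        digits.append(a)
        x = a * (a - 1) * x - (a - 1)   # Lüroth map T
    return digits


def luroth_approximant(digits):
    """Partial sum: sum over k of 1 / (a_k * prod_{i<k} a_i (a_i - 1))."""
    total, scale = Fraction(0), Fraction(1)
    for a in digits:
        total += scale / a
        scale /= a * (a - 1)
    return total


digits = luroth_digits(Fraction(3, 7))
approx = luroth_approximant(digits)
```

Digit frequencies in this setting are simply the relative counts of each value of a among the first n digits.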


2021 ◽  
Vol 1 (1) ◽  
pp. 50-60
Author(s):  
Edin Glogić ◽  
Zoran Jasak

Forensic accounting, in the scientific sense, is the part of accounting that applies scientific techniques and methods to conduct investigations and detect criminal activity in financial statements, business statements and companies. One tool for detecting anomalies in accounting records is Benford’s law, which gives the expected pattern of digit frequencies in numeric data sets according to their position in numbers. Because of this property, Benford’s law has become a significant forensic tool for the detection of anomalies, especially in financial business. One of the most important sources is account turnover data in the observed period, which has a debit and a credit side. A classic way of analyzing these liabilities is to merge them and then look for a pattern of leading digits. With that approach, it is not possible to properly determine the source of anomalies, which are a guide to deeper analysis. For such purposes, a variant of the Hosmer-Lemeshow test is designed.
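The position-dependent digit pattern that Benford’s law predicts can be written down directly. The sketch below covers only the first and second significant digits; the function names are illustrative, and the Hosmer-Lemeshow variant mentioned in the abstract is not reproduced here.

```python
import math


def benford_first_digit(d):
    """Expected probability that the first significant digit is d (1-9)."""
    if d not in range(1, 10):
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


def benford_second_digit(d):
    """Expected probability that the second significant digit is d (0-9).

    Obtained by summing over all possible first digits k = 1..9.
    """
    if d not in range(0, 10):
        raise ValueError("second digit must be between 0 and 9")
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))


first = [benford_first_digit(d) for d in range(1, 10)]
second = [benford_second_digit(d) for d in range(0, 10)]
```

The first-digit distribution is strongly skewed (a leading 1 is expected about 30% of the time), while the second-digit distribution is already much flatter.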


2020 ◽  
Vol 642 ◽  
pp. A205
Author(s):  
Jurjen de Jong ◽  
Jos de Bruijne ◽  
Joris De Ridder

Context. Benford’s law states that for scale- and base-invariant data sets covering a wide dynamic range, the distribution of the first significant digit is biased towards low values. This has been shown to be true for wildly different datasets, including financial, geographical, and atomic data. In astronomy, earlier work showed that Benford’s law also holds for distances estimated as the inverse of parallaxes from the ESA HIPPARCOS mission. Aims. We investigate whether Benford’s law still holds for the 1.3 billion parallaxes contained in the second data release of Gaia (Gaia DR2). In contrast to previous work, we also include negative parallaxes. We examine whether distance estimates computed using a Bayesian approach instead of parallax inversion still follow Benford’s law. Lastly, we investigate the use of Benford’s law as a validation tool for the zero-point of the Gaia parallaxes. Methods. We computed histograms of the observed most significant digit of the parallaxes and distances, and compared them with the predicted values from Benford’s law, as well as with theoretically expected histograms. The latter were derived from a simulated Gaia catalogue based on the Besançon galaxy model. Results. The observed parallaxes in Gaia DR2 indeed follow Benford’s law. Distances computed with the Bayesian approach of Bailer-Jones et al. (2018, AJ, 156, 58) no longer follow Benford’s law, although low-value ciphers are still favoured for the most significant digit. The prior that is used has a significant effect on the digit distribution. Using the simulated Gaia universe model snapshot, we demonstrate that the true distances underlying the Gaia catalogue are not expected to follow Benford’s law, essentially because the interplay between the luminosity function of the Milky Way and the mission selection function results in a bi-modal distance distribution, corresponding to nearby dwarfs in the Galactic disc and distant giants in the Galactic bulge. 
In conclusion, Gaia DR2 parallaxes only follow Benford’s Law as a result of observational errors. Finally, we show that a zero-point offset of the parallaxes derived by optimising the fit between the observed most-significant digit frequencies and Benford’s law leads to a value that is inconsistent with the value that is derived from quasars. The underlying reason is that such a fit primarily corrects for the difference in the number of positive and negative parallaxes, and can thus not be used to obtain a reliable zero-point.
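The comparison described under Methods, a histogram of the most significant digit set against the Benford prediction, can be sketched as below. This is an illustration only: the function names are hypothetical, the sample values are made up and are not Gaia data, and taking the absolute value is one assumed way of including negative parallaxes.

```python
from collections import Counter
import math


def most_significant_digit(x):
    """First non-zero digit of |x|, or None for zero and non-finite input."""
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading digit first; 17 decimals avoid
    # rounding a value such as 9.999... up to the next power of ten.
    return int(f"{abs(x):.17e}"[0])


def digit_histogram(values):
    """Observed relative frequency of each most significant digit 1-9."""
    digits = [d for d in map(most_significant_digit, values) if d is not None]
    if not digits:
        return {}
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def max_deviation_from_benford(values):
    """Largest absolute gap between observed and Benford frequencies."""
    observed = digit_histogram(values)
    return max(abs(observed[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


# Made-up sample values, including negative ones.
sample = [0.042, -1.3, 150.0, -0.0019, 1.7]
histogram = digit_histogram(sample)
```

A summary statistic such as the maximum deviation is only one possible choice; the paper itself compares full histograms, including theoretically expected ones from a simulated catalogue.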


2020 ◽  
Author(s):  
Michael Bradshaw ◽  
Samuel H Payne

Abstract Background: Fraud is a pervasive problem and can occur as fabrication, falsification, plagiarism or theft. The scientific community is not exempt from this universal problem, and several studies have recently been caught manipulating or fabricating data. Current measures to prevent and deter scientific misconduct come in the form of the peer-review process and on-site clinical trial auditors. As recent advances in high-throughput omics technologies have moved biology into the realm of big data, fraud detection methods must be updated for sophisticated computational fraud. In the financial sector, machine learning and digit preference are successfully used to detect fraud. Results: Drawing from these sources, we develop methods of fabrication detection in biomedical research and show that machine learning can be used to detect fraud in large-scale omic experiments. Using the raw data as input, the best machine learning models correctly predicted fraud with 84–95% accuracy. With digit frequency as input features, the best models detected fraud with 98–100% accuracy. All of the data and analysis scripts used in this project are available at https://github.com/MSBradshaw/FakeData. Conclusions: Using digit frequencies as a generalized representation of the data, multiple machine learning methods were able to identify fabricated data with near-perfect accuracy.


2020 ◽  
Vol 162 (2) ◽  
pp. 403-418
Author(s):  
Y.-Q. Li

2018 ◽  
Vol 40 (7) ◽  
pp. 1755-1787 ◽  
Author(s):  
SIMON BAKER

In this paper we study digit frequencies in the setting of expansions in non-integer bases, and self-affine sets with non-empty interior. Within expansions in non-integer bases we show that if $\beta\in (1,1.787\ldots )$ then every $x\in (0,1/(\beta-1))$ has a simply normal $\beta$-expansion. We also prove that if $\beta\in (1,(1+\sqrt{5})/2)$ then every $x\in (0,1/(\beta-1))$ has a $\beta$-expansion for which the digit frequency does not exist, and a $\beta$-expansion with limiting frequency of zeros $p$, where $p$ is any real number sufficiently close to $1/2$. For a class of planar self-affine sets we show that if the horizontal contraction lies in a certain parameter space and the vertical contractions are sufficiently close to $1$, then every non-trivial vertical fibre contains an interval. Our approach lends itself to explicit calculation and gives rise to new examples of self-affine sets with non-empty interior. One particular strength of our approach is that it allows for different rates of contraction in the vertical direction.
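To make the objects concrete, the sketch below computes one particular expansion in a non-integer base, the greedy one with digits in {0, 1}, and the frequency of zeros among its first digits. It is only an illustration: the paper's results concern the existence of expansions with prescribed frequencies, which a single greedy expansion does not capture. Function names are hypothetical, and floating-point arithmetic limits the sketch to a modest number of digits.

```python
def greedy_beta_expansion(x, beta, n_digits):
    """First n_digits of the greedy expansion x = sum_i d_i * beta**(-i).

    Assumes 1 < beta < 2 (so digits are 0 or 1) and 0 <= x <= 1/(beta-1).
    """
    if not 1 < beta < 2:
        raise ValueError("beta must lie strictly between 1 and 2")
    if not 0 <= x <= 1 / (beta - 1):
        raise ValueError("x must lie in [0, 1/(beta-1)]")
    digits = []
    for _ in range(n_digits):
        x *= beta
        if x >= 1:          # greedy choice: take digit 1 whenever possible
            digits.append(1)
            x -= 1
        else:
            digits.append(0)
    return digits


def zero_frequency(digits):
    """Fraction of zeros among the given digits."""
    return digits.count(0) / len(digits)


beta = 1.5
digits = greedy_beta_expansion(1.0, beta, 40)
# Partial sum of the series; it approaches x from below as digits are added.
partial_sum = sum(d * beta ** -(i + 1) for i, d in enumerate(digits))
```

Because rounding errors grow by a factor of roughly beta at each step, longer expansions would need exact or higher-precision arithmetic.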


2016 ◽  
Vol 368 (12) ◽  
pp. 8633-8674 ◽  
Author(s):  
Philip Boyland ◽  
André de Carvalho ◽  
Toby Hall
