Benford’s law asserts that lower first significant digits (FSDs) occur more frequently than higher ones in naturally produced datasets. Its applications range from detecting election, tax, and Covid-19 data fraud to checking for abnormalities in the stock market. Hence, it is vital to know which common probability distributions satisfy Benford’s law, a problem known as Hill’s question. Many studies have addressed this question with various methods. The purpose of this work is to give a simpler and more intuitive method for answering the question for some common probability distributions. Moreover, statistical simulation is used to test their conformity to Benford’s law.

The basic form of Benford’s law is the following logarithmic distribution:

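$$\operatorname{Prob}(\mathrm{FSD} = d) = \lg\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9. \tag{1}$$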

The law was first discovered by Newcomb in 1881 and later rediscovered by Benford in 1938.2 According to the law, the digit 1 occurs as the FSD roughly 30% of the time, whereas the digit 9 occurs less than 5% of the time. Many studies have tried to explain the existence of the law by various methods. In a milestone work, Hill gave a rigorous proof of the existence of Benford’s law under some assumptions.3 However, he also raised an open question: which common probability distributions satisfy Benford’s law? This is called Hill’s question.

It is important to answer this question because Benford’s law has been observed in various fields. For instance, Knuth and Burke and Kincanon observed that about 30% of the most frequently used physical constants have first significant digit 1.4,5 The law also has many diverse applications, such as evaluating data validity6 and detecting tax, voter, image, and Covid-19 data fraud.7–10

TABLE I. Rate of rejection (%) of the null hypothesis that the FSD of the log-normal distribution is Benford, for parameters (μ, σ) and sample size n.

Distribution (μ, σ)   n = 100   500   1000   2000   5000   10 000
(0, 0.5) 99.2 100 100 100 100 100 
(0, 1) 4.5 7.5 9.2 14.5 30.4 63.7 
(0, 1.2) 5.2 5.6 5.2 4.5 5.3 7.1 
(0, 1.5) 4.6 4.1 5.8 5.3 5.4 4.5 
(1, 0.5) 97.3 100 100 100 100 100 
(1, 1) 4.2 4.8 7.8 13.4 28.1 54.4 
(1, 1.2) 5.2 5.4 4.9 4.9 6.1 
(1, 1.5) 5.1 4.6 4.8 5.9 4.8 5.4 
(2, 0.5) 98.5 100 100 100 100 100 
(2, 1) 5.8 7.8 8.9 13.7 28.3 58.1 
(2, 1.2) 4.5 6.7 5.6 5.2 6.7 5.7 
(2, 1.5) 4.1 4.6 6.1 5.1 
TABLE II. Rate of rejection (%) of the null hypothesis that the FSD of the Weibull distribution is Benford, for parameters (α, γ) and sample size n.

Distribution (α, γ)   n = 100   500   1000   2000   5000   10 000
(1, 0.5) 5.7 5.4 4.8 4.6 5.6 5.1 
(1, 0.6) 4.3 5.1 4.6 5.3 5.6 5.2 
(1, 1) 7.3 17.4 35.4 69.5 99.4 100 
(1, 1.5) 42.3 99.8 100 100 100 100 
(1, 2) 94.4 100 100 100 100 100 
(2, 0.5) 5.5 5.6 5.4 6.8 4.5 
(2, 0.6) 6.4 5.7 5.2 5.1 4.9 
(2, 1) 14.9 30.5 65.1 97.8 100 
(2, 1.5) 31.1 99.5 100 100 100 100 
(2, 2) 95 100 100 100 100 100 
(3, 0.5) 5.5 4.3 4.3 5.7 4.9 6.3 
(3, 0.6) 5.1 3.6 4.9 4.4 6.5 6.9 
(3, 1) 6.3 19.5 35.3 66.7 99.1 100 
(3, 1.5) 35.7 99.7 100 100 100 100 
(3, 2) 94.9 100 100 100 100 100 
TABLE III. Rate of rejection (%) of the null hypothesis that the FSD of the inverse gamma distribution is Benford, for parameters (β, α) and sample size n.

Distribution (β, α)   n = 100   500   1000   2000   5000   10 000
(1, 0.3) 4.7 5.8 6.7 8.4 10.1 
(1, 0.5) 5.3 7.5 6.4 11.8 20.1 41 
(1, 1) 19.6 33.4 62.4 97.7 100 
(1, 1.5) 16.5 61.2 93.2 99.9 100 100 
(2, 0.3) 4.2 5.4 5.2 6.1 10 
(2, 0.5) 3.9 6.7 6.1 10.3 20.8 39.4 
(2, 1) 17.5 33.5 68.6 98.8 100 
(2, 1.5) 13.1 61.8 94.9 100 100 100 
(3, 0.3) 5.1 4.9 6.6 5.2 7.2 11.6 
(3, 0.5) 4.9 5.2 7.2 11.2 17.6 37.5 
(3, 1) 5.4 15.8 30.4 63 98 100 
(3, 1.5) 10.1 62.8 93.3 99.9 100 100 

Since Hill’s question was raised, many researchers have investigated the conformity of probability distributions to Benford’s law. Leemis et al. quantified compliance with Benford’s law for some survival distributions.11 Engel and Leuenberger proved that an exponentially distributed random variable obeys the law approximately.12 Miller et al. proved that both the Weibull distribution and the inverse gamma distribution are almost Benford if their parameters satisfy some conditions.13,14 Scott and Fasli showed that the log-normal distribution nearly conforms to Benford’s law.15 Rodriguez also showed that the log-normal distribution is almost Benford.16 Fang and Chen proved that several common probability distributions almost obey Benford’s law.17

These studies raise a question that is worth investigating: why are these probability distributions almost Benford? There may be some essential connections among them. Therefore, three common probability distributions are selected in this paper: the log-normal, Weibull, and inverse gamma distributions. Their probability density functions (pdfs) are plotted, and some similar patterns are found. The graphs of their pdfs with different parameters are shown in Figs. 1–3.

FIG. 1. Log-normal probability density function curve.

FIG. 2. Weibull probability density function curve.

FIG. 3. Inverse gamma probability density function curve.

The graphs show that each pdf f(x) is increasing on (−∞, a) and decreasing on (a, +∞) for some a, with a maximum f(a), so these probability distributions must share some common internal properties. Gauvrit and Delahaye introduced two concepts: regularity and scatter.18 The former corresponds to the function f increasing on (−∞, a) and decreasing on (a, +∞), and the latter corresponds to a small maximum f(a). They argued that both concepts are related to a probability distribution conforming to Benford’s law. In this paper, both concepts are used to investigate whether the log-normal, Weibull, and inverse gamma distributions are close to Benford’s law. Moreover, statistical simulation is used to test their conformity to the law, as Rodriguez did in his work.16

This paper is organized as follows: some basic definitions and theorems are listed in Sec. II. In Sec. III, the main theoretical results are presented. In Sec. IV, statistical simulation is used to test the conformity of the three probability distributions to Benford’s law. In Sec. V, some final remarks and directions for future research are discussed.

Some basic definitions and theorems about Benford’s law are listed as follows:

Definition II.1

(Ref. 19) (Significand). Any positive number $x$ can be expressed in scientific notation as $x = S_B(x) \cdot B^{k(x)}$, where $B$ is the base, $S_B(x) \in [1, B)$ is the significand of $x$, and the integer $k(x)$ is the exponent.

If a number is negative, its significand is that of its absolute value. The number system is usually decimal, but this work considers the more general case in which any integer greater than 1 can serve as the base. If the base is ten, the significand is easily computed; for example, 3.1415 is the significand of 31 415. In particular, 3 is called the first significant digit (FSD) of 31 415.
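As an illustration of Definition II.1, the following Python sketch computes the significand and the FSD of a number in an arbitrary integer base. The function names `significand` and `first_significant_digit` are illustrative and do not come from the original work; the two correction loops are an added safeguard against floating-point rounding at exact powers of the base.

```python
import math

def significand(x, base=10):
    """Return (S, k) with |x| = S * base**k, 1 <= S < base, and k an integer."""
    if x == 0:
        raise ValueError("the significand of 0 is undefined")
    a = abs(x)  # a negative number has the significand of its absolute value
    k = math.floor(math.log(a) / math.log(base))
    s = a / base ** k
    # Correct for floating-point rounding near exact powers of the base.
    while s >= base:
        s /= base
        k += 1
    while s < 1:
        s *= base
        k -= 1
    return s, k

def first_significant_digit(x, base=10):
    """Integer part of the significand, i.e., the FSD in the given base."""
    s, _ = significand(x, base)
    return int(s)

print(significand(31415))              # significand close to 3.1415, exponent 4
print(first_significant_digit(31415))
```

The same functions work for other bases; for instance, `significand(5, base=2)` decomposes 5 as 1.25 · 2².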

Definition II.2
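(Ref. 19) (Benford’s law). A random variable $X$ with support $(0, +\infty)$ obeys Benford’s law in base $B$ if, for any $s \in [1, B)$,
$$\operatorname{Prob}(S_B(X) \le s) = \log_B s; \tag{2}$$
in particular,
$$\operatorname{Prob}(\mathrm{FSD} = d) = \log_B\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, B - 1, \tag{3}$$
and if $B = 10$, the law becomes the original form discovered by Newcomb1 and Benford.2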

Remark II.3.

If the base is ten, the probabilities in Benford’s law can be calculated directly: Prob(FSD = 1) = Prob(1 ≤ S(X) < 2) = lg 2 − lg 1 ≈ 0.3010 and Prob(FSD = 9) = Prob(9 ≤ S(X) < 10) = 1 − lg 9 ≈ 0.0458. The probability decreases as the FSD becomes larger.
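The arithmetic in Remark II.3 can be reproduced with a few lines of Python. This is only a sketch; the function name `benford_fsd_probs` is hypothetical, and the `base` argument is included so that bases other than ten can be tried.

```python
import math

def benford_fsd_probs(base=10):
    """Return {d: Prob(FSD = d)} for d = 1, ..., base - 1 under Benford's law."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

probs = benford_fsd_probs(10)
for d, p in probs.items():
    print(d, round(p, 4))
```

The nine probabilities add up to 1 because the sum of the logarithms telescopes to the logarithm of the base.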

Theorem II.4

(Ref. 20). Any random variable $X > 0$ obeys Benford’s law in base $B$ if and only if the fractional part of $\log_B(X)$ is uniformly distributed on $[0, 1)$.

Theorem II.4 states that a random variable is Benford exactly when the fractional part of $\log_B(X)$ is uniformly distributed on $[0, 1)$. Let $F_B(z)$ denote the cumulative distribution function of the fractional part of $\log_B(X)$. Benford’s law is then equivalent to $F_B(z) = z$ or, in terms of the density, $F_B'(z) = 1$. Investigating how far a random variable deviates from Benford’s law therefore amounts to measuring how far its $F_B(z)$ deviates from $z$.
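The comparison of $F_B(z)$ with $z$ can be illustrated empirically. The Python sketch below estimates the largest gap between the empirical distribution function of the fractional part of $\log_{10}(X)$ and the uniform distribution function. The helper name `frac_log_deviation`, the seed, the sample size, and the two log-normal parameter choices are assumptions made only for this illustration.

```python
import math
import random

def frac_log_deviation(sample, base=10):
    """Largest gap between the empirical CDF of frac(log_base(x)) and z on [0, 1)."""
    u = sorted(math.log(x, base) % 1.0 for x in sample)
    n = len(u)
    # Kolmogorov-type statistic: sup over z of |F_n(z) - z|
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))

rng = random.Random(0)
wide = [rng.lognormvariate(0.0, 2.0) for _ in range(20000)]    # large sigma
narrow = [rng.lognormvariate(0.0, 0.3) for _ in range(20000)]  # small sigma

print("sigma = 2.0:", frac_log_deviation(wide))
print("sigma = 0.3:", frac_log_deviation(narrow))
```

A small deviation indicates that the sample is close to Benford in the sense of Theorem II.4, whereas a large one indicates a clear departure.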

Gauvrit and Delahaye have given a result to state how regularity and scatter are related to a probability distribution conforming to Benford’s law.18 The following lemma is their result.

Lemma III.1
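(Ref. 18). Let $X$ be a continuous positive random variable with pdf $f$ such that $\mathrm{Id}.f : x \mapsto x f(x)$ satisfies the following two conditions: there exists $a > 0$ such that (1) $\max(\mathrm{Id}.f) = m = a f(a)$ and (2) $\mathrm{Id}.f$ is non-decreasing on $(0, a]$ and non-increasing on $[a, +\infty)$. Then, for any $z \in (0, 1]$,
$$\left| \operatorname{Prob}(\{\log X\} < z) - z \right| < 2 \ln(10)\, m, \tag{4}$$
where $\{\log X\}$ denotes the fractional part of $\log_{10} X$.

In particular, if $(X_n)$ is a sequence of continuous random variables whose pdfs $f_n$ satisfy these conditions and $m_n = \max(\mathrm{Id}.f_n) \to 0$, then $\{\log(X_n)\}$ converges to uniformity on $[0, 1)$.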

Based on Lemma III.1, the closeness of three probability distributions to Benford’s law is proved.

Theorem III.2.
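Let $X_{\mu,\sigma}$ be a continuous positive random variable from the log-normal distribution with parameters $\mu$ and $\sigma$. For any $z \in [0, 1]$, let $F(z)$ be the cumulative distribution function of $\{\log(X_{\mu,\sigma})\}$, the fractional part of $\log(X_{\mu,\sigma})$; then
$$|F(z) - z| < \frac{2\ln(10)}{\sigma\sqrt{2\pi}}. \tag{5}$$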

Proof III.3.
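The pdf is $f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$, $x > 0$. Let $F(x) = x f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$.
If $\ln x - \mu < 0$, i.e., $x < e^{\mu}$, then $F'(x) > 0$; if $\ln x - \mu > 0$, i.e., $x > e^{\mu}$, then $F'(x) < 0$. Hence, $\max F(x) = F(e^{\mu}) = \frac{1}{\sigma\sqrt{2\pi}}$, and inequality (5) follows from Lemma III.1.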

Thus, a log-normal random variable with large σ is almost Benford.

Theorem III.4.
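Let $X_{\alpha,\gamma}$ be a continuous positive random variable from the Weibull distribution with parameters $\alpha$ and $\gamma$. For any $z \in [0, 1]$, let $F(z)$ be the cumulative distribution function of $\{\log(X_{\alpha,\gamma})\}$, the fractional part of $\log(X_{\alpha,\gamma})$; then
$$|F(z) - z| < \frac{2\ln(10)\,\gamma}{e}. \tag{6}$$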

Proof III.5.
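The pdf is $f(x; \alpha, \gamma) = \frac{\gamma}{\alpha} \left(\frac{x}{\alpha}\right)^{\gamma - 1} e^{-(x/\alpha)^{\gamma}}$, $x > 0$. Let $F(x) = x f(x) = \gamma \left(\frac{x}{\alpha}\right)^{\gamma} e^{-(x/\alpha)^{\gamma}}$ and take the derivative:
$$F'(x) = \frac{\gamma^2}{x} \left(\frac{x}{\alpha}\right)^{\gamma} e^{-(x/\alpha)^{\gamma}} \left[1 - \left(\frac{x}{\alpha}\right)^{\gamma}\right].$$
Hence, $F'(x) > 0$ if $x < \alpha$, and $F'(x) < 0$ if $x > \alpha$. Therefore, $\max F(x) = F(\alpha) = \gamma/e$; hence, $|F(z) - z| < 2\ln(10)\,\gamma/e$.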

From the above inequality, we see that the Weibull distribution random variable with small γ is almost Benford.
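The location and value of the maximum of $x f(x)$ used in Proof III.5 can be checked numerically. The sketch below is a plain grid search over a log-spaced grid, not part of the original proof; the function names, the grid limits, the number of grid points, and the parameter pairs are all assumptions chosen for illustration.

```python
import math

def weibull_pdf(x, alpha, gamma):
    """Weibull density with scale alpha and shape gamma."""
    u = (x / alpha) ** gamma
    return (gamma / alpha) * (x / alpha) ** (gamma - 1) * math.exp(-u)

def grid_max_xf(alpha, gamma, lo=1e-4, hi=1e4, steps=20001):
    """Grid search on a log-spaced grid for the maximum of x * f(x)."""
    best_x, best_val = None, -1.0
    for i in range(steps):
        x = lo * (hi / lo) ** (i / (steps - 1))
        val = x * weibull_pdf(x, alpha, gamma)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

for alpha, gamma in [(1.0, 0.5), (2.0, 0.5), (3.0, 2.0)]:
    x_star, m = grid_max_xf(alpha, gamma)
    # Compare with the closed form: maximizer alpha and maximum gamma / e
    print(alpha, gamma, x_star, m, gamma / math.e)
```

In each case the grid maximizer should lie close to α and the maximum close to γ/e, independently of α.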

Theorem III.6.
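Let $X_{\alpha,\beta}$ be a positive random variable from the inverse gamma distribution with parameters $\alpha$ and $\beta$. For any $z \in [0, 1]$, let $F(z)$ be the cumulative distribution function of $\{\log(X_{\alpha,\beta})\}$, the fractional part of $\log(X_{\alpha,\beta})$. Then,
$$|F(z) - z| < \frac{2\ln(10)\,\alpha^{\alpha}}{\Gamma(\alpha)\, e^{\alpha}}. \tag{7}$$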

Proof III.7.
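The pdf is $f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{-\alpha - 1} e^{-\beta/x}$, $x > 0$. Let $F(x) = x f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{-\alpha} e^{-\beta/x}$ and take the derivative:
$$F'(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{-\alpha - 2} e^{-\beta/x} (\beta - \alpha x).$$
Therefore, $F'(x) > 0$ if $x < \beta/\alpha$, and $F'(x) < 0$ if $x > \beta/\alpha$. Hence, $\max F(x) = F(\beta/\alpha) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{\beta}{\alpha}\right)^{-\alpha} e^{-\alpha} = \frac{\alpha^{\alpha}}{\Gamma(\alpha)\, e^{\alpha}}$, and inequality (7) follows from Lemma III.1.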

Likewise, an inverse gamma random variable with small α is almost Benford.
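The right-hand sides of the bounds in Theorems III.2, III.4, and III.6 are easy to evaluate. The following Python sketch does so for a few parameter values; the function names are hypothetical, and the inverse gamma bound is computed in log space with `math.lgamma` purely as a numerical convenience.

```python
import math

C = 2 * math.log(10)  # the constant 2 ln(10) from Lemma III.1

def lognormal_bound(sigma):
    """2 ln(10) / (sigma * sqrt(2 pi)); does not depend on mu."""
    return C / (sigma * math.sqrt(2 * math.pi))

def weibull_bound(gamma):
    """2 ln(10) * gamma / e; does not depend on the scale alpha."""
    return C * gamma / math.e

def inverse_gamma_bound(alpha):
    """2 ln(10) * alpha**alpha / (Gamma(alpha) * e**alpha); does not depend on beta."""
    return C * math.exp(alpha * math.log(alpha) - math.lgamma(alpha) - alpha)

for sigma in (0.5, 1.0, 1.2, 1.5):
    print("log-normal sigma =", sigma, round(lognormal_bound(sigma), 4))
for gamma in (0.5, 0.6, 1.0, 1.5, 2.0):
    print("Weibull gamma =", gamma, round(weibull_bound(gamma), 4))
for alpha in (0.3, 0.5, 1.0, 1.5):
    print("inverse gamma alpha =", alpha, round(inverse_gamma_bound(alpha), 4))
```

The bounds shrink as σ grows and as γ or α shrinks, which is the trend described by the theorems. They are upper bounds only and can be loose for moderate parameter values.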

Compared with previous methods for investigating how far a probability distribution deviates from Benford’s law, such as Fourier analysis, the above method solves the problem easily. If the pdf of a probability distribution satisfies regularity and scatter, which can be observed from the graph of the pdf, the distribution should be close to Benford’s law for suitable values of its parameters.

In the real world, if the population follows one of the above probability distributions, a sample from the population is expected to conform approximately to Benford’s law. However, this assumption needs to be confirmed, and the above theoretical results need to be checked by statistical simulation. Compared with the numerical calculation method used in Ref. 17, statistical simulation is more reasonable and practical. The null and alternative hypotheses are as follows:

  • H0: The distribution of FSD of a population is Benford.

  • H1: The distribution of FSD of a population is not Benford.

The six steps of the statistical simulation are as follows:

  1. Set up a probability distribution with given parameters as the population and fix a sample size n.

  2. Use R to generate 1000 samples, each of size n.

  3. Determine the FSD distribution of each sample and calculate the χ² statistic between this distribution and Benford’s law.

  4. Carry out this hypothesis test for each of the 1000 samples, count how many times the null hypothesis is rejected, and obtain the rejection rate.

  5. Change the sample size, generate another group of 1000 samples of the new size, and repeat the above process.

  6. Complete the above process for all six sample sizes, then adjust the parameters of the population distribution, and repeat.

Here, the six sample sizes are 100, 500, 1000, 2000, 5000, and 10 000, and the number of samples for each size is 1000.
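The simulation procedure described above can be sketched in Python as follows. The original study used R, so this is a re-expression with the standard library rather than the author’s code. The 5% significance level and the χ² critical value with 8 degrees of freedom (15.507) are assumptions, since the text does not state them explicitly, and all function names are hypothetical.

```python
import math
import random

# Benford probabilities for the digits 1..9 in base 10
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
# Assumed test setup: chi-square with 9 - 1 = 8 degrees of freedom, 5% level
CHI2_CRIT = 15.507

def first_digit(x):
    """First significant decimal digit of a positive float."""
    if not x > 0:
        raise ValueError("sample values must be positive")
    return int(f"{x:.16e}"[0])

def chi_square_stat(sample):
    """Chi-square distance between the sample FSD counts and Benford's law."""
    counts = [0] * 9
    for x in sample:
        counts[first_digit(x) - 1] += 1
    n = len(sample)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, BENFORD))

def rejection_rate(draw, n, reps=1000):
    """Fraction of reps samples of size n for which H0 (Benford) is rejected."""
    rejected = sum(
        chi_square_stat([draw() for _ in range(n)]) > CHI2_CRIT
        for _ in range(reps)
    )
    return rejected / reps

rng = random.Random(2022)  # arbitrary seed, for reproducibility only
populations = {
    "log-normal (mu=0, sigma=0.5)": lambda: rng.lognormvariate(0.0, 0.5),
    "log-normal (mu=0, sigma=1.5)": lambda: rng.lognormvariate(0.0, 1.5),
    "Weibull (alpha=1, gamma=0.5)": lambda: rng.weibullvariate(1.0, 0.5),
    # inverse gamma(alpha, beta) drawn as 1 / Gamma(shape=alpha, scale=1/beta)
    "inverse gamma (beta=1, alpha=0.3)": lambda: 1.0 / rng.gammavariate(0.3, 1.0),
}
for name, draw in populations.items():
    # n and reps are reduced here to keep the run short
    print(name, rejection_rate(draw, n=500, reps=200))
```

Passing `reps=1000` and looping over the six sample sizes reproduces the layout of the study, at the cost of a longer run time in pure Python.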

According to Theorem III.2, the log-normal distribution approaches Benford’s law as the parameter σ increases, whereas the other parameter μ has no significant effect. Let μ take the values 0, 1, and 2 and σ the values 0.5, 1, 1.2, and 1.5, which gives 12 parameter combinations. For each combination, six groups of samples with different sample sizes are generated from the population, and the FSD distribution and the rate of rejection are computed. Table I gives the results.

Table I shows that, for fixed μ, the rate of rejection tends to increase as σ decreases and as the sample size increases. However, if σ is 1.2 or greater, the rate of rejection stays close to 5% regardless of the sample size. For fixed σ, the rate of rejection is almost the same for all values of μ; that is, it is not affected by changes in μ.

The rate of rejection decreases as σ increases, so whether the FSD distribution of a log-normal population is close to Benford’s law depends only on σ. If σ is at least 1.2, the log-normal distribution is almost Benford. Even then, Table I shows a nonzero rate of rejection, but it is so low that we believe a sample from a log-normal population is almost Benford in the real world.

According to Theorem III.4, the Weibull distribution also approaches Benford’s law as the parameter γ decreases, whereas the other parameter α has no effect. As above, let α take the values 1, 2, and 3 and γ the values 0.5, 0.6, 1, 1.5, and 2, which gives 15 parameter combinations. For each combination, six groups of samples with different sample sizes are generated from the population, and the FSD distribution and the rate of rejection are computed. The results are listed in Table II.

Table II shows that, for fixed α, if γ is 1 or greater, the rate of rejection increases with the sample size. However, if γ is 0.6 or less, the rate of rejection stays close to 5% regardless of the sample size. For fixed γ, the rate of rejection is almost the same for all values of α; that is, it is not affected by changes in α.

The rate of rejection decreases as γ decreases. Whether the FSD of the Weibull distribution is close to Benford’s law depends only on γ; if γ is not greater than 0.6, the distribution is almost Benford. Even then, Table II shows a nonzero rate of rejection, but it is so low that we believe a sample from a Weibull population is almost Benford in the real world.

According to Theorem III.6, the inverse gamma distribution almost obeys Benford’s law as the parameter α decreases, whereas the other parameter β has no effect. Let β take the values 1, 2, and 3 and α the values 0.3, 0.5, 1, and 1.5, which again gives 12 parameter combinations. For each combination, six groups of samples with different sample sizes are generated from the population, and the FSD distribution and the rate of rejection are computed. The results are listed in Table III.

Table III shows that, for fixed β, the rate of rejection increases as α increases and as the sample size increases. However, if α is 0.3, the rate of rejection remains low, rising from about 5% for small samples to roughly 10% at the largest sample size. For fixed α, the rate of rejection is almost the same for all values of β; that is, it is not affected by changes in β.

Clearly, the rate of rejection decreases as α decreases. In other words, whether the FSD of the inverse gamma distribution is close to Benford’s law depends only on α. If α is less than or equal to 0.3, the FSD distribution of the inverse gamma distribution is almost Benford. Even then, Table III shows a nonzero rate of rejection, but it is so low that we believe a sample from an inverse gamma population is almost Benford in the real world.

Hill’s question is investigated for some common probability distributions. Compared with previous research methods, such as Fourier analysis, the method adopted here answers the question relatively easily. These distributions obey Benford’s law approximately if their parameters satisfy some conditions.

In addition to these rigorous mathematical proofs, statistical simulation is used to test the theoretical results, and it gives consistent results. Specifically, the distributions are almost Benford when σ is at least 1.2 for the log-normal distribution, when the shape parameter γ is at most 0.6 for the Weibull distribution, and when the shape parameter α is at most 0.3 for the inverse gamma distribution. The other parameter has almost no effect.

However, it must be pointed out that Hill’s question has only partly been solved. Many other common probability distributions need to be investigated to determine whether they are close to Benford’s law. The method used here can be helpful in answering the question. When many probability distributions are confirmed to be close to Benford’s law, we can say that the law occurs widely in the natural world.

The author acknowledges the support from the National Natural Science Foundation of China (Grant No. 71771142).

The author has no conflicts to disclose.

Guojun Fang: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Supervision (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal).

The data that support the findings of this study are available within the article.

1. S. Newcomb, “Note on the frequency of use of the different digits in natural numbers,” Am. J. Math. 4, 39–40 (1881).
2. F. Benford, “The law of anomalous numbers,” Proc. Am. Philos. Soc. 78, 551–572 (1938).
3. T. P. Hill, “A statistical derivation of the significant-digit law,” Stat. Sci. 10, 354–363 (1995).
4. D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd ed. (Addison Wesley Longman Publishing Co., Inc., 1997), pp. 253–264.
5. J. Burke and E. Kincanon, “Benford’s law and physical constants: The distribution of initial digits,” Am. J. Phys. 59, 952 (1991).
6. R. Cerqueti and M. Maggi, “Data validity and statistical conformity with Benford’s law,” Chaos, Solitons Fractals 144, 110740 (2021).
7. W. R. Mebane, “Comment on ‘Benford’s law and the detection of election fraud,’” Political Anal. 19, 269–272 (2011).
8. M. J. Nigrini, Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations, 1st ed. (Wiley, 2012).
9. D. Fu, Y. Q. Shi, and W. Su, “A generalized Benford’s law for JPEG coefficients and its applications in image forensics,” Proc. SPIE 6505, 65051L (2007).
10. F. Pérez-González, G. Heileman, and C. T. Abdallah, “A generalization of Benford’s law and its application to images,” in European Control Conference 2007, Kos, Greece, July 2007, pp. 3613–3619; ISBN: 9783952417386.
11. L. M. Leemis, B. W. Schmeiser, and D. L. Evans, “Survival distributions satisfying Benford’s law,” Am. Stat. 54, 236–241 (2000).
12. H. A. Engel and C. Leuenberger, “Benford’s law for exponential random variables,” Stat. Probab. Lett. 63, 361–365 (2008).
13. V. Cuff, A. Lewis, and S. J. Miller, “The Weibull distribution and Benford’s law,” Involve J. Math. 8, 859–874 (2015).
14. R. F. Durst, H. Chi, A. Lott, S. J. Miller, E. A. Palsson, W. Touw, and G. Vriend, “The inverse gamma distribution and Benford’s law,” ArXiv E-Prints (2016).
15. P. D. Scott and M. Fasli, “Benford’s law: An empirical investigation and a novel explanation,” Technical Report No. CSM-349, 2001.
16. R. J. Rodriguez, “Reducing false alarms in the detection of human influence on data,” J. Accounting, Auditing Finance 19, 141–158 (2004).
17. G. Fang and Q. Chen, “Several common probability distributions obey Benford’s law,” Physica A 540, 123129 (2020).
18. N. Gauvrit and J.-P. Delahaye, “Scatter and regularity imply Benford’s law ... and more,” arXiv:0910.1359 [math.PR]; last accessed July 18, 2018.
19. A. Berger and T. P. Hill, “A basic theory of Benford’s law,” Probab. Surv. 8, 1–126 (2011).
20. P. Diaconis, “The distribution of leading digits and uniform distribution mod 1,” Ann. Probab. 5, 72–81 (1977).