Test whether a dataset follows Benford's Law using chi-squared, K-S, and MAD statistics. Visualize leading digit distribution against expected frequencies.
Benford's Law, also called the First-Digit Law, predicts that in many naturally occurring datasets the leading digit 1 appears about 30.1% of the time, while 9 appears only about 4.6% of the time. Rather than being uniformly distributed, leading digits follow the probability P(d) = log₁₀(1 + 1/d). This counterintuitive pattern holds remarkably well for population counts, financial data, scientific constants, street addresses, and many other real-world data sources.
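The expected frequencies can be computed directly from the formula; a minimal sketch in Python:

```python
import math

# Expected leading-digit frequencies under Benford's Law: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# The nine probabilities sum to 1, declining from ~30.1% for digit 1
# down to ~4.6% for digit 9.
for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```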
This calculator lets you paste any numeric dataset and instantly compare its leading-digit distribution against the Benford prediction. It performs a chi-squared goodness-of-fit test (df = 8) and computes the Kolmogorov-Smirnov statistic and mean absolute deviation (MAD) to quantify how closely the data conforms. The tool classifies datasets as conforming, suspicious, or non-conforming based on standard critical values.
Benford analysis is widely used in forensic accounting and fraud detection — fabricated numbers tend to have more uniform leading digits. Tax authorities, auditors, and election data analysts use Benford's Law as a screening tool. Try the preset datasets to see conforming (Fibonacci, populations) and non-conforming (uniform random) examples.
Benford's Law analysis is a powerful first-pass tool for detecting anomalies in numerical data. Whether you're an auditor reviewing expense claims, a data scientist validating datasets, or a student exploring probability, this calculator instantly reveals whether leading-digit patterns match theoretical expectations.
Manual calculation for large datasets is tedious — extracting leading digits, computing frequencies, and performing chi-squared tests by hand is error-prone. This tool automates everything and provides visual comparison bars so patterns jump out immediately.
Benford's Law: P(d) = log₁₀(1 + 1/d) for d = 1, 2, ..., 9
Chi-squared test: χ² = Σ ((Observed_d − Expected_d)² / Expected_d) with df = 8; critical values 15.507 (α = 0.05) and 20.090 (α = 0.01)
K-S statistic: D = max |F_obs(d) − F_exp(d)| over all digits
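The three statistics above can be sketched in a few lines of Python. This is an illustrative implementation, not the calculator's own code; the helper names (`leading_digit`, `benford_stats`) are chosen here for clarity:

```python
import math
from collections import Counter

def leading_digit(x):
    """First significant digit of a nonzero number."""
    s = f"{abs(x):.15e}"  # scientific notation: first character is the leading digit
    return int(s[0])

def benford_stats(data):
    """Chi-squared, K-S statistic D, and MAD against Benford's distribution."""
    counts = Counter(leading_digit(x) for x in data if x != 0)
    n = sum(counts.values())
    expected_p = [math.log10(1 + 1 / d) for d in range(1, 10)]
    observed_p = [counts.get(d, 0) / n for d in range(1, 10)]

    # Chi-squared on counts: sum of (O - E)^2 / E over the nine digits (df = 8)
    chi2 = sum((n * o - n * e) ** 2 / (n * e)
               for o, e in zip(observed_p, expected_p))

    # K-S: maximum absolute difference between cumulative distributions
    d_stat, cum_o, cum_e = 0.0, 0.0, 0.0
    for o, e in zip(observed_p, expected_p):
        cum_o += o
        cum_e += e
        d_stat = max(d_stat, abs(cum_o - cum_e))

    # MAD: mean absolute deviation per digit
    mad = sum(abs(o - e) for o, e in zip(observed_p, expected_p)) / 9
    return chi2, d_stat, mad
```

For example, powers of 2 (which span many orders of magnitude) yield a chi-squared well under the 15.507 critical value, while digits drawn uniformly from 1–9 fail decisively.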
Result: Conforming (χ² = 2.31)
Fibonacci numbers naturally follow Benford's Law. Digit 1 appears about 30% of the time, consistent with the predicted 30.1%. The chi-squared statistic of 2.31 is well below the critical value of 15.51.
Simon Newcomb first noticed in 1881 that early pages of logarithm tables were more worn than later ones. Frank Benford rediscovered and extensively tested this in 1938 across 20 diverse datasets. The underlying principle is logarithmic distribution: if the mantissa (fractional part of the logarithm) of data values is uniformly distributed, then P(d) = log₁₀(1 + 1/d).
This produces the distinctive decreasing staircase: 1 appears 30.1% of the time, 2 at 17.6%, gradually declining to 9 at 4.6%. The distribution extends to second and third digits as well, though distributions become flatter with each successive digit, approaching uniform.
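The second-digit distribution mentioned above can be derived by summing over all possible first digits; a short sketch, assuming the standard generalization P₂(d) = Σₖ log₁₀(1 + 1/(10k + d)):

```python
import math

def second_digit_p(d):
    """Benford probability that the second digit is d (0-9), summed over
    all possible first digits k = 1..9."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

first = [math.log10(1 + 1 / d) for d in range(1, 10)]
second = [second_digit_p(d) for d in range(10)]
```

The second-digit probabilities range from roughly 12.0% for 0 down to about 8.5% for 9, visibly flatter than the 30.1%-to-4.6% spread of the first digit.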
Major accounting firms and government agencies routinely apply Benford analysis. The IRS, SEC, and Europol have used it to flag suspicious financial statements. A company whose revenue figures show unusually high frequency of leading 5s and 6s may warrant deeper investigation. Similarly, election results in some countries have been analyzed using Benford's Law, though interpretation requires domain expertise.
The Nigrini method, developed by forensic accountant Mark Nigrini, established standardized thresholds for MAD: below 0.006 is close conformity, 0.006–0.012 is acceptable, 0.012–0.015 is marginally acceptable, and above 0.015 is non-conforming.
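The Nigrini bands translate directly into a small classification helper; a sketch using the thresholds quoted above (the function name is illustrative):

```python
def classify_mad(mad):
    """Nigrini conformity bands for first-digit MAD."""
    if mad < 0.006:
        return "close conformity"
    elif mad < 0.012:
        return "acceptable conformity"
    elif mad < 0.015:
        return "marginally acceptable conformity"
    return "non-conforming"
```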
Benford's Law does not apply universally. Datasets generated from uniform distributions, assigned numbers (like Social Security Numbers), or data confined to a narrow range will naturally deviate. Human height in inches, for example, mostly starts with digits 5, 6, or 7 — a legitimate deviation. Always consider whether the data-generating process is expected to produce Benford-distributed values before drawing conclusions from non-conformity.
Benford's Law arises from scale invariance — if a dataset's distribution spans multiple orders of magnitude, lower leading digits naturally dominate because numbers spend longer in ranges starting with smaller digits. Logarithmically distributed data perfectly follows the law.
It works best for data spanning several orders of magnitude without artificial caps or floors: financial statements, populations, scientific measurements, utility bills, stock prices, and address numbers. It does not apply to assigned or uniformly distributed numbers such as lottery draws and phone numbers, or to data restricted to a narrow range.
Fabricated financial data often violates Benford's Law because humans tend to choose leading digits more uniformly or favor certain digits. Auditors screen expense reports, tax returns, and accounting ledgers using chi-squared tests against Benford's distribution to flag anomalies for further investigation.
The chi-squared statistic measures overall deviation from Benford's distribution. If it exceeds 15.51 (for 8 degrees of freedom), the data deviates significantly at the 5% level. The K-S statistic captures the maximum cumulative difference. MAD gives the average per-digit absolute deviation.
No. Non-conformity is a red flag, not proof. Legitimate data may deviate for valid reasons (restricted range, aggregation effects, small sample). Conversely, conforming data isn't guaranteed fraud-free. Benford analysis is a screening tool, not a definitive test.
Generally, at least 100 data points are recommended for reliable chi-squared testing. With fewer than 50 observations, the test has low statistical power and results should be interpreted cautiously.