Bonferroni Correction Calculator

Calculate adjusted significance thresholds for multiple comparisons using Bonferroni, Šidák, and Holm corrections. Compare methods side-by-side with FWER tables.

About the Bonferroni Correction Calculator

When you perform multiple statistical tests simultaneously, the probability of committing at least one Type I error (false positive) increases dramatically. The Bonferroni correction is the simplest and most widely used remedy: divide your significance level by the number of tests to maintain the desired familywise error rate (FWER).

This calculator computes adjusted significance thresholds using Bonferroni, Šidák, and Holm step-down methods. Enter your original alpha, the number of comparisons, and optionally your individual p-values to see which tests remain significant after correction. A side-by-side comparison table shows how each method performs.

Multiple comparison corrections are essential in genomics (thousands of gene tests), ANOVA post-hoc analyses, clinical trials with multiple endpoints, neuroimaging voxel-wise tests, and any study where many hypotheses are tested simultaneously. Without correction, false positives become virtually guaranteed as the number of tests grows.

Why Use This Bonferroni Correction Calculator?

Performing 20 independent tests at α = 0.05 gives a 64% chance of at least one false positive — even when no real effect exists. Bonferroni correction reduces each test's threshold to α/m, keeping the overall error rate at or below α. This calculator also shows the less conservative Šidák and Holm alternatives, helping you pick the right balance between controlling false positives and retaining statistical power.
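The 64% figure follows directly from the FWER formula. A minimal sketch in Python (the function name is illustrative, not part of the calculator):

```python
def fwer(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

print(round(fwer(0.05, 20), 3))       # 0.642 — about a 64% chance uncorrected
print(round(fwer(0.05 / 20, 20), 3))  # 0.049 — Bonferroni keeps it near alpha
```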

How to Use This Calculator

  1. Enter the number of simultaneous comparisons (m) you're performing.
  2. Set your original significance level alpha (typically 0.05).
  3. Optionally enter comma-separated p-values from your individual tests.
  4. Choose a correction method or select "Compare All Methods" for a side-by-side view.
  5. Review the corrected significance thresholds for Bonferroni and Šidák.
  6. Check the per-test results table to see which p-values survive each correction.
  7. Examine the Holm step-down procedure for a more powerful sequential method.

Formula

Bonferroni Correction: α* = α / m

Šidák Correction: α* = 1 − (1 − α)^(1/m)

Familywise Error Rate (uncorrected): FWER = 1 − (1 − α)^m

Holm Step-Down:
  Order p-values: p₍₁₎ ≤ p₍₂₎ ≤ … ≤ p₍ₘ₎
  Reject p₍ᵢ₎ if p₍ᵢ₎ < α / (m − i + 1)
  Stop at the first non-rejection

Where: m = number of comparisons, α = original significance level
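The threshold formulas translate directly to code. A sketch in Python (function names are illustrative, not the calculator's actual implementation):

```python
def bonferroni_threshold(alpha, m):
    """Bonferroni-adjusted per-test significance level: alpha / m."""
    return alpha / m

def sidak_threshold(alpha, m):
    """Sidak-adjusted level: 1 - (1 - alpha)^(1/m); exact under independence."""
    return 1 - (1 - alpha) ** (1 / m)

def uncorrected_fwer(alpha, m):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

alpha, m = 0.05, 6
print(f"Bonferroni: {bonferroni_threshold(alpha, m):.6f}")  # 0.008333
print(f"Sidak:      {sidak_threshold(alpha, m):.6f}")       # 0.008512
print(f"FWER:       {uncorrected_fwer(alpha, m):.3f}")      # 0.265
```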

Example Calculation

Result: Bonferroni threshold: 0.008333; 1 of 6 tests significant

With 6 comparisons and α = 0.05, the Bonferroni-adjusted threshold is 0.05/6 = 0.008333. Only the p-value of 0.002 falls below this threshold, so only one test remains significant after correction. The uncorrected FWER would have been 26.5%.
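This example can be reproduced in a few lines. The source only mentions the p-value 0.002, so the other five p-values below are hypothetical, chosen for illustration:

```python
# Only 0.002 comes from the example above; the rest are made up.
p_values = [0.002, 0.013, 0.021, 0.045, 0.12, 0.38]
alpha = 0.05
m = len(p_values)

threshold = alpha / m                 # Bonferroni-adjusted threshold
significant = [p for p in p_values if p < threshold]

print(f"threshold = {threshold:.6f}")               # 0.008333
print(f"{len(significant)} of {m} tests significant")  # 1 of 6
```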

Tips & Best Practices

The Multiple Testing Problem

Every time you test a hypothesis at α = 0.05, there's a 5% chance of a false positive. Run 20 independent tests and the probability of at least one false positive is 1 − 0.95²⁰ ≈ 64%. This is the multiple testing problem, and it's pervasive in modern research where data analysis often involves many simultaneous comparisons.

Choosing a Correction Method

Bonferroni is the simplest and most transparent method, but it sacrifices power. Šidák provides a slight improvement under independence. Holm's step-down procedure is uniformly more powerful than Bonferroni — it should generally be preferred when you need FWER control. For large-scale screening (genomics, proteomics, neuroimaging), switch to FDR methods like Benjamini-Hochberg, which allow a controlled proportion of false discoveries rather than trying to eliminate them entirely.

Practical Considerations in Research

Many journals now require multiple comparison corrections for any study reporting more than one primary outcome. Pre-registering your planned analyses helps distinguish exploratory from confirmatory tests. Some researchers advocate adjusting only for the number of primary hypotheses, not secondary or exploratory analyses. The key is transparency: always report how many tests were conducted and what correction was applied.

Frequently Asked Questions

What is the Bonferroni correction?

It's a method that adjusts the significance threshold when performing multiple statistical tests. You divide your alpha level by the number of tests (α/m), ensuring the probability of at least one false positive stays at or below α across all tests.

Why is Bonferroni considered conservative?

Because it relies on a worst-case bound (the union bound) that holds under any dependence structure. In practice, tests are often positively correlated, making the true FWER lower than the Bonferroni bound guarantees. This means you may miss real effects (reduced power).

What is the difference between Bonferroni and Šidák corrections?

Both control the FWER. Bonferroni uses α/m, while Šidák uses 1−(1−α)^(1/m). Šidák is slightly less conservative because it is exact under independence, whereas Bonferroni relies on an inequality that holds under any dependence. The difference is negligible for large m or small α.
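The gap between the two thresholds can be seen by computing both for a few values of m (the loop below is just an illustration):

```python
# Compare Bonferroni and Sidak per-test thresholds at alpha = 0.05.
alpha = 0.05
for m in (2, 6, 20, 100):
    bonf = alpha / m
    sidak = 1 - (1 - alpha) ** (1 / m)
    # Sidak is always >= Bonferroni, but only marginally.
    print(f"m={m:>3}: Bonferroni {bonf:.6f}  Sidak {sidak:.6f}")
```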

What is the Holm step-down method?

Holm's method sorts p-values from smallest to largest, then tests each against a progressively less strict threshold: α/m, α/(m−1), α/(m−2), etc. It stops at the first non-significant p-value. It's always at least as powerful as Bonferroni.
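The procedure described above can be sketched as follows (the p-values are hypothetical, and the function name is illustrative):

```python
def holm(p_values, alpha=0.05):
    """Holm step-down: returns a reject/accept decision for each p-value,
    in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Thresholds: alpha/m, alpha/(m-1), alpha/(m-2), ...
        if p_values[i] < alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

print(holm([0.002, 0.013, 0.021, 0.045, 0.12, 0.38]))
# [True, False, False, False, False, False]
```

Here 0.002 < 0.05/6, but the next p-value (0.013) fails its threshold of 0.05/5 = 0.01, so the procedure stops.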

When should I NOT use Bonferroni correction?

When you have hundreds or thousands of tests (e.g., genomics), Bonferroni becomes extremely conservative. In those cases, FDR-controlling methods like Benjamini-Hochberg are preferred. Also, pre-planned contrasts in ANOVA don't always require correction.

What is FWER vs FDR?

FWER (Familywise Error Rate) is the probability of making at least one Type I error across all tests. FDR (False Discovery Rate) is the expected proportion of false positives among all rejected hypotheses. FDR is less strict and more appropriate for large-scale testing.
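To make the contrast concrete, here is a minimal sketch of the Benjamini-Hochberg FDR procedure (p-values hypothetical, same set as earlier examples):

```python
def benjamini_hochberg(p_values, q=0.05):
    """BH procedure: find the largest k with p_(k) <= (k/m) * q, then
    reject the k hypotheses with the smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

print(benjamini_hochberg([0.002, 0.013, 0.021, 0.045, 0.12, 0.38]))
# [True, True, True, False, False, False]
```

On this set, BH rejects three hypotheses where Bonferroni rejects only one — a direct illustration of FDR's greater power at the cost of a weaker error guarantee.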

Related Pages