Calculate adjusted significance thresholds for multiple comparisons using Bonferroni, Šidák, and Holm corrections. Compare methods side-by-side with FWER tables.
When you perform multiple statistical tests simultaneously, the probability of committing at least one Type I error (false positive) increases dramatically. The Bonferroni correction is the simplest and most widely used remedy: divide your significance level by the number of tests to maintain the desired familywise error rate (FWER).
This calculator computes adjusted significance thresholds using Bonferroni, Šidák, and Holm step-down methods. Enter your original alpha, the number of comparisons, and optionally your individual p-values to see which tests remain significant after correction. A side-by-side comparison table shows how each method performs.
Multiple comparison corrections are essential in genomics (thousands of gene tests), ANOVA post-hoc analyses, clinical trials with multiple endpoints, neuroimaging voxel-wise tests, and any study where many hypotheses are tested simultaneously. Without correction, you are virtually guaranteed false positives. Before reporting results, verify the calculator's output against a known reference case and double-check rounding.
Performing 20 independent tests at α = 0.05 gives a 64% chance of at least one false positive — even when no real effect exists. Bonferroni correction reduces each test's threshold to α/m, keeping the overall error rate at α. This calculator also shows the less conservative Šidák and Holm alternatives, helping you pick the right balance between controlling false positives and retaining statistical power.
Bonferroni correction: α* = α / m
Šidák correction: α* = 1 − (1 − α)^(1/m)
Familywise error rate (uncorrected): FWER = 1 − (1 − α)^m
Holm step-down: order the p-values p₍₁₎ ≤ p₍₂₎ ≤ … ≤ p₍ₘ₎, reject p₍ᵢ₎ if p₍ᵢ₎ < α / (m − i + 1), and stop at the first non-rejection.
Where m = number of comparisons and α = original significance level.
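The three closed-form quantities above can be sketched in a few lines of Python; the function name is illustrative, not part of the calculator:

```python
def adjusted_thresholds(alpha, m):
    """Return (Bonferroni, Šidák, uncorrected FWER) for m tests at level alpha."""
    bonferroni = alpha / m                       # α* = α / m
    sidak = 1 - (1 - alpha) ** (1 / m)           # α* = 1 − (1 − α)^(1/m)
    fwer_uncorrected = 1 - (1 - alpha) ** m      # FWER = 1 − (1 − α)^m
    return bonferroni, sidak, fwer_uncorrected

bonf, sidak, fwer = adjusted_thresholds(0.05, 20)
```

With α = 0.05 and m = 20 this reproduces the numbers quoted in the text: a per-test threshold of 0.0025 and an uncorrected FWER of about 64%.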
Result: Bonferroni threshold: 0.008333; 1 of 6 tests significant
With 6 comparisons and α = 0.05, the Bonferroni-adjusted threshold is 0.05/6 = 0.008333. Only the p-value of 0.002 falls below this threshold, so only one test remains significant after correction. The uncorrected FWER would have been 26.5%.
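The worked example can be reproduced with a short script. Note that only the p-value of 0.002 comes from the example; the other five p-values below are hypothetical placeholders chosen so that exactly one test survives correction, matching the reported result:

```python
alpha, m = 0.05, 6
# 0.002 is from the worked example; the rest are hypothetical stand-ins.
p_values = [0.002, 0.013, 0.021, 0.034, 0.047, 0.110]

threshold = alpha / m                       # Bonferroni-adjusted threshold, 0.008333...
significant = [p for p in p_values if p < threshold]
fwer = 1 - (1 - alpha) ** m                 # uncorrected FWER, ~0.265
```

Running this gives a threshold of 0.008333, one significant test, and an uncorrected FWER of about 26.5%, as stated above.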
Every time you test a hypothesis at α = 0.05, there's a 5% chance of a false positive. Run 20 independent tests and the probability of at least one false positive is 1 − 0.95²⁰ ≈ 64%. This is the multiple testing problem, and it's pervasive in modern research where data analysis often involves many simultaneous comparisons.
Bonferroni is the gold standard for simplicity but sacrifices power. Šidák provides a slight improvement. Holm's step-down procedure is uniformly more powerful — it should generally be preferred when you need FWER control. For large-scale screening (genomics, proteomics, neuroimaging), switch to FDR methods like Benjamini-Hochberg, which allow a controlled proportion of false discoveries rather than trying to eliminate them entirely.
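For contrast with the FWER methods, here is a minimal sketch of the Benjamini-Hochberg FDR procedure mentioned above (function name illustrative): sort the p-values, find the largest rank i with p₍ᵢ₎ ≤ (i/m)·q, and reject everything up to that rank.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the (sorted) indices of hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0  # largest rank i such that p_(i) <= (i / m) * q
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    return sorted(order[:k])  # reject the k smallest p-values
```

Unlike Holm, the scan does not stop at the first failure: BH takes the largest rank that satisfies its threshold, which is what makes it less strict than FWER control.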
Many journals now require multiple comparison corrections for any study reporting more than one primary outcome. Pre-registering your planned analyses helps distinguish exploratory from confirmatory tests. Some researchers advocate adjusting only for the number of primary hypotheses, not secondary or exploratory analyses. The key is transparency: always report how many tests were conducted and what correction was applied.
What is the Bonferroni correction?
It's a method that adjusts the significance threshold when performing multiple statistical tests. You divide your alpha level by the number of tests (α/m), ensuring the probability of at least one false positive stays at or below α across all tests.
Why is the Bonferroni correction considered conservative?
Because it relies on the union bound, which guards against the worst-case dependence among tests. In practice, tests are often correlated, so the true FWER is lower than the level Bonferroni controls for. This means you may miss real effects (reduced power).
How do the Bonferroni and Šidák corrections differ?
Both control the FWER. Bonferroni uses α/m, while Šidák uses 1−(1−α)^(1/m). Šidák is slightly less conservative because it computes the exact threshold under independence rather than relying on the Bonferroni inequality. The difference is negligible for large m or small α.
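The "negligible difference" claim is easy to verify numerically; this short sketch compares the two thresholds at a few values of m:

```python
# Compare Bonferroni and Šidák thresholds at alpha = 0.05 for growing m.
alpha = 0.05
gaps = []
for m in (2, 10, 100):
    bonferroni = alpha / m
    sidak = 1 - (1 - alpha) ** (1 / m)
    gaps.append(sidak - bonferroni)  # Šidák is always the (slightly) larger threshold
```

The gap is on the order of 10⁻⁴ at m = 2 and shrinks as m grows, which is why the two methods are practically interchangeable for large families of tests.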
How does the Holm step-down method work?
Holm's method sorts p-values from smallest to largest, then tests each against a progressively less strict threshold: α/m, α/(m−1), α/(m−2), and so on. It stops at the first non-significant p-value. It is always at least as powerful as Bonferroni.
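The step-down procedure just described can be sketched directly (function name illustrative):

```python
def holm(p_values, alpha=0.05):
    """Return a rejection decision for each p-value, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] < alpha / (m - step):  # thresholds α/m, α/(m−1), ...
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject

decisions = holm([0.001, 0.010, 0.040])
```

For the p-values [0.001, 0.010, 0.040] at α = 0.05, Holm rejects all three (thresholds 0.0167, 0.025, 0.05), whereas plain Bonferroni (fixed threshold 0.0167) would reject only the first two, illustrating the power gain.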
When should I use something other than Bonferroni?
When you have hundreds or thousands of tests (e.g., genomics), Bonferroni becomes extremely conservative. In those cases, FDR-controlling methods like Benjamini-Hochberg are preferred. Also, pre-planned contrasts in ANOVA don't always require correction.
What is the difference between FWER and FDR?
FWER (familywise error rate) is the probability of making at least one Type I error across all tests. FDR (false discovery rate) is the expected proportion of false positives among all rejected hypotheses. FDR is less strict and more appropriate for large-scale testing.