Calculate required sample size, achieved power, and Type II error for z-tests, t-tests, and chi-square tests. Cohen's d effect size reference and power curves included.
Statistical power is the probability that a test correctly rejects the null hypothesis when it's false — in other words, the probability of detecting a real effect. Power analysis is the essential planning step before any study, answering the question: "How many subjects do I need to have a good chance of finding the effect I'm looking for?"
This calculator helps you determine the required sample size for a desired power level, or compute the achieved power for a given sample size. It supports one-sample and two-sample z-tests, t-tests, paired t-tests, and chi-square tests. Enter your expected effect size (Cohen's d), significance level, and desired power to get the minimum sample size.
Power analysis is critical in clinical trials (where enrolling too few patients to answer the question, or more than necessary, raises ethical concerns), psychology experiments, A/B tests, market research, and any study where you need to justify your sample size to reviewers, grant agencies, or ethics boards.
An underpowered study wastes resources by being unlikely to detect real effects, while an overpowered study wastes resources by enrolling far more subjects than necessary. This calculator finds the sweet spot. It also provides a power curve showing how power changes with sample size, and a reference table of Cohen's d benchmarks to help you choose a realistic effect size.
Required sample size (one-sample test):
n = ((z_α + z_β) / d)²

Required sample size (two-sample test, per group):
n = 2 × ((z_α + z_β) / d)²

Where:
z_α = critical z-value for significance level α (use z_{α/2} for a two-tailed test, as in the example below)
z_β = z-value corresponding to the desired power 1 − β
d = Cohen's d effect size = (μ₁ − μ₂) / σ

Achieved power (one-sample test):
1 − β = Φ(d√n − z_α)

where Φ is the standard normal CDF.
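These formulas are straightforward to implement. Below is a minimal sketch in Python; the function names are illustrative, not from any particular library, and scipy is assumed to be available.

```python
# Minimal sketch of the normal-approximation formulas above.
# Function names are illustrative, not from any particular library.
import math
from scipy.stats import norm

def required_n(d, alpha=0.05, power=0.80, two_tailed=True, two_sample=False):
    """Required sample size (per group if two_sample=True)."""
    z_alpha = norm.ppf(1 - alpha / 2) if two_tailed else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    n = ((z_alpha + z_beta) / d) ** 2
    if two_sample:
        n *= 2
    return math.ceil(n)

def achieved_power(d, n, alpha=0.05, two_tailed=True):
    """Achieved power of a one-sample test: Phi(d*sqrt(n) - z_alpha)."""
    z_alpha = norm.ppf(1 - alpha / 2) if two_tailed else norm.ppf(1 - alpha)
    return norm.cdf(d * math.sqrt(n) - z_alpha)

print(required_n(0.5))                    # 32
print(round(achieved_power(0.5, 32), 3))  # ~0.807
```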
Result: Required n = 32
To detect a medium effect (d = 0.5) with 80% power at the 0.05 significance level using a two-tailed one-sample test, the formula gives n = ((1.9600 + 0.8416) / 0.5)² ≈ 31.4, which rounds up to 32 subjects. With 32 subjects, the achieved power under this approximation is approximately 80.7%. (Exact t-distribution calculations, which account for estimating σ from the sample, require slightly more, about 34 subjects.)
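As a cross-check, the statsmodels package can solve the same problem using the exact noncentral t-distribution rather than the normal approximation (a sketch; the printed values are approximate):

```python
# Cross-check using statsmodels, which uses the exact noncentral
# t-distribution rather than the normal approximation.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()

# Leave nobs unspecified to solve for the required sample size.
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                         alternative='two-sided')
print(n)  # ~33.4, i.e. 34 subjects under the exact calculation

# Exact achieved power at n = 32 is a bit below the 80% target.
print(analysis.power(effect_size=0.5, nobs=32, alpha=0.05))  # ~0.78
```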
Power analysis involves four quantities, and knowing any three determines the fourth: (1) significance level α, (2) effect size d, (3) sample size n, and (4) power 1−β. Most commonly, you fix α and d, then solve for n given a desired power. Alternatively, you can compute what power a given n achieves, or what effect size is detectable with your n and power.
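Libraries expose this "fix three, solve for the fourth" pattern directly. For example, statsmodels' solve_power returns whichever quantity is left unspecified (a sketch for a two-sample, equal-groups t-test):

```python
# One function, four quantities: leave exactly one argument unset
# and solve_power returns it (two-sample t-test, equal group sizes).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# alpha, d, power fixed -> required n per group (~63.8, round up to 64)
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)

# alpha, d, n fixed -> achieved power (~0.70)
power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05)

# alpha, n, power fixed -> minimum detectable effect size (~0.57)
mde = analysis.solve_power(nobs1=50, alpha=0.05, power=0.80)

print(n_per_group, power, mde)
```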
The most critical (and difficult) decision in power analysis is selecting a realistic effect size. Overly optimistic effect sizes lead to underpowered studies. Sources include: pilot studies, published literature meta-analyses, subject-matter judgment about the minimum clinically important difference (MCID), and standardized benchmarks. When in doubt, use a smaller effect size for a more conservative (larger) sample size estimate.
Paired designs (e.g., before-after) typically have more power than independent-groups designs because they control for individual differences: the standard deviation of the paired differences is σ√(2(1 − ρ)), so any positive correlation ρ between the two measurements shrinks the noise and inflates the effective effect size. Unequal group sizes reduce power relative to a balanced design with the same total n (assuming equal variances). For ANOVA and regression, power depends on the number of groups/predictors and the expected effect size (Cohen's f or f²). Multi-site studies pool power across locations but add complexity to the analysis.
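A quick sketch of the paired-design advantage, assuming both measurements share the same standard deviation and correlate at ρ (the function name is illustrative):

```python
# Sketch: how within-pair correlation rho boosts power in a paired design.
# Assumes both measurements share the same standard deviation.
import math
from scipy.stats import norm

def paired_vs_independent_n(d, rho, alpha=0.05, power=0.80):
    """Normal-approximation sample sizes for the same raw mean difference."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_independent = math.ceil(2 * (z / d) ** 2)  # per group
    d_paired = d / math.sqrt(2 * (1 - rho))      # effect size of the differences
    n_paired = math.ceil((z / d_paired) ** 2)    # number of pairs
    return n_independent, n_paired

print(paired_vs_independent_n(d=0.5, rho=0.5))  # (63 per group, 32 pairs)
```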
Cohen's d is a standardized effect size measuring the difference between two means divided by the pooled standard deviation. d = 0.2 is small (hard to see), 0.5 is medium (visible to a careful observer), and 0.8 is large (obvious). It's the most common effect size for power analysis.
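If you have pilot data, Cohen's d can be computed directly from the definition above. A minimal sketch using the pooled standard deviation (the sample values are illustrative):

```python
# Sketch: Cohen's d from two pilot samples via the pooled standard deviation.
import numpy as np

def cohens_d(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    # Pooled variance: df-weighted average of the two sample variances.
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

print(cohens_d([5.1, 6.2, 5.8, 6.0], [4.2, 4.9, 5.0, 4.4]))  # illustrative data
```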
80% (0.80) is standard in most fields, meaning you have an 80% chance of detecting a real effect. For clinical trials or high-stakes research, 90% is preferred. Power below 50% means you're more likely to miss the effect than detect it.
Use pilot study data, published meta-analyses, or Cohen's benchmarks (0.2, 0.5, 0.8). Some researchers compute sample sizes for a range of effect sizes. A sensitivity analysis showing what effect sizes your planned n can detect is also informative.
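One way to run that sensitivity analysis is to tabulate required n over a grid of candidate effect sizes (a sketch using the normal-approximation formula above; the grid values are illustrative):

```python
# Sensitivity analysis sketch: required n across candidate effect sizes.
import math
from scipy.stats import norm

alpha, power = 0.05, 0.80
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

for d in (0.2, 0.3, 0.5, 0.8):
    n = math.ceil((z / d) ** 2)  # one-sample; double for two-sample per group
    print(f"d = {d:.1f}: n = {n}")
# d = 0.2: n = 197 | d = 0.3: n = 88 | d = 0.5: n = 32 | d = 0.8: n = 13
```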
Because post-hoc ("observed") power is a direct function of the p-value obtained, it adds no new information. A non-significant result with "adequate post-hoc power" is a contradiction. Design-stage power analysis is the correct approach.
A stricter alpha (e.g., 0.01 instead of 0.05) requires larger samples to maintain the same power. This is because the rejection region shrinks, making it harder to reject H₀ and requiring more data to compensate.
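To put numbers on the penalty, here is a quick sketch comparing required n across alpha levels for a fixed effect size and power (normal approximation, one-sample, two-tailed):

```python
# How a stricter alpha inflates required sample size (d = 0.5, power = 0.80).
import math
from scipy.stats import norm

d, power = 0.5, 0.80
for alpha in (0.10, 0.05, 0.01):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    print(f"alpha = {alpha}: n = {math.ceil((z / d) ** 2)}")
# alpha = 0.1: n = 25 | alpha = 0.05: n = 32 | alpha = 0.01: n = 47
```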
Power = 1 − β, where β is the Type II error rate (probability of failing to detect a real effect). If power is 0.80, β is 0.20 — there's a 20% chance of a false negative.