A/B Price Test Sample Size Calculator

Calculate the minimum sample size needed for a statistically significant A/B price test. Set confidence level, power, and minimum detectable effect.

About the A/B Price Test Sample Size Calculator

Running a price test without enough traffic is like flipping a coin and calling it science. Our A/B Price Test Sample Size Calculator tells you exactly how many visitors or transactions each variant needs before you can trust the results. Enter your baseline conversion rate, the minimum change you want to detect, and your desired confidence and power levels — the tool returns a clear sample size per group along with estimated test duration.

Price experiments carry more risk than typical UX tests because every transaction affects real revenue. Under-powered tests lead to false positives that can lock in a worse price, while over-powered tests waste time that could be spent on other optimizations. This calculator uses the standard two-proportion z-test formula so you can plan experiments that are both efficient and reliable.

Whether you are testing a small SaaS pricing change or a major e-commerce markdown strategy, knowing the required sample size upfront prevents premature conclusions and protects your bottom line.

Why Use This A/B Price Test Sample Size Calculator?

Guessing when a price test has “enough data” is the most common mistake in pricing optimization. Ending an experiment too early often means acting on statistical noise, while running too long wastes traffic that could power the next test. This calculator removes the guesswork by applying established statistical formulas, helping you allocate resources efficiently and reach trustworthy conclusions every time.

How to Use This Calculator

  1. Enter your current (control) conversion rate as a percentage.
  2. Specify the minimum detectable effect — the smallest improvement worth detecting.
  3. Choose a significance level (commonly 5% for 95% confidence).
  4. Choose statistical power (commonly 80%).
  5. Optionally enter your daily traffic to estimate test duration.
  6. Read the required sample size per variant and total.
  7. Use the scenario table to compare different MDE and confidence combinations.

Formula

n = (Z_{α/2} + Z_β)² × [ p₁(1−p₁) + p₂(1−p₂) ] / (p₂ − p₁)²

Where:
  • n = required sample size per group
  • Z_{α/2} = z-score for the desired significance level (e.g. 1.96 for 95% confidence)
  • Z_β = z-score for the desired power (e.g. 0.842 for 80%)
  • p₁ = baseline conversion rate
  • p₂ = baseline + minimum detectable effect
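The formula translates directly into code. A minimal sketch in Python, using the standard library's `statistics.NormalDist` for the z-scores (the function name is illustrative, not part of the calculator):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p1: float, mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-proportion z-test (fixed-horizon)."""
    p2 = p1 + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.842 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)
```

Rounding up with `math.ceil` keeps the planned test at least as powerful as requested.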

Example Calculation

Result: ~19,740 per variant (≈39,480 total — approximately 40 days)

With a 3% baseline conversion rate and a desire to detect a 0.5 percentage-point lift to 3.5%, at 95% confidence and 80% power, each variant needs roughly 19,740 visitors. At 1,000 visitors per day split evenly between two variants (500 each), the test would take about 40 days.

Tips & Best Practices

Why Sample Size Matters for Price Tests

Pricing is one of the highest-leverage decisions a business can make. A well-run price test can reveal whether a higher price boosts revenue without losing conversions, or whether a lower price drives enough volume to compensate for thinner margins. But these insights are only useful if you can trust the data. Under-sized experiments produce noisy results that look convincing but don't replicate, leading to pricing mistakes that can persist for months or years.

Fixed-Horizon vs. Sequential Testing

This calculator uses the fixed-horizon approach: compute a sample size upfront, run the test until you hit it, then analyze. The main alternative is sequential testing (e.g., SPRT), which allows early stopping when results are conclusive. Sequential methods are powerful but require more statistical sophistication and infrastructure. For most teams, a fixed-horizon test with a pre-computed sample size is the simplest reliable approach.

Practical Tips for Pricing Experiments

Always randomize at the user level, not the session level, so returning visitors see a consistent price. Run the test for complete weeks to neutralize day-of-week effects. Log revenue and conversion data separately so you can analyze both. And document your hypothesis, MDE, and sample size plan before launching — post-hoc rationalizations are the enemy of rigorous experimentation.
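User-level randomization is typically implemented by hashing a stable user identifier, so assignment survives across sessions without storing any state. A minimal sketch (the experiment name and helper are illustrative, not a specific library's API):

```python
import hashlib

def assign_price_variant(user_id: str, experiment: str = "price-test") -> str:
    """Deterministic user-level bucketing: the same user always sees the same price."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "variant"
```

Because the assignment is a pure function of the user ID, a returning visitor gets a consistent price with no lookup table, and changing the experiment name reshuffles everyone for the next test.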

Frequently Asked Questions

What is minimum detectable effect (MDE)?

MDE is the smallest difference between control and variant conversion rates that your experiment can reliably pick up. A smaller MDE requires more traffic. Choose an MDE that represents a meaningful business impact — a 0.1 pp lift on a 3% baseline may not justify the engineering cost of the change.
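The traffic cost of a small MDE grows quadratically: halving the MDE roughly quadruples the required sample. A quick sketch using the standard two-proportion formula (helper name is illustrative):

```python
import math
from statistics import NormalDist

def n_per_variant(p1, mde, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    p2 = p1 + mde
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / mde ** 2)

for mde in (0.01, 0.005, 0.0025):  # 1.0 pp, 0.5 pp, 0.25 pp on a 3% baseline
    print(f"MDE {mde:.2%}: {n_per_variant(0.03, mde):,} per variant")
```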

What happens if I end the test early?

Stopping an experiment before reaching the required sample size inflates your false positive rate. You might conclude a price change works when it actually doesn't, or miss a real improvement. The calculated sample size assumes a fixed-horizon test — peeking at results and stopping early violates that assumption.
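The inflation from peeking is easy to see in simulation: run an A/A test (no real difference between arms), check for significance after every batch of traffic, and count how often any peek crosses the 95% threshold. A rough sketch under simplified assumptions (fixed batch sizes, pooled-variance z-test):

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # 1.96

def peeking_trial(n_per_peek=300, peeks=10, p=0.03):
    """A/A test: both arms convert at rate p; True if any peek looks 'significant'."""
    conv_a = conv_b = n_a = n_b = 0
    for _ in range(peeks):
        conv_a += sum(random.random() < p for _ in range(n_per_peek))
        conv_b += sum(random.random() < p for _ in range(n_per_peek))
        n_a += n_per_peek
        n_b += n_per_peek
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
        if se > 0 and abs(conv_a / n_a - conv_b / n_b) / se > Z_CRIT:
            return True  # would have stopped early and declared a winner
    return False

random.seed(42)
trials = 1000
rate = sum(peeking_trial() for _ in range(trials)) / trials
print(f"False positive rate with 10 peeks: {rate:.1%} (nominal: 5.0%)")
```

Even though each individual peek uses a nominal 5% threshold, taking many looks multiplies the chances of at least one spurious "win".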

Should I use 80% or 90% power?

80% power is the standard in most industries and balances sample size with reliability. It means there is a 20% chance you fail to detect a real effect. For high-stakes pricing decisions where a miss is costly, 90% power provides extra protection at the cost of roughly a third more traffic.
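That traffic figure falls straight out of the formula: only the Z_β term changes with power, so the required sample scales with (Z_{α/2} + Z_β)². A quick check:

```python
from statistics import NormalDist

z_alpha = NormalDist().inv_cdf(0.975)  # 95% confidence, two-sided
scale_80 = (z_alpha + NormalDist().inv_cdf(0.80)) ** 2
scale_90 = (z_alpha + NormalDist().inv_cdf(0.90)) ** 2
extra_traffic = scale_90 / scale_80 - 1
print(f"Moving from 80% to 90% power needs {extra_traffic:.0%} more traffic")
```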

How do I handle a revenue-based metric instead of conversion rate?

Revenue metrics have higher variance than binary conversion rates, so they require larger samples. This calculator focuses on conversion rate (binary outcome). For revenue per visitor, you would use a t-test formula with the standard deviation of revenue, which typically requires 2–5× more traffic than a conversion test.
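For a continuous metric like revenue per visitor, the analogous formula replaces the binomial variance with the metric's own variance. A sketch using the large-sample (z) approximation, where `sd` is your estimated standard deviation of per-visitor revenue, an input you would measure from historical data:

```python
import math
from statistics import NormalDist

def n_per_group_revenue(mean_diff: float, sd: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size to detect `mean_diff` in a continuous metric."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sd / mean_diff) ** 2)
```

Because revenue standard deviations are often several times the mean, this routinely lands well above the equivalent conversion-rate sample.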

Can I test more than two prices at once?

Yes, but each additional variant increases the total sample needed. A three-variant test requires pairwise comparisons and a correction like Bonferroni to control the overall false positive rate. As a rule of thumb, multiply the two-variant sample by 1.5 for three variants.
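With a Bonferroni correction, each pairwise comparison simply runs at α divided by the number of comparisons, which feeds a larger z-score into the same formula. A sketch (three variants imply three pairwise comparisons; helper name is illustrative):

```python
import math
from statistics import NormalDist

def n_per_variant(p1, p2, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

uncorrected = n_per_variant(0.03, 0.035)                 # two-variant test
bonferroni = n_per_variant(0.03, 0.035, alpha=0.05 / 3)  # one of 3 comparisons
```

Under this correction the per-variant requirement grows by roughly a third, on top of the extra traffic consumed by the third arm itself.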

What significance level should I use?

The standard is 5% (95% confidence), meaning there's a 5% chance of a false positive. For pricing tests with large revenue impact, some teams use 1% (99% confidence) for extra safety. Lower significance requires larger samples, so there's always a trade-off.
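The same scaling logic applies to the significance level: tightening α from 5% to 1% raises Z_{α/2} from 1.96 to about 2.58, and the sample grows with the squared sum of z-scores. A quick comparison:

```python
from statistics import NormalDist

z_beta = NormalDist().inv_cdf(0.80)  # 80% power
scale_95 = (NormalDist().inv_cdf(0.975) + z_beta) ** 2
scale_99 = (NormalDist().inv_cdf(0.995) + z_beta) ** 2
extra = scale_99 / scale_95 - 1
print(f"99% confidence needs {extra:.0%} more traffic than 95%")
```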

Related Pages