A/B Test Statistical Significance Calculator

Test whether your A/B test results are statistically significant. Enter visitors and conversions for the control and variant to get the z-score and p-value.

About the A/B Test Statistical Significance Calculator

After running an A/B test, you need to determine whether the observed difference between control and variant is statistically significant or could have occurred by chance. This calculator performs a two-proportion z-test, the standard method for comparing conversion rates between two groups.

Enter the number of visitors and conversions for both control (A) and variant (B). The calculator computes the z-score, p-value, and confidence level. A p-value below your significance threshold (typically 0.05) means the difference is statistically significant.

Statistical significance does not mean the result is practically important: with enough traffic, even a tiny lift can be statistically significant. Always consider both statistical significance and the magnitude of the effect. This free online tool computes the test instantly and reliably, removing the risk of manual calculation errors.

Why Use This A/B Test Statistical Significance Calculator?

Making product decisions based on random noise wastes resources and damages user experience. This calculator provides objective statistical evidence for whether your A/B test result is genuine, replacing gut feelings with mathematical rigor. Manual calculations are error-prone and time-consuming; this tool delivers verified results in seconds so you can focus on strategy.

How to Use This Calculator

  1. Enter the number of visitors in the control group (A).
  2. Enter the number of conversions in group A.
  3. Enter the number of visitors in the variant group (B).
  4. Enter the number of conversions in group B.
  5. Review the z-score, p-value, and significance conclusion.
  6. A p-value below 0.05 indicates statistical significance at the 95% level.

Formula

p̂ = (x_A + x_B) / (n_A + n_B)

Z = (p_B − p_A) / √[p̂(1 − p̂)(1/n_A + 1/n_B)]

p-value = 2 × (1 − Φ(|Z|))
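Taking Φ to be the standard normal CDF, the formula above can be sketched with only the Python standard library (the function name is illustrative, not necessarily what this calculator runs internally):

```python
from math import sqrt, erf

def two_proportion_z_test(n_a, x_a, n_b, x_b):
    """Two-tailed two-proportion z-test.

    n_a, x_a: visitors and conversions in the control (A)
    n_b, x_b: visitors and conversions in the variant (B)
    Returns (z, p_value).
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)      # pooled proportion p-hat
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se

    def phi(x):                             # standard normal CDF via erf
        return 0.5 * (1 + erf(x / sqrt(2)))

    p_value = 2 * (1 - phi(abs(z)))
    return z, p_value
```

For instance, `two_proportion_z_test(10_000, 300, 10_000, 350)` returns z ≈ 1.99 and a p-value ≈ 0.046.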

Example Calculation

Control: 300/10,000 = 3.00%. Variant: 350/10,000 = 3.50%. The pooled proportion is (300 + 350) / 20,000 = 3.25%. Z = (0.035 − 0.030) / √[0.0325 × 0.9675 × (2/10,000)] ≈ 1.99, giving a p-value ≈ 0.046. Since 0.046 < 0.05, the result is statistically significant at the 95% confidence level. The variant shows a genuine 16.7% relative improvement (0.5 percentage points on a 3.0% baseline).

Result: p-value ≈ 0.046 (statistically significant at 95%)
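The arithmetic in this example can be reproduced step by step with the standard library only (a quick check, mirroring the formula section above):

```python
from math import sqrt, erf

n_a = n_b = 10_000
x_a, x_b = 300, 350
p_a, p_b = x_a / n_a, x_b / n_b             # 0.03 and 0.035
p_pool = (x_a + x_b) / (n_a + n_b)          # 0.0325 (pooled proportion)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se                        # ≈ 1.99
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # ≈ 0.046
relative_lift = (p_b - p_a) / p_a           # ≈ 0.167, i.e. 16.7%
```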

Tips & Best Practices

Understanding Statistical Significance

Statistical significance answers one question: "Is the observed difference likely to be real, or could it be random noise?" A p-value below 0.05 means that, if there were truly no difference, an observed difference this large would occur by chance less than 5% of the time. This does not mean the variant is 95% better; it means you have strong evidence that it differs from the control.

Common Significance Mistakes

Peeking at results before reaching your planned sample size dramatically inflates the false-positive rate. Running multiple tests simultaneously without correction leads to false discoveries. Using a one-tailed test when a two-tailed test is appropriate halves the reported p-value, making significance easier to claim, and leaves you blind to degradations. Always pre-register your test plan before launching.
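One standard correction for running multiple simultaneous tests is the Bonferroni correction, which divides the family-wise significance threshold by the number of tests. A minimal sketch (this is a general technique, not necessarily what the calculator applies):

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold under a Bonferroni
    correction: divide the family-wise alpha by the number
    of simultaneous tests."""
    return alpha / num_tests

# Running 5 variants against one control at a family-wise
# alpha of 0.05 requires each individual test to clear p < 0.01.
per_test_alpha = bonferroni_threshold(0.05, 5)
```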

Beyond Significance: Effect Size and Confidence Intervals

Significance tells you if there is a difference; effect size tells you how big it is. Report both. A confidence interval (e.g., "the variant is 10–30% better") gives more useful information than a binary "significant/not significant" declaration.
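A common way to compute such an interval is a Wald interval on the difference in proportions, using the unpooled standard error (the usual choice for interval estimation, as opposed to the pooled one used in the z-test). A sketch under that assumption; the calculator itself may use a different method:

```python
from math import sqrt

def diff_confidence_interval(n_a, x_a, n_b, x_b, z_crit=1.96):
    """95% Wald confidence interval for the absolute difference
    in conversion rates (p_b - p_a), using the unpooled
    standard error."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se
```

For the worked example (300/10,000 vs 350/10,000) the interval barely excludes zero, consistent with a p-value just under 0.05.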

Frequently Asked Questions

At what p-value is my test significant?

The conventional threshold is p < 0.05 (95% confidence). This means that, if there were no true difference, an observed difference this large would arise by chance less than 5% of the time. More conservative tests use p < 0.01 (99% confidence).

What is a z-score?

The z-score measures how many standard errors the observed difference is from zero (no difference). Higher absolute z-scores indicate stronger evidence. |Z| > 1.96 corresponds to p < 0.05, and |Z| > 2.58 corresponds to p < 0.01.
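These threshold correspondences can be verified numerically with a small helper that converts a z-score into a two-tailed p-value via the standard normal CDF (`math.erf` from the standard library; the function name is illustrative):

```python
from math import sqrt, erf

def p_from_z(z):
    """Two-tailed p-value for a z-score, using the standard
    normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# p_from_z(1.96) ≈ 0.05 and p_from_z(2.58) ≈ 0.01,
# matching the thresholds quoted above.
```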

Can a test be significant but not meaningful?

Yes. With very large sample sizes, even tiny differences (0.01% lift) can be statistically significant. Always ask whether the magnitude of the lift justifies the cost of implementation. Practical significance is as important as statistical significance.

What is a two-tailed vs. one-tailed test?

A two-tailed test checks for any difference (better or worse). A one-tailed test only checks for improvement. Two-tailed is recommended because it catches degradations. This calculator uses a two-tailed test.
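The relationship between the two is direct: for a positive z-score, the one-tailed (improvement-only) p-value is exactly half the two-tailed one. A sketch, with an illustrative function name:

```python
from math import sqrt, erf

def tail_p_values(z):
    """Two-tailed vs one-tailed (improvement-only) p-values for
    the same z-score. For z > 0 the one-tailed p is exactly
    half the two-tailed p."""
    two_tailed = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    one_tailed = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # "better only"
    return two_tailed, one_tailed
```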

My p-value is exactly 0.05. Is that significant?

Borderline. Technically, p must be less than 0.05 to reject the null hypothesis. In practice, a p-value of 0.05 suggests weak evidence. Consider running the test longer or using a larger sample to get a clearer signal.

How do I interpret negative z-scores?

A negative z-score means the variant performed worse than the control. The p-value still measures significance — a significant negative result means the variant genuinely hurt performance and should not be implemented.

Related Pages