
A/B Test Significance Calculator

Calculate whether your email A/B test results are statistically significant. Enter your sample sizes and conversion rates to find out whether you can trust your results or need more data.


Understanding the results

  • 95% confidence means there's only a 5% chance that random variation alone produced the observed difference
  • P-value < 0.05 indicates statistical significance (see the sketch after this list)
  • Z-score measures how many standard errors the observed difference is from zero
  • Larger sample sizes give more reliable results
  • Don't end tests early - decide your sample size up front and stick to it
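
To make these bullets concrete, here is a minimal Python sketch (the z value of 2.1 is hypothetical) showing how a z-score maps to a p-value and the 5% threshold:

    from scipy.stats import norm

    z = 2.1                        # hypothetical z-score from a test
    p_value = 2 * norm.sf(abs(z))  # two-sided p-value, here ~0.036
    significant = p_value < 0.05   # True: clears the 95% confidence bar
    print(f"p = {p_value:.3f}, significant: {significant}")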

About this tool

You ran an A/B test and Variant B got a 12% higher open rate. Time to celebrate? Maybe not. With small sample sizes, random variation can easily produce a 12% swing that means nothing. This calculator tells you whether your results are statistically significant—meaning you can trust them enough to act on—or whether you need more data before making changes.

How statistical significance works in email testing

Statistical significance answers one question: "What's the probability that this result happened by pure chance?" When we say a result is "significant at 95%," it means there's only a 5% chance that random variation alone produced the observed difference. The math behind it (a two-proportion z-test for most email metrics) compares the conversion rates of your two variants while accounting for sample size. Larger samples give you more certainty. A 2% open rate difference with 500 recipients per variant might not be significant, but the same difference with 10,000 per variant almost certainly is. This calculator handles the math so you can focus on interpreting the results.
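
For readers who want to check the math themselves, here is a minimal Python sketch of a two-proportion z-test. The function name and the example counts are illustrative, not this calculator's actual code:

    from math import sqrt
    from scipy.stats import norm

    def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
        """Compare two conversion rates; return z-score and two-sided p-value."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no real difference
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        return z, 2 * norm.sf(abs(z))

    # 20.0% vs 21.5% open rate with 10,000 recipients per variant
    z, p = two_proportion_ztest(2000, 10000, 2150, 10000)
    print(f"z = {z:.2f}, p = {p:.4f}")  # p ≈ 0.009: significant at 95%

With 500 recipients per variant, the same 1.5-point gap would give a p-value far above 0.05, which is exactly the sample-size effect described above.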

What to test and what not to test

The best A/B tests change one thing that could have a big impact. Subject lines are the classic email A/B test because they directly affect open rates and the effect sizes tend to be large enough to detect with reasonable sample sizes. Send time is another good one. Preheader text can move the needle on opens too. CTA button text, color, and placement affect click rates. What's harder to test: small copy tweaks deep in the email body, minor layout changes, or font choices. These usually produce tiny effects that require enormous sample sizes to detect. If you need 500,000 recipients per variant to reach significance, the difference probably isn't worth optimizing for.

Sample size: the make-or-break factor

Most email A/B tests fail to reach significance because the sample size is too small. Here's a rough guide: to detect a 2-percentage-point difference in open rates (say, 20% vs 22%), you need about 4,000 recipients per variant at 95% confidence. To detect a 1-point difference in click rates (3% vs 4%), you need around 5,500 per variant. If your list is 2,000 people total, you simply can't detect small effects—test bigger changes instead, like completely different subject line approaches rather than swapping one word. Use our email calculator to understand your baseline metrics before designing tests.
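
If you'd rather compute required sample sizes for your own baseline rates, here is a sketch using statsmodels. The helper function is hypothetical, and the answer depends heavily on the statistical power you choose, so your numbers may differ from the rough guide above:

    import math
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    def recipients_per_variant(base_rate, target_rate, alpha=0.05, power=0.8):
        """Recipients needed per variant to detect base_rate vs target_rate."""
        effect = abs(proportion_effectsize(target_rate, base_rate))  # Cohen's h
        n = NormalIndPower().solve_power(effect_size=effect, alpha=alpha,
                                         power=power, alternative='two-sided')
        return math.ceil(n)

    # 20% vs 22% open rate at 95% confidence and 80% power
    print(recipients_per_variant(0.20, 0.22))  # ~6,500 per variant

At the conventional 80% power this comes out higher than the rough figures above, which underscores how much the power assumption matters: the more certain you want to be of catching a real effect, the more recipients you need.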

Common A/B testing mistakes

The most dangerous mistake is "peeking"—checking results early and stopping the test as soon as one variant looks better. This dramatically inflates your false positive rate because random fluctuations are larger with small samples. Decide your sample size before you start and commit to it. Another common mistake: running multiple tests simultaneously on overlapping audiences, which contaminates results. And don't test tiny differences—if you need a calculator to tell whether 20.1% vs 20.3% matters, it doesn't. Focus your testing energy on changes that could move metrics by 10% or more. Tag your test variants with unique UTM parameters so you can track downstream conversions, not just opens and clicks.
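
The peeking problem is easy to demonstrate with a simulation. This illustrative sketch runs A/A tests where no real difference exists, peeks 20 times as results accumulate, and stops at the first peek that looks significant:

    import numpy as np
    from scipy.stats import norm

    def peeking_false_positive_rate(n_per_variant=10_000, n_peeks=20,
                                    n_sims=2_000, alpha=0.05, rate=0.20):
        """A/A simulation: both variants share the same true rate, so any
        'significant' result is a false positive."""
        rng = np.random.default_rng(42)
        z_crit = norm.ppf(1 - alpha / 2)
        checkpoints = np.linspace(n_per_variant // n_peeks, n_per_variant,
                                  n_peeks, dtype=int)
        false_positives = 0
        for _ in range(n_sims):
            # cumulative conversion counts let us "peek" cheaply at each checkpoint
            ca = np.cumsum(rng.random(n_per_variant) < rate)
            cb = np.cumsum(rng.random(n_per_variant) < rate)
            for n in checkpoints:
                pooled = (ca[n - 1] + cb[n - 1]) / (2 * n)
                se = np.sqrt(pooled * (1 - pooled) * 2 / n)
                diff = abs(cb[n - 1] - ca[n - 1]) / n
                if se > 0 and diff / se > z_crit:
                    false_positives += 1  # declared a phantom winner early
                    break
        return false_positives / n_sims

    # With 20 peeks, expect a rate several times the nominal 5%
    print(peeking_false_positive_rate())

Stopping at the first "significant" peek turns a 5% error rate into one several times higher, which is why committing to a sample size in advance matters.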

Frequently Asked Questions