Effect Size

A quantitative measure of the magnitude of a difference between groups in an experiment, independent of sample size. It answers the question "how big is the improvement?" rather than "is there an improvement?"

Also known as: treatment effect, lift

Why It Matters

Statistical significance tells you whether an effect exists. Effect size tells you whether it matters. A test can be highly significant (p < 0.001) while detecting an improvement so small (0.01% conversion lift) that it has no practical business impact.

Effect size is what drives business decisions. When deciding whether to invest engineering resources in shipping a variant, you need to know the expected magnitude of improvement, not just whether the improvement is nonzero. A 0.5% lift might not justify the maintenance cost, while a 5% lift clearly does.

Effect sizes also enable comparison across experiments. If you want to know which of your 10 past experiments produced the most impact, comparing p-values is meaningless (they depend on sample size). Comparing effect sizes gives you a genuine ranking of what moved the needle most.

How to Calculate

For conversion rate experiments, effect size is simply the absolute or relative difference between variant and control conversion rates.

  • Absolute effect: variant rate - control rate
  • Relative effect (lift): (variant rate - control rate) / control rate
  • Cohen's d (for standardized comparisons across different metrics): mean difference / pooled standard deviation
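These calculations can be sketched in a few lines of Python. The function names and the 4.2%/4.5% rates below are illustrative, not part of any KISSmetrics API:

```python
from statistics import mean, stdev
from math import sqrt

def conversion_effect(control_rate, variant_rate):
    """Absolute and relative effect for a conversion-rate experiment."""
    absolute = variant_rate - control_rate
    relative = absolute / control_rate  # the "lift"
    return absolute, relative

def cohens_d(sample_a, sample_b):
    """Standardized effect size: mean difference / pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_sd = sqrt(((n_a - 1) * stdev(sample_a) ** 2 +
                      (n_b - 1) * stdev(sample_b) ** 2) / (n_a + n_b - 2))
    return (mean(sample_b) - mean(sample_a)) / pooled_sd

# Hypothetical test: 4.2% control vs. 4.5% variant conversion rate
absolute, relative = conversion_effect(0.042, 0.045)
print(f"absolute: {absolute:.3%}, relative lift: {relative:.1%}")
# → absolute: 0.300%, relative lift: 7.1%
```

Note how the same result reads very differently in the two forms: a 0.3 percentage-point absolute change is a 7.1% relative lift on this low baseline.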

Industry Applications

E-commerce

A beauty brand runs a test that shows a statistically significant improvement in add-to-cart rate. However, the effect size is only 0.3 percentage points (from 4.2% to 4.5%). Given the engineering cost to maintain the variant, they decide the effect size is too small to justify shipping.

SaaS

A collaboration tool tests a new trial onboarding sequence and measures a 15% relative lift in 7-day activation (from 20% to 23%). This effect size translates to 300 additional activated users per month, generating an estimated $45,000 in incremental annual revenue - clearly worth shipping.
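The arithmetic behind this example can be reproduced with a back-of-envelope script. The monthly trial volume (10,000) and per-user revenue figure ($12.50) are assumptions chosen to match the numbers above, not data from the example:

```python
# Assumed figures, consistent with the SaaS example above:
monthly_trials = 10_000           # assumed trial signups per month
baseline_activation = 0.20        # 7-day activation, control
variant_activation = 0.23         # 7-day activation, variant
revenue_per_activated_user = 12.50  # assumed incremental annual revenue per user

relative_lift = (variant_activation - baseline_activation) / baseline_activation
extra_activated_per_month = monthly_trials * (variant_activation - baseline_activation)
incremental_annual_revenue = extra_activated_per_month * 12 * revenue_per_activated_user

print(f"{relative_lift:.0%}")              # → 15%
print(round(extra_activated_per_month))    # → 300
print(round(incremental_annual_revenue))   # → 45000
```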

How to Track in KISSmetrics

KISSmetrics reports effect sizes as both absolute and relative differences in experiment results. When planning experiments, set a minimum effect size that represents a practically meaningful improvement for your business. Use this as the basis for power analysis to determine required sample sizes.
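The power analysis step can be sketched with the standard two-proportion z-test approximation. This is generic statistics using Python's standard library, not a KISSmetrics feature, and the baseline and effect values are illustrative:

```python
from statistics import NormalDist

def sample_size_per_arm(baseline_rate, min_effect_abs, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-proportion test,
    given the smallest absolute effect worth detecting."""
    p1 = baseline_rate
    p2 = baseline_rate + min_effect_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / min_effect_abs ** 2
    return int(n) + 1  # round up

# Detecting a 0.5 pp lift on a 4% baseline vs. a 2 pp lift:
print(sample_size_per_arm(0.04, 0.005))  # roughly 25,500 per arm
print(sample_size_per_arm(0.04, 0.02))   # roughly 1,900 per arm
```

This is why the minimum meaningful effect size matters so much at planning time: halving the effect you want to detect roughly quadruples the traffic you need.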

Common Mistakes

  • Focusing on statistical significance while ignoring whether the effect size is large enough to matter
  • Comparing raw effect sizes across metrics with different scales without standardizing
  • Expecting large effect sizes from minor changes - most UI tweaks produce 1-5% relative improvements
  • Not distinguishing between absolute and relative effect sizes, which can be very different for low-baseline metrics

Pro Tips

  • Establish minimum meaningful effect sizes for your key metrics before running experiments - what lift actually changes your business trajectory?
  • Track historical effect sizes by experiment type to calibrate future expectations and power analyses
  • Use relative lift for communicating results to stakeholders (it is more intuitive) but absolute differences for sample size calculations
  • When an effect size is smaller than expected, consider whether the test variant was bold enough or whether the hypothesis was wrong

See Effect Size in action

KISSmetrics tracks every user across sessions and devices so you can measure what matters. Start free - no credit card required.