Statistical Power

The probability that a test will correctly detect a real effect when one exists, typically set at 80% as a minimum standard. Higher power means a lower chance of missing genuine improvements.

Also known as: test power, 1 minus beta

Formula

n = (Z_alpha/2 + Z_beta)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2

where n is the required sample size per group, p1 is the baseline conversion rate, p2 is the conversion rate you want to be able to detect, Z_alpha/2 is the critical value for the chosen significance level, and Z_beta is the critical value for the chosen power (about 0.84 for 80% power).


Why It Matters

Statistical power is the experimentation equivalent of sensitivity. A test with 80% power will detect a real improvement 80% of the time and miss it 20% of the time. A test with 50% power is a coin flip - you might as well not run it.
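The sensitivity tradeoff above can be made concrete with a rough sketch: a normal-approximation power calculation for a two-proportion test, using only Python's standard library. The rates and sample sizes below are illustrative assumptions, not figures from any real experiment.

```python
# Sketch: approximate power of a two-proportion z-test via the normal
# approximation. p1 is the baseline conversion rate, p2 the rate under
# the variant, n the sample size per group (all illustrative).
from math import sqrt
from statistics import NormalDist

def approx_power(p1, p2, n, alpha=0.05):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value
    se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)  # SE of the difference
    # Probability the observed difference clears the significance threshold
    return 1 - NormalDist().cdf(z_alpha - abs(p2 - p1) / se)

# A small sample leaves power near the coin-flip zone; a large one
# pushes it past the 80% standard.
print(round(approx_power(0.03, 0.0315, 5_000), 2))
print(round(approx_power(0.03, 0.0315, 250_000), 2))
```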

Power analysis should happen before every experiment, not after. It tells you how many users you need and how long the test needs to run to have a reasonable chance of detecting the effect you care about. Without this step, you are guessing whether your test will be informative.

Low-powered tests are worse than no test at all. They consume traffic (opportunity cost), consume time (team attention), and produce inconclusive results that do not improve decision-making. If you cannot achieve adequate power for a test, you are better off spending those resources elsewhere.

How to Calculate

Power is a function of four variables: sample size, effect size (the minimum improvement you want to detect), significance level (alpha), and baseline variance. Increase any of the first three and power increases. Online power calculators and built-in experimentation tools handle the math. The standard minimum is 80% power, meaning a 20% chance of a Type II error.
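The sample-size formula above can also be computed directly. A minimal Python sketch, assuming a two-sided test and using the standard library's NormalDist for the critical values:

```python
# Sketch of the required-sample-size formula, standard library only.
# p1 is the baseline rate; p2 = p1 * (1 + minimum detectable relative lift).
from math import ceil
from statistics import NormalDist

def required_n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# A subtle 5% relative lift on a 3% baseline needs far more users per
# group than a bold 15% lift - the effect-size lever described above.
print(required_n_per_group(0.03, 0.03 * 1.05))
print(required_n_per_group(0.03, 0.03 * 1.15))
```

Because required sample size scales roughly with the inverse square of the effect size, tripling the minimum detectable lift cuts the required sample by nearly an order of magnitude.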


Industry Applications

E-commerce

A mid-size retailer with 50,000 monthly visitors calculates that detecting a 5% relative lift in their 3% checkout conversion requires 85,000 visitors per variant at 80% power. Across two variants that is 170,000 visitors, a test of three months or more, so they decide to test a bolder redesign expected to produce at least a 15% lift, reducing the required duration to under 2 weeks.

SaaS

A B2B platform with only 500 new trials per month realizes they need 4,200 per variant to detect a 10% improvement in activation at 80% power. Instead of running an underpowered test, they focus on qualitative user testing and make changes based on direct user feedback.

How to Track in KISSmetrics

Before launching experiments in KISSmetrics, use a sample size calculator to determine the traffic needed for 80% power at your desired minimum detectable effect. Compare this against your actual traffic volume to estimate test duration. If the required duration exceeds 4-6 weeks, consider testing a bolder change (larger expected effect) or focusing on higher-traffic pages.
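The duration sanity check described above can be sketched in a few lines. This is a rough illustration, not KISSmetrics functionality; the traffic figures and two-variant split are assumptions for the example.

```python
# Sketch: estimate how many weeks a test needs to reach its required
# sample. required_n_per_variant comes from a power calculation; the
# weekly traffic figure and variant count are illustrative assumptions.

def weeks_to_complete(required_n_per_variant, weekly_visitors, variants=2):
    total_needed = required_n_per_variant * variants  # traffic is split across variants
    return total_needed / weekly_visitors

# e.g. 40,000 per variant on 20,000 visitors/week -> 4 weeks: within
# the 4-6 week window, so the test is worth running as planned.
print(round(weeks_to_complete(40_000, 20_000), 1))
```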

Common Mistakes

  • Not calculating power before the test, leading to underpowered experiments that waste time and traffic
  • Confusing power with confidence level - power controls Type II errors (false negatives), while confidence level controls Type I errors (false positives)
  • Setting power too high (99%), which requires enormous sample sizes and very long test durations
  • Ignoring the relationship between effect size and power - smaller effects need much larger samples to detect

Pro Tips

  • Use 80% power as your default and reserve 90% power for high-stakes tests where missing a real effect is very costly
  • When traffic is limited, increase power by testing bolder changes with larger expected effects rather than subtle tweaks
  • Track your experimentation program's historical detection rate - if fewer than 70% of "promising" tests reach significance, your tests may be systematically underpowered
  • Share power analysis results with stakeholders so they understand test timelines and the tradeoff between speed and reliability


See Statistical Power in action

KISSmetrics tracks every user across sessions and devices so you can measure what matters. Start free - no credit card required.