Type II Error

A false negative in hypothesis testing - failing to reject the null hypothesis and concluding that a change had no effect when it actually did produce a real improvement.

Also known as: false negative, beta error

Formula

Beta = 1 - Statistical Power


Why It Matters

Type II errors are the silent killers of experimentation programs. A Type I error produces a visible failure (you ship something that does not work). A Type II error produces an invisible one (you discard an improvement that would have worked). You never know what you missed.

Type II errors are far more common than most teams realize. They typically happen because tests are underpowered - too few users or too short a runtime to detect real but modest improvements. If your minimum detectable effect is 10% but the true improvement is 5%, an underpowered test will usually miss it.
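To see whether a planned test is in this trap, you can invert the power formula and ask for the smallest effect the sample can reliably detect. A minimal sketch, assuming a two-sided two-sample z-test on a standardized (Cohen's d) effect size; the function name is illustrative:

```python
from statistics import NormalDist

def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect size a two-sided two-sample
    z-test with n_per_group users per variant can detect at the
    requested power (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    z_power = nd.inv_cdf(power)           # e.g. about 0.84 for 80% power
    return (z_alpha + z_power) * (2 / n_per_group) ** 0.5
```

If the value that comes back is larger than the improvement you can plausibly expect, the test is underpowered before it even starts.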

The cumulative cost of Type II errors is enormous. If you run 50 tests per year and 20% of them would have shown a genuine 3-5% improvement that your test was too small to detect, those missed opportunities compound into significant lost revenue. Proper power analysis before every test is the remedy.

How to Calculate

The probability of a Type II error (beta) equals 1 minus the statistical power of the test. If your test has 80% power, the Type II error rate is 20%. Power depends on sample size, effect size, significance level, and the variance in your data. Use a power calculator to determine the required sample size to achieve acceptable Type II error rates.
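The relationship above can be sketched in code (assuming a two-sided, two-sample z-test on a standardized effect size; the function names are illustrative):

```python
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a
    standardized effect size d with n_per_group users per variant.
    Ignores the negligible far-tail rejection probability."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)        # critical value, e.g. about 1.96
    noncentrality = d * (n_per_group / 2) ** 0.5
    return nd.cdf(noncentrality - z_crit)

def type_ii_error_rate(d, n_per_group, alpha=0.05):
    """Beta = 1 - statistical power."""
    return 1 - two_sample_power(d, n_per_group, alpha)
```

For example, a small effect (d = 0.2) with only 100 users per variant yields roughly 29% power, so beta is roughly 0.71: the test will miss the real effect more often than not.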


Industry Applications

E-commerce

A specialty food retailer tests personalized product recommendations but calls the test "no impact" after two weeks with only 2,000 users per variant. Power analysis reveals the test could only detect effects above 15%. When rerun with 15,000 users per variant, the same change shows a significant 6% revenue lift.

SaaS

A developer tools company discards a simplified signup flow after a "non-significant" test. A post-mortem reveals the test had only 40% power to detect the expected 5% improvement. When rerun with proper sample sizing, the simplified flow shows a significant 4.5% improvement in trial starts.

How to Track in KISSmetrics

Prevent Type II errors by running proper power analysis before each experiment. Calculate the sample size needed to detect your minimum meaningful effect at your desired power level (typically 80%). If KISSmetrics shows insufficient traffic to reach that sample size within a reasonable timeframe, consider testing a larger effect (bolder change) or consolidating traffic to the test.
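That pre-test sample-size calculation can be sketched as follows (assuming a two-sided two-sample z-test on a standardized effect size; the numbers in the usage note are illustrative):

```python
import math
from statistics import NormalDist

def required_n_per_group(d, alpha=0.05, power=0.80):
    """Users needed per variant to detect a standardized effect
    size d at the given significance level and power
    (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)
```

For a small standardized effect of d = 0.2 at 80% power, this returns 393 users per variant; if your traffic cannot cover that within a reasonable window, testing a bolder change (a larger d) cuts the requirement sharply.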

Common Mistakes

  • Running underpowered tests that have little chance of detecting real improvements and then concluding "the change had no effect"
  • Treating "not significant" as "no effect" instead of "insufficient evidence"
  • Not performing pre-test power analysis, which is the primary prevention for Type II errors
  • Ignoring the cost of missed improvements while focusing exclusively on preventing false positives

Pro Tips

  • Always calculate required sample sizes before launching tests - if you cannot reach the needed size, reconsider the test design
  • When a test shows a positive but non-significant trend, check whether the test was underpowered before discarding the idea
  • Consider running tests at 80% power for standard experiments and 90% power for tests where missing a real improvement is especially costly
  • Aggregate results from multiple small, related experiments using meta-analysis to detect effects that individual tests missed
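The last tip can be sketched with Stouffer's method for combining independent tests (a simple unweighted form; it assumes one-sided p-values from independent experiments, and the numbers in the usage note are made up):

```python
from statistics import NormalDist

def stouffer_combined_p(one_sided_p_values):
    """Combine one-sided p-values from independent experiments
    into a single p-value via Stouffer's z-score method."""
    nd = NormalDist()
    zs = [nd.inv_cdf(1 - p) for p in one_sided_p_values]
    z_combined = sum(zs) / len(zs) ** 0.5
    return 1 - nd.cdf(z_combined)
```

Three related tests that each came back at p = 0.10, individually non-significant, combine to roughly p = 0.013, reasonably strong evidence that the shared effect is real.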


See Type II Error in action

KISSmetrics tracks every user across sessions and devices so you can measure what matters. Start free - no credit card required.