Hypothesis Testing

A statistical method used to determine whether observed differences in data - such as a higher conversion rate in a test variant - are likely real or could have occurred by random chance.

Also known as: significance testing, statistical hypothesis testing

Why It Matters

Without hypothesis testing, you cannot distinguish between a genuine improvement and a lucky streak. If your new checkout page shows a 5% higher conversion rate this week, hypothesis testing tells you whether that difference is statistically meaningful or within the range of normal random variation.
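The checkout-page scenario above can be sketched as a two-proportion z-test, one common way to test a difference in conversion rates. The visitor and conversion counts below are hypothetical, chosen to mirror a "5% higher" rate; this is a minimal stdlib-only illustration, not a full experimentation pipeline.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical data: 1,000 visitors per variant,
# 100 vs. 105 conversions (a relative 5% lift)
z, p = two_proportion_z_test(100, 1000, 105, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these numbers the p-value comes out far above 0.05, so the observed 5% lift is well within normal random variation, which is exactly the distinction the paragraph above describes.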

Hypothesis testing protects organizations from costly false positives. Rolling out a change based on random noise can hurt performance, waste engineering time, and erode trust in the experimentation process. Rigorous testing ensures you only ship changes that are genuinely better.

The discipline of forming and testing hypotheses also improves how teams think about optimization. Instead of "let's try a bigger button," a proper hypothesis states: "We believe that making the CTA button more prominent will increase checkout completion by at least 3%, because session recordings show users struggle to find the current button." This forces clarity about the expected mechanism and magnitude of improvement.

Industry Applications

E-commerce

A shoe retailer hypothesizes that adding customer review photos to product pages will increase add-to-cart rates. They run a controlled test with proper sample sizing and find a statistically significant 8% improvement, validating the hypothesis and justifying the feature investment.

SaaS

A B2B platform hypothesizes that removing the credit card requirement from trial signup will increase trial starts by 30%+. The test shows a 45% increase in trial starts but a 20% decrease in trial-to-paid conversion. The hypothesis test on overall revenue shows no significant difference, preventing a premature rollout.

How to Track in KISSmetrics

When running experiments through KISSmetrics, define your hypothesis before the test begins. Use KISSmetrics A/B test reports to monitor both the observed difference and the statistical significance. Wait until the required sample size is reached before drawing conclusions. The Metrics dashboard lets you track experiment results alongside other key metrics to monitor for unintended side effects.

Common Mistakes

  • Stopping a test as soon as the result looks significant - this inflates false positive rates dramatically
  • Not defining success criteria and sample size requirements before the test starts
  • Running too many simultaneous tests without adjusting for multiple comparisons
  • Ignoring practical significance - a statistically significant 0.1% improvement may not be worth the engineering cost to maintain
  • Testing without a clear hypothesis, which makes it impossible to learn from results whether they are positive, negative, or neutral
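The multiple-comparisons mistake above has a simple classical remedy: the Bonferroni correction, which divides the significance threshold by the number of simultaneous tests. The p-values below are hypothetical; this is a sketch of the adjustment, not a recommendation over other methods (e.g. Holm or false-discovery-rate procedures).

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after a Bonferroni correction."""
    adjusted_alpha = alpha / len(p_values)  # stricter threshold per test
    return [p < adjusted_alpha for p in p_values]

# Five simultaneous tests: the adjusted threshold is 0.05 / 5 = 0.01,
# so only the first result survives the correction
print(bonferroni_significant([0.004, 0.03, 0.04, 0.20, 0.60]))
```

Note that two of these tests would have looked "significant" at the naive 0.05 cutoff; the correction is what keeps the family-wide false positive rate near 5%.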

Pro Tips

  • Always calculate the required sample size before launching a test using a power analysis calculator
  • Pre-register your hypothesis, primary metric, and test duration to prevent post-hoc rationalization of results
  • Run experiments for full business cycles (at least one full week) to account for day-of-week effects
  • Track guardrail metrics (metrics that should not get worse) alongside your primary success metric
  • Document every experiment with its hypothesis, results, and learnings in a shared experiment log
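The power analysis mentioned in the first tip can be approximated with the standard sample-size formula for comparing two proportions. The baseline rate and effect size below are hypothetical, and the normal quantiles are fixed for the conventional 5% significance level and 80% power; treat this as a back-of-the-envelope sketch, not a replacement for a proper calculator.

```python
import math

def sample_size_per_variant(p_baseline, mde_rel):
    """Approximate users needed per variant to detect a relative lift.

    p_baseline: current conversion rate (e.g. 0.10 for 10%)
    mde_rel:    minimum detectable effect, relative (e.g. 0.05 for +5%)
    Assumes a two-sided test at alpha = 0.05 with 80% power.
    """
    z_alpha = 1.959964  # standard normal quantile for two-sided alpha = 0.05
    z_beta = 0.841621   # standard normal quantile for 80% power
    p1 = p_baseline
    p2 = p_baseline * (1 + mde_rel)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Detecting a relative +5% lift on a 10% baseline takes roughly 58,000
# users per variant - far more than a week of traffic for many sites
print(sample_size_per_variant(0.10, 0.05))
```

The takeaway matches the opening example: a small relative lift on a modest baseline rate requires surprisingly large samples, which is why waiting for the required sample size matters so much.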

See Hypothesis Testing in action

KISSmetrics tracks every user across sessions and devices so you can measure what matters. Start free - no credit card required.