Confidence Interval
A range of values that likely contains the true effect of a change, calculated from experiment data. A 95% confidence interval means that if the experiment were repeated many times, 95% of the calculated intervals would contain the true value.
Also known as: CI, error margin
Why It Matters
While p-values give you a yes/no answer about significance, confidence intervals tell you the likely range of the true effect. This is far more useful for decision-making. Knowing that your new checkout flow improves conversion "somewhere between 2% and 8%" is much more actionable than just knowing "it is significant."
Confidence intervals reveal the precision of your estimate. A narrow interval (3% to 5% improvement) means you have a reliable estimate. A wide interval (-1% to 15% improvement) means you have high uncertainty despite potentially having a significant p-value. Width depends primarily on sample size - more data narrows the interval.
Confidence intervals also make it easier to assess practical significance. If the entire interval is above your minimum meaningful improvement (say, 2%), you can be confident the change is worth shipping. If the interval includes values below your threshold, the true effect might be too small to matter even though it is statistically detectable.
How to Calculate
A confidence interval is calculated as the observed effect plus or minus a margin of error. The margin of error equals the critical value (1.96 for 95% confidence) multiplied by the standard error of the estimate. For conversion rate experiments, the standard error depends on the observed proportions and sample sizes of both groups.
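The formula above can be sketched in a few lines of Python. This is a minimal illustration of the two-proportion calculation described here, not KISSmetrics' internal implementation; the function name and the example numbers are made up for the demo.

```python
from math import sqrt

def conversion_rate_ci(conv_control, n_control, conv_variant, n_variant, z=1.96):
    """95% CI for the lift in conversion rate (variant minus control).

    z = 1.96 is the critical value for 95% confidence;
    use 1.645 for 90% or 2.576 for 99%.
    """
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    # Standard error of the difference of two independent proportions
    se = sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    diff = p_v - p_c
    margin = z * se  # margin of error = critical value * standard error
    return diff - margin, diff + margin

# Hypothetical example: control converts 500/10,000 (5%),
# variant converts 600/10,000 (6%)
low, high = conversion_rate_ci(500, 10_000, 600, 10_000)
print(f"Observed lift: 1.0%, 95% CI: [{low:.2%}, {high:.2%}]")
```

With these made-up numbers the interval works out to roughly [0.4%, 1.6%]: entirely above zero, so the lift is statistically detectable, but whether it clears your minimum meaningful improvement is a separate, practical question.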
Industry Applications
A home furnishing retailer tests a new product recommendation widget. The result shows an average revenue lift of $2.50 per session with a 95% confidence interval of [$1.80, $3.20]. Since the entire interval is positive and above their $1.00 minimum threshold, they roll out with high confidence.
A productivity app tests a new trial onboarding flow. The conversion lift is 4% with a 95% CI of [-1%, 9%]. Despite the positive point estimate, the interval includes zero and negative values, so the team decides to iterate on the variant rather than ship it.
How to Track in KISSmetrics
KISSmetrics experiment reports include confidence intervals alongside point estimates and p-values. When evaluating test results, focus on the confidence interval rather than just the point estimate. A result with a narrow confidence interval entirely above zero gives you much more certainty than a large point estimate with a wide interval that overlaps zero.
Common Mistakes
- Ignoring confidence intervals and focusing only on the point estimate of the effect
- Misinterpreting "95% confidence interval" as "95% probability the true value is in this range" - it refers to the procedure, not this specific interval
- Not considering whether the confidence interval is practically meaningful, even when it excludes zero
- Using default confidence levels (95%) without considering whether 90% or 99% is more appropriate for the decision at hand
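The second mistake above - reading "95%" as a probability about one specific interval - is easiest to see with a quick simulation: the 95% describes how often the *procedure* succeeds across repeated experiments. The true rate and sample sizes below are arbitrary values chosen for the demo.

```python
import random
from math import sqrt

random.seed(42)
TRUE_RATE = 0.05   # hypothetical true conversion rate (known only to the simulation)
VISITORS = 2_000   # visitors per simulated experiment
TRIALS = 1_000     # number of repeated experiments

covered = 0
for _ in range(TRIALS):
    # Simulate one experiment and build its 95% interval
    conversions = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
    p_hat = conversions / VISITORS
    se = sqrt(p_hat * (1 - p_hat) / VISITORS)
    low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
    covered += low <= TRUE_RATE <= high

print(f"{covered / TRIALS:.1%} of intervals contained the true rate")
```

Run it and roughly 95% of the 1,000 intervals capture the true rate - but any single interval either contains it or doesn't; there is no 95% probability attached to one particular result.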
Pro Tips
- Report confidence intervals in executive summaries instead of just point estimates to convey the uncertainty in your results
- If your confidence interval is too wide, increase sample size rather than lowering your confidence level
- Compare the confidence interval against your minimum detectable effect to determine if the test was adequately powered
- Use confidence intervals to set expectations: "we expect this change to improve conversion by 3-7%"
Related Terms
Confidence Level
The long-run percentage of confidence intervals, across repeated experiments, that would contain the true population parameter, commonly set at 90%, 95%, or 99% in A/B testing.
P-Value
The probability of observing a result as extreme as the one measured, assuming the null hypothesis is true. A small p-value (typically below 0.05) suggests the observed difference is unlikely due to chance alone.
Effect Size
A quantitative measure of the magnitude of a difference between groups in an experiment, independent of sample size. It answers the question "how big is the improvement?" rather than "is there an improvement?"
Minimum Detectable Effect
The smallest difference between control and variant that a test is designed to reliably detect, given its sample size, significance level, and desired statistical power.
Hypothesis Testing
A statistical method used to determine whether observed differences in data - such as a higher conversion rate in a test variant - are likely real or could have occurred by random chance.
See Confidence Interval in action
KISSmetrics tracks every user across sessions and devices so you can measure what matters. Start free - no credit card required.