Confidence Level
The long-run proportion of confidence intervals that would contain the true population parameter if the same experiment were repeated many times. In A/B testing it is commonly set at 90%, 95%, or 99%.
Also known as: significance level complement, coverage probability
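To make the definition concrete, here is a minimal simulation sketch. It assumes a single conversion rate estimated with a simple normal-approximation (Wald) interval; the function name, the 10% true rate, and the visitor counts are illustrative choices, not part of any platform.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_coverage(true_rate, visitors, confidence, experiments=1000, seed=0):
    """Fraction of simulated experiments whose interval contains true_rate."""
    rng = random.Random(seed)
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(experiments):
        # One simulated experiment: each visitor converts with probability true_rate.
        conversions = sum(rng.random() < true_rate for _ in range(visitors))
        rate = conversions / visitors
        margin = z * sqrt(rate * (1 - rate) / visitors)
        if rate - margin <= true_rate <= rate + margin:
            hits += 1
    return hits / experiments


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        coverage = simulated_coverage(0.10, 1000, level)
        print(f"nominal {level:.0%} -> simulated coverage {coverage:.1%}")
```

Under these assumptions the simulated coverage should land close to the nominal level. Small gaps are expected because the Wald interval is only an approximation.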
Why It Matters
The confidence level is a dial you set before running an experiment that controls the tradeoff between certainty and speed. A higher confidence level (99%) means you need more data but are less likely to act on false positives. A lower confidence level (90%) lets you reach conclusions faster but accepts more risk.
Choosing the right confidence level depends on the cost of being wrong. For a pricing change that is difficult to reverse and affects all customers, use 99%. For a minor UI tweak on a secondary page, 90% might be sufficient. The 95% default is a reasonable middle ground, but it is a convention, not a law of nature.
The confidence level directly affects your required sample size and test duration. For a two-sided test at 80% power, moving from 90% to 95% confidence increases the required sample by a little under 30%, and moving from 95% to 99% increases it by roughly another 50%. For sites with limited traffic, this difference can mean weeks of additional test runtime.
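The effect on sample size can be sketched with the standard normal-approximation formula for comparing two conversion rates. This is an illustration under stated assumptions (two-sided test, 80% power, equal traffic per arm); the function name and the 10% vs. 12% rates are hypothetical, and a real planning tool may use a different formula.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_base, p_variant, confidence=0.95, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion z-test."""
    alpha = 1.0 - confidence
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for the confidence level
    z_power = NormalDist().inv_cdf(power)              # critical value for the desired power
    variance = p_base * (1.0 - p_base) + p_variant * (1.0 - p_variant)
    effect = p_variant - p_base
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    baseline = sample_size_per_arm(0.10, 0.12, confidence=0.90)
    for level in (0.90, 0.95, 0.99):
        n = sample_size_per_arm(0.10, 0.12, confidence=level)
        print(f"{level:.0%} confidence: {n} per arm ({n / baseline:.2f}x the 90% requirement)")
```

Because only the critical value changes with the confidence level, the ratios between levels stay roughly the same whatever baseline rate and effect size you plug in, as long as power is held fixed.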
Industry Applications
A luxury retailer uses 99% confidence for pricing experiments because incorrect prices damage brand perception and are noticed by customers immediately. For product page layout tests, they use 90% confidence to iterate faster.
A fintech company requires 99% confidence for any experiment that affects the transaction flow (because errors mean lost money) but uses 95% confidence for onboarding experiments where the cost of a false positive is lower.
How to Track in KISSmetrics
Set your desired confidence level in KISSmetrics before launching an experiment. The platform will indicate when the result has reached your chosen confidence threshold. Use higher levels for irreversible changes and lower levels for easily reversible experiments.
Common Mistakes
- Using 95% confidence for every test without considering the stakes: some tests deserve 99%, others are fine at 90%
- Changing the confidence level after seeing results to make a borderline test appear significant
- Confusing confidence level with the probability that your variant is actually better (a 95% confidence level does not mean a 95% chance that the variant wins)
- Not accounting for the impact of confidence level on required test duration when planning experiments
Pro Tips
- Document your confidence level choice and rationale in your experiment plan before launching
- Use 90% confidence for exploratory tests and iteration, 95% for important features, and 99% for pricing or high-revenue-impact changes
- Remember that confidence level and power, together with the baseline rate and the minimum detectable effect, determine your required sample size; plan them in advance
- If stakeholders push for faster results, explain the tradeoff: a lower confidence level shortens the test only by accepting a higher risk of false positives
Related Terms
Confidence Interval
A range of values that likely contains the true effect of a change, calculated from experiment data. A 95% confidence interval means that if the experiment were repeated many times, 95% of the calculated intervals would contain the true value.
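As a rough illustration of how the confidence level changes the interval, the sketch below computes a normal-approximation (Wald) interval for the difference between two conversion rates. The function name and the visitor and conversion counts are made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for (rate_b - rate_a), using the normal approximation."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    std_err = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = rate_b - rate_a
    return diff - z * std_err, diff + z * std_err


if __name__ == "__main__":
    # Hypothetical data: 500/5000 conversions in control, 565/5000 in the variant.
    for level in (0.90, 0.95, 0.99):
        low, high = diff_confidence_interval(500, 5000, 565, 5000, confidence=level)
        print(f"{level:.0%} interval for the lift: [{low:+.4f}, {high:+.4f}]")
```

With these hypothetical counts, the 90% and 95% intervals exclude zero while the wider 99% interval includes it. The data are identical, but the conclusion depends on the level chosen in advance.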
P-Value
The probability of observing a result at least as extreme as the one measured, assuming the null hypothesis is true. A small p-value (typically below 0.05) suggests the observed difference is unlikely to be due to chance alone.
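The link between the p-value and the confidence level can be sketched as follows: a result counts as significant at a given confidence level when the p-value falls below 1 minus that level. The example assumes a two-sided, pooled two-proportion z-test; the function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p_value, confidence):
    """Compare the p-value with the significance level alpha = 1 - confidence."""
    return p_value < 1 - confidence


if __name__ == "__main__":
    # Hypothetical data: 500/5000 conversions in control, 565/5000 in the variant.
    p = two_proportion_p_value(500, 5000, 565, 5000)
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%} confidence: significant = {is_significant(p, level)}")
```

For these made-up counts the p-value falls between 0.01 and 0.05, so the result passes at 90% and 95% confidence but not at 99%.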
Type I Error
A false positive in hypothesis testing - incorrectly rejecting the null hypothesis and concluding that a change had a real effect when the observed difference was actually due to random chance.
Statistical Power
The probability that a test will correctly detect a real effect when one exists, typically set at 80% as a minimum standard. Higher power means a lower chance of missing genuine improvements.
Hypothesis Testing
A statistical method used to determine whether observed differences in data - such as a higher conversion rate in a test variant - are likely real or could have occurred by random chance.