Sample Size

Sample size is the number of users or observations included in each variant of an experiment, determining the statistical power of the test and how confidently you can detect real differences between variants.

Also known as: test population, experiment size, n-size

Why It Matters

Sample size determines whether your experiment can actually answer the question you are asking. Too small a sample and you cannot distinguish real effects from random noise - you might conclude that a change had no impact when it actually improved conversion by 5%, or worse, declare a winner based on random fluctuation.

The required sample size depends on four factors: your baseline conversion rate, the minimum detectable effect (the smallest improvement worth detecting), your desired confidence level, and the statistical power you want. Required sample size grows roughly with the inverse square of the effect size, so a test trying to detect a 1% relative improvement needs about 100 times more traffic than one looking for a 10% improvement. This is why pre-test power analysis is essential - it prevents you from wasting time on tests that cannot produce reliable results.

Insufficient sample sizes are one of the most common experimentation mistakes. Teams run tests for a week, see a promising result, and ship the change - only to discover weeks later that the improvement disappears. This is not bad luck; it is a predictable consequence of making decisions with insufficient data.

How to Calculate

Sample size is calculated using a power analysis formula that considers four inputs: baseline conversion rate, minimum detectable effect, significance level (typically 5%, which corresponds to 95% confidence), and statistical power (typically 80%). For example, with a 5% baseline conversion rate and a desire to detect a 10% relative improvement (to 5.5%), a two-sided test needs approximately 31,000 visitors per variant. Online sample size calculators make this straightforward, though their results differ slightly depending on the approximation they use.
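
For readers who prefer to see the arithmetic, here is a minimal sketch of the power analysis described above. It assumes a two-sided, two-proportion z-test with the normal approximation; the function name and parameters are illustrative, not part of any KISSmetrics API, and online calculators may use slightly different formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided, two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # rate the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# 5% baseline, 10% relative improvement (5% -> 5.5%)
print(sample_size_per_variant(baseline=0.05, relative_mde=0.10))
```

With these inputs the result lands a little above 31,000 visitors per variant. Halving the minimum detectable effect roughly quadruples the answer, which is why small lifts are so expensive to detect.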

Industry Applications

E-commerce

An e-commerce site with a 3.5% conversion rate wants to detect a 5% relative improvement (to 3.675%). At 95% confidence and 80% power, power analysis shows they need roughly 177,000 visitors per variant, or about 354,000 in total for a two-variant test. At 15,000 daily visitors to the tested page, the test will take about 24 days - long, but still feasible.

SaaS

A SaaS trial signup page converts at 8%. The team wants to detect a 15% relative improvement (to 9.2%). At 95% confidence and 80% power, sample size calculation shows they need roughly 8,600 visitors per variant, achievable in 10 days with their traffic volume. They run the test for 14 days to capture a full two-week cycle.

How to Track in KISSmetrics

Calculate your required sample size before any test begins. Monitor daily traffic to the tested page in KISSmetrics to estimate how long the test will need to run. If the required runtime exceeds your patience, either accept a higher minimum detectable effect or find a higher-traffic page to test on.
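
When the runtime is the fixed constraint, the same calculation can be run in reverse: given the traffic you can collect in the time you are willing to wait, what is the smallest lift the test can detect? The sketch below is one way to do that, assuming a two-sided, two-proportion z-test and a simple bisection search. The function names are illustrative, and the traffic figure would come from your own page-level reporting.

```python
from statistics import NormalDist


def required_n(p1, p2, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided, two-proportion z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2


def smallest_detectable_lift(baseline, daily_visitors, max_days, variants=2):
    """Smallest relative lift detectable within max_days, or None if out of reach."""
    n_available = daily_visitors * max_days / variants
    low, high = 0.0, min(5.0, (1 - baseline) / baseline)  # search range for the lift
    if required_n(baseline, baseline * (1 + high)) > n_available:
        return None  # even a very large lift needs more traffic than is available
    for _ in range(50):
        mid = (low + high) / 2
        if required_n(baseline, baseline * (1 + mid)) > n_available:
            low = mid   # this lift is too small to detect with the traffic budget
        else:
            high = mid  # detectable; try a smaller lift
    return high


# 3.5% baseline, 15,000 daily visitors to the tested page, 14-day limit
lift = smallest_detectable_lift(0.035, 15_000, 14)
print(f"Smallest detectable relative lift: {lift:.1%}")
```

If the lift this returns is larger than anything the change could plausibly deliver, the test is not worth running on that page as designed.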

Common Mistakes

- Not calculating required sample size before starting a test, leading to underpowered experiments with unreliable results.
- Using overall site traffic as the sample size estimate when the test only affects a subset of visitors.
- Reducing required sample size by lowering the confidence level below 90%, which dramatically increases false positive risk.
- Assuming every variant must receive equal traffic - an unequal split can be the better choice when you want to limit exposure to a risky variant, at the cost of a larger total sample.

Pro Tips

+ Use an online sample size calculator before every test to determine whether the test is feasible given your traffic volume.
+ If required sample sizes are too large, focus on testing bigger changes that produce larger effects, which are detectable with smaller samples.
+ Consider using a 90/10 or 80/20 split (giving more traffic to control) when testing risky changes, though this increases required total sample size.
+ Factor in test duration - a test that needs 60 days of traffic introduces seasonal and temporal confounds that a 14-day test avoids.
+ When testing on low-traffic pages, consider testing a broader metric (like overall site conversion) rather than page-specific metrics that need larger samples.
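
To see how much an unequal split costs, the sketch below extends the standard two-proportion calculation to an arbitrary control share. It assumes a two-sided z-test at 95% confidence and 80% power; the function name and the `control_share` parameter are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_sizes_for_split(baseline, relative_mde, control_share=0.5,
                           alpha=0.05, power=0.80):
    """Return (control_n, variant_n) when control gets control_share of traffic."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Variance of the observed difference, per unit of total traffic
    per_visitor_variance = (p1 * (1 - p1) / control_share
                            + p2 * (1 - p2) / (1 - control_share))
    total = z ** 2 * per_visitor_variance / (p2 - p1) ** 2
    return ceil(total * control_share), ceil(total * (1 - control_share))


# 5% baseline, 10% relative improvement, compared across three splits
for share in (0.5, 0.8, 0.9):
    control_n, variant_n = sample_sizes_for_split(0.05, 0.10, control_share=share)
    print(f"{share:.0%} control: {control_n:,} + {variant_n:,} = {control_n + variant_n:,}")
```

In this example a 90/10 split needs close to three times the total traffic of a 50/50 split, so the reduced exposure is paid for with a much longer test.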

Related Terms

Experimentation

Statistical Significance

Statistical significance is a measure of confidence that the difference observed between test variants is real and not due to random chance, typically expressed as a percentage (e.g., 95% confidence) or a p-value threshold.

Experimentation

A/B Testing

A/B testing is a controlled experiment that compares two versions of a web page, email, ad, or feature by randomly splitting traffic between them and measuring which version performs better on a defined success metric.

Experimentation

Control Group

A control group is the subset of users in an experiment who receive the existing or unchanged experience, serving as the baseline against which the performance of test variants is measured.

Experimentation

Variant

A variant (also called a treatment or challenger) is an alternative version of a page, feature, or experience being tested against the control in an experiment, incorporating the specific changes hypothesized to improve performance.

Marketing Analytics

Conversion Rate

Conversion rate is the percentage of users who complete a desired action out of the total number of users who had the opportunity to do so, serving as the primary measure of how effectively a page, campaign, or experience turns visitors into customers.

Further Reading

A/B Test Sample Size: How to Calculate It and Why Most Teams Get It Wrong

A/B Testing Statistical Significance: When to Call a Winner
