Holdout Group
A randomly selected subset of users excluded from a specific change, feature, or experiment for an extended period, used to measure the long-term incremental impact of that change by comparing their outcomes to those of exposed users.
Also known as: holdback group, control holdout
Why It Matters
A/B tests measure the immediate impact of a change, but holdout groups measure the sustained impact over weeks or months. Sometimes a change produces a short-term novelty effect that fades, or a delayed benefit that only appears after users have time to adopt a new feature. Holdout groups capture these long-term dynamics.
Holdout groups are also one of the few rigorous ways to measure the cumulative impact of many small changes. If you ship 50 incremental improvements over a year, each one too small to measure individually, a holdout group that received none of them can tell you the total combined effect.
For strategic decisions about whether to continue investing in a feature or program, holdout groups are essential. They answer questions like: "Does our recommendation engine actually generate incremental revenue, or do users find products on their own anyway?" Without a holdout, you cannot separate the feature's contribution from organic behavior.
Industry Applications
An online grocery service maintains a 5% holdout group for its personalized recommendation engine. After 6 months, the holdout comparison shows that exposed users generate 12% more revenue per user than holdout users, justifying continued investment in the recommendation algorithm.
A marketing automation platform keeps a 10% holdout from its new onboarding experience for 90 days. The holdout shows the new onboarding improves 90-day retention by 8%, confirming that the initial A/B test result (a 10% lift) held up over the long term with only modest decay.
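The comparison behind a figure like the 8% retention lift can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `retention_lift` and the user counts are made up for the example, and the significance check is a standard pooled two-proportion z-test.

```python
from math import sqrt
from statistics import NormalDist


def retention_lift(exposed_retained, exposed_total, holdout_retained, holdout_total):
    """Return (relative_lift, two_sided_p_value) for exposed vs. holdout retention."""
    p_exposed = exposed_retained / exposed_total
    p_holdout = holdout_retained / holdout_total

    # Relative lift: how much higher the exposed rate is, as a share of the holdout rate.
    relative_lift = (p_exposed - p_holdout) / p_holdout

    # Pooled two-proportion z-test for the difference in retention rates.
    pooled = (exposed_retained + holdout_retained) / (exposed_total + holdout_total)
    std_err = sqrt(pooled * (1 - pooled) * (1 / exposed_total + 1 / holdout_total))
    z = (p_exposed - p_holdout) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return relative_lift, p_value


# Hypothetical counts: 90,000 exposed users (43.2% retained)
# and 10,000 holdout users (40.0% retained).
lift, p_value = retention_lift(38_880, 90_000, 4_000, 10_000)
print(f"relative lift: {lift:.1%}, p-value: {p_value:.2g}")
```

Reporting the lift relative to the holdout rate keeps it comparable with the lift reported by the original A/B test.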
How to Track in KISSmetrics
Implement holdout groups by randomly assigning a small percentage of users (typically 5-10%) to a control experience and excluding them from the change being measured. Use KISSmetrics Populations to segment holdout and exposed groups, then compare key metrics (conversion, retention, revenue) between them over time.
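One way to make the random assignment stable is to hash each user ID into a bucket, as in the sketch below. It assumes string user IDs; the names `in_holdout`, `SALT`, and `HOLDOUT_PERCENT` are hypothetical, and the code does not call any KISSmetrics API. How the resulting group label is recorded so that holdout and exposed users can be segmented is left to your tracking setup.

```python
import hashlib

HOLDOUT_PERCENT = 5            # assumed holdout size, in percent
SALT = "recs-holdout-v1"       # hypothetical salt, unique to this holdout


def in_holdout(user_id: str, salt: str = SALT, percent: int = HOLDOUT_PERCENT) -> bool:
    """Deterministically place roughly `percent`% of users in the holdout."""
    # Salting keeps this holdout independent of other experiments
    # that hash the same user IDs.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100   # bucket in the range 0..99
    return bucket < percent


# The same user always gets the same answer, on any device or session.
group = "holdout" if in_holdout("user-12345") else "exposed"
print(group)
```

Because assignment depends only on the salt and the user ID, no lookup table is needed, and changing the salt produces a fresh, independent holdout.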
Common Mistakes
- Making the holdout group too large, which limits the business impact of the change being tested
- Making the holdout group too small, which produces statistically unreliable comparisons
- Not maintaining the holdout long enough to capture long-term effects
- Accidentally exposing holdout users to the change through other channels or system interactions
- Not randomizing properly, which introduces selection bias into the holdout comparison
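To judge whether a holdout is too small, a rough sample-size estimate helps. The sketch below uses the standard normal-approximation formula for comparing two proportions with unequal group sizes; the function name `min_holdout_size` and the default 5% significance level and 80% power are assumptions for illustration, not recommendations from this glossary.

```python
from math import ceil
from statistics import NormalDist


def min_holdout_size(baseline_rate, relative_lift, holdout_share, alpha=0.05, power=0.8):
    """Approximate number of holdout users needed to detect `relative_lift`.

    baseline_rate  -- expected conversion or retention rate in the holdout
    relative_lift  -- smallest relative improvement worth detecting (0.08 = 8%)
    holdout_share  -- fraction of all users in the holdout (0.05 = 5%)
    """
    p_holdout = baseline_rate
    p_exposed = baseline_rate * (1 + relative_lift)

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)

    # Exposed users per holdout user; the larger exposed group adds little variance.
    ratio = (1 - holdout_share) / holdout_share
    variance = p_holdout * (1 - p_holdout) + p_exposed * (1 - p_exposed) / ratio

    n = (z_alpha + z_power) ** 2 * variance / (p_exposed - p_holdout) ** 2
    return ceil(n)


# Example: 40% baseline retention, 8% relative lift, 5% holdout.
needed = min_holdout_size(0.40, 0.08, 0.05)
print(f"holdout users needed: {needed}")
```

If the holdout will not reach this size within the planned measurement window, either enlarge its share or extend the window before drawing conclusions.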
Pro Tips
- Use a consistent hashing function on user IDs for holdout assignment to ensure stability across sessions and platforms
- Set a calendar reminder to review holdout results monthly and decide whether to continue or dissolve the holdout
- Use holdout groups to validate the cumulative impact of your experimentation program annually
- Document which users are in holdout groups so customer-facing teams can handle support requests appropriately
- Start with a 10% holdout and reduce to 5% once you have enough data for statistical reliability
Related Terms
Hypothesis Testing
A statistical method used to determine whether observed differences in data (such as a higher conversion rate in a test variant) are likely real or could have occurred by random chance.
Incrementality Testing
An experimental approach that measures the true causal impact of a marketing activity by comparing outcomes between a group exposed to the marketing and a control group that was not, isolating the genuine lift beyond what would have happened organically.
Feature Experiment
A controlled test that measures the impact of a new product feature by exposing it to a random subset of users and comparing their behavior and outcomes against users who do not have access.
Type II Error
A false negative in hypothesis testing: failing to reject the null hypothesis and concluding that a change had no effect when it actually produced a real improvement.
Effect Size
A quantitative measure of the magnitude of a difference between groups in an experiment, independent of sample size. It answers the question "how big is the improvement?" rather than "is there an improvement?"