Attribution data lies. It's not malicious; it's structurally biased. Last-click attribution gives all credit to the last touch. Multi-touch attribution distributes credit, but the credit allocation rests on assumptions that are often wrong. Both leave you with the same problem: you can't tell whether a channel is actually driving incremental revenue or just claiming credit for revenue you'd have earned anyway.
Incrementality testing fixes this by running a controlled experiment. Geo-holdout tests are the most accessible way to do it for Shopify brands without a dedicated data team. This guide walks through the methodology we use for client accounts.
What incrementality actually measures
A simple thought experiment: if you turned off your Meta retargeting campaign tomorrow, how much revenue would you actually lose?
Attribution data says you'd lose all the revenue Meta retargeting is currently credited with. Reality usually says you'd lose 20-40% of it — the rest were customers who would have bought through email, organic, or direct anyway.
The gap between "attributed revenue" and "incremental revenue" is what incrementality testing measures. The gap is often huge.
Why geo-holdouts work
A geo-holdout test pauses ads in some geographic regions while running them in matched regions. By comparing revenue in test versus holdout regions over a fixed period, you can isolate the channel's actual incremental impact — without needing complex statistical models.
The geo-test method works because:
- Geography is independent of marketing decisions
- Matched markets have similar baseline conversion behavior
- The intervention (ads on/off by geo) is clean and easy to execute
- The results are interpretable without statistical training
The methodology isn't perfect, but for the cost ($10-20K of test spend) and complexity (low), it gives Shopify operators a useful read on channel impact.
Setting up a geo-holdout test
Step 1: Pick the channel to test
Test one channel at a time. Pick the channel whose incremental impact you most doubt. Common starting points: Meta retargeting, branded search, or a recently added channel like AppLovin or Pinterest.
Step 2: Match your markets
Identify pairs of US states (or DMAs) with similar:
- Conversion rates from your store data
- Demographic profiles
- Baseline daily order volume
- Your existing customer mix
Common matched pairs we've used:
- Texas / Florida (large, similar buying patterns)
- Ohio / Pennsylvania (similar Midwest demos)
- Washington / Oregon (similar West Coast demos)
- North Carolina / Georgia (similar Southeast demos)
For smaller test budgets, you might match smaller states or DMAs (designated market areas — Nielsen-defined regions).
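If you want to pressure-test a candidate pairing against your own data, one rough approach is to compare each state's baseline profile from an order export. A minimal sketch in pandas, assuming an orders.csv export with created_at, total_price, and shipping_state columns (Shopify's actual export headers differ, so rename to match):

```python
import pandas as pd

# Load an order export; column names here are illustrative.
orders = pd.read_csv("orders.csv", parse_dates=["created_at"])

# Restrict to a baseline window, e.g. the 8 weeks before the planned test.
baseline = orders[orders["created_at"].between("2024-01-01", "2024-02-26")]

# Per-state baseline profile: average daily revenue, daily orders, and AOV.
daily = (baseline
         .groupby(["shipping_state", baseline["created_at"].dt.date])
         .agg(revenue=("total_price", "sum"), orders=("total_price", "count"))
         .groupby("shipping_state")
         .mean())
daily["aov"] = daily["revenue"] / daily["orders"]

def pair_distance(a: str, b: str) -> float:
    """Normalized distance between two states' baseline profiles.
    Lower is better; a crude similarity score, not a formal match."""
    pa, pb = daily.loc[a], daily.loc[b]
    return sum(abs(pa[m] - pb[m]) / max(pa[m], pb[m])
               for m in ["revenue", "orders", "aov"])

# Score the candidate pairs from the list above.
for a, b in [("TX", "FL"), ("OH", "PA"), ("WA", "OR"), ("NC", "GA")]:
    print(a, b, round(pair_distance(a, b), 3))
```

A low score only means the revenue baselines look alike; still sanity-check demographics and customer mix by hand.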
Step 3: Decide test design
Two common designs:
Holdout design. Run ads as normal in one set of states, completely turn them off in matched states. Cleanest signal. Requires real revenue at risk.
Spend-difference design. Reduce spend by 50% or 75% in one set of states, hold it steady in the others. Less revenue at risk, but a weaker signal that's harder to interpret.
For first-time testers, we recommend the full holdout design even though it's more aggressive. The signal is clearer and the test takes less time.
Step 4: Set the test duration
Three to four weeks is the minimum for a meaningful read. Shorter than that and you're mostly reading noise; longer than four weeks and carryover effects and seasonal shifts start to muddy the data.
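A useful gut check on duration: measure how much your weekly revenue naturally swings per state. If the week-to-week noise is larger than the lift you expect to detect, a three-week read will be mostly noise and you'll want longer windows or bigger geos. A rough sketch, same assumed columns as the matching sketch above:

```python
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["created_at"])

# Weekly revenue per state over recent history.
weekly = (orders
          .set_index("created_at")
          .groupby("shipping_state")["total_price"]
          .resample("W")
          .sum())

# Coefficient of variation = natural week-to-week noise per state.
stats = weekly.groupby(level="shipping_state").agg(["mean", "std"])
cv = stats["std"] / stats["mean"]
print(cv.sort_values())  # states with low CV give cleaner reads
```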
Step 5: Configure the campaign restrictions
In your ad platform, restrict the test channel's campaigns to run only in your test geos and pull all spend from the holdout geos. Make sure everything else (email, SMS, organic social) keeps running everywhere; you only want the one paid channel turned off.
Step 6: Track outcomes
You're measuring total revenue (not attributed revenue) by geography. In Shopify, segment orders by shipping state during the test period. Compare:
- Total revenue in test geos during test period
- Total revenue in holdout geos during test period
- Both compared to a baseline period (4 weeks before the test)
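Pulled from the same kind of order export, the totals you need reduce to one helper. A minimal sketch; the geo lists and dates are illustrative:

```python
import pandas as pd

TEST_GEOS = ["TX", "OH"]      # ads kept running (illustrative)
HOLDOUT_GEOS = ["FL", "PA"]   # ads paused (illustrative)

orders = pd.read_csv("orders.csv", parse_dates=["created_at"])

def revenue(geos, start, end):
    """Total (not attributed) revenue shipped to the given states in a window."""
    in_window = orders["created_at"].between(start, end)
    in_geo = orders["shipping_state"].isin(geos)
    return orders.loc[in_window & in_geo, "total_price"].sum()

# 4-week baseline, then a 3-week test (dates illustrative). Unequal window
# lengths are fine because both arms share the same windows, so the
# difference-in-growth comparison in the next section still holds.
baseline_test    = revenue(TEST_GEOS,    "2024-03-01", "2024-03-28")
baseline_holdout = revenue(HOLDOUT_GEOS, "2024-03-01", "2024-03-28")
test_test        = revenue(TEST_GEOS,    "2024-03-29", "2024-04-18")
test_holdout     = revenue(HOLDOUT_GEOS, "2024-03-29", "2024-04-18")
```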
Calculating the lift
The simple formula:
Lift = (test-geo revenue during the test / test-geo baseline revenue) - (holdout-geo revenue during the test / holdout-geo baseline revenue)
If revenue in your test geos grew 10% over baseline while your holdout geos grew 2%, the channel's incremental lift is roughly 8 percentage points.
Multiply that 8-point lift by your total revenue for the test period to estimate the dollar value of the channel's incremental contribution, then compare it to what you spent on the channel during the test.
If the channel was credited with $50K of attributed revenue but only $20K of incremental revenue, you've learned something important: the attribution is significantly overstating the channel's true value.
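Fed the four totals from Step 6, the whole calculation is a few lines. The sketch below reproduces the worked numbers from this section; every dollar figure is illustrative:

```python
def incremental_lift(test_rev, test_base, holdout_rev, holdout_base):
    """Difference in growth rates between test and holdout geos,
    as a fraction (0.08 = 8 percentage points)."""
    return test_rev / test_base - holdout_rev / holdout_base

# With the Step 6 totals: incremental_lift(test_test, baseline_test,
#                                          test_holdout, baseline_holdout)
# The worked example above: test geos grew 10%, holdout geos grew 2%.
lift = incremental_lift(110_000, 100_000, 102_000, 100_000)  # -> 0.08

# Dollar value of the channel's incremental contribution over the test period.
total_test_period_revenue = 250_000              # illustrative
incremental = lift * total_test_period_revenue   # -> ~$20K

attributed = 50_000  # what the platform claims, per the example above
print(f"lift {lift:.1%}; incremental ${incremental:,.0f} "
      f"vs attributed ${attributed:,.0f}")
```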
Common test design mistakes
Forgetting non-channel marketing. If your email program is sending different content to different states (rare, but possible), it'll contaminate the test.
Running during a sale or promotion. Either your test or your sale will get muddled. Schedule tests during steady-state periods.
Picking poorly matched geos. California and Mississippi aren't matched markets even if the audience sizes look similar. Make sure baseline conversion rates and demographics line up.
Cutting the test short. Two weeks of data is rarely enough. Hold to the full 3-4 weeks.
Testing multiple channels at once. You won't be able to attribute the lift difference to any specific channel. Test one at a time.
Not planning for the restart. When you turn the test campaigns back on after the holdout period, the algorithm has lost some of its learning. Keep the original campaign settings intact and plan for a 3-7 day re-stabilization period.
What to do with the results
Three common scenarios:
Scenario A: Incremental revenue exceeds attributed revenue. The channel is even more valuable than your dashboard says. Increase spend.
Scenario B: Incremental revenue roughly matches attributed revenue. Your attribution is reasonably accurate for this channel. Continue current spend.
Scenario C: Incremental revenue is significantly below attributed revenue. The channel is over-claiming credit. Consider:
- Reducing spend on this channel and reallocating
- Restructuring the channel (different audience focus, different campaign types)
- Continuing it but discounting the attributed ROAS in your decision-making
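If you want a mechanical starting point for reading results, you can bucket the incremental-to-attributed ratio into the three scenarios. The 20% tolerance band below is an arbitrary default, not a standard:

```python
def read_result(incremental_revenue: float, attributed_revenue: float,
                tolerance: float = 0.2) -> str:
    """Crude bucketing of a test result into the three scenarios above."""
    ratio = incremental_revenue / attributed_revenue
    if ratio > 1 + tolerance:
        return "A: channel is under-credited, consider increasing spend"
    if ratio < 1 - tolerance:
        return "C: channel is over-claiming, reduce, restructure, or discount its ROAS"
    return "B: attribution is roughly accurate, hold current spend"

print(read_result(20_000, 50_000))  # the $20K vs $50K example -> scenario C
```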
Don't immediately kill a channel that fails an incrementality test. Sometimes the test reveals that the channel is good at retention or LTV but not new customer acquisition. That's still a valuable function — just not what you thought it was doing.
A real example
A pet supplement client we worked with had Meta retargeting credited with $80K/month of attributed revenue at 5.2x ROAS. We ran a geo-holdout test pausing Meta retargeting in 5 matched states for 3 weeks.
Result: revenue in the holdout states dropped 3.5% against baseline while the test states held flat, a lift of roughly 3.5 percentage points. Applied to total revenue, the incremental contribution of Meta retargeting came out to roughly $28K/month, not $80K.
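For transparency, the arithmetic behind those numbers; the monthly revenue base is inferred from the reported figures rather than a number we stated directly:

```python
# Holdout dropped 3.5%, test held flat: lift = 0.0 - (-0.035) = 3.5 points.
lift = 0.0 - (-0.035)
incremental_monthly = 28_000                       # reported incremental $/month
implied_revenue_base = incremental_monthly / lift  # ~$800K/month, inferred
```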
Outcome: we reduced retargeting spend by 50%, reallocated to creative testing and new prospecting audiences. Six weeks later, blended ROAS was up 18% with similar total spend. The retargeting hadn't been worthless — but it was eating budget that worked harder elsewhere.
How often to run tests
A quarterly cadence is the minimum we recommend for accounts spending $50K+/month. Rotate the test through your channels:
- Q1: Meta retargeting incrementality
- Q2: Branded search incrementality
- Q3: New-channel incrementality (Pinterest, TikTok, etc.)
- Q4: Top-of-funnel awareness incrementality
After a year, you have a much clearer picture of which channels actually move revenue versus which are decorating your dashboards.
What to do this week
Pick the channel you're most uncertain about. Plan a 3-week geo-holdout test for next month. Identify your matched markets, calculate the test spend exposure, and put it on the calendar.
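"Test spend exposure" is just the spend you pause plus the worst-case revenue you put at risk. A back-of-envelope sketch, all inputs illustrative:

```python
# Rough exposure estimate for a 3-week holdout (all numbers illustrative).
weeks = 3
weekly_channel_spend_in_holdout = 3_000   # paid spend you'll pause
weekly_holdout_revenue = 60_000           # holdout geos' normal weekly revenue
worst_case_lift = 0.10                    # upper bound on what the channel might drive

spend_paused = weekly_channel_spend_in_holdout * weeks
revenue_at_risk = weekly_holdout_revenue * worst_case_lift * weeks
print(f"spend paused: ${spend_paused:,}, "
      f"worst-case revenue at risk: ${revenue_at_risk:,.0f}")
```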
For more on measurement, see our posts on MMM vs MTA vs GA4 attribution, first-party data strategy, and why ROAS is down but revenue is up.