Continuous Loops

A Continuous Loop is an always-on champion-challenger optimization. You pick a page and the element types to test (Headlines and Copy in v1; Images/Visuals and Layout are coming soon) and launch once. abTestBot then runs an endless cycle: the current champion competes against an AI-authored challenger, the round resolves with a defensible Bayesian winner, the winner becomes the next champion, and the AI authors a fresh challenger. The loop continues round after round until you pause it or it pauses itself.

How the loop protects you

The loop is deliberately conservative. A round has to clear every per-round gate below before a winner is promoted, and the loop as a whole monitors itself for false wins and stalled progress.

7-day minimum per round. Every round runs at least a full week. This averages out day-of-week effects (B2B traffic on Monday looks nothing like Saturday traffic) and dampens the novelty effect, where a fresh variant gets a temporary lift that fades as the change becomes familiar.
≥500 samples per arm floor. Each variant has to collect at least 500 sessions before a winner can be declared. For low-baseline pages (e.g. < 1% conversion), the floor is auto-raised so the result has enough statistical power to be meaningful.
95% probability-to-win threshold. A challenger only gets promoted when the Bayesian model says there's at least a 95% chance it's actually better than the champion — not just numerically ahead.
Lower 95% credible-interval bound must be > 0. The probability-to-win can be high while the lift is tiny, or even negative at the bottom of the interval. We require the lower bound of the 95% credible interval on the lift to be strictly positive, so that even the pessimistic end of the estimate still favors the challenger.
Every 8th round is a regression check. The loop re-tests the previous champion against the current one. Wins from novelty effects, seasonal blips, or measurement quirks get caught and reverted here. If the "winner" can't beat its own predecessor on a fresh run, the loop unwinds the false promotion.
Alpha-investing / online FDR control across rounds. Each round spends a small slice of a lifetime false-discovery-rate budget. The budget is sized so the loop's lifetime FDR stays at or below 10% — even after dozens of rounds, the share of "winners" that are actually noise is bounded.
3 consecutive inconclusive rounds → auto-pause + check-in email. If three rounds in a row fail to find a winner, the loop pauses itself and emails you. This is the loop telling you it might not have a strong optimization signal to chase on this page right now — so your credits aren't quietly burned while it spins.
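
The per-round gates above (minimum duration, sample floor, probability-to-win, and credible-interval bound) can be sketched roughly as follows. This is a minimal illustration, not abTestBot's actual implementation: the Beta(1, 1) prior, the Monte Carlo estimate, the equal-tailed interval, and the names Arm, posterior_lift_samples, and should_promote are all assumptions made for the example. The auto-raised floor for low-baseline pages is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    sessions: int
    conversions: int


def posterior_lift_samples(champion, challenger, draws=20000, seed=0):
    """Monte Carlo draws of relative lift, assuming Beta(1, 1) priors on each rate."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(draws):
        a = rng.betavariate(1 + champion.conversions,
                            1 + champion.sessions - champion.conversions)
        b = rng.betavariate(1 + challenger.conversions,
                            1 + challenger.sessions - challenger.conversions)
        lifts.append((b - a) / a)  # relative lift of challenger over champion
    return lifts


def should_promote(champion, challenger, days_running,
                   min_days=7, min_samples=500, prob_threshold=0.95):
    """Return True only if every per-round gate clears."""
    if days_running < min_days:
        return False
    if min(champion.sessions, challenger.sessions) < min_samples:
        return False
    lifts = sorted(posterior_lift_samples(champion, challenger))
    prob_to_win = sum(lift > 0 for lift in lifts) / len(lifts)
    # 2.5th percentile = lower end of an equal-tailed 95% credible interval
    lower_bound = lifts[int(0.025 * len(lifts))]
    return prob_to_win >= prob_threshold and lower_bound > 0
```

For example, should_promote(Arm(5000, 250), Arm(5000, 350), days_running=7) clears every gate, while the same data at days_running=3 does not.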

FAQ

Why didn't my winner promote on day 3? (Why 7 days?!)

There are four science-based reasons the 7-day floor is doing real work — plus an escape hatch our platform applies when the win is overwhelming.

1. Day-of-week seasonality (this is the big one). Site traffic is rarely uniform across a week. The visitors who land on your page Tuesday morning differ from Saturday evening visitors: different intent, different traffic sources, different conversion baselines. If your test resolves on Wednesday with 99% prob_to_win, that signal is built from 3 days of Mon/Tue/Wed visitors. The variant might genuinely lose on weekend audiences. Forcing ≥7 days averages across a full weekly cycle so the win is robust to which days you happened to be running. This is the most commonly cited rule in CRO methodology: run for at least one full business cycle.

2. Novelty effects. Returning users see "something different" and engage at an elevated rate during the first few exposures. This honeymoon decays over ~3–10 days as the change becomes the new normal. Stopping at day 2 bakes in the honeymoon and overestimates the true steady-state lift. This bites you most on visible changes (hero image swaps, layout, new CTAs) and least on subtle copy tweaks.

3. Sample composition drift. Early visitors in a test window are systematically different from later ones: they're disproportionately your most loyal/engaged segment (push subscribers, email openers, direct-traffic regulars). The mix of new-vs-returning, paid-vs-organic, mobile-vs-desktop all shifts over the first few days as the campaign mix rotates and SEO crawls update. Standard Bayesian A/B models assume IID samples (independent and identically distributed): every visitor in the test should be drawn from the same underlying population, independently of when they arrived. Visitors in the first 72 hours typically don't meet that assumption.

4. Peeking / sequential testing nuance. Bayesian methods are genuinely more peeking-resistant than frequentist ones: you can look at prob_to_win whenever you want without inflating your false positive rate the way you would with p-values. But "auto-stop when threshold crossed" is structurally different from "peek but don't act." Auto-stopping at the first crossing of 95% has empirically been shown to inflate the effective false positive rate, especially on small samples, where the posterior is still noisy and can cross the threshold by chance. The 7-day rule is partial protection.

When the 7-day rule does NOT apply. The 7-day floor is overkill when the win is overwhelming. Our platform is page- and business-aware, so it recognises when the four reasons above genuinely don't matter for a given round — and settles early to stop wasting traffic on the losing variant. We override the 7-day floor and settle as soon as all of the following are true:

We're 99%+ confident the new version wins (not just the standard 95%).
The new version is converting at least 20% better than the original.
Even the pessimistic estimate still shows the new version winning by 10%+ (the credible interval is tight enough that day-of-week noise can't realistically flip the sign).
We have ≥500 visitors per version (the standard sample gate still applies).
The round has run for at least 2 calendar days (so we never settle on a single day's traffic).

When all five conditions clear, running another 4–5 days has almost no statistical value and carries a real cost, because half your visitors are being served the losing variant in the meantime.
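
As a rough sketch, the five early-settle conditions combine into a single all-or-nothing check. The function name and parameters below are hypothetical. Lifts are expressed as relative fractions (0.20 means 20% better), which is an assumption about how "20% better" is measured.

```python
def can_settle_early(prob_to_win, observed_lift, lift_lower_bound,
                     min_visitors_per_arm, days_running):
    """Return True only when all five early-settle conditions hold.

    observed_lift and lift_lower_bound are relative lifts as fractions
    (assumed convention); lift_lower_bound is the pessimistic end of the
    credible interval on the lift.
    """
    return (
        prob_to_win >= 0.99              # 99%+ confident, not just 95%
        and observed_lift >= 0.20        # converting at least 20% better
        and lift_lower_bound >= 0.10     # pessimistic estimate still 10%+
        and min_visitors_per_arm >= 500  # standard sample gate
        and days_running >= 2            # never settle on a single day
    )
```

A round with prob_to_win=0.995, observed_lift=0.25, lift_lower_bound=0.12, and 800 visitors per arm would settle on day 2 but not on day 1.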

Why did you re-test an old variant?

That was a regression check. Every 8th round, the loop deliberately re-runs the previous champion against the current champion. The point is to catch winners that won for the wrong reasons (novelty, a one-time external event, an instrumentation glitch) by replaying the head-to-head on fresh traffic. If the current champion can't defend its title, the loop reverts to the older variant and records the false promotion. It's a built-in sanity check, not a sign that your page regressed.
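
The scheduling and revert logic can be pictured with a small sketch. It assumes rounds are numbered from 1 (so rounds 8, 16, 24, and so on are checks) and takes the outcome of the re-test as an input rather than defining what "defending the title" means statistically. Both function names are hypothetical.

```python
REGRESSION_INTERVAL = 8  # every 8th round, per the description above


def is_regression_round(round_number):
    """Assumes 1-based round numbers: rounds 8, 16, 24, ... are checks."""
    return round_number % REGRESSION_INTERVAL == 0


def resolve_regression_check(current_champion, previous_champion, current_defended):
    """Keep the current champion if it defended its title, else revert."""
    return current_champion if current_defended else previous_champion
```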

Why did my loop pause?

Because three rounds in a row came back inconclusive — meaning none of them produced enough evidence to confidently promote a challenger within the round's time/sample budget. That's often a sign the page doesn't have a strong optimization signal right now (audience is already converting well, traffic is too low for the effect size we'd need to detect, or the element types selected aren't where the real lift lives). The loop pauses to flag this rather than silently burn through more credits. You can resume it any time from the loop detail page — nothing is lost.
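
One way to picture the auto-pause rule is a streak counter over round outcomes. This is only a sketch: LoopState, record_round, and resume are hypothetical names, and it assumes that any outcome other than "inconclusive" resets the streak and that resuming clears it.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    consecutive_inconclusive: int = 0
    paused: bool = False


def record_round(state, outcome, max_inconclusive=3):
    """Update the streak; outcome is 'promote', 'hold', or 'inconclusive'."""
    if outcome == "inconclusive":
        state.consecutive_inconclusive += 1
    else:
        # Assumption: any conclusive outcome resets the streak.
        state.consecutive_inconclusive = 0
    if state.consecutive_inconclusive >= max_inconclusive:
        state.paused = True  # the real loop would also send the check-in email
    return state


def resume(state):
    """Assumption: resuming clears both the pause flag and the streak."""
    state.paused = False
    state.consecutive_inconclusive = 0
    return state
```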

Why aren't Images / Layout available?

Coming soon. Continuous Loops v1 ships with Headlines and Copy because those are the element types where AI-authored challengers are most reliable today and where the lift-per-round signal is cleanest. Image and layout variants need a different generation pipeline (asset selection / regenerative DOM rewrites) — we're building it as a follow-up.

How is this different from a multi-armed bandit?

Bandits dynamically shift traffic toward the arm that looks like it's winning, which maximizes short-term reward but sacrifices the clean causal claim you get from a balanced split. After a bandit run, you can't cleanly say "variant B beat variant A by X%" — the allocation kept changing while the result was forming.

Continuous Loops use a champion-challenger split instead: every round is a clean 50/50 between exactly two variants. When a round resolves, you have a defensible "X beat Y by Z%, P > 0.95, CI excludes zero" claim that you can put in a report, defend to a stakeholder, or feed back into the next round's prompt. We trade a small amount of short-term efficiency for results you can actually trust and explain.
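
To make the fixed split concrete, here is one common way to get a stable 50/50 assignment: hash the visitor and round identifiers and use one bit of the digest. This illustrates the general technique only and is not confirmed to be how abTestBot assigns traffic. The function name and ID formats are hypothetical.

```python
import hashlib


def assign_variant(visitor_id, round_id):
    """Deterministic 50/50 split: the same visitor always sees the same arm
    within a round, and the allocation never shifts toward a leader."""
    digest = hashlib.sha256(f"{round_id}:{visitor_id}".encode("utf-8")).digest()
    return "champion" if digest[0] % 2 == 0 else "challenger"
```

Unlike a bandit, the assignment here depends only on the identifiers, never on the results observed so far, so the split stays balanced for the whole round.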

What does it cost?

Creating a loop: 15 credits.
Each round transition (promote / hold / inconclusive resolution + next challenger authored): 10 credits.
Enterprise plan: unmetered — no credits are deducted.
Free plan: Continuous Loops are not available. Any paid plan (Starter / Growth / Scale / Enterprise) can launch loops.
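
For budgeting, the pricing above reduces to simple arithmetic. The helper below is a hypothetical sketch (the function name and plan strings are illustrative), assuming one creation fee plus one charge per round transition.

```python
CREATE_COST = 15  # credits to create a loop
ROUND_COST = 10   # credits per round transition


def loop_credits(rounds_resolved, plan):
    """Estimate total credits consumed by one loop on a given plan."""
    plan = plan.lower()
    if plan == "free":
        raise ValueError("Continuous Loops are not available on the Free plan")
    if plan == "enterprise":
        return 0  # unmetered: no credits are deducted
    return CREATE_COST + ROUND_COST * rounds_resolved
```

For example, a Growth-plan loop that has resolved 12 rounds would have used 15 + 10 × 12 = 135 credits.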