What is A/B Testing?
A/B testing is a controlled experiment where two or more variants of a webpage, message, or feature are shown to randomised user cohorts simultaneously, with the winner chosen by a statistically significant lift on a defined success metric.
A/B testing replaces opinion with evidence. Done well, it is the engine of compounding conversion improvement; done badly, it produces false positives that destroy revenue. The discipline lives in three places: sample size (run the test until it reaches its pre-calculated sample size, not until yesterday’s numbers look good), test isolation (no overlapping experiments contaminating each other), and decision rules agreed before the test runs. Treat A/B testing like a research function, not a button-pushing exercise.
What it includes
- Primary success metric and minimum detectable effect defined upfront
- Sample-size calculation done before the test ships
- Statistical significance threshold (commonly p < 0.05) agreed in advance
- Isolation from other concurrent tests on the same surface
- Documented hypothesis: what we believe, why, and what would falsify it
- Pre-registered decision rule: what we do with each possible outcome (see the plan sketch after this list)
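Taken together, these elements form a plan that is written down before the test ships. A minimal sketch of what that pre-registration might look like in code; the field names and values are illustrative, not from any particular tool:

```python
# Pre-registration sketch: capture the checklist above as a record that exists
# before the test ships. Field names and values are illustrative.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TestPlan:
    hypothesis: str                    # what we believe, why, what would falsify it
    primary_metric: str                # the single metric the decision hangs on
    baseline_rate: float               # current conversion rate on that metric
    minimum_detectable_effect: float   # smallest relative lift worth detecting
    alpha: float = 0.05                # significance threshold agreed in advance
    power: float = 0.80                # chance of detecting a true effect of MDE size
    decision_rules: dict = field(default_factory=dict)  # outcome -> action

plan = TestPlan(
    hypothesis="Trust strip above the fold lifts qualified consult bookings 15%",
    primary_metric="qualified_consult_bookings",
    baseline_rate=0.04,
    minimum_detectable_effect=0.15,
    decision_rules={
        "significant win": "ship variant",
        "no difference": "keep control, archive variant",
        "significant loss": "keep control, document the learning",
    },
)
```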
How it works
Start from a falsifiable hypothesis
“If we move the trust strip above the fold, qualified consult bookings will lift 15%, because parents need a credibility signal before scrolling.” Falsifiable, measurable, time-boxed.
Size the test
Use a sample-size calculator. Account for daily traffic, baseline conversion rate, minimum detectable effect, significance threshold, and statistical power. Most tests need 2–4 weeks at typical traffic.
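As a concrete sketch, statsmodels’ power calculations can do this arithmetic; the baseline, MDE, and traffic figures below are illustrative:

```python
# Sample-size sketch using statsmodels' power calculations. Any calculator that
# takes baseline rate, MDE, alpha, and power does the same arithmetic.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.04                      # current conversion rate
mde = 0.15                           # minimum detectable effect (15% relative lift)
target = baseline * (1 + mde)

effect = abs(proportion_effectsize(baseline, target))   # Cohen's h for two proportions
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
)

daily_visitors_per_variant = 400     # illustrative traffic figure
print(f"~{n_per_variant:,.0f} visitors per variant")
print(f"~{n_per_variant / daily_visitors_per_variant:.0f} days at current traffic")
```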
Ship the variant cleanly
Server-side or fast client-side A/B tooling. Flicker (the control rendering briefly before the variant swaps in) is a confound. One change per test, isolated to one surface, no overlapping experiments.
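One common way to keep assignment clean is deterministic, hash-based bucketing, so a returning user always lands in the same variant without any coordination between servers. A minimal sketch; the function and experiment names are illustrative:

```python
# Deterministic assignment sketch: hash the (experiment, user) pair so each user
# sees the same variant on every visit. Real A/B tools do the equivalent internally.
import hashlib

def assign_variant(experiment_id: str, user_id: str, variants=("control", "variant")) -> str:
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)   # stable, roughly uniform split
    return variants[bucket]

print(assign_variant("trust-strip-above-fold", "user-8841"))  # same answer every call
```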
Wait for significance
Do not call a winner on day three. Peeking is the most common cause of false positives. Let the test reach its pre-calculated sample size.
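A sketch of the evaluation step, assuming a two-proportion z-test run once, after the pre-calculated sample size is reached; the counts are illustrative:

```python
# Evaluation sketch: a single two-proportion z-test at the pre-calculated
# sample size, not a daily peek. Counts are illustrative.
from statsmodels.stats.proportion import proportions_ztest

conversions = [412, 486]      # control, variant
visitors = [10_000, 10_000]   # per-variant sample size from the power calculation

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"p = {p_value:.4f}")
if p_value < 0.05:
    print("Significant at the pre-agreed threshold: apply the pre-registered decision rule.")
else:
    print("Not significant: keep control, document the result.")
```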
Decide, document, ship, archive
Winner ships; loser archives. Document the result in a test log so the team can review patterns over time. Failed tests teach more than wins.
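A minimal sketch of what an archived entry might look like, assuming a JSON-lines test log; the file name and fields are illustrative:

```python
# Archive sketch: append each finished test to a JSON-lines log so patterns
# can be reviewed later. File name and fields are illustrative.
import json, datetime

def log_test(path: str, entry: dict) -> None:
    entry = {"logged_at": datetime.date.today().isoformat(), **entry}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_test("ab_test_log.jsonl", {
    "test": "trust-strip-above-fold",
    "hypothesis": "Credibility signal above the fold lifts consult bookings 15%",
    "result": "no significant difference (p = 0.21)",
    "decision": "keep control, archive variant",
})
```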
Frequently asked
When should we A/B test?
When traffic is sufficient to reach significance in 2–6 weeks. Below roughly 1,000 conversions per month per variant, statistical power is too low to learn anything useful from a split test; at that scale, ship and learn qualitatively instead.
How big should the test lift be?
Plan for the minimum lift you actually care about. Required sample size scales roughly with the inverse square of the effect, so tests powered to detect a 1% lift waste months for marginal gains. Most growth teams pre-commit to a 10–20% minimum detectable effect (MDE) on primary metrics.
Frequentist or Bayesian?
The frequentist approach (p-values) is the default in most tools and well understood. Bayesian frameworks (probability of one variant being better) communicate more intuitively to non-technical stakeholders. Both work; consistency matters more than which one you pick.
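For the Bayesian framing, a minimal sketch: with Beta(1, 1) priors on each conversion rate, “probability the variant beats control” can be estimated by sampling the posteriors. The counts are illustrative:

```python
# Bayesian framing sketch: Beta posteriors on each conversion rate, compared by
# Monte Carlo sampling. Counts are illustrative.
import numpy as np

rng = np.random.default_rng(seed=7)
control = rng.beta(1 + 412, 1 + 10_000 - 412, size=100_000)   # posterior draws, control
variant = rng.beta(1 + 486, 1 + 10_000 - 486, size=100_000)   # posterior draws, variant

p_variant_better = (variant > control).mean()
print(f"P(variant beats control) = {p_variant_better:.1%}")
```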