A/B Testing Math: When to Call a Winner and When to Keep Running
Pre-commit to a sample size
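Calculate the required sample size before the test starts, from your baseline conversion rate, the smallest lift you care about detecting, and your chosen significance level and power. Don't peek and stop early. Stopping as soon as you like the result inflates the false positive rate above the level you signed up for.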
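A minimal sketch of that calculation, using the usual normal-approximation formula for comparing two conversion rates. The function name, the 80% power default, and the example inputs are assumptions for illustration, not figures from this article.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH arm of a two-sided test.

    baseline      -- control conversion rate, e.g. 0.10 for 10%
    relative_lift -- smallest relative lift worth detecting, e.g. 0.10 for +10%
    alpha         -- significance level (0.05 corresponds to 95% confidence)
    power         -- chance of detecting the lift if it is real (assumed 80%)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical inputs: 10% baseline, detect a 10% relative lift (10% -> 11%).
n = sample_size_per_arm(0.10, 0.10)
print(f"Visitors needed per arm: {n}")
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum lift should be fixed before launch rather than adjusted after looking at the data.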
Minimum 2 weeks
Regardless of sample size, run for at least 2 full weeks. This captures weekday/weekend seasonality. A single week is too noisy.
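One way to turn the 2-week floor into a planned end date is sketched below. Rounding up to whole weeks, so that every weekday is represented equally, is an assumption added here. The function name and the traffic figures are hypothetical.

```python
from math import ceil


def planned_duration_days(n_per_arm, daily_visitors, arms=2, min_days=14):
    """Days to run: enough traffic for every arm, in whole weeks, at least min_days.

    daily_visitors is the total traffic entering the test per day, across all arms.
    """
    days_for_sample = ceil(n_per_arm * arms / daily_visitors)
    whole_weeks = ceil(days_for_sample / 7) * 7  # assumption: round up to full weeks
    return max(whole_weeks, min_days)


# Hypothetical sites: a high-traffic one hits the 14-day floor,
# a low-traffic one is limited by sample size instead.
print(planned_duration_days(15000, daily_visitors=5000))
print(planned_duration_days(15000, daily_visitors=1000))
```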
Call it at 95% confidence
95% confidence (p < 0.05) is the standard industry threshold. Use a higher bar (99%) for high-traffic changes, where the false wins and losses that slip through at 95% compound.
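A sketch of the check itself, assuming a two-sided two-proportion z-test with a pooled standard error, which is one common choice and not the only valid one. The function name and the counts are hypothetical. Run it once, when the pre-committed sample size has been reached.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical results: 1,000 of 10,000 convert in A, 1,100 of 10,000 in B.
p = two_proportion_p_value(1000, 10000, 1100, 10000)
print(f"p-value: {p:.4f}")
print("Winner at 95% confidence:", p < 0.05)
print("Winner at 99% confidence:", p < 0.01)
```

With these made-up numbers the result clears the 95% bar but not the 99% bar, which is exactly the kind of case where the stricter threshold changes the decision.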