The sample size problem
Most tests stop the moment they reach 'statistically significant', before they have enough volume to generalize. A 2000-session test tells you about those 2000 sessions, not the whole audience.
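To get a feel for how much volume "enough" is, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two conversion rates. The function name, the 5% and 6% rates, and the defaults (5% significance, 80% power) are illustrative assumptions, not figures from any particular test.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate sessions needed in EACH variant (two-sided two-proportion z-test)."""
    if baseline_rate == expected_rate:
        raise ValueError("the two rates must differ")
    # Critical values for the chosen significance level and power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Spread under "no difference" (pooled rate) and under the expected difference.
    pooled = (baseline_rate + expected_rate) / 2
    spread_null = sqrt(2 * pooled * (1 - pooled))
    spread_alt = sqrt(
        baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    )
    effect = abs(expected_rate - baseline_rate)
    return ceil(((z_alpha * spread_null + z_power * spread_alt) / effect) ** 2)


# Hypothetical scenario: 5% baseline conversion, hoping to detect a lift to 6%.
needed = sessions_per_variant(0.05, 0.06)
print(needed)
```

Under these assumptions the answer comes out at roughly 8,000 sessions per variant, so a 2000-session test split across two variants is several times too small to detect a one-point lift reliably.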
The isolated-variable problem
Testing five things at once and declaring a winner teaches nothing, because you can't tell which change produced the result. One variable per test, always.
The documentation problem
Tests happen, winners ship, nobody documents why. Six months later the team repeats the same test with a new junior running it. Documentation is the knowledge moat.
Killing bad tests
Ship the confident directional winner. Don't wait for statistical significance on low-volume tests. Action beats certainty on small samples.
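One way to put a number on "confident directional winner" is to estimate the probability that one variant's true conversion rate beats the other's. The sketch below uses a Beta-Binomial model with a uniform prior and Monte Carlo sampling. That is an assumed approach chosen for illustration, not a method prescribed here, and the function name and the 400-session counts are made up.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) from conversions and sessions per variant."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior under a uniform Beta(1, 1) prior: Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical low-volume test: A converts 20 of 400 sessions, B converts 30 of 400.
confidence = prob_b_beats_a(20, 400, 30, 400)
print(f"P(B beats A) is about {confidence:.2f}")
```

A result well above 0.5 supports shipping B as the directional winner even though a test this small would not reach conventional significance. Where to draw the line (0.8, 0.9, or higher) is a judgment call that depends on how costly a wrong decision is.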