simonw
yesterday at 6:20 PM
I think A/B testing is one of the most expensive ways of getting feedback on a product feature.
- You have to make good decisions about what you're going to test
- You have to build the feature twice
- You have to establish a statistically robust tracking mechanism. Using a vendor helps here, but you still need to correctly integrate with them.
- You have to test both versions of the feature AND the tracking and test selection mechanisms really well, because bugs in any of those invalidate the test
- You have to run it in production for several weeks (or you won't get statistically significant results - see the rough sample-size sketch at the end of this comment) - and ensure it doesn't overlap with other tests in a way that could bias the results
- You'd better be good at statistics. I've seen plenty of A/B test results presented in ways that did not feel statistically sound to me.
... and after all of that, my experience is that a LOT of the tests you run don't show a statistically significant result one way or the other - so all of that effort really didn't teach you much that was useful.
The problem is that talking people out of running an A/B test is really hard! No-one ever got fired for suggesting an A/B test - it feels like the "safe" option.
Want something much cheaper that yields far more information? Run usability tests. Recruit 3-5 testers and watch them use your new feature over screen sharing while they talk through what they're doing. This is an order of magnitude cheaper than A/B testing and will probably teach you a whole lot more.
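To make the "several weeks" point concrete, here's a rough sketch of the standard two-proportion sample-size estimate (plain Python standard library; the 10% baseline and 10% relative lift are made-up numbers for illustration):

    from statistics import NormalDist

    def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
        # Classic normal-approximation formula for how many users each
        # arm needs before a two-proportion z-test can detect the lift.
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
        z_power = NormalDist().inv_cdf(power)
        variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
        return (z_alpha + z_power) ** 2 * variance / (p_variant - p_control) ** 2

    # Made-up numbers: 10% baseline conversion, detecting a 10% relative
    # lift (10% -> 11%) at the usual alpha=0.05 and 80% power.
    print(round(sample_size_per_arm(0.10, 0.11)))  # ~14,748 users per arm

That's roughly 30,000 users across two arms, so a site with 2,000 eligible visitors a day is looking at a bit over two weeks just to reach the minimum sample - before worrying about weekday effects or overlapping experiments.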
Something that was always a problem for me when doing A/B testing is that the C-suite just reads the landing pages of A/B testing tools, where they say things like "Ridiculously easy A/B Testing", and assumes the tool is gonna do everything for them, including changing the page layout by simply adding some HTML in Tag Manager.
In my career I've had to explain that this is not the case more times than is appropriate.
A/B testing is possibly the most misunderstood tool in our business, and people underestimate even the effort it takes to do it wrong... let alone to do it right.
light_triad
yesterday at 7:26 PM
Some teams think they can A/B test their way to a great product. It can become a socially acceptable mechanism for avoiding opinions and reducing friction.
Steve Blank's quote about validating assumptions: "Lean was designed to inform the founders’ vision while they operated frugally at speed. It was not built as a focus group for consensus for those without deep convictions"
Is the Lean Startup Dead? (2018)
https://medium.com/@sgblank/is-the-lean-startup-dead-71e0517...
Discussed on HN at the time: https://news.ycombinator.com/item?id=17917479
eddythompson80
yesterday at 8:30 PM
Any sort of political/PR fallout in any organization can be greatly limited or eliminated if you just explain a change as an "experiment" rather than something deliberate.
"We were just running an experiment; we do lots of those. We'll stop that particular experiment. No harm no foul" is much more palatable than "We thought we'd make that change. We will revert it. Sorry about that".
With the former, people think: "Those guys are always experimenting with new stuff. Experimentation comes with hiccups, but experimentation is generally good."
With the latter, people want to know more about your decision-making process. How and why was that decision made? What were the underlying reasons? What was your end goal with such a change? Do you actually have a plan, or are you just stumbling in the dark?
> You have to build the feature twice
Erm, isn't it three times, or am I missing something?
You have what you are currently doing (feature Null), feature A, and feature B.
Otherwise, you can't tell whether the feature itself is causing the change, as opposed to something else like "novelty" (favoring whatever is new) or "familiarity" (favoring whatever is old).
If all you have is "what you are currently doing" as "feature A" and "new thing" as "feature B", you're going to have a murderous time getting enough statistical power to extract any real information.
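For what it's worth, the assignment side of a three-arm test is the cheap part. A minimal sketch of deterministic bucketing (Python; the experiment name is hypothetical):

    import hashlib

    ARMS = ("control", "variant_a", "variant_b")

    def assign_arm(user_id: str, experiment: str = "checkout-v2") -> str:
        # Hash (experiment, user) so each user lands in a stable bucket
        # across the three arms described above.
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
        return ARMS[int.from_bytes(digest[:8], "big") % 3]

The hard part is everything around it: building and maintaining all three arms, and having enough traffic to power the extra comparison.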
porridgeraisin
yesterday at 7:46 PM
> You have to build the feature twice
Why though? Can't you have it dynamically look up whether the experiment is active for the current request and, if so, behave a certain way? And the place it looks that up from can be updated however you like?
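Something roughly like this, say (the flags client here is hypothetical, standing in for whatever vendor SDK or homegrown lookup you'd use):

    def render_checkout(request, flags):
        # `flags` is a hypothetical per-request flag client; which users
        # are in the experiment can be changed remotely, without a deploy.
        if flags.is_active("new-checkout", user=request.user):
            return "<new checkout page>"  # experimental behavior
        return "<old checkout page>"      # existing behavior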
stetrain
yesterday at 7:53 PM
But you have to implement and test both sides of that "if" statement - both behaviors. Thus "build the feature twice".
simonw
yesterday at 7:57 PM
Right: you have to take responsibility for implementing (and testing and short-term maintaining) two branches.