Opinions & Blogs

The Problem with A/B Testing (Stop Calling it Conversion Rate Optimisation)

Other than the work I do for freelance clients and hobby projects, my professional career has largely been in the realm of A/B testing. I was an in-house engineer in my previous employer’s optimisation department, and I held a similar role when I started at my current employer.

In both roles I joined a company writing code directly into the editor provided by the testing tool. And in both roles I found myself implementing a modern workflow which utilises all the tooling you would expect in 2023 (a bundler, a linter, HMR, unit testing, etc.).

Since joining my current employer I have moved up to the role of lead developer. It is now my duty to create tools which enable our developers to write better code, faster. Why am I telling you all this? Simple: to highlight the fact that I know my way around the world of A/B testing.

What’s in a Name, Anyway?

Since entering the world of developing A/B tests (and tooling for A/B testers) I have found myself increasingly bemused by the industry’s inability to settle on a name for itself: Web Experimentation, Experimentation Development, A/B Testing or, my personal pet peeve, Conversion Rate Optimisation, are all names I hear regularly.

The name itself isn’t an issue, it’s a name. Jog on. However, there is one name in that list above which I personally find to be highly problematic due to the mental impact it has on the overall expectation of the industry: Conversion Rate Optimisation (CRO from here on out).

Granted, I have found that the term is one coined by marketers rather than engineers (I guess it sounds a bit catchier or something?). However, it paints a terribly inaccurate picture of what the goals of A/B testing actually are.

Why Isn’t Our Conversion Rate Going Up!?

I often hear the dreaded CRO label being thrown around like a pair of used pants in the wash basket and, in my experience, the term alone leads to a misunderstanding of the point of A/B testing.

All too often I am witness to a person with an incorrect assumption that the work we do is akin to a crypto moon-boy/girl: “make line go up, wen Lambo?”. The reality is that our job couldn’t be further from that. The even darker reality is that there is no humanly possible way of ensuring the conversion rate of a website will even stay flat, let alone go up… indefinitely. That’s just not how it works. That’s not how anything works.

This notion highlights the problem with labelling the A/B testing industry as CRO. When you call something “Conversion Rate Optimisation” you are literally saying “We will optimise your conversion rate”. And the truth is that we are just testing shit to see what the effects are. Granted the shit we test has a hypothesis and some fancy reasoning behind it, but the fact remains. We are just testing shit.

A Real World Scenario

Take the following scenario, one which I see all too often: an A/B testing engineer has written a test to add a call to action (CTA) block on a website’s homepage. The new CTA block is to sit directly beneath the hero banner (the big massive banner usually found at the top of a page, often with an image). When the CTA block is clicked, it directs the user to a product listing page (PLP).

The test is made, all the metrics are firing in the right places, all the code works and the split has been correctly configured. When the test is released, the conversion rate in the variant is 3% lower than in the control. Alarms start blaring and someone punches someone’s nan. Anarchy rules the office and stakeholders are screaming to end the test before someone lobs a cake at Fred’s pug.

Someone in a suit jumps on a video call from a business class flight cabin and laments the destruction this test caused, probably blaming the remote aspect of work while they are at it (everything is better in an office, said no-one, ever). However, through all of this hyperbole nobody mentions that the PLP to which the test links has 20% more traffic on the variant.

Nobody checks whether the people in the variant who didn’t convert entered the site directly or via a referral from Google. Or whether the converting users in the control did the same. This is all relevant information. If this were an episode of Line of Duty you would be screaming at Steve and Kate for not doing a thorough enough investigation. This is the A/B testing equivalent of looking at a crime scene and stating “Yeah, looks like someone killed someone” before tipping your hat and leaving for a brew.
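That thorough investigation is really just a segmentation exercise. As a rough sketch (the session records and channel names here are entirely made up for illustration), you could split the conversion rate by variant and entry channel before declaring anything dead:

```python
from collections import defaultdict

# Hypothetical session records: (variant, entry_channel, converted)
sessions = [
    ("control", "google_referral", True),
    ("control", "direct", False),
    ("control", "direct", True),
    ("variant", "direct", False),
    ("variant", "google_referral", True),
    ("variant", "google_referral", False),
]

def conversion_by_segment(sessions):
    """Conversion rate per (variant, entry_channel) segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [conversions, total]
    for variant, channel, converted in sessions:
        seg = (variant, channel)
        counts[seg][0] += int(converted)
        counts[seg][1] += 1
    return {seg: conv / total for seg, (conv, total) in counts.items()}

for seg, rate in sorted(conversion_by_segment(sessions).items()):
    print(seg, f"{rate:.0%}")
```

A blanket “variant is down 3%” could easily hide a segment (say, direct traffic) that was never going to be affected by the change in the first place.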

The Point in All This

While the world went down the pan the second the conversion rate went red in the variant, people forgot to check the mitigating circumstances of the test’s environment. They didn’t bother to factor in the behaviour of the visitors. And they certainly didn’t evaluate the one shining beacon of light in all this: The fact a test with a CTA to a PLP sent 20% more traffic to said PLP.

It is safe to assume that users visiting a website through a direct link are most likely doing so for a reason. It is also safe to assume that they probably had the URL autocompleted by their browser’s history. This means that they were most likely arriving with the intention of doing something specific. The A/B test was never going to affect them. However, they are still accounted for in the dipped conversion rate.

The fact that they simply scrolled past the CTA block is no indication of the CTA block’s value, as this particular user group are travelling the site from their memory bank, somewhat on autopilot. They are looking for that item they saw last week and bookmarked for payday. Maybe it was out of stock today so they didn’t convert. Who knows. The point is that the CTA block didn’t alter their behaviour.

Now take that logic and apply it to two people entering the site via the homepage from a Google referral. They both click the banner, go to the PLP, click on some product pages and leave. These users do not convert, but does that mean the CTA block failed? No, it was a bloody block linking people to a PLP, which it did, successfully.

Use the Right Metric

As the terrible and elaborate example above demonstrates, conversion rate is not a blanket goal for all tests. Using the above example you would be well placed to set your A/B test’s primary goal as views of the PLP to which it links. You wouldn’t even be well placed using the conversion rate as the secondary goal. The secondary goal here should be views of product details pages (PDPs), specifically the PDPs for products on the target PLP. Holy acronyms, Batman.
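Translated into metric terms, that goal hierarchy might look something like the sketch below. All the event names, page IDs and the product catalogue are invented for illustration — the point is simply that the primary goal counts target-PLP views and the secondary goal counts PDP views for products listed on that PLP:

```python
# Hypothetical event logs per variant: each entry is (event_type, page_id)
events = {
    "control": [("plp_view", "summer-sale"), ("pdp_view", "shoe-42")],
    "variant": [
        ("plp_view", "summer-sale"),
        ("plp_view", "summer-sale"),
        ("pdp_view", "shoe-42"),
        ("pdp_view", "hat-7"),
    ],
}

# The PLP the CTA links to, and the products listed on it
# (assumed known from the catalogue)
TARGET_PLP = "summer-sale"
TARGET_PLP_PRODUCTS = {"shoe-42", "hat-7"}

def goal_counts(variant_events):
    """Primary goal: views of the target PLP.
    Secondary goal: PDP views for products listed on that PLP."""
    primary = sum(1 for etype, page in variant_events
                  if etype == "plp_view" and page == TARGET_PLP)
    secondary = sum(1 for etype, page in variant_events
                    if etype == "pdp_view" and page in TARGET_PLP_PRODUCTS)
    return {"plp_views": primary, "pdp_views": secondary}

for name, evts in events.items():
    print(name, goal_counts(evts))
```

Notice that the word “conversion” appears nowhere: every goal is something the CTA block can actually influence.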

If there is no possible way for a user to even add a product to their bag in the location of the test, it has nothing to do with the conversion rate. I personally would be hesitant to ever use the conversion rate as the goal as there is a plethora of obstacles which could affect it, none of which relate to the majority of A/B tests.

I have worked on websites where the checkout just straight-up didn’t work. Like, it just crashed. How on earth is any test gonna account for that? Granted, it is an anomaly but the point I am trying to make is that the actual checkout is where the conversion happens, everything else is just noise.

If you take care of the minor details in the user’s journey you end up with an optimised user experience which, by proxy, would increase your conversion rate. Without checking the correct metrics with every A/B test you really are just guessing as to the impact of it.

You Can’t Measure Behaviour

All of this fails to even mention the obvious: humans be humans. Without getting too philosophical, none of us wake up the same person that fell asleep. Yes, the memories we use to build up our identity are there but our mood has changed. Our feelings about an argument we had yesterday are a lot calmer now. We no longer want to buy that brick to throw through the front window of the cheating ex’s car.

This emphasises the need to focus our A/B test goals on ones which are actually tangible, measurable, repeatable and, most importantly, related to the test itself. If your test is a new checkout flow, yeah, you wanna measure the conversion rate.

However, for like 99% of all tests this is not even going to be your secondary metric. You want to measure things like add to bag rate, PDP views, PLP views, user retention etc.
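As a back-of-the-envelope sketch (all the numbers are invented), comparing the relative change of each funnel step makes the picture far clearer than staring at the conversion rate alone — here the variant echoes the earlier scenario, with PLP views up 20% while orders are down 3%:

```python
# Hypothetical per-variant totals for a homepage CTA test
metrics = {
    "control": {"sessions": 10_000, "plp_views": 2_000,
                "add_to_bag": 600, "orders": 300},
    "variant": {"sessions": 10_000, "plp_views": 2_400,
                "add_to_bag": 630, "orders": 291},
}

def rates(m):
    """Express each funnel step as a rate per session."""
    return {k: v / m["sessions"] for k, v in m.items() if k != "sessions"}

def uplift(control, variant):
    """Relative change of each rate: variant vs control."""
    c, v = rates(control), rates(variant)
    return {k: (v[k] - c[k]) / c[k] for k in c}

for metric, change in uplift(metrics["control"], metrics["variant"]).items():
    print(f"{metric}: {change:+.1%}")
```

Read as a funnel, the “failing” test is actually doing its job at the top and exposing leakage further down.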

If you drive more traffic to your PDP and the conversion rate is lower it doesn’t necessarily mean your CTA block on the homepage is shitty. The chances are that it has highlighted a bigger problem with your PDP’s UX. Maybe test that next.

Finally, stop calling it CRO. It’s A/B Testing.