In 2020, Brian Halligan, the co-founder of HubSpot and a senior lecturer at MIT, wrote an article in MIT Sloan Management Review describing the impact of experience design on the success of products. To design such disruptive experiences, product managers must adopt a systematic, evidence-based approach that complements great ideas and creative intuition with a culture of experimentation and data analysis.
Experimentation, a quantitative data-gathering tool, lies at the heart of the product-led approach. By combining the power of data and user insights, you can make informed decisions that shape your product strategy and drive impactful outcomes, especially in a dynamic and competitive business landscape.
Qualitative experiments, such as user interviews, usability testing, and focus groups, enable you to dive deep into the minds of your users. They surface pain points and help you understand the “why” behind user behavior. These experiments provide rich context, revealing user sentiments and preferences and laying a solid foundation for effective product development.
On the other hand, quantitative experiments, like A/B testing (also called split testing) and multivariate testing, allow you to validate and quantify the impact of specific changes or variations on user behavior. A/B testing involves comparing two or more versions of a product or feature to determine which performs better. In contrast, multivariate testing allows you to test multiple variations of different elements simultaneously. These experiments provide statistically significant data, revealing which features, designs, or user experiences resonate with your audience.
What is A/B Testing?
A/B testing is a form of product management testing designed to help product managers make data-driven decisions about their products. In its simplest form, A/B testing involves dividing users into two groups and presenting each group with a different version of the product. The version that performs better against business and product goals is considered the more effective one.
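As a minimal illustration, here is a Python sketch of that core mechanic, randomly splitting a (hypothetical) user base in half so that each group sees a different version:

```python
import random

# Hypothetical user IDs; in practice these come from your user base
users = [f"user-{i}" for i in range(10_000)]

random.shuffle(users)                 # randomize the order first
midpoint = len(users) // 2
group_a = users[:midpoint]            # sees version A (the current product)
group_b = users[midpoint:]            # sees version B (the changed version)
```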
Getting Started With A/B Testing
Any effective experiment should start with establishing the desired outcome. The outcome can be increasing conversion or user engagement, reducing friction in user interactions, or achieving specific business objectives for the organization or its customers.
Once your desired outcome is clear, you can start generating ideas to achieve it. For example, if your objective is to reduce the cart abandonment rate, you will generate ideas to reduce the friction between putting items in the cart and completing the payment. Your ideas may involve any of the following (and beyond):
1. Optimizing the entire workflow, such as reducing the steps to complete the action (as Amazon did with One-Click Purchase).
2. Adding new features or additional ways to complete the action; for example, adding an option to purchase a product in installments.
3. Optimizing the design elements and aesthetics, including changing colors, sizes, texts, and placements of elements. The story of Marissa Mayer testing 41 shades of blue is well-known.
Or any other such change that will help you achieve the outcome you want.
Executing The A/B Test
Let’s expand the high-level steps in the diagram above into a more detailed discussion.
Defining the outcome
Any promising experiment starts with the answer to the question, “Why?” Why do you want to conduct this experiment? What is the outcome you want to achieve? For example, you may want to reduce the time users take to perform a specific operation, or increase your Daily Active Users (DAU).
The outcome definition drives the possible solutions, and together they determine what kind of experiment you should run. In some scenarios, an A/B test may not even be the right choice.
Setting up the hypothesis
Our hypothesis is the expectation that, by making the intended changes, we will achieve the desired product outcomes.
We use A/B testing or multivariate testing to prove or disprove this hypothesis.
The hypothesis is your educated guess about the relationship between the variables.
Control, Independent & Dependent Variable
With most experiments, you need not worry about explicitly defining these variables. However, you should still know what they are.
Control Variable
The control variable is the variable against which you will compare the results of your experiment and determine the validity of your hypothesis. In your A/B test, it is version A, the one you have right now, against which you will decide whether or not version B is more effective in achieving the desired outcome.
Independent Variable
The independent variable is the one that doesn’t change in response to any events during the experiment; it is the change you deliberately introduce. In our A/B test, version B is the independent variable.
Dependent Variable
The dependent variable is the one that changes in response to changes in the independent variable.
Let us understand these with an example.
Your product has an optional form that collects certain information from users, which you want to analyze to fine-tune some features. In exchange for this information, you provide users with a free ebook. You notice that despite the ebook containing helpful information, only a few users fill out the form.
The team brainstorms and suspects that the amount of information you collect may be stopping users from filling out the form. The team believes the form can be divided into two parts. The first part will ask for the information most valuable to your analysis and provide access to the ebook upon completion. The second part will ask for helpful but less critical information that users can fill out voluntarily.
You hypothesize that the redesigned form will collect the required information from significantly more users than the current form does.
You design the A/B experiment, where half of your users will be presented with the existing form. This user group becomes your control group; the current form is your control variable.
Your new form becomes your independent variable. You will show this form to the other half of the users.
The number of users filling up the form is your dependent variable. This number depends on which version of the form you show them.
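To make these roles concrete, here is a minimal Python sketch (all names are hypothetical) of how the form experiment maps onto the three variables:

```python
from dataclasses import dataclass

@dataclass
class ABExperiment:
    control: str     # control variable: the version you compare against
    treatment: str   # independent variable: the change you introduce
    metric: str      # dependent variable: what you measure

form_experiment = ABExperiment(
    control="existing-long-form",
    treatment="redesigned-two-part-form",
    metric="form_completion_count",
)
```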
Population and Sample Size
The population is the entire user base of your product. If you have 100,000 users using your product, your population is 100,000.
Knowing your population allows you to determine your sample size. There are many sample size calculators available online that you can use to determine how many people should participate in your experiment to produce sufficiently reliable results. Here is a link to one such calculator, from SurveyMonkey:
SurveyMonkey’s Sample Size Calculator
If you want to experiment with all your users, your population and the sample size will be the same.
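If you prefer to compute it yourself, the standard formula for comparing two proportions takes only a few lines of Python. The sketch below is illustrative; the baseline completion rate, the lift you want to detect, and the significance and power levels are all assumptions you should set for your own experiment.

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Minimum users per variant to detect a change from rate p1 to p2."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = norm.ppf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# For example, detecting an improvement in form completion from 10% to 12%
print(sample_size_per_group(0.10, 0.12))  # roughly 3,800 users per variant
```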
Cohort Definition and Randomization
How you define cohorts may determine the effectiveness of the inferences you draw from the experiment. Let’s look at a case from Airbnb.
When Airbnb first launched, it displayed listings like this:
As the site grew in popularity, Airbnb redesigned the search results page like this:
The natural expectation was that the new version would perform better, given the improved experience it provides. However, the results contradicted this expectation.
While the data suggested that the first version was better than the second, the Airbnb team looked deeper into the results. They discovered that the negative results were primarily due to a broken experience on Microsoft’s Internet Explorer browser.
The team fixed the experience, and the results proved that the new design worked better.
The simplest way to define your cohorts is to randomly divide your sample in half. However, such a loose cohort definition may lead to problems like Airbnb’s.
One of the critical criteria for defining your cohorts is ensuring a similar experience. Use relevant criteria such as age, gender, and geography, or behavioral characteristics.
Think about what you are trying to measure and what factors can affect these measurements. Then, define your cohorts accordingly.
This strategy may sound like Cohort Analysis. However, the difference is that with Cohort Analysis, you provide one single variation to all users in all cohorts and analyze each cohort’s behavior relative to the others. With A/B testing, you still provide two variations.
Once your cohorts are defined, use randomization to render your variations inside these cohorts.
By randomizing, we eliminate potential biases and confounding factors that could influence the results. Randomization ensures that the characteristics and behaviors of users are evenly distributed across both groups, making the comparison valid and reliable.
Various techniques, such as random assignment algorithms or randomized sampling methods, can be used to implement randomization. These techniques ensure that each user has an equal chance of being assigned to either group, creating a level playing field for accurate analysis and interpretation of the experiment’s outcomes.
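One common implementation is deterministic hash-based bucketing: hash the user ID together with the experiment name, so every user always lands in the same group across sessions. A minimal sketch, with hypothetical identifiers:

```python
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same user always sees the same variant
print(assign_variant("user-123", "form-redesign"))
```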
Multiple cohorts
So far, we have spoken only about two variants. However, it is possible to use A/B testing with multiple variants and cohorts. This approach is called A/B/n testing, where n represents the number of cohorts or variants. For example, Marissa Mayer used 41 cohorts to test 41 shades of blue.
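The same hash-based bucketing idea sketched above extends naturally from two variants to n; for example:

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, n: int) -> int:
    """Map a user to one of n variants (0 .. n-1)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n

# e.g., a hypothetical 41-variant color test
bucket = assign_bucket("user-123", "link-color-test", n=41)
```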
Interpreting Results
Once you get the results of the experiment back, how do you ascertain that one variant or the other actually produced the effect you hypothesized?
For instance, using the form example above, you observe that more users fill out the new form. However, how do you ascertain that this increase was related to the new form rather than the effect of some other variable, such as timing or a general increase in traffic to your product? For example, along with launching the two variants of the form, you may also have run a social media campaign that brought additional users to your product, increasing form completions.
This is where statistical significance comes in. Statistical significance refers to the likelihood that the observed results of a study are not due to random chance or unrelated factors but instead reflect a true relationship or effect.
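In practice, you can test significance with a two-proportion z-test. The sketch below uses the statsmodels library with made-up counts from the form example:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results from the form experiment
completions = [480, 560]   # users who completed the form (A, B)
shown = [5000, 5000]       # users shown each variant

z_stat, p_value = proportions_ztest(completions, shown)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Here p is well below 0.05, so the lift is unlikely to be random chance
```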
There are two other concepts that you should know.
The first concept is correlation. While statistical significance determines whether there is a relationship between the dependent and independent variables, correlation measures the strength and direction of a relationship between two variables, indicating how they tend to vary together.
A strong correlation does not, however, mean causation. In the example above, the new form correlates with an increase in completions, but that correlation alone does not prove the form caused the increase. Just because two variables are correlated does not necessarily mean one causes the other. Causation requires additional evidence and careful experimental design to establish a cause-and-effect relationship.
What is Multivariate Testing?
Multivariate testing is a form of testing that allows businesses to test multiple variables at the same time. This approach differs from A/B testing, which tests only two variations of a page. Multivariate testing allows various design combinations and variable changes to be tested simultaneously, making it easier to identify which element on a page plays the most significant role in achieving that page’s objective.
Let’s revisit the new form design we discussed earlier. In addition to the length, the team felt that other variables, like the color of the Call-To-Action (CTA), the content, and the typography, might also be affecting the rate of completions. To keep things simple, the team first decided to change the CTA. The suggested changes include a change of color, making the CTA a little bigger, and changing its text to indicate the outcome users expect rather than the action they should take (e.g., instead of “Click Here,” the team might use “Get the ebook”).
With this change, the team creates four versions of the same page.
Version 1: The original form with the original CTA
Version 2: The original form with the new CTA
Version 3: The new form with the original CTA
Version 4: The new form with the new CTA
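These four versions are simply every combination of the two factors (form length and CTA). A small Python sketch, with illustrative names, that enumerates the test cells:

```python
from itertools import product

forms = ["original form", "new form"]
ctas = ["original CTA", "new CTA"]

# Every combination of the two factors is one multivariate test cell
for i, (form, cta) in enumerate(product(forms, ctas), start=1):
    print(f"Version {i}: {form} + {cta}")
```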
Once the variations are created, the rest of the steps are similar. The principles for determining statistical significance and correlation remain the same, while the specific calculations and statistical techniques may vary. A detailed discussion of the statistical aspects of A/B and multivariate tests is beyond the scope of this article.
A/B Testing vs. Multivariate Testing
The kind of variations you want to test primarily determines the choice of testing method. Typically, you will use an A/B test to compare two significantly different variations, while multivariate tests suit multiple, more subtle changes.
While multivariate testing allows you to test multiple variants simultaneously, saving time and resources, it requires a larger sample size because of the number of variations under test. There will be more cohorts, and each cohort must contain a reasonable number of users for the results to be meaningful. Multivariate tests also involve more complex calculations, which may take more time and effort. A/B tests are generally easier and faster to conduct.
Finally
In today’s competitive digital landscape, A/B and multivariate testing are essential tools for product managers to understand how users interact with products. Through experimentation, businesses can effectively tweak their products or services to maximize user engagement and improve performance. Both A/B and multivariate testing can identify areas of opportunity in your products. Thankfully, many tools make conducting these experiments easy. You need a bias toward action to put them in place.
I am leaving you with a humorous yet oft-experienced take on data-driven decision-making.