Visualizing test data vs. point-in-time measurements - Fusion


The old saying goes, “A picture is worth a thousand words.” This is certainly true for optimization test results: viewing your major KPIs over time in a graph tells you far more than a lift and significance calculated at a single point in time. We’ll use a standard A/B test scenario as an example to illustrate the importance of observing data in a graphical format to determine trends and look for learnings.
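For reference, the point-in-time evaluation itself takes only a few lines. A minimal sketch of a two-proportion z-test, assuming a conversion-rate KPI (the counts below are hypothetical, purely for illustration):

```python
import math

def lift_and_significance(conv_test, n_test, conv_ctrl, n_ctrl):
    """Point-in-time evaluation: relative lift and two-sided p-value
    from a two-proportion z-test with a pooled standard error."""
    p_t = conv_test / n_test
    p_c = conv_ctrl / n_ctrl
    lift = (p_t - p_c) / p_c                       # relative lift vs. control
    p_pool = (conv_test + conv_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_ctrl))
    z = (p_t - p_c) / se
    # Two-sided p-value from the normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return lift, p_value

# Hypothetical snapshot: 540 conversions from 10,000 visitors (test)
# vs. 500 from 10,000 visitors (control).
lift, p = lift_and_significance(540, 10_000, 500, 10_000)
print(f"lift = {lift:+.1%}, p-value = {p:.3f}")
```

Note that a snapshot like this can show a healthy lift with an unconvincing p-value; the rest of this post is about what the graph tells you that this single pair of numbers cannot.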

There are a number of points to consider when evaluating a test’s results in a visual format such as a line graph:

Directional trends versus static/random trends

Do you see a clear difference between the performance of your test and your control? Your evaluation may not show significance, but if the delta between test and control is increasing or decreasing over time, your significance will also change with time. If your results seem random from day to day (up one day, down the next), you may have a lot of variability between your groups and may need more time to run the test (see Fig. A). Several tools available online can help you determine the test duration and population sizes needed to measure a given lift with significance. At the end of the day, your test idea may simply not move the needle enough to capture the impact within your given test time frame.

Fig. A
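The duration estimate those tools provide can also be sketched directly. Below is a minimal Python version of the standard two-proportion sample-size formula, using fixed normal quantiles for 95% confidence and 80% power; the 5% baseline conversion rate and the traffic level are hypothetical:

```python
import math

def required_sample_size(p_ctrl, rel_lift, z_alpha=1.96, z_beta=0.84):
    """Visitors needed per group to detect a given relative lift.
    z_alpha=1.96 and z_beta=0.84 are the standard normal quantiles
    for alpha=0.05 (two-sided) and 80% power."""
    p_test = p_ctrl * (1 + rel_lift)
    p_bar = (p_ctrl + p_test) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_ctrl * (1 - p_ctrl)
                                      + p_test * (1 - p_test))) ** 2
    return math.ceil(numerator / (p_test - p_ctrl) ** 2)

# Hypothetical: 5% baseline conversion, detecting a +10% relative lift,
# with 1,000 visitors/day flowing into each group.
n = required_sample_size(0.05, 0.10)
days = math.ceil(n / 1_000)
print(f"{n:,} visitors per group ≈ {days} days")
```

The takeaway matches the paragraph above: small lifts on low-conversion KPIs can require weeks of traffic before significance is even possible.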


If your data seems skewed in a direction that doesn’t make sense, look at the daily data points to see whether any outliers have been introduced into the evaluation. Outliers are data points that fall outside of normal variation. Are they likely to show up again? Maybe a group of purchases in either your test or control is not representative of ‘normal’ sales. Maybe bad weather hurt sales at a location that makes up the majority of your test or control group (see Fig. B). Even if they are legitimate data points, measuring the impact with and without them lets you determine their influence on the overall results and make a business decision on whether to keep them in the analysis.

Fig. B
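One way to quantify that influence is to compute the lift with and without the flagged days. A minimal sketch, assuming daily totals per group and a simple z-score rule for flagging outliers (all numbers hypothetical):

```python
import statistics

def lift_with_without_outliers(test_daily, ctrl_daily, z_cut=2.0):
    """Compare test-vs-control lift with and without outlier days.
    A day is flagged when either group's value is more than z_cut
    standard deviations from that group's mean. z_cut=2.0 is used
    because a single large outlier inflates the standard deviation,
    which can mask it at the classic 3-sigma cutoff."""
    def flags(series):
        mu, sd = statistics.mean(series), statistics.stdev(series)
        return [abs(x - mu) > z_cut * sd for x in series]
    outlier = [a or b for a, b in zip(flags(test_daily), flags(ctrl_daily))]

    def lift(include_outliers):
        t_sum = sum(t for t, o in zip(test_daily, outlier)
                    if include_outliers or not o)
        c_sum = sum(c for c, o in zip(ctrl_daily, outlier)
                    if include_outliers or not o)
        return (t_sum - c_sum) / c_sum
    return lift(True), lift(False)

# Hypothetical daily sales; day 5 of the control group contains a bulk
# purchase that is not representative of normal demand.
test = [100, 104, 98, 101, 103, 99, 102]
ctrl = [97, 99, 101, 98, 190, 100, 96]
with_all, without = lift_with_without_outliers(test, ctrl)
print(f"lift with outliers: {with_all:+.1%}, without: {without:+.1%}")
# → lift with outliers: -9.5%, without: +2.2%
```

In this made-up example one bulk-purchase day flips the test from an apparent loser to a modest winner, which is exactly why the with/without comparison is worth running before making a call.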

Immediate impact versus ramp-up

Testing anything that requires time before an impact is expected – customers adjusting to a new booking flow, word spreading about a new product, training to get up to speed with new procedures (think sales training) – can affect test results in ways that are not apparent initially. If you anticipate a ramp-up before seeing positive results in your test group, monitor the trend over time and determine whether metrics are continuing to improve or have plateaued at some point. You may even observe an initial downward trend if the changes being tested are dramatic enough to disrupt normal sales at first. If we had evaluated the test on 2/7 (noted by the green line in Fig. C), we would most likely have determined the test was underperforming and stopped it. Viewing the graph, however, reveals an upward trend starting around 2/1, indicating the test was beginning to have a positive effect. That additional information might lead you to let the test continue, ultimately resulting in a winning test on 2/27!

Fig. C
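The trend you would plot here is easy to compute: the cumulative lift after each day shows whether a test is ramping up, rather than judging a single snapshot. A sketch with hypothetical daily conversions that mimic the Fig. C pattern (initial disruption, then ramp-up):

```python
def cumulative_lift(test_daily, ctrl_daily):
    """Cumulative test-vs-control lift after each day -- the series
    you would graph to spot a ramp-up over time."""
    series, t_sum, c_sum = [], 0, 0
    for t, c in zip(test_daily, ctrl_daily):
        t_sum += t
        c_sum += c
        series.append((t_sum - c_sum) / c_sum)
    return series

# Hypothetical daily conversions: the test disrupts behaviour at first,
# then ramps up past a steady control.
test = [80, 85, 92, 100, 110, 120, 132, 145]
ctrl = [100] * 8
for day, lift in enumerate(cumulative_lift(test, ctrl), start=1):
    print(f"day {day}: cumulative lift {lift:+.1%}")
```

Read day by day, the series climbs steadily from roughly -20% to positive territory: an early-cutoff evaluation would have killed a test that the trend line says is winning.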

Business importance

Any decision on stopping or extending a test should always include an evaluation of the impact on your business. Is the test important enough to let it run longer, or should we stop it and move on? Have we learned enough to make a decision, or do we really need more information before making a call? For example, if you determine a test is hurting your KPI but isn’t significant, do you really need to keep it running just to claim significance? If it’s unlikely to turn the corner and become positive, why let it continue? Are other pressing ideas with equal or greater potential being held up by this test?

Rarely do we see a test with an immediate delta between test and control that doesn’t fluctuate over time. What we want to see is a clear indication that the test is either helping or hurting the main KPIs and, ultimately, the business. If we can say this with confidence, then we can decide whether there is value in letting the test run longer to gather additional data and claim more accuracy in our estimates of impact. Tests in the uncertain area (neutral, with no clear winner or loser) are harder to evaluate, and we often need to decide when it’s time to move on rather than continue collecting inconclusive data. The key is to observe the data and understand the trend (no apparent trend, overall positive, overall negative, or specific time periods when metrics shifted). This will give you far more valuable information for your business decisions than a single metric with an attached significance.

Will Plusch

Will Plusch is a Director of Optimization Strategy who is passionate about conversion rate optimization and data-driven decision making. He enjoys educating clients on how the optimization process works and loves to use real-world examples to simplify the science. He has over 15 years of experience in marketing and merchandising analysis and optimization testing in both brick-and-mortar and online sales channels. He holds an undergraduate degree in Computer Science and a Master of Business Administration, and is a Certified Usability Analyst (CUA).
