A common technique when introducing a new idea is to test it against a sample population. If the idea does well with the sample, we can conclude that it will do similarly well with the larger audience. We can improve the validity of that conclusion, however, by following a couple of important principles.
- Randomness – If you hand-pick people based on some common characteristic (age, gender, etc.), that group may behave differently than the larger population. The sample needs to be statistically representative of the whole.
- Concurrent Control – To eliminate the effect of external influences, you should run the test alongside a baseline or control. Any environmental or seasonal variances should have a similar effect on both your test sample and control population.
Let me clarify a little terminology here. I’ll refer to a test as an experiment. Just like in science, an experiment is designed to prove (or disprove) a hypothesis; in this case, whether your new idea is a good one. I’ll call the individual samples within the experiment variants. You can refer to them as a control, baseline, test, challenger, etc., it doesn’t really matter in this discussion; they’re all just variants that can be selected through a sampling process.
So assuming you have your test and control running at the same time, you need to decide which to present to each individual. The simplest approach is to use a random number generator. When used properly, these do a great job of providing an even distribution (more on that later). Suppose you want 25% of the population to receive the test and the remaining 75% to receive the control. To do this, generate a random number between 0 and 99. If the number is less than 25, use the test; otherwise, use the control. You can also use this approach to run multiple tests simultaneously. For example, if you had three tests and wanted each one shown 15% of the time, the generated random number just needs to be compared with multiple bands: 0-14 test1, 15-29 test2, etc.
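The banding logic described above can be sketched as follows; the variant names and weights are illustrative:

```python
import random

def choose_variant(weights):
    """Pick a variant given a dict of variant name -> percentage (summing to 100)."""
    n = random.randint(0, 99)      # uniform number between 0 and 99
    cumulative = 0
    for variant, weight in weights.items():
        cumulative += weight
        if n < cumulative:         # n falls inside this variant's band
            return variant
    raise ValueError("weights must sum to 100")

# 25% test, 75% control
variant = choose_variant({"test": 25, "control": 75})
```

Each variant occupies a contiguous band of numbers, so the comparison order matches the cumulative weights just as in the 0-14/15-29 example.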
So far, this is all pretty simple, so let’s consider a common problem that you may need to resolve. You present a potential customer with the test offer and a few minutes later they come back and, because of the randomization, they now see the control. This phenomenon is especially common on the web, where refreshing a page usually generates a new request for content. This makes it difficult to analyze how the test is performing since the visitor has seen both variants. It could also be embarrassing depending on your test idea. For example, if your test is to offer a lower price, the consumer may wonder why they are seeing different prices for the same item.
A common solution to this problem is to store the result of the initial sampling so subsequent requests can ensure the same sample is returned. If you’re using the web, there are essentially two approaches you can take: 1) store the sampling results on the client using cookies or local storage, or 2) store the results on the server, either in a database or some type of shared cache. There are pros and cons with each. The server-side approach tends to be slower since the server must perform a lookup, which adds latency. The client-side approach is certainly not fool-proof either, as cookies and local storage can be disabled or cleared.
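As a rough sketch of the server-side option, the stored assignment can be kept in a shared store keyed by visitor ID; the in-memory dict here merely stands in for a real database or cache:

```python
import random

_assignments = {}   # stands in for a database or shared cache

def get_variant(visitor_id):
    """Return the stored variant for this visitor, sampling only on first sight."""
    if visitor_id not in _assignments:
        # First request: sample randomly (25% test) and remember the result.
        _assignments[visitor_id] = "test" if random.randint(0, 99) < 25 else "control"
    return _assignments[visitor_id]
```

Every request after the first is a lookup rather than a fresh sample, which is exactly where the added latency comes from.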
So all of this leads up to the point of this article. Using a technique known as hashing you can achieve “repeatable randomization” without needing to store anything. Essentially, multiple people are randomly sampled, but the same person is always assigned the same sample on subsequent requests.
To do this, you’ll need some sort of unique identifier so each customer can be distinguished from others. This can be an account number, email address, phone number, user name, or any other key that uniquely identifies a consumer. You can also generate a visitor ID using some type of fingerprinting technique. If none of these are available, you can use a session key or order/transaction number. The latter have a much more limited lifespan but they will at least ensure consistency during a single visit. If the customer comes back later they could be sampled differently, however.
Hashing is a technique that generates a fixed-size hash value from input strings of varying length. It’s a one-way process in that you can’t get back to the original input string if you only have the hash. But the same input string will always produce the same hash value, and even minor changes in the input will produce a different hash. This approach was initially used to ensure fidelity when transmitting documents or messages. The sender would create a hash of the document and include that in the transmission. The receiver would then hash the document upon delivery. If the two hashes matched, you could be fairly confident that the document was unaltered. One behavior of hashes that really helps us with sampling is that the hash values tend to be effectively random. Two input values that are very similar, perhaps differing by only one character, can produce very different hashes.
Sampling using hashing is pretty simple: take the unique ID, compute a hash, convert the hash to a decimal value, and take the last two digits (or any two digits for that matter). You’ll end up with a number between 0 and 99, just like with the random number generator, which you can use to select the variant.
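A minimal sketch of those steps, using Python’s standard hashlib (MD5 is assumed here purely for bucketing, not for security; any well-distributed hash works):

```python
import hashlib

def hash_bucket(unique_id):
    """Map a unique customer ID to a stable number between 0 and 99."""
    digest = hashlib.md5(unique_id.encode("utf-8")).hexdigest()
    # Convert the hash to a decimal value; mod 100 keeps its last two digits.
    return int(digest, 16) % 100

def assign_variant(unique_id):
    # Same 25/75 bands as with the random number generator.
    return "test" if hash_bucket(unique_id) < 25 else "control"
```

Because the hash is deterministic, the same customer ID always lands on the same number, yet across many customers the numbers spread out like random draws.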
Before we’re done, there’s one minor improvement that you should include. If you’re running multiple experiments over time, you may not want the same customers always seeing the control or test offers. Ideally, you will want to re-sample them for each experiment. A customer may be in the test group for one experiment but in the control group for another. To do this, simply include an identifier for the experiment along with the unique customer identifier when computing the hash. For any specific experiment, they will always be sampled the same way, but they could be in a different variant for other experiments.
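Including the experiment identifier is just a matter of combining it with the customer ID before hashing; the separator and names below are illustrative:

```python
import hashlib

def experiment_bucket(customer_id, experiment_id):
    """Stable 0-99 bucket per customer *and* experiment combination."""
    key = f"{experiment_id}:{customer_id}"   # e.g. "pricing-test:cust-42"
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100
```

The same customer now gets an independent, but still repeatable, number in each experiment.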
Ultimately, whether using a random number generator or a hashing algorithm, you’ll end up with a number between 0 and 99 that determines which variant is assigned based on the configured weights. We expect these numbers to be evenly distributed. For example, if you pass 1,000 unique customers through this process, in a perfect world there would be 10 customers assigned each of the 100 numbers. Realistically, the distribution will be somewhat uneven. You can visualize this by plotting the number of customers assigned to each number, as illustrated below.
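You can check the distribution empirically by bucketing a large batch of synthetic IDs and counting how many land on each number; with MD5 (an assumption, as before) the counts should hover near the expected 1% each:

```python
import hashlib
from collections import Counter

def bucket(unique_id):
    return int(hashlib.md5(unique_id.encode("utf-8")).hexdigest(), 16) % 100

# 100,000 synthetic customers -> roughly 1,000 per bucket if well distributed.
counts = Counter(bucket(f"customer-{i}") for i in range(100_000))
worst = max(abs(c - 1000) for c in counts.values())
```

A small `worst` deviation corresponds to the red line in the chart; a persistent skew toward certain buckets would look like the green line and suggests trying a different hash.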
The blue line represents the perfect world: a flat, straight line. A more realistic example is shown with the red line. These peaks and valleys should not be significant and should be random so that over time they tend to even out. With a larger sample, the red line would become more like a straight line. The green line demonstrates unacceptable behavior, where the lower numbers are given a disproportionate share of customers. If you experience this phenomenon, you should try other hashing algorithms to improve the distribution.
One final note: With the hashing approach, the same customer/experiment combination will always be assigned the same number for sampling purposes, say 42. However, if you’re adjusting the weights frequently, perhaps using a machine learning process that increases traffic to better performing variants throughout the experiment, then the variant that 42 is assigned to can change over time. One day the customer may see the control and on another day, they could see the test. If it is important that the customer see the same variant throughout the entire experiment, then you either need to keep the weights consistent, or store the selected variant as discussed at the beginning of this article.
Using a hashing approach to sampling can solve the problem of providing a consistent user experience while still using random sampling. It’s pretty simple to implement as long as you have some type of unique identifier that you can use to isolate a single consumer. It doesn’t require storing information on either the client or the server.