I recently came across a question from one of my retail clients that I took great pleasure in solving. It was interesting to me both because of the clear usefulness of retrieving the answer for my client and also for the brainteasing nature of the problem. In this post I’m going to share both the problem and my solution to it. If you’re so inclined read the problem part in isolation first and see if you can come up with your own solution.
The Problem
Imagine a situation where you’re selling a bundle of a fixed number of widgets to your customers on a specific sales event. The customers can choose the widgets from a list of potential widgets but can only pick one of each. So each bundle is created by picking elements from the set of potential widgets without replacement. For example:
Set of widgets = [A, B, C, D, E, F, G, H, I, J]
Bundle1 = [A, C, D, H, J]
Bundle2 = [B, C, D, I, J]
…..
The problem posed is to come up with an estimation the distribution of these bundles or combination of widgets. I.e. given a certain number of customers how many of them will chose Bundle1, Bundle2 and so on. The ordering of the widgets within the bundle is irrelevant so that [H, B, C, J, E] is the same as [B, C, H, E, J] and [H, J, E, B, C] etc.
- There are three things that we need to take into account for this problem:1There is a default bundle that the customer is presented with initially: [A, B, C, D, E]. However; there is no penalty for making substitutions and so nothing but inertia stops the customer from replacing any number of widgets in their bundle with another widget from the set.
- The probability of picking a certain widget is not independent of the probability of picking another widget. So for example if a customer has picked A for their bundle that could affect the probability that they will also pick B in that bundle.
- We have two forecasts that in this instance we should take as a given (ignoring the uncertainties involved):
- a) The distribution of widgets that will be sold on a total level is given. So we “know” that after the sales event for example 10% of the total widgets sold will be A’s and 2% will be E’s
- b) We also have the distribution of customers when it comes to how many substitutions they make. So we “know” that for example 20% of customers will not make any substitutions, 25% will make one substitution of some kind, 10% will make two substitutions and so on.

My Solution
Probability evaluation – and why I didn’t take this route
My first instinct when faced with this problem was to dive into the probabilities involved. If we knew the probability of each potential bundle we could use this to come up with an expected value of the share of the total it would make up for.
At first glance this doesn’t seem like an impossible route to take. After all with the forecasted distribution of widgets we should be able to derive the probability of a specific widget appearing in a given bundle. And given these my thought was to treat the situation as a standard “picking marbles out of a bowl without replacement” that you will find in many probability exercises.
The reason this didn’t work is because we can’t assume the probabilities to be independent as stated in point 2. There are two main reasons for this limitation:
- One reason for this is because A and B can be either complementary products or substitution products. Complementary products is when A and B are often used together; such as hotdogs and hotdog breads. In this case the likelihood that hotdog breads are chosen becomes larger if hotdogs have already been chosen. Substitution products on the other hand is when picking A decreases the probability of picking B afterwards. For example if A and B are different brands of lawnmowers the customer is unlikely to need both.
- Another reason we might see the probability of choosing B increase or decrease if A has already been chosen is down customer preferences being revealed by the choice of A. This is the whole reasoning behind methods such as collaborative filtering in recommender systems. Having watched The Sound of Music might mean that the likelihood that you’ll choose to watch Grease is greater but the probability of watching Terminator is smaller.
Therefore if a customer has A in their bundle might greatly change the probability that they will have C in there as well. And the two forecasts that we have available do not help us with estimating this effect. In particular we have no information about the conditional probabilities P(C|A), P(A|C), P(G|B)….
On top of this is the fact that a default bundle is presented to the customer and an active choice being necessary to change this. If customers were completely logical this wouldn’t matter to the outcome. They would simply consider all their options in the set and pick their favorite widgets from there. But in reality humans are biased towards choosing the prescribed option and against changing the status-quo especially if they don’t have a strong preference either way. This tendency will be captured by the distribution described in 3b and the solution should reflect it.
For these reason working out the probabilities directly would be very difficult without more information.
Simulation – the ‘good enough’ solution
Instead I wrote a process in code that simulated the situation as faced by the customer when deciding which widgets to put in their bundle and repeated it a lot of times. From the output this created I was then able to directly calculate the distribution between bundles.
The solution can be done for any number of widgets in the bundle and in the possible set but in here I will assume that 4 widgets should be chosen out of a total of 10.
The way I structured the customer choice simulation was in these steps:
- The customer is initially allocated the [A, B, C, D] bundle
- They are then allocated a number between 0 and 4 reflecting the number of widgets they are going to swap out based on a probability distribution
- Then that number of widgets is removed from the initial bundle and the same number added in from the remaining part of the set [E, F, G, H, I, J] leaving a final bundle that is saved. This is also done based on a probability distribution.
- Move on to the next customer and do the same thing n times
To perform this simulation we need to calculate the probability distributions both for steps 2 and 3 based on the provided forecasts.
Let’s start with the probability that the customer is going to swap out X number of widgets. This one is simple because we can simply use the forecast directly as an estimate. For example in R code:
P_noswipe <- 0.42 P_oneswipe <- 0.14 P_twoswipe <- 0.25 P_threeswipe <- 0.17 P_fourswipe <- 0.02 swap_prob_distribution <- c(P_noswipe, P_oneswipe, P_twoswipe, P_threeswipe, P_fourswipe)
So for each customer in step 2 we allocate a 0 with 42% probability, we allocate a 1 with 14% probability, a 2 with 25% probability, a 3 with 17% probability and a 4 with 2% probability.
The probability distribution in step 3 is a little bit trickier to derive from the given forecasts and requires you to be a bit careful. We’re given the following forecasts for the total number of each type of widgets:
F_A <- 0.19 F_B <- 0.16 F_C <- 0.12 F_D <- 0.11 F_E <- 0.08 F_F <- 0.09 F_G <- 0.07 F_H <- 0.06 F_I <- 0.03 F_J <- 0.09
So when the sale event is over the company has sold X bundles they will have sold 5X widgets of different kinds. The figures provided tell us that out of these 5X widgets 17% will be A, 15% will be B and so on.
This is however not the same as saying that a customer will choose widget A with 17% probability and B with a 15% probability. Why is this? The answer is that this distribution also reflect the probability distribution of the number of swaps that you’re actually making. The reason the forecasted numbers are higher for A-D is because 42% of the customers will be choosing exactly the bundle that contains them. The effect of this must be removed to get the true probability for each widget.
A customer that we know will not be making any swaps has 25% chance in every pick and 0% probability of choosing widgets E-J. A customer that we know will be making four swaps has a 0% probability of choosing widgets A-D. Therefore we estimate the probabilities like this.
P_A_swipe <- (F_A - 0.25*P_noswipe)/(1-P_noswipe) P_B_swipe <- (F_B - 0.25*P_noswipe)/(1-P_noswipe) P_C_swipe <- (F_C - 0.25*P_noswipe)/(1-P_noswipe) P_D_swipe <- (F_D - 0.25*P_noswipe)/(1-P_noswipe) P_E_swipe <- (F_E - 0*P_noswipe)/(1-P_noswipe) P_F_swipe <- (F_F - 0*P_noswipe)/(1-P_noswipe) P_G_swipe <- (F_G - 0*P_noswipe)/(1-P_noswipe) P_H_swipe <- (F_H - 0*P_noswipe)/(1-P_noswipe) P_I_swipe <- (F_I - 0*P_noswipe)/(1-P_noswipe) P_J_swipe <- (F_J - 0*P_noswipe)/(1-P_noswipe) widget_prob_distribution <- c(P_A_swipe, P_B_swipe, P_C_swipe, P_D_swipe, P_E_swipe, P_F_swipe, P_G_swipe, P_H_swipe, P_I_swipe, P_J_swipe)
We then define a function that allocates widgets to any customer given the number of swaps that they are going to make according to the widget probability distribution. This will create one row of the output that we need:
random_customer_new <- function(p=widget_prob_distribution,swaps) { if (swaps==0) { valda_recept <- c(1,1,1,1,0,0,0,0,0,0) } if(swaps>0) { done = FALSE cp <- cumsum(p) orginal_widget <- rep(0,4) swapped_widget <- rep(0,6) chosen_widget <- rep(0, 10) while (!done) { r <- runif(1) if (r < cp[1]) { if (orginal_widget[1] == 0 & sum(orginal_widget)<4-swaps) orginal_widget[1] <- 1 } else if (r < cp[2]) { if (orginal_widget[2] == 0 & sum(orginal_widget)<4-swaps) orginal_widget[2] <- 1 } else if (r < cp[3]) { if (orginal_widget[3] == 0 & sum(orginal_widget)<4-swaps) orginal_widget[3] <- 1 } else if (r < cp[4]) { if (orginal_widget[4] == 0 & sum(orginal_widget)<4-swaps) orginal_widget[4] <- 1 } else if (r < cp[5]) { if (swapped_widget[1] == 0 & sum(swapped_widget)<swaps) swapped_widget[1] <- 1 } else if (r < cp[6]) { if (swapped_widget[2] == 0 & sum(swapped_widget)<swaps) swapped_widget[2] <- 1 } else if (r < cp[7]) { if (swapped_widget[3] == 0 & sum(swapped_widget)<swaps) swapped_widget[3] <- 1 } else if (r < cp[8]) { if (swapped_widget[4] == 0 & sum(swapped_widget)<swaps) swapped_widget[4] <- 1 } else if (r < cp[9]) { if (swapped_widget[5] == 0 & sum(swapped_widget)<swaps) swapped_widget[5] <- 1 } else if (r < cp[10]) { if (swapped_widget[6] == 0 & sum(swapped_widget)<swaps) swapped_widget[6] <- 1 } chosen_widget <- c(orginal_widget, swapped_widget) if (sum(chosen_widget) == 4) done <- TRUE } } return(chosen_widget) }
And finally we create a loop that creates n rows of output data; each one of them a simulated customer with an allocation of widgets. In the loop each simulated customer is allocated the number of swaps that they are going to make according to the swap probability distribution as defined previously.
customer_simulation <- function(n, widget_prob_distribution, swap_prob_distribution) { output_data <- matrix(rep(0,n*10), nrow=n) for (x in 1:n) { r <- runif(1) cp <- cumsum(swap_prob_distribution) if (r < cp[1]){ swaps <- 0 } else if (r < cp[2]) { swaps <- 1 } else if (r < cp[3]) { swaps <- 2 } else if (r < cp[4]) { swaps <- 3 } else if (r<=1) { swaps <- 4 } output_data[x,] <- random_customer_new(widget_prob_distribution,swaps) } return(output_data) }
Evaluating the solution
This solution worked well enough for our purposes as long as the number of simulated customer created was large enough. The main requirement was that the proposed forecast of distribution between bundles should reflect fairly accurately the two high level forecasts (3a and 3b) and that was achieved by simulating one million customers.
There are many uncertainties built into it that should be understood though. The forecasts 3a and 3b that form the very basis for the outcome of the simulation obviously has uncertainty built into it that we have been ignoring. For example if the percentage of customers that decides to make no swaps at all turns out to be 32% instead of 42% this would completely alter the outcome. Or if widget G turns out to be more popular so that it reflects 15% of all widgets sold instead of 7% the same thing would happen. Thus the reliability of the overall forecasts is very important.
The subtleties relating to the conditional probabilities between widgets that made a probability approach difficult is still a factor. If the sales event was repeated several times with the same widgets available to the customers then past data could be used to better understand how they interact with each other and maybe estimate P(A|B), P(J|B) and so on. This scenario would also allow us to improve the overall forecasts 3a and 3b more.
For my client however though more sales events were scheduled for the future the widgets forming the bundles would not be the same from time to time. This means that the forecast for number of swaps could become somewhat improved by observing how likely customer are to make changes to their suggested bundle. But when it comes to the popularity of each individual widget it will continue to be a challenge to forecast both on a total level and to understand the interactions between the different items. At least with this method we provided a way to get a qualified guess of the more detailed bundle distrib
its great as your other articles : D, regards for posting.
LikeLike