Preventing customer attrition through predictive modelling

Many companies today have understood the importance of keeping their customers as satisfied as possible. The reasons should be obvious: given the large number of actors on the market, it is of paramount importance to retain customers. It is well known that acquiring a new customer costs a company far more than the effort needed to keep an existing one from leaving, or, as it is called, churning. Thus, whenever a customer churns, a company not only loses its investment in recruiting that customer but also the future revenues associated with them. The term churning refers to the movement of individuals out of a group over a given period. Another term used in the literature is attrition.

But how do you determine whether a customer is about to churn? Are there different ways to do so depending on the type of business? Some businesses provide a service that can be sought from time to time, while others demand a long-term commitment from customers in the form of a contract of fixed duration. Online shopping services are of the first type and cellphone service providers of the second. Naturally, they need to be handled in separate ways because of the data available about the customers.

In this blog, we intend to describe some of the methods used to determine whether a customer is about to churn. This knowledge then enables a business to design strategies that might prevent the losses associated with groups of customers choosing the services of competitors. Of course, the accuracy of such a model depends on the amount of data available to the analyst, but today this is of lesser concern, since most businesses gather vast amounts of data about their subscribers and about most transactions associated with them. So, basically, the real problem is mining the data in the right way, building a good model and predicting reliable churn rates.

In this exercise, we will use a fictitious but realistic telecommunications dataset (the dataset can be found online at http://www.dataminingconsultant.com/data/churn.txt). Several important aspects need to be mentioned before going any further:

  1. Modelling churn (as with many predictive models) cannot be done without a sufficient amount of historical churn data. What we seek is what customers who churned had in common, or what differentiated them from those who didn't, and this task cannot be achieved without information about which customers have or haven't churned.
  2. People may have widely different reasons to end their contracts with service providers. As a model's aim is to identify patterns, it is likely to misinterpret some customers' behavior as possible attrition. The model's results are therefore never entirely accurate.
  3. Once a model has been built and tuned, it needs to be updated (retrained) regularly. Indeed, customer behavior may change over time, due, for instance, to the service provider's changes in pricing, market diversity, or technical advances that make the product less attractive.

We shall present two approaches to analyzing and predicting churn, Random Forest and survival analysis. The two methods can be used together: the first to predict churn, the second to predict time to churn.

Random Forest

A random decision forest is an ensemble learning method for classification and regression, among a wide range of other tasks. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. We have already described Random Forest in a previous blog, https://kentoranalytics.com/blog/2017/3/21/partikelkollense, and invite readers to have a look.
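As a quick illustration, here is a minimal sketch of a churn classifier using the randomForest package (our choice; the original analysis does not prescribe a specific implementation). We drop State and Phone, since the phone number is a near-unique identifier:

library(randomForest)  # assumes the randomForest package is installed

# Churn. must be a factor for randomForest to do classification
TeleComData$Churn. = as.factor(TeleComData$Churn.)
rfModel = randomForest(Churn. ~ . - State - Phone, data = TeleComData,
                       ntree = 500, importance = TRUE)
print(rfModel)       # out-of-bag error estimate and confusion matrix
varImpPlot(rfModel)  # which variables drive the predictions

The variable importance plot is also a cheap first screen for the variable-by-variable investigation we return to below.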

Survival Analysis

As we saw in the previous section, we have the ability to predict customer attrition. But while the type of model described there has reasonable accuracy, it lacks one important feature, namely a prediction of when churn will occur. It is of course impossible to determine a precise time for such an event, but it is possible to determine the probability of an event within a given time interval. To do so, we use so-called Kaplan-Meier estimators and, more specifically, Greenwood's formula. KentorAnalytics does not usually intend to educate its readers in statistics or mathematics. However, to grasp the essence of the Kaplan-Meier estimator, we chose to give a light version of how one obtains Greenwood's formula.

Let (X_1, X_2, \ldots, X_n) be the times of either (i) an observed churn or (ii) the last time a customer was observed as such. Let \delta_i = 0 if X_i is an observed attrition (churn) and \delta_i = 1 if the i-th individual was last seen as a customer (that is, had not churned) at time X_i. The concept of censored and uncensored customers is essential here, because at the time we make a prediction, individuals have been members of the studied sample for different lengths of time, and it may be difficult to compare them. If \delta_i = 1, the TRUE churn time for the i-th individual is Y_i > X_i, but the customer dropped out of the study at time X_i. This means that we perform our analysis knowing nothing about that particular individual's intentions (stay or go). We say that Y_i was censored at time X_i. If \delta_i = 0, X_i is the observed time of attrition.
Now, the natural question that poses itself is: Can we say ANYTHING about the future attrition of censored and uncensored individuals? That is, can we compute estimates of when individuals will churn, and possibly/alternatively give probabilities for customer attrition within a given time frame?

Assume \{t_i\}_{i \in \{1, 2, \ldots, r\}} are the distinct observed churn times in the sample data, arranged in increasing order; that is, the times for which X_j = t_i and \delta_j = 0 for some j. Let n_i be the size of the risk set at time t_i, meaning that n_i is the number of individuals in the sample that had not churned, and were still under observation, just before time t_i. For i < r, n_{i+1} = n_i - d_i - c_i, where d_i is the number of individuals that churned at time t = t_i and c_i the number who were censored at times t with t_i \leq t < t_{i+1}. The Kaplan-Meier estimator \hat{S}(t) of the survival function S(t) = \Pr(X > t) is then given by

\hat{S}(t) = \prod_{t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right)
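To make the product concrete, here is a toy computation of \hat{S}(t) in R on a hypothetical sample of five customers (the times and indicators below are made up purely for illustration):

time  = c(2, 3, 3, 5, 7)   # observation times
churn = c(1, 1, 0, 1, 0)   # 1 = observed churn, 0 = censored

km = 1
for (t in sort(unique(time[churn == 1]))) {
  n_i = sum(time >= t)                # risk set just before t
  d_i = sum(time == t & churn == 1)   # churn events at t
  km  = km * (1 - d_i / n_i)
  cat(sprintf("t = %g: n_i = %d, d_i = %d, S(t) = %.3f\n", t, n_i, d_i, km))
}

Note that the customer censored at t = 3 still counts in the risk set at t = 3 but contributes no event; this is exactly how censoring enters the estimator.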

Since \hat{S}(t) is an estimate, we should rather give this point estimate together with a confidence interval. If the required confidence is (100 - \alpha)\%, the interval is given by

\hat{S}(t) \pm z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}\left(\hat{S}(t)\right)}

where, by Greenwood's formula,

\widehat{\mathrm{Var}}\left(\hat{S}(t)\right) = \hat{S}(t)^2 \sum_{t_i \leq t} \frac{d_i}{n_i (n_i - d_i)}

and z_{\alpha/2} is the (\alpha/2)-th quantile of the standard normal distribution, so that if a 95% confidence interval is required, then \alpha = 5\% and z_{\alpha/2} = z_{0.025} = -1.96.
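In practice there is no need to compute this by hand: survfit in the survival package returns the Kaplan-Meier estimate together with Greenwood standard errors and confidence bounds. Continuing the toy sample above (conf.type = "plain" requests the linear interval of the formula; the package default is a log-transformed interval):

library(survival)

toyfit = survfit(Surv(time, churn) ~ 1, conf.type = "plain")
summary(toyfit)  # time, n.risk, n.event, survival, std.err (Greenwood), 95% CI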

With this in mind, we finally get down to work with our data. One question that immediately poses itself is whether one really needs to include all variables in the survival model. This dataset contains very little data compared to what a business might have at hand, and in those cases it might be a smart and cheap move to investigate whether a variable matters to churn or not. This means shorter, less expensive computations. If nothing else, it might help you to understand the data and inspire relevant statistical investigations.

We can graphically investigate the relation between churning and different variables.

names(TeleComData)
"State", "Account.Length", "Area.Code", "Phone", "Int.l.Plan", "VMail.Plan",
"VMail.Message", "Day.Mins", "Day.Calls", "Day.Charge", "Eve.Mins", "Eve.Calls",
"Eve.Charge", "Night.Mins", "Night.Calls", "Night.Charge", "Intl.Mins", "Intl.Calls",
"Intl.Charge", "CustServ.Calls", "Churn."

What would your reasons be to change service provider? Could it be that customers simply get curious about other service providers over time? This is apparently not the case, as the figure below shows.

[Figure: churn by length of time in service]
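A comparison like the one in the figure can be sketched in base R, for instance as a grouped boxplot of account length by churn status (one possible rendering; the original figure may have used a different plot type):

# Compare time in service between churners and non-churners
boxplot(Account.Length ~ Churn., data = TeleComData,
        xlab = "Churn", ylab = "Days in service",
        main = "Length of service by churn status")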

One of the drivers in almost everything is, of course, money; a second is the reasons for which one chooses a service over another. In the case of telecommunications, we know that charges vary over the course of the day. Daytime hours are often costlier, since communication is a must during business hours and users are therefore less price sensitive.

[Figure: churn by day and night charges]

These graphs can be compared to the corresponding graphs of service usage in minutes.

[Figure: churn by day and night usage in minutes]

There are probably several other leads that may be investigated in order to determine all the reasons for which an individual leaves a service provider in the hope of getting a better deal elsewhere, but the aim of this article is not an extensive study of one particular case, rather an introduction to the techniques usually used, so we shall not check every variable in the dataset. Given the right format, one might want to use linear-regression-style techniques to observe relationships between churn and these variables. A note of caution, though: one might want to perform multivariate regressions as well as univariate ones, since in some cases there might exist multiple reasons for churning.
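As a hedged sketch of such a screen, a univariate logistic regression of churn on, say, the number of customer service calls (our choice of variable; any column from the dataset could be substituted) looks like this:

# Univariate screen: does CustServ.Calls relate to churn?
uniFit = glm(I(Churn. == "True.") ~ CustServ.Calls,
             data = TeleComData, family = binomial)
summary(uniFit)  # a significant positive coefficient suggests a churn driver

# A multivariate version simply adds more terms on the right-hand side:
multiFit = glm(I(Churn. == "True.") ~ CustServ.Calls + Day.Charge + Int.l.Plan,
               data = TeleComData, family = binomial)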

To work through our material, we have chosen to use the R survival package (https://cran.r-project.org/web/packages/survival/index.html), which works for all versions of R from 2.13.0 onwards. It only imports a small number of other packages (graphics, Matrix, methods, splines) and includes definitions of Surv objects, Kaplan-Meier and Aalen-Johansen curves, as well as Cox models.

We begin by creating a survival object from each customer's time in the system, i.e. the time they have spent as subscribers of the service, together with their churn status. One then fits a survival curve to the data and, of course, plots it. Since we chose above to concentrate on three aspects, the overall survival of any customer in the system and that of individuals using the services at different times of the day, we create a Surv object for each of the three:

library(survival)

# Surv(time/exposure, event): the event is TRUE when the customer has churned
TeleComData$Accountsurvival = Surv(TeleComData$Account.Length, TeleComData$Churn. == "True.")
TeleComData$Daysurvival     = Surv(TeleComData$Day.Charge,     TeleComData$Churn. == "True.")
TeleComData$Nightsurvival   = Surv(TeleComData$Night.Charge,   TeleComData$Churn. == "True.")

The fitted curves are then given by:

# Fit a Kaplan-Meier curve for each of the three Surv objects
fit  = survfit(Accountsurvival ~ 1, data = TeleComData)
fit2 = survfit(Daysurvival ~ 1,     data = TeleComData)
fit3 = survfit(Nightsurvival ~ 1,   data = TeleComData)
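Before plotting, it is worth printing the fitted objects; print and summary on a survfit object report the number of events and the survival probabilities at chosen times (the time points below are our own picks):

print(fit)  # number of subjects, events, and the median survival time
summary(fit, times = c(50, 100, 200))  # survival probabilities at selected days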

The plots are easily done with the following code:

SurvivalPlot = plot(fit, lty = 1:2, mark.time = FALSE, ylim = c(.05, 1),
                    xlab = 'Days since Subscribing', ylab = 'Percent Surviving')
legend(20, .8, c('estimate', '95% CI'), lty = 1:2, bty = 'n', ncol = 1)
title(main = "Telecom Survival Curves")

SurvivalDayCharge = plot(fit2, lty = 1:2, mark.time = FALSE, ylim = c(.05, 1),
                         xlab = 'Total day charge', ylab = 'Percent Surviving')
legend(20, .8, c('estimate', '95% CI'), lty = 1:2, bty = 'n', ncol = 1)
title(main = "Survival Curves-Total day charge")

SurvivalNightCharge = plot(fit3, lty = 1:2, mark.time = FALSE, ylim = c(.05, 1),
                           xlab = 'Total night charge', ylab = 'Percent Surviving')
legend(10, .8, c('estimate', '95% CI'), lty = 1:2, bty = 'n', ncol = 1)
title(main = "Survival Curves-Total night charge")

which gives the following plots:

[Figure: overall survival curve with 95% confidence interval]

We can observe that we can expect about 50 percent of our customers to survive the first 200 days of their subscription to our services. Also note the widening of the confidence interval (here set to 95%).

[Figure: survival curve by total day charge]

Compare the graph for individuals mostly using their subscription during the day with that for individuals mostly using it at night (below). Note the difference in survival probability.

[Figure: survival curve by total night charge]

A final step, which in this example we shall only do for the overall account length, is to perform a log-rank test. The log-rank test compares estimates of the hazard functions of two samples at each observed event time. It computes the observed and expected number of events in one of the samples at each observed event time and then sums these to obtain an overall summary across all time points where there is an event.

# Log-rank test: compare the survival of two groups, here customers with
# and without an international plan (one natural grouping to try)
survdiff(Accountsurvival ~ Int.l.Plan, data = TeleComData)

# Attach the fitted survival probabilities back onto the customers
DataSurv         = as.data.frame(cbind(fit$time, fit$surv, fit$upper, fit$lower))
names(DataSurv)  = c("Account.Length", "surv.prob", "surv.prob.upper", "surv.prob.lower")
TelecomSurvTable = merge(TeleComData, DataSurv, by = "Account.Length")
TeleComDataSurv  = TelecomSurvTable[which(TelecomSurvTable$Churn. == "False."), ]

which gives us each customer's estimated probability of continuing to be a subscriber to our services:

    State    Phone Accountsurvival surv.prob surv.prob.upper surv.prob.lower
1      SC 336-1043              1+    0.9997               1       0.9991122
2      AK 373-1028              1+    0.9997               1       0.9991122
3      SC 356-8621              1+    0.9997               1       0.9991122
4      NJ 420-6780              1+    0.9997               1       0.9991122
5      IA 331-2144              1+    0.9997               1       0.9991122
6      TN 335-5591              1+    0.9997               1       0.9991122
...
178    SC 359-5091             36+ 0.9932518       0.9960663       0.9904451
179    IA 385-3540             36+ 0.9932518       0.9960663       0.9904451
180    ME 335-3110             36+ 0.9932518       0.9960663       0.9904451
181    MI 400-3637             36+ 0.9932518       0.9960663       0.9904451
182    AK 341-9764             36+ 0.9932518       0.9960663       0.9904451
184    MI 386-1131             37+ 0.9926213       0.9955672       0.9896842
186    CO 408-1513             37+ 0.9926213       0.9955672       0.9896842
187    CT 347-7675             37+ 0.9926213       0.9955672       0.9896842
188    NH 341-7332             37+ 0.9926213       0.9955672       0.9896842
189    NE 393-7892             37+ 0.9926213       0.9955672       0.9896842
190    MD 420-2000             37+ 0.9926213       0.9955672       0.9896842
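A natural use of this table is to flag the customers a retention effort should target first, by sorting the still-active customers by their estimated survival probability (a sketch):

# The customers most at risk are those with the lowest survival probability
atRisk = TeleComDataSurv[order(TeleComDataSurv$surv.prob), ]
head(atRisk[, c("State", "Phone", "Account.Length", "surv.prob")], 10)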

The analysis done here is quite basic, as we only looked at some factors that might influence churning, but it gives a rather intuitive idea of the common procedures used in this context. There are also numerous add-ons to this analysis that may be done. For instance, one could design customer services in such a way that account is taken of the fact that a customer has a high probability of churning, e.g. by moving them forward in queues when services are solicited, or by sending advantageous offers that might influence their decisions. The end game is, after all, to keep existing customers and to avoid the costs associated with recruiting new ones to replace them.
