Uplift Modelling

Introduction

Uplift modelling applies machine learning to estimate individual or subgroup-level causal effects of a treatment. In recent years, it has become essential for personalization in e-commerce, optimizing interventions to maximize business metrics. This is particularly useful in promotional campaigns, where benefits must be weighed against costs.

Uplift modelling is very common in data science, particularly when you are running an A/B test and want to measure the effect of a specific treatment. In an A/B test you could, for example, split your customers randomly into a control group and a treatment group, send a promotional email to the treatment group, and then measure an outcome such as conversions.

A/B testing is used to measure the average treatment effect (ATE), while uplift modelling aims to estimate heterogeneous, conditional average treatment effects (CATE) in order to personalize interventions. An A/B test only tells you whether a treatment works on average, while uplift modelling goes a step further and identifies who benefits most from the treatment.

Thus, while A/B testing measures overall effectiveness, uplift modelling helps predict which individuals are most likely to respond positively to the treatment, enabling targeted decision-making.

The Idea of Uplift Modelling

Once you have run an A/B test and measured the outcomes, you can train an uplift model that estimates the average treatment effect conditional on some covariates. This is very useful, because the next time you apply a treatment, e.g., send out promotional emails, you want to target the customers who respond well to it. In the second treatment round you might also keep a small randomized hold-out, i.e., a mini A/B test, so that you keep adding training data for the uplift model.

Problem Setup

Let \(Y_i^1\) denote person \(i\)’s outcome when they receive the treatment and \(Y_i^0\) their outcome when they do not. We are interested in the individual causal effect,

\begin{equation} \tau_i := Y_i^1 - Y_i^0. \end{equation}

Given a feature vector \(X_i\) of person \(i\), we would like to estimate the conditional average treatment effect,

\begin{equation} \tau (X_i) := \mathbb{E}[Y_i^1 | X_i] - \mathbb{E}[Y_i^0 | X_i]. \end{equation}

The problem is that \(\tau (X_i)\) is not observable since we cannot both treat and not treat person \(i\).

Estimating the Conditional Average Treatment Effect (CATE)

Now, we can estimate \(\tau(X_i)\) under an unconfoundedness (conditional independence) assumption. This is a key assumption in causal inference: it means that, conditional on the covariates \(X_i\), the treatment assignment is independent of the potential outcomes, i.e., treatment assignment is as good as random. In practice, you need to make sure that there is no hidden variable that drives the assignment of the treatment.

Let \(W_i\) be a binary variable indicating whether person \(i\) received the treatment, so that the observed outcome is

\begin{equation} Y_i^{\text{obs}} = W_i Y_i^1 + (1 - W_i) Y_i^0. \end{equation}

If we assume that the treatment assignment \(W_i\) is independent of \(Y_i^1\) and \(Y_i^0\) conditional on \(X_i\), then we can estimate the CATE from observational data by computing the empirical counterpart,

\begin{equation} \text{uplift} = \hat{\tau}(X_i) = \mathbb{E}[Y_i^{\text{obs}} | X_i,W_i=1] - \mathbb{E}[Y_i^{\text{obs}} | X_i,W_i=0]. \end{equation}

This empirical estimate of the uplift is only valid under a randomized experiment or if we have correctly adjusted for all confounders; otherwise it suffers from selection bias.

When you are running an A/B test you want to split your sample randomly, but if you are not careful about this process, there is a risk that hidden variables affect the assignment to the treatment and control groups.
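As a concrete illustration, the following sketch simulates a small randomized experiment and computes the empirical difference-in-means estimate \(\hat{\tau}(X_i)\) within each covariate segment. The data-generating process, variable names, and rates are invented purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Simulated covariate (a customer segment) and a randomized 50/50 treatment assignment.
segment = rng.integers(0, 2, size=n)   # X_i: 0 = "low activity", 1 = "high activity"
treated = rng.integers(0, 2, size=n)   # W_i

# True (unobserved) uplift differs by segment: +2% vs. +10% conversion probability.
base_rate = 0.05
true_uplift = np.where(segment == 1, 0.10, 0.02)
converted = rng.random(n) < base_rate + treated * true_uplift   # Y_i^obs

df = pd.DataFrame({"segment": segment, "treated": treated, "converted": converted})

# Empirical CATE per segment: E[Y | X, W=1] - E[Y | X, W=0].
rates = df.groupby(["segment", "treated"])["converted"].mean().unstack("treated")
uplift_hat = rates[1] - rates[0]
print(uplift_hat)
```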

Estimation Methods

A variety of methods can be used to estimate uplift, including separate models for treatment and control, modified outcome regression, and specialized tree-based approaches such as uplift trees.

Two-Model Approach (Separate Treatment & Control Models)
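A minimal sketch of the two-model idea (often called the T-learner), assuming scikit-learn is available; the choice of gradient boosting and the function name are illustrative, not prescriptive. One outcome model is fit per arm, and the uplift is the difference of the predicted probabilities. A known caveat is that the two models are fit independently, so their errors do not cancel.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def two_model_uplift(X, w, y, X_new):
    """Fit separate outcome models on the treated (w == 1) and control (w == 0) rows,
    then estimate uplift as the difference of predicted conversion probabilities."""
    model_treat = GradientBoostingClassifier().fit(X[w == 1], y[w == 1])
    model_control = GradientBoostingClassifier().fit(X[w == 0], y[w == 0])
    return model_treat.predict_proba(X_new)[:, 1] - model_control.predict_proba(X_new)[:, 1]
```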

Class Transformation (e.g., Modified Outcome Modeling)
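A sketch of the class-variable transformation, assuming a randomized 50/50 treatment split (the label must be reweighted if the treatment probability differs from 0.5). It defines a new label \(Z_i = W_i Y_i + (1 - W_i)(1 - Y_i)\); under 50/50 randomization, \(\tau(X_i) = 2\,\mathbb{P}(Z_i = 1 \mid X_i) - 1\), so a single classifier for \(Z\) yields an uplift estimate. The classifier choice below is only a placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def class_transformation_uplift(X, w, y, X_new):
    """Class-variable transformation: Z = 1 for (treated and positive outcome)
    or (control and negative outcome). Assumes a randomized 50/50 treatment split,
    in which case uplift(x) = 2 * P(Z = 1 | x) - 1."""
    z = (w * y + (1 - w) * (1 - y)).astype(int)
    model = LogisticRegression(max_iter=1000).fit(X, z)
    return 2 * model.predict_proba(X_new)[:, 1] - 1
```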

Uplift Trees & Random Forests (KL Divergence, Delta Method, etc.)
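Uplift trees choose splits that maximize the divergence between the treated and control outcome distributions in the child nodes. The sketch below shows one KL-divergence-based splitting gain for binary outcomes; the data structures and function names are a simplification for illustration, not any particular library's API.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-6):
    """KL divergence between two Bernoulli distributions with success probabilities p and q."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_split_gain(parent, left, right):
    """Gain of a candidate split: the size-weighted divergence between treatment and
    control conversion rates in the child nodes minus the divergence in the parent.
    Each node is a dict with conversion rates p_t (treated), p_c (control) and size n."""
    children = [left, right]
    n_total = sum(c["n"] for c in children)
    after = sum(c["n"] / n_total * kl_divergence(c["p_t"], c["p_c"]) for c in children)
    before = kl_divergence(parent["p_t"], parent["p_c"])
    return after - before
```

A full uplift tree evaluates this gain for every candidate split, and other divergence measures can be substituted for KL.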

Python Implementations

There are several Python packages for uplift modelling: