Uplift Modeling
I. Background
In promotions, subsidy, or coupon campaigns, accurately targeting the right customers both increases purchases and saves cost (people who would have bought anyway do not need an extra coupon).
II. Basics
Uplift modeling is used to identify exactly this kind of customer: the ones who will convert into an order if, and only if, they receive the subsidy. An uplift model compares treatment customers (who receive the subsidy) with control customers (who do not). Let Y denote the response: Y = 1 means the customer purchases, Y = 0 means they do not. Let W denote the group: W = 1 is treatment (subsidy sent), W = 0 is control (no subsidy). Of the four groups below, the Persuadables are the customers we want to find.
Customers fall into four groups:
- Will Take Action: buys whether or not a subsidy is sent (Y = 1 when W = 1, and Y = 1 when W = 0).
- Persuadable: buys if a subsidy is sent, otherwise does not (Y = 1 when W = 1, and Y = 0 when W = 0).
- Do Not Disturb: does not buy if a subsidy is sent, but buys without one (Y = 0 when W = 1, and Y = 1 when W = 0).
- Never Will Respond: does not buy either way (Y = 0 when W = 1, and Y = 0 when W = 0).
1) Superscript 1 denotes treatment and 0 denotes control; subscript i denotes the i-th sample.
2) Causal effect: $\tau _{i}=Y_{i}^{1}-Y_{i}^{0}$
3) Conditional average treatment effect: $CATE=E[Y_{i}^{1}|X_{i}]-E[Y_{i}^{0}|X_{i}]$, where $X_{i}$ are the covariates describing the subgroup sample i belongs to.
4) User i cannot both receive and not receive the subsidy at the same time, so only one potential outcome is observed: $Y_{i}^{obs} =W_{i}Y_{i}^{1}+(1-W_{i})Y_{i}^{0}$
5) From 4), the CATE is estimated as $\widehat{CATE}=E[Y_{i}^{obs}|X_{i}=x,W_{i}=1] - E[Y_{i}^{obs}|X_{i}=x,W_{i}=0]$. (Note that, conditional on $X_{i}=x$, $W_{i}$ must be independent of the potential outcomes $(Y_{i}^{1}, Y_{i}^{0})$, i.e. assignment to the treatment or control group is random.)
6) Propensity score: $P(X_{i}) = P(W_{i}=1|X_{i})$, the probability that user i is assigned to the treatment group.
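As a minimal sketch (the feature matrix X and treatment indicator W below are hypothetical), the propensity score can be estimated with any probabilistic classifier, e.g. logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))          # hypothetical user features
W = rng.binomial(1, 0.5, size=1000)     # hypothetical random treatment assignment

# Estimate the propensity score P(W = 1 | X) with a classifier.
propensity_model = LogisticRegression().fit(X, W)
p_hat = propensity_model.predict_proba(X)[:, 1]   # \hat{p}(X_i)
```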
III. Uplift Modeling
1 Two-Model Approach
Train two models, one on the treatment data and one on the control data. The uplift score is then Prob(Treatment) - Prob(Control), i.e. the predicted purchase probability under treatment minus the predicted purchase probability under control.
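A minimal sketch of the idea (the column names `treatment` and `target` and the choice of XGBoost are assumptions here; the full pipeline on the Kaggle data is in Section V):

```python
import pandas as pd
import xgboost as xgb

def two_model_uplift(df_train: pd.DataFrame, df_test: pd.DataFrame, features: list):
    """Fit one model per group and score uplift as P(buy | treatment) - P(buy | control)."""
    treat = df_train[df_train.treatment == 1]
    ctrl = df_train[df_train.treatment == 0]
    model_t = xgb.XGBClassifier().fit(treat[features], treat.target)
    model_c = xgb.XGBClassifier().fit(ctrl[features], ctrl.target)
    p_t = model_t.predict_proba(df_test[features])[:, 1]   # P(target = 1) if treated
    p_c = model_c.predict_proba(df_test[features])[:, 1]   # P(target = 1) if not treated
    return p_t - p_c
```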
2 The Class Transformation Approach
This approach applies when $Y_{i}^{obs}$ takes only the values 0 or 1. Construct the target variable
1)$Z_{i}=W_{i}Y_{i}^{obs}+(1-W_{i})(1-Y_{i}^{obs})$
so that $Z_{i}=1$ when a treatment-group user purchases or a control-group user does not respond, and $Z_{i}=0$ otherwise. When $P(X_{i}) = 1/2$, one can show:
2)$\tau (X_{i})=2P(Z_{i}=1 | X_{i})-1$
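A brief check of this identity (assuming random assignment with $P(W_{i}=1|X_{i})=1/2$, so that $W_{i}$ is independent of the potential outcomes given $X_{i}$):

$$
\begin{aligned}
P(Z_{i}=1\mid X_{i}) &= P(Y_{i}^{obs}=1\mid X_{i},W_{i}=1)P(W_{i}=1\mid X_{i})+P(Y_{i}^{obs}=0\mid X_{i},W_{i}=0)P(W_{i}=0\mid X_{i})\\
&= \tfrac{1}{2}P(Y_{i}^{1}=1\mid X_{i})+\tfrac{1}{2}\left(1-P(Y_{i}^{0}=1\mid X_{i})\right),\\
\text{so}\quad 2P(Z_{i}=1\mid X_{i})-1 &= P(Y_{i}^{1}=1\mid X_{i})-P(Y_{i}^{0}=1\mid X_{i}) = E[Y_{i}^{1}-Y_{i}^{0}\mid X_{i}] = \tau(X_{i}).
\end{aligned}
$$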
It then suffices to model $P(Z_{i}=1 | X_{i})$. The drawback is the strict requirement that treatment and control each make up exactly half of the sample. When the assignment is unbalanced, use the transformed outcome:
3)$Y_{i}^{*} =\frac{W_{i}}{\hat{p}(X_{i})} Y_{i}^{1}-\frac{(1-W_{i})}{(1-\hat{p}(X_{i}))}Y_{i}^{0}$
After the transformation,
4)$\tau (X_{i})=E[Y_{i}^{*}|X_{i} ]$
In the balanced case $\hat{p}(X_{i})=1/2$, $Z_{i}$ can be rewritten in terms of $Y_{i}^{*}$:
5)$Z_{i}=\frac{1}{2} Y_{i}^{*}+(1-W_{i})$
and therefore
6)$2E[Z_{i}|X_{i}]=E[Y_{i}^{*} |X_{i}]+1$
which, combined with 2) and 4), recovers the class transformation result as the special balanced case.
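A minimal sketch of the transformed-outcome estimator for the unbalanced case (X, W, and Y are hypothetical arrays; the propensity estimate reuses the logistic-regression idea from Section II, and XGBoost is an arbitrary choice of regressor):

```python
import numpy as np
import xgboost as xgb
from sklearn.linear_model import LogisticRegression

def transformed_outcome_uplift(X, W, Y):
    """Regress the transformed outcome Y* on X; its conditional mean estimates tau(X)."""
    p_hat = LogisticRegression().fit(X, W).predict_proba(X)[:, 1]   # \hat{p}(X_i)
    p_hat = np.clip(p_hat, 1e-3, 1 - 1e-3)                          # avoid extreme weights
    y_star = W * Y / p_hat - (1 - W) * Y / (1 - p_hat)              # formula 3)
    model = xgb.XGBRegressor().fit(X, y_star)
    return model.predict(X)                                         # estimated tau(X_i)
```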
3 Modeling Uplift Directly
The splitting rule of a traditional tree model such as ID3 decides whether to split on a feature based on information gain:
1)$Gain(D|A)=H(D)-H(D|A)$
The same idea can be carried over to uplift modeling:
2)$Gain=D_{after\_split}(P^{T},P^{C})-D_{before\_split}(P^{T},P^{C})$
where D measures how different the treatment and control outcome distributions are; KL divergence, Euclidean distance, and other divergences can be used.
3.1 Decision Trees for Uplift Modeling
As an example, consider the causal tree from the paper "Decision trees for uplift modeling" (the paper can be downloaded here). One splitting method is to compute $\Delta \Delta P$.
The splitting criterion used in the paper is KL divergence:
1)$KL(P^{T}:P^{C})=\sum P^{T}\log\frac{P^{T}}{P^{C}}$
The conditional KL divergence given a candidate split (a test on an attribute, with outcomes a) is
2)$KL(P^{T}:P^{C}|Test)=\sum_{a} \frac{N^{T}(a)+N^{C}(a)}{N^{T}+N^{C}} KL(P^{T}(class|a):P^{C}(class|a))$
and the KL gain of the test is
3)$KL_{gain}(Test)=KL(P^{T}:P^{C}|Test)-KL(P^{T}:P^{C})$
i.e. the conditional KL divergence after the split minus the KL divergence before the split, computed over the node's samples.
Euclidean distance can likewise be used as the splitting criterion; it has better statistical properties (it is finite and symmetric):
4)$Euclid(P^{T}:P^{C})=\sum(P^{T}-P^{C})^{2}$
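A minimal sketch of how such a gain can be computed for one candidate binary split (plain NumPy, not the paper's or CausalML's implementation; y and w are binary outcome and treatment arrays, and each child node is assumed to contain both treatment and control samples):

```python
import numpy as np

def kl(p_t, p_c, eps=1e-6):
    """KL(P^T : P^C) for binary outcome distributions given as P(Y = 1)."""
    pt = np.clip(np.array([1 - p_t, p_t]), eps, 1)
    pc = np.clip(np.array([1 - p_c, p_c]), eps, 1)
    return float(np.sum(pt * np.log(pt / pc)))

def euclid(p_t, p_c):
    """Squared Euclidean distance between the two binary outcome distributions."""
    return (p_t - p_c) ** 2 + ((1 - p_t) - (1 - p_c)) ** 2

def split_gain(y, w, left_mask, divergence=kl):
    """Divergence gain of a candidate split: weighted child divergences minus the parent divergence."""
    def node_divergence(mask):
        y_t, y_c = y[mask & (w == 1)], y[mask & (w == 0)]
        return divergence(y_t.mean(), y_c.mean()), mask.sum()

    parent, n = node_divergence(np.ones_like(y, dtype=bool))
    left, n_left = node_divergence(left_mask)
    right, n_right = node_divergence(~left_mask)
    return (n_left / n) * left + (n_right / n) * right - parent
```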
Pruning. Define
5)$Diff(Class,node)=P^{T}(Class|node)-P^{C}(Class|node)$
6)$MD(node)=max_{Class}|Diff(Class,node)|$
7)$sign(node)=sgn(Diff(Class^{*},node))$
where $Class^{*}$ is the class attaining the maximum in 6). If replacing a subtree with a single leaf would decrease the MD value on the validation set, and the sign of MD agrees between the validation set and the training set, the subtree is kept. A useful Python package is CausalML; see the reference link for installation and usage.
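A hedged usage sketch of CausalML's uplift forest, which grows trees with exactly these divergence-based splits (class and parameter names follow the CausalML documentation as I recall it and may differ across versions; the data here is synthetic):

```python
import numpy as np
from causalml.inference.tree import UpliftRandomForestClassifier

# Synthetic placeholder data: features, binary conversion, and a string-valued treatment key.
X = np.random.rand(1000, 5)
y = np.random.binomial(1, 0.2, size=1000)
treatment = np.random.choice(['control', 'treatment'], size=1000)

# evaluationFunction='KL' selects the KL-divergence split criterion discussed above.
uplift_forest = UpliftRandomForestClassifier(control_name='control', evaluationFunction='KL')
uplift_forest.fit(X, treatment=treatment, y=y)
uplift_pred = uplift_forest.predict(X)   # estimated uplift of 'treatment' vs. 'control' per sample
```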
4 Multi-Classification Model
As at the beginning of the article, the samples can be divided into four classes:
- Control Non-Responders (CN)
- Control Responders (CR)
- Treatment Non-Responders (TN)
- Treatment Responders(TR)
The labelled data is then fed into a single multi-class model, and the uplift score is computed as $Uplift\_Score = \frac{P(TR)}{P(T)} + \frac{P(CN)}{P(C)} - \frac{P(TN)}{P(T)} - \frac{P(CR)}{P(C)}$.
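A minimal sketch of the scoring step (the class ordering 0=CN, 1=CR, 2=TN, 3=TR is an assumption that matches the labelling used in the code in Section V, where $P(T)$ and $P(C)$ are approximated from the model's own probabilities):

```python
import xgboost as xgb

def multi_class_uplift(X_train, target_class, X_test):
    """Fit a 4-class model (CN=0, CR=1, TN=2, TR=3) and turn its probabilities into an uplift score."""
    model = xgb.XGBClassifier().fit(X_train, target_class)
    p = model.predict_proba(X_test)
    p_cn, p_cr, p_tn, p_tr = p[:, 0], p[:, 1], p[:, 2], p[:, 3]
    return (p_tr / (p_tn + p_tr) + p_cn / (p_cn + p_cr)
            - p_tn / (p_tn + p_tr) - p_cr / (p_cn + p_cr))
```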
IV. Evaluation
1) Decile chart
After computing an uplift score for each user, sort the users by score from high to low, bucket them by percentile (e.g. into deciles), and report the average uplift score of each bucket. This shows how the scores are distributed across the ranking, but it cannot be used to compare models against each other.
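A minimal sketch with pandas (the column name `uplift_score` matches the code in Section V; everything else is illustrative):

```python
import pandas as pd

def decile_means(result: pd.DataFrame) -> pd.Series:
    """Average uplift score per decile, ranked from the highest-scored users down."""
    ranked = result.sort_values('uplift_score', ascending=False).reset_index(drop=True)
    ranked['decile'] = pd.qcut(ranked.index, 10, labels=range(1, 11))
    return ranked.groupby('decile')['uplift_score'].mean()
```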
2) AUUC
The most common evaluation metric is AUUC (area under the uplift curve). Ranking users by uplift score and taking the top i of them, the cumulative gain is $G(i)=(\frac{N_{y=1,t=1}}{N_{t=1}}-\frac{N_{y=1,t=0}}{N_{t=0}})\cdot(N_{t=1}+N_{t=0})$, where the counts are taken over those top-i users.
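A minimal sketch of the gain curve and its area (this mirrors the plotAuucValue function in Section V and adds a trapezoidal area as the AUUC summary; normalisation conventions vary between libraries):

```python
import numpy as np
import pandas as pd

def cumulative_gain(ranked: pd.DataFrame) -> np.ndarray:
    """G(i) over the top-i users of a dataframe already sorted by uplift_score (descending)."""
    t = ranked['treatment'].to_numpy()
    y = ranked['target'].to_numpy()
    n_t = np.cumsum(t)                # treated users among the top i
    n_c = np.cumsum(1 - t)            # control users among the top i
    r_t = np.cumsum(t * y)            # treated responders among the top i
    r_c = np.cumsum((1 - t) * y)      # control responders among the top i
    with np.errstate(divide='ignore', invalid='ignore'):
        gain = (r_t / n_t - r_c / n_c) * (n_t + n_c)
    return np.nan_to_num(gain, nan=0.0, posinf=0.0, neginf=0.0)

def auuc(ranked: pd.DataFrame) -> float:
    """Area under the cumulative-gain curve (trapezoidal rule, unnormalised)."""
    gain = cumulative_gain(ranked)
    return float(((gain[1:] + gain[:-1]) / 2).sum())
```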
Using the example in Section V, the AUUC of each model is computed below; the Two-Model approach performs worst and the multi-classification model performs best.
V. Example
As a worked example, take a Kaggle dataset on a retail promotion campaign: https://www.kaggle.com/datasets/davinwijaya/customer-retention
CODE:
```python
import warnings

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import xgboost as xgb
from sklearn.model_selection import train_test_split

warnings.filterwarnings('ignore')

data_raw = pd.read_csv("C:\\Users\\nan.wu2\\PycharmProjects\\job\\study\\data\\data.csv")
df_model = data_raw.copy()
"""
['recency', 'history', 'used_discount', 'used_bogo', 'zip_code',
 'is_referral', 'channel', 'offer', 'conversion']
"""
# Rename target column
df_model = df_model.rename(columns={'conversion': 'target'})
# Rename & label-encode treatment column
df_model = df_model.rename(columns={'offer': 'treatment'})
df_model.treatment = df_model.treatment.map({'No Offer': 0, 'Buy One Get One': -1, 'Discount': 1})
df_model = pd.get_dummies(df_model)

df_model_bogo = df_model.copy().loc[df_model.treatment <= 0].reset_index(drop=True)
df_model_discount = df_model.copy().loc[df_model.treatment >= 0].reset_index(drop=True)


def declare_tc(df: pd.DataFrame):
    """Label each sample as CN / CR / TN / TR for the multi-classification model."""
    # CN: control non-responder
    df['target_class'] = 0
    # CR: control responder
    df.loc[(df.treatment == 0) & (df.target != 0), 'target_class'] = 1
    # TN: treatment non-responder
    df.loc[(df.treatment != 0) & (df.target == 0), 'target_class'] = 2
    # TR: treatment responder
    df.loc[(df.treatment != 0) & (df.target != 0), 'target_class'] = 3
    return df


df_model_bogo = declare_tc(df_model_bogo)
df_model_discount = declare_tc(df_model_discount)


def uplift_split(df: pd.DataFrame):
    """Build the class-transformation target Z and split into train/test sets."""
    df.treatment = df.treatment.map({-1: 1, 0: 0})   # BOGO -> 1, control -> 0
    df['Z'] = df['treatment'] * df['target'] + (1 - df['treatment']) * (1 - df['target'])
    X = df.copy()
    y = df.Z
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    return X_train, X_test, y_train, y_test


def uplift_class_transformation_method(X_train, X_test, y_train, y_test):
    uplift_model = xgb.XGBClassifier().fit(
        X_train.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1), y_train)
    result = pd.DataFrame(X_test.copy())
    # Column 1 of predict_proba is P(Z = 1)
    result['Z_prob'] = uplift_model.predict_proba(
        X_test.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1))[:, 1]
    result['uplift_score'] = 2 * result['Z_prob'] - 1
    return result


def uplift_split_two_model_approach(X_train, X_test, y_train, y_test):
    df_treatment_train = X_train.copy().loc[X_train.treatment != 0].reset_index(drop=True)
    df_control_train = X_train.copy().loc[X_train.treatment == 0].reset_index(drop=True)
    df_treatment_test = X_test.copy().loc[X_test.treatment != 0].reset_index(drop=True)
    df_control_test = X_test.copy().loc[X_test.treatment == 0].reset_index(drop=True)
    Xt_train = df_treatment_train.copy()
    yt_train = df_treatment_train.target
    Xt_test = df_treatment_test.copy()
    yt_test = df_treatment_test.target
    Xc_train = df_control_train.copy()
    yc_train = df_control_train.target
    Xc_test = df_control_test.copy()
    yc_test = df_control_test.target
    return Xt_train, Xt_test, yt_train, yt_test, Xc_train, Xc_test, yc_train, yc_test


def uplift_two_model_approach_model(X_train: pd.DataFrame, y_train: pd.DataFrame):
    uplift_model = xgb.XGBClassifier().fit(X_train.copy(), y_train)
    return uplift_model


def uplift_multi_classification_model(X_train, X_test, y_train, y_test):
    result = pd.DataFrame(X_test.copy())
    y_train = X_train.target_class
    uplift_model = xgb.XGBClassifier().fit(
        X_train.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1), y_train)
    uplift_prob = uplift_model.predict_proba(
        X_test.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1))
    result['prob_CN'] = uplift_prob[:, 0]
    result['prob_CR'] = uplift_prob[:, 1]
    result['prob_TN'] = uplift_prob[:, 2]
    result['prob_TR'] = uplift_prob[:, 3]
    result['uplift_score'] = result.eval(
        'prob_CN/(prob_CN+prob_CR) + prob_TR/(prob_TN+prob_TR)'
        ' - prob_TN/(prob_TN+prob_TR) - prob_CR/(prob_CN+prob_CR)')
    return result


def uplift_two_model_approach(Xt_train, Xt_test, yt_train, yt_test,
                              Xc_train, Xc_test, yc_train, yc_test):
    t_model = uplift_two_model_approach_model(
        Xt_train.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1), yt_train)
    c_model = uplift_two_model_approach_model(
        Xc_train.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1), yc_train)
    result_t = pd.DataFrame(Xt_test.copy())
    # Column 1 of predict_proba is P(target = 1)
    tt_prob = t_model.predict_proba(Xt_test.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1))
    tc_prob = c_model.predict_proba(Xt_test.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1))
    result_t['uplift_score'] = tt_prob[:, 1] - tc_prob[:, 1]
    result_t['target'] = yt_test
    result_c = pd.DataFrame(Xc_test.copy())
    ct_prob = t_model.predict_proba(Xc_test.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1))
    cc_prob = c_model.predict_proba(Xc_test.copy().drop(['treatment', 'target', 'target_class', 'Z'], axis=1))
    result_c['uplift_score'] = ct_prob[:, 1] - cc_prob[:, 1]
    result_c['target'] = yc_test
    result = pd.concat([result_t, result_c])
    result['n'] = result.uplift_score.rank(pct=True, ascending=False)
    result = result.sort_values(by='n').reset_index(drop=True)
    return result


def qini_rank(uplift: pd.DataFrame):
    """Rank users by uplift score, highest first."""
    ranked = pd.DataFrame(uplift.copy())
    ranked['n'] = ranked.uplift_score.rank(pct=True, ascending=False)
    ranked = ranked.sort_values(by='n').reset_index(drop=True)
    return ranked


def decile_chart(uplift_result, title):
    decile_list = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    res = []
    last = 0.0
    for decile in decile_list:
        decile_df = uplift_result.copy().loc[
            (uplift_result.n > last) & (uplift_result.n <= decile)].reset_index(drop=True)
        score_sum = decile_df.uplift_score.sum()
        score_count = decile_df.uplift_score.count()
        last = decile
        res.append(score_sum / score_count)
    sns.set_style('whitegrid')
    plt.figure()
    ax = sns.barplot(x=decile_list, y=res)
    for p in ax.patches:
        percentage = f'{p.get_height():.3f}'
        x = p.get_x() + p.get_width() / 2
        y = p.get_height() + 0.05
        ax.text(x, y, percentage, ha='center', va='center', fontsize=10,
                bbox=dict(facecolor='none', edgecolor='black', boxstyle='round', linewidth=0.3))
    plt.title(title)
    plt.show()


def plotAuucValue(ranked_result, label):
    """Plot the cumulative gain G(i) over the ranked users."""
    treatment = list(ranked_result['treatment'])
    target = list(ranked_result['target'])
    nt, nc, ntr, ncr = 0.0, 0.0, 0.0, 0.0
    res = []
    for i in range(0, len(target)):
        if treatment[i] == 1:
            nt += 1.0
            if target[i] == 1:
                ntr += 1.0
        if treatment[i] == 0:
            nc += 1.0
            if target[i] == 1:
                ncr += 1.0
        if nc == 0 or nt == 0:
            res.append(0)
        else:
            res.append((ntr / nt - ncr / nc) * (nt + nc))
    plt.plot([ind for ind in range(0, len(res))], res, label=label)
    plt.xlabel("nums")
    plt.ylabel("gain_mean")
    plt.legend()
    return res


X_train, X_test, y_train, y_test = uplift_split(df_model_bogo)
Xt_train, Xt_test, yt_train, yt_test, Xc_train, Xc_test, yc_train, yc_test = uplift_split_two_model_approach(
    X_train, X_test, y_train, y_test)

result_bogo_two_model_approach = uplift_two_model_approach(
    Xt_train, Xt_test, yt_train, yt_test, Xc_train, Xc_test, yc_train, yc_test)
result_bogo_two_model_approach_ranked = qini_rank(result_bogo_two_model_approach)

result_bogo_class_transformation = uplift_class_transformation_method(X_train, X_test, y_train, y_test)
result_bogo_class_transformation_ranked = qini_rank(result_bogo_class_transformation)

result_bogo_multi_classification = uplift_multi_classification_model(X_train, X_test, y_train, y_test)
result_bogo_multi_classification_ranked = qini_rank(result_bogo_multi_classification)

plt.figure()
plotAuucValue(result_bogo_two_model_approach_ranked, "two_model_approach")
plotAuucValue(result_bogo_class_transformation_ranked, "class_transformation")
plotAuucValue(result_bogo_multi_classification_ranked, "multi_classification")
plotAuucValue(X_test, "base_line")   # unranked test set as a baseline
plt.title("AUUC")
plt.show()

# decile_chart(result_bogo_class_transformation_ranked, "decile chart class transformation")
# decile_chart(result_bogo_two_model_approach_ranked, "decile chart two model approach")
```