AARRR:2.4

Sources:

https://blog.csdn.net/qq_22790151/article/details/109700735

https://blog.csdn.net/fei347795790/article/details/98620124

https://zhuanlan.zhihu.com/p/285676746

 

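import pandas as pd
df=pd.read_csv('user_behavior.csv')
df['timestamps']=pd.to_datetime(df['timestamps'],unit='s')
# assumed derivation of the 'dates' and 'hour' columns used in section 2
df['dates']=df['timestamps'].dt.date
df['hour']=df['timestamps'].dt.hour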

1. Top 10 purchased items and their counts:

(1) MySQL:

SELECT *,row_number() over(order by t.`购买次数` desc) as 排序 from
(SELECT item_id,count(*) as 购买次数 from userbehavior
where behavior='buy'
GROUP BY item_id) t
order by 排序
limit 10

 

 

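row_number() is used here instead of a tie-aware ranking, because every item's purchase count falls between 1 and 4, so almost all items would tie.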
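To make the difference between the ranking functions concrete, here is a small pandas sketch on made-up counts (the item ids and counts are invented for illustration, not taken from the dataset). In pandas, rank(method='first') behaves like row_number(), while method='min' and method='dense' behave like rank() and dense_rank().

```python
import pandas as pd

# Hypothetical purchase counts per item (made-up data, not from user_behavior.csv)
counts = pd.Series({'A': 4, 'B': 3, 'C': 3, 'D': 1}, name='购买次数')

ranking = pd.DataFrame({
    '购买次数': counts,
    # like row_number(): every row gets a unique rank, ties broken by order of appearance
    'row_number': counts.rank(method='first', ascending=False).astype(int),
    # like rank(): ties share the lowest rank and the next rank is skipped
    'rank': counts.rank(method='min', ascending=False).astype(int),
    # like dense_rank(): ties share a rank and no rank is skipped
    'dense_rank': counts.rank(method='dense', ascending=False).astype(int),
})
print(ranking)
```

With counts limited to a handful of values, the tie-aware columns collapse into a few ranks, which is why a unique row number is the more practical choice for a top-10 list.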

Clicks can be ranked the same way:

SELECT *,row_number() over(order by t.`点击次数` desc) as 排序 from
(SELECT item_id,count(*) as 点击次数 from userbehavior
where behavior='pv'
GROUP BY item_id) t
order by 排序
limit 10

 

 

 

(2) Python

# Top 10 by purchase count (counted per category_id here)
buy_count= df[df["behavior"]=='buy']["category_id"].value_counts().head(10)
df2=pd.DataFrame({'购买前10类目':buy_count.index,'购买前10数量':buy_count.values})
df2

 

 

To rank items rather than categories, change category_id to item_id.

 

 

Clicks can be computed the same way:

# Top 10 by click (pv) count, same approach
pv_count= df[df["behavior"]=='pv']["category_id"].value_counts().head(10)
df1=pd.DataFrame({'点击前10类目':pv_count.index,'点击前10数量':pv_count.values})
df1

 

 

2. Purchase rate

#1. Daily purchase rate: users who bought on a day / users active on that day
day_buy_user_num = df[df.behavior == 'buy'].drop_duplicates(['user_id', 'dates']).groupby('dates')['user_id'].count()
day_active_user_num = df.drop_duplicates(['user_id', 'dates']).groupby('dates')['user_id'].count()
day_buy_rate = day_buy_user_num / day_active_user_num

#2. Hourly purchase rate: users who bought in an hour / users active in that hour
hour_buy_user_num = df[df.behavior == 'buy'].drop_duplicates(['user_id', 'hour']).groupby('hour')['user_id'].count()
hour_active_user_num = df.drop_duplicates(['user_id', 'hour']).groupby('hour')['user_id'].count()
hour_buy_rate = hour_buy_user_num / hour_active_user_num
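As a sanity check on the definition (distinct buyers on a day divided by distinct active users on that day), here is a minimal sketch on an invented four-row log. It uses nunique() in place of drop_duplicates followed by count, which should give the same numbers; the toy data and the toy_ variable names are assumptions made for illustration.

```python
import pandas as pd

# Made-up log: on 2017-11-25 users 1 and 2 are active and only user 1 buys;
# on 2017-11-26 only user 1 is active and does not buy.
toy = pd.DataFrame({
    'user_id':  [1, 1, 2, 1],
    'behavior': ['pv', 'buy', 'pv', 'pv'],
    'dates':    ['2017-11-25', '2017-11-25', '2017-11-25', '2017-11-26'],
})

# distinct buyers per day and distinct active users per day
buyers = toy[toy.behavior == 'buy'].groupby('dates')['user_id'].nunique()
active = toy.groupby('dates')['user_id'].nunique()

# days without any buyer are missing from `buyers`;
# reindex so they show up as 0 instead of NaN after the division
toy_day_buy_rate = buyers.reindex(active.index, fill_value=0) / active
print(toy_day_buy_rate)
```

Without the reindex step, a day or hour with active users but no buyers comes out as NaN rather than 0, which is worth keeping in mind when plotting the rates.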

3. Repurchase rate

(1) MySQL

The method from the original article:

SELECT count(t.user_id) as 购买人数,
  count(case when t.`购买次数`>1 then t.user_id else null end)as 复购人数,
  CONCAT(round(100*count(case when t.`购买次数`>1 then t.user_id else null end)/count(t.user_id),2),'%') as 复购率
from
  (

  SELECT
    user_id,count(item_id) as 购买次数
  from userbehavior
  where behavior='buy'
  GROUP BY user_id

  )t
;

 

 

My method:

select sum(case when t2.`购买次数`>0 then 1 else 0 end)as 购买人数,
sum(case when t2.`购买次数`>1 then 1 else 0 end)as 复购人数,
CONCAT(round(100*sum(case when t2.`购买次数`>1 then 1 else 0 end)/count(t2.user_id),2),'%') as 复购率
from
(
  select user_id,count(date1) as '购买次数'
  from
    (
    SELECT
      user_id,date1
    from userbehavior
    where behavior='buy'
    GROUP BY user_id,date1

    ) t1
  group by user_id
) t2

 

 

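The two results differ because the metric is defined differently, specifically in how the purchase count is computed. My version counts by date: after a user's first purchase day, one or more purchases on any later day count as a repurchase. The original author counts purchase records (count(item_id)), so buying A and then buying B already counts as a repurchase. If a purchase in the morning followed by another in the afternoon of the same day should also count as a repurchase, the metric has to be changed accordingly.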
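The effect of the two definitions can be seen on a tiny made-up example (the five purchase rows below are invented; only the column names follow the ones used in this post). This is a sketch of the comparison, not a reproduction of either result.

```python
import pandas as pd

# Made-up purchases: user 1 buys twice on the same day,
# user 2 buys on two different days, user 3 buys once.
buys = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'item_id': [10, 11, 10, 12, 13],
    'dates':   ['2017-11-25', '2017-11-25', '2017-11-25', '2017-11-27', '2017-11-26'],
})

# Definition A (original article): number of purchase records per user
per_record = buys.groupby('user_id')['item_id'].count()
# Definition B (by date): number of distinct purchase days per user
per_day = buys.groupby('user_id')['dates'].nunique()

# share of buyers with more than one purchase under each definition
rate_by_record = (per_record > 1).mean()
rate_by_day = (per_day > 1).mean()
print(f"by record: {rate_by_record:.2%}, by day: {rate_by_day:.2%}")
```

User 1 is a repeat buyer under definition A but not under definition B, so the record-based rate can only be equal to or higher than the date-based one.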

(2) Python

Method 1:

# Repurchase rate over the 9-day window: share of buyers who bought on 2 or more distinct days
data_user_buy_all =df[df["behavior"]=='buy'].groupby("user_id")["dates"].nunique()
first = data_user_buy_all.count()
again = data_user_buy_all[data_user_buy_all>=2].count()
print("Repurchase rate:",format(again/first,".2%"))

 

 

Method 2:

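# Repurchase rate over the 9-day window, keeping one row per (user, day)
df_rebuy = df[df.behavior == 'buy'].drop_duplicates(['user_id','dates']).groupby('user_id')['dates'].count()
First = df_rebuy.count()
Again = df_rebuy[df_rebuy >= 2].count()
print("Repurchase rate:",format(Again/First,".2%"))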

 

 

4. RF analysis

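Because the data contain no record of spending amounts, the M in the RFM model is dropped, and customers are segmented by purchase frequency (F) and the recency of their last purchase (R).

-- Build the R table: the closer the last purchase is to 2017-12-04, the higher the R score
-- Build the F table: the more purchases within the window, the higher the F score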

from datetime import datetime

# keep purchases from 2017-11-25 up to (not including) 2017-12-04; copy so df is left untouched
df4=df[(df['timestamps']>=datetime(2017,11,25))&(df['timestamps']<datetime(2017,12,4))&(df['behavior']=='buy')].copy()
# calendar date of each purchase (timestamps was converted to datetime at the top)
df4['日期']=df4['timestamps'].dt.normalize()

# R value: days between the user's most recent purchase and 2017-12-04
df4['diff']=(datetime(2017,12,4)-df4['日期']).dt.days
r=df4.groupby('user_id')['diff'].min().reset_index()

# F value: number of distinct purchase days (several purchases on one day count once)
f=df4.groupby('user_id')['日期'].nunique().reset_index()

# Merge tables r and f
df5=pd.merge(r,f,on='user_id',how='inner')[['user_id','diff','日期']]
df5.columns=['user_id','r','f']

 

 

Scoring: first bin each metric into 5 levels, then split each into 2 groups (above the mean or not)

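# Score on business cut points (intervals are left-open, right-closed)
df5['r_score']=pd.cut(df5['r'],bins=[0,1,3,5,7,100],labels=[5,4,3,2,1]).astype(int)
df5['f_score']=pd.cut(df5['f'],bins=[0,1,3,5,7,100],labels=[1,2,3,4,5]).astype(int)
# Compare each score with its mean (1 = above the mean, 0 = not)
df5['r是否大于平均值']=(df5['r_score']>df5['r_score'].mean())*1 # mean r_score: 3.53
df5['f是否大于平均值']=(df5['f_score']>df5['f_score'].mean())*1 # mean f_score: 1.69
# Two-digit segment code: tens digit is the R flag, units digit is the F flag
df5['rfm']=10*df5['r是否大于平均值']+1*df5['f是否大于平均值']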
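The left-open, right-closed behaviour of pd.cut decides which score a boundary value receives. A minimal sketch with made-up recency values (r_demo and r_score_demo are names invented for this illustration; the bins and labels are the ones used for r_score):

```python
import pandas as pd

# Made-up recency values in days
r_demo = pd.Series([1, 2, 3, 4, 9])

# Intervals are (0,1], (1,3], (3,5], (5,7], (7,100],
# so the boundary value 3 falls in (1,3] and scores 4, not 3
r_score_demo = pd.cut(r_demo, bins=[0, 1, 3, 5, 7, 100],
                      labels=[5, 4, 3, 2, 1]).astype(int)
print(r_score_demo.tolist())
```

A value outside the outer edges (0 or less, or above 100) is not assigned to any bin and becomes NaN, in which case the astype(int) step would be expected to fail. That cannot happen for r here, since the latest purchase date in the window is 2017-12-03 and r is therefore at least 1.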

 

 

Then attach a business label to each resulting segment:
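The labels themselves are not given in the text. Since the rfm code is 10 × (R flag) + 1 × (F flag), its possible values are 11, 10, 1 and 0, and the labelling step can be sketched as a dictionary lookup. The label wording below is a hypothetical choice for illustration, not taken from the source, and demo is a made-up stand-in for df5.

```python
import pandas as pd

# Hypothetical label names for the four rfm codes (assumed, not from the source)
label_map = {
    11: 'high value (recent, frequent)',
    10: 'to develop (recent, infrequent)',
    1:  'to retain (not recent, frequent)',
    0:  'to win back (not recent, infrequent)',
}

# Made-up rows standing in for df5
demo = pd.DataFrame({'user_id': [1, 2, 3, 4], 'rfm': [11, 10, 1, 0]})

# look up the label for each segment code
demo['label'] = demo['rfm'].map(label_map)
print(demo)
```

Applied to the real table, the same lookup would be df5['rfm'].map(label_map), and a value_counts() on the resulting column gives the size of each segment.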

 

 

posted @ 2022-08-28 13:02  萧六弟