
Spark: AHP (Analytic Hierarchy Process) Customer Value Scoring

1. What is AHP

RFM clustering segments customers by value, but it does not differentiate the customers within each cluster. AHP fills that gap: it scores every customer inside a cluster so that customers of different value can be ranked against each other.

What is AHP: see https://baike.baidu.com/item/%E5%B1%82%E6%AC%A1%E5%88%86%E6%9E%90%E6%B3%95/1672?fr=aladdin and https://tellyouwhat.cn/p/ahp-users-value-score/

AHP (Analytic Hierarchy Process)
Compute an AHP score for each user, then rank customers within the same class produced by the RFM clustering. The method has three steps (a sketch of steps 2 and 3 follows this list):
  1. Build the hierarchy model
  2. Construct the pairwise comparison matrix
  3. Compute the weight vector and run a consistency check
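
Below is a minimal, self-contained sketch of steps 2 and 3; it is not from the original post. The 3x3 pairwise comparison matrix for R, F, M is illustrative, the weight vector is derived with the common normalized-column-average approximation, and the consistency ratio (CR) is checked against the usual 0.1 threshold.

import java.util.Arrays;

public class AhpWeights {

    // Saaty's random index (RI) for matrix orders 1..9.
    private static final double[] RI = {0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45};

    public static void main(String[] args) {
        // Hypothetical pairwise comparison matrix for R, F, M (M judged most important).
        double[][] a = {
                {1.0,     1.0 / 3, 1.0 / 5},
                {3.0,     1.0,     1.0 / 3},
                {5.0,     3.0,     1.0}
        };
        int n = a.length;

        // Step 3a: weight vector via normalized column averages.
        double[] w = new double[n];
        for (int j = 0; j < n; j++) {
            double colSum = 0.0;
            for (int i = 0; i < n; i++) {
                colSum += a[i][j];
            }
            for (int i = 0; i < n; i++) {
                w[i] += a[i][j] / colSum / n;
            }
        }

        // Step 3b: estimate the principal eigenvalue lambda_max = avg((A*w)_i / w_i).
        double lambdaMax = 0.0;
        for (int i = 0; i < n; i++) {
            double rowDot = 0.0;
            for (int j = 0; j < n; j++) {
                rowDot += a[i][j] * w[j];
            }
            lambdaMax += rowDot / w[i] / n;
        }

        // Step 3c: consistency check; the matrix passes if CR < 0.1.
        double ci = (lambdaMax - n) / (n - 1);
        double cr = ci / RI[n - 1];
        System.out.println("weights = " + Arrays.toString(w));
        System.out.printf("lambdaMax = %.4f, CI = %.4f, CR = %.4f%n", lambdaMax, ci, cr);
    }
}

If CR >= 0.1 the pairwise judgments should be revised before the weights are used; otherwise the resulting weight vector is what gets passed into ahpScore in section 3.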

Goals:
  Rank customers within the same value class from the RFM clustering
  Use the indicators R, F, M from the RFM model
  Compute an AHP score for every user (and rank same-class customers by that score)
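
Concretely (this is what the code in section 3 implements), each user's score is the dot product of the AHP weight vector with the user's min-max scaled RFM vector:

  ahpscore = w_R * R' + w_F * F' + w_M * M'

where R', F', M' are the scaled R, F, M values and w_R, w_F, w_M are the AHP weights.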

 

2. Data

Data source: Spark RFM customer value clustering (https://www.cnblogs.com/little-horse/p/14014812.html)

 

3. Code (Spark 3.0, Java 1.8)

For the full code, see the AHP customer value scoring example at https://github.com/jiangnanboy/spark_tutorial

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.ml.linalg.SQLDataTypes;
import org.apache.spark.ml.linalg.Vector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

    /**
     * RFM clustering divides users into high-value, ordinary, low-value, etc. classes.
     * To rank users within the same class, compute a final score for each user with the
     * AHP weight vector: the dot product of the user's scaled RFM vector and the weight
     * vector is the user's AHP score.
     * @param dataset data after RFM clustering
     * @param weightVector the AHP weight vector
     */
    public static void ahpScore(Dataset<Row> dataset, List<Double> weightVector) {

        /**
         * Compute each user's AHP score (sample output):
         * +----------+------------------+--------------------+----------+--------------------+
         * |customerid|          features|      scaledfeatures|prediction|            ahpscore|
         * +----------+------------------+--------------------+----------+--------------------+
         * |     12940| [46.0,4.0,876.29]|[0.12332439678284...|         1|0.024241021827781713|
         * |     13285|[23.0,4.0,2709.12]|[0.06166219839142...|         1|0.023847531248595018|
         * |     13623| [30.0,7.0,672.44]|[0.08042895442359...|         1|0.024049650279212683|
         * |     13832|  [17.0,2.0,40.95]|[0.04557640750670...|         1|0.014321280782467466|
         * |     14450|[180.0,3.0,483.25]|[0.48257372654155...|         0| 0.04870738944845504|
         * +----------+------------------+--------------------+----------+--------------------+
         */
        dataset = dataset.map((MapFunction<Row, Row>) row -> {
            int customerID = row.getInt(0);
            Vector featureVec = (Vector) row.get(1);
            Vector scaledFeatureVec = (Vector) row.get(2);
            int prediction = row.getInt(3);
            // AHP score = dot product of the weight vector and the scaled RFM vector
            double ahpScore = 0.0;
            for (int i = 0; i < weightVector.size(); i++) {
                ahpScore += weightVector.get(i) * scaledFeatureVec.apply(i);
            }
            return RowFactory.create(customerID, featureVec, scaledFeatureVec, prediction, ahpScore);
        }, RowEncoder.apply(new StructType(new StructField[]{
                new StructField("customerid", DataTypes.IntegerType, false, Metadata.empty()),//用户id
                new StructField("features", SQLDataTypes.VectorType(),false, Metadata.empty()),//rfm特征向量
                new StructField("scaledfeatures", SQLDataTypes.VectorType(), false, Metadata.empty()),//min-max标准化后的rfm特征向量
                new StructField("prediction", DataTypes.IntegerType, false, Metadata.empty()),//预测该用户的价值类别
                new StructField("ahpscore", DataTypes.DoubleType, false, Metadata.empty())//该用户的价值得分
        })));

        /**
         * Rank users within the same value class by ahpscore (sample output):
         * +----------+--------------------+--------------------+----------+------------------+----+
         * |customerid|            features|      scaledfeatures|prediction|          ahpscore|rank|
         * +----------+--------------------+--------------------+----------+------------------+----+
         * |     14646|[1.0,77.0,279489.02]|[0.00268096514745...|         1|0.7306140418787522|   1|
         * |     18102|[0.0,62.0,256438.49]|[0.0,0.2469635627...|         1|0.6609787921304062|   2|
         * |     14911|[1.0,248.0,132572...|[0.00268096514745...|         1|0.5933314030496094|   3|
         * |     17450|[8.0,55.0,187482.17]|[0.02144772117962...|         1|0.4982050472344627|   4|
         * |     14156|[9.0,66.0,113384.14]|[0.02412868632707...|         1|0.3430011157923704|   5|
         * +----------+--------------------+--------------------+----------+------------------+----+
         */
        dataset = dataset.withColumn("rank", functions.rank().over(Window.partitionBy("prediction").orderBy(col("ahpscore").desc())));
        dataset.show(5);
    }
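
A hypothetical call site, for illustration only: `clustered` stands for the Dataset<Row> produced by the RFM clustering step (columns customerid, features, scaledfeatures, prediction), and the weight values are placeholders; in practice they come from the consistency-checked AHP weight vector shown in section 1.

        // Illustrative R/F/M weights; replace with the AHP-derived weight vector.
        List<Double> weightVector = Arrays.asList(0.2, 0.2, 0.6);
        ahpScore(clustered, weightVector);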

 
