Affinity Analysis

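I have recently been reading Robert Layton's book on data mining. I am writing these notes partly to reinforce what I have learned and partly so that I can look things up later.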

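To increase demand, stores often place items that customers like to buy together next to each other. This makes a joint purchase more likely and stimulates spending.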

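The simplest example: if you buy sliced lamb rolls, you will very likely want cuttlefish balls too, and the cuttlefish balls remind you of hot pot soup base. If that did not occur to you, the store may deliberately place the soup base near the lamb rolls so that you spot it easily.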

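First, what is affinity analysis? It looks for connections among certain things, that is, how often they occur together. For example, Taobao recommends items based on your personal preferences or on what you browse frequently, which to some extent stimulates spending.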

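Suppose a store is deciding where to place bread, milk, cheese, apples and bananas. It will follow some rules. For example, a customer who has bought bread is quite likely to buy some milk, so bread and milk are placed together. Every rule consists of a premise and a conclusion. In this example, the premise is that the customer buys bread, and the conclusion is that the customer is quite likely to buy milk.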

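A rule is judged by how likely it is to hold: the more likely, the more people in the population follow it. The two usual measures are support and confidence. Support is the number of times the rule holds in the dataset, that is, the number of customers for whom both premise and conclusion are true. Confidence is the proportion of the cases in which the premise is true where the conclusion is true as well.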
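To make the two measures concrete, here is a minimal sketch on a tiny hand-made dataset. The array `toy` and the helper `support_and_confidence` are hypothetical names invented for this illustration, not part of the book's code.

```python
import numpy as np

# Hypothetical toy data: rows are customers, columns are [bread, milk].
toy = np.array([
    [1, 1],
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 1],
])

def support_and_confidence(data, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion."""
    has_premise = data[:, premise] == 1
    has_both = has_premise & (data[:, conclusion] == 1)
    support = int(has_both.sum())        # customers for whom the rule holds
    n_premise = int(has_premise.sum())   # customers for whom the premise holds
    confidence = support / n_premise if n_premise else 0.0
    return support, confidence

toy_support, toy_confidence = support_and_confidence(toy, premise=0, conclusion=1)
print(toy_support, toy_confidence)
```

Four of the five customers bought bread and three of those also bought milk, so the rule "bread, then milk" has a support of 3 and a confidence of 3/4.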

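The store considers 5 products. For convenience, we use a matrix whose columns represent the 5 products and whose rows represent individual customers. A 1 means the item was bought and a 0 means it was not. We also leave out people who walk around and buy nothing.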
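As a small illustration of this layout, the sketch below turns rows of such a matrix back into lists of purchased items. The two example rows and the helper `basket` are made up for the illustration.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]

# Two hypothetical customers, one row each, one column per product.
example = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
])

def basket(row, names):
    """List the product names whose column holds a 1 in this row."""
    return [name for name, bought in zip(names, row) if bought]

baskets = [basket(row, features) for row in example]
print(baskets)
```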

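First, the data could be collected with a questionnaire, provided the sample is large and objective enough. To save effort, we generate it with random numbers instead.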

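When setting the thresholds, we can build in some plausible dependencies so that the generated dataset is more realistic. For example, people who buy bread usually buy some milk too, so the threshold for buying milk is higher when bread has been bought than when it has not. Likewise, the threshold for buying bananas after buying apples can be higher than without apples. One could also reason that people want some refreshing fruit after cheese, and raise that threshold as well.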
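The effect of such a conditional threshold can be checked on a larger sample. The sketch below covers only bread and milk and uses a vectorized approach with a fixed seed, both of which are assumptions made for the illustration. The thresholds 0.3, 0.6 and 0.4 are the ones used for bread and milk in this post.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 10_000                      # larger than the post's 100 rows, to reduce noise

# 30% of the customers buy bread.
bread = rng.random(n) < 0.3
# The milk threshold depends on bread: 0.6 with bread, 0.4 without.
milk = rng.random(n) < np.where(bread, 0.6, 0.4)

p_milk_given_bread = milk[bread].mean()
p_milk_given_no_bread = milk[~bread].mean()
print(round(p_milk_given_bread, 2), round(p_milk_given_no_bread, 2))
```

With enough rows, the two observed frequencies should land close to the thresholds, which is the dependency a later confidence calculation would pick up.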

import numpy as np

# Create a 100 x 5 matrix: one row per customer, one column per product
X = np.zeros((100, 5), dtype='bool')
features = ["bread", "milk", "cheese", "apples", "bananas"]

for i in range(X.shape[0]):
    if np.random.random() < 0.3:
        # Buys bread
        X[i][0] = 1
        if np.random.random() < 0.6:
            # Buys milk
            X[i][1] = 1
        if np.random.random() < 0.2:
            # Buys cheese
            X[i][2] = 1
        if np.random.random() < 0.25:
            # Buys apples
            X[i][3] = 1
        if np.random.random() < 0.5:
            # Buys bananas
            X[i][4] = 1
    else:
        # No bread bought, so buying milk is somewhat less likely
        if np.random.random() < 0.4:
            # Buys milk
            X[i][1] = 1
            if np.random.random() < 0.2:
                # Buys cheese
                X[i][2] = 1
            if np.random.random() < 0.3:
                # Buys apples
                X[i][3] = 1
            if np.random.random() < 0.5:
                # Buys bananas
                X[i][4] = 1
        else:
            if np.random.random() < 0.8:
                # Buys cheese
                X[i][2] = 1
            if np.random.random() < 0.6:
                # Buys apples
                X[i][3] = 1
            if np.random.random() < 0.7:
                # Buys bananas
                X[i][4] = 1
    if X[i].sum() == 0:
        X[i][4] = 1  # Exclude pure browsers: an empty basket gets bananas

# Save as 0/1 text so that the next step can load it from this file
np.savetxt("affinity_dataset.txt", X.astype(int), fmt="%d")

This gives us the dataset. Next we count how often each rule holds:

from collections import defaultdict

import numpy as np

dataset_filename = "affinity_dataset.txt"
x = np.loadtxt(dataset_filename)
n_samples, n_features = x.shape

# print(n_samples, n_features)

# Create the dictionaries that hold the counts
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurences = defaultdict(int)
confidence = defaultdict(float)
# Feature names
features = ["bread", "milk", "cheese", "apples", "bananas"]

for sample in x:
    # premise: the item the customer has bought
    for premise in range(n_features):
        if sample[premise] == 0:
            continue
        num_occurences[premise] += 1
        # conclusion: the other item the rule predicts
        for conclusion in range(n_features):
            if premise == conclusion:
                continue
            if sample[conclusion] == 1:
                # The rule holds for this customer
                valid_rules[(premise, conclusion)] += 1
            else:
                # The rule does not hold for this customer
                invalid_rules[(premise, conclusion)] += 1
# Support: the number of customers for whom each rule holds
support = valid_rules
for premise, conclusion in valid_rules.keys():
    rule = (premise, conclusion)
    # Confidence: valid cases divided by the number of times the premise occurs
    confidence[rule] = valid_rules[rule] / num_occurences[premise]
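The nested loops can be cross-checked with a matrix product. This is an alternative formulation, not the approach used in this post, and the small matrix `toy` is made up for the illustration. It assumes that every item is bought at least once, since otherwise the division would hit a zero.

```python
import numpy as np

# Hypothetical 0/1 purchase matrix: 4 customers, 3 products.
toy = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])

# co_counts[i, j] = number of customers who bought both item i and item j.
# The diagonal holds how often each item was bought at all.
co_counts = toy.T @ toy
occurrences = np.diag(co_counts)

# conf_matrix[i, j] = confidence of the rule "item i, then item j".
# The diagonal (i == j) is always 1 and carries no information.
conf_matrix = co_counts / occurrences[:, None]
print(co_counts)
print(conf_matrix.round(3))
```

The off-diagonal entries of `co_counts` correspond to the support values and those of `conf_matrix` to the confidence values that the loop version stores in dictionaries.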

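Next we define a function that takes a premise and a conclusion and prints the rule together with the support and confidence computed from the dataset.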

def print_rule(premise, conclusion, support, confidence, features):
    # Look up the readable names for the two feature indices
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: If a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print("-- Support: {0}".format(support[(premise, conclusion)]))
    print("-- Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))

The rules can also be sorted by confidence:

from operator import itemgetter

sorted_confidence = sorted(confidence.items(),key=itemgetter(1), reverse=True)

for index in range(5):
    print("Rule #{}".format(index + 1))
    premise,conclusion = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)
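Sorting by confidence alone can put a rule at the top that only a handful of customers follow. One possible refinement, sketched here with made-up numbers, is to require a minimum support as well. The dictionaries `toy_support` and `toy_confidence`, the helper `filter_rules` and both thresholds are hypothetical.

```python
# Hypothetical results keyed by (premise, conclusion) index pairs.
toy_support = {(0, 1): 14, (1, 0): 14, (2, 3): 3}
toy_confidence = {(0, 1): 0.52, (1, 0): 0.30, (2, 3): 0.75}

def filter_rules(support, confidence, min_support, min_confidence):
    """Keep rules that reach both thresholds, best confidence first."""
    kept = [
        (rule, support[rule], confidence[rule])
        for rule in confidence
        if support[rule] >= min_support and confidence[rule] >= min_confidence
    ]
    return sorted(kept, key=lambda item: item[2], reverse=True)

kept = filter_rules(toy_support, toy_confidence, min_support=5, min_confidence=0.5)
print(kept)
```

Here the rule (2, 3) has the highest confidence but is dropped because only 3 customers support it.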

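Note that the keys of confidence differ from one item to the next, so the (rule, confidence) pairs are sorted by position with itemgetter(1). This is unlike the more common case, where every record shares the same keys and itemgetter is given a key name, e.g.: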

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]

rows_by_fname = sorted(rows, key=itemgetter('fname'))
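The two uses of itemgetter can be put side by side. The values in `toy_confidence` and `toy_rows` below are invented for the comparison.

```python
from operator import itemgetter

# Each item of a dict is a (key, value) tuple, so itemgetter(1) picks the value.
toy_confidence = {(0, 1): 0.5, (2, 3): 0.9, (1, 4): 0.7}
by_value = sorted(toy_confidence.items(), key=itemgetter(1), reverse=True)

# Each record is a dict with the same keys, so itemgetter takes a key name.
toy_rows = [
    {'fname': 'John', 'uid': 1001},
    {'fname': 'Brian', 'uid': 1003},
]
by_fname = sorted(toy_rows, key=itemgetter('fname'))

print(by_value)
print(by_fname)
```

In both cases itemgetter builds a function that performs `obj[index_or_key]`. Only the kind of object being indexed differs.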

Reference link: https://www.cnblogs.com/baxianhua/p/8182627.html

Once customer behavior has been predicted, product placement can cater to it to some extent, and sales can be expected to rise somewhat.

Reference book: Robert Layton's book on data mining and analysis with Python

posted @ 2020-02-03 15:26 为红颜
