Mining Duke Huan of Qi's Circle of Friends with Apriori

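Too many people are connected with Duke Huan of Qi (齐桓公) to study his circle of friends all at once, so the first step in refining it is to find the people most strongly associated with him.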

1. Extract the group of personal names from every sentence of the Zuo Zhuan (左传) that contains personal names

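import pandas as pd

# A multi-character separator is parsed as a regex, which needs the python engine
data_qihuangong = pd.read_csv(r'E:\fsgs\relationship\qihuangong.txt', sep='    ', engine='python')

def eachnrs(group):
    # Return the personal names of one sentence as a list
    return group['word'].tolist()

# Sentences that contain more than one personal name
data_nr_2 = data_qihuangong[data_qihuangong['nr'] > 1].reset_index()
# sep is assumed to be the token-level table (columns sentence, word, biaozhu) loaded earlier
sep_nr_2 = sep[sep.sentence.isin(data_nr_2['id'])]
# Keep only the tokens tagged as personal names ('nr'); .loc replaces the removed .ix
sep_nr_2 = sep_nr_2.loc[sep_nr_2.biaozhu == 'nr', :]
nrs_each = sep_nr_2.groupby('sentence').apply(eachnrs)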

The result, nrs_each, holds one list of personal names per sentence and looks like the figure below:
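To make the shape of that result concrete, here is a small sketch in plain Python (no pandas) on made-up rows. The tuples are hypothetical stand-ins for rows of sep with the columns sentence, word and biaozhu, and names_per_sentence is an illustrative helper, not part of the original code.

```python
from collections import defaultdict

# Hypothetical token rows: (sentence id, word, tag); 'nr' marks a personal name.
rows = [
    (0, 'A', 'nr'), (0, 'met', 'v'), (0, 'B', 'nr'),
    (1, 'C', 'nr'), (1, 'left', 'v'),
    (2, 'A', 'nr'), (2, 'and', 'c'), (2, 'C', 'nr'),
]

def names_per_sentence(rows):
    """Group the personal names by sentence id, keeping their order."""
    groups = defaultdict(list)
    for sentence, word, tag in rows:
        if tag == 'nr':
            groups[sentence].append(word)
    return dict(groups)

groups = names_per_sentence(rows)
# Keep only sentences that mention more than one person, which is what
# the nr > 1 filter in the original code appears to do.
multi = {s: names for s, names in groups.items() if len(names) > 1}
print(multi)
```

Each value is the name group of one sentence; these groups play the role of transactions in the Apriori step.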

2. Write the Apriori function

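import itertools
import numpy as np

def aproil(L, nrs_nr, k, diedai):
    # L: candidates (single names or lists of names)
    # nrs_nr: name lists of the sentences to mine
    # k: support threshold; a candidate is kept when its count is greater than k
    # diedai: iteration number, equal to the size of the candidates in L
    diedai = diedai + 1
    new_L = []
    # ---- new_L collects the candidates whose support exceeds k
    for i in L:
        count = 0
        for j in nrs_nr:
            # True when every name in i occurs in sentence j
            if np.isin(i, j).all():
                count = count + 1
        if count > k:
            new_L.append(i)
    if len(new_L) > 1:
        # ---- flatten the survivors and build the next, larger candidates
        L = []
        for i in new_L:
            if isinstance(i, list):
                for j in i:
                    L.append(j)
            else:
                L.append(i)
        new_L = []
        L = set(L)
        for i in itertools.combinations(L, diedai):
            new_L.append(list(i))
        return aproil(new_L, nrs_nr, k, diedai)
    elif len(new_L) == 1:
        # ---- exactly one candidate survived
        return new_L[0]
    else:
        # ---- no candidate survived: return the names that made up this round's candidates
        new_L = []
        for i in L:
            if isinstance(i, list):
                for j in i:
                    new_L.append(j)
            else:
                new_L.append(i)
        new_L = set(new_L)
        return new_L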
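For comparison, here is a minimal sketch of textbook level-wise Apriori on made-up transactions. It is not the aproil function above: frequent_itemsets and min_count are hypothetical names, itemsets are stored as frozensets, and an itemset counts as frequent when it appears in at least min_count transactions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} for every itemset found in at least
    min_count transactions, growing candidates one item at a time."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Next candidates are one item larger; keep a candidate only if
        # every subset of the current size was itself frequent.
        size += 1
        survivors = sorted(set().union(*level))
        candidates = [
            frozenset(combo)
            for combo in combinations(survivors, size)
            if all(frozenset(sub) in level for sub in combinations(combo, size - 1))
        ]
    return frequent

# Made-up transactions, not taken from the Zuo Zhuan data.
demo = [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'D']]
result = frequent_itemsets(demo, min_count=2)
```

The subset check is the pruning step that gives Apriori its name: a larger set cannot be frequent if one of its subsets is not. The aproil function instead rebuilds all combinations of the surviving names and filters them by counting.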

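3. Separate the sentences that mention Duke Huan of Qi from those that do not, then call the Apriori function above to mine associations, with the support threshold set to 1 (a candidate must occur in more than one sentence)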

def pengyou(nr, nrs, k):
    # nr: target name; nrs: array of per-sentence name lists; k: support threshold
    print('ok', nr)
    index = []
    other = []
    for i in range(len(nrs)):
        if nr in nrs[i]:
            index.append(i)
        else:
            other.append(i)
    # print(index)
    nrs_nr = nrs[index]
    nrs_other = nrs[other]
    # ----- nrs_nr now holds the sentences that mention nr
    # ----- initialise L with every name that occurs in those sentences
    L = (set(i) for i in nrs_nr)
    L = set.union(*L)
    diedai = 1
    new_L = aproil(L, nrs_nr, k, diedai)
    # print('new_L=', new_L)
    return new_L

pengyou_qihuan = pengyou('齐桓公', nrs_each.values, 1)

'''

pengyou_qihuan
Out[17]: {'僖', '管敬仲', '莊', '蔡姬', '齐桓公'}

'''
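One way to sanity-check an output like this is to count, among the sentences that mention the target, how many contain each other name. The sketch below does that on made-up name lists; cooccurrence_counts is a hypothetical helper, not part of the code above. With k = 1 and the count > k test in aproil, a single name survives the first round only if it appears in at least two such sentences.

```python
from collections import Counter

def cooccurrence_counts(target, name_lists):
    """Count, over the sentences containing target, how many sentences
    mention each other name (each name counted once per sentence)."""
    counts = Counter()
    for names in name_lists:
        if target in names:
            counts.update(set(names) - {target})
    return counts

# Made-up name lists; 'X' stands in for the target person.
demo = [['X', 'A', 'B'], ['X', 'A'], ['A', 'C'], ['X', 'C']]
counts = cooccurrence_counts('X', demo)

k = 1  # same meaning as in aproil: keep names whose count is greater than k
strong = sorted(name for name, n in counts.items() if n > k)
print(counts, strong)
```

Names with low counts are the ones a support threshold filters out, so raising k shrinks the resulting circle.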

4. Interpreting the results

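'僖': Duke Xi of Qi (齐僖公) was the father of Duke Huan of Qi.

'管敬仲': Guan Zhong (管仲) was Duke Huan of Qi's chief minister.

'莊': Duke Zhuang of Qi (齐庄公) was the grandfather of Duke Huan of Qi.

'蔡姬': Cai Ji (蔡姬) was a favored concubine of Duke Huan of Qi.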

posted @ 2017-12-01 14:43  草莓干123456