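Mining the Social Circle of Duke Huan of Qi (齐桓公) with Apriori
Duke Huan of Qi is linked to too many people in the text to examine them all, so to refine his social circle we first need to find the people most strongly associated with him.
1. Extract the group of personal names from every sentence of the Zuozhuan (左传) in which names appear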
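import pandas as pd

# `sep` is assumed to be the word-segmentation DataFrame built in an earlier step
# (not shown here); the code below uses its columns sentence, word and biaozhu (tag).
data_qihuangong = pd.read_csv(r'E:\fsgs\relationship\qihuangong.txt', sep=' ')

def eachnrs(group):
    # Return the names found in one sentence as a plain list
    return group['word'].tolist()

# Keep sentences whose nr value (presumably the number of names) is greater than 1
data_nr_2 = data_qihuangong[data_qihuangong['nr'] > 1].reset_index()
sep_nr_2 = sep[sep.sentence.isin(data_nr_2['id'])]
# .ix has been removed from pandas; .loc does the same label-based selection
sep_nr_2 = sep_nr_2.loc[sep_nr_2.biaozhu == 'nr', :]
nrs_each = sep_nr_2.groupby('sentence').apply(eachnrs)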
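The result maps each sentence ID to the list of personal names in that sentence, as shown in the figure below: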
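As a rough illustration of the shape this step produces, here is a minimal sketch in plain Python rather than pandas. The rows in `toy_tokens` and the helper `names_per_sentence` are invented for this example and are not part of the original code; only the tag 'nr' (personal name) and the "more than one name per sentence" filter come from the step above.

```python
from collections import defaultdict

# Invented (sentence id, word, tag) rows standing in for the segmented corpus.
toy_tokens = [
    (1, '齐桓公', 'nr'), (1, '伐', 'v'), (1, '管敬仲', 'nr'),
    (2, '齐桓公', 'nr'), (2, '蔡姬', 'nr'),
    (3, '管敬仲', 'nr'), (3, '卒', 'v'),
]

def names_per_sentence(tokens, min_names=2):
    """Group words tagged 'nr' by sentence; keep sentences with enough names."""
    groups = defaultdict(list)
    for sentence_id, word, tag in tokens:
        if tag == 'nr':
            groups[sentence_id].append(word)
    # Drop sentences that mention fewer than `min_names` people.
    return {sid: names for sid, names in groups.items() if len(names) >= min_names}

toy_nrs_each = names_per_sentence(toy_tokens)
print(toy_nrs_each)
```

Sentence 3 is dropped because it mentions only one person, which mirrors the `nr > 1` filter.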
2. Write the Apriori function
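import itertools
import numpy as np

def aproil(L, nrs_nr, k, diedai):
    # diedai is the iteration counter; it also sets the size of the next combinations
    diedai = diedai + 1
    new_L = []
    # new_L holds the candidates that occur in more than k sentences
    for i in L:
        count = 0
        for j in nrs_nr:
            # np.isin replaces np.in1d, which newer NumPy releases deprecate
            if np.isin(i, j).all():
                count = count + 1
        if count > k:
            new_L.append(i)
    if len(new_L) > 1:
        # Build new candidate combinations from the surviving names
        L = []
        for i in new_L:
            if isinstance(i, list):
                for j in i:
                    L.append(j)
            else:
                L.append(i)
        new_L = []
        L = set(L)
        for i in itertools.combinations(L, diedai):
            new_L.append(list(i))
        return aproil(new_L, nrs_nr, k, diedai)
    elif len(new_L) == 1:
        return new_L[0]
    else:
        # No candidate passed: flatten the current candidates into a set of names
        new_L = []
        for i in L:
            if isinstance(i, list):
                for j in i:
                    new_L.append(j)
            else:
                new_L.append(i)
        new_L = set(new_L)
        return new_L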
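The two ideas the function relies on, counting support and growing candidates level by level, can be sketched separately. The version below is a simplified, non-recursive sketch and not the author's implementation: the names `support_count`, `frequent_itemsets` and `toy_transactions` are hypothetical, the data is invented, and it keeps an itemset when its count is at least `min_count` (whereas `aproil` tests `count > k`). Like `aproil`, it rebuilds candidates from all surviving items instead of applying the full Apriori subset-pruning rule.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t))

def frequent_itemsets(transactions, min_count):
    """Level-wise search: {size: [frozenset, ...]} with count >= min_count."""
    transactions = [set(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    size = 1
    candidates = [frozenset([item]) for item in items]
    while candidates:
        frequent = [c for c in candidates
                    if support_count(c, transactions) >= min_count]
        if not frequent:
            break
        result[size] = frequent
        size += 1
        # Rebuild candidates from the items that survived this level.
        survivors = sorted(set().union(*frequent))
        candidates = [frozenset(c) for c in combinations(survivors, size)]
    return result

# Invented transactions, for illustration only.
toy_transactions = [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'D']]
toy_result = frequent_itemsets(toy_transactions, min_count=2)
print(toy_result)
```

Using `frozenset` makes each itemset hashable and order-independent, which avoids the list-versus-scalar checks needed when candidates are stored as plain lists.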
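3. Separate the sentences that contain 齐桓公 from those that do not, then call the Apriori function above to mine associations, with the support threshold k set to 1 (because the function tests count > k, a combination must appear in at least two sentences)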
def pengyou(nr, nrs, k):
    print('ok', nr)
    index = []
    other = []
    # Split sentence positions by whether the sentence mentions nr
    for i in range(len(nrs)):
        if nr in nrs[i]:
            index.append(i)
        else:
            other.append(i)
    #print(index)
    nrs_nr = nrs[index]        # sentences that mention nr
    nrs_other = nrs[other]     # sentences that do not (not used further here)
    # Initialise L with every name that co-occurs with nr
    L = (set(i) for i in nrs_nr)
    L = set.union(*L)
    diedai = 1
    new_L = aproil(L, nrs_nr, k, diedai)
    #print('new_L=',new_L)
    return new_L
pengyou_qihuan = pengyou('齐桓公',nrs_each.values,1)
'''
pengyou_qihuan
Out[17]: {'僖', '管敬仲', '莊', '蔡姬', '齐桓公'}
'''
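One quick way to sanity-check a result like this is to count, for each name, how many sentences it shares with the target. The sketch below uses plain pairwise co-occurrence counting, a simpler technique than Apriori; the function `cooccurrence_counts` and the sentences in `toy_sentences` are invented for illustration and do not come from the corpus.

```python
from collections import Counter

def cooccurrence_counts(target, sentences):
    """Count how often each other name shares a sentence with `target`."""
    counts = Counter()
    for names in sentences:
        if target in names:
            # Use a set so a name repeated within one sentence counts once.
            counts.update(set(names) - {target})
    return counts

# Invented name lists, one per sentence.
toy_sentences = [
    ['齐桓公', '管敬仲'],
    ['齐桓公', '管敬仲', '蔡姬'],
    ['管敬仲', '僖'],
    ['齐桓公', '蔡姬'],
]
toy_counts = cooccurrence_counts('齐桓公', toy_sentences)
print(toy_counts.most_common())
```

Names that reach the support threshold in the Apriori run should also show high counts here; a name with a count of 1 would not survive a `count > 1` test.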
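4. Interpreting the results
'僖': Duke Xi of Qi (齐僖公) was Duke Huan's father.
'管敬仲': Guan Zhong (管仲) was Duke Huan's chief minister.
'莊': Duke Zhuang of Qi (齐庄公) was Duke Huan's grandfather.
'蔡姬': Cai Ji (蔡姬) was Duke Huan's favored concubine.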