Building a keyword co-occurrence matrix in Python
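This post builds a keyword co-occurrence matrix in Python: the keywords that appear together in the same record (first figure below)
are converted into the co-occurrence matrix shown in the second figure below.
The code is as follows: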
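import pandas as pd
import numpy as np

# Read the spreadsheet; with header=None the columns are labelled 0, 1, 2, ...
data = pd.read_excel(r'E:\Python\data.xlsx', header=None)

# Column 2 holds the keywords of each record, separated by '/'
keyword_sets = (set(i.split('/')) for i in data.iloc[:, 2])
# All distinct keywords, sorted so the row/column order is stable between runs
keywords = sorted(set.union(*keyword_sets))

# Square matrix of zeros with the keywords as both row and column labels
togo = pd.DataFrame(np.zeros([len(keywords), len(keywords)]), columns=keywords, index=keywords)

# For each record, add 1 to every cell whose row and column keywords both appear in it
for i in data.iloc[:, 2]:
    line = i.split('/')
    togo.loc[line, line] = togo.loc[line, line] + 1

# Set the diagonal to 0: a keyword paired with itself is not a co-occurrence
for i in range(len(togo)):
    togo.iloc[i, i] = 0

togo.to_csv(r'E:\Python\togo.csv')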
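The resulting table looks like the one in the figure above. It is too large to show in full, so the figure below gives only a rough idea of what the co-occurrence matrix contains.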
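Because the full table is too large to display, the same idea can be checked on a toy input. The following is only a minimal sketch: the three rows in `rows` are made-up data standing in for column 2 of data.xlsx, and the name `cooc` is hypothetical. It also swaps in a different counting technique, `itertools.combinations`, which counts each unordered keyword pair once per record and so never touches the diagonal.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data standing in for column 2 of data.xlsx
rows = ["A/B/C", "A/B", "B/C/D"]

# Split each record into a set so a keyword repeated within a record counts once
keyword_sets = [set(r.split("/")) for r in rows]
keywords = sorted(set.union(*keyword_sets))

# Integer matrix of zeros labelled by keyword on both axes
cooc = pd.DataFrame(0, index=keywords, columns=keywords)

# Count each unordered pair once per record, then mirror it to keep the matrix symmetric
for kws in keyword_sets:
    for a, b in combinations(sorted(kws), 2):
        cooc.loc[a, b] += 1
        cooc.loc[b, a] += 1

print(cooc)
```

In this toy input A and B appear together in two records, so `cooc.loc["A", "B"]` is 2, while A and D never share a record, so `cooc.loc["A", "D"]` stays 0. The diagonal remains 0 without a separate reset step.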