Extracting keywords from a file with the third-party jieba library

The text of 斗破苍穹 (Doupo Cangqiong) has already been scraped and is stored as a TXT file.

Code

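import jieba.analyse

path = 'C:/Users/Administrator/Desktop/bishe/doupo.text'
stop_words_path = 'C:/Users/Administrator/Desktop/bishe/aa.txt'

# Read the whole novel; the with-block closes the file even if reading fails.
with open(path, 'r') as fp:
    content = fp.read()

# Words listed in the stop-word file are ignored during extraction.
jieba.analyse.set_stop_words(stop_words_path)
# Take the 15 highest-weighted keywords together with their weights.
tags = jieba.analyse.extract_tags(content, topK=15, withWeight=True)
for word, weight in tags:
    # Scale the weight by 1000 and truncate to an integer for readability.
    print(word + '\t' + str(int(weight * 1000)))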
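
By default, jieba.analyse.extract_tags ranks words by TF-IDF. The sketch below is a simplified, standard-library illustration of that kind of ranking, not jieba's actual implementation. The function name tfidf_top_k, the pre-segmented tokens, and the IDF values are all made up for illustration; jieba itself segments the text and ships its own IDF table.

```python
from collections import Counter


def tfidf_top_k(tokens, idf, stop_words, top_k=3):
    """Rank tokens by term frequency times inverse document frequency."""
    # Drop stop words and single-character tokens before counting.
    kept = [t for t in tokens if len(t) >= 2 and t not in stop_words]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return []
    # Assumption: words missing from the IDF table get the table's median value.
    default_idf = sorted(idf.values())[len(idf) // 2] if idf else 1.0
    weights = {w: c / total * idf.get(w, default_idf) for w, c in counts.items()}
    # Highest weight first, truncated to the requested number of keywords.
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical, already-segmented tokens and made-up IDF values.
tokens = ['萧炎', '的', '斗气', '萧炎', '修炼', '的', '斗气', '萧炎', '修炼', '萧炎']
idf = {'萧炎': 8.0, '斗气': 6.0, '修炼': 4.0}
stop_words = {'的'}

for word, weight in tfidf_top_k(tokens, idf, stop_words):
    # Same output format as the script above: word, tab, weight scaled by 1000.
    print(word + '\t' + str(int(weight * 1000)))
```

A word scores highly when it is frequent in this text but has a high IDF, meaning it is rare in general text. This is why a stop-word file matters: very common function words would otherwise compete with the names and terms that characterise the novel.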

Result

 

posted @ 2018-05-02 14:33  蓝勃斐重新开始