Python jieba word segmentation (2): keyword extraction

 

The text used for keyword extraction is the first ten chapters of the novel 完美世界 (Perfect World).

I merged the first ten chapters into a single file beforehand.
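The merge step is not shown in this post. A minimal sketch of one way to do it is below, assuming the chapters sit in one directory as zero-padded files such as chapter_01.txt through chapter_10.txt. The function name `merge_chapters` and the `chapter_*.txt` pattern are hypothetical; only the merged file name he.txt comes from the script in this post.

```python
import glob
import os


def merge_chapters(chapter_dir, merged_name="he.txt", pattern="chapter_*.txt"):
    """Concatenate the chapter files (sorted by name) into one file.

    The file naming scheme is an assumption for illustration only.
    Returns the path of the merged file.
    """
    merged_path = os.path.join(chapter_dir, merged_name)
    # Zero-padded names (chapter_01, chapter_02, ...) keep the sort in reading order.
    chapter_paths = sorted(glob.glob(os.path.join(chapter_dir, pattern)))
    with open(merged_path, "wb") as merged:
        for path in chapter_paths:
            # Copy raw bytes so the original text encoding is left untouched.
            with open(path, "rb") as chapter:
                merged.write(chapter.read())
            merged.write(b"\n")  # keep chapters separated by a newline
    return merged_path
```

Copying bytes instead of decoded text means the sketch does not need to know whether the chapters are stored as UTF-8 or GBK.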

Then I call the keyword extraction function directly:

import sys
sys.path.append('../')  # only needed if jieba lives in the parent directory instead of being installed

import jieba
import jieba.analyse  # keyword extraction module

data_path = "C:\\Users\\wangyuguang\\Desktop\\work_data\\profect_world\\"
topK = 10           # number of keywords to return
withWeight = False  # set to True to also get each keyword's weight

# he.txt is the file holding the merged first ten chapters.
# Read it as bytes; jieba decodes bytes itself (UTF-8 first, then GBK).
with open(data_path + "he.txt", 'rb') as f:
    content = f.read()
# print(content)

# Call the keyword extraction function directly
tags = jieba.analyse.extract_tags(content, topK=topK, withWeight=withWeight)

if withWeight:
    # With weights, each item is a (keyword, weight) pair
    for tag in tags:
        print("tag: %s\t\t weight: %f" % (tag[0], tag[1]))
else:
    print(",".join(tags))

Keyword results:

Building prefix dict from the default dictionary ...
Loading model from cache c:\users\wangyuguang\appdata\local\temp\jieba.cache
Loading model cost 0.386 seconds.
Prefix dict has been built succesfully.
小不点,孩子,族长,石云峰,石村,凶禽,青鳞鹰,凶兽,一群,石昊
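`jieba.analyse.extract_tags` ranks words by TF-IDF, which is why character and place names dominate the list above. The sketch below is not jieba's implementation, only a small illustration of the scoring idea: a word's share of the text multiplied by an inverse document frequency. The function name `tfidf_top_k`, the token list, and the IDF values are all made up for this example.

```python
from collections import Counter


def tfidf_top_k(words, idf, top_k=10, default_idf=1.0):
    """Return the top_k (word, score) pairs, where score = TF * IDF.

    words: an already segmented list of tokens.
    idf: a dict mapping word -> IDF value (toy values here, not jieba's table).
    default_idf: fallback for words missing from the table (an assumption).
    """
    counts = Counter(words)
    total = sum(counts.values())
    scores = {
        word: (count / total) * idf.get(word, default_idf)
        for word, count in counts.items()
    }
    # Highest score first; ties are broken by the word itself for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Toy input: both the tokens and the IDF values are invented for illustration.
words = ["石村", "的", "孩子", "的", "族长", "石村", "的"]
idf = {"石村": 8.0, "孩子": 5.0, "族长": 7.0, "的": 0.1}
for word, score in tfidf_top_k(words, idf, top_k=3):
    print("%s\t%f" % (word, score))
```

In this toy input "的" is the most frequent token, yet its low IDF pushes it out of the top three. That is the effect that lets rare, text-specific words outrank common ones.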
posted on 2016-07-18 20:37  细雨微光