Chinese Word Frequency Statistics and Word Cloud Generation

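1. Mr. Zeng, Technical Director for South China at Chinasoft International (中软国际), will teach two more classes. What would you like him to cover? (Think it over carefully before answering.)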

I would like Mr. Zeng to talk about the main application areas of Python and the career paths it opens up.

2. Chinese word segmentation

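Download a long Chinese novel and convert it to UTF-8 encoding.
Use the jieba library to count Chinese word frequencies and print the TOP 20 words with their counts.
**Exclude meaningless words and merge variants of the same word.
**Use the wordcloud library to draw a word cloud.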
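The post does not show how the novel was converted to UTF-8. A minimal sketch, assuming the downloaded file is GBK-encoded (common for Chinese .txt novels); `convert_to_utf8` and its file names are hypothetical. It decodes with GB18030, a superset of GBK and GB2312:

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="gb18030"):
    """Re-encode a text file as UTF-8 (hypothetical helper).

    src_encoding is an assumption: GB18030 is a superset of GBK/GB2312.
    Decoding is strict, so a wrong guess raises UnicodeDecodeError
    instead of silently producing garbled text.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return len(text)  # number of characters written
```

For example, `convert_to_utf8("novel_gbk.txt", "尸语者2.txt")` would produce the UTF-8 file the counting code below reads. If it raises `UnicodeDecodeError`, the download is probably in another encoding (for example UTF-16) and `src_encoding` has to change.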

import jieba

# Read the novel, already saved as UTF-8 (opening it with 'w' would erase it)
book = "尸语者2.txt"
with open(book, "r", encoding="utf-8") as f:
    txt = f.read()

# Words to exclude from the ranking
ex = {'法医', '师父', '尸体', '尸语者'}

words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens (particles, punctuation)
        continue
    counts[word] = counts.get(word, 0) + 1

for word in ex:
    counts.pop(word, None)  # no KeyError if an excluded word never appears

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:20]:  # TOP 20 as required; slicing is safe with fewer words
    print("{:<10}{:>5}".format(word, count))
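The listing above stops at the frequency ranking; the two starred tasks (merging variants of the same word and drawing a word cloud) are not shown. A minimal sketch of both, assuming: the alias mapping is hypothetical and has to be filled with variants that actually occur in the novel; the wordcloud package is installed; and `font_path` points to a font that contains Chinese glyphs, since the default font cannot render them. `count_words` and `draw_word_cloud` are hypothetical helper names:

```python
from collections import Counter


def count_words(words, stopwords=(), aliases=None, min_len=2):
    """Count tokens, merging aliases into one canonical form and dropping stopwords.

    words:     iterable of tokens, e.g. the output of jieba.lcut
    stopwords: tokens to exclude (checked after merging)
    aliases:   hypothetical mapping {variant: canonical form}
    min_len:   shorter tokens are skipped (the post skips single characters)
    """
    aliases = aliases or {}
    stop = set(stopwords)
    counts = Counter()
    for w in words:
        w = aliases.get(w, w)  # merge a variant into its canonical form
        if len(w) < min_len or w in stop:
            continue
        counts[w] += 1
    return counts


def draw_word_cloud(freqs, font_path, out_path="wordcloud.png"):
    """Render a word cloud image from a {word: count} mapping.

    Requires the wordcloud package; font_path must be a font with CJK glyphs.
    """
    from wordcloud import WordCloud  # lazy import: count_words works without it

    wc = WordCloud(font_path=font_path, width=800, height=600,
                   background_color="white")
    wc.generate_from_frequencies(freqs)
    wc.to_file(out_path)
    return out_path
```

With the variables from the listing above, usage might look like `counts = count_words(jieba.lcut(txt), stopwords=ex, aliases={"师傅": "师父"})`, then `counts.most_common(20)` for the TOP 20 and `draw_word_cloud(dict(counts.most_common(200)), font_path="simhei.ttf")` for the image (the font file name is a placeholder). Merging happens before the stopword check, so a variant of an excluded word is excluded too.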

posted @ 2017-09-25 11:24 21黄玺恒