Chinese Word Frequency Statistics

The main code is below.
Sorting out the high-frequency words:
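# -*- coding: utf-8 -*-
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import jieba

# Read the whole novel (hlm.txt is expected to be UTF-8 encoded)
with open('hlm.txt', encoding='UTF-8') as f:
    article = f.read()

# Punctuation tokens to discard
dele = {'。','!','?','】','“','”','(',')',' ','》','《',','}

# Register a character name so jieba keeps it as a single token
jieba.add_word('贾宝玉')

# Segment the text into a list of words
words = list(jieba.cut(article))

# Count every distinct word longer than one character
articleDict = {}
articleSet = set(words) - dele
for w in articleSet:
    if len(w) > 1:
        articleDict[w] = words.count(w)

# Sort (word, count) pairs by count, highest first
articlelist = sorted(articleDict.items(), key=lambda x: x[1], reverse=True)

# Join the segmented words with spaces so WordCloud can split them
cut_text = " ".join(words)
# print(cut_text)

# The default WordCloud font has no Chinese glyphs, so font_path must point
# to a CJK font on your system (the file name here is an assumption)
mywc = WordCloud(font_path='simhei.ttf').generate(cut_text)
plt.imshow(mywc)
plt.axis("off")
plt.show()

# Optional: print the top 20 words and export the full list to CSV
# for i in range(20):
#     print(articlelist[i])
# import pandas as pd
# pd.DataFrame(data=articlelist).to_csv('test.csv', encoding='UTF-8')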
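In the script above, `words.count(w)` rescans the entire token list once for every distinct word. The counting step therefore grows roughly with (distinct words × total tokens) and can be very slow on a whole novel. Below is a minimal sketch of a single-pass alternative using `collections.Counter`. It keeps the same two filters (the `dele` set and the `len(w) > 1` check) and checks its result against the original loop. The helper names `count_words` and `count_words_slow` and the sample tokens are made up for illustration; in the real script the input would be `list(jieba.cut(article))`.

```python
from collections import Counter

# Same punctuation set as in the script above
dele = {'。','!','?','】','“','”','(',')',' ','》','《',','}

def count_words(words, skip=dele):
    """Count tokens in one pass, dropping punctuation and 1-character tokens."""
    counts = Counter(words)
    return {w: c for w, c in counts.items() if w not in skip and len(w) > 1}

def count_words_slow(words, skip=dele):
    """The original approach: one full scan of the list per distinct word."""
    result = {}
    for w in set(words) - skip:
        if len(w) > 1:
            result[w] = words.count(w)
    return result

# Hypothetical pre-segmented tokens standing in for list(jieba.cut(article))
sample = ['贾宝玉', '笑', '道', ',', '林黛玉', '。', '贾宝玉', '说', '林黛玉', '贾宝玉']
fast = count_words(sample)
assert fast == count_words_slow(sample)
print(sorted(fast.items(), key=lambda x: x[1], reverse=True))
# [('贾宝玉', 3), ('林黛玉', 2)]
```

`Counter` reads the token list once. The dictionary comprehension then visits each distinct word only once, so the total cost stays roughly proportional to the length of the text.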

[Figure: high-frequency words in Dream of the Red Chamber (红楼梦)]
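The commented-out pandas line in the script writes the ranked list with `encoding='UTF-8'`. Excel on Windows often misreads such a file unless it starts with a UTF-8 byte-order mark, which the `'utf-8-sig'` codec adds; pandas `to_csv` accepts the same encoding value. The sketch below does the export with the standard `csv` module instead of pandas, under that assumption. `save_ranking`, the sample list and the temporary path are hypothetical.

```python
import csv
import os
import tempfile

def save_ranking(ranked, path):
    """Write (word, count) pairs to a CSV file with a header row.

    'utf-8-sig' prepends a byte-order mark so that Excel can detect UTF-8
    and display the Chinese words correctly.
    """
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['word', 'count'])
        writer.writerows(ranked)

# Hypothetical ranked list; in the script this would be articlelist
ranked = [('宝玉', 5), ('黛玉', 4)]
path = os.path.join(tempfile.mkdtemp(), 'test.csv')
save_ranking(ranked, path)

# Read the file back to check the round trip ('utf-8-sig' strips the BOM)
with open(path, encoding='utf-8-sig', newline='') as f:
    rows = list(csv.reader(f))
print(rows)  # [['word', 'count'], ['宝玉', '5'], ['黛玉', '4']]
```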

[Figure: the full text of the novel Dream of the Red Chamber]

[Figure: the high-frequency words, successfully sorted]
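`sorted(..., key=lambda x: x[1], reverse=True)` orders only by count, so words with equal counts keep whatever order the dictionary had. `articleDict` is filled by iterating over a set, and Python randomizes string hashes between processes, so that tie order can change from run to run. The sketch below assumes you want a reproducible ranking and breaks ties by the word itself: it sorts by negative count, then by word. The helper name `rank_words` and the sample counts are made up for illustration.

```python
def rank_words(counts, top_n=20):
    """Return the top_n (word, count) pairs, highest count first.

    Ties are broken by the word's code-point order, so the result is the
    same on every run.
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Hypothetical counts; in the script this would be articleDict
counts = {'黛玉': 5, '宝玉': 5, '凤姐': 7, '袭人': 2}
for rank, (word, n) in enumerate(rank_words(counts, top_n=3), start=1):
    print(rank, word, n)
```

Slicing with `ranked[:top_n]` also avoids an `IndexError`. The original loop, which indexes `articlelist[i]` inside `range(20)`, would raise one if fewer than 20 words survived the filters.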

[Figure: the generated word cloud]

 

posted @ 2019-03-18 16:18  xbk6