draak

导航

jieba库使用以及好玩的词云

jieba库、词云(wordcloud)的安装

打开window的CMD(菜单键+R+Enter)

一般情况下:输入pip install jiaba(回车),等它下好,建议在网络稳定的时候操作

不行就试试这个:pip install -i https://pypi.tuna.tsinghua.edu.cn/simple jiaba

词云安装也是如此:pip install -i https://pypi.tuna.tsinghua.edu.cn/simple wordcloud

显示Successfully installed....就安装成功了(如下图👇:)

jieba库的使用

用jieba库分析文章、小说、报告等,到词频统计,并对词频进行排序

代码👇

(仅限用中文):

 1 # -*- coding: utf-8 -*-
 2 """
 3 Created on Wed Apr 22 15:40:16 2020
 4 
 5 @author: ASUS
 6 """
 7 #jiaba词频统计
 8 import jieba
 9 txt = open(r'C:\Users\ASUS\Desktop\创意策划书.txt', "r", encoding='gbk').read()#读取文件
10 words  = jieba.lcut(txt)#lcut()函数返回一个列表类型的分词结果
11 counts = {}
12 for word in words:
13     if len(word) == 1:#忽略标点符号和其它长度为1的词
14         continue
15     else:
16         counts[word] = counts.get(word,0) + 1
17 items = list(counts.items())#字典转列表
18 items.sort(key=lambda x:x[1], reverse=True) #按词频降序排列
19 n=eval(input("词的个数:"))#循环n次
20 for i in range(n):
21     word, count = items[i]
22     print ("{0:<10}{1:>5}".format(word, count))
jieba分词

(用于英文需要做些许调整):

def getText():
    txt=open('hamlet.txt')#文件的存储位置
    txt = txt.lower()#将字母全部转化为小写
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘{|}~':
        txt = txt.replace(ch, " ")   #将文本中特殊字符替换为空格
    return txt

 

好玩的词云

做一个词云图

 1 import jieba
 2 import wordcloud
 3 import matplotlib.pyplot as plt
 4 f = open(r"C:\Users\ASUS\Desktop\创意策划书.txt", "r", encoding="gbk")#有些电脑适用encoding="utf-8",我电脑只能用encoding="gbk",我也不知道为啥
 5 t = f.read()
 6 f.close()
 7 ls = jieba.lcut(t)
 8  
 9 txt = " ".join(ls)
10 w = wordcloud.WordCloud( \
11     width = 4800, height = 2700,\
12     background_color = "black",
13     font_path = "msyh.ttc"    #msyh.ttc可以修改字体,在网上下载好自己喜欢的字体替换上去
14     )
15 myword=w.generate(txt)
16 plt.imshow(myword)
17 plt.axis("off")
18 plt.show()
19 w.to_file("词频.png")#生成图片
词云图

 

 

统计的内容可以忽略,代码可以认真看看

posted on 2020-04-22 20:15  c-pig  阅读(452)  评论(0编辑  收藏  举报