词云的绘制

wordcloud库中的函数WordCloud类主要参数：

参数	类型	说明
font_path	string	使用字体的路径，Windows系统下可以在C:\Windows\Fonts中查找本机已安装的字体。
width,height	int (default=400)，int (default=200)	词云的宽高。
prefer_horizontal : float (default=0.90)	float (default=0.90)	水平拟合单词占比。与垂直拟合单词占比默认为0.1
mask	nd-array or None (default=None)	词云的形状；我们可以找一张素材图，用PIL读取并转换成numpy数组然后传入这个参数。白色区域会被直接忽略。
contour_width	float (default=0)	词云的轮廓
contour_color	color value (default="black")	轮廓的颜色
scale	float (default=1)	对词云按比例缩放。如果参数过小会显得很模糊。大的词云会更精细。
min_font_size	int (default=4)	显示的字体最小值
max_font_size	int or None (default=None)	显示字体的最大值
font_step	int (default=1)	字体步长，或者说间距；若>1加快运算但拟合效果下降。
max_words	number (default=200)	词云显示的单词数最大值
stopwords	set of strings or None	将被屏蔽的单词。如果为None，将使用内置STOPWORDS列表。如果使用generate_from_frequencies则忽略。
background_color	color value (default=”black”)	背景颜色
mode	string (default="RGB")	当mode为“RGBA”且background_color为None时，将生成透明背景。
relative_scaling	float (default=auto)	词频对字体大小的权重。当relative_scaling=0时，只考虑单词顺序。如果relative_scaling=1，那么频率加倍的单词大小也会加倍。如果你想考虑单词频率，而不仅仅是它们的排名，relative_scaling在0.5左右通常看起来不错。如果'auto'，它将被设置为0.5，除非repeat为true，在这种情况下，它将被设置为0。
color_func	callable, default=None	为每个单词返回一个PIL颜色数据
regexp	string or None (optional)	使用正则表达式分割文本
collocations	bool, default=True	是否包含两个词的搭配
colormap	string or matplotlib colormap, default="viridis"	给单词随机分配颜色
normalize_plurals	bool, default=True	是否删除单词末尾的s并添加到不带s的词里去，用于统计英文文本
repeat	bool, default=False	是否重复单词，直到达到max_words或min_font_size。
min_word_length	int, default=0	一个单词必须包含的最小字母数。

下面介绍该类的常用方法：

方法	参数	返回值	说明
generate generate_from_text	text:str	<class 'wordcloud.wordcloud.WordCloud'>	输入的“文本”应该是自然的文本。如果传递一个排序的单词列表，单词将在输出中出现两次。要删除这个重复，设置' ' collocations=False ' '。
generate_from_frequencies	frequencies, max_font_size=None frequencies : dict from string to float max_font_size : int Use this font-size instead of self.max_font_size	同上	根据词频生成词云。
fit_words	frequencies	同上	根据词频生成词云
to_file	filename	同上	保存词云图片
to_array	无	ndarray	将词云转换为numpy数组

下面尝试将党的历史决议进行高频词提取，并绘制成词云：

import jieba
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from wordcloud import WordCloud

plt.rcParams['font.sans-serif']=['SimHei']#正常显示中文
p=np.array(Image.open(r"circle.jpg"))
s=open(r"中共中央关于党的百年奋斗重大成就和历史经验的决议.txt",
       "r",
       encoding="utf-8")

def fun1():#这个函数用来统计文本中的高频词，返回一个“词-频”对的字典
    counts={}
    x = jieba.lcut(s.read())
    for word in x:
        if len(word)==1:
            pass
        else:
            counts[word]=counts.get(word,0)+1
    items=list(counts.items())
    items.sort(key=lambda x:x[1],reverse=True)
    return dict(items[:150])

def fun0(d):#这个函数用来生成词云
    x=jieba.lcut(s.read())
    picture=WordCloud(font_path='simkai.ttf',#指定字体的路径，直接填入字体名称也可以
                      mask=p,#指定词云的形状
                      scale=5,#对词云进行5倍扩大，词云将更加精细
                      background_color="white",#设置背景颜色
                      prefer_horizontal=0.9,#水平词占比
                      contour_width=1.5,#画出mask图的轮廓
                      contour_color="red",
                      min_font_size=4,#最小词大小
                      ).fit_words(d)
    
    picture.to_file("历史决议.png")#保存文件
    plt.imshow(picture)
    plt.show()

def draw_circle():#画个简单的圆，用来做词云的形状
    fig,ax=plt.subplots()
    ax.axis("equal")
    plt.axis("off")
    a = np.linspace(0, 2 * np.pi, 1000)
    x0 = 5*np.cos(a)
    y0 = 5*np.sin(a)
    plt.fill(x0,y0)
    plt.savefig("circle.jpg")
    plt.show()

最后的结果：

然后绘制一下饼图：

def draw0(d):
    fig,ax=plt.subplots(figsize=(5,5))
    ax.set(title="党的历史决议词频图")
    ax.pie(list(d.values())[:50],explode=[0.1]+[0]*49,
            labels=list(d.keys())[:50],
            autopct="%.1f%%",
            shadow=True,
            labeldistance=0.8,
            pctdistance=0.4,
            startangle=30,
            radius=1.3,
            counterclock=True
            )
    plt.show()

posted @ 2021-12-04 09:59 zeroy610 阅读(190) 评论(0) 编辑收藏举报

刷新页面返回顶部

zeroy610

词云的绘制

公告