Extended learning: Chinese word segmentation; word cloud creation

There are many Chinese word segmentation tools; this post uses jieba (jieba · PyPI).

When installing wordcloud, pay attention to the version number. See: pip安装wordcloud出错 ERROR: Command errored out with exit status 1 (密言's blog, CSDN).
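
Since the installation advice hinges on version numbers, it can help to print what is actually present before troubleshooting. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is made up for this illustration and is not part of jieba or wordcloud.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Major.minor version of the running interpreter, e.g. "3.10"
python_version = "{}.{}".format(*sys.version_info[:2])
print("Python:", python_version)

# Report the two packages used in this post
for name in ("jieba", "wordcloud"):
    print(name, "->", installed_version(name) or "not installed")
```

If wordcloud shows as not installed after a failed pip run, the Python version printed here is the one to compare against the versions the package supports.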

Word cloud code reference: 词云可视化：四行Python代码轻松上手到精通 (同济子豪兄's blog, CSDN).

1. Chinese word segmentation

import jieba

seg_list = jieba.cut("我在河北大学上学，My name is Tom", cut_all=True)  # full mode
print("Full Mode: " + "/ ".join(seg_list))

seg_list = jieba.cut("我在河北大学上学，My name is Tom", cut_all=False)  # precise mode (the default)
print("Default Mode: " + "/ ".join(seg_list))
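
To get a feel for why the two modes give different output, here is a toy sketch that needs no third-party packages. It is not jieba's actual algorithm: the dictionary TOY_DICT, the function names, and the use of forward maximum matching as a stand-in for precise mode are all assumptions made for illustration. The "full" function lists every dictionary word found at any position, so overlapping words all appear; the "precise-like" function commits to one non-overlapping split.

```python
# Hypothetical toy dictionary; jieba ships a far larger one with word frequencies.
TOY_DICT = {"我", "在", "河北", "大学", "河北大学", "上学"}
MAX_LEN = max(len(word) for word in TOY_DICT)


def toy_full_mode(text):
    """List every dictionary word found at any start position (overlaps allowed)."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + MAX_LEN) + 1):
            if text[start:end] in TOY_DICT:
                found.append(text[start:end])
    return found


def toy_precise_mode(text):
    """Forward maximum matching: take the longest dictionary word at each step."""
    result = []
    i = 0
    while i < len(text):
        for end in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:end] in TOY_DICT:
                result.append(text[i:end])
                i = end
                break
        else:
            # No dictionary word starts here; emit the single character.
            result.append(text[i])
            i += 1
    return result


sentence = "我在河北大学上学"
full = toy_full_mode(sentence)
precise = toy_precise_mode(sentence)
print("Toy full mode: " + "/ ".join(full))
print("Toy precise mode: " + "/ ".join(precise))
```

In the toy full-mode output "河北", "河北大学" and "大学" all appear, while the precise-like output keeps only "河北大学". That overlap is the kind of difference to look for when comparing the two jieba calls above.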

2. Word cloud

Simple version

import wordcloud
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

w = wordcloud.WordCloud()
w.generate('and that government of the people, by the people, for the people, shall not perish from the earth.')
w.to_file('output1.png')

img = Image.open("output1.png")
m = np.asarray(img)
plt.imshow(m)
plt.show()

Advanced version

import wordcloud
import jieba
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import imageio

mk = imageio.imread("chinamap.png")

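# Build and configure the word cloud object w. Pass a stopwords set containing the words
# that should not appear in the cloud; here "曹操" and "孔明" are excluded.
# Note: when mask is given, wordcloud uses the mask's shape and ignores width and height.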
w = wordcloud.WordCloud(width=1000,
                        height=700,
                        background_color='white',
                        font_path='msyh.ttc',
                        mask=mk,
                        scale=5,
                        stopwords={'曹操','孔明'})

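# Segment the text from an external file with jieba, then join the tokens into one space-separated string
with open('三国演义.txt', encoding='utf-8') as f:  # the with-block closes the file automatically
    txt = f.read()
txtlist = jieba.lcut(txt)
string = " ".join(txtlist)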

# Pass string to w's generate() method to supply the text for the word cloud
w.generate(string)

# Export the word cloud image to the current folder, then load it for display
w.to_file('output8-threekingdoms.png')
img = Image.open("output8-threekingdoms.png")
m = np.asarray(img)
plt.imshow(m)
plt.show()
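
The reason for joining the jieba tokens with spaces is that the word cloud sizes each word by how often it occurs in that string, after dropping the stopwords. The counting step can be approximated with the standard library as below. This is a rough stand-in and not wordcloud's internal code: the function name word_frequencies, the min_length cutoff, and the sample sentence (made up for this sketch, not taken from the novel) are all assumptions.

```python
from collections import Counter


def word_frequencies(segmented_text, stopwords=frozenset(), min_length=2):
    """Count space-separated tokens, skipping stopwords and very short tokens."""
    tokens = segmented_text.split()
    kept = [t for t in tokens if t not in stopwords and len(t) >= min_length]
    return Counter(kept)


# Made-up, already-segmented sample text (tokens separated by spaces)
sample = "曹操 与 刘备 论 英雄 ， 刘备 说 曹操 是 英雄 ， 孔明 笑 而 不语"
freq = word_frequencies(sample, stopwords={"曹操", "孔明"})

# Words with higher counts would be drawn larger in a word cloud
for word, count in freq.most_common():
    print(word, count)
```

Running a count like this on the real text before calling generate() is a quick way to decide which words belong in the stopwords set.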

posted on 2022-06-09 15:48 HBU_DAVID
