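Generating a word cloud (also called a text cloud) with WordCloud and jieba. (With the same code, the mask takes effect for some images and not for others; I have not worked out why yet.)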
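The usual Python library for generating word clouds is "wordcloud". Install: pip install wordcloud
wordcloud is designed for English text by default. To build a word cloud from Chinese text, the text has to be segmented into words first, which is where the Chinese word segmentation library "jieba" comes in. Install: pip install jieba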
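To see why segmentation matters, here is a small sketch using only the standard library. It assumes the token pattern documented as the default for WordCloud's `regexp` parameter, `r"\w[\w']+"`; the helper name `naive_tokens` is made up for this illustration. English words are separated by spaces, so the pattern finds them, while an unsegmented Chinese phrase comes back as one long token.

```python
import re

# Token pattern documented as the default for WordCloud's `regexp` parameter.
TOKEN_PATTERN = r"\w[\w']+"


def naive_tokens(text):
    """Split text with the regex only, without any word segmentation."""
    return re.findall(TOKEN_PATTERN, text)


english = naive_tokens("web crawler data parsing")
unsegmented = naive_tokens("网络爬虫数据解析")
segmented = naive_tokens("网络爬虫 数据 解析")  # spaces inserted by hand here

print(english)      # four separate words
print(unsegmented)  # a single token covering the whole phrase
print(segmented)    # three tokens once the text is pre-segmented
```

In practice jieba does the splitting that the hand-inserted spaces stand in for here.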
# coding: utf-8
# Project: pythonProject8
# File: 词云图.py
# Author: 李凤娟
# Date: 2023/9/12 14:09
# IDE: PyCharm
# Purpose: generate a word cloud image
from collections import Counter

import jieba
from imageio.v2 import imread
from matplotlib import pyplot as plt
from wordcloud import WordCloud

"""
Contents of word.txt:
Python
C逆向
C++逆向
C++逆向
C++逆向
C逆向
网络爬虫
数据解析
"""

# Read the file contents
with open('word.txt', 'r', encoding='utf-8') as f:
    words = f.read()

# Add words to the jieba dictionary (some whole terms would otherwise be split)
jieba.add_word("网络爬虫")
jieba.add_word("JS逆向")
jieba.add_word("APP逆向")
jieba.add_word("C逆向")
jieba.add_word("C++逆向")
jieba.add_word("网络数据")

# Segment the text with jieba
words_list_jieba = jieba.lcut(words)
# ['Python', '\n', 'C逆向', '\n', 'C++逆向', '\n', 'C++逆向', '\n', 'C++逆向', '\n', 'C逆向', '\n', '网络爬虫', '\n', '数据', '解析']

# Words to exclude (a set, so membership tests are fast)
excluded_words = {
    "将", "\n", "地", '小说', '侧', "又", "一雄", "如何", "什么", '可以', '吗', "只是", "他", "本", '们', " ", "…",
    "把", "人", "很", '那么', '着', '太', '能', '给', '不是', '里', "被", "就是", "一个", "没有", "剧", "让", "/",
    "而", "与", "一部", "的", "我", "你", "她", "我们", "你们", "他们", "是", "在", "了", "有", "这", "那", "就",
    "也", "还", "但", "如果", "然后", "因为", "所以", "一", "二", "三", "四", "五", "六", "七", "八", "九", "十",
    "百", "千", "万", "个", "这些", "那些", "更", "最", "好", "坏", "大", "小", "高", "低", "长", "短", "新", "旧",
    "常", "少", "多", "全", "每", "些", "去", "来", "到", "从", "为", "以", "对", "和", "或", "及", "上", "下",
    "中", "前", "后", "左", "右", "内", "外", "间", "部", "种", "年", "月", "日", "时", "分", "秒", "这个", "那个",
    "这样", "那样", "一些", "很多", "非常", "可能", "一定", "一直", "经常", "不断", "不只", "不要", "不得", "不能",
    "无法", "没法", "必须", "应该", "需要", "会", "想", "要", "找", "看", "听", "说", "写", "读", "学", "做", "吃",
    "喝", "睡", "玩", "工作", "生活", "家庭", "朋友", "嫌", "感觉", "思考", "想法", "方法", "原因", "结果", "可能性",
    "比较", "不同", "相同", "重要", "容易", "困难", "简单", "复杂", "正确", "错误", ",", "。", "!", "?", ";", ":",
    "“", "”", "‘", "’", "(", ")", "【", "】", "《", "》", "——", "—", "·", "、", "~", "像", "“", "”", ",", "\'",
    "\"", ",",
}

# Filter out the excluded words
words_list = [x for x in words_list_jieba if x not in excluded_words]
# ['Python', 'C逆向', 'C++逆向', 'C++逆向', 'C++逆向', 'C逆向', '网络爬虫', '数据', '解析']

# Count word frequencies with Counter
word_counter = Counter(words_list)
# Counter({'C++逆向': 3, 'C逆向': 2, 'Python': 1, '网络爬虫': 1, '数据': 1, '解析': 1})
sorted_file = word_counter.most_common()
# [('C++逆向', 3), ('C逆向', 2), ('Python', 1), ('网络爬虫', 1), ('数据', 1), ('解析', 1)]
print(sorted_file)

# Load the image to use as the mask
mask = imread("2.png")

# Pass the mask when building the word cloud.
# Put the font file (simhei.ttf) in the project directory, or point font_path at C:\Windows\Fonts\simhei.ttf
wordcloud = WordCloud(
    font_path='simhei.ttf',
    background_color='white',
    mask=mask).generate_from_frequencies(dict(sorted_file))

# Save the word cloud image
wordcloud_image_path = 'wordcloud.png'
wordcloud.to_file(wordcloud_image_path)

# The image has been generated at this point.
# Everything below only displays it and is optional.

# Convert to a PIL image and open it in the default viewer
image = wordcloud.to_image()
image.show()

# Show the word cloud with matplotlib
plt.figure()
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
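On the question of why the mask works for some images and not others: one likely cause (my assumption, not checked against the images used here) is how wordcloud reads the mask. Its documentation says that only pure white entries are treated as masked out, and as far as I can tell it looks only at the first three (RGB) channels of a colour image. A PNG with a transparent background often stores those pixels as black with alpha 0, so nothing counts as white and the words fill the whole canvas. The sketch below uses two hypothetical helpers: `white_fraction` to check how much of an image would be masked out, and `load_mask_on_white`, which needs Pillow and numpy, to flatten transparency onto a white background before the image is used as a mask.

```python
def white_fraction(pixels):
    """Fraction of pixels whose RGB channels are all 255.

    These are the pixels wordcloud would leave blank; alpha is ignored.
    """
    pixels = list(pixels)
    if not pixels:
        return 0.0
    white = sum(1 for p in pixels if all(channel == 255 for channel in p[:3]))
    return white / len(pixels)


def load_mask_on_white(path):
    """Load an image and composite it onto white, returning an RGB array.

    Assumption: the unwanted background is transparent. Third-party imports
    are kept inside the function so the helper above stays dependency-free.
    """
    import numpy as np
    from PIL import Image

    image = Image.open(path).convert("RGBA")
    background = Image.new("RGBA", image.size, (255, 255, 255, 255))
    flattened = Image.alpha_composite(background, image).convert("RGB")
    return np.array(flattened)


# Made-up sample pixels: transparent black is NOT white, so it is not masked out.
transparent_black = [(0, 0, 0, 0)] * 3 + [(10, 20, 30, 255)]
white_background = [(255, 255, 255)] * 3 + [(10, 20, 30)]
print(white_fraction(transparent_black), white_fraction(white_background))
```

With this in place, `mask = load_mask_on_white("2.png")` could replace the `imread("2.png")` call. If the background of an image is some other solid colour, it would need to be repainted white in an image editor instead.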
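If the mask argument is removed from WordCloud(), the image is generated at the size you specify. (When a mask is given, width and height are ignored and the shape of the mask is used instead.)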
wordcloud = WordCloud(
    font_path='simhei.ttf',
    background_color='white',
    height=1000,
    width=800
).generate_from_frequencies(dict(sorted_file))
Using stopwords
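# Note: according to the wordcloud documentation, stopwords is ignored by
# generate_from_frequencies(), so it has no effect in this particular call.
# It only applies when the cloud is built from raw text with generate().
stopwords = {"爬虫"}

# Pass the mask when building the word cloud.
# Put the font file (simhei.ttf) in the project directory, or point font_path at C:\Windows\Fonts\simhei.ttf
wordcloud = WordCloud(
    font_path='simhei.ttf',
    background_color='white',
    height=600,
    width=800,
    stopwords=stopwords,
    mask=mask).generate_from_frequencies(dict(sorted_file))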
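Because the code in this post builds the cloud from a frequency table, one way to get the effect of stop words is to drop them from that table before it reaches WordCloud. A minimal sketch, with a hypothetical helper `drop_stopwords` and counts copied from the word.txt example above. Matching is exact, so the stop word has to be the whole token ("网络爬虫"), not a part of it ("爬虫").

```python
from collections import Counter


def drop_stopwords(frequencies, stopwords):
    """Return a copy of the frequency table without the stop words (exact match)."""
    return {word: count for word, count in frequencies.items() if word not in stopwords}


# Counts taken from the word.txt example in this post.
word_counter = Counter({"C++逆向": 3, "C逆向": 2, "Python": 1, "网络爬虫": 1, "数据": 1, "解析": 1})

# Hypothetical choice of stop word for the illustration.
stopwords = {"网络爬虫"}

filtered = drop_stopwords(word_counter, stopwords)
print(filtered)
```

The filtered dict can then be passed to generate_from_frequencies() in place of dict(sorted_file).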