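A Statistical Analysis of US Presidential Inaugural Addresses

Goal

Analyze word frequencies across the inaugural addresses of US presidents and draw a word cloud from the result.

Preparation

Tools: matplotlib, wordcloud
Data: the English text of the US presidential inaugural addresses
The data files are linked at the end of this post.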

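Result

Analysis

1. The most frequent word is government, followed by people.
2. Words such as new, hope, great, well, best, life and duty appear often. At the start of a new term, a president tends to express high hopes for the future or to make promises to the people.
3. War and peace both appear. The two go hand in hand, and they are a permanent theme of American history: the War of Independence, the Civil War, the First and Second World Wars, the Cold War with the Soviet Union, the Vietnam War, the Korean War, Afghanistan and so on. The United States has fought a great many wars. The irony is that US presidents are popular candidates for the Nobel Peace Prize; maybe this is what "ending war with war" means 😃.
4. Words such as freedom, law, right, spirit, constitution, justice and liberty appear often. These are staples of political speech, and many of them also belong to the image the United States has long projected abroad: freedom, citizenship, rights and so on.
5. United, States and America come up many times, which is natural for a US president. But world also ranks near the top, which hints at the global scope of US strategy.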
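The ranking claims above can be checked directly against the frequency table. The following is a minimal sketch, assuming the table is a plain dict of word to count like the `counts` dict built in the program below. The helper name `rank_of` and the numbers in `sample_counts` are made up for illustration and are not results from the real speeches.

```python
from collections import Counter


def rank_of(counts, word):
    """Return the 1-based frequency rank of `word`, or None if it is absent."""
    # most_common() sorts by count, highest first; ties keep insertion order.
    ranked = [w for w, _ in Counter(counts).most_common()]
    return ranked.index(word) + 1 if word in ranked else None


# Hypothetical counts, only to show how the helper behaves.
sample_counts = {"government": 9, "people": 7, "world": 5, "war": 3, "peace": 3}

for w in ("government", "people", "war", "peace"):
    print(w, rank_of(sample_counts, w))
```

Running the same helper on the real `counts` dict gives the rank of any word mentioned in the list above.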

Program and data files

Mask image used for the word cloud

Program

from wordcloud import WordCloud
import matplotlib.pyplot as plt

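# Read the file containing the inaugural addresses.
# The with-statement closes the file even if reading fails.
with open(r"C:\Users\Desktop\美国历届总统就职演说.txt", encoding='utf8') as f1:
    text = f1.read()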

# Clean up punctuation
text = text.lower()  # convert everything to lowercase
delchar = "-,:.;\""  # punctuation marks to strip
for ch in delchar:
    text = text.replace(ch, ' ')
words = text.split()  # split on whitespace into a list of words

# Count how often each word occurs
counts = {}
for i in words:
    counts[i] = counts.get(i, 0) + 1

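# Second cleaning pass: drop words that carry no meaning on their own
# delword holds pronouns, prepositions, conjunctions and similar filler words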
delword = ['i', 'you', 'he', 'she', 'it', 'her', 'his', 'your', 'our', 'we', 'us', 'the', 'a', 'for', 'on', 'in',
           'with', 'of', 'to', 'are', 'is', 'am', 'were', 'was', 'and', 'or', 'what', 'where', 'which', 'who', 'that',
           'be', 'by', 'as', 'not', 'will', 'can', 'all', 'this', 'their', 'but', 'have', 'has', 'its', 'from', 'my',
           'been', 'no', 'they', 'so', 'an', 'upon', 'must', 'may', 'at', 'should', 'them', 'shall', 'those', 'more',
           'if', 'any', 'every', 'these', 'other', 'without', 'men', 'when', 'most', 'just', 'let', 'some', 'much',
           'many', 'two', 'him', 'none', 'myself', 'four', 'ours', 'soon', 'me', 'between', 'out', 'first', 'because',
           'over', 'nor', 'one', 'only', 'there', 'than', 'such', 'do', 'would', 'under', 'now', 'before', 'never',
           'into', 'each', 'again', 'thing', 'since', 'self', 'called', 'had', 'both', 'among', 'through', 'within',
           'even', 'up', 'toward', 'like', 'say', 'once', 'call', 'about', 'ever', 'man', 'too', 'less', 'then',
           'while', 'yet', 'here', 'still', 'others', 'itself', 'could', 'themselves', 'being', 'always', 'against',
           'years', 'fellow', 'time']
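for i in delword:
    counts.pop(i, None)  # pop with a default avoids a KeyError if the word never occurred

# Convert to a list and sort by count, highest first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)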

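# Build the word cloud from the 50 most frequent words
fifty = dict(items[:50])  # turn the top 50 back into a dict
c_mask = plt.imread(r'C:\Users\Desktop\1.jpg')  # read the mask image
wc = WordCloud(font_path="simhei.ttf",  # font file; must be available on the system
               mask=c_mask,  # apply the mask image
               background_color="white",  # background color
               max_font_size=100)  # largest font size
wc.generate_from_frequencies(fifty)  # generate the word cloud
wc.to_file(r'C:\Users\Desktop\11.jpg')  # save the image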
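The cleaning step in the program strips only the characters listed in `delchar`, so marks such as `?`, `!` or parentheses stay attached to words. One possible alternative, sketched here and not what the program above does, is to extract words with a regular expression and count them with `collections.Counter`. The function name `count_words`, the short `STOP_WORDS` set and the sample sentence are all assumptions for illustration; the full `delword` list could be passed in instead.

```python
import re
from collections import Counter

# Illustrative subset only; the program's delword list is much longer.
STOP_WORDS = {"the", "of", "and", "to", "our", "we"}


def count_words(text, stop_words=STOP_WORDS):
    """Count lowercase words in `text`, skipping anything in `stop_words`."""
    # Keep runs of letters, with an optional apostrophe part such as "people's".
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


# Made-up sample text, not taken from the speeches.
sample = "We, the People -- and our Government! The people's government?"
print(count_words(sample).most_common(3))
```

A `Counter` is a dict subclass, so the result can be passed to `generate_from_frequencies` in the same way as the `fifty` dict.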

Data files

Link: https://pan.baidu.com/s/1Msk8mvAZdE-OV1XxRqXt3w
Extraction code: bo1z

Posted 2020-01-12 11:54 by 启林O_o
