【Python】Counting the posts on my own blog to see which year and month had the most
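The code is simple. It mainly uses requests for network access, BeautifulSoup for parsing the page text, and re for extracting text with regular expressions. The first two are third-party packages and must be installed with pip (pip install requests beautifulsoup4). re is part of Python's standard library, so it needs no installation. Here is the code, only twenty-seven lines: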
```python
#encoding=utf-8
from bs4 import BeautifulSoup
import requests
import re

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}

dic = {}  # maps "YYYY-MM" -> number of posts published that month

for i in range(1, 139):  # blog list pages 1 to 138
    html = requests.get('http://www.cnblogs.com/heyang78/p/?page=' + str(i), headers=headers, timeout=10)
    soup = BeautifulSoup(html.text, 'html.parser')

    for descDiv in soup.find_all(class_="postDesc2"):
        rawInfo = descDiv.text  # text of the div with class="postDesc2"
        match = re.search(r'\d{4}-\d{2}', rawInfo)  # regex for the year-month of the post date
        if match is None:  # skip descriptions without a recognizable date
            continue
        yearMonth = match.group()
        # Store the month in the dictionary, adding one if it is already there
        dic[yearMonth] = dic.get(yearMonth, 0) + 1

ranked = sorted(dic.items(), key=lambda x: x[1])  # (month, count) pairs, ascending by count
ranked.reverse()  # largest count first
for item in ranked:
    print(item)
```
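The extract-and-tally part of the loop does not depend on the network, so it can be sketched and checked offline. This is a minimal sketch, assuming the text of each postDesc2 div has already been collected into a list of strings; it swaps the hand-written dictionary update for the standard library's collections.Counter. The function names count_by_month and rank_months and the sample description strings are hypothetical, not taken from the blog.

```python
import re
from collections import Counter

def count_by_month(desc_texts):
    """Tally posts per "YYYY-MM" from the text of each post-description div."""
    counts = Counter()
    for text in desc_texts:
        match = re.search(r'\d{4}-\d{2}', text)  # first year-month in the text
        if match:  # ignore entries with no recognizable date
            counts[match.group()] += 1
    return counts

def rank_months(counts):
    """Return (month, count) pairs, largest count first.

    Sorting ascending and then reversing (as the script does) puts tied
    months in reverse order of first appearance.
    """
    ranked = sorted(counts.items(), key=lambda pair: pair[1])
    ranked.reverse()
    return ranked

# Hypothetical description texts; the real div text may be formatted differently
sample = [
    "posted @ 2020-01-03 10:15 heyang78 Views(12)",
    "posted @ 2020-01-20 08:00 heyang78 Views(5)",
    "posted @ 2017-09-11 21:30 heyang78 Views(7)",
    "no date here",
]
print(rank_months(count_by_month(sample)))  # [('2020-01', 2), ('2017-09', 1)]
```

Counter.most_common() would also rank the months, but it keeps tied counts in order of first appearance instead of reversing them, so tied months would print in the opposite order.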
The output is as follows:
```
('2020-01', 80)
('2017-09', 78)
('2019-11', 73)
('2020-05', 68)
('2019-12', 64)
('2019-10', 62)
('2020-03', 61)
('2018-04', 53)
('2021-09', 48)
('2018-05', 44)
('2020-04', 43)
('2013-09', 42)
('2020-02', 42)
('2019-09', 41)
('2021-08', 39)
('2019-03', 37)
('2013-08', 33)
('2017-11', 32)
('2020-06', 26)
('2014-12', 22)
('2017-12', 21)
('2017-01', 20)
('2017-06', 20)
('2018-03', 19)
('2014-07', 17)
('2016-07', 17)
('2019-08', 17)
('2017-08', 16)
('2013-11', 15)
('2014-08', 15)
('2013-10', 14)
('2014-04', 14)
('2014-05', 14)
('2016-03', 14)
('2015-01', 13)
('2014-11', 12)
('2016-08', 12)
('2015-07', 10)
('2016-02', 9)
('2017-07', 9)
('2014-01', 8)
('2014-10', 7)
('2018-01', 7)
('2015-04', 6)
('2015-08', 6)
('2014-02', 5)
('2015-06', 5)
('2017-10', 5)
('2013-12', 4)
('2015-02', 4)
('2015-05', 4)
('2014-03', 3)
('2017-02', 3)
('2014-09', 2)
('2015-12', 2)
('2017-03', 2)
('2018-06', 2)
('2018-07', 2)
('2019-05', 2)
('2014-06', 1)
('2015-11', 1)
('2016-05', 1)
('2016-06', 1)
('2016-10', 1)
('2017-04', 1)
('2017-05', 1)
('2019-04', 1)
('2019-07', 1)
('2020-09', 1)
```
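The ranking above answers the "which month" half of the title; the "which year" half needs the monthly counts summed per year. This is a minimal sketch, assuming the (month, count) pairs have the shape the script prints; the function name totals_by_year is hypothetical, and only the first five pairs from the output are used as input, so the printed totals are partial and are not the blog's real yearly totals.

```python
from collections import defaultdict

def totals_by_year(month_counts):
    """Sum ("YYYY-MM", count) pairs into {"YYYY": total}."""
    totals = defaultdict(int)
    for year_month, count in month_counts:
        totals[year_month[:4]] += count  # the first four characters are the year
    return dict(totals)

# The first five (month, count) pairs from the output above
pairs = [('2020-01', 80), ('2017-09', 78), ('2019-11', 73), ('2020-05', 68), ('2019-12', 64)]
totals = totals_by_year(pairs)

# Print years from the most posts to the fewest
for year, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(year, total)
# 2020 148
# 2019 137
# 2017 78
```

Passing the script's full ranked list instead of the five sample pairs would give the per-year totals for the whole blog.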
END