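[Python Crawler] Downloading the images from a personal blog
While reading blogs today I came across a personal blog with two image-heavy posts, listed under https://www.fifiblog.com/xiuxianyule/yangyanmeitu
There are so many images that the browser loads them too slowly, so I wrote a small Python crawler to download them and view them locally. The steps are as follows.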
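1. Inspect the page source
The image URLs have the form https://www.fifiblog.com/wp-content/uploads/2015/06/1434923267140.jpg. After clearing cookies and visiting that URL again, the image still loads, so there is no protection in place and a plain GET request is enough.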
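The cookie-free check above can also be done from a script. The following is a minimal sketch in Python 3 (the post itself uses Python 2's urllib2); the helper names fetch_without_cookies and looks_like_jpeg are hypothetical and not part of the original script. It relies on the fact that urllib.request.urlopen keeps no cookie jar by default, so each request resembles a visit with cleared cookies.

```python
# Python 3 sketch; helper names are illustrative, not from the original post.
import urllib.request

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these three bytes


def looks_like_jpeg(data):
    """Return True if the byte string starts with the JPEG signature."""
    return data[:3] == JPEG_MAGIC


def fetch_without_cookies(url, timeout=10):
    """GET a URL with no cookies and no Referer header.

    urlopen's default opener has no cookie handling, so nothing is carried
    over from earlier requests.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read()
```

Calling fetch_without_cookies on the image URL from this step and passing the returned bytes to looks_like_jpeg should indicate whether the server hands out the image to a bare GET. If the check fails, the site may have added restrictions since this post was written.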
2. Build the regular expression
restr=r'https://www\.fifiblog\.com/wp-content/uploads/2015/06/14349(.*?)\.jpg'
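To see what the capture group returns, here is a small Python 3 sketch that runs the same pattern against a made-up HTML fragment. The first image URL is the one quoted in this post; the second URL and the surrounding markup are assumptions for illustration only. re.escape is used so that the dots in the prefix match literally.

```python
import re

PREFIX = "https://www.fifiblog.com/wp-content/uploads/2015/06/14349"
# Escape the fixed prefix, then lazily capture everything up to ".jpg".
pattern = re.compile(re.escape(PREFIX) + r"(.*?)\.jpg")

# Hypothetical page fragment; only the first URL appears in the post.
sample_html = (
    '<p><img src="https://www.fifiblog.com/wp-content/uploads/2015/06/1434923267140.jpg"></p>'
    '<p><img src="https://www.fifiblog.com/wp-content/uploads/2015/06/1434900000001.jpg"></p>'
)

# findall returns only the captured part, i.e. the digits after "14349".
suffixes = pattern.findall(sample_html)

# The full URL has to be rebuilt from the prefix and each captured suffix.
image_urls = [PREFIX + suffix + ".jpg" for suffix in suffixes]
print(suffixes)
print(image_urls)
```

Because the pattern has exactly one group, findall yields the suffixes rather than whole URLs, which is why the complete script concatenates the prefix back on before downloading.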
3. Complete code
# Python 2 (urllib2 and the print statement do not exist in Python 3)
import re
import urllib2

# Two posts on the blog; this run downloads the images from url2.
url = "https://www.fifiblog.com/xiuxianyule/yangyanmeitu/gongkouxiaoluoligaoqingwushuiyin.html"
url2 = 'https://www.fifiblog.com/xiuxianyule/yangyanmeitu/gongkouxiaoluoli.html'

try:
    response = urllib2.urlopen(url2)
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.reason
else:
    ans = response.read()
    # Example match: https://www.fifiblog.com/wp-content/uploads/2015/06/1434923267140.jpg
    restr = r'https://www\.fifiblog\.com/wp-content/uploads/2015/06/14349(.*?)\.jpg'
    pattern = re.compile(restr)
    items = re.findall(pattern, ans)
    for item in items:
        # findall returns only the captured suffix, so rebuild the full URL
        imgurl = 'https://www.fifiblog.com/wp-content/uploads/2015/06/14349' + item + '.jpg'
        print imgurl
        imgresponse = urllib2.urlopen(imgurl)
        # Binary mode, so the JPEG bytes are written unchanged on every platform
        imgsave = open(item + ".jpg", "wb")
        imgsave.write(imgresponse.read())
        imgsave.close()
        print item + ".jpg saved!"
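For readers on Python 3, where urllib2 is no longer available, the same flow can be sketched with urllib.request. This is a rough port under stated assumptions, not the original author's code: the function names find_image_urls, local_name, and download_all are hypothetical, and the page is assumed to decode as UTF-8 (undecodable bytes are replaced).

```python
import os
import re
import urllib.request
from urllib.error import URLError

PREFIX = "https://www.fifiblog.com/wp-content/uploads/2015/06/14349"
PATTERN = re.compile(re.escape(PREFIX) + r"(.*?)\.jpg")


def find_image_urls(html):
    """Rebuild full image URLs from the suffixes captured by PATTERN."""
    return [PREFIX + suffix + ".jpg" for suffix in PATTERN.findall(html)]


def local_name(image_url):
    """Use the last path segment of the URL as the local file name."""
    return image_url.rsplit("/", 1)[-1]


def download_all(page_url, out_dir="."):
    """Fetch the page, then save every matching image into out_dir."""
    with urllib.request.urlopen(page_url, timeout=10) as response:
        # Assumption: the page is UTF-8; bad bytes are replaced, not fatal.
        html = response.read().decode("utf-8", errors="replace")
    saved = []
    for image_url in find_image_urls(html):
        try:
            with urllib.request.urlopen(image_url, timeout=10) as img:
                data = img.read()
        except URLError as error:  # HTTPError is a subclass of URLError
            print("skipped", image_url, error)
            continue
        path = os.path.join(out_dir, local_name(image_url))
        with open(path, "wb") as handle:  # binary mode for image bytes
            handle.write(data)
        saved.append(path)
    return saved
```

Calling download_all with one of the two post URLs above would save the images into the current directory. Unlike the original loop, this version skips an image that fails to download instead of stopping, which is a design choice of the sketch rather than something the post prescribes.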