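Scraping images from Chinaz (站长素材, sc.chinaz.com) with Python
1. The libraries we need: requests (third-party) and re (standard library):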
import requests
import re
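2. Pick the listing pages whose images you want. Page 1 is index.html and page N (N ≥ 2) is index_N.html; fetch each page with requests: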
for i in range(1, 3):  # pages 1 and 2; the end value of range() is exclusive
    if i == 1:
        url = "https://sc.chinaz.com/tupian/index.html"  # site address
    else:
        url = "https://sc.chinaz.com/tupian/index_%s.html" % i
    res = requests.get(url, timeout=10)
    res.encoding = res.apparent_encoding  # detect the character encoding automatically
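The page-numbering rule above (page 1 is index.html, later pages are index_N.html) can be pulled into a small helper so the loop only deals with URLs. This is a sketch that assumes the pattern holds for every listing page; `page_url` and `page_urls` are hypothetical names, not part of the original code.

```python
from typing import List

# Listing-page base URL taken from the article; the numbering rule is assumed
# to hold for every page.
BASE = "https://sc.chinaz.com/tupian/"


def page_url(i: int) -> str:
    """Return the URL of listing page i (1-based)."""
    if i < 1:
        raise ValueError("page numbers start at 1")
    if i == 1:
        return BASE + "index.html"  # the first page has no numeric suffix
    return f"{BASE}index_{i}.html"


def page_urls(first: int, last: int) -> List[str]:
    """Return the URLs of pages first..last, inclusive."""
    return [page_url(i) for i in range(first, last + 1)]


if __name__ == "__main__":
    for u in page_urls(1, 3):
        print(u)
```

Making `last` inclusive avoids the off-by-one that `range(1, 2)` causes when you mean "the first 2 pages".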
3. Use regular expressions to extract each image's address (the src2 attribute) and name (the alt attribute) from the page:
res_url_i = re.findall('<img src2="(.*?)" alt=".*?">', res.text)  # image addresses
res_url_name = re.findall('<img src2=".*?" alt="(.*?)">', res.text)  # image names
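Because both patterns match the same tags, the two lists line up, but a single pattern with two capture groups makes the pairing explicit: when a pattern has more than one group, `re.findall` returns a list of tuples. A minimal sketch; `IMG_PATTERN`, `extract_images`, and the sample HTML (with `example.com` addresses) are made up for illustration, and it assumes the page still writes `src2` followed by `alt` exactly as the article's regex expects.

```python
import re

# One pattern, two groups: findall returns (src2, alt) tuples, so each
# image URL stays paired with its own name.
IMG_PATTERN = re.compile(r'<img src2="(.*?)" alt="(.*?)">')

# Made-up sample markup in the shape the article's regex expects.
SAMPLE_HTML = '''
<div><img src2="//example.com/a_s.jpg" alt="cat"></div>
<div><img src2="//example.com/b_s.jpg" alt="dog"></div>
'''


def extract_images(html: str):
    """Return a list of (absolute_url, name) pairs."""
    pairs = []
    for src, alt in IMG_PATTERN.findall(html):
        # src2 values are protocol-relative ("//..."), so prepend the scheme.
        url = "https:" + src if src.startswith("//") else src
        pairs.append((url, alt))
    return pairs


if __name__ == "__main__":
    for url, name in extract_images(SAMPLE_HTML):
        print(name, url)
```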
4. Then download the images one by one:
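for res_url_i1, res_url_name1 in zip(res_url_i, res_url_name):
    img_url = "https:" + res_url_i1  # src2 holds protocol-relative URLs ("//...")
    image = requests.get(img_url, timeout=10)
    # The ./111 folder must already exist; "with" closes the file automatically
    with open("./111/%s.jpg" % res_url_name1, "wb") as f:
        f.write(image.content)
    print("%s.jpg downloaded successfully!" % res_url_name1)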
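The alt text is used directly as a file name, which fails if it contains characters such as `/` or `:`, and `open` raises an error if `./111` does not exist. Below is a hedged sketch of a safer save step; `safe_filename` and `save_image` are hypothetical helpers, and replacing the characters Windows forbids in file names is an assumption about what counts as "safe".

```python
import os
import re
import tempfile

# Characters that Windows forbids in file names ("/" is also illegal on Unix).
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def safe_filename(name: str, default: str = "image") -> str:
    """Replace illegal characters and fall back to a default for empty names."""
    cleaned = _ILLEGAL.sub("_", name).strip()
    return cleaned or default


def save_image(content: bytes, name: str, folder: str) -> str:
    """Write image bytes to folder/<name>.jpg and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(folder, safe_filename(name) + ".jpg")
    with open(path, "wb") as f:  # "with" closes the file even on errors
        f.write(content)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = save_image(b"\xff\xd8fake", "a/b:c", tmp)
        print(os.path.basename(p))  # a_b_c.jpg
```

In the download loop you would pass it the response's `.content` bytes, the image name, and `"./111"`.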
The complete code:
import os
import re

import requests

# Crawl the first 2 listing pages
os.makedirs("./111", exist_ok=True)  # make sure the download folder exists
for i in range(1, 3):  # range(1, 3) yields pages 1 and 2
    if i == 1:
        url = "https://sc.chinaz.com/tupian/index.html"  # site address
    else:
        url = "https://sc.chinaz.com/tupian/index_%s.html" % i
    # Fetch the page
    res = requests.get(url, timeout=10)
    res.encoding = res.apparent_encoding  # detect the character encoding automatically
    # Extract every image's address and name with regular expressions (as lists)
    res_url_i = re.findall('<img src2="(.*?)" alt=".*?">', res.text)
    res_url_name = re.findall('<img src2=".*?" alt="(.*?)">', res.text)
    for res_url_i1, res_url_name1 in zip(res_url_i, res_url_name):
        img_url = "https:" + res_url_i1  # src2 holds protocol-relative URLs ("//...")
        image = requests.get(img_url, timeout=10)
        with open("./111/%s.jpg" % res_url_name1, "wb") as f:
            f.write(image.content)
        print("%s.jpg downloaded successfully!" % res_url_name1)
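Note:
Every site's page URLs are different, so the regular expressions and the file paths will be different too. My code is for reference only; don't trust it blindly.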
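Because the regex depends on the exact attribute order and spacing, one alternative (a different technique from the article's regex) is the standard library's `html.parser`, which reads attributes in any order. A minimal sketch; `ImgCollector` and `collect_images` are hypothetical names, the default `src2` attribute is taken from the article's regex and may differ on other sites, and since `HTMLParser` lowercases attribute names, `src_attr` should be given in lowercase.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect (src, alt) for every <img> tag, whatever the attribute order."""

    def __init__(self, src_attr: str = "src2"):
        super().__init__()
        self.src_attr = src_attr  # image-address attribute; varies per site
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags.
        if tag != "img":
            return
        a = dict(attrs)  # attribute values may be None for bare attributes
        src = a.get(self.src_attr)
        if src:
            self.images.append((src, a.get("alt") or ""))


def collect_images(html: str, src_attr: str = "src2"):
    """Return a list of (src, alt) pairs found in html."""
    parser = ImgCollector(src_attr)
    parser.feed(html)
    parser.close()
    return parser.images


if __name__ == "__main__":
    print(collect_images('<img alt="cat" src2="//x/a.jpg">'))
```

Switching sites then usually means changing `src_attr` rather than rewriting a regex.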