
Scraping images from the Chinaz material site (sc.chinaz.com) with Python

1. The libraries we need (requests is a third-party library; re is part of the standard library):

import requests
import re

2. Find the listing pages that contain the images you are interested in:

for i in range(1, 3):  # pages 1 and 2
    if i == 1:
        url = "https://sc.chinaz.com/tupian/index.html"  # first page
    else:
        url = "https://sc.chinaz.com/tupian/index_%s.html" % i  # later pages
    res = requests.get(url)  # fetch the listing page
    res.encoding = res.apparent_encoding  # let requests detect the character encoding
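
The page-numbering rule used here (the first page is index.html, later pages are index_N.html) can be isolated in a small helper, which makes it easy to check without sending any requests. This is only a minimal sketch: the names BASE and build_page_url are hypothetical, and the URL pattern is simply the one taken from the code in this article.

```python
BASE = "https://sc.chinaz.com/tupian/"  # listing directory used in this article


def build_page_url(page: int) -> str:
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 has no suffix; later pages use index_<n>.html
    return BASE + ("index.html" if page == 1 else "index_%d.html" % page)


if __name__ == "__main__":
    for page in range(1, 3):  # pages 1 and 2
        print(build_page_url(page))
```

Note that range(1, 3) is what produces pages 1 and 2, because the end value of range is excluded.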

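3. Use regular expressions to extract the address and name of each image on the page

Get the image URLs and the image names:
    res_url_i = re.findall(r'<img src2="(.*?)" alt=".*?">', res.text)  # image URLs
    res_url_name = re.findall(r'<img src2=".*?" alt="(.*?)">', res.text)  # image names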
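
Two separate findall calls produce two lists that are matched up only by position. A single pattern with two capture groups keeps each URL together with its name. The sketch below runs on a made-up HTML fragment, so the host name and alt texts are assumptions and the real page markup may differ. It only assumes the src2/alt layout used in this article, and the names IMG_PATTERN and extract_images are hypothetical.

```python
import re

# Made-up fragment imitating the markup assumed in this article
sample_html = (
    '<div><img src2="//scpic.example.com/a_s.jpg" alt="Sunset"></div>'
    '<div><img src2="//scpic.example.com/b_s.jpg" alt="Forest"></div>'
)

# Group 1 captures the address, group 2 captures the name
IMG_PATTERN = re.compile(r'<img src2="(.*?)" alt="(.*?)">')


def extract_images(html: str) -> list:
    """Return (absolute_url, name) pairs found in the HTML text."""
    return [("https:" + src, name) for src, name in IMG_PATTERN.findall(html)]


if __name__ == "__main__":
    for url, name in extract_images(sample_html):
        print(name, url)
```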

4. Next, pair each URL with its name using zip and download the images:

    # the ./111 directory must already exist
    for res_url_i1, res_url_name1 in zip(res_url_i, res_url_name):
        image_link = "https:" + res_url_i1  # prepend the scheme to the extracted address
        image_url = requests.get(image_link)
        with open("./111/%s.jpg" % res_url_name1, "wb") as f:
            f.write(image_url.content)
        print("%s.jpg" % res_url_name1 + " downloaded successfully!")
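
The image names come from the alt text, which may contain characters that are not allowed in file names, and the target folder may not exist yet. A small helper can take care of both. This is a sketch and not part of the original code: the names safe_filename and save_image are hypothetical, and the bytes passed in stand in for image_url.content.

```python
import os
import re


def safe_filename(name: str) -> str:
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return cleaned or "unnamed"


def save_image(content: bytes, name: str, directory: str = "./111") -> str:
    """Write image bytes to <directory>/<name>.jpg and return the path."""
    os.makedirs(directory, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(directory, safe_filename(name) + ".jpg")
    with open(path, "wb") as f:  # the file is closed even if the write fails
        f.write(content)
    return path
```

Inside the download loop this could be called as save_image(image_url.content, res_url_name1).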

The complete code is as follows:

 

import os
import re

import requests

os.makedirs("./111", exist_ok=True)  # make sure the output folder exists
# Loop over the first 2 listing pages
for i in range(1, 3):
    if i == 1:
        url = "https://sc.chinaz.com/tupian/index.html"  # first page
    else:
        url = "https://sc.chinaz.com/tupian/index_%s.html" % i  # later pages
    # Fetch the page
    res = requests.get(url)
    res.encoding = res.apparent_encoding  # let requests detect the character encoding
    # Extract the address and name of every image on the page (regex, returns lists)
    res_url_i = re.findall(r'<img src2="(.*?)" alt=".*?">', res.text)
    res_url_name = re.findall(r'<img src2=".*?" alt="(.*?)">', res.text)
    for res_url_i1, res_url_name1 in zip(res_url_i, res_url_name):
        image_link = "https:" + res_url_i1  # prepend the scheme to the extracted address
        image_url = requests.get(image_link)
        with open("./111/%s.jpg" % res_url_name1, "wb") as f:
            f.write(image_url.content)
        print("%s.jpg" % res_url_name1 + " downloaded successfully!")

 

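Note:

Every website uses different URLs, so the regular expressions and the file paths will be different too. My code is for reference only; do not rely on it blindly.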
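
Since the markup changes from site to site, one way to depend less on the exact attribute order is to use the standard-library html.parser module instead of a regular expression. This is a different technique from the one used in this article, shown only as a sketch: the names ImgCollector and collect_images are hypothetical, and the attribute that holds the image address (src2 here) still has to be checked against the real page.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect (address, name) pairs from <img> tags that carry a given attribute."""

    def __init__(self, src_attr: str = "src2"):
        super().__init__()
        self.src_attr = src_attr  # attribute holding the image address
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)  # attribute order no longer matters
        src = attr_map.get(self.src_attr)
        if src:
            self.images.append((src, attr_map.get("alt") or ""))


def collect_images(html: str, src_attr: str = "src2") -> list:
    """Parse the HTML text and return the collected (address, name) pairs."""
    parser = ImgCollector(src_attr)
    parser.feed(html)
    parser.close()
    return parser.images
```

For another site, only the src_attr argument needs to change, for example collect_images(res.text, src_attr="src").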

 

posted @ 2022-02-22 15:02  怪~咖