Web Scraping: Downloading Images with Multiple Threads

1. Goal

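  Use multiple threads to download the images from the listing "Other PC live wallpapers - Other desktop live wallpapers - 元气壁纸 (Yuanqi Wallpaper)" on cheetahfun.com.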

2. Analysis

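  Press F12 to open the developer tools and inspect the page structure. Every image sits in an li element inside the tag with class = "flex flex-wrap justify-between font-normal". Within each li, the image URL and its name are the src and title attributes of the img inside the a tag.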
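  The absolute XPath used later ('//*[@id="1"]/main/div[2]/...') breaks as soon as the page layout shifts. A minimal sketch of selecting the list by its class instead, assuming the class sits on the ul that holds the li elements (the absolute path ends in section/ul/li). SAMPLE_HTML is a hand-written stand-in for the real page, not the site's actual markup, and extract_images is a hypothetical helper name.

```python
from lxml import etree

# Hypothetical stand-in for the real page (assumption): ul.<class> > li > div > a > img
SAMPLE_HTML = """
<html><body>
  <ul class="flex flex-wrap justify-between font-normal">
    <li><div><a href="#"><img src="https://example.com/a.jpg" title="Sunset"></a></div></li>
    <li><div><a href="#"><img src="https://example.com/b.jpg" title="Forest"></a></div></li>
    <li><div>ad slot without an image</div></li>
  </ul>
</body></html>
"""


def extract_images(page_text):
    """Return (title, src) pairs for every img inside the wallpaper list."""
    html = etree.HTML(page_text)
    # @class="..." must match the whole attribute value exactly
    items = html.xpath('//ul[@class="flex flex-wrap justify-between font-normal"]/li')
    results = []
    for li in items:
        src = li.xpath('./div/a/img/@src')
        title = li.xpath('./div/a/img/@title')
        if src and title:  # skip list items that carry no image instead of raising IndexError
            results.append((str(title[0]), str(src[0])))
    return results


if __name__ == "__main__":
    for title, src in extract_images(SAMPLE_HTML):
        print(title, src)
```

  Checking that the xpath result is non-empty before indexing avoids the IndexError that xpath(...)[0] raises on list items without an img.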

  Building on what we covered earlier, first write a single-threaded scraper, then add multithreading on top of it.
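  Threads pay off here because scraping is I/O-bound: while one thread waits on the network, others keep working. A minimal sketch of the single-threaded vs. multithreaded comparison, where fake_download is a hypothetical stand-in that sleeps instead of making real requests, so the timings only illustrate the idea.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(page):
    """Hypothetical stand-in for download(): sleeps to simulate a network wait."""
    time.sleep(0.2)
    return f"page {page} done"


def run_sequential(pages):
    # Single-threaded: each page waits for the previous one to finish
    return [fake_download(p) for p in pages]


def run_threaded(pages, workers=5):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map returns results in the same order as the input pages
        return list(pool.map(fake_download, pages))


if __name__ == "__main__":
    pages = range(1, 10)
    start = time.perf_counter()
    seq = run_sequential(pages)
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    par = run_threaded(pages)
    par_time = time.perf_counter() - start

    print(f"sequential: {seq_time:.2f}s, threaded: {par_time:.2f}s")
```

  With 9 pages and 5 workers, the threaded run needs roughly two rounds of waiting instead of nine.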

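# -*- coding: utf-8 -*-
# Step 1: imports
from concurrent.futures import ThreadPoolExecutor
import os
import re
import time

import requests
from lxml import etree

headers = {
    "User-Agent":
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 Edg/120.0.0.0"
}

SAVE_DIR = "E:/元气图片/"  # download folder, created in __main__ if missing


def download(url):
    """Fetch one listing page and save every image on it."""
    # headers must be passed by keyword: the 2nd positional argument of requests.get is params
    resp = requests.get(url, headers=headers, timeout=10)
    html = etree.HTML(resp.text)
    li_list = html.xpath('//*[@id="1"]/main/div[2]/div/section/ul/li')

    for li in li_list:
        pic_url = li.xpath('./div/a/img/@src')[0]
        pic_name = li.xpath('./div/a/img/@title')[0]
        # replace characters that Windows does not allow in file names
        pic_name = re.sub(r'[\\/:*?"<>|]', "_", pic_name)
        with open(SAVE_DIR + pic_name + ".png", mode="wb") as fp:
            fp.write(requests.get(pic_url, headers=headers, timeout=10).content)
        print(pic_name + " downloaded!!")
        time.sleep(1)  # pause between images to go easy on the server


if __name__ == '__main__':
    os.makedirs(SAVE_DIR, exist_ok=True)
    # 5 worker threads share the listing pages p1..p9
    with ThreadPoolExecutor(5) as t:
        for i in range(1, 10):
            url_Temp = f"https://mbizhi.cheetahfun.com/dn/c11d/p{i}"
            t.submit(download, url=url_Temp)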
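  One catch with t.submit: if download raises (for example an IndexError from xpath(...)[0] or a missing folder), the exception is stored in the returned Future and only re-raised when future.result() is called. Because the loop above discards the Futures, such errors vanish silently. A minimal sketch that keeps the Futures and reports failures; flaky_download and crawl are hypothetical names, and page 3 fails on purpose.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def flaky_download(url):
    """Hypothetical stand-in for download(); fails on page 3 to show error handling."""
    if url.endswith("p3"):
        raise IndexError(f"no img found on {url}")
    return url


def crawl(urls, workers=5):
    done, failed = [], []
    with ThreadPoolExecutor(workers) as t:
        # map each Future back to the URL it was submitted with
        futures = {t.submit(flaky_download, url=u): u for u in urls}
        for fut in as_completed(futures):  # yields Futures as they finish
            url = futures[fut]
            try:
                done.append(fut.result())  # re-raises the worker's exception here
            except Exception as exc:
                failed.append((url, exc))
    return done, failed


if __name__ == "__main__":
    urls = [f"https://mbizhi.cheetahfun.com/dn/c11d/p{i}" for i in range(1, 10)]
    ok, errors = crawl(urls)
    print(f"{len(ok)} pages succeeded")
    for url, exc in errors:
        print(f"failed: {url}: {exc!r}")
```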

Note: the thread pool requires this import:

from concurrent.futures import ThreadPoolExecutor

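Because all the work for one listing page lives in the download function, several threads can call it at the same time, each requesting one page and downloading its images: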

    with ThreadPoolExecutor(5) as t:
        for i in range(1,10):
            url_Temp = f"https://mbizhi.cheetahfun.com/dn/c11d/p{i}"
            t.submit(download, url=url_Temp)

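Here ThreadPoolExecutor(5) creates a pool of 5 worker threads that run download. The loop submits 9 pages (p1 to p9), so the extra pages wait in the pool's queue until a thread is free. Leaving the with block waits for every submitted task to finish.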
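  To see that a pool of 5 never runs more than 5 calls at once, here is a minimal sketch that counts concurrent calls. run_pool and tracked_download are hypothetical names; tracked_download sleeps to simulate the network wait instead of calling the real site.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_pool(urls, workers=5):
    """Run a simulated download per URL; report results, peak concurrency and threads used."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}
    threads_seen = set()

    def tracked_download(url):  # hypothetical stand-in for download()
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
            threads_seen.add(threading.current_thread().name)
        time.sleep(0.1)  # simulated network wait
        with lock:
            state["running"] -= 1
        return url

    with ThreadPoolExecutor(workers) as t:
        results = list(t.map(tracked_download, urls))
    return results, state["peak"], len(threads_seen)


if __name__ == "__main__":
    urls = [f"https://mbizhi.cheetahfun.com/dn/c11d/p{i}" for i in range(1, 10)]
    results, peak, n_threads = run_pool(urls, workers=5)
    print(f"{len(results)} tasks, peak concurrency {peak}, {n_threads} threads used")
```

  The peak never exceeds the workers argument, while all 9 tasks still complete.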

posted @ 2024-01-16 19:59  zhang0513