Web Scraping: Downloading Images with Multithreading
1. Purpose
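Scrape images using multiple threads. Source: Other PC Live Wallpapers - Other Desktop Live Wallpapers - 元气壁纸 (cheetahfun.com); the listing pages used below are https://mbizhi.cheetahfun.com/dn/c11d/p1, p2, and so on.
2. Analysis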
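Press F12 to open the developer tools and inspect the page structure. Every image sits in an li inside the element with class="flex flex-wrap justify-between font-normal"; in each li, the image URL and the image name are the src and title attributes of the img inside the a tag.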
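A small sketch of this extraction step with lxml, run against a hypothetical HTML fragment that is assumed to mirror the structure described above (the example.com URLs and the titles are made up). It selects the li items by the class instead of the long absolute XPath used in the full script; note that `@class="..."` only matches when the attribute value is exactly that string.

```python
from lxml import etree

# Hypothetical fragment, assumed to mirror the real page:
# element with the class > li > div > a > img (src = image URL, title = image name).
SAMPLE_HTML = """
<html><body>
  <ul class="flex flex-wrap justify-between font-normal">
    <li><div><a href="/detail/1"><img src="https://example.com/a.jpg" title="Sunset"></a></div></li>
    <li><div><a href="/detail/2"><img src="https://example.com/b.jpg" title="Forest"></a></div></li>
  </ul>
</body></html>
"""


def extract_images(page_html):
    """Return (name, url) pairs for every img found under the wallpaper list."""
    html = etree.HTML(page_html)
    # Match any element whose class attribute is exactly this string, then take its li children.
    li_list = html.xpath('//*[@class="flex flex-wrap justify-between font-normal"]/li')
    results = []
    for li in li_list:
        srcs = li.xpath('./div/a/img/@src')
        titles = li.xpath('./div/a/img/@title')
        if srcs and titles:  # skip items without an image instead of raising IndexError
            results.append((titles[0], srcs[0]))
    return results


if __name__ == "__main__":
    for name, url in extract_images(SAMPLE_HTML):
        print(name, url)
```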
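Building on what we covered earlier, we can first write a single-threaded version of the image scraper and then add multithreading on top of it.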
# -*- coding: utf-8 -*-
# Step 1: import packages
from concurrent.futures import ThreadPoolExecutor
import os
import time

import requests
from lxml import etree

headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 Edg/120.0.0.0"
}

SAVE_DIR = "E:/元气图片/"


def download(url):
    # Pass headers by keyword: the second positional parameter of requests.get is params
    resp = requests.get(url, headers=headers)
    html = etree.HTML(resp.text)
    # Every wallpaper on the listing page is one li
    li_list = html.xpath('//*[@id="1"]/main/div[2]/div/section/ul/li')
    for li in li_list:
        pic_url = li.xpath('./div/a/img/@src')[0]     # image address
        pic_name = li.xpath('./div/a/img/@title')[0]  # image name, used as the file name
        # print(pic_name)
        # print(pic_url)
        with open(SAVE_DIR + pic_name + ".png", mode="wb") as fp:
            fp.write(requests.get(pic_url, headers=headers).content)
        print(pic_name + " downloaded!")
        time.sleep(1)  # pause between images to go easy on the server


if __name__ == '__main__':
    os.makedirs(SAVE_DIR, exist_ok=True)  # create the output folder if it does not exist
    with ThreadPoolExecutor(5) as t:
        for i in range(1, 10):
            url_Temp = f"https://mbizhi.cheetahfun.com/dn/c11d/p{i}"
            # print(url_Temp)
            t.submit(download, url=url_Temp)
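To see why adding threads to the single-threaded version pays off, here is a sketch that replaces the real download with a hypothetical `fake_download` that just sleeps for 0.2 s, standing in for network waiting. Because the work is mostly waiting on I/O, the threads overlap their waits: 9 pages take about 1.8 s one after another, but with 5 workers they finish in roughly two rounds.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(page):
    """Hypothetical stand-in for download(url): sleeps to simulate waiting on the network."""
    time.sleep(0.2)
    return page


def run_sequential(pages):
    start = time.perf_counter()
    results = [fake_download(p) for p in pages]  # one page after another
    return results, time.perf_counter() - start


def run_threaded(pages, workers=5):
    start = time.perf_counter()
    with ThreadPoolExecutor(workers) as pool:
        # map keeps the input order; leaving the with block waits for all tasks
        results = list(pool.map(fake_download, pages))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    pages = list(range(1, 10))  # p1..p9, like the script above
    _, seq_time = run_sequential(pages)
    _, thr_time = run_threaded(pages)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```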
Note: the thread pool requires this import:
from concurrent.futures import ThreadPoolExecutor
With the download function in place, we can start several threads that call it concurrently to request and save the images:
with ThreadPoolExecutor(5) as t:
    for i in range(1, 10):
        url_Temp = f"https://mbizhi.cheetahfun.com/dn/c11d/p{i}"
        t.submit(download, url=url_Temp)
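This creates a pool of 5 threads that call download. The loop submits one task per listing page (p1 to p9, since range(1, 10) stops before 10), at most 5 of them run at the same time, and leaving the with block waits until all tasks have finished.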
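One caveat of discarding the return value of `t.submit(...)`: if a task raises (for example an IndexError from `[0]` or a network error), the exception is stored in the returned Future and never printed. A minimal sketch of keeping the Futures and checking them with `as_completed`; `download_page` is a hypothetical stand-in that fails on page 3 only to demonstrate the error path.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_page(url):
    """Hypothetical stand-in for download(url); page 3 fails to show error handling."""
    if url.endswith("p3"):
        raise ValueError(f"failed to parse {url}")
    return f"done {url}"


def crawl(urls, workers=5):
    ok, failed = [], []
    with ThreadPoolExecutor(workers) as t:
        # Keep each Future so its result (or exception) can be inspected later
        futures = {t.submit(download_page, url): url for url in urls}
        for fut in as_completed(futures):  # yields futures as they finish
            url = futures[fut]
            try:
                ok.append(fut.result())  # re-raises any exception from the worker thread
            except Exception as exc:
                failed.append((url, str(exc)))
    return ok, failed


if __name__ == "__main__":
    urls = [f"https://mbizhi.cheetahfun.com/dn/c11d/p{i}" for i in range(1, 10)]
    ok, failed = crawl(urls)
    print(len(ok), "pages done; failed:", failed)
```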