Scraping 4K Images with Multiple Threads

Let's see how much multithreaded scraping improves performance.

Scraping 180 images with a single thread takes roughly 60 seconds.

Here is the multithreaded code:

import time
import requests
from lxml import etree
import os
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API

if __name__ == '__main__':
    start = time.perf_counter()
    url_list = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
    }
    # Images are saved into this directory, which must already exist
    os.chdir(r"C:\Users\Administrator\Desktop\img")

    # Step 1: collect the filename and image URL from list pages 1 to 9
    for page in range(1, 10):
        if page == 1:
            url = "http://pic.netbian.com/4kmeinv"
        else:
            url = "http://pic.netbian.com/4kmeinv/index_{}.html".format(page)
        res = requests.get(url, headers=headers)
        res.encoding = res.apparent_encoding

        infos = etree.HTML(res.text)
        info = infos.xpath('//ul[@class="clearfix"]/li')

        for item in info:
            dic = {
                "filename": item.xpath("./a/b/text()")[0] + ".jpg",
                "url": "http://pic.netbian.com" + item.xpath('./a/img/@src')[0],
            }
            url_list.append(dic)

    def get_pic(dic):
        # Download one image and write it to the current directory
        url = dic["url"]
        print(dic)
        res = requests.get(url, headers=headers)
        print("Downloading " + dic["filename"])
        with open(dic["filename"], "wb") as fp:
            fp.write(res.content)
        print("Finished " + dic["filename"])

    # Step 2: download every image using a pool of 10 threads
    pool = Pool(10)
    pool.map(get_pic, url_list)
    # Elapsed time is the end time minus the start time
    print("Total time: {}".format(time.perf_counter() - start))
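
The same fan-out can also be written with the standard library's concurrent.futures.ThreadPoolExecutor, which makes it easy to stop one failed download from aborting the whole batch. This is only a sketch, not the code used for the timings in this post: download_all and fake_fetch are hypothetical names, and fake_fetch stands in for the real requests.get call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(items, fetch, max_workers=10):
    """Run fetch(item) on a thread pool and return (results, errors)."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the filename it belongs to
        futures = {executor.submit(fetch, item): item["filename"] for item in items}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure and keep processing the other downloads
                errors[name] = exc
    return results, errors


def fake_fetch(item):
    # Hypothetical stand-in for requests.get(item["url"], headers=headers).content
    if "bad" in item["url"]:
        raise ValueError("simulated failure")
    return b"image-bytes"


items = [
    {"filename": "a.jpg", "url": "http://example.invalid/a.jpg"},
    {"filename": "b.jpg", "url": "http://example.invalid/bad.jpg"},
]
results, errors = download_all(items, fake_fetch)
print(sorted(results), sorted(errors))
```

In a real run, fetch would perform the request and write the file, and the errors dict could be used to retry just the failed items.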


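With ten threads, the run took about 35 seconds, compared with about 60 seconds single-threaded. That is roughly a 1.7x speedup, so multithreading gives a clear improvement for this kind of download-heavy scraping, where most of the time is spent waiting on the network.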
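
The effect can be reproduced without touching the network. The sketch below is an illustration under assumptions, not a measurement of the real site: fake_download is a hypothetical function that only sleeps to imitate waiting on a response, and the task count and sleep time are arbitrary.

```python
import time
from multiprocessing.dummy import Pool  # thread-based Pool


def fake_download(_):
    # Hypothetical stand-in for a network request: just wait
    time.sleep(0.05)


def measure(run):
    """Return how many seconds the callable takes to run."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


tasks = range(20)

# One task after another
serial = measure(lambda: [fake_download(t) for t in tasks])

# The same tasks spread over 10 threads
with Pool(10) as pool:
    threaded = measure(lambda: pool.map(fake_download, tasks))

print("serial: {:.2f}s, 10 threads: {:.2f}s".format(serial, threaded))
```

Because the simulated tasks do nothing but wait, the threaded run here finishes far faster than the serial one. Real downloads usually gain less than that, plausibly because bandwidth and the server's response rate are shared among the threads.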
posted @ 2020-09-17 23:29 by 高内聚低耦合