grequests: an asynchronous request library built on gevent
1. Installation
pip install grequests -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
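2. Basic usage
grequests.map() takes an iterable of unsent requests, usually a generator or a list. The example below builds a generator with parentheses; square brackets work as well and build a list.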
    import grequests
    import time

    urls = [
        "http://www.baidu.com",
        "http://www.taobao.com",
        "http://www.xinwen.com",
        "http://www.meituan.com",
        "http://www.jingdong.com",
    ]

    # Parentheses create a generator; nothing is sent until map() runs.
    reqs = (grequests.get(u) for u in urls)

    start_time = time.time()
    responses = grequests.map(reqs)  # send all requests concurrently
    end_time = time.time()
    print(f"Elapsed: {end_time - start_time:.2f} s")
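One practical difference between the two forms is that a generator can be consumed only once, while a list can be reused. The sketch below shows this with plain strings standing in for request objects (an assumption made so that it runs without network access); the same applies to a generator of grequests requests once grequests.map() has iterated over it.

```python
urls = ["http://www.baidu.com", "http://www.taobao.com"]

# Parentheses build a generator: items are produced lazily, one pass only.
gen_reqs = (f"GET {u}" for u in urls)
# Square brackets build a list: all items exist up front and can be reused.
list_reqs = [f"GET {u}" for u in urls]

first_pass = list(gen_reqs)   # consumes the generator
second_pass = list(gen_reqs)  # nothing left to yield

print(len(first_pass), len(second_pass))           # 2 0
print(len(list(list_reqs)), len(list(list_reqs)))  # 2 2
```

If the same set of requests has to be iterated more than once, building a list (or rebuilding the generator) avoids an unexpectedly empty result.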
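3. Exception handling
Exception handling for asynchronous requests works differently from plain requests: grequests.map() does not raise when a request fails. By default the failed request's slot in the result list is None; pass exception_handler to be notified of each failure.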
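    import grequests
    import time

    urls = [
        "http://www.baidu.com",
        "http://www.taobao.com",
        "http://www.xinwen.com",
        "http://www.meituan.com",
        "http://www.jingdong.com",
    ]

    def handle_error(request, exception):
        # Called once for each request that raised; the return value
        # takes that request's place in the result list.
        print(f"Oops, something went wrong: {request.url}")
        return None

    # Build the generator right before use: it can be consumed only once,
    # so reusing one that map() has already iterated would send nothing.
    reqs = (grequests.get(u) for u in urls)

    start_time = time.time()
    # Requests with exception handling
    res = grequests.map(reqs, exception_handler=handle_error)
    end_time = time.time()
    print(f"Elapsed: {end_time - start_time:.2f} s")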
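Because grequests.map() returns one entry per request, in request order, with None in place of any request that failed (when the handler returns None), the results usually need to be filtered before use. A minimal sketch follows; split_results is a hypothetical helper written for illustration and is not part of grequests.

```python
def split_results(urls, responses):
    """Pair each URL with its response and separate successes from failures.

    Hypothetical helper (not part of grequests). Assumes `responses` is the
    list returned by grequests.map() for requests built from `urls` in the
    same order, with None marking a failed request.
    """
    succeeded, failed = [], []
    for url, resp in zip(urls, responses):
        if resp is None:
            failed.append(url)
        else:
            succeeded.append((url, resp.status_code))
    return succeeded, failed
```

With the names used above, a call would look like `ok, failed = split_results(urls, res)`, after which `failed` holds the URLs worth retrying or logging.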
4. Controlling concurrency
    # The size parameter caps concurrency; here at most 5 requests are in flight at once
    responses = grequests.map(reqs, size=5)
5. Sending different kinds of requests
    post_urls = ['http://httpbin.org/post'] * 3
    post_reqs = [grequests.post(u, json={'msg': 'hello'}) for u in post_urls]

    # A mix of different request methods
    mixed_reqs = [
        # Set a timeout on each request so one stuck request cannot slow down the whole batch
        grequests.get('http://httpbin.org/get', timeout=5),
        grequests.post('http://httpbin.org/post', data={'key': 'value'}, timeout=5),
        grequests.put('http://httpbin.org/put', json={'name': 'cat'}, timeout=5),
        # Each method goes to its own httpbin endpoint; /put only accepts PUT
        grequests.patch('http://httpbin.org/patch', json={'name': 'cat'}, timeout=5),
        grequests.delete('http://httpbin.org/delete', json={'name': 'cat'}, timeout=5),
    ]
    responses = grequests.map(mixed_reqs)
    # urls is the list defined in section 2

    # 1. With request headers
    headers = {'User-Agent': 'Mozilla/5.0 ...'}
    reqs = (grequests.get(u, headers=headers) for u in urls)

    # 2. Through a proxy
    proxies = {'http': 'http://10.10.10.1:8888'}
    reqs = (grequests.get(u, proxies=proxies) for u in urls)

    # 3. With cookies
    cookies = {'session': 'abc123'}
    reqs = (grequests.get(u, cookies=cookies) for u in urls)
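grequests is genuinely fast, but keep a few points in mind when using it:
1. Not every website can withstand concurrent requests; add delays where they are needed.
2. Add exception handling to your code; otherwise failed requests come back as None and a single failure can break the processing of the whole batch.
3. When the amount of data is very large, send the requests in batches to avoid running out of memory.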
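The batching and delay advice above can be sketched as follows. Both chunked and fetch_in_batches are hypothetical helpers, and the batch size, pool size, timeout and delay are illustrative assumptions, not recommended values. Results are yielded batch by batch so that the caller can process and discard them instead of holding every response in memory.

```python
import time


def chunked(items, batch_size):
    """Yield successive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_in_batches(urls, batch_size=50, pool_size=5, delay=1.0):
    """Yield (url, response) pairs, fetching `urls` one batch at a time.

    Hypothetical helper; all default values are illustrative assumptions.
    A response of None means that request failed.
    """
    # Imported lazily so chunked() works without grequests installed;
    # in a real script, prefer importing grequests at the top of the file.
    import grequests

    for index, batch in enumerate(chunked(urls, batch_size)):
        if index > 0:
            time.sleep(delay)  # pause between batches to go easy on the server
        reqs = [grequests.get(u, timeout=5) for u in batch]
        responses = grequests.map(reqs, size=pool_size)
        yield from zip(batch, responses)
```

A caller would loop over `fetch_in_batches(urls)` and handle each pair as it arrives, for example writing the body to disk and skipping pairs whose response is None.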