Implementing concurrency with the gevent and twisted modules
The drawback of multithreading and multiprocessing is that threads and processes sit idle while blocked on IO, so asynchronous IO is usually the better choice. The common options are:
I. Asynchronous IO
1. asyncio + aiohttp + requests (a minimal asyncio + aiohttp sketch follows this list)
2. gevent + requests + grequests
3. twisted
4. tornado
5. asyncio
6. gevent + requests
7. grequests
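Option 1 from the list (asyncio + aiohttp) is not covered by the examples below; a minimal sketch of what it looks like, assuming Python 3.7+ with the aiohttp package installed:
import asyncio
import aiohttp

async def fetch_async(session, url):
    # issue the request and read the body without blocking the event loop
    async with session.get(url) as response:
        data = await response.read()
        print(url, len(data))

async def main():
    urls = ["https://www.python.org/", "https://github.com/"]
    async with aiohttp.ClientSession() as session:
        # run all fetches concurrently on the asyncio event loop
        await asyncio.gather(*(fetch_async(session, url) for url in urls))

asyncio.run(main())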
gevent+requests
from gevent import monkey
# patch the standard library (including socket) so blocking calls become cooperative;
# this should run before the other modules are imported
monkey.patch_all()

import gevent
import requests

def fetch_async(method, url, req_kwargs):
    print(method, url, req_kwargs)
    response = requests.request(method=method, url=url, **req_kwargs)
    print(response.url, response.content)

# send the requests; each of the three spawns below runs as its own greenlet (coroutine)
gevent.joinall([
    gevent.spawn(fetch_async, method="get", url="https://www.python.org/", req_kwargs={}),
    gevent.spawn(fetch_async, method="get", url="https://www.yahoo.com/", req_kwargs={}),
    gevent.spawn(fetch_async, method="get", url="https://github.com/", req_kwargs={}),
])
Crawling sites with gevent + urllib:
from gevent import monkey
# patch the standard library (including socket) so blocking calls become cooperative;
# this should run before the other modules are imported
monkey.patch_all()

import gevent
import urllib.request

def run_task(url):
    print("Visit --> %s" % url)
    try:
        response = urllib.request.urlopen(url)
        data = response.read()
        print("%d bytes received from %s." % (len(data), url))
    except Exception as e:
        print(e)

if __name__ == '__main__':
    urls = ['https://www.baidu.com',
            'https://docs.python.org/3/library/urllib.html',
            'https://www.cnblogs.com/wangmo/p/7784867.html']
    greenlets = [gevent.spawn(run_task, url) for url in urls]
    gevent.joinall(greenlets)
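gevent.spawn returns a Greenlet object, and after joinall its value attribute holds whatever the task returned; a small sketch along the same lines (fetch_size is a hypothetical helper, not part of the example above):
from gevent import monkey
monkey.patch_all()

import gevent
import urllib.request

def fetch_size(url):
    # return the payload size instead of printing it
    return len(urllib.request.urlopen(url).read())

urls = ['https://www.baidu.com', 'https://docs.python.org/3/library/urllib.html']
greenlets = [gevent.spawn(fetch_size, url) for url in urls]
gevent.joinall(greenlets)
# Greenlet.value holds the return value of each finished task
print([g.value for g in greenlets])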
Using a gevent coroutine pool to cap the maximum number of coroutines
from gevent import monkey
# patch the standard library (including socket) so blocking calls become cooperative;
# this should run before the other modules are imported
monkey.patch_all()

import gevent
import requests
from gevent.pool import Pool

def fetch_async(method, url, req_kwargs):
    print(method, url, req_kwargs)
    response = requests.request(method=method, url=url, **req_kwargs)
    print(response.url, response.content)

# send the requests; the pool caps the number of greenlets running at once
pool = Pool(3)
gevent.joinall([
    pool.spawn(fetch_async, method="get", url="https://www.python.org/", req_kwargs={}),
    pool.spawn(fetch_async, method="get", url="https://www.yahoo.com/", req_kwargs={}),
    pool.spawn(fetch_async, method="get", url="https://github.com/", req_kwargs={}),
])
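Pool also provides map, which blocks until every task in the iterable has finished; a rough sketch with a single-argument worker (fetch here is illustrative, not from the example above):
from gevent import monkey
monkey.patch_all()

import requests
from gevent.pool import Pool

def fetch(url):
    response = requests.get(url)
    print(response.url, len(response.content))

pool = Pool(3)  # at most 3 greenlets run at the same time
pool.map(fetch, ["https://www.python.org/", "https://www.yahoo.com/", "https://github.com/"])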
grequests has gevent.joinall built in:
import grequests

request_list = [
    grequests.get('http://httpbin.org/delay/1', timeout=0.001),
    grequests.get('http://fakedomain/'),
    grequests.get('http://httpbin.org/status/500')
]

# execute the requests and collect the list of responses
response_list = grequests.map(request_list)
print(response_list)
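Requests that fail (the deliberately tiny timeout and the bogus domain above) show up as None in the result list. grequests.map also accepts an exception_handler callback; a short sketch:
import grequests

def exception_handler(request, exception):
    # called for each request that raised instead of producing a response
    print("request failed:", exception)

request_list = [
    grequests.get('http://httpbin.org/delay/1', timeout=0.001),
    grequests.get('http://fakedomain/'),
]
print(grequests.map(request_list, exception_handler=exception_handler))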
twisted
1. The event loop keeps looping, waiting for the responses to the requests that were sent.
2. Even after every request has received its result, the loop would keep running, so we count the responses, and once that count equals the number of requests we call reactor.stop() to shut the event loop down:
# getPage sends the HTTP request (deprecated in recent Twisted releases, kept here for the example)
from twisted.web.client import getPage
# reactor is the event loop
from twisted.internet import reactor

REV_COUNTER = 0
REQ_COUNTER = 0

def callback(contents):
    print(contents)
    global REV_COUNTER
    REV_COUNTER += 1
    if REV_COUNTER == REQ_COUNTER:
        # all responses are in, so stop the event loop
        reactor.stop()

url_list = ['http://www.bing.com', 'http://www.baidu.com']
REQ_COUNTER = len(url_list)

for url in url_list:
    deferred = getPage(bytes(url, encoding="utf8"))
    deferred.addCallback(callback)

# the event loop blocks here waiting for the results
reactor.run()
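Instead of keeping manual counters, the deferreds can also be gathered with twisted.internet.defer.DeferredList, which fires once every request has finished (success or failure), and the reactor can be stopped from there; a sketch under the same setup:
from twisted.web.client import getPage
from twisted.internet import defer, reactor

def callback(contents):
    print(contents)

url_list = ['http://www.bing.com', 'http://www.baidu.com']
deferred_list = []
for url in url_list:
    d = getPage(bytes(url, encoding="utf8"))
    d.addCallback(callback)
    deferred_list.append(d)

# DeferredList fires when all deferreds have a result, whether they succeeded or failed
dl = defer.DeferredList(deferred_list)
dl.addBoth(lambda _: reactor.stop())
reactor.run()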