Parallelism between coroutines in Python async frameworks
Python has many async coroutine frameworks: tornado, gevent, asyncio, twisted, and so on. What coroutines offer is low-cost concurrency: while waiting on an IO event, a coroutine can hand control to other coroutines, and this is what makes their concurrency possible. But concurrency alone is not enough. High concurrency does not guarantee low latency, because a single business flow may involve several async IO requests; if those requests execute one after another, the server's overall throughput can still be high, yet the latency of each individual request becomes large. Each framework solves this problem in its own way. Below we look at how each of them manages the parallel execution of unrelated coroutines.
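To make the latency problem concrete, compare a handler whose IO steps run one after another with one that issues them in parallel. This is a minimal asyncio sketch (io_step merely simulates one async IO request; the per-framework details follow below):

import asyncio
import time


@asyncio.coroutine
def io_step(n):
    yield from asyncio.sleep(1)  # stand-in for one async IO request


@asyncio.coroutine
def sequential():
    # the two IO steps run one after the other: latency ~= 2s (the sum)
    yield from io_step(1)
    yield from io_step(2)


@asyncio.coroutine
def parallel():
    # the same two steps issued together: latency ~= 1s (the max)
    yield from asyncio.gather(io_step(1), io_step(2))


loop = asyncio.get_event_loop()
for handler in (sequential, parallel):
    before = time.time()
    loop.run_until_complete(handler())
    print(handler.__name__, time.time() - before)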
tornado
Python 2.7 and above
The tornado code is quite short: you simply yield a list of coroutines:
#!/usr/bin/env python
# _*_coding:utf-8_*_
import time

import requests

from tornado import gen
from tornado.ioloop import IOLoop


@gen.coroutine
def get_url(url):
    # NOTE: requests is a blocking library, so this call stalls the
    # IOLoop; the three "parallel" fetches actually run one by one
    r = requests.get(url, timeout=3)
    print url, r.status_code
    resp = r.text
    print type(resp)
    raise gen.Return((url, r.status_code))


@gen.coroutine
def process_once_everything_ready():
    before = time.time()
    coroutines = [get_url(url) for url in ['https://www.python.org/',
                                           'https://github.com/',
                                           'https://www.yahoo.com/']]
    result = yield coroutines
    after = time.time()
    print(result)
    print('total time: {} seconds'.format(after - before))


if __name__ == '__main__':
    IOLoop.current().run_sync(process_once_everything_ready)
Output:
/usr/bin/python /Users/liujianzuo/py_test/s83_company_code/edns_prober_v2/cname_search/a_sync_io.py
https://www.python.org/ 200
<type 'unicode'>
https://github.com/ 200
<type 'unicode'>
https://www.yahoo.com/ 200
<type 'unicode'>
[('https://www.python.org/', 200), ('https://github.com/', 200), ('https://www.yahoo.com/', 200)]
total time: 4.64905309677 seconds
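Because requests is blocking, the fetches above do not actually overlap, and the total time is roughly the sum of the three requests. Here is a minimal non-blocking sketch using tornado's own AsyncHTTPClient (same URLs as above; fetch returns a Future, so the IOLoop can switch to other coroutines while a request is in flight):

import time

from tornado import gen
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop


@gen.coroutine
def get_url(url):
    # non-blocking fetch: yields a Future instead of stalling the IOLoop
    response = yield AsyncHTTPClient().fetch(url, request_timeout=3)
    raise gen.Return((url, response.code))


@gen.coroutine
def process_once_everything_ready():
    before = time.time()
    coroutines = [get_url(url) for url in ['https://www.python.org/',
                                           'https://github.com/',
                                           'https://www.yahoo.com/']]
    result = yield coroutines
    print(result)
    print('total time: {} seconds'.format(time.time() - before))


if __name__ == '__main__':
    IOLoop.current().run_sync(process_once_everything_ready)

With this version the total time is close to that of the slowest single request. To take the network out of the picture entirely, the next example simulates the IO wait with gen.sleep instead: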
import random
import time

from tornado import gen
from tornado.ioloop import IOLoop


@gen.coroutine
def get_url(url):
    wait_time = random.randint(1, 4)
    yield gen.sleep(wait_time)  # non-blocking: control goes back to the IOLoop
    print('URL {} took {}s to get!'.format(url, wait_time))
    raise gen.Return((url, wait_time))


@gen.coroutine
def process_once_everything_ready():
    before = time.time()
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    result = yield coroutines
    after = time.time()
    print(result)
    print('total time: {} seconds'.format(after - before))


if __name__ == '__main__':
    IOLoop.current().run_sync(process_once_everything_ready)
$ python3 tornado_test.py
URL URL2 took 1s to get!
URL URL3 took 1s to get!
URL URL1 took 4s to get!
[('URL1', 4), ('URL2', 1), ('URL3', 1)]
total time: 4.000649929046631 seconds
Here, the total running time again equals the running time of the slowest coroutine.
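Tornado can also hand results back as they finish rather than waiting for the whole list, analogous to asyncio's as_completed shown in the next section. A minimal sketch using gen.WaitIterator (available since tornado 4.1), with the same simulated fetch as above:

import random

from tornado import gen
from tornado.ioloop import IOLoop


@gen.coroutine
def get_url(url):
    # same simulated fetch as in the example above
    wait_time = random.randint(1, 4)
    yield gen.sleep(wait_time)
    raise gen.Return((url, wait_time))


@gen.coroutine
def process_as_results_come_in():
    futures = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    wait_iterator = gen.WaitIterator(*futures)
    while not wait_iterator.done():
        # next() yields a Future that resolves as soon as any of the
        # remaining coroutines finishes, i.e. in completion order
        url, wait_time = yield wait_iterator.next()
        print('Coroutine for {} is done'.format(url))


if __name__ == '__main__':
    IOLoop.current().run_sync(process_as_results_come_in)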
Since tornado now integrates with the asyncio and twisted modules, you can also use their approaches from within tornado; we won't go into that here.
asyncio
Python 3.4 and above
One of the posts on my blog is a translation about the asyncio library, and its last part covers exactly how asyncio manages unrelated coroutines. We reuse that example here, adding timing to better show how the coroutines run in parallel:
import asyncio
import random
import time


@asyncio.coroutine
def get_url(url):
    wait_time = random.randint(1, 4)
    yield from asyncio.sleep(wait_time)
    print('URL {} took {}s to get!'.format(url, wait_time))
    return url, wait_time


@asyncio.coroutine
def process_as_results_come_in():
    before = time.time()
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    for coroutine in asyncio.as_completed(coroutines):
        url, wait_time = yield from coroutine
        print('Coroutine for {} is done'.format(url))
    after = time.time()
    print('total time: {} seconds'.format(after - before))


@asyncio.coroutine
def process_once_everything_ready():
    before = time.time()
    coroutines = [get_url(url) for url in ['URL1', 'URL2', 'URL3']]
    results = yield from asyncio.gather(*coroutines)
    print(results)
    after = time.time()
    print('total time: {} seconds'.format(after - before))


def main():
    loop = asyncio.get_event_loop()
    print("First, process results as they come in:")
    loop.run_until_complete(process_as_results_come_in())
    print("\nNow, process results once they are all ready:")
    loop.run_until_complete(process_once_everything_ready())


if __name__ == '__main__':
    main()
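On Python 3.5 and above the same program can be written with the native async/await syntax; asyncio.gather and asyncio.as_completed behave the same way. A minimal sketch of the gather-based variant:

import asyncio
import random
import time


async def get_url(url):
    wait_time = random.randint(1, 4)
    await asyncio.sleep(wait_time)
    print('URL {} took {}s to get!'.format(url, wait_time))
    return url, wait_time


async def process_once_everything_ready():
    before = time.time()
    # gather schedules all three coroutines at once and returns their
    # results in the order the coroutines were passed in
    results = await asyncio.gather(*[get_url(u) for u in ['URL1', 'URL2', 'URL3']])
    print(results)
    print('total time: {} seconds'.format(time.time() - before))


if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(process_once_everything_ready())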
Summary
- Inside a coroutine framework you can no longer sleep with the original time module: time.sleep() would block the entire thread, and all coroutines run in that same thread. Both frameworks therefore wrap sleep, as gen.sleep() and asyncio.sleep(); internally, each registers a timer with the event loop and hands CPU control over to other coroutines (see the sketch after this list).
- The way coroutines are implemented also makes this style of parallelism easy to understand: both frameworks yield a list of generator objects to the scheduler, which runs each of them and registers callbacks, and that is what makes the parallel execution possible.
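To make the "register a timer in the event loop" point concrete, here is a hypothetical, minimal reimplementation of a coroutine-friendly sleep on top of asyncio's call_later; the real asyncio.sleep works along these lines:

import asyncio


@asyncio.coroutine
def my_sleep(delay):
    future = asyncio.Future()
    # register a timer with the event loop; when it fires, the future
    # is resolved and this coroutine becomes runnable again
    asyncio.get_event_loop().call_later(delay, future.set_result, None)
    # yielding the future hands control back to the event loop, which
    # is free to run other coroutines in the meantime
    yield from future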