asyncio Basic Usage
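- Python added the asyncio module, and with it coroutine support, in Python 3.4. Writing these notes also gave me a deeper understanding of how to use the module.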
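What is asyncio for?
- asyncio is a standard library module introduced in Python 3.4 that provides built-in support for asynchronous I/O.
- Asynchronous network operations
- Concurrency
- Coroutines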
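Some key asyncio terms:
- event_loop (event loop): the program starts an infinite loop and registers functions on it; when the event a function is waiting for occurs, the loop calls the corresponding coroutine function.
- **coroutine**: a coroutine object. A function defined with the async keyword does not run when it is called; the call returns a coroutine object instead. The coroutine object has to be registered with the event loop, which then drives it.
- **task**: a coroutine object is natively a function that can be suspended; a task wraps a coroutine further and tracks the task's various states.
- future: represents the result of a task that will run or has not run yet. It is essentially the same as a task (asyncio.Task is a subclass of asyncio.Future).
- async/await keywords: introduced in Python 3.5 for defining coroutines. async defines a coroutine, and await suspends the coroutine at a blocking asynchronous call.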
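To make the terms above concrete, here is a minimal sketch, assuming Python 3.7+ so that `asyncio.run` is available. The `greet` function and the `summary` variable are names made up for this illustration. It checks that calling an `async def` function only builds a coroutine object, and that wrapping it produces a task that is also a future.

```python
import asyncio


async def greet(name):
    # A coroutine function: calling it only builds a coroutine object
    await asyncio.sleep(0.01)
    return f"hello {name}"


async def main():
    coro = greet("asyncio")  # nothing has run yet
    is_coro = asyncio.iscoroutine(coro)
    # Wrap the coroutine in a Task scheduled on the running event loop
    task = asyncio.ensure_future(coro)
    is_future = isinstance(task, asyncio.Future)  # Task is a subclass of Future
    done_before = task.done()  # the task is still pending here
    result = await task  # suspend main() until the task finishes
    return is_coro, is_future, done_before, task.done(), result


summary = asyncio.run(main())
print(summary)  # (True, True, False, True, 'hello asyncio')
```

The tuple shows the life cycle in order: a coroutine object exists first, the task starts out pending, and only after `await` does it report done and expose the return value.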
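asyncio usage in Python 3.4

```python
import asyncio
import threading


# Legacy generator-based style: @asyncio.coroutine was deprecated in Python 3.8
# and removed in Python 3.11, so this example only runs on older interpreters.
@asyncio.coroutine
def hello():
    print("Hello world!", threading.current_thread())
    # Asynchronous call to asyncio.sleep(2):
    r = yield from asyncio.sleep(2)
    # time.sleep(2)
    print("Hello world!", threading.current_thread())


# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello(), hello()]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
```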
- Python 3.5 introduced async/await, which directly replace @asyncio.coroutine and yield from.
Basic usage:
```python
import asyncio
import time

now = lambda: time.time()


async def do_some_work(x):
    print(f"Coroutine running: {x}")
    await asyncio.sleep(x)
    return "done after {}".format(x)


def callback(future):
    print("Callback received return value:", future.result())


start = now()
# This only creates a coroutine object; do_some_work has not started running yet
coroutine = do_some_work(2)
# print(coroutine)  # <coroutine object do_some_work at 0x000001A2AAA9FCA8>
# Create the loop explicitly: on recent Python versions asyncio.get_event_loop()
# no longer creates one implicitly
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# Create a task object
# task = loop.create_task(coroutine)
# Second way: create it through asyncio
task = asyncio.ensure_future(coroutine)
# Bind a callback so the result is available when the task finishes. The last
# argument of the callback is the future object, which holds the return value.
task.add_done_callback(callback)
print("Task before running (pending):", task)
loop.run_until_complete(task)
print("Task after running (finished):", task)
print("Elapsed time:", now() - start)
```

- Output

```
Task before running (pending): <Task pending coro=<do_some_work() running at F:/爬虫/爬虫项目使用 pycharm/PyppeteerDemo/asyncioDemo.py:7> cb=[callback() at F:/爬虫/爬虫项目使用 pycharm/PyppeteerDemo/asyncioDemo.py:14]>
Coroutine running: 2
Callback received return value: done after 2
Task after running (finished): <Task finished coro=<do_some_work() done, defined at F:/爬虫/爬虫项目使用 pycharm/PyppeteerDemo/asyncioDemo.py:7> result='done after 2'>
Elapsed time: 2.0016515254974365
```
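For comparison, the same task-plus-callback flow can be sketched without managing the loop by hand. This assumes Python 3.7+, where `asyncio.run` and `asyncio.create_task` exist. The `results` list and the `outcome` variable are names introduced only for this sketch, and the delay is shortened to 0.1 s.

```python
import asyncio

results = []  # filled in by the done callback


async def do_some_work(x):
    await asyncio.sleep(x)
    return "done after {}".format(x)


def callback(future):
    # Called by the event loop once the task has finished
    results.append(future.result())


async def main():
    # create_task schedules the coroutine on the loop that asyncio.run started
    task = asyncio.create_task(do_some_work(0.1))
    task.add_done_callback(callback)
    was_pending = not task.done()
    value = await task
    return was_pending, value


outcome = asyncio.run(main())
print(outcome, results)
```

`asyncio.run` creates the loop, runs `main()` to completion and closes the loop, so the explicit `run_until_complete` call is not needed.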
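Concurrency and parallelism
- Concurrency describes a system that has several activities in progress at the same time.
Parallelism means using concurrency to make a system run faster. Parallelism can be applied at several levels of abstraction in an operating system.
So concurrency usually means that multiple tasks need to make progress over the same period, while parallelism means that multiple tasks execute at the same instant.
The following example uses asyncio to achieve concurrency: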
```python
import asyncio
import time

now = lambda: time.time()


async def do_some_work(x):
    print("Coroutine running", x)
    await asyncio.sleep(x)
    return f"done after {x}"


start = now()
# Create and register the loop first so ensure_future() can find it
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
coroutine1 = do_some_work(2)
coroutine2 = do_some_work(3)
coroutine3 = do_some_work(4)
tasks = [
    asyncio.ensure_future(coroutine1),
    asyncio.ensure_future(coroutine2),
    asyncio.ensure_future(coroutine3),
]
# Run all tasks: asyncio.wait takes a list of tasks
# loop.run_until_complete(asyncio.wait(tasks))
# Second form: asyncio.gather takes the tasks as separate arguments
loop.run_until_complete(asyncio.gather(*tasks))
for item in tasks:
    print(item.result())
print("Elapsed time:", now() - start)
```

- Output

```
Coroutine running 2
Coroutine running 3
Coroutine running 4
done after 2
done after 3
done after 4
Elapsed time: 4.00330114364624
```
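The total of about 4 seconds, rather than 2 + 3 + 4 = 9, is what concurrency buys here. A small sketch of that comparison, assuming Python 3.7+ and using shortened delays: `run_sequential`, `run_concurrent` and `timed` are helper names invented for this illustration.

```python
import asyncio
import time


async def do_some_work(x):
    await asyncio.sleep(x)
    return f"done after {x}"


async def run_sequential(delays):
    # Each await finishes before the next coroutine starts
    return [await do_some_work(d) for d in delays]


async def run_concurrent(delays):
    # gather schedules every coroutine at once and keeps the argument order
    return await asyncio.gather(*(do_some_work(d) for d in delays))


def timed(coro):
    # Run one coroutine to completion and measure the wall-clock time
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


delays = [0.1, 0.2, 0.3]
seq_result, seq_time = timed(run_sequential(delays))
con_result, con_time = timed(run_concurrent(delays))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Sequential time is roughly the sum of the delays, while concurrent time is roughly the longest single delay, which matches the 4-second figure in the output above.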
Nested coroutines
- To encapsulate more of an I/O workflow, one coroutine can await another, chaining them together. This gives a nested coroutine.
```python
import asyncio
import time

now = lambda: time.time()
start = now()


async def do_some_work(x):
    print("Coroutine running:", x)
    await asyncio.sleep(x)
    return f"done after {x}"


async def main():
    coroutine1 = do_some_work(2)
    coroutine2 = do_some_work(3)
    coroutine3 = do_some_work(4)
    tasks = [
        asyncio.ensure_future(coroutine1),
        asyncio.ensure_future(coroutine2),
        asyncio.ensure_future(coroutine3),
    ]
    # Option 1
    # dones: the finished tasks; pendings: the tasks still waiting
    # dones, pendings = await asyncio.wait(tasks)
    # print(pendings)
    # for task in dones:
    #     print(task.result())
    # Or return directly
    # return await asyncio.wait(tasks)

    # Option 2
    # asyncio.gather returns the list of results directly
    # results = await asyncio.gather(*tasks)
    # print(results)
    # for result in results:
    #     print(result)
    # Or return directly
    # return results

    # Option 3
    # asyncio.as_completed(tasks) is a generator (on the Python 3.6 used here)
    # print(asyncio.as_completed(tasks))  # <generator object as_completed at 0x000001F0DB4767D8>
    for task in asyncio.as_completed(tasks):
        result = await task
        print(task)  # <generator object as_completed.<locals>._wait_for_one at 0x000001557DA46830>
        print(result)  # done after 2


# Create the loop explicitly instead of relying on asyncio.get_event_loop()
loop = asyncio.new_event_loop()
# Option 1: return value
# dones, pendings = loop.run_until_complete(main())
# print(pendings)
# for task in dones:
#     print(task.result())
# Option 2: return value
# results = loop.run_until_complete(main())
# for result in results:
#     print("Returned:", result)
# Option 3
loop.run_until_complete(main())
print("Elapsed time:", now() - start)
```
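The three options in `main()` differ mainly in what they hand back. As a rough side-by-side sketch, assuming Python 3.7+ and shortened delays (the variable names on the last lines are made up for the illustration):

```python
import asyncio


async def do_some_work(x):
    await asyncio.sleep(x)
    return f"done after {x}"


async def main():
    delays = [0.15, 0.05, 0.1]

    # asyncio.wait returns two sets of tasks: (done, pending)
    tasks = [asyncio.ensure_future(do_some_work(d)) for d in delays]
    done, pending = await asyncio.wait(tasks)
    wait_results = sorted(t.result() for t in done)  # sets are unordered

    # asyncio.gather returns results in the order the awaitables were passed
    gather_results = await asyncio.gather(*(do_some_work(d) for d in delays))

    # asyncio.as_completed yields awaitables in completion order
    completed_results = []
    for next_done in asyncio.as_completed([do_some_work(d) for d in delays]):
        completed_results.append(await next_done)

    return len(pending), wait_results, gather_results, completed_results


pending_count, wait_results, gather_results, completed_results = asyncio.run(main())
print(gather_results)     # argument order
print(completed_results)  # completion order
```

A rule of thumb that follows from this: use `gather` when the results should line up with the inputs, `as_completed` when each result should be handled as soon as it is ready, and `wait` when the task objects themselves are needed.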
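Stopping coroutines
- A future object has several states:
- Pending
- Running
- Done
- Cancelled
When a future is created, the task is pending; while the event loop is executing it, it is running; once it has finished, it is done. To stop the event loop, the tasks have to be cancelled first. The tasks on the event loop can be fetched with asyncio.all_tasks() (asyncio.Task.all_tasks() before Python 3.7).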
```python
import asyncio
import time

now = lambda: time.time()
start = now()


async def do_some_work(x):
    print("Coroutine running: {}".format(x))
    await asyncio.sleep(x)
    return "done after {}".format(x)


# Create and register the loop first so ensure_future() can find it
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
coroutine1 = do_some_work(2)
coroutine2 = do_some_work(3)
coroutine3 = do_some_work(4)
tasks = [
    asyncio.ensure_future(coroutine1),
    asyncio.ensure_future(coroutine2),
    asyncio.ensure_future(coroutine3),
]
try:
    results = loop.run_until_complete(asyncio.wait(tasks))
except KeyboardInterrupt as k:
    # print(k)
    # asyncio.all_tasks(loop) replaces asyncio.Task.all_tasks(),
    # which was removed in Python 3.9
    print(asyncio.all_tasks(loop))
    for task in asyncio.all_tasks(loop):
        print(task.cancel())  # loop over the tasks and cancel them one by one
    loop.stop()  # after stop(), the loop has to be started once more
    loop.run_forever()
finally:
    loop.close()  # close last, otherwise an exception is still raised
print(now() - start)
```

- Result: when the script is run from a terminal, pressing Ctrl + C raises KeyboardInterrupt out of run_until_complete

```
Coroutine running: 2
Coroutine running: 3
Coroutine running: 4
{<Task pending coro=<do_some_work() running at asyncioDemo.py:158> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x0000025F84BF3DF8>()]> cb=[_wait.<locals>._on_completion() at c:\python36\Lib\asyncio\tasks.py:380]>, <Task pending coro=<do_some_work() running at asyncioDemo.py:158> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x0000025F84B64C18>()]> cb=[_wait.<locals>._on_completion() at c:\python36\Lib\asyncio\tasks.py:380]>, <Task pending coro=<wait() running at c:\python36\Lib\asyncio\tasks.py:313> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x0000025F84BF3F18>()]>>, <Task pending coro=<do_some_work() running at asyncioDemo.py:158> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x0000025F84BF3D98>()]> cb=[_wait.<locals>._on_completion() at c:\python36\Lib\asyncio\tasks.py:380]>}
True
True
True
True
2.00264573097229
```
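Cancellation can also be tried without pressing Ctrl + C. The sketch below, assuming Python 3.7+, cancels the tasks from inside a coroutine and then inspects their final state; the delays and the names `cancel_requested`, `states` and `outcomes` are chosen only for this illustration.

```python
import asyncio


async def do_some_work(x):
    try:
        await asyncio.sleep(x)
        return f"done after {x}"
    except asyncio.CancelledError:
        # Cleanup would go here; re-raise so the task is marked as cancelled
        raise


async def main():
    tasks = [asyncio.ensure_future(do_some_work(d)) for d in (0.05, 5, 10)]
    await asyncio.sleep(0.1)  # let the shortest task finish first
    # cancel() returns False for a task that has already finished
    cancel_requested = [t.cancel() for t in tasks]
    # return_exceptions=True collects CancelledError instead of raising it
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    states = ["cancelled" if t.cancelled() else "done" for t in tasks]
    return cancel_requested, states, outcomes


cancel_requested, states, outcomes = asyncio.run(main())
print(cancel_requested)  # [False, True, True]
print(states)            # ['done', 'cancelled', 'cancelled']
```

This mirrors the `True` lines in the output above: `cancel()` only requests cancellation, and the task actually reaches the cancelled state once the event loop runs again and delivers `CancelledError` into the coroutine.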
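Event loops in different threads
- The event loop is where coroutines are registered, and some coroutines need to be added to it dynamically. A simple way to do this is with multiple threads: the current thread creates an event loop, then starts a new thread and runs the event loop in that thread. The current thread is not blocked.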
```python
import asyncio
from threading import Thread
import time

now = lambda: time.time()


def start_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()


async def do_some_work(x):
    print('Waiting {}'.format(x))
    await asyncio.sleep(x)
    print('Done after {}s'.format(x))


def work(x):
    print("Start", x)
    time.sleep(x)
    print("End", x)


start = now()
new_loop = asyncio.new_event_loop()
t = Thread(target=start_loop, args=(new_loop,))
t.start()

new_loop.call_soon_threadsafe(work, 6)
new_loop.call_soon_threadsafe(work, 3)
print(now() - start)
"""
Start 6
0.002008199691772461
End 6
Start 3
End 3
"""
'''
After this code starts, the current thread is not blocked. The new thread runs the work calls registered through call_soon_threadsafe in order. Because time.sleep is synchronous and blocking, running both work calls takes roughly 6 + 3 seconds.
'''
#
# asyncio.run_coroutine_threadsafe(do_some_work(6), new_loop)
# asyncio.run_coroutine_threadsafe(do_some_work(3), new_loop)
# print(now() - start)
"""
Waiting 6
Waiting 3
0.0009968280792236328
Done after 3s
Done after 6s
"""
'''
In the example above, the main thread creates new_loop and a child thread runs it as an endless event loop. The main thread registers new coroutine objects through run_coroutine_threadsafe, so the child thread's event loop runs them concurrently while the main thread stays unblocked. The total run time is about 6 seconds.
'''
```
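One detail the example leaves out is how the main thread gets a result back and shuts the loop down. A possible sketch follows, with shortened delays and a `do_some_work` that returns a value, both of which are changes made for this illustration. It relies on `asyncio.run_coroutine_threadsafe` returning a `concurrent.futures.Future`.

```python
import asyncio
from threading import Thread


def start_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()


async def do_some_work(x):
    await asyncio.sleep(x)
    return f"done after {x}"


new_loop = asyncio.new_event_loop()
t = Thread(target=start_loop, args=(new_loop,))
t.start()

# Both coroutines are handed to the loop running in the child thread
slow = asyncio.run_coroutine_threadsafe(do_some_work(0.2), new_loop)
fast = asyncio.run_coroutine_threadsafe(do_some_work(0.1), new_loop)

# result() blocks the calling (main) thread until the coroutine finishes
results = [slow.result(timeout=5), fast.result(timeout=5)]

# Ask the loop to stop from outside its thread, then clean up
new_loop.call_soon_threadsafe(new_loop.stop)
t.join()
new_loop.close()
print(results)  # ['done after 0.2', 'done after 0.1']
```

Stopping goes through `call_soon_threadsafe` because the loop belongs to the child thread; once `run_forever` returns, the thread exits and the loop can be closed from the main thread.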
- Reference: Liao Xuefeng's tutorial

```python
import asyncio


async def wget(host):
    print('wget %s...' % host)
    reader, writer = await asyncio.open_connection(host, 80)
    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host
    writer.write(header.encode('utf-8'))
    await writer.drain()
    while True:
        line = await reader.readline()
        # Stop at the blank line that ends the headers, or at end of stream
        if line == b'\r\n' or not line:
            break
        print('%s header > %s' % (host, line.decode('utf-8').rstrip()))
    # Ignore the body, close the socket
    writer.close()


loop = asyncio.new_event_loop()
# Wrap each coroutine in a task: since Python 3.11 asyncio.wait()
# no longer accepts bare coroutine objects
tasks = [loop.create_task(wget(host)) for host in ['www.sina.com.cn', 'www.sohu.com', 'www.163.com']]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
```
wget www.sina.com.cn...
wget www.sohu.com...
wget www.163.com...
www.sina.com.cn header > HTTP/1.1 302 Moved Temporarily
www.sina.com.cn header > Server: nginx
www.sina.com.cn header > Date: Sat, 27 Apr 2019 14:14:29 GMT
www.sina.com.cn header > Content-Type: text/html
www.sina.com.cn header > Content-Length: 154
www.sina.com.cn header > Connection: close
www.sina.com.cn header > Location: https://www.sina.com.cn/
www.sina.com.cn header > X-Via-CDN: f=edge,s=cmcc.hebei.ha2ts4.140.nb.sinaedge.com,c=183.197.88.253;
www.sina.com.cn header > X-Via-Edge: 1556374469876fd58c5b798403e6f6e7f94d2
www.sohu.com header > HTTP/1.1 200 OK
www.sohu.com header > Content-Type: text/html;charset=UTF-8
www.sohu.com header > Connection: close
www.sohu.com header > Server: nginx
www.sohu.com header > Date: Sat, 27 Apr 2019 14:13:44 GMT
www.sohu.com header > Cache-Control: max-age=60
www.sohu.com header > X-From-Sohu: X-SRC-Cached
www.sohu.com header > Content-Encoding: gzip
www.sohu.com header > FSS-Cache: HIT from 4742539.7953813.5615036
www.sohu.com header > FSS-Proxy: Powered by 3628410.5725572.4500890
www.163.com header > HTTP/1.0 302 Moved Temporarily
www.163.com header > Server: Cdn Cache Server V2.0
www.163.com header > Date: Sat, 27 Apr 2019 14:14:29 GMT
www.163.com header > Content-Length: 0
www.163.com header > Location: http://www.163.com/special/0077jt/error_isp.html
www.163.com header > X-Via: 1.0 xiyidong136:1 (Cdn Cache Server V2.0)
www.163.com header > Connection: close