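Coroutine-based file I/O in Python
Preface
- A while back someone asked me about Python coroutines and my mind went blank on the spot. So I started digging through all sorts of material to fill the gap. I haven't used coroutines in a real project yet, but that still beats knowing nothing.
- After going through some of that material, I noticed that most of the code you find online looks like this: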
import asyncio

async def hello(x):
    print("Hello world!")
    # Await asyncio.sleep(1) asynchronously:
    r = await asyncio.sleep(1)  # simulate a blocking wait
    print(x, r)
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
# Note: passing bare coroutines to asyncio.wait() was deprecated in Python 3.8 and removed in 3.11.
tasks = [hello("first:"), hello("second:")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
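On current Python versions the same example is usually written with `asyncio.run()` (available since 3.7) and `asyncio.gather()`. The following is only a sketch of that style, not what the tutorials above use; the shortened sleep and the `main()` wrapper are my own additions.

```python
import asyncio


async def hello(label):
    print("Hello world!")
    # asyncio.sleep() returns None by default; it only simulates a wait here.
    result = await asyncio.sleep(0.1)
    print(label, result)
    print("Hello again!")
    return False


async def main():
    # gather() wraps each coroutine in a Task and returns results in argument order.
    return await asyncio.gather(hello("first:"), hello("second:"))


results = asyncio.run(main())
print(results)
```

`asyncio.run()` creates and closes the event loop itself, so the explicit `get_event_loop()` / `loop.close()` pair is no longer needed.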
- And then the code further on turns into something like this:
import asyncio

async def wget(host):
    print('wget %s...' % host)
    connect = asyncio.open_connection(host, 80)
    reader, writer = await connect
    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host
    writer.write(header.encode('utf-8'))
    await writer.drain()
    while True:
        line = await reader.readline()
        if line == b'\r\n':
            break
        print('%s header > %s' % (host, line.decode('utf-8').rstrip()))
    # Ignore the body, close the socket
    writer.close()

loop = asyncio.get_event_loop()
tasks = [wget(host) for host in ['www.sina.com.cn', 'www.sohu.com', 'www.163.com']]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
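- OK, I'm a beginner. All I knew at this point was that coroutines can do asynchronous network I/O. Then curiosity kicked in: if asynchronous network I/O works, there is no reason asynchronous file I/O shouldn't, so off I went searching again.
- After a day of effort I finally found two approaches that looked workable. A disclaimer first: I really have only just started learning coroutines, so if anything here is wrong, please point it out.
- Approach 1: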
import asyncio

async def hello(x):
    print("Hello world!")
    # Try to open the file with "async with":
    async with open("2.txt", "r") as f:
        data = await f.readlines()
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("first:"), hello("second:")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
- Running it gives:
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Hello world!
Hello world!
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=AttributeError('__aenter__')>
Traceback (most recent call last):
File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 28, in hello
async with open("2.txt", "r") as f:
AttributeError: __aenter__
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=AttributeError('__aenter__')>
Traceback (most recent call last):
File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 28, in hello
async with open("2.txt", "r") as f:
AttributeError: __aenter__
Process finished with exit code 0
- Right, the object returned by open() has no __aenter__ method. Surely it works if I just drop the with:
import asyncio

async def hello(x):
    print("Hello world!")
    # Same attempt as before, this time without "async with":
    # async with open("2.txt", "r") as f:
    #     data = await f.readlines()
    f = open("2.txt", "r")
    data = await f.readlines()
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("first:"), hello("second:")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
- The output:
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=TypeError("object list can't be used in 'await' expression")>
Traceback (most recent call last):
File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 31, in hello
data = await f.readlines()
TypeError: object list can't be used in 'await' expression
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=TypeError("object list can't be used in 'await' expression")>
Traceback (most recent call last):
File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 31, in hello
data = await f.readlines()
TypeError: object list can't be used in 'await' expression
Hello world!
Hello world!
Process finished with exit code 0
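Both tracebacks above come down to the same thing: the built-in file object is purely synchronous. It offers `__enter__`/`__exit__` but not the `__aenter__`/`__aexit__` pair that `async with` looks for, and `readlines()` returns a plain list rather than something awaitable. A small check, using a throwaway temporary file of my own instead of 2.txt, makes this visible:

```python
import inspect
import os
import tempfile

# Throwaway file so the sketch does not depend on 2.txt existing.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line 1\nline 2\n")
    path = tmp.name

with open(path, "r") as f:
    # "async with" needs __aenter__/__aexit__; the file object only has the sync pair.
    has_sync_cm = hasattr(f, "__enter__") and hasattr(f, "__exit__")
    has_async_cm = hasattr(f, "__aenter__") and hasattr(f, "__aexit__")
    lines = f.readlines()
    # "await" needs an awaitable (coroutine, Task, Future, ...); a list is none of these.
    lines_awaitable = inspect.isawaitable(lines)

os.remove(path)
print(has_sync_cm, has_async_cm, lines_awaitable)
```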
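- A list can't be used with the await keyword. Well... that's awkward. So I went digging yet again and found a third-party library, gevent, which might be able to do asynchronous file I/O: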
import gevent
from gevent import monkey

# monkey.patch_all()  # patches most blocking calls in the standard library to run cooperatively

def fetch(path, name):
    # f = open(path, "r")
    # data = f.read()
    print(path)
    with open(path, "r") as f:
        data = f.read()
    # print(name)
    print(path)
    return data

if __name__ == "__main__":
    g_list = list()
    for url in ["1.txt", "2.txt", "3.txt"]:
        g = gevent.spawn(fetch, url, url)
        g_list.append(g)
    gevent.joinall(g_list)
    for g in g_list:
        print(g.value)
- This code produces the following output (each file holds a single line; 我是文件1 means "I am file 1"):
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/05.py
1.txt
1.txt
2.txt
2.txt
3.txt
3.txt
我是文件1
我是文件2
我是文件3
Process finished with exit code 0
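- To be honest, when I first saw this output I thought gevent had already given me asynchronous file I/O, because both print(path) calls finished before the contents were printed. I had fooled myself: the contents appear last only because fetch returns the data first and it is printed afterwards. If I put print(data) between the two print(path) calls, the output would look something like this: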
1.txt
我是文件1
1.txt
2.txt
我是文件2
2.txt
3.txt
我是文件3
3.txt
我是文件1
我是文件2
我是文件3
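In the gevent run above nothing inside `fetch` hands control to another greenlet (the monkey patch is commented out, and a plain `open()`/`read()` is an ordinary blocking call), so each greenlet runs from start to finish before the next one begins. The same rule holds for asyncio coroutines: switching only happens at a yield point. The sketch below uses asyncio only, not gevent; `worker`, `run` and the `yield_control` flag are names I made up for the illustration.

```python
import asyncio


async def worker(name, log, yield_control):
    log.append(f"{name} start")
    if yield_control:
        # sleep(0) is a bare yield point: it lets other ready tasks run.
        await asyncio.sleep(0)
    log.append(f"{name} end")


async def run(yield_control):
    log = []
    await asyncio.gather(*(worker(n, log, yield_control) for n in ("a", "b")))
    return log


no_yield = asyncio.run(run(False))
with_yield = asyncio.run(run(True))
print(no_yield)    # each worker finishes before the next starts
print(with_yield)  # the two workers interleave at the yield point
```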
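- So this was my misunderstanding. When I reviewed this part something kept feeling off, and only after thinking it through did I realize I had tricked myself. Later I found another way to do it, using the third-party library aiofiles.
- Without gevent, doing the plain blocking reads directly inside coroutines looks like this: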
import asyncio

async def hello(x):
    print("Hello world!")
    print(x)
    # Ordinary blocking read; there is no await in this coroutine.
    with open(x, "r") as f:
        data = f.readlines()
    # r = await hello_wait(x)
    print(data)
    print(x)
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("1.txt"), hello("2.txt"), hello("3.txt")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
- The result:
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Hello world!
3.txt
['我是文件3']
3.txt
Hello again!
Hello world!
2.txt
['我是文件2']
2.txt
Hello again!
Hello world!
1.txt
['我是文件1']
1.txt
Hello again!
Process finished with exit code 0
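In the output above every coroutine runs to completion before the next one starts, because a plain `open()`/`readlines()` never yields to the event loop. One standard-library way around this, without any third-party package, is to push the blocking read onto a worker thread with `loop.run_in_executor()`. This is a sketch under my own assumptions: `read_lines` and `main` are hypothetical helper names, and temporary files stand in for 1.txt, 2.txt and 3.txt.

```python
import asyncio
import os
import tempfile


def read_lines(path):
    # Ordinary blocking read; it runs in a worker thread, not on the event loop.
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()


async def hello(path):
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, read_lines, path)


async def main(paths):
    # gather() returns the results in the same order as the arguments.
    return await asyncio.gather(*(hello(p) for p in paths))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Throwaway files standing in for 1.txt, 2.txt and 3.txt.
    paths = []
    for i in (1, 2, 3):
        p = os.path.join(tmp_dir, f"{i}.txt")
        with open(p, "w", encoding="utf-8") as f:
            f.write(f"file {i}\n")
        paths.append(p)
    results = asyncio.run(main(paths))

print(results)
```

The read itself is still a blocking call; it just no longer blocks the event loop, which stays free to run other coroutines while the thread waits on the disk.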
- The module that does implement asynchronous file I/O is aiofiles. The code:
import asyncio
import aiofiles

async def hello_wait(x):
    n = 0
    print(n)
    # Append 0, 1, 2, 3 to 2.txt, reopening the file for each write.
    while True:
        async with aiofiles.open("2.txt", "a") as f:
            await f.write(f"{n}\n")
        if n > 2:
            break
        n += 1
    # Read back everything written so far (by all three tasks).
    async with aiofiles.open("2.txt", "r") as f:
        data = await f.readlines()
    print(x, data)
    return False

async def hello(x):
    print("Hello world!")
    print(x)
    r = await hello_wait(x)
    print(r)
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("1.txt"), hello("2.txt"), hello("3.txt")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
- Output:
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Hello world!
2.txt
0
Hello world!
3.txt
0
Hello world!
1.txt
0
3.txt ['0\n', '0\n', '0\n', '1\n', '1\n', '1\n', '2\n', '2\n', '3\n', '3\n', '3\n']
False
Hello again!
1.txt ['0\n', '0\n', '0\n', '1\n', '1\n', '1\n', '2\n', '2\n', '3\n', '3\n', '3\n']
False
Hello again!
2.txt ['0\n', '0\n', '0\n', '1\n', '1\n', '1\n', '2\n', '2\n', '3\n', '3\n', '3\n']
False
Hello again!
Process finished with exit code 0
- This module provides asynchronous I/O for local files and supports Python 3.6 and later; see the project's README on GitHub for details.
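As far as I understand its README, aiofiles gets this behaviour by delegating the blocking file operations to a thread pool. To show how the pieces from earlier fit together, here is a toy wrapper that adds the missing `__aenter__`/`__aexit__` methods and awaitable `write()`/`readlines()` on top of `run_in_executor()`. The `AsyncFile` class is hypothetical and written only for this illustration; it is not aiofiles' actual implementation and covers far less.

```python
import asyncio
import os
import tempfile


class AsyncFile:
    """Hypothetical wrapper: runs blocking file calls in the default thread pool."""

    def __init__(self, path, mode="r"):
        self._path = path
        self._mode = mode
        self._file = None

    async def _call(self, func, *args):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, func, *args)

    async def __aenter__(self):
        # The method the built-in file object lacks, which is why
        # "async with open(...)" raised AttributeError earlier.
        self._file = await self._call(open, self._path, self._mode)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self._call(self._file.close)

    async def write(self, text):
        return await self._call(self._file.write, text)

    async def readlines(self):
        return await self._call(self._file.readlines)


async def demo(path):
    async with AsyncFile(path, "a") as f:
        for n in range(3):
            await f.write(f"{n}\n")
    async with AsyncFile(path, "r") as f:
        return await f.readlines()


with tempfile.TemporaryDirectory() as tmp_dir:
    data = asyncio.run(demo(os.path.join(tmp_dir, "demo.txt")))

print(data)
```

For real code the library itself is the better choice; the wrapper only shows why `async with` and `await` start working once those methods exist.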
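Summary
- File I/O really can be done asynchronously with async + await; it just needs a third-party library, aiofiles.
- I spent that much time on this largely because I got fixated on the details, but it still paid off: at least I now know aiofiles can do it.
- The documents I consulted: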
- https://www.liaoxuefeng.com/wiki/1016959663602400/1017970488768640
- https://www.liujiangblog.com/course/python/83
- https://docs.python.org/zh-cn/3/library/asyncio.html
- https://www.jianshu.com/p/bb6c7f9aa1ae
- https://www.pythonheidong.com/blog/article/791134/d83c8d68767182c3549c/
- https://www.cnblogs.com/xingzheai/p/14964076.html
- https://www.iplaypy.com/wenda/wd14247.html#google_vignette
- https://www.cnblogs.com/rim99/p/6160207.html
- https://www.cnblogs.com/zhangyux/p/6195860.html
- Although I never got asynchronous file I/O working with gevent, it is still a very powerful module; see these links:
- Link to the aiofiles source code: