Implementing file I/O with Python coroutines

Preface

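  • A while ago someone suddenly asked me about Python coroutines, and my mind went blank on the spot. So I started digging through all kinds of material to fill the gap. I haven't used them in a real project yet, but that beats knowing nothing.
  • After going through some of that material, I noticed that most of the code online looks like this: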
import asyncio

async def hello(x):
    print("Hello world!")
    # Asynchronously call asyncio.sleep(1):
    r = await asyncio.sleep(1)  # simulates a blocking operation
    print(x, r)
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("first:"), hello("second:")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
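The `get_event_loop()` / `run_until_complete(asyncio.wait(...))` boilerplate is the older style. `asyncio.run()` (Python 3.7+) is the documented entry point, and since Python 3.11 `asyncio.wait()` refuses bare coroutine objects; they have to be wrapped in tasks first. A minimal sketch of the same `hello` example in that style, assuming Python 3.7+, with `asyncio.gather()` wrapping both coroutines in tasks and running them concurrently:

```python
import asyncio
import time


async def hello(x):
    print("Hello world!")
    # asyncio.sleep() suspends this coroutine, so the other one can run meanwhile
    r = await asyncio.sleep(1)  # stands in for a blocking operation; returns None
    print(x, r)
    print("Hello again!")
    return False


async def main():
    # gather() wraps each coroutine in a Task and waits for all of them
    return await asyncio.gather(hello("first:"), hello("second:"))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(results, f"{time.perf_counter() - start:.1f}s")
```

Because the two one-second sleeps overlap, the whole run should take about one second rather than two.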
  • Then the later examples turn into something like this:
import asyncio

async def wget(host):
    print('wget %s...' % host)
    connect = asyncio.open_connection(host, 80)
    reader, writer = await connect
    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host
    writer.write(header.encode('utf-8'))
    await writer.drain()
    while True:
        line = await reader.readline()
        if line in (b'\r\n', b''):  # blank line ends the headers; b'' means the server closed early
            break
        print('%s header > %s' % (host, line.decode('utf-8').rstrip()))
    # Ignore the body, close the socket
    writer.close()
    await writer.wait_closed()

loop = asyncio.get_event_loop()
tasks = [wget(host) for host in ['www.sina.com.cn', 'www.sohu.com', 'www.163.com']]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
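The `wget` example needs internet access. To try the same `open_connection()`, `writer.drain()` and `reader.readline()` calls offline, here is a small sketch that starts a throwaway local server with `asyncio.start_server()` on 127.0.0.1 (port picked by the OS) and fetches a fake HTTP response from it three times concurrently. The `handle` function and its canned response are hypothetical, made up for illustration.

```python
import asyncio


async def handle(reader, writer):
    # Hypothetical minimal "HTTP server": read the request headers, send a fixed reply
    while (await reader.readline()) not in (b"\r\n", b""):
        pass
    writer.write(b"HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def wget(host, port):
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("utf-8"))
    await writer.drain()
    headers = []
    while True:
        line = await reader.readline()
        if line in (b"\r\n", b""):  # blank line ends the headers; b"" means EOF
            break
        headers.append(line.decode("utf-8").rstrip())
    writer.close()
    await writer.wait_closed()
    return headers


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # three concurrent requests, like the three hosts in the original
        return await asyncio.gather(*(wget("127.0.0.1", port) for _ in range(3)))


if __name__ == "__main__":
    for headers in asyncio.run(main()):
        print(headers)
```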
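  • Well, I'm still a beginner. All I knew at this point was that coroutines can do asynchronous network I/O. That made me curious: if network I/O can be made asynchronous, there is no reason file I/O can't be. So I started searching again.
  • After a full day of effort I finally found two approaches that seemed workable. A word up front: I have really only just started learning coroutines, so if anything here is wrong, please point it out.
  • Approach 1: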
import asyncio

async def hello(x):
    print("Hello world!")
    # Try reading the file with async with, the same way asyncio.sleep() was awaited:
    async with open("2.txt", "r") as f:
        data = await f.readlines()
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("first:"), hello("second:")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
  • The output:
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Hello world!
Hello world!
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=AttributeError('__aenter__')>
Traceback (most recent call last):
  File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 28, in hello
    async with open("2.txt", "r") as f:
AttributeError: __aenter__
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=AttributeError('__aenter__')>
Traceback (most recent call last):
  File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 28, in hello
    async with open("2.txt", "r") as f:
AttributeError: __aenter__

Process finished with exit code 0
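`async with` only accepts asynchronous context managers, i.e. objects that define `__aenter__` and `__aexit__` as coroutines. The file object returned by the built-in `open()` only implements the synchronous `__enter__`/`__exit__` pair, hence `AttributeError: __aenter__` (Python 3.11+ reports a `TypeError` here instead). The sketch below checks this and defines a hypothetical `AsyncOpen` wrapper, not part of any library, that runs the blocking `open()`/`close()` in a worker thread with `asyncio.to_thread()` (assumes Python 3.9+).

```python
import asyncio
import os
import tempfile


class AsyncOpen:
    """Hypothetical async context manager: opens/closes a file in a worker thread."""

    def __init__(self, path, mode="r", **kwargs):
        self._args = (path, mode)
        self._kwargs = kwargs
        self._file = None

    async def __aenter__(self):
        # open() blocks, so run it in a thread instead of on the event loop
        self._file = await asyncio.to_thread(open, *self._args, **self._kwargs)
        return self._file

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.to_thread(self._file.close)
        return False  # do not suppress exceptions


async def main(path):
    async with AsyncOpen(path, "r", encoding="utf-8") as f:
        # f is still an ordinary file object, so its read also goes to a thread
        return await asyncio.to_thread(f.readlines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "2.txt")
        with open(path, "w", encoding="utf-8") as f:
            # a regular file object has the sync protocol but not the async one
            print(hasattr(f, "__enter__"), hasattr(f, "__aenter__"))  # True False
            f.write("line 1\nline 2\n")
        print(asyncio.run(main(path)))  # ['line 1\n', 'line 2\n']
```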
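  • OK, the file object returned by open() has no __aenter__ method, so it can't be used with async with. Surely it works if I just drop the with?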
import asyncio

async def hello(x):
    print("Hello world!")
    # No async with this time: open the file normally and await the read
    f = open("2.txt", "r")
    data = await f.readlines()
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("first:"), hello("second:")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
  • The output:
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=TypeError("object list can't be used in 'await' expression")>
Traceback (most recent call last):
  File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 31, in hello
    data = await f.readlines()
TypeError: object list can't be used in 'await' expression
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<hello() done, defined at D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py:25> exception=TypeError("object list can't be used in 'await' expression")>
Traceback (most recent call last):
  File "D:\Users\User\Desktop\python_code_test\my_code\asynchronous\02.py", line 31, in hello
    data = await f.readlines()
TypeError: object list can't be used in 'await' expression
Hello world!
Hello world!

Process finished with exit code 0
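`await` accepts only awaitable objects: coroutines, Tasks, Futures, or objects with an `__await__` method. `f.readlines()` is an ordinary blocking call that has already finished by the time `await` sees its result, a plain `list`. One standard-library way to await a blocking call is to hand it to a worker thread with `asyncio.to_thread()` (Python 3.9+). A sketch, assuming a temporary file in place of 2.txt and a `hello(path)` adapted for illustration:

```python
import asyncio
import inspect
import os
import tempfile


async def hello(path):
    print("Hello world!")
    with open(path, "r", encoding="utf-8") as f:
        # f.readlines() alone returns a list, which is not awaitable;
        # to_thread() runs it in a worker thread and gives back a coroutine to await
        data = await asyncio.to_thread(f.readlines)
    print("Hello again!")
    return data


async def main(path):
    return await asyncio.gather(hello(path), hello(path))


if __name__ == "__main__":
    coro = asyncio.sleep(0)
    print(inspect.isawaitable([]), inspect.isawaitable(coro))  # False True
    coro.close()  # discard the unused coroutine without a "never awaited" warning
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "2.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("line 1\nline 2\n")
        print(asyncio.run(main(path)))
```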
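  • A list can't be awaited. Well, that's awkward. f.readlines() is an ordinary blocking call: it reads the whole file immediately and returns a list, so there is nothing left to await. So I went back to searching yet again and found a third-party library, gevent, which I hoped could do asynchronous file I/O: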
import gevent
from gevent import monkey
# monkey.patch_all()      # patches most blocking calls in the standard library so they run cooperatively


def fetch(path):
    print(path)
    with open(path, "r") as f:
        data = f.read()
    print(path)
    return data


if __name__ == "__main__":
    g_list = []
    for path in ["1.txt", "2.txt", "3.txt"]:
        g = gevent.spawn(fetch, path)
        g_list.append(g)
    gevent.joinall(g_list)
    for g in g_list:
        print(g.value)
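  • The output of this code is below (each of 1.txt, 2.txt and 3.txt contains a single line, 我是文件N, i.e. "I am file N"):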
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/05.py
1.txt
1.txt
2.txt
2.txt
3.txt
3.txt
我是文件1
我是文件2
我是文件3

Process finished with exit code 0
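  • Honestly, when I first saw the output above I thought gevent already gave me asynchronous file I/O, because the file contents only appeared after all the print(path) calls. I tricked myself there: the contents appear at the end simply because fetch() returns the data and it is printed only after gevent.joinall(). If print(data) is placed between the two print(path) calls, the output should look something like this: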
1.txt
我是文件1
1.txt
2.txt
我是文件2
2.txt
3.txt
我是文件3
3.txt
我是文件1
我是文件2
我是文件3
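  • So I had misread it: each greenlet ran from start to finish without ever switching, which is just sequential execution. When I went back over this part it kept feeling wrong, and after thinking it through I realized I had tricked myself. Later I found another way to do it, the third-party library aiofiles.
  • For comparison, here is the same thing without gevent, using the plain built-in open() inside coroutines: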
import asyncio

async def hello(x):
    print("Hello world!")
    print(x)
    # Ordinary blocking read with the built-in open(); nothing here is awaited
    with open(x, "r") as f:
        data = f.readlines()
    print(data)
    print(x)
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("1.txt"), hello("2.txt"), hello("3.txt")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
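  • The output is below. Because hello() contains no await at all, each coroutine runs from start to finish before the next one begins. The 3-2-1 order comes from asyncio.wait(), which collects the coroutines into a set, so the order in which they start is not guaranteed: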
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Hello world!
3.txt
['我是文件3']
3.txt
Hello again!
Hello world!
2.txt
['我是文件2']
2.txt
Hello again!
Hello world!
1.txt
['我是文件1']
1.txt
Hello again!

Process finished with exit code 0
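This is the same behavior the gevent run showed: a coroutine (or greenlet) is only switched out at a point where it actually suspends. `hello()` above never awaits anything, so each call runs to completion. A small sketch that records the order of steps with and without a yield point; `worker`, `run` and the use of `await asyncio.sleep(0)` purely as a yield are illustrative additions, not part of the original code:

```python
import asyncio


async def worker(name, log, yield_between):
    log.append(f"{name} start")
    if yield_between:
        # sleep(0) suspends for one event-loop iteration so the other tasks can run
        await asyncio.sleep(0)
    log.append(f"{name} end")


async def run(yield_between):
    log = []
    names = ("1.txt", "2.txt", "3.txt")
    await asyncio.gather(*(worker(n, log, yield_between) for n in names))
    return log


if __name__ == "__main__":
    print(asyncio.run(run(False)))  # each worker finishes before the next starts
    print(asyncio.run(run(True)))   # all three start, then all three finish
```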
  • The module that does provide asynchronous file I/O is aiofiles (install it with pip install aiofiles). The code:
import asyncio
import aiofiles

async def hello_wait(x):
    n = 0
    print(n)
    while True:
        # all three tasks append to the same 2.txt, so their writes interleave
        async with aiofiles.open("2.txt", "a") as f:
            await f.write(f"{n}\n")
        if n > 2:
            break
        n += 1
    async with aiofiles.open("2.txt", "r") as f:
        data = await f.readlines()
    print(x,data)
    return False

async def hello(x):
    print("Hello world!")
    print(x)
    r = await hello_wait(x)
    print(r)
    print("Hello again!")
    return False

# Get the event loop:
loop = asyncio.get_event_loop()
# Run the coroutines
tasks = [hello("1.txt"), hello("2.txt"), hello("3.txt")]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
  • Output:
D:\Users\User\Desktop\python_code_test\my_code\venv\Scripts\python.exe D:/Users/User/Desktop/python_code_test/my_code/asynchronous/02.py
Hello world!
2.txt
0
Hello world!
3.txt
0
Hello world!
1.txt
0
3.txt ['0\n', '0\n', '0\n', '1\n', '1\n', '1\n', '2\n', '2\n', '3\n', '3\n', '3\n']
False
Hello again!
1.txt ['0\n', '0\n', '0\n', '1\n', '1\n', '1\n', '2\n', '2\n', '3\n', '3\n', '3\n']
False
Hello again!
2.txt ['0\n', '0\n', '0\n', '1\n', '1\n', '1\n', '2\n', '2\n', '3\n', '3\n', '3\n']
False
Hello again!

Process finished with exit code 0
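  • This module provides asynchronous I/O for local files. At the time of writing it supported Python 3.6 and above; newer releases require a newer Python, so check the README on its GitHub page for details. Because all three tasks append to the same 2.txt, their writes interleave (0, 0, 0, 1, 1, 1, ...), which shows the coroutines really do switch while waiting on file I/O.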
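According to its README, aiofiles does not use OS-level asynchronous file APIs; it delegates the blocking file operations to a separate thread pool. Roughly the same effect can be had with only the standard library via `asyncio.to_thread()` (Python 3.9+). The sketch below is a stdlib-only approximation, not aiofiles itself: each of three tasks writes and reads back its own file (hypothetical names in a temporary directory, to avoid the shared 2.txt above), and `write_numbers` is a helper made up for illustration.

```python
import asyncio
import tempfile
from pathlib import Path


def write_numbers(path, count):
    # ordinary blocking file I/O; runs inside a worker thread
    with open(path, "a", encoding="utf-8") as f:
        for n in range(count):
            f.write(f"{n}\n")


async def hello(path):
    print("Hello world!", path.name)
    await asyncio.to_thread(write_numbers, path, 4)
    text = await asyncio.to_thread(path.read_text, encoding="utf-8")
    print(path.name, text.splitlines())
    return text.splitlines()


async def main(directory):
    paths = [Path(directory) / name for name in ("1.txt", "2.txt", "3.txt")]
    return await asyncio.gather(*(hello(p) for p in paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(asyncio.run(main(d)))
```

The thread-pool approach keeps the event loop free while a file operation runs, which is the same trade-off aiofiles makes.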

Summary

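  1. async/await can do asynchronous file I/O, but not with the built-in open(): you need the third-party library aiofiles.

  2. I spent a lot of time on this, largely because I went down a rabbit hole, but I still learned something: at least I now know aiofiles can do it.

  3. References I used:

  4. Although I never got asynchronous file I/O working with gevent, it is still a very powerful module; see these links:

  5. aiofiles source code: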

posted @ 2022-01-18 17:07  影梦无痕