1-Python - aiohttp
About
`asyncio` needs no introduction: whoever uses it ends up confused!
Today let's look at how to use its offspring, `aiohttp`.
download
pip install aiohttp
Multiple tasks without return values
import time
import asyncio
import aiohttp

urls = [
    'https://www.baidu.com',
    'https://edgeapi.rubyonrails.org/',
    'https://www.cnblogs.com',
    'https://www.bing.com',
    'https://www.zhihu.com/',
]


async def get(url):  # a coroutine function starts with `async def`
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as result:
            print(result.status, result.url)


t1 = time.time()
loop = asyncio.new_event_loop()  # create an event loop
# asyncio.wait() rejects bare coroutines on Python 3.11+, so wrap each one in a Task
tasks = [loop.create_task(get(i)) for i in urls]  # build the task list
loop.run_until_complete(asyncio.wait(tasks))  # run every task to completion
print('running time: ', time.time() - t1)
In `async with aiohttp.ClientSession() as session:`, the `async with aiohttp.ClientSession() as` part is fixed boilerplate, while the name after `as` (here `session`) is yours to choose.
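To make the `async with ... as name` pattern less magical, here is a minimal sketch of what happens around the block. `FakeSession` is a hypothetical stand-in written for illustration, not part of aiohttp; it only records when it is entered and exited, and it shows that the name after `as` is arbitrary.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a session object; it only records events."""

    def __init__(self):
        self.events = []

    async def __aenter__(self):
        # Runs when the `async with` block is entered.
        self.events.append("open")
        return self  # this object is bound to the name after `as`

    async def __aexit__(self, exc_type, exc, tb):
        # Runs when the block is left, even if an exception was raised.
        self.events.append("close")
        return False  # do not swallow exceptions


async def demo():
    async with FakeSession() as s:  # any name works, not only `session`
        s.events.append("use")
        events = s.events
    return events


print(asyncio.run(demo()))
```

A real `aiohttp.ClientSession` presumably does much more on exit (such as releasing connections), but the enter/exit shape is the same.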
We can print the results now, but how do we get hold of the return values?
Multiple tasks with return values
import time
import asyncio
import aiohttp
from fake_useragent import UserAgent  # pip install fake_useragent

urls = [
    'https://www.baidu.com',
    'https://edgeapi.rubyonrails.org/',
    'https://www.cnblogs.com',
    'https://www.bing.com',
    'https://www.zhihu.com/',
]


async def get(url):
    async with aiohttp.ClientSession() as session:
        headers = {'User-Agent': UserAgent().random}  # random User-Agent per request
        async with session.request(method='get', url=url, headers=headers) as result:
            return result.status, result.url


t1 = time.time()
loop = asyncio.new_event_loop()
# To read return values, keep the Task objects created by loop.create_task(get(i))
tasks = [loop.create_task(get(i)) for i in urls]
loop.run_until_complete(asyncio.wait(tasks))
for i in tasks:
    print(i.result())  # iterate over tasks and read each result
loop.close()
print('running time: ', time.time() - t1)
The example above shows how to send request headers.
The call `session.request(method='get', url=url, headers=headers)` should look familiar: `aiohttp` is used in much the same way as the `requests` module.
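As an aside, `asyncio.gather` is another way to collect return values: it returns the results as a list in the same order as the awaitables passed in. The sketch below is only an illustration under stated assumptions: `fake_get`, the `example.com` URLs, and the delays are hypothetical stand-ins for the aiohttp request, so that it runs without network access.

```python
import asyncio
import time


async def fake_get(url, delay):
    """Hypothetical stand-in for an HTTP request: waits, then returns a fake status."""
    await asyncio.sleep(delay)
    return 200, url


async def main():
    # (url, simulated latency in seconds); values are made up for the demo
    jobs = [
        ("https://example.com/a", 0.3),
        ("https://example.com/b", 0.1),
        ("https://example.com/c", 0.2),
    ]
    # gather runs the coroutines concurrently and keeps the input order
    return await asyncio.gather(*(fake_get(u, d) for u, d in jobs))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

for status, url in results:
    print(status, url)
print('running time: ', elapsed)
```

Because the waits overlap, the total running time should be close to the slowest job rather than the sum of all three.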
Next, a more encapsulated version:
import time
import asyncio
import aiohttp
from fake_useragent import UserAgent  # pip install fake_useragent

urls = [
    'https://www.baidu.com', 'https://edgeapi.rubyonrails.org/', 'https://www.cnblogs.com',
    'https://www.bing.com', 'https://www.zhihu.com/',
]


async def get(url):
    async with aiohttp.ClientSession() as session:
        headers = {'User-Agent': UserAgent().random}
        async with session.request(method='get', url=url, headers=headers) as result:
            return result.status, result.url


async def main():
    task_l = [get(i) for i in urls]
    # as_completed yields awaitables in the order the requests finish
    for ret in asyncio.as_completed(task_l):
        res = await ret
        print(res)


t1 = time.time()
loop = asyncio.new_event_loop()
loop.run_until_complete(main())
loop.close()
print('running time: ', time.time() - t1)
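One thing worth noting about `asyncio.as_completed` is that results arrive in completion order, not in the order of the input list. The following sketch illustrates this; `fake_get` and its delays are hypothetical stand-ins for real requests, chosen so that the example runs offline.

```python
import asyncio


async def fake_get(name, delay):
    """Hypothetical stand-in for a request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def main():
    coros = [
        fake_get("slow", 0.3),
        fake_get("fast", 0.1),
        fake_get("medium", 0.2),
    ]
    order = []
    # Each awaited item resolves to the next result that finishes
    for fut in asyncio.as_completed(coros):
        order.append(await fut)
    return order


order = asyncio.run(main())
print(order)
```

If the original input order matters, the URL can be returned alongside the result, as `get` does above with `result.url`.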
Corrections welcome; to be continued... see also:[Python aiohttp异步爬虫(萌新读物,大神勿扰)](