8-3-2 Python Syntax Basics - Concurrent Programming - Coroutines - Using aiohttp

####

Installation and usage

Install
pip install aiohttp
A simple example
aiohttp describes itself as both an HTTP client and an HTTP server, so let's look at a minimal example of each.

Client:
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, "http://httpbin.org/headers")
        print(html)

asyncio.run(main())


"""Output:
{
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "Python/3.7 aiohttp/3.6.2"
  }
}
"""
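The code is short: fetch() sends the GET request and returns the response body as text, while main() creates the ClientSession and prints the result.

Server: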
from aiohttp import web


async def handle(request):
    name = request.match_info.get('name', "Anonymous")
    text = "Hello, " + name
    return web.Response(text=text)

app = web.Application()
app.add_routes([web.get('/', handle),
                web.get('/{name}', handle)])

if __name__ == '__main__':
    web.run_app(app)
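Run this code and visit http://127.0.0.1:8080 to see your site. It is a very basic page; append your name to the URL (for example http://127.0.0.1:8080/Andy) to be greeted by name.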
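To try the client and the server together without opening a browser, the two examples can be combined in one script. This is a minimal sketch, not part of the original article: it assumes the TestServer helper from aiohttp.test_utils is acceptable for a demo (it binds the app to a free local port), and the names make_app and greet are made up for illustration.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import TestServer


async def handle(request):
    # Same handler as above: greet the name taken from the URL path.
    name = request.match_info.get("name", "Anonymous")
    return web.Response(text="Hello, " + name)


def make_app():
    # Hypothetical helper: build a fresh application for each run.
    app = web.Application()
    app.add_routes([web.get("/", handle), web.get("/{name}", handle)])
    return app


async def greet(path):
    # TestServer serves the app on 127.0.0.1 for the duration of the block,
    # so no external network access is needed.
    async with TestServer(make_app()) as server:
        async with aiohttp.ClientSession() as session:
            async with session.get(server.make_url(path)) as resp:
                return await resp.text()


if __name__ == "__main__":
    print(asyncio.run(greet("/")))
    print(asyncio.run(greet("/Andy")))
```

For a real deployment web.run_app(app) from the example above remains the simpler entry point; the test server is only a convenience for experimenting.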

Those are two simple examples of using aiohttp. Next, a quick start.

####

A simple introductory example
We start with the client, that is, how to send HTTP requests. Here is the code; the points to watch are explained afterwards:

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get') as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

####

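Code walkthrough

aiohttp manages sessions with ClientSession. A session holds a connection pool together with shared settings such as cookies and headers, and is meant to be reused across many requests. So the first thing to look at is the ClientSession class:

In the source code, the docstring of this class calls it the "first-class interface for making HTTP requests". The code above instantiates ClientSession, binds the instance to the name session, and then uses session to send requests.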

async with aiohttp.ClientSession() as session:
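Because a session owns a connection pool, it usually pays to create it once and pass it to every request rather than opening a new one each time. The sketch below illustrates that pattern under some assumptions: it talks to a throwaway local server started with aiohttp.test_utils.TestServer instead of a real site, and echo_path, fetch and fetch_many are hypothetical names.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import TestServer


async def echo_path(request):
    # Hypothetical demo handler: reply with the path that was requested.
    return web.Response(text=request.path)


async def fetch(session, url):
    # The session is passed in, so every call shares one connection pool.
    async with session.get(url) as resp:
        return await resp.text()


async def fetch_many(paths):
    app = web.Application()
    app.add_routes([web.get("/{tail:.*}", echo_path)])
    async with TestServer(app) as server:
        # One session for all requests, created once and closed once.
        async with aiohttp.ClientSession() as session:
            tasks = [fetch(session, server.make_url(p)) for p in paths]
            # gather() returns results in the same order as the tasks.
            return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(fetch_many(["/a", "/b", "/c"])))
```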

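Note that when requesting an https URL you may run into certificate problems. One workaround is to turn certificate verification off. This is insecure, so use it only for testing; the older verify_ssl=False spelling is deprecated in favor of ssl=False:
async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=64, ssl=False)) as session: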
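When the certificate problem comes from a private or self-signed CA, a gentler option than switching verification off is to hand the connector an ssl.SSLContext that trusts that CA. The following is only a sketch: make_session, its verify and cafile parameters, and demo are hypothetical names introduced here, and cafile would point at your own CA bundle.

```python
import asyncio
import ssl

import aiohttp


async def make_session(verify=True, cafile=None):
    # verify=True keeps certificate checks on; cafile (hypothetical path to a
    # CA bundle) is optional and falls back to the system trust store.
    # verify=False mirrors the ssl=False shortcut and should be a last resort.
    ssl_option = ssl.create_default_context(cafile=cafile) if verify else False
    connector = aiohttp.TCPConnector(limit=64, ssl=ssl_option)
    return aiohttp.ClientSession(connector=connector)


async def demo():
    session = await make_session(verify=True)
    try:
        # The connector keeps the connection limit that was configured above.
        return session.connector.limit
    finally:
        await session.close()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```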

Now for get:
async with session.get('http://httpbin.org/get') as resp:

Sometimes a request needs query parameters appended to the URL; this is supported as well.

params = {'key1': 'value1', 'key2': 'value2'}
async with session.get('http://httpbin.org/get',
                       params=params) as resp:
    expect = 'http://httpbin.org/get?key1=value1&key2=value2'
    assert str(resp.url) == expect
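The way params end up in the URL can be checked without any network access. aiohttp represents URLs with the yarl library (installed as one of its dependencies), so the sketch below uses yarl directly; it approximates what session.get(..., params=...) produces rather than reproducing aiohttp's internals, and build_url is a hypothetical helper.

```python
from yarl import URL

BASE = URL("http://httpbin.org/get")


def build_url(params):
    # with_query() encodes a mapping or a sequence of pairs as a query string.
    return str(BASE.with_query(params))


# A dict keeps insertion order, so key1 comes before key2.
print(build_url({"key1": "value1", "key2": "value2"}))

# A list of pairs allows the same key to appear more than once.
print(build_url([("key", "value1"), ("key", "value2")]))
```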


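Reading the response
Reading the server's response status and response body is essential.
Usage:
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get') as resp:
            print(resp.status)
            # encoding must be a string; omit it to let aiohttp detect the charset
            print(await resp.text(encoding="utf-8"))

asyncio.run(main())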
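Besides the status and the text body, a response also exposes its content type and can decode JSON for you. Here is a small sketch against a local test server (an assumption made so that the example runs offline); the info handler, its payload and read_response are made up for illustration.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import TestServer


async def info(request):
    # Hypothetical endpoint returning a small JSON document.
    return web.json_response({"library": "aiohttp", "ok": True})


async def read_response():
    app = web.Application()
    app.add_routes([web.get("/info", info)])
    async with TestServer(app) as server:
        async with aiohttp.ClientSession() as session:
            async with session.get(server.make_url("/info")) as resp:
                status = resp.status              # numeric HTTP status
                content_type = resp.content_type  # MIME type without charset
                data = await resp.json()          # body decoded from JSON
                return status, content_type, data


if __name__ == "__main__":
    print(asyncio.run(read_response()))
```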


Non-text content
Sometimes a request fetches an image; a binary response like that can be read as well:
await resp.read()
Just replace the text() method with read().
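For large files, read() loads the whole body into memory at once; the response's content stream can instead be consumed piece by piece. The sketch below shows both approaches against a local test server. The payload is stand-in data rather than a real image, and the image handler and download function are hypothetical names.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import TestServer

PAYLOAD = bytes(range(256)) * 16  # 4096 bytes standing in for an image


async def image(request):
    # Hypothetical endpoint serving raw bytes.
    return web.Response(body=PAYLOAD, content_type="application/octet-stream")


async def download():
    app = web.Application()
    app.add_routes([web.get("/image", image)])
    async with TestServer(app) as server:
        async with aiohttp.ClientSession() as session:
            # Option 1: read the whole body in one call.
            async with session.get(server.make_url("/image")) as resp:
                whole = await resp.read()
            # Option 2: stream the body in chunks of at most 1024 bytes.
            chunked = bytearray()
            async with session.get(server.make_url("/image")) as resp:
                async for chunk in resp.content.iter_chunked(1024):
                    chunked.extend(chunk)
    return whole, bytes(chunked)


if __name__ == "__main__":
    whole, chunked = asyncio.run(download())
    print(len(whole), len(chunked))
```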

#####

Customizing requests

Custom headers
Sometimes a request needs custom headers, mainly so that the server treats us as a browser. In that case we define the headers ourselves:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko)"
                  " Chrome/78.0.3904.108 Safari/537.36"
}
await session.post(url, headers=headers)
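To confirm that custom headers really reach the server, one can point the client at a local endpoint that echoes the User-Agent back. This is a sketch for experimenting only: the echo_user_agent handler, the send_with_headers function and the "my-crawler/0.1" value are invented for the demo, and the headers are set on the session so that every request inherits them.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import TestServer


async def echo_user_agent(request):
    # Hypothetical endpoint: return the User-Agent header the client sent.
    return web.Response(text=request.headers.get("User-Agent", ""))


async def send_with_headers(headers):
    app = web.Application()
    app.add_routes([web.get("/ua", echo_user_agent)])
    async with TestServer(app) as server:
        # Headers given to the session become defaults for all its requests.
        async with aiohttp.ClientSession(headers=headers) as session:
            async with session.get(server.make_url("/ua")) as resp:
                return await resp.text()


if __name__ == "__main__":
    print(asyncio.run(send_with_headers({"User-Agent": "my-crawler/0.1"})))
```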


Custom cookies
To send your own cookies to the server, pass the cookies parameter to the ClientSession object:
url = 'http://httpbin.org/cookies'
cookies = {'cookies_are': 'working'}
async with aiohttp.ClientSession(cookies=cookies) as session:
    async with session.get(url) as resp:
        assert await resp.json() == {
           "cookies": {"cookies_are": "working"}}

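The cookie example above can also be tried offline against a local endpoint that reports the cookies it received. A sketch under stated assumptions: show_cookies and send_cookies are hypothetical names, and because the test server listens on an IP address (127.0.0.1) the session is given a CookieJar(unsafe=True), since the default jar is strict about cookies for IP hosts.

```python
import asyncio

import aiohttp
from aiohttp import web
from aiohttp.test_utils import TestServer


async def show_cookies(request):
    # Hypothetical endpoint mimicking httpbin's /cookies response shape.
    return web.json_response({"cookies": dict(request.cookies)})


async def send_cookies(cookies):
    app = web.Application()
    app.add_routes([web.get("/cookies", show_cookies)])
    async with TestServer(app) as server:
        # unsafe=True lets the jar handle cookies for IP-address hosts.
        jar = aiohttp.CookieJar(unsafe=True)
        async with aiohttp.ClientSession(cookies=cookies, cookie_jar=jar) as session:
            async with session.get(server.make_url("/cookies")) as resp:
                return await resp.json()


if __name__ == "__main__":
    print(asyncio.run(send_cookies({"cookies_are": "working"})))
```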


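Using a proxy
Crawlers often need a proxy, and aiohttp supports one: pass the proxy keyword when making the request.
One annoyance is that, in the aiohttp version used here (3.6), only HTTP proxies are supported, not HTTPS proxies. Usage looks roughly like this (headers and login_url are assumed to be defined elsewhere):
proxy = "http://127.0.0.1:10809"
async with aiohttp.ClientSession(headers=headers) as session:
    async with session.get(url=login_url, proxy=proxy) as response:
        resu = await response.text()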

#####

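Using it together with asyncio

aiohttp's natural companion is asyncio, and the two work best in combination. Here is a simple example that compares synchronous and asynchronous code.

Example code
The example uses Douban Movies, the site I have practiced on ever since I started learning to write crawlers. The libraries needed are:

lxml
requests
datetime
asyncio
aiohttp
Install the third-party ones (lxml, requests, aiohttp); datetime and asyncio are part of the standard library. Page content is extracted with XPath.

Synchronous
I have written the synchronous version many times, so here is the earlier code: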

from datetime import datetime

import requests
from lxml import etree

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit"
                         "/537.36 (KHTML, like Gecko) "
                         "Chrome/72.0.3626.121 Safari/537.36"}


def get_movie_url():
    req_url = "https://movie.douban.com/chart"
    response = requests.get(url=req_url, headers=headers)
    html = etree.HTML(response.text)
    movies_url = html.xpath(
        "//*[@id='content']/div/div[1]/div/div/table/tr/td/a/@href")
    return movies_url


def get_movie_content(movie_url):
    response = requests.get(movie_url, headers=headers)
    result = etree.HTML(response.text)
    movie = dict()
    name = result.xpath('//*[@id="content"]/h1/span[1]//text()')
    author = result.xpath('//*[@id="info"]/span[1]/span[2]//text()')
    movie["name"] = name
    movie["author"] = author
    return movie


if __name__ == '__main__':
    start = datetime.now()
    movie_url_list = get_movie_url()
    # Fetch the movie pages one after another, collecting the results in a list.
    movies = [get_movie_content(url) for url in movie_url_list]
    print(movies)
    print("Synchronous run took: {}".format(datetime.now() - start))

Here is the result of the synchronous run:

E:\venv\spider\Scripts\python.exe E:/python_project/filetest/douban.py
[{'name': ['小丑 Joker'], 'author': ['托德·菲利普斯']}, {'name': ['好莱坞往事 Once Upon a Time... in Hollywood'], .....
Synchronous run took: 0:00:08.765342


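Asynchronous
The asynchronous version is simple too. I am still putting together an article on async itself, because it touches on so many things. For now, here is the crawler code: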

import asyncio
from datetime import datetime

import aiohttp
from lxml import etree
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit"
                         "/537.36 (KHTML, like Gecko) "
                         "Chrome/72.0.3626.121 Safari/537.36"}


async def get_movie_url():
    # Fetch the chart page and extract the link of every listed movie.
    req_url = "https://movie.douban.com/chart"
    # headers are set once on the session, so get() does not repeat them.
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url=req_url) as response:
            result = await response.text()
            result = etree.HTML(result)
        return result.xpath("//*[@id='content']/div/div[1]/div/div/table/tr/td/a/@href")


async def get_movie_content(movie_url):
    # Fetch one movie page and extract its name and author fields.
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url=movie_url) as response:
            result = await response.text()
            result = etree.HTML(result)
        movie = dict()
        name = result.xpath('//*[@id="content"]/h1/span[1]//text()')
        author = result.xpath('//*[@id="info"]/span[1]/span[2]//text()')
        movie["name"] = name
        movie["author"] = author
    return movie

async def main():
    # Fetch the chart page first, then fetch all movie pages concurrently.
    movie_url_list = await get_movie_url()
    tasks = [get_movie_content(url) for url in movie_url_list]
    return await asyncio.gather(*tasks)


if __name__ == '__main__':
    start = datetime.now()
    movies = asyncio.run(main())
    print(movies)
    print("Asynchronous run took: {}".format(datetime.now() - start))

Look at the result and the gap is obvious:
E:\venv\spider\Scripts\python.exe E:/python_project/filetest/aio_douban.py
[{'name': ['小丑 Joker'], 'author': ['托德·菲利普斯']}, {'name': ['好莱坞往事 Once Upon a Time... in Hollywood'], .....
Asynchronous run took: 0:00:02.230956
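The speed-up comes from overlapping the time spent waiting, which can be reproduced without any network at all. The sketch below is a simulation, not the Douban crawler: fake_fetch stands in for an HTTP request by sleeping, and all function names, the 0.1 second delay and the concurrency limit of 5 are assumptions chosen for the demo. It also caps the number of in-flight requests with a semaphore, which is a polite habit when crawling a real site.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    # Stand-in for an HTTP request: sleep instead of touching the network.
    await asyncio.sleep(delay)
    return url


async def sequential(urls):
    # Each request waits for the previous one to finish.
    return [await fake_fetch(u) for u in urls]


async def concurrent(urls, limit=5):
    # A semaphore caps how many requests are in flight at once.
    sem = asyncio.Semaphore(limit)

    async def bounded(u):
        async with sem:
            return await fake_fetch(u)

    # gather() runs the coroutines concurrently and keeps their order.
    return await asyncio.gather(*(bounded(u) for u in urls))


def timed(coro_fn, urls):
    # Run one strategy to completion and measure the wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(urls))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["page-{}".format(i) for i in range(5)]
    _, t_seq = timed(sequential, urls)
    _, t_con = timed(concurrent, urls)
    print("sequential: {:.2f}s, concurrent: {:.2f}s".format(t_seq, t_con))
```

With five simulated requests the sequential version waits for each delay in turn, while the concurrent version waits for roughly one delay in total.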
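Summary
Async is the future, no doubt about it. In this run the asynchronous crawler finished in about 2.2 seconds versus about 8.8 seconds for the synchronous one. This post covered only the basics of the aiohttp library; for more advanced usage I recommend reading the referenced articles.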

####

posted @ 2021-07-22 19:07 技术改变命运Andy