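The Python Requests Module

The version of the documentation referenced here lists support for Python 2.6-2.7 and 3.3-3.7; recent Requests releases support Python 3 only.

Official documentation: https://docs.python-requests.org/zh_CN/latest/

Sending Requests

First, import the module: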

import requests

`get` requests

url = r'http://httpbin.org/get'
r = requests.get(url=url)

get requests with parameters

To pass parameters in the URL query string, as in httpbin.org/get?key=val, supply a dict through the params keyword argument.

url = r'http://httpbin.org/get'
payload1 = {'key1': 'aaa', 'key2': 'bbb'}
payload2 = {'key1': 'aaa', 'key2': ['bbb', 'ccc']}

r1 = requests.get(url=url, params=payload1)
r2 = requests.get(url=url, params=payload2)

Printing the URL shows that it has been encoded correctly:

>>> print(r1.url)
http://httpbin.org/get?key1=aaa&key2=bbb
>>> print(r2.url)
http://httpbin.org/get?key1=aaa&key2=bbb&key2=ccc

Note that any dictionary key whose value is None is not added to the URL's query string.
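The query-string behavior can be checked offline. The sketch below prepares the request with requests.Request(...).prepare() instead of sending it, so nothing touches the network. The helper name build_url is made up for this illustration.

```python
import requests


def build_url(url, params):
    """Return the final URL Requests would use, without sending anything."""
    # Request(...).prepare() applies the same query-string encoding as requests.get().
    return requests.Request('GET', url, params=params).prepare().url


base = 'http://httpbin.org/get'
print(build_url(base, {'key1': 'aaa', 'key2': None}))            # key2 is dropped
print(build_url(base, {'key1': 'aaa', 'key2': ['bbb', 'ccc']}))  # key2 appears twice
```

Preparing a request this way is handy whenever you only want to inspect what would be sent.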

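`post` requests

Basic usage:

url = r'http://httpbin.org/post'
data = {'key': 'value'}
r = requests.post(url=url, data=data)  # data is optional; it can also be passed positionally as the second argument

The `data` parameter

Passing a dict

To send form-encoded data (like an HTML form), simply pass a dict to the data keyword argument.

The dict is automatically form-encoded when Requests sends the request.

url = r'https://httpbin.org/post'
data = {'key1': 'aaa', 'key2': 'bbb'}
r = requests.post(url=url, data=data)
print(r.text)

Output: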

{
  ...
  "form": {
    "key1": "aaa",
    "key2": "bbb"
  },
  ...
}

The dict values may be lists:

url = r'https://httpbin.org/post'
data = {'key1': 'aaa', 'keys': ['bbb', 'ccc']}
r = requests.post(url=url, data=data)
print(r.text)

Output:

{
  ...
  "form": {
    "key1": "aaa",
    "keys": [
      "bbb",
      "ccc"
    ]
  },
  ...
}

Passing a list or tuple

The following data types all produce the same result.

data = {'key1':'aaa','key2':['bbb','ccc']}

data = [['key1','aaa'],['key2','bbb'],['key2','ccc']]

data = (('key1','aaa'),('key2','bbb'),('key2','ccc'))
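To see that the three forms really are equivalent, one option is to prepare a POST for each and compare the encoded bodies. This is only an offline sketch: the requests are built with requests.Request(...).prepare() and never sent, and the variable names are illustrative.

```python
import requests

url = 'https://httpbin.org/post'
variants = [
    {'key1': 'aaa', 'key2': ['bbb', 'ccc']},
    [['key1', 'aaa'], ['key2', 'bbb'], ['key2', 'ccc']],
    (('key1', 'aaa'), ('key2', 'bbb'), ('key2', 'ccc')),
]

# Prepare (but do not send) one POST per variant and collect the encoded bodies.
bodies = [requests.Request('POST', url, data=d).prepare().body for d in variants]
for body in bodies:
    print(body)  # key1=aaa&key2=bbb&key2=ccc
```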

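Passing a string

A string is not form-encoded; it is sent as the request body unchanged. The string below uses single quotes, so it is not valid JSON, which is why httpbin reports "json": null.

url = r'https://httpbin.org/post'
data = r"{'key1':'aaa','keys':'bbb'}"
r = requests.post(url=url, data=data)
print(r.text)

Output: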

{
  ...
  "data": "{'key1':'aaa','keys':'bbb'}",
  "form": {},
  "json": null,
  ...
}

Passing a JSON string

To send a JSON string, first encode the Python data as JSON:

import json

url = r'https://httpbin.org/post'
data = {'key1': 'aaa', 'keys': 'bbb'}
r = requests.post(url=url, data=json.dumps(data))  # json.dumps() converts Python data to a JSON string
print(r.text)

Output:

{
  ...
  "data": "{\"key1\": \"aaa\", \"keys\": \"bbb\"}",
  "form": {},
  "json": {
    "key1": "aaa",
    "keys": "bbb"
  },
  ...
}

The json parameter

To send JSON data, simply pass a dict to the json keyword argument.

The dict is automatically JSON-encoded when Requests sends the request.

url = r'https://httpbin.org/post'
data = {'key1': 'aaa', 'keys': 'bbb'}
r = requests.post(url=url, json=data)
print(r.text)

Output:

{
  ...
  "data": "{\"key1\": \"aaa\", \"keys\": \"bbb\"}",
  "form": {},
  "json": {
    "key1": "aaa",
    "keys": "bbb"
  },
  ...
}

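As the output shows, httpbin receives the same body as when a JSON string is passed to data. One difference is that json= also sets the Content-Type header to application/json, whereas a string passed to data= does not set a Content-Type.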
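One detail the httpbin output does not show is the Content-Type header. The sketch below prepares both variants locally (nothing is sent) and compares them; the variable names via_data and via_json are just for illustration.

```python
import json

import requests

url = 'https://httpbin.org/post'
payload = {'key1': 'aaa', 'keys': 'bbb'}

# Prepare both requests without sending them.
via_data = requests.Request('POST', url, data=json.dumps(payload)).prepare()
via_json = requests.Request('POST', url, json=payload).prepare()

# The body carries the same JSON document either way ...
same_body = json.loads(via_data.body) == json.loads(via_json.body)
print(same_body)

# ... but only json= sets a Content-Type header.
print(via_data.headers.get('Content-Type'))
print(via_json.headers.get('Content-Type'))
```

If the server insists on a JSON content type, json= is therefore the safer choice; with data= you would need to add the header yourself.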

Uploading files with the `files` parameter

Requests makes it simple to upload Multipart-Encoded files.

url = r'https://httpbin.org/post'
with open(file='study_requests/numbers.csv', mode='rb') as f:  # open the file in binary mode
    files = {'file': f}
    r = requests.post(url=url, files=files)
print(r.text)

Output:

{
  ...
  "files": {
    "file": "1,2,3,4\na,b,c,d\n"
  },
  ...
}

You can also send a string that will be received as a file:

url = r'https://httpbin.org/post'
files = {'file': ('report.csv', '1,2,3,4\na,b,c,d\n')}
r = requests.post(url=url, files=files)
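To get a feel for what a string sent as a file looks like on the wire, here is a small offline sketch. It prepares the multipart request without sending it and checks the header and body; the boundary value is random, so it is not printed as an expected result.

```python
import requests

url = 'https://httpbin.org/post'
files = {'file': ('report.csv', '1,2,3,4\na,b,c,d\n')}

# Build the multipart request locally; nothing is sent.
prepared = requests.Request('POST', url, files=files).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=<random>
print(b'filename="report.csv"' in prepared.body)
print(b'1,2,3,4\na,b,c,d\n' in prepared.body)
```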

You can set the filename, content type, and headers explicitly (the following is the example from the official documentation):

>>> url = 'http://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

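Opening files in binary mode is recommended. Requests may try to provide the Content-Length header for you, and if it does, the value is set to the number of bytes in the file. Errors can occur if the file is opened in text mode.

If you send a very large file as a multipart/form-data request, you may want to stream the request. Requests does not support this by default, but the third-party package requests-toolbelt does. See the toolbelt documentation for usage.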

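Streaming uploads

Requests supports streaming uploads, which let you send large streams or files without first reading them into memory.

To stream an upload, simply provide a file-like object as the request body: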

with open('massive-body','rb') as f:
    requests.post('http://some.url/streamed', data=f)
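A quick way to convince yourself that the data is not copied into memory up front is to prepare such a request and look at its body. In this sketch an io.BytesIO object stands in for an open file, and the request is only prepared, never sent.

```python
import io

import requests

payload = b'x' * 1024
stream = io.BytesIO(payload)  # stand-in for a file opened in binary mode

prepared = requests.Request('POST', 'http://some.url/streamed', data=stream).prepare()

# The body is the file-like object itself, not a copy of its contents.
print(prepared.body is stream)
# Requests could determine the size, so it filled in Content-Length.
print(prepared.headers['Content-Length'])
```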

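Uploading multiple files

You can send multiple files in one request. Put the files in a list of tuples, each of the form (form_field_name, file_info).

For example, suppose you want to upload several image files to an HTML form with a multi-file field named "images":

<input type="file" name="images" multiple="true" required="true"/>

Code: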

>>> url = 'http://httpbin.org/post'
>>> multiple_files = [
        ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
        ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
{
  ...
  'files': {'images': 'data:image/png;base64,iVBORw ....'}
  'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
  ...
}

Other requests

>>> r = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('http://httpbin.org/delete')
>>> r = requests.head('http://httpbin.org/get')
>>> r = requests.options('http://httpbin.org/get')

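Reading the Response

Requests automatically decodes content from the server, including gzip-compressed content.

r.encoding: the encoding of the response. Requests makes an educated guess about the encoding based on the HTTP headers. You can assign a new value to it, and Requests will use the new value whenever r.text is accessed afterwards.

r.text: the response content as text, decoded automatically according to the character encoding in the response headers. Whenever you access r.text, Requests decodes the content using r.encoding.

r.content: the response content as bytes (binary). gzip and deflate transfer encodings are decoded automatically.

r.json(): the response content parsed as JSON, typically a dict. This uses the JSON decoder built into Requests and raises an exception if decoding fails. A successful call does not mean the request succeeded: some servers return a JSON object in a failed response (for example, error details with an HTTP 500). It is therefore best to call r.raise_for_status() or check that r.status_code is what you expect.

r.raw: the raw socket response. You must set stream=True in the initial request.

r.status_code: the response status code. For convenience, Requests also ships with a built-in status code lookup object, for example requests.codes.ok: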

>>> r.status_code == requests.codes.ok
True

r.raise_for_status(): raises an exception if the request failed (a 4xx or 5xx status code) and returns None otherwise.
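The point about r.json() succeeding on a failed response is easy to miss, so here is a minimal offline sketch. It builds a Response object by hand; assigning to the private _content attribute is a testing shortcut assumed here purely for illustration, not something normal code should do.

```python
import requests

# Build a Response by hand so the example runs offline. Assigning to the
# private _content attribute is a testing shortcut, not normal usage.
r = requests.Response()
r.status_code = 500
r._content = b'{"error": "something broke"}'

print(r.status_code == requests.codes.ok)  # False
print(r.json())                            # decodes fine despite the 500

try:
    r.raise_for_status()
    outcome = 'ok'
except requests.exceptions.HTTPError as exc:
    outcome = f'failed: {exc}'
print(outcome)
```

In real code the same pattern applies to a response returned by requests.get(): call raise_for_status() before trusting the decoded JSON.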

The following code shows what r.text, r.content, and r.json() return:

url = r'https://httpbin.org/get'
r = requests.get(url=url)

print(r.url, end='\n-------------\n')
print(r.encoding, end='\n-------------\n')
print(r.text, end='\n-------------\n')
print(r.content, end='\n-------------\n')
print(r.json(), end='\n-------------\n')
print(type(r.json()))

Output:

https://httpbin.org/get
-------------
utf-8
-------------
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.26.0",
    "X-Amzn-Trace-Id": "Root=1-61d11e2d-43e8d7612c24f5075f3c902e"
  },
  "origin": "58.39.97.213",
  "url": "https://httpbin.org/get"
}

-------------
b'{\n  "args": {}, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.26.0", \n    "X-Amzn-Trace-Id": "Root=1-61d11e2d-43e8d7612c24f5075f3c902e"\n  }, \n  "origin": "58.39.97.213", \n  "url": "https://httpbin.org/get"\n}\n'       
-------------
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.26.0', 'X-Amzn-Trace-Id': 'Root=1-61d11e2d-43e8d7612c24f5075f3c902e'}, 'origin': '58.39.97.213', 'url': 'https://httpbin.org/get'}
-------------
<class 'dict'>

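The main use of r.raw is streaming downloads, which is not covered here; see the official documentation.

Reading Request and Response Headers

r.headers: the server's response headers, as a dict-like object.

r.request.headers: the request headers sent to the server, as a dict-like object.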

url = r'https://httpbin.org/get'
r = requests.get(url=url)
print('Request headers:\n', r.request.headers)
print('Response headers:\n', r.headers)

Output:

Request headers:
 {'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
Response headers:
 {'Date': 'Sun, 02 Jan 2022 06:36:20 GMT', 'Content-Type': 'application/json', 'Content-Length': '306', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

These header fields can be accessed using any capitalization:

>>> r.headers['Content-Type']
'application/json'
>>> r.headers.get('content-type')
'application/json'
>>> r.request.headers['User-Agent']
'python-requests/2.26.0'
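The case-insensitive lookup comes from the type Requests uses for headers, requests.structures.CaseInsensitiveDict. A small standalone sketch of how it behaves, using a made-up headers object rather than a real response:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

print(headers['content-type'])     # application/json
print(headers.get('CONTENT-TYPE')) # application/json
print('Content-type' in headers)   # True
print(list(headers))               # the original spelling of the key is preserved
```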

Adding Request Headers

To add HTTP headers to a request, simply pass a dict to the headers keyword argument.

r.request.headers shows the headers of the request that was sent.

url = r'https://httpbin.org/get'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url=url, headers=headers)
print(r.request.headers)

Output:

{'user-agent': 'my-app/0.0.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

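Note: custom headers are given less precedence than more specific sources of information. For instance:

Authorization headers set with headers= are overridden if credentials are specified in .netrc, which in turn are overridden by the auth= parameter.

Authorization headers are removed if you are redirected to a different host.

Proxy-Authorization headers are overridden by proxy credentials provided in the URL.

Content-Length headers are overridden when the length of the content can be determined.

Furthermore, Requests does not change its behavior at all based on which custom headers are specified. The headers are simply passed on into the final request.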
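How a custom header ends up alongside the defaults can be inspected without sending anything. The sketch below uses Session.prepare_request(), which merges the session's default headers with the ones supplied on the request; the URL and header value are the same illustrative ones used above.

```python
import requests

session = requests.Session()
request = requests.Request('GET', 'https://httpbin.org/get',
                           headers={'user-agent': 'my-app/0.0.1'})

# prepare_request() merges the session's default headers with ours.
prepared = session.prepare_request(request)

print(prepared.headers['User-Agent'])          # our value replaced the default
print('Accept-Encoding' in prepared.headers)   # defaults we did not touch remain
```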

Cookie

Getting cookies from the response:

r.cookies gives access to the cookies in the response, provided the response contains any:

>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['example_cookie_name']
'example_cookie_value'

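Sending cookies to the server:

You can pass a dict to the cookies parameter. Note that r.cookies only holds cookies that the server set in the response, so in this example it is empty: get() returns None and indexing raises KeyError.

url = r'https://httpbin.org/cookies'
cookies = {'cookies_1': 'working'}

r = requests.get(url=url, cookies=cookies)
print(r.text)
print(r.cookies)
print(r.cookies.get('cookies_1'))
print(r.cookies['cookies_1'])

Output: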

{
  "cookies": {
    "cookies_1": "working"
  }
}

<RequestsCookieJar[]>  # r.cookies is a RequestsCookieJar, which behaves like a dict
None
Traceback (most recent call last):
  File "d:/software/VSCode/Project_study/study_requests/study_requests.py", line 59, in <module>
    print(r.cookies['cookies_1'])
  File "C:\Users\10282787\.virtualenvs\Project_study-Zg5dkSCc\lib\site-packages\requests\cookies.py", line 328, in __getitem__
    return self._find_no_duplicates(name)
  File "C:\Users\10282787\.virtualenvs\Project_study-Zg5dkSCc\lib\site-packages\requests\cookies.py", line 399, in _find_no_duplicates
    raise KeyError('name=%r, domain=%r, path=%r' % (name, domain, path))
KeyError: "name='cookies_1', domain=None, path=None"

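Cookies are returned in a RequestsCookieJar, which acts like a dict but also offers a more complete interface, suitable for use over multiple domains or paths. Cookie jars can also be passed in to Requests: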

>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'http://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
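The path matching done by the jar can also be observed offline. This sketch prepares two requests with the same jar (nothing is sent) and prints the Cookie header each one would carry; the second URL, http://httpbin.org/elsewhere, is only used here to exercise the other cookie's path.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

# Prepare requests for two different paths and look at the Cookie header of each.
to_cookies = requests.Request('GET', 'http://httpbin.org/cookies', cookies=jar).prepare()
to_elsewhere = requests.Request('GET', 'http://httpbin.org/elsewhere', cookies=jar).prepare()

print(to_cookies.headers.get('Cookie'))    # tasty_cookie=yum
print(to_elsewhere.headers.get('Cookie'))  # gross_cookie=blech
```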

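Redirection and History

By default, Requests follows redirects automatically for every method except HEAD.

For HEAD, set allow_redirects=True to enable redirect handling.
For GET, POST, OPTIONS, PUT, PATCH, or DELETE, set allow_redirects=False to disable redirect handling.

Use the history attribute of the response object to track redirects. It is a list of the Response objects that were created to complete the request, sorted from the oldest to the most recent.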

>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]

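Timeouts

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter.

Nearly all code should set this parameter; otherwise the program may hang indefinitely.

url = r'http://www.httpbin.org'
r = requests.get(url=url, timeout=0.01)

Exception: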

Traceback (most recent call last):
    File "d:/software/VSCode/Project_study/study_requests/study_requests.py", line 45, in <module>
        r = requests.get(url=url,timeout=0.01)
    ...
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='www.httpbin.org', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000015C0428A430>, 'Connection to www.httpbin.org timed out. (connect timeout=0.01)'))

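Note:

timeout is not a time limit on the entire response download. Rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). A single number applies to both the connection phase and each wait for data. If no timeout is specified explicitly, requests do not time out.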
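The timeout can also be given as a (connect, read) tuple, which makes the two phases visible. The sketch below is one way to watch a read timeout happen locally, under a few assumptions: it opens a throwaway socket on 127.0.0.1 that accepts connections but never answers, and it disables proxy environment variables on the session so the request really goes to that socket.

```python
import socket

import requests

# A local socket that lets clients connect but never answers:
# the OS completes the TCP handshake, yet no response bytes ever arrive.
silent_server = socket.socket()
silent_server.bind(('127.0.0.1', 0))
silent_server.listen(1)
port = silent_server.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    # (connect timeout, read timeout) in seconds
    session.get(f'http://127.0.0.1:{port}/', timeout=(2, 0.3))
    result = 'got a response'
except requests.exceptions.ConnectTimeout:
    result = 'connect timeout'
except requests.exceptions.ReadTimeout:
    result = 'read timeout'
finally:
    silent_server.close()

print(result)
```

The connection itself succeeds well within 2 seconds, so it is the 0.3 second read timeout that fires.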

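Errors and Exceptions

In the event of a network problem (e.g. DNS failure, refused connection), Requests raises a ConnectionError exception.

Response.raise_for_status() raises an HTTPError if the HTTP request returned an unsuccessful status code.

If a request times out, a Timeout exception is raised.

If a request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.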
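Because everything derives from RequestException, a single except clause can act as a catch-all. The helper below, safe_get, is a hypothetical name invented for this sketch; it is one possible way to wrap a call, not a recommended API. The demo call uses a URL without a scheme, which Requests rejects before any network traffic happens.

```python
import requests
from requests import exceptions as exc


def safe_get(url, timeout=5, **kwargs):
    """Hypothetical helper: return the Response, or None if Requests raised."""
    try:
        response = requests.get(url, timeout=timeout, **kwargs)
        response.raise_for_status()
        return response
    except exc.RequestException as error:
        # ConnectionError, HTTPError, Timeout and TooManyRedirects all land here.
        print(f'request failed: {type(error).__name__}')
        return None


# A URL without a scheme is rejected before any network traffic happens.
print(safe_get('not-a-url'))

# The specific exceptions named above all derive from RequestException.
for cls in (exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.TooManyRedirects):
    print(cls.__name__, issubclass(cls, exc.RequestException))
```

Catching the more specific classes first still makes sense when the program should react differently to, say, a timeout and a 404.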

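Handling SSL Certificates

Reference: https://www.jianshu.com/p/d715df88a5ef

.crt and .key certificates with requests

The requests library supports both of these certificate file types through the cert parameter.

import requests

url = r'https://www.example.com/path'
resp = requests.post(url=url, data='payload', cert=('example.crt', 'example.key'), verify=False)  # verify=False disables server certificate verification; to verify, pass verify=True or the path to a CA bundle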

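.pfx certificates with requests-pkcs12

For .pfx certificates, use the requests-pkcs12 library.

Documentation: https://www.cnpython.com/pypi/requests-pkcs12

It makes HTTPS requests given a .pfx file and its password (if the file is encrypted).

Once the request has been made, the response is used in the same way as with requests: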

import requests
import requests_pkcs12
url = r'https://www.example.com/path'
resp = requests_pkcs12.post(url=url, pkcs12_filename=r'D:/tmp/client/aaa/client.pfx', pkcs12_password='123456', verify=False)
print(resp.status_code)
print(resp.text)

posted @ 2022-02-04 21:39 wuenwuen
