Python-Requests Notes (I)

Requests

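Requests is a Python library, built on top of urllib3, for sending HTTP/HTTPS requests conveniently. It lets you add query strings to URLs easily and control the format of the data you send. Keep-alive and HTTP connection pooling are managed automatically.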

Code example:

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}

Installing Requests

$ python -m pip install requests

Using Requests

Import the requests module and send a GET request. r is the Response object, and all the information we need can be read from it.

import requests
r = requests.get('https://api.github.com/events')

Besides GET, the other HTTP request types are written like this:

r = requests.post('https://httpbin.org/post', data={'key': 'value'})
r = requests.put('https://httpbin.org/put', data={'key': 'value'})
r = requests.delete('https://httpbin.org/delete')
r = requests.head('https://httpbin.org/get')
r = requests.options('https://httpbin.org/get')

Use the params keyword argument to send a URL query string (httpbin.org/get?key=val):

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)

>>> print(r.url)
https://httpbin.org/get?key1=value1&key2=value2
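To check how params ends up in the URL without touching the network, one option is to build a Request object and call prepare() on it. This is a minimal sketch that assumes Python 3.7+ (where dicts keep insertion order); the list value and the None value are illustrative additions, not part of the example above.

```python
import requests

# prepare() encodes params into the URL but does not open a connection.
payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': None}
prepared = requests.Request('GET', 'https://httpbin.org/get', params=payload).prepare()

# A list value repeats the key; a key whose value is None is left out.
print(prepared.url)
# https://httpbin.org/get?key1=value1&key2=value2&key2=value3
```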

Response Content

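Requests automatically decodes the content returned by the server.

HTML and XML can declare their encoding inside the body. For responses like these, first use r.content to find the declared encoding, then set r.encoding so that r.text is decoded with it.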
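One way to carry out those two steps is sketched below. The helper name apply_declared_encoding, the regular expression, and the 1024-byte sniffing window are all assumptions made for illustration; Requests does not ship this function, and a real HTML parser would be more robust.

```python
import re
import requests

def apply_declared_encoding(response, sniff_bytes=1024):
    """Set response.encoding from a charset/encoding declaration in the body.

    Hypothetical helper: inspects only the first `sniff_bytes` bytes and
    leaves response.encoding unchanged when no declaration is found.
    """
    head = response.content[:sniff_bytes]
    match = re.search(rb'(?:charset|encoding)\s*=\s*["\']?([\w.-]+)', head, re.IGNORECASE)
    if match:
        response.encoding = match.group(1).decode('ascii')
    return response.encoding

# Typical use (needs network access; the URL is a placeholder):
# r = requests.get('https://example.org/page.html', timeout=10)
# apply_declared_encoding(r)
# print(r.text)
```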

Binary / JSON Response Content

# Binary response example: r is the response to a request for an image
from PIL import Image
from io import BytesIO

i = Image.open(BytesIO(r.content))

# JSON response example
>>> import requests

>>> r = requests.get('https://api.github.com/events')
>>> r.json()
[{'repository': {'open_issues': 0, 'url': 'https://github.com/...

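Notes:

If JSON decoding fails, r.json() raises requests.exceptions.JSONDecodeError.

A successful call to r.json() does not mean the request itself succeeded. Some servers return a JSON object in a failed response (e.g. error details with HTTP 500). To check whether a request succeeded, use r.raise_for_status() or inspect r.status_code.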
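The two notes above can be combined into one small helper. This is a sketch: json_or_none is a hypothetical name, and it catches ValueError on the assumption that requests.exceptions.JSONDecodeError derives from it (older Requests releases raise a plain ValueError subclass instead).

```python
import requests

def json_or_none(response):
    """Return the decoded JSON body of a successful response, else None.

    Hypothetical helper: the status is checked first, because an error
    response can still carry a perfectly valid JSON body.
    """
    try:
        response.raise_for_status()   # raises HTTPError on 4xx/5xx
        return response.json()        # raises a ValueError subclass on bad JSON
    except (requests.exceptions.HTTPError, ValueError):
        return None

# Typical use (needs network access):
# data = json_or_none(requests.get('https://api.github.com/events', timeout=10))
```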

RAW Response Content

To get the raw socket response from the server, set stream=True on the request:

>>> r = requests.get('https://api.github.com/events', stream=True)

>>> r.raw
<urllib3.response.HTTPResponse object at 0x101194810>

>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

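In general, though, you should save streamed data to a file using a pattern like this:

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

Response.iter_content handles several things for you automatically (for example, decoding the gzip and deflate transfer-encodings), whereas Response.raw leaves the returned bytes untouched.

When streaming a download, use Response.iter_content.

Use Response.raw only when you are sure you need the raw bytes exactly as they were returned.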
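The file-saving loop can be wrapped in a small function. In this sketch the name save_response, the 8192-byte chunk size, and the URL and file name in the usage comment are assumptions chosen for illustration.

```python
import requests

def save_response(response, path, chunk_size=8192):
    """Write a streamed response body to `path` chunk by chunk.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, 'wb') as fd:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written

# Typical use (needs network access; URL and file name are placeholders):
# with requests.get('https://httpbin.org/bytes/1024', stream=True, timeout=10) as r:
#     r.raise_for_status()
#     save_response(r, 'download.bin')
```

Using the response as a context manager releases the connection even if writing the file fails partway through.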

Custom Headers

To add HTTP headers to a request, pass a dict to the headers parameter:

>>> url = 'https://api.github.com/some/endpoint'
>>> headers = {'user-agent': 'my-app/0.0.1'}

>>> r = requests.get(url, headers=headers)
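To confirm which headers would be sent before any request goes out, the request can be prepared locally. A minimal sketch, reusing the URL and header from the example above:

```python
import requests

headers = {'user-agent': 'my-app/0.0.1'}
prepared = requests.Request(
    'GET', 'https://api.github.com/some/endpoint', headers=headers
).prepare()

# Header lookup on the prepared request ignores case.
print(prepared.headers['User-Agent'])   # my-app/0.0.1
```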

POST Requests

Sending form-encoded data:

>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post('https://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}
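The sketch below compares the request body produced by data= with the one produced by json=. The json= parameter is not covered in these notes; it is included here on the assumption that a JSON body is the usual alternative to a form-encoded one. Nothing is sent over the network.

```python
import json
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://httpbin.org/post'

# data= with a dict produces a form-encoded body.
form = requests.Request('POST', url, data=payload).prepare()
print(form.headers['Content-Type'])     # application/x-www-form-urlencoded
print(form.body)                        # key1=value1&key2=value2

# json= serializes the dict and sets a JSON content type.
as_json = requests.Request('POST', url, json=payload).prepare()
print(as_json.headers['Content-Type'])  # application/json
print(json.loads(as_json.body))         # {'key1': 'value1', 'key2': 'value2'}
```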

Response Status Codes

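Check the response status code:

>>> r = requests.get('https://httpbin.org/get')
>>> r.status_code
200

Requests also comes with a built-in status code lookup object:

>>> r.status_code == requests.codes.ok
True

Use Response.raise_for_status() to raise an exception for a failed request (a 4xx or 5xx response):

>>> bad_r = requests.get('https://httpbin.org/status/404')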
>>> bad_r.status_code
404

>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error

Response Headers

View the server's response headers:

>>> r.headers
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'connection': 'close',
    'server': 'nginx/1.0.4',
    'x-runtime': '148ms',
    'etag': '"e1ca502697e5c9317743dc078f67693f"',
    'content-type': 'application/json'
}

This dictionary is made specifically for HTTP headers and is case-insensitive, so any of the following spellings can be used to access a header:

>>> r.headers['Content-Type']
'application/json'

>>> r.headers.get('content-type')
'application/json'
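The case-insensitive behavior comes from the CaseInsensitiveDict class in requests.structures, which can be tried on its own. A short sketch:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

print(headers['content-type'])       # application/json
print(headers.get('CONTENT-TYPE'))   # application/json
print('content-type' in headers)     # True

# Setting the same header with a different spelling replaces the entry;
# the spelling of the most recent assignment is the one that is kept.
headers['CONTENT-TYPE'] = 'text/html'
print(list(headers))                 # ['CONTENT-TYPE']
```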

Cookies

If a response contains cookies, they can be accessed like this:

>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['example_cookie_name']
'example_cookie_value'

To send your own cookies to the server, use the cookies parameter:

>>> url = 'https://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
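When cookies should only go to a particular domain or path, a RequestsCookieJar can be passed instead of a plain dict. The sketch below prepares the request locally to show which cookie would be sent; the cookie names and values are placeholders.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

# Only the cookie whose domain and path match the URL is attached.
prepared = requests.Request('GET', 'https://httpbin.org/cookies', cookies=jar).prepare()
print(prepared.headers['Cookie'])   # tasty_cookie=yum
```

Passing the same jar to requests.get(url, cookies=jar) applies the same matching when the request is actually sent.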

Timeouts

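Production code should set a timeout on nearly every request; without one, a request can hang indefinitely.

>>> requests.get('https://github.com/', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): 
Request timed out. (timeout=0.001)

The timeout parameter is not a limit on the total time a request may take. Instead, an exception is raised if the server has not issued a response within timeout seconds. If no timeout is specified explicitly, requests do not time out.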
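Because a missing timeout means waiting forever, one common pattern is a Session subclass that fills in a default. This is a sketch, not a Requests feature: the class name TimeoutSession and the default value are assumptions. A tuple timeout is interpreted by Requests as (connect timeout, read timeout).

```python
import requests

class TimeoutSession(requests.Session):
    """Session that applies a default timeout when the caller gives none."""

    def __init__(self, timeout=(3.05, 27)):
        super().__init__()
        # (connect timeout, read timeout) in seconds; illustrative values.
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # Only fill in the timeout when the caller did not pass one.
        kwargs.setdefault('timeout', self.default_timeout)
        return super().request(method, url, **kwargs)

# Typical use (needs network access):
# s = TimeoutSession()
# s.get('https://github.com/')             # uses (3.05, 27)
# s.get('https://github.com/', timeout=1)  # explicit value wins
```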

Errors and Exceptions

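Response.raise_for_status() raises an HTTPError if the HTTP request returned an unsuccessful status code.

Timeout is raised if a request times out.

TooManyRedirects is raised if a request exceeds the configured maximum number of redirects.

requests.exceptions.RequestException is the base class: every exception that Requests explicitly raises inherits from it.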
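The hierarchy described above suggests catching the specific exceptions first and RequestException last. A minimal sketch, in which fetch_text is a hypothetical helper and the messages are placeholders:

```python
import requests
from requests.exceptions import (
    ConnectTimeout, HTTPError, ReadTimeout, RequestException,
    Timeout, TooManyRedirects,
)

# All of these derive from RequestException.
for exc_type in (HTTPError, Timeout, TooManyRedirects, ConnectTimeout, ReadTimeout):
    assert issubclass(exc_type, RequestException)

def fetch_text(url, timeout=10):
    """Return the response body as text, or None if the request failed."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
    except Timeout:
        print('timed out:', url)
    except HTTPError as exc:
        print('bad status:', exc.response.status_code)
    except RequestException as exc:
        # Catch-all for anything else Requests raises (e.g. an invalid URL).
        print('request failed:', exc)
    else:
        return r.text
    return None
```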

Session

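A Session object lets you persist certain parameters across requests. It keeps cookies across all requests made from the Session instance and uses urllib3's connection pooling.

If you make several requests to the same host, the underlying TCP connection is reused, which can bring a significant performance improvement.

Example:

s = requests.Session()

s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')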

print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

A Session can also be used as a context manager, which makes sure the session is closed when the with block exits:

with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
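Besides cookies, a Session can hold default headers and query parameters that are merged into each request. The sketch below uses Session.prepare_request to show the merge without sending anything; the header names, the token parameter, and the page parameter are placeholders.

```python
import requests

with requests.Session() as s:
    s.headers.update({'x-test': 'true'})   # sent with every request
    s.params = {'token': 'abc'}            # hypothetical default query parameter

    req = requests.Request(
        'GET', 'https://httpbin.org/headers',
        headers={'x-test2': 'true'}, params={'page': '1'},
    )
    prepared = s.prepare_request(req)      # merges session and request settings

    print(prepared.headers['x-test'], prepared.headers['x-test2'])   # true true
    print(prepared.url)   # https://httpbin.org/headers?token=abc&page=1
```

Values passed on the individual request are merged with the session-level ones and take precedence when the same key appears in both.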

Request and Response Objects

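When you call requests.get() or one of the other methods, Requests does two things:

First, it constructs a Request object, which is sent to the server to request a resource.
Second, once Requests receives the server's response, a Response object is created. The Response contains all the data returned by the server, as well as the Request object that was originally created.

>>> r = requests.get('https://en.wikipedia.org/wiki/Monty_Python')

To access the headers the server sent back: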

>>> r.headers
{'content-length': '56170', 'x-content-type-options': 'nosniff', 'x-cache':
'HIT from cp1006.eqiad.wmnet, MISS from cp1010.eqiad.wmnet', 'content-encoding':
'gzip', 'age': '3080', 'content-language': 'en', 'vary': 'Accept-Encoding,Cookie',
'server': 'Apache', 'last-modified': 'Wed, 13 Jun 2012 01:33:50 GMT',
'connection': 'close', 'cache-control': 'private, s-maxage=0, max-age=0,
must-revalidate', 'date': 'Thu, 14 Jun 2012 12:59:39 GMT', 'content-type':
'text/html; charset=UTF-8', 'x-cache-lookup': 'HIT from cp1006.eqiad.wmnet:3128,
MISS from cp1010.eqiad.wmnet:80'}

Likewise, we can access the headers we originally sent to the server:

>>> r.request.headers
{'Accept-Encoding': 'identity, deflate, compress, gzip',
'Accept': '*/*', 'User-Agent': 'python-requests/1.2.0'}
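The two steps can also be carried out by hand: build a Request, turn it into a PreparedRequest, and send it through a Session. The sketch below performs only the local steps and leaves the network call as a comment.

```python
import requests

req = requests.Request('GET', 'https://en.wikipedia.org/wiki/Monty_Python')
prepared = req.prepare()

print(type(prepared).__name__)   # PreparedRequest
print(prepared.method)           # GET
print(prepared.path_url)         # /wiki/Monty_Python

# Sending is a separate step (needs network access):
# with requests.Session() as s:
#     r = s.send(prepared, timeout=10)
#     print(r.request.headers)   # the headers of the request that produced r
```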

SSL Cert Verification

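Requests verifies SSL certificates for HTTPS requests, just like a web browser. SSL verification is enabled by default.

You can pass verify the path to a CA_BUNDLE file or to a directory containing certificates of trusted CAs:

>>> requests.get('https://github.com', verify='/path/to/certfile')

Or make the setting persistent on a session:

s = requests.Session()
s.verify = '/path/to/certfile'

Requests skips SSL certificate verification when verify is set to False:

>>> requests.get('https://kennethreitz.org', verify=False)
<Response [200]>

Note: when verify is set to False, Requests accepts any TLS certificate presented by the server and ignores hostname mismatches and expired certificates. This leaves the application vulnerable to man-in-the-middle (MitM) attacks. Setting verify to False can be useful during development and debugging, but it is not recommended in production.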

Proxies

If you need to use a proxy, you can configure individual requests with the proxies argument:

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)

Alternatively, configure it once so that it applies to an entire Session:

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)

session.get('http://example.org')

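When proxies is not overridden as in the examples above, Requests falls back to the proxy configuration defined by the standard environment variables http_proxy, https_proxy, and no_proxy (uppercase variants are also supported), together with CURL_CA_BUNDLE for the certificate bundle.

You can set these variables to configure Requests when needed (set only the ones relevant to your setup):

$ export HTTP_PROXY="http://10.10.1.10:3128"
$ export HTTPS_PROXY="http://10.10.1.10:1080"

$ python
>>> import requests
>>> requests.get('http://example.org')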

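To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax in any of the configuration entries above:

$ export HTTPS_PROXY="http://user:pass@10.10.1.10:1080"

$ python
>>> proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}

To give a proxy for a specific scheme and host, use the scheme://hostname form for the key. Any request to that scheme and exact hostname will then use the configured proxy:

proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}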
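The matching rule for proxy keys can be checked offline with select_proxy from requests.utils, the helper Requests itself uses to pick a proxy for a URL. It is not a prominently documented API, so treat this as an illustration rather than a supported interface; the /status path in the first URL is a placeholder.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'http://10.20.1.128': 'http://10.10.1.10:5323',
}

# A scheme://hostname key wins over the plain scheme key.
print(select_proxy('http://10.20.1.128/status', proxies))   # http://10.10.1.10:5323
print(select_proxy('http://example.org/', proxies))         # http://10.10.1.10:3128

# No 'https' entry is configured, so no proxy is selected.
print(select_proxy('https://example.org/', proxies))        # None
```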

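Finally, note that using a proxy for HTTPS connections typically requires your local machine to trust the proxy's root certificate. The certificate bundle that Requests trusts by default can be located like this: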

from requests.utils import DEFAULT_CA_BUNDLE_PATH
print(DEFAULT_CA_BUNDLE_PATH)

To override the default certificate bundle, set the CURL_CA_BUNDLE (or REQUESTS_CA_BUNDLE) environment variable to another file path:

$ export CURL_CA_BUNDLE="/usr/local/myproxy_info/cacert.pem"
$ export https_proxy="http://10.10.1.10:1080"

$ python
>>> import requests
>>> requests.get('https://example.org')

Posted 2021-08-08 23:29 by landof
