Python Requests 最详细教程!爬虫必会之!
requests 是Python中一个非常出名的库,它极大的简化了 Python中进行HTTP请求的流程,我们来看一个简单的例子:
In [1]: import requests
In [2]: requests.get("https://jiajunhuang.com")
Out[2]: <Response [200]>
只需要两行便可以发起一个HTTP请求,多么的简单。
针对HTTP协议的 GET
, POST
, PUT
, DELETE
等方法, requests
分别有:
requests.get
requests.options
requests.head
requests.post
requests.put
requests.patch
requests.delete
等对应的方法,他们都是 requests.request
的便捷版,也就是说,调用 requests.get
其实相当于调用 requests.request("GET", xxx)
。
接下来我们看看 requests.request
都能接受哪些参数,这其实也就是上述对应的 get
等方法能传入的参数:
def request(method, url, **kwargs):
"""Constructs and sends a :class:`Request <Request>`.
:param method: method for the new :class:`Request` object.
:param url: URL for the new :class:`Request` object.
:param params: (optional) Dictionary, list of tuples or bytes to send
in the body of the :class:`Request`.
:param data: (optional) Dictionary, list of tuples, bytes, or file-like
object to send in the body of the :class:`Request`.
:param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
:param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
:param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
:param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
to add for the file.
:param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
:param timeout: (optional) How many seconds to wait for the server to send data
before giving up, as a float, or a :ref:`(connect timeout, read
timeout) <timeouts>` tuple.
:type timeout: float or tuple
:param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
:type allow_redirects: bool
:param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
:param verify: (optional) Either a boolean, in which case it controls whether we verify
the server's TLS certificate, or a string, in which case it must be a path
to a CA bundle to use. Defaults to ``True``.
:param stream: (optional) if ``False``, the response content will be immediately downloaded.
:param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
:return: :class:`Response <Response>` object
:rtype: requests.Response
Usage::
>>> import requests
>>> req = requests.request('GET', 'https://httpbin.org/get')
<Response [200]>
"""
# By using the 'with' statement we are sure the session is closed, thus we
# avoid leaving sockets open which can trigger a ResourceWarning in some
# cases, and look like a memory leak in others.
with sessions.Session() as session:
return session.request(method=method, url=url, **kwargs)
我们来看看这些便捷函数的参数:
- 第一个参数都是
url
,这就是要请求的url
,例如https://jiajunhuang.com
,是必填的。 - 参数
params
是query string,它可以是一个字典,或者一个list,list的内容是一堆的tuple,也可以是bytes。 - 参数
data
是HTTP
请求中的body
,它可以是一个字典,或者一个list,list的内容是一堆的tuple,也可以是bytes或者文件 - 参数
json
是为了方便请求而提供的参数,它其实相当于data
,但是会自动把请求的Content-Type
设置为application/json
- 参数
headers
是HTTP
请求中的头部,它是一个字典 - 参数
cookies
是所携带的Cookie
,它可以是一个字典或者CookieJar
的实例 - 参数
files
是要上传的文件,它可以是一个字典,字典的内容是tuple - 参数
auth
用于开启HTTP
请求的认证 - 参数
timeout
是超时时间 - 参数
allow_redirects
是否允许重定向,它是一个布尔值 - 参数
proxies
是是否使用代理 - 参数
verify
是否检查服务端的证书,传布尔值或者CA
的路径 - 参数
stream
是布尔值,代表是否以流的方式读取结果 - 参数
cert
传入客户端的SSL证书
而 requests
中, HTTP
请求的返回结果是 Response
的实例,包括方法:
json
用于把返回结果自动转换成JSON
ok
用于检查返回的状态码是否是 400 以下,注意,ok是一个propertytext
用于以文本方式输出返回结果,注意,text是一个 property
因此,我们可以看看常见的两种用法,一种是请求接口并且转换成 JSON
:
import requests
resp = requests.get("https://api.jiajunhuang.com/v1/ip")
print(resp.json())
另外一种是打印状态码并且打印出响应结果:
import requests
resp = requests.get("https://api.jiajunhuang.com/v1/ip")
print(resp.status_code)
print(resp.text)