8251gr

Python Requests 最详细教程!爬虫必会之!

requests 是Python中一个非常出名的库,它极大的简化了 Python中进行HTTP请求的流程,我们来看一个简单的例子:

In [1]: import requests

In [2]: requests.get("https://jiajunhuang.com")
Out[2]: <Response [200]>

只需要两行便可以发起一个HTTP请求,多么的简单。

针对HTTP协议的 GET , POST , PUT , DELETE 等方法, requests 分别有:

requests.get
requests.options
requests.head
requests.post
requests.put
requests.patch
requests.delete

等对应的方法,他们都是 requests.request 的便捷版,也就是说,调用 requests.get 其实相当于调用 requests.request("GET", xxx) 。

接下来我们看看 requests.request 都能接受哪些参数,这其实也就是上述对应的 get 等方法能传入的参数:

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the body of the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

我们来看看这些便捷函数的参数:

  • 第一个参数都是 url ,这就是要请求的 url ,例如 https://jiajunhuang.com ,是必填的。
  • 参数 params 是query string,它可以是一个字典,或者一个list,list的内容是一堆的tuple,也可以是bytes。
  • 参数 data 是 HTTP 请求中的 body ,它可以是一个字典,或者一个list,list的内容是一堆的tuple,也可以是bytes或者文件
  • 参数 json 是为了方便请求而提供的参数,它其实相当于 data ,但是会自动把请求的 Content-Type 设置为 application/json
  • 参数 headers 是 HTTP 请求中的头部,它是一个字典
  • 参数 cookies 是所携带的 Cookie ,它可以是一个字典或者 CookieJar 的实例
  • 参数 files 是要上传的文件,它可以是一个字典,字典的内容是tuple
  • 参数 auth 用于开启 HTTP 请求的认证
  • 参数 timeout 是超时时间
  • 参数 allow_redirects 是否允许重定向,它是一个布尔值
  • 参数 proxies 是是否使用代理
  • 参数 verify 是否检查服务端的证书,传布尔值或者 CA 的路径
  • 参数 stream 是布尔值,代表是否以流的方式读取结果
  • 参数 cert 传入客户端的SSL证书

而 requests 中, HTTP 请求的返回结果是 Response 的实例,包括方法:

  • json 用于把返回结果自动转换成 JSON
  • ok 用于检查返回的状态码是否是 400 以下,注意,ok是一个property
  • text 用于以文本方式输出返回结果,注意,text是一个 property

因此,我们可以看看常见的两种用法,一种是请求接口并且转换成 JSON :

import requests

resp = requests.get("https://api.jiajunhuang.com/v1/ip")
print(resp.json())

另外一种是打印状态码并且打印出响应结果:

import requests

resp = requests.get("https://api.jiajunhuang.com/v1/ip")
print(resp.status_code)
print(resp.text)

posted on 2023-03-27 20:15  牛二娃  阅读(386)  评论(0编辑  收藏  举报

导航