1) The requests module

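I: Introduction to requests

  requests is an HTTP library written in Python and released under the Apache 2.0 license. It is a high-level wrapper over lower-level HTTP machinery (urllib3 and the Python standard library),

  which makes sending network requests far more pleasant for Python developers. With requests, most of what a browser does over HTTP can be reproduced in a few lines.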

II: Installing requests

  pip install requests

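III: Common requests methods

    response = requests.get(url)    # send a GET request
    response = requests.post(url)   # send a POST request
    response.text                   # response body decoded to str
    response.content                # response body as raw bytes
    response.encoding               # encoding used to decode .text; can be assigned
    response.apparent_encoding      # encoding guessed from the body
    response.status_code            # HTTP status code of the response, e.g. 200, 404

    response.cookies.get_dict()     # cookies set by the response, as a dict

    requests.get(url, cookies={})   # send cookies with the request

    requests.get
    requests.post
    requests.delete
    requests.request(
        'get',  # or 'post', 'delete', ...
        url,
    )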
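
The response attributes listed above can be gathered in one place. The following is a minimal sketch: `summarize_response` is a hypothetical helper name, not part of requests, and it accepts any `requests.Response` object so it can be tried without network access.

```python
import requests


def summarize_response(response):
    """Collect the commonly used attributes of a requests.Response in a dict."""
    return {
        "status": response.status_code,          # e.g. 200 or 404
        "encoding": response.encoding,           # encoding used to decode .text
        "text": response.text,                   # body decoded to str
        "size": len(response.content),           # .content is the raw bytes
        "cookies": response.cookies.get_dict(),  # cookies set by the server
    }
```

For instance, `summarize_response(requests.get("http://httpbin.org/get"))` would report the status code, decoded body and cookies of that page; that call needs network access.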

IV: Common requests parameters

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """

Parameter list (the docstring above)

Parameters in detail

    # parameters of
    requests.request

    - method: the HTTP method
    - url: the URL to request
    - params: parameters passed in the URL query string, e.g. GET http://www.oldboyedu.com
        params={"k1":"v1","k2":"v2"}
        requests.request(
            method="GET",
            url="http://www.oldboyedu.com",
            params={"k1":"v1","k2":"v2"}
            )
        # resulting URL
        http://www.oldboyedu.com/?k1=v1&k2=v2
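
To see how params ends up in the URL without sending anything, the request can be prepared only. This is a small sketch using `requests.Request(...).prepare()`; the query string is parsed back with the standard library to confirm its contents.

```python
from urllib.parse import parse_qs, urlsplit

import requests

# Build the request without sending it, so the final URL can be inspected.
prepared = requests.Request(
    method="GET",
    url="http://www.oldboyedu.com",
    params={"k1": "v1", "k2": "v2"},
).prepare()

print(prepared.url)  # the params are appended as the query string

# Parse the query string back into a dict of lists.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```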

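    - data: data sent in the request body (dict, bytes, or str); this is how an HTML form is submitted

        requests.request(
            method="POST",
            url="http://www.oldboyedu.com",
            params={"k1":"v1","k2":"v2"},
            data={"user1":"alex","pwd":"123"}
            )

        # a dict passed this way adds the request header
            content-type: application/x-www-form-urlencoded
            # why this matters
            In Django,
                request.POST is parsed from request.body only when the content type is a form type such as application/x-www-form-urlencoded. If you change that header,
                request.body still holds the data but request.POST is empty.
        # the data is encoded as
            user1=alex&pwd=123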
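
The header and the encoded body described above can be checked on a prepared request. A minimal sketch, with nothing sent over the network:

```python
import requests

prepared = requests.Request(
    method="POST",
    url="http://www.oldboyedu.com",
    data={"user1": "alex", "pwd": "123"},
).prepare()

# A dict passed as data is form-encoded and the matching header is set.
form_content_type = prepared.headers["Content-Type"]
form_body = prepared.body
print(form_content_type)
print(form_body)
```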


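    - json: data sent in the request body as JSON

        requests.request(
            method="POST",
            url="http://www.oldboyedu.com",
            params={"k1":"v1","k2":"v2"},
            json={"user1":"alex","pwd":"123"}  # serialized and sent as a JSON string
            )

        # request header
            content-type: application/json
        # the dict is serialized to the string
            {"user1": "alex", "pwd": "123"}


        # how data and json differ:
            data={"user1":"alex","pwd":"123","x":[1,2,3]} is flattened into key=value pairs (the list becomes x=1&x=2&x=3), so nested structure is lost
            json keeps nested structure, for example a dict nested inside a dict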
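
The difference between the two can be made concrete by preparing the same payload both ways. This is a sketch under the assumption of a made-up payload (the `profile` key and its value are illustrative only):

```python
import json
from urllib.parse import parse_qs

import requests

# Hypothetical payload with a list and a nested dict.
payload = {"user1": "alex", "x": [1, 2, 3], "profile": {"city": "Beijing"}}
url = "http://www.oldboyedu.com"

as_form = requests.Request("POST", url, data=payload).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

# Form encoding flattens values: the list becomes repeated keys and the
# nested dict is reduced to its key names.
form_fields = parse_qs(as_form.body)
print(as_form.body)

# JSON encoding keeps the structure, so decoding gives the payload back.
decoded = json.loads(as_json.body)
print(as_json.headers["Content-Type"], decoded)
```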

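    - headers: request headers

        requests.request(
            method="POST",
            url="http://www.oldboyedu.com",
            params={"k1":"v1","k2":"v2"},
            json={"user1":"alex","pwd":"123"},
            headers={
                "Referer":"http://dig.chouti.com",  # the page the request came from
                "User-Agent":"...",  # identifies the client that sent the request
                }
            )
    - cookies: cookies are sent to the server inside the request headers (the Cookie header)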
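
That cookies travel inside the request headers can be verified on a prepared request. A small sketch; the `demo-client` User-Agent value is a placeholder chosen for illustration:

```python
import requests

prepared = requests.Request(
    method="GET",
    url="http://www.oldboyedu.com",
    headers={"Referer": "http://dig.chouti.com", "User-Agent": "demo-client"},
    cookies={"c1": "v1"},
).prepare()

# Cookies are not a separate part of the request: they become a Cookie header.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```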

    - files: file upload

        requests.request(
            method="POST",
            url="http://www.oldboyedu.com",
            files={
                "f1": open("a.txt", "rb"),
                # or, with an explicit file name:
                "f2": ("filename", open("a.txt", "rb"))
            }
            )
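
What a file upload looks like on the wire can be inspected without a real file or server. In this sketch an in-memory `io.BytesIO` stands in for `open("a.txt", "rb")`, which is an assumption made so the example is self-contained:

```python
import io

import requests

# An in-memory file stands in for open("a.txt", "rb").
fake_file = io.BytesIO(b"hello")

prepared = requests.Request(
    method="POST",
    url="http://www.oldboyedu.com",
    files={"f1": ("a.txt", fake_file)},
).prepare()

# files switches the body to multipart/form-data with a generated boundary.
upload_content_type = prepared.headers["Content-Type"]
print(upload_content_type)
print(prepared.body.decode("utf-8"))
```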
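    - auth: authentication. Mostly seen with routers, FTP and the like, where a dialog pops up asking for a user name and password. With this form of login the input fields are not part of the page source. http://httpbin.org can be used to try it out.
    - timeout: how long to wait before giving up
    - timeout may also be a (connect, read) tuple, e.g. timeout=(5, 1)
    - allow_redirects: whether to follow redirects
    - proxies: proxy servers to send the request through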

        requests.post(
            url="http://www.oldboyedu.com",
            proxies={
                "http":"http://4.19.128.5:8099"
            }
            )
        The request is not sent to oldboyedu directly: it goes to the proxy first, and the proxy forwards it to oldboyedu.
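    - stream: by default get downloads the whole body into memory first. With stream=True the body is downloaded piece by piece.
    - verify: whether to verify the server's SSL certificate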
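    - cert: supply a client certificate
        An HTTPS server first sends the client a certificate; traffic is then encrypted on one side and decrypted on the other.

        requests.get(
            url="https://",
            cert="client.pem"  # a self-made certificate; third-party certificates also exist
        )
        requests.get(
            url="https://",
            cert=("client.crt", "xxx.key")
        )


        requests.get(
            url="https://",
            verify=False,  # skip certificate verification
        )

        - session: keeps the client's state across requests (cookies and so on)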


    - proxies
        # "http": "61.172.249.96:80"
        # "http": "root:123@61.172.249.96:80"  (proxy with user name and password)
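
The auth and session items from the parameter list above can be sketched together: a `requests.Session` carries credentials, headers and cookies into each request it prepares. The credentials, cookie name and User-Agent below are made-up values, and nothing is sent over the network.

```python
import base64

import requests

session = requests.Session()
session.auth = ("user", "passwd")               # hypothetical credentials
session.headers.update({"User-Agent": "demo-client"})
session.cookies.set("sessionid", "abc123")      # hypothetical cookie

# prepare_request merges the session's state into an individual request.
prepared = session.prepare_request(
    requests.Request("GET", "http://www.oldboyedu.com")
)

print(prepared.headers["Authorization"])  # HTTP Basic credentials, base64-encoded
print(prepared.headers["Cookie"])
```

Because the state lives on the session, every later request made through the same session object sends the same cookie and credentials.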

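V: Request headers and request body

    ## A response likewise has headers and a body

    Both get and post send an HTTP request, and every HTTP request consists of request headers and a request body.
    How are the headers separated from the body?
    request headers\r\n\r\nrequest body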
        


    ######
    Request headers
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
    Accept-Encoding:gzip, deflate, br
    Accept-Language:zh-CN,zh;q=0.9

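    How is each KEY-VALUE pair in the headers separated? By \r\n.

    \r\n\r\n
    Request body

    All of the above is joined into one string, with \r\n between header lines and \r\n\r\n before the body, and sent together.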

    #### A get request sends only the request line and headers (no body)
    The first line is the request line
    GET / HTTP/1.1    # method, path ("/" is the URL path being requested), protocol
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
    Accept-Encoding:gzip, deflate, br
    Accept-Language:zh-CN,zh;q=0.9

    # e.g. http://www.baidu.com?nid=1&v=1

    GET /?nid=1&v=1 HTTP/1.1
    Host: www.baidu.com
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
    Accept-Encoding:gzip, deflate, br
    Accept-Language:zh-CN,zh;q=0.9
    \r\n\r\n
    ### For a POST request

    e.g. http://www.baidu.com with nid=1&v=1 sent as form data

    POST / HTTP/1.1
    Host: www.baidu.com
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
    Accept-Encoding:gzip, deflate, br
    Accept-Language:zh-CN,zh;q=0.9

    \r\n\r\n
    nid=1&v=1
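
The framing described in this section can be sketched with plain string handling. `build_request` and `parse_request` are hypothetical helper names; this is a simplified illustration that assumes well-formed "Name: value" header lines, not a complete HTTP parser.

```python
def build_request(method, path, headers, body=""):
    """Join request line, headers and body the way HTTP/1.1 frames them."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Header lines are separated by \r\n; a blank line (\r\n\r\n) precedes the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_request(raw):
    """Split a raw request back into request line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return request_line, headers, body


raw = build_request(
    "POST", "/",
    {"Host": "www.baidu.com", "Accept-Language": "zh-CN,zh;q=0.9"},
    body="nid=1&v=1",
)
request_line, headers, body = parse_request(raw)
print(repr(raw))
```

A GET built the same way, e.g. `build_request("GET", "/?nid=1&v=1", {"Host": "www.baidu.com"})`, ends right after the blank line because it has no body.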
    

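    ## Response

    A normal response looks like this
        Response headers
            Cache-Control:no-cache
            Content-Encoding:gzip
        Response body
        <html>
        </html>
    A redirect response usually carries no meaningful body
        Response headers
            Cache-Control:no-cache
            Content-Encoding:gzip
            location:http://www.baidu.com  # the extra header with the redirect target
        The status code is 301/302,
        and the target can be read from the location response header.
        A client can follow the redirect whenever a 3xx response carries a location header.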
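
Reading the redirect target from a response can be sketched as follows. `redirect_target` is a hypothetical helper name; it relies on the `is_redirect` property of `requests.Response`, which is true for a redirect status code combined with a location header.

```python
import requests


def redirect_target(response):
    """Return the Location of a redirect response, or None otherwise."""
    # is_redirect is True for a redirect status code with a Location header.
    if response.is_redirect:
        return response.headers["location"]
    return None
```

With `requests.get(url, allow_redirects=False)` the 301/302 response itself is returned instead of being followed, so it can be passed to this helper to see where the server wants the client to go.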

VI: Summary

# summary
    # parameters of get
    requests.get(
        url="http://www.baidu.com",
        params={"k1":"v1","k2":"v2"},  # query parameters: http://www.baidu.com/?k1=v1&k2=v2
        cookies={"c1":"v1","c2":"v2"},  # cookies are sent in the request headers
        headers={
            "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Mobile Safari/537.36",  # imitates a browser; some sites check this
            "Referer":"htt",  # the page the browser came from; some sites check it and treat requests without it as crawlers
        }
    )

    Further topics
    1. HTTP requests
    2. cookies
       - sent in the request headers
       - returned in the response headers

    3. Redirects

        - response headers

 

posted on 2018-06-03 17:23  shisanjun
