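1) The requests module
I: Introduction to requests
requests is an HTTP library written in Python and released under the Apache 2 license. It builds on lower-level HTTP code (urllib3) and wraps it in a much simpler interface, which makes network requests far more pleasant for Python programmers.
With requests you can easily do the HTTP-level work a browser does, such as submitting forms, carrying cookies, following redirects and uploading files. It does not, however, execute JavaScript or render pages.
II: Installing requests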
pip install requests
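III: Common requests methods and attributes
    import requests

    response = requests.get(url)        # send a GET request
    response = requests.post(url)       # send a POST request
    response.text                       # response body decoded to str
    response.content                    # response body as raw bytes
    response.encoding                   # encoding used to decode .text; you can also assign it
    response.apparent_encoding          # encoding guessed from the body bytes
    response.status_code                # HTTP status code, e.g. 200 or 404
    response.cookies.get_dict()         # cookies set by the server, as a dict
    requests.get(url, cookies={})       # send cookies with a request
    requests.get, requests.post, requests.delete   # one shortcut per HTTP method
    requests.request("get", url)        # generic form; the method can be "post", "get", "delete", ...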
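You can see these attributes without relying on an external site. The sketch below is only an illustration, and several of its details are assumptions made for the demo:

- It starts a throwaway local server with Python's http.server.
- The handler class, the GBK-encoded body and the sessionid cookie are all made up.
- The Content-Type header deliberately names no charset.

It then reads status_code, content, the encoding requests derives from the headers, apparent_encoding and cookies.get_dict(). Finally it fixes .text by assigning encoding.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical page body, encoded as GBK on purpose.
BODY = "你好世界".encode("gbk")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # "text/html" without a charset: requests falls back to ISO-8859-1 for .text
        self.send_header("Content-Type", "text/html")
        # made-up cookie so that response.cookies has something to show
        self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

response = requests.get(url)
status = response.status_code            # numeric status code
raw = response.content                   # the bytes exactly as the server sent them
default_encoding = response.encoding     # derived from the Content-Type header
guessed = response.apparent_encoding     # guessed from the body bytes; not guaranteed
response.encoding = "gbk"                # tell requests how to decode .text
text = response.text                     # now decoded correctly
cookies = response.cookies.get_dict()    # cookies from the Set-Cookie response header

server.shutdown()
server.server_close()
print(status, default_encoding, guessed, text, cookies)
```

Before encoding is assigned, .text would be garbled because ISO-8859-1 is the wrong codec here. apparent_encoding is the usual value to assign. For a body this short, though, the guess may not come out as GBK, which is why the sketch sets the encoding explicitly.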
IV: Common requests parameters
Parameter list (the signature and docstring of requests.request in the requests source):
    def request(method, url, **kwargs):
        """Constructs and sends a :class:`Request <Request>`.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
        :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
        :param json: (optional) json data to send in the body of the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
        :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
            ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
            or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
            defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
            to add for the file.
        :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send data
            before giving up, as a float, or a :ref:`(connect timeout, read
            timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
        :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
        :param stream: (optional) if ``False``, the response content will be immediately downloaded.
        :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response

        Usage::

          >>> import requests
          >>> req = requests.request('GET', 'http://httpbin.org/get')
          >>> req
          <Response [200]>
        """
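You can try the timeout and stream entries of this docstring locally. The sketch below makes several assumptions purely for illustration:

- A throwaway http.server on 127.0.0.1 serves a made-up 10 KB payload.
- The request uses the generic requests.request form with stream=True.
- It also passes timeout=(5, 1): 5 seconds to connect, and 1 second to wait for the server to send data.

With stream=True the body is not downloaded up front. iter_content then yields it in chunks of at most 4096 bytes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b"x" * 10_000  # hypothetical 10 KB "file"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/file"

chunk_sizes = []
# Generic form requests.request(method, url, **kwargs); the with-block closes the connection.
with requests.request("GET", url, stream=True, timeout=(5, 1)) as response:
    for chunk in response.iter_content(chunk_size=4096):  # read the body piece by piece
        chunk_sizes.append(len(chunk))

server.shutdown()
server.server_close()
print(chunk_sizes)
```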
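Parameters in detail
Parameters of requests.request:
- method: the HTTP method
- url: the target URL
- params: parameters passed in the URL (the query string)
    requests.request(
        method="GET",
        url="http://www.oldboyedu.com",
        params={"k1": "v1", "k2": "v2"},
    )
    # -> http://www.oldboyedu.com?k1=v1&k2=v2
- data: data sent in the request body (dict, bytes, or str); HTML forms are submitted this way
    requests.request(
        method="POST",
        url="http://www.oldboyedu.com",
        params={"k1": "v1", "k2": "v2"},
        data={"user1": "alex", "pwd": "123"},
    )
    # the body is encoded as user1=alex&pwd=123
    # and requests adds the header Content-Type: application/x-www-form-urlencoded
    # Why this matters: in Django, request.POST is parsed from request.body only when the
    # Content-Type is application/x-www-form-urlencoded (or multipart/form-data).
    # With any other Content-Type, request.body has data but request.POST is empty.
- json: data sent in the request body as JSON text
    requests.request(
        method="POST",
        url="http://www.oldboyedu.com",
        params={"k1": "v1", "k2": "v2"},
        json={"user1": "alex", "pwd": "123"},  # serialized to a JSON string
    )
    # the body is the string {"user1": "alex", "pwd": "123"}
    # and the header is Content-Type: application/json
    # data vs json: form encoding is flat, so data={"user1": "alex", "pwd": "123", "x": [1, 2, 3]}
    # loses its structure (the list becomes x=1&x=2&x=3, and a nested dict keeps only its keys).
    # Use json when the payload contains nested dicts or lists.
- headers: request headers
    requests.request(
        method="POST",
        url="http://www.oldboyedu.com",
        params={"k1": "v1", "k2": "v2"},
        json={"user1": "alex", "pwd": "123"},
        headers={
            "Referer": "http://dig.chouti.com",  # the page visited before this one
            "User-Agent": "...",                 # which client sent the request
        },
    )
- cookies: sent to the server inside the request headers (the Cookie header)
- files: upload files (multipart/form-data)
    requests.request(
        method="POST",
        url="http://www.oldboyedu.com",
        files={
            "f1": open("a.txt", "rb"),
            # or, to set the file name explicitly:
            # "f2": ("filename", open("a.txt", "rb")),
        },
    )
- auth: HTTP authentication. It is often seen on routers and similar devices: the browser pops up a dialog asking for a username and password. The dialog is not part of the page, so neither the input nor the result appears in the page's HTML; the credentials travel in the Authorization request header. http://httpbin.org is handy for trying it out.
- timeout: timeout in seconds; a tuple sets the connect and read timeouts separately, e.g. timeout=(5, 1)
- allow_redirects: whether to follow redirects
- proxies: send the request through a proxy
    requests.post(
        url="http://www.oldboyedu.com",
        proxies={
            "http": "http://4.19.128.5:8099",
        },
    )
    # The request does not go to oldboyedu directly: it goes to the proxy, which forwards it to oldboyedu.
    # Proxy address formats:
    #   "http": "61.172.249.96:80"
    #   "http": "root:123@61.172.249.96:80"   (proxy with username and password)
- stream: by default the whole response body is downloaded into memory at once; with stream=True it is downloaded bit by bit as you read it
- verify: whether to verify the server's SSL certificate; verify=False skips the check
    requests.get(
        url="https://",
        verify=False,  # ignore certificate errors
    )
- cert: a client certificate. In HTTPS the server presents its own certificate to the client, and that is what verify checks. cert goes the other way: it lets the client prove its identity to servers that require it. The certificate can be self-made or issued by a third party.
    requests.get(
        url="https://",
        cert="client.pem",  # certificate and key in one file
    )
    requests.get(
        url="https://",
        cert=("client.crt", "client.key"),  # certificate and key as separate files
    )
- session: requests.Session() keeps state across requests, most importantly cookies, so later requests carry the cookies that earlier responses set (it also reuses connections)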
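You can check what each parameter above puts into a request without contacting any server. The sketch below only builds requests with requests.Request(...).prepare() and inspects the resulting URL, headers and body. It uses the oldboyedu URL from above, but nothing is ever sent to it.

Several details are illustrative assumptions:

- the header values;
- the ("alex", "123") credentials;
- the io.BytesIO object standing in for open("a.txt", "rb");
- the prepare helper.

A real call through requests.get or requests.post goes through a Session, which adds default headers such as User-Agent on top of what is shown here.

```python
import io
import json

import requests

# The URL from the examples above; requests are only prepared, never sent.
URL = "http://www.oldboyedu.com"


def prepare(method, **kwargs):
    """Hypothetical helper: build a request's URL, headers and body without sending it."""
    return requests.Request(method, URL, **kwargs).prepare()


# params -> query string appended to the URL
by_params = prepare("GET", params={"k1": "v1", "k2": "v2"})

# data -> form-encoded body plus Content-Type: application/x-www-form-urlencoded
by_data = prepare("POST", params={"k1": "v1", "k2": "v2"}, data={"user1": "alex", "pwd": "123"})

# json -> JSON text body plus Content-Type: application/json (nesting is preserved)
by_json = prepare("POST", json={"user1": "alex", "pwd": "123", "x": [1, 2, 3]})
parsed = json.loads(by_json.body)  # decode the body again to show nothing was lost

# data is flat: a list becomes repeated keys, a nested dict keeps only its keys
flat_list = prepare("POST", data={"x": [1, 2, 3]})
flat_dict = prepare("POST", data={"a": {"b": 1}})

# headers and cookies both end up in the request's header block
by_headers = prepare(
    "GET",
    headers={"Referer": "http://dig.chouti.com", "User-Agent": "my-client/1.0"},
    cookies={"c1": "v1", "c2": "v2"},
)

# auth=(user, password) -> HTTP Basic "Authorization" header
by_auth = prepare("GET", auth=("alex", "123"))

# files -> multipart/form-data body; BytesIO stands in for open("a.txt", "rb")
by_files = prepare("POST", files={"f2": ("a.txt", io.BytesIO(b"hello"))})

print(by_params.url)
print(by_data.url, by_data.headers["Content-Type"], by_data.body)
print(by_json.headers["Content-Type"], by_json.body)
print(flat_list.body, flat_dict.body)
print(by_headers.headers["Referer"], by_headers.headers["Cookie"])
print(by_auth.headers["Authorization"])
print(by_files.headers["Content-Type"])
```

Compare flat_list and flat_dict with parsed. They show why json= is the safer choice for nested payloads.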
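V: Request headers and request body
Responses, like requests, consist of a head and a body.

Whether you use GET or POST, an HTTP request is sent. Every HTTP request consists of a head (the request line plus the headers) and a body. The head and the body are separated by an empty line, i.e. by \r\n\r\n:

    head\r\n\r\nbody

Request headers look like this:

    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
    Accept-Encoding:gzip, deflate, br
    Accept-Language:zh-CN,zh;q=0.9

Each key-value pair in the head sits on its own line, and lines are separated by \r\n. The head ends with \r\n\r\n and the body follows; everything is joined into one stream of bytes and sent together.

GET request: normally only the head is sent, with no body. The first line is the request line, "method path protocol", where the path is the part of the URL being accessed. For http://www.baidu.com?nid=1&v=1:

    GET /?nid=1&v=1 HTTP/1.1\r\n
    Host: www.baidu.com\r\n
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\n
    Accept-Encoding:gzip, deflate, br\r\n
    Accept-Language:zh-CN,zh;q=0.9\r\n
    \r\n

POST request: the data travels in the body, after the empty line. For example, sending nid=1&v=1 to http://www.baidu.com:

    POST / HTTP/1.1\r\n
    Host: www.baidu.com\r\n
    Content-Type: application/x-www-form-urlencoded\r\n
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\n
    Accept-Encoding:gzip, deflate, br\r\n
    Accept-Language:zh-CN,zh;q=0.9\r\n
    \r\n
    nid=1&v=1

Responses: a normal response also has a head and a body.

    Response head:
        Cache-Control:no-cache
        Content-Encoding:gzip
    Response body:
        <html> </html>

A redirect response usually has no body. Instead, its head carries an extra Location header with the address to jump to:

    Cache-Control:no-cache
    Content-Encoding:gzip
    Location:http://www.baidu.com

You can recognize a redirect by its status code (301/302) or by the Location response header. Whenever a response carries Location, the client can follow it to the new address.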
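You can observe the message format described above directly. The sketch below is a demo built on assumptions and is not part of requests:

- It opens a bare TCP socket on 127.0.0.1; the helper names serve_once and send_and_capture are made up.
- It lets requests send one GET and one POST to that socket.
- It records the bytes exactly as received and splits them at the first \r\n\r\n into head and body.

```python
import socket
import threading

import requests


def serve_once(listener, captured):
    """Accept one connection, record the raw request bytes, send a minimal reply."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # the empty line ends the head
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        head, _, body = data.partition(b"\r\n\r\n")
        length = 0
        for line in head.split(b"\r\n")[1:]:  # header lines are separated by \r\n
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        while len(body) < length:  # the body may arrive in a later packet
            chunk = conn.recv(4096)
            if not chunk:
                break
            body += chunk
        captured.append((head, body))
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")


def send_and_capture(method, path, **kwargs):
    """Send one request with requests to a local socket; return (head, body) as received."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    captured = []
    worker = threading.Thread(target=serve_once, args=(listener, captured), daemon=True)
    worker.start()
    port = listener.getsockname()[1]
    requests.request(method, f"http://127.0.0.1:{port}{path}", timeout=5, **kwargs)
    worker.join(timeout=5)
    listener.close()
    return captured[0]


get_head, get_body = send_and_capture("GET", "/?nid=1&v=1")
post_head, post_body = send_and_capture("POST", "/", data={"nid": "1", "v": "1"})

print(get_head.decode())
print(post_head.decode())
print(post_body.decode())
```

Printing get_head shows the request line GET /?nid=1&v=1 HTTP/1.1 followed by header lines such as Host. post_body holds nid=1&v=1, the form-encoded data that followed the empty line.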
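VI: Summary
    import requests

    # GET with the most common parameters
    requests.get(
        url="http://www.baidu.com",
        params={"k1": "v1", "k2": "v2"},   # sent as http://www.baidu.com?k1=v1&k2=v2
        cookies={"c1": "v1", "c2": "v2"},  # sent in the Cookie request header
        headers={
            # pretend to be a browser; some sites check this
            "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Mobile Safari/537.36",
            # the page the browser visited before; some sites check it and treat requests without it as crawlers
            "Referer": "htt",
        },
    )

Further points:
1. An HTTP request has
    - a head
    - a body
2. Cookies
    - in a request, they are sent in the request headers
    - in a response, they are sent in the response headers
3. Redirects
    - are signaled in the response headers (Location)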
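The last two points, cookies in request and response headers and redirects via the Location header, fit into one sketch. It assumes a throwaway local http.server with the following made-up behavior:

- The /login path replies 302 with Location: /home and a Set-Cookie header.
- Every other path echoes back the Cookie request header it received.

A requests.Session stores the cookie from the response and sends it on later requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/login":
            # Redirect: no body, just a status code plus Location (and a cookie) in the head.
            self.send_response(302)
            self.send_header("Location", "/home")
            self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # Echo back the Cookie request header so we can see what the client sent.
            body = (self.headers.get("Cookie") or "").encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    # 1. Do not follow the redirect: look at the status code and the Location header.
    first = session.get(base + "/login", allow_redirects=False)
    # 2. The cookie from the Set-Cookie response header is now stored in the session.
    stored = session.cookies.get_dict()
    # 3. Follow the redirect by hand; the session sends the cookie in the request header.
    second = session.get(base + first.headers["Location"])
    # 4. Or let requests follow the redirect automatically.
    auto = session.get(base + "/login")

server.shutdown()
server.server_close()
print(first.status_code, first.headers["Location"], stored, second.text)
print([r.status_code for r in auto.history], auto.url)
```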