requests (2): Advanced Usage
1. File upload
Code:
import requests

# Upload the favicon saved earlier; the with block closes the file afterwards
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    r = requests.post("http://httpbin.org/post", files=files)
print(r.text)
Output:
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAE......="
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=33cdfa79c0730d8f84ea9bdb3501852d",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.0",
    "X-Amzn-Trace-Id": "Root=1-5fd8b40b-2727d1dd67bcac6071e02b44"
  },
  "json": null,
  "origin": "183.92.250.185",
  "url": "http://httpbin.org/post"
}
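The value in files can also be a tuple that states the file name and content type explicitly. The following is a minimal sketch, assuming a hypothetical in-memory payload in place of a real favicon.ico; it only prepares the request, so the multipart body can be inspected without sending anything:

```python
import requests

# Hypothetical stand-in for the bytes of a real favicon.ico
payload = b"\x00\x00\x01\x00"

# (filename, content, content_type) tuple instead of a bare file object
files = {"file": ("favicon.ico", payload, "image/x-icon")}

# Build the request without sending it, to see what would go over the wire
prepped = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

print(prepped.headers["Content-Type"])  # multipart/form-data with a random boundary
print(b'filename="favicon.ico"' in prepped.body)
```

Passing the same files dict to requests.post() would send this body to the server.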
2. Cookies
Getting cookies
Code:
import requests

r = requests.get("http://www.baidu.com")
print(r.cookies)  # a RequestsCookieJar
for key, value in r.cookies.items():
    print(key + '=' + value)
Output:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
(This differs from the output shown in the book.)
A Cookie header can also be sent directly to keep a logged-in state.
Code:
import requests

headers = {
    # Placeholder: paste the full Cookie string copied from your own logged-in
    # browser session (name=value pairs separated by '; ')
    'Cookie': 'name1=value1; name2=value2',
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
r = requests.get('https://www.zhihu.com', headers=headers)
print(r.text)
Output: the page content as it appears after logging in.
Alternatively, pass the cookies through the cookies parameter, which is more cumbersome because a RequestsCookieJar has to be built first.
Code:
import requests

# Placeholder: paste the full Cookie string copied from your own logged-in
# browser session (name=value pairs separated by '; ')
cookies = 'name1=value1; name2=value2'
jar = requests.cookies.RequestsCookieJar()  # create a new RequestsCookieJar object
headers = {
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
for cookie in cookies.split(';'):  # split the string into individual cookies
    # strip() removes the space after each ';' so it does not end up in the key
    key, value = cookie.strip().split('=', 1)
    jar.set(key, value)  # set the key and value of each cookie
r = requests.get('https://www.zhihu.com', cookies=jar, headers=headers)  # pass the cookies parameter
print(r.text)
Output: same as above.
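The cookies parameter also accepts a plain dict, which avoids building the jar by hand. A minimal sketch, assuming a short hypothetical cookie string and a helper name (cookie_string_to_dict) that is not part of requests; the request is only prepared, not sent:

```python
import requests


def cookie_string_to_dict(cookie_string):
    """Split 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    pairs = (item.strip() for item in cookie_string.split(";"))
    # split('=', 1) keeps any '=' that appears inside a value
    return dict(pair.split("=", 1) for pair in pairs if pair)


# Hypothetical cookie string; a real one would be copied from the browser
cookies = cookie_string_to_dict("number=123456; token=a=b")
prepped = requests.Request(
    "GET", "http://httpbin.org/cookies", cookies=cookies
).prepare()

print(cookies)
print(prepped.headers["Cookie"])
```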
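3. Session persistence
A Session object keeps one session alive across several requests. First, without a Session:
Code:
import requests

# Ask the server to set a cookie named number with the value 123456
requests.get('http://httpbin.org/cookies/set/number/123456')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
Output:
{
  "cookies": {}
}
The cookie set by the first request is not visible to the second, because each requests.get() call runs in its own independent session.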
Now with a Session object:
Code:
import requests

s = requests.Session()  # create a Session object
s.get('http://httpbin.org/cookies/set/number/123456')
r = s.get('http://httpbin.org/cookies')
print(r.text)
Output:
{
  "cookies": {
    "number": "123456"
  }
}
This time the cookie is retrieved successfully.
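A Session also carries default headers and its own cookie jar, and both are merged into every request it prepares. A minimal offline sketch, assuming a hypothetical User-Agent value; the cookie is set locally instead of by the server, and nothing is sent:

```python
import requests

with requests.Session() as s:
    # Defaults stored on the session apply to every request made through it
    s.headers.update({"User-Agent": "my-crawler/0.1"})  # hypothetical value
    s.cookies.set("number", "123456")

    req = requests.Request("GET", "http://httpbin.org/cookies")
    prepped = s.prepare_request(req)  # merges session headers and cookies

print(prepped.headers["User-Agent"])
print(prepped.headers["Cookie"])
```

Using the session as a context manager closes its connections when the block ends.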
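4. SSL certificate verification
If a site's certificate was not issued by a trusted CA, the request fails with a certificate error (SSLError). Taking 12306 as the example (12306 no longer has this problem), verification can be turned off by setting the verify parameter to False.
Code:
import requests

response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)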
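The request now succeeds, but a warning is printed that recommends specifying a certificate. We can either ignore the warning or capture it into the log.
Code:
import logging

import requests
import urllib3

urllib3.disable_warnings()  # ignore the warning
# Or capture warnings into the log instead:
# logging.captureWarnings(True)
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)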
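We can also specify a local certificate to use as the client certificate.
Code:
import requests

# cert takes a tuple of two file paths (certificate, private key);
# the private key must be decrypted
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)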
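Disabling warnings globally hides them for the whole program. A narrower option is to ignore only InsecureRequestWarning around the calls that use verify=False. The sketch below assumes a hypothetical helper (call_without_insecure_warning) and uses a stand-in function that raises the same warning class, so it runs without network access:

```python
import warnings

from urllib3.exceptions import InsecureRequestWarning


def call_without_insecure_warning(func, *args, **kwargs):
    """Run func while ignoring only InsecureRequestWarning."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", InsecureRequestWarning)
        return func(*args, **kwargs)


def fake_request():
    # Stand-in for requests.get(url, verify=False), which emits this warning
    warnings.warn("Unverified HTTPS request", InsecureRequestWarning)
    return 200


# Record any warning that escapes the helper, to show that none does
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    status = call_without_insecure_warning(fake_request)

print(status, len(caught))
```

In real use the helper would wrap a call such as requests.get with verify=False; other warnings stay visible.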
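5. Proxy settings
Use the proxies parameter to set a proxy.
Code:
import requests

# Example addresses; replace them with a working proxy
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
# If the proxy requires HTTP Basic Auth, use the http://user:password@host:port syntax:
# proxies = {
#     'http': 'http://user:password@10.10.1.10:3128/',
# }
# SOCKS proxies are also supported (install the extra first: pip install "requests[socks]"):
# proxies = {
#     'http': 'socks5://user:password@host:port',
#     'https': 'socks5://user:password@host:port',
# }
requests.get("https://www.taobao.com", proxies=proxies)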
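The keys of the proxies dict are matched against the scheme of the target URL, so an https:// URL uses the 'https' entry even when the proxy itself is reached over http. A small sketch using the select_proxy helper from requests.utils to show which entry is chosen; the addresses are the example values from above and no request is sent:

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

# The proxy is picked by the scheme of the URL being requested
for url in ("http://httpbin.org/get", "https://www.taobao.com"):
    print(url, "->", select_proxy(url, proxies))
```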
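6. Timeout settings
Use the timeout parameter.
Code:
import requests

r = requests.get("http://www.taobao.com", timeout=1)
print(r.status_code)
This sets the timeout to 1 second; if there is no response within 1 second, an exception is raised.
In fact, a request has two phases, connect and read. A single timeout value is applied to each phase separately; it is not a limit on the sum of the two.
To set the two phases individually, pass a tuple (connect, read).
r = requests.get("http://www.taobao.com", timeout=(5, 30))
To wait indefinitely, set timeout to None or leave it out, since None is the default.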
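When a timeout expires, requests raises ConnectTimeout or ReadTimeout depending on the phase. A minimal sketch of handling both, assuming a hypothetical helper name (fetch_status) and the (5, 30) values from above as defaults:

```python
import requests


def fetch_status(url, connect_timeout=5, read_timeout=30):
    """Return the HTTP status code, or a short label if the request timed out."""
    try:
        r = requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        # The connection could not be established in time
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        # Connected, but the server did not send data in time
        return "read timeout"
    return r.status_code
```

For example, fetch_status("http://www.taobao.com", 5, 30) returns the status code when the site answers in time. Other failures, such as a refused connection, still raise their own exceptions.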
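7. Authentication
Some sites require authentication. A tuple can be passed directly to the auth parameter; requests treats it as HTTP Basic Auth.
import requests

r = requests.get('http://localhost:5000', auth=('username', 'password'))
print(r.status_code)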
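The tuple is shorthand for the HTTPBasicAuth class in requests.auth. A small offline sketch that prepares the same request both ways and compares the resulting Authorization header; nothing is sent, so no server on localhost:5000 is needed:

```python
import requests
from requests.auth import HTTPBasicAuth

url = "http://localhost:5000"

# Tuple shorthand and the explicit class should produce the same header
short = requests.Request("GET", url, auth=("username", "password")).prepare()
explicit = requests.Request(
    "GET", url, auth=HTTPBasicAuth("username", "password")
).prepare()

print(short.headers["Authorization"])
print(short.headers["Authorization"] == explicit.headers["Authorization"])
```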
8. Prepared Request
requests represents each request as a data structure called a Prepared Request. It can also be built and sent by hand.
Code:
from requests import Request, Session

url = 'http://httpbin.org/post'
data = {
    'name': 'germey'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
s = Session()
req = Request('POST', url, data=data, headers=headers)  # construct a Request object
prepped = s.prepare_request(req)  # convert it into a Prepared Request object
r = s.send(prepped)  # send it with send()
print(r.text)
Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60",
    "X-Amzn-Trace-Id": "Root=1-5fd9da60-2467cc031aa12ebd39daf72d"
  },
  "json": null,
  "origin": "183.92.251.74",
  "url": "http://httpbin.org/post"
}
This achieves the same result as an ordinary POST request.
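One use of a Prepared Request is to inspect or adjust the request before it goes out. A minimal sketch, assuming a hypothetical extra header (X-Demo); the send call is left commented out so the example runs offline:

```python
from requests import Request, Session

s = Session()
req = Request("POST", "http://httpbin.org/post", data={"name": "germey"})
prepped = s.prepare_request(req)

# The prepared object exposes exactly what would be sent
print(prepped.method, prepped.url)
print(prepped.headers["Content-Type"])
print(prepped.body)

# It can still be modified before sending
prepped.headers["X-Demo"] = "1"  # hypothetical header, for illustration only
# r = s.send(prepped, timeout=10)  # uncomment to actually send the request
```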
Reference book: 《Python3网络爬虫开发实战》