requests (2): Advanced Usage

1. File upload

Code:

import requests

files={'file':open('favicon.ico','rb')}  # upload the icon saved earlier
r=requests.post("http://httpbin.org/post",files=files)
print(r.text)

Output:

{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAE......="
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=33cdfa79c0730d8f84ea9bdb3501852d",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.0",
    "X-Amzn-Trace-Id": "Root=1-5fd8b40b-2727d1dd67bcac6071e02b44"
  },
  "json": null,
  "origin": "183.92.250.185",
  "url": "http://httpbin.org/post"
}
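
The upload above leaves the file handle open and lets requests choose the filename and content type. As a minimal sketch of a more explicit variant, the block below passes a (filename, file object, content type) tuple and builds the request with `Request(...).prepare()`, so the multipart body can be inspected without sending anything. The `fake_icon` bytes are a hypothetical stand-in for the real favicon.ico, and the `image/x-icon` content type is an assumption chosen for illustration.

```python
import io

import requests

# Hypothetical stand-in bytes for the icon; a real script would use
# open('favicon.ico', 'rb') inside a `with` block so the handle gets closed.
fake_icon = io.BytesIO(b"\x00\x00\x01\x00 not a real icon")

# Each entry in `files` may be a tuple: (filename, file object, content type).
files = {"file": ("favicon.ico", fake_icon, "image/x-icon")}

# Build the request without sending it, so the multipart body can be inspected.
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

print(prepared.headers["Content-Type"])            # multipart/form-data; boundary=...
print(b'filename="favicon.ico"' in prepared.body)  # True
```

Sending it for real would only need `requests.Session().send(prepared)`.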

 

2. Cookies

Getting cookies

Code:

import requests

r=requests.get("http://www.baidu.com")
print(r.cookies)
for key,value in r.cookies.items():
    print(key+'='+value)

Output:

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315

(This differs from the output shown in the book.)

 

You can also send a Cookie header directly to keep a logged-in state.

Code:

import requests

headers={
  'Cookie':'_xsrf=1IbxapcP037H2q4hiTOHsEGg5Ep1mgUH; d_c0="AFCgLlaSaBCPTmv-uet83q--TEfaCHzj2jU=|1574675590"; _zap=f9eba4eb-c88d-4ea6-b641-76d6dd3aeb81; r_cap_id="NDU2ZDA5N2U4NGUwNDBhNjgwZmRkNWM2NjBmOGIwNjQ=|1607844299|849a3ebc0b6b228fb8ccbe7bc956d4ab87fbb7de"; cap_id="NzY4NGQzYWQwMzVjNDZiNmEwZjc5N2M0YjFkMjE1ZWU=|1607844299|79454ea6ae9b184b66107ef5499f73a76d378dcf"; l_cap_id="YjMyMDNhYjQxYmJmNDQxMGFmNGQ5ZGI5MjlhMWQ1NWU=|1607844299|6ff49d12514843f595e6452ec70cd8c79904b72f"; auth_type=d2VjaGF0|1607844323|de686e370b55abea1d50ed497d5e75c1f11ceaf2; token="NDBfLW1xdUp5cV9uWDF5RHFBRHZGb01TRmlGSnc0RlBzaU96ekFOZXQ4Sl9zX3VBQ1laMW1NckNCcDZyWW9JaF9FOU5Za1poNENHYXQ4QUZfM3BQZEtxQ0xneFlWbTQ4NGo1akpNOG1PLXdnVWc=|1607844323|e244c2a10cc41ef5398b89afc7f132ef2f33789e"; client_id="bzNwMi1qa1RCSDh1TWQ1cGtlRWhnWF9TSVI1MA==|1607844323|ba8ea5aacdf5e242254be17b0a39013db27f02b7"; capsion_ticket="2|1:0|10:1607845046|14:capsion_ticket|44:MTM1YzU3MWI1MTcxNDQ2YTgyOTA0ZmRkNDMwNTlmMTU=|18e09e534dcd722a06d6c1cabb4256bca9429743fa8dff7847a12de33d44210d"; z_c0="2|1:0|10:1607845086|4:z_c0|92:Mi4xWV9qN0Z3QUFBQUFBVUtBdVZwSm9FQ2NBQUFDRUFsVk4zbEg5WHdDWEdaM01FUDZNV2xfa0U0V3hnejBWNXY0Vm13|5d849402605e16c9cfde29501841cd672a237595b0eada30f6ace2326d9a3885"; tst=r; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1606400831,1607224141,1607844250,1608038711; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1608038711; SESSIONID=HLfiVBjVfxVpMa6fEvx5rioAGM3bCCHuGAEVoPObzXq; JOID=VFoSBkJRNcGGFuW9alJO0RA7gLN5HXyl40-j0j44SZL0V4uPCWeLF9od5LpkY1japKhIBVOvrrfr_V-s5S8xMC4=; osd=VFgWAEpRN8WAHuW_blRG0RI_hrt5H3ij60-h1jgwSZDwUYOPC2ONH9of4LxsY1reoqBIB1epprfp-Vmk5S01NiY=; KLBRSID=0a401b23e8a71b70de2f4b37f5b4e379|1608038712|1608038704',
  'Host':'www.zhihu.com',
  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
r=requests.get('https://www.zhihu.com',headers=headers)
print(r.text)

Output: the page as it appears after logging in.

 

Alternatively, set the cookies through the cookies parameter, although this is more verbose.

Code:

import requests

cookies='_xsrf=1IbxapcP037H2q4hiTOHsEGg5Ep1mgUH; d_c0="AFCgLlaSaBCPTmv-uet83q--TEfaCHzj2jU=|1574675590"; _zap=f9eba4eb-c88d-4ea6-b641-76d6dd3aeb81; r_cap_id="NDU2ZDA5N2U4NGUwNDBhNjgwZmRkNWM2NjBmOGIwNjQ=|1607844299|849a3ebc0b6b228fb8ccbe7bc956d4ab87fbb7de"; cap_id="NzY4NGQzYWQwMzVjNDZiNmEwZjc5N2M0YjFkMjE1ZWU=|1607844299|79454ea6ae9b184b66107ef5499f73a76d378dcf"; l_cap_id="YjMyMDNhYjQxYmJmNDQxMGFmNGQ5ZGI5MjlhMWQ1NWU=|1607844299|6ff49d12514843f595e6452ec70cd8c79904b72f"; auth_type=d2VjaGF0|1607844323|de686e370b55abea1d50ed497d5e75c1f11ceaf2; token="NDBfLW1xdUp5cV9uWDF5RHFBRHZGb01TRmlGSnc0RlBzaU96ekFOZXQ4Sl9zX3VBQ1laMW1NckNCcDZyWW9JaF9FOU5Za1poNENHYXQ4QUZfM3BQZEtxQ0xneFlWbTQ4NGo1akpNOG1PLXdnVWc=|1607844323|e244c2a10cc41ef5398b89afc7f132ef2f33789e"; client_id="bzNwMi1qa1RCSDh1TWQ1cGtlRWhnWF9TSVI1MA==|1607844323|ba8ea5aacdf5e242254be17b0a39013db27f02b7"; capsion_ticket="2|1:0|10:1607845046|14:capsion_ticket|44:MTM1YzU3MWI1MTcxNDQ2YTgyOTA0ZmRkNDMwNTlmMTU=|18e09e534dcd722a06d6c1cabb4256bca9429743fa8dff7847a12de33d44210d"; z_c0="2|1:0|10:1607845086|4:z_c0|92:Mi4xWV9qN0Z3QUFBQUFBVUtBdVZwSm9FQ2NBQUFDRUFsVk4zbEg5WHdDWEdaM01FUDZNV2xfa0U0V3hnejBWNXY0Vm13|5d849402605e16c9cfde29501841cd672a237595b0eada30f6ace2326d9a3885"; tst=r; SESSIONID=Lw1oXe6qnCJ82o9ksVXjfwe0DNCniiBKX34UfmTXuLx; JOID=WlgTCk4yx7ADY72OYDOyr5tD2o58draOYDbQvh5RjPw0Xd-_DThCal1ms4djtwCvTf2KMHI_GjSp_ueD_caBq74=; osd=UV8RCk05wLIDYLaJYjOxpJxB2o13cbSOYz3XvB5Sh_s2Xdy0CjpCaVZhsYdgvAetTf6BN3A_GT-u_OeA9sGDq70=; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1608038737,1608092677,1608092706,1608092743; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1608092743; KLBRSID=76ae5fb4fba0f519d97e594f1cef9fab|1608093479|1608092673'
jar=requests.cookies.RequestsCookieJar()  # create a new RequestsCookieJar object
headers={
  'Host':'www.zhihu.com',
  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
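for cookie in cookies.split(';'):  # split the header string into individual cookies
    key,value=cookie.strip().split('=',1)  # strip the space left after each ';', then split on the first '='
    jar.set(key,value)  # set each cookie's key and value with set()
r=requests.get('https://www.zhihu.com',cookies=jar,headers=headers)  # pass the jar via the cookies parameter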
print(r.text)

Output: same as above.
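
The splitting step can be isolated into a small helper. The following is a rough sketch using only the standard library; `parse_cookie_header` is a hypothetical name, and the sample header is made up rather than taken from a real session.

```python
def parse_cookie_header(header: str) -> dict:
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in header.split(";"):
        part = part.strip()  # drop the space that follows each ';'
        if not part:
            continue  # tolerate a trailing ';'
        key, _, value = part.partition("=")  # split on the first '=' only
        cookies[key] = value
    return cookies


# Made-up sample header; the quoted value contains '=' and '|' on purpose.
sample = 'theme=dark; token="abc=|123"; lang=en;'
print(parse_cookie_header(sample))
# {'theme': 'dark', 'token': '"abc=|123"', 'lang': 'en'}
```

requests also accepts a plain dict for the cookies parameter, so the returned dict could be passed as `cookies=` without building a RequestsCookieJar by hand.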

 

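3. Session persistence

Use a Session object to keep several requests within the same session. Without one, each call is independent:

Code: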

import requests

requests.get('http://httpbin.org/cookies/set/number/123456')  # set a cookie named number with the value 123456
r=requests.get('http://httpbin.org/cookies')
print(r.text)

Output:

{
  "cookies": {}
}

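As the output shows, the cookie that was set cannot be retrieved this way, because the two requests.get() calls do not share any state.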

 

With a Session object:

Code:

import requests

s=requests.Session()  # create a Session object
s.get('http://httpbin.org/cookies/set/number/123456')
r=s.get('http://httpbin.org/cookies')
print(r.text)

Output:

{
  "cookies": {
    "number": "123456"
  }
}

The cookie is retrieved successfully.
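
To see what a Session carries from one request to the next without touching the network, one option is to prepare a request and look at its headers. This is only a sketch: the `demo-client/0.1` User-Agent is a hypothetical value, and the cookie is placed in the jar by hand instead of being set by a server response.

```python
import requests

with requests.Session() as s:
    # Headers set on the session are merged into every request it prepares.
    s.headers.update({"User-Agent": "demo-client/0.1"})  # hypothetical UA string
    # Put a cookie into the session's jar by hand instead of via a response.
    s.cookies.set("number", "123456")

    # Prepare (but do not send) a request to see what the session would add.
    prepped = s.prepare_request(requests.Request("GET", "http://httpbin.org/cookies"))

print(prepped.headers["User-Agent"])  # demo-client/0.1
print(prepped.headers["Cookie"])      # number=123456
```

Using the session as a context manager closes its underlying connections when the block exits.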

 

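4. SSL certificate verification

If a site's certificate is not issued by a trusted CA, the request fails with a certificate error. Taking 12306 as an example (12306 itself no longer has this problem), verification can be skipped by setting the verify parameter to False.

Code: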

import requests

response=requests.get('https://www.12306.cn',verify=False)
print(response.status_code)

 

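However, running this prints a warning that recommends specifying a certificate. We can either suppress the warning or capture it into the log.

Code: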

import requests
import logging
from requests.packages import urllib3

urllib3.disable_warnings()  # suppress the warning
# Alternatively, use logging.captureWarnings(True)
# to route warnings into the logging system

response=requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
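
How logging.captureWarnings(True) redirects warnings can be shown with the standard library alone. In the sketch below, `ListHandler` is a hypothetical handler that stands in for a file handler, and a plain UserWarning stands in for the warning emitted when verify=False is used, so nothing is sent over the network.

```python
import logging
import warnings


class ListHandler(logging.Handler):
    """Collect log messages in a list (stand-in for a file handler)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
# Captured warnings are logged to the logger named "py.warnings".
logging.getLogger("py.warnings").addHandler(handler)
logging.captureWarnings(True)

with warnings.catch_warnings():
    warnings.simplefilter("always")  # make sure the demo warning is not filtered out
    # Stand-in for the warning that an unverified HTTPS request triggers.
    warnings.warn("Unverified HTTPS request (demo message)", UserWarning)

logging.captureWarnings(False)  # restore normal warning output
print(handler.messages)
```

With capturing enabled, the warning ends up in the handler's list instead of being printed to stderr.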

 

We can also specify a local certificate to use as the client certificate.

Code:

import requests

response=requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))
# a tuple of two file paths (certificate, private key); the key must be unencrypted
print(response.status_code)
 

 

5. Proxy settings

Use the proxies parameter to configure proxies.

Code:

import requests

proxies={
  'http':'http://10.10.1.10:3128',
  'https':'http://10.10.1.10:1080'
}

# If the proxy requires HTTP Basic Auth, use the form http://user:password@host:port
#proxies={
#  'http':'http://user:password@10.10.1.10:3128/',
#}

# SOCKS proxies are also supported (needs the extra dependency: pip install "requests[socks]")
#proxies={
#  'http':'socks5://user:password@host:port',
#  'https':'socks5://user:password@host:port'
#}

requests.get("https://www.taobao.com",proxies=proxies)
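
Which entry of the proxies dict applies to a given URL depends on the URL's scheme, and a `scheme://hostname` key takes priority over a bare scheme key. A small sketch using the `select_proxy` helper from `requests.utils` illustrates the lookup without sending any request; the host-specific entry and its port 5323 are hypothetical values added for illustration.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
    # Hypothetical host-specific entry: it wins over the bare "https" key.
    "https://www.taobao.com": "http://10.10.1.10:5323",
}

for url in (
    "http://httpbin.org/get",
    "https://httpbin.org/get",
    "https://www.taobao.com",
):
    # Print the proxy that would be chosen for each URL.
    print(url, "->", select_proxy(url, proxies))
```

The same dict can also be assigned once with `session.proxies.update(proxies)` so that every request made through that Session uses it.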

 

6. Timeout settings

Use the timeout parameter.

Code:

import requests

r=requests.get("http://www.taobao.com",timeout=1)
print(r.status_code)

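Here the timeout is set to 1 second; if the server does not respond within 1 second, an exception is raised.

In fact, a request has two phases: connecting and reading. A single timeout value is applied to each phase separately, not to the sum of the two.

You can pass a (connect, read) tuple to set the two phases separately.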

r=requests.get("http://www.taobao.com",timeout=(5,30))

To wait indefinitely, leave timeout unset or set it to None.
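
Since a timeout surfaces as an exception, calling code usually wraps the request. The helper below is a sketch under assumptions: `fetch_status` is a hypothetical name, and the default values 3.05 and 10 are illustrative, not recommendations from the book.

```python
import requests


def fetch_status(url, connect_timeout=3.05, read_timeout=10):
    """Return the HTTP status code, or a short string describing the failure."""
    try:
        r = requests.get(url, timeout=(connect_timeout, read_timeout))
        return r.status_code
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"  # no connection within connect_timeout seconds
    except requests.exceptions.ReadTimeout:
        return "read timeout"  # connected, but the server was too slow to send data
    except requests.exceptions.RequestException as exc:
        # Any other requests error, for example an invalid URL.
        return f"request failed: {exc.__class__.__name__}"
```

Calling `fetch_status("http://www.taobao.com")` performs a real request; both timeout exceptions derive from `requests.exceptions.Timeout`, so a single `except requests.exceptions.Timeout` clause would also cover them.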

 

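7. Authentication

Some sites require authentication. You can pass a (username, password) tuple directly to the auth parameter, which requests treats as HTTP Basic Auth credentials.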

import requests

r=requests.get('http://localhost:5000',auth=('username','password'))
print(r.status_code)
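
The tuple is shorthand for `requests.auth.HTTPBasicAuth`. A quick sketch that prepares both forms without sending them, and then decodes the resulting header, shows that they produce the same Authorization value; the localhost URL is only a placeholder and no server needs to be running.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://localhost:5000"  # placeholder; nothing is sent to it

# Prepare both forms without sending, then compare the Authorization header.
from_tuple = requests.Request("GET", url, auth=("username", "password")).prepare()
from_class = requests.Request("GET", url, auth=HTTPBasicAuth("username", "password")).prepare()

header = from_tuple.headers["Authorization"]
print(header == from_class.headers["Authorization"])  # True

# Basic auth is only base64 encoding, not encryption.
print(base64.b64decode(header.split()[1]))  # b'username:password'
```

Because the credentials are merely encoded, Basic Auth is best used over HTTPS.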

 

8. Prepared Request

Internally, requests represents each outgoing request as a data structure called a Prepared Request.

Code:

from requests import Request,Session

url='http://httpbin.org/post'
data={
  'name':'germey'
}
headers={
  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60'
}
s=Session()
req=Request('POST',url,data=data,headers=headers)  # construct a Request object
prepped=s.prepare_request(req)  # convert it into a Prepared Request with prepare_request()
r=s.send(prepped)  # send it with send()
print(r.text)

Output:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60",
    "X-Amzn-Trace-Id": "Root=1-5fd9da60-2467cc031aa12ebd39daf72d"
  },
  "json": null,
  "origin": "183.92.251.74",
  "url": "http://httpbin.org/post"
}

This achieves the same result as a regular POST request.
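
One reason to go through a Prepared Request is that the encoded body and headers can be inspected or adjusted before anything is sent. The sketch below reuses the same URL and form data; the `X-Demo` header is a hypothetical addition for illustration, and the send call is left commented out so the block runs offline.

```python
from requests import Request, Session

s = Session()
req = Request("POST", "http://httpbin.org/post", data={"name": "germey"})
prepped = s.prepare_request(req)

# The encoded body and headers are plain attributes of the prepared request.
print(prepped.method, prepped.url)
print(prepped.body)                       # name=germey
print(prepped.headers["Content-Length"])  # 11

# They can also be adjusted before sending; X-Demo is a hypothetical header.
prepped.headers["X-Demo"] = "1"

# r = s.send(prepped, timeout=5)  # uncomment to actually send the request
```

The Content-Length of 11 matches the value echoed back by httpbin in the output above.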

 

Reference book: 《Python3网络爬虫开发实战》 (Python 3 Web Crawler Development in Practice)

posted @ 2020-12-16 18:18  Hao_ran