The requests library
1. Introduction to requests
2. GET requests
3. POST requests
4. Other request methods
5. Advanced usage
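1. Introduction to requests
Python has several libraries for making HTTP requests, including urllib and requests.
requests vs. urllib:
- In Python 2, urllib and urllib2 were two separate standard-library modules. Python 3 merged them into a single urllib package (urllib.request, urllib.parse, urllib.error). requests is built on urllib3, a separate third-party library that is not part of the standard library.
- The requests slogan is "HTTP for Humans". Where urllib tends to be verbose, requests is concise and easy to read.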
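To make the comparison concrete, the sketch below builds the same GET URL once with the standard library and once with requests. Nothing is sent over the network. The endpoint and parameter names are illustrative assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request as UrllibRequest

import requests

base_url = "http://httpbin.org/get"   # illustrative endpoint
params = {"key1": "value1", "key2": "value2"}

# Standard library: encode the query string and join it to the URL by hand
urllib_req = UrllibRequest(base_url + "?" + urlencode(params))
urllib_url = urllib_req.full_url

# requests: pass the dict and let the library build the URL
requests_url = requests.Request("GET", base_url, params=params).prepare().url

print(urllib_url)
print(requests_url)
```

Both lines print the same URL. The difference is only in how much of the work the caller has to do.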
2. GET requests
>>> import requests

# Fetch a web page with the get method
>>> resp = requests.get("http://www.baidu.com")

# The call returns a Response object
>>> resp
<Response [200]>

# Status code
>>> resp.status_code
200

# URL of the request
>>> resp.url
'http://www.baidu.com/'

# Response body as a string: resp.content decoded with resp.encoding
>>> print(resp.text[:100])
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charse

# Encoding that requests uses to decode the response body
>>> resp.encoding
'ISO-8859-1'

# Response body as raw bytes
>>> print(resp.content[:100])
b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charse'
GET with query parameters:
# Option 1: pass the parameters as a dict
>>> payload = {"key1": "value1", "key2": "value2"}
>>> resp = requests.get("http://httpbin.org/get", params=payload)
>>> print(resp.text)
{
  "args": {
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb685f6-23b6c5e864d8dc4e41e8de27"
  },
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/get?key1=value1&key2=value2"
}


# Option 2: a dict whose values may be lists (the key is repeated in the URL)
>>> payload = {"key1": "value1", "key2": ["value2", "value3"]}
>>> resp = requests.get("http://httpbin.org/get", params=payload)
>>> resp.url
'http://httpbin.org/get?key1=value1&key2=value2&key2=value3'
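The encoding rules can be checked without network access by preparing a request and inspecting its URL. This minimal sketch uses requests.Request(...).prepare() for that. The "unused" key is a made-up name, included to show that a key whose value is None is left out of the query string.

```python
import requests

payload = {"key1": "value1", "key2": ["value2", "value3"], "unused": None}

# prepare() builds the final URL but does not send the request
prepared = requests.Request("GET", "http://httpbin.org/get", params=payload).prepare()

print(prepared.url)
```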
3. POST requests
A POST request can send its data in two ways:
- Form submission: pass a dict or a sequence of (key, value) tuples via data=
- Non-form submission: send JSON-encoded data
Example 1: two ways to submit a form
# Option 1: a dict
>>> resp = requests.post("http://httpbin.org/post", data={"key": "value"})
>>> print(resp.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key": "value"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "9",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb67ae5-6c15961202281a1d70522539"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/post"
}


# Option 2: a tuple of (key, value) pairs, which allows a key to repeat
>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> resp = requests.post("http://httpbin.org/post", data=payload)
>>> print(resp.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb67b74-716bca001516d46950d0d762"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/post"
}
Example 2: non-form submission
>>> import requests

# Option 1: serialize the payload yourself with json.dumps
>>> import json

>>> url = 'http://httpbin.org/post'
>>> payload = {'some': 'data'}

>>> resp = requests.post(url, data=json.dumps(payload))
>>> print(resp.text)
{
  "args": {},
  "data": "{\"some\": \"data\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "16",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb67c87-78a1dd216e987f0226d5b97a"
  },
  "json": {
    "some": "data"
  },
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/post"
}


# Option 2: use the built-in json parameter, which serializes the payload
# and also sets the Content-Type header to application/json
>>> url = 'http://httpbin.org/post'
>>> payload = {'some': 'data'}

>>> resp = requests.post(url, json=payload)
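The three ways of passing a body differ mainly in the Content-Type header that requests sets. The sketch below prepares each request without sending it and prints the header and body side by side. The helper name describe is an assumption made for this illustration.

```python
import json

import requests

url = "http://httpbin.org/post"
payload = {"some": "data"}

def describe(prepared):
    """Return (Content-Type header, body as text) for a prepared request."""
    body = prepared.body
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return prepared.headers.get("Content-Type"), body

# prepare() builds the request without sending it
form = describe(requests.Request("POST", url, data=payload).prepare())
raw = describe(requests.Request("POST", url, data=json.dumps(payload)).prepare())
as_json = describe(requests.Request("POST", url, json=payload).prepare())

print(form)      # form-encoded body
print(raw)       # JSON text, but no Content-Type header is set
print(as_json)   # JSON text with Content-Type: application/json
```

If the server insists on a JSON Content-Type, json=payload is usually the safer choice. With data=json.dumps(payload) the header has to be added by hand.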
4. Other request methods
# put: replace the target resource with the data sent by the client
>>> r = requests.put('http://httpbin.org/put', data={'key': 'value'})
>>> print("put:", r.text)
put: {
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "key": "value"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "9",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb6808c-7842d5b450d1777139efab8e"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/put"
}

# delete: ask the server to delete the specified resource
>>> r = requests.delete('http://httpbin.org/delete')
>>> print("delete:", r.text)
delete: {
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "0",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.23.0",
    "X-Amzn-Trace-Id": "Root=1-5fb6808e-042c04c61a6257820e4ff404"
  },
  "json": null,
  "origin": "113.116.22.63",
  "url": "http://httpbin.org/delete"
}

# head: like get, but the response has no body; used to fetch the headers only
>>> r = requests.head('http://httpbin.org/get')
>>> print("head:", r.text)
head:
>>> print(r.headers)
{'Date': 'Thu, 19 Nov 2020 14:26:23 GMT', 'Content-Type': 'application/json', 'Content-Length': '306', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

# options: ask the server which methods and options it supports for a resource
>>> r = requests.options('http://httpbin.org/get')
>>> print("options:", r.text)
options:
>>> print(r.headers)
{'Date': 'Thu, 19 Nov 2020 14:26:24 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '0', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Allow': 'OPTIONS, HEAD, GET', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, PATCH, OPTIONS', 'Access-Control-Max-Age': '3600'}
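The helper functions above are shortcuts for the generic requests.request(method, url, ...) call, and requests.patch exists alongside them for PATCH. As a rough sketch, the loop below prepares one request per method without sending anything. The URL is an illustrative assumption.

```python
import requests

url = "http://httpbin.org/anything"  # illustrative URL; nothing is sent

methods = ["GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"]

# Build one prepared request per method without touching the network
prepared = [requests.Request(m, url).prepare() for m in methods]

for p in prepared:
    print(p.method, p.url)
```

To actually send one of these, requests.request("PATCH", url, data=...) behaves the same as requests.patch(url, data=...).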
5. Advanced usage
5.1 Parsing a JSON response
import requests

r = requests.get('https://api.github.com/events')
print(r.json())        # parses the JSON body into Python objects; here, a list of dicts
print(type(r.json()))  # <class 'list'>
5.2 Reading the raw socket response
>>> resp = requests.get("https://api.github.com/events", stream=True)
>>> print(type(resp.raw))
<class 'urllib3.response.HTTPResponse'>
>>> print(resp.raw)
<urllib3.response.HTTPResponse object at 0x000001E3F2A0C2B0>
>>> print(resp.raw.read())  # raw bytes as received (here still gzip-compressed)
b'\x1f\x8b\x08\x00\x00\x00\x00\ ......
Saving the stream to a file:
>>> resp = requests.get("https://api.github.com/events", stream=True)
>>> with open("e:\\file.txt", "wb") as f:
...     for chunk in resp.iter_content(1000):
...         f.write(chunk)
...
2748
2853
4761
4835
4691
4066
5545
7525
4489
2732
3259
2115
>>> with open("e:\\file.txt") as f:
...     print(f.read(50))
...
[{"id":"14250730635","type":"PushEvent","actor":{"...]
5.3 Setting request headers
>>> url = "http://api.github.com/some/endpoint"
>>> headers = {"user-agent": "my-app/0.0.1"}  # identify the client by name and version
>>> r = requests.get(url, headers=headers)
5.4 Uploading files
Option 1:
import requests

url = 'http://httpbin.org/post'

# Open in binary mode; the with block closes the file after the upload
with open('e:\\test.xlsx', 'rb') as f:
    r = requests.post(url, files={'file': f})
print(r.text)
Option 2: set the filename, content type, and extra headers explicitly
import requests

url = 'http://httpbin.org/post'

with open('e:\\test.xlsx', 'rb') as f:
    # (filename, file object, content type, extra headers)
    files = {'file': ('report.xls', f, 'application/vnd.ms-excel', {'Expires': '0'})}
    r = requests.post(url, files=files)
print(r.text)
Open files in binary mode. requests may try to set the Content-Length header for you, and when it does, the value is the number of bytes in the file. Opening the file in text mode can lead to errors.
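To see what a file upload looks like on the wire, the request can be prepared without being sent. In this sketch an in-memory io.BytesIO object stands in for a file opened in binary mode. The filename report.csv and its contents are made up for illustration.

```python
import io

import requests

# In-memory stand-in for a file opened with open(path, 'rb')
fake_file = io.BytesIO(b"name,score\nalice,90\n")

files = {"file": ("report.csv", fake_file, "text/csv")}

# prepare() encodes the multipart body without sending the request
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

content_type = prepared.headers["Content-Type"]
content_length = int(prepared.headers["Content-Length"])

print(content_type)                          # multipart/form-data; boundary=...
print(content_length == len(prepared.body))  # header matches the byte count
```

Content-Length here is the size of the encoded body in bytes, which is why the file should be read as bytes and not as text.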
5.5 Status codes
import requests

r = requests.get('http://httpbin.org/get')
print(r.status_code)                       # 200
print(r.status_code == requests.codes.ok)  # status check: True

# raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx responses
print(r.raise_for_status())  # None (no exception for a 200 response)

r = requests.get('https://www.cnblogs.com/dinex.indd')
r.raise_for_status()  # raises HTTPError: ...404 Client Error: Not Found...
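In practice raise_for_status() is usually wrapped in try/except so that an error response does not crash the program. The sketch below shows one possible pattern. To stay offline it builds Response objects by hand, which is an assumption made only for this illustration. Real code would get them from requests.get. The helper name check and the example URL are hypothetical.

```python
import requests

def check(resp):
    """Return 'ok' for a successful response, or the error message otherwise."""
    try:
        resp.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return "failed: %s" % exc
    return "ok"

# Hand-built Response objects stand in for real server replies
# (assumption: status_code is set directly only to keep the sketch offline)
ok_resp = requests.Response()
ok_resp.status_code = 200

missing_resp = requests.Response()
missing_resp.status_code = 404
missing_resp.reason = "Not Found"
missing_resp.url = "http://example.com/missing"

print(check(ok_resp))
print(check(missing_resp))
```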
5.6 Reading response headers
import requests

r = requests.get('https://api.github.com/events')
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('content-type'))  # header names are case-insensitive
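Both spellings of the header name work because r.headers is a CaseInsensitiveDict. As a small offline illustration, the dictionary below is filled by hand instead of coming from a real response. The header value is an assumption.

```python
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict; this one is filled by hand for illustration
headers = CaseInsensitiveDict({"Content-Type": "application/json; charset=utf-8"})

print(headers["content-type"])
print(headers.get("CONTENT-TYPE"))
print(headers.get("X-Missing", "n/a"))  # .get() avoids a KeyError for absent headers
print(list(headers))                    # the original casing is preserved
```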
5.7 Reading and sending cookies
Reading cookies:
import requests

url = 'https://www.baidu.com'
r = requests.get(url)
print(r.cookies)  # a dict-like RequestsCookieJar: <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
for k, v in r.cookies.items():
    print(k, v)  # BDORZ 27315
Sending cookies:
import requests

url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')

r = requests.get(url, cookies=cookies)
print(r.text)  # {"cookies":{"cookies_are":"working"}}
Cookies scoped to specific domains and paths:
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

url = 'http://httpbin.org/cookies'
r = requests.get(url, cookies=jar)
print(r.text)  # only the cookie whose path matches is sent: {"cookies":{"tasty_cookie":"yum"}}
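The path matching can be observed offline by preparing requests for different URLs and reading the Cookie header that requests would send. This is a minimal sketch. The helper name cookie_header and the /other path are assumptions made for illustration.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

def cookie_header(url):
    """Return the Cookie header requests would send to url, or None."""
    prepared = requests.Request("GET", url, cookies=jar).prepare()
    return prepared.headers.get("Cookie")

print(cookie_header("http://httpbin.org/cookies"))
print(cookie_header("http://httpbin.org/elsewhere"))
print(cookie_header("http://httpbin.org/other"))
```

Each URL receives only the cookie whose path matches, and a URL that matches neither path gets no Cookie header at all.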
5.8 Timeouts
import requests

requests.get('http://github.com', timeout=0.001)  # raises a timeout exception (requests.exceptions.Timeout)
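A timeout is normally caught so the program can retry or report the failure. The sketch below assumes a local listening socket that never answers as a stand-in for a slow server, so it does not depend on an external site. The timeout tuple is (connect timeout, read timeout) in seconds, and requests.exceptions.Timeout covers both ConnectTimeout and ReadTimeout.

```python
import socket

import requests

# A listening socket that never answers stands in for a slow server
silent_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent_server.bind(("127.0.0.1", 0))
silent_server.listen(1)
url = "http://127.0.0.1:%d/" % silent_server.getsockname()[1]

try:
    # timeout=(connect, read): 1 s to connect, 0.3 s to wait for data
    requests.get(url, timeout=(1, 0.3))
    outcome = "completed"
except requests.exceptions.Timeout as exc:
    outcome = "timed out (%s)" % type(exc).__name__
finally:
    silent_server.close()

print(outcome)
```

Without a timeout argument, requests waits indefinitely, so passing one explicitly is generally a good habit.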
5.9 Following redirects
import requests

# head does not follow redirects by default, so allow_redirects=True is needed here
r = requests.head('http://github.com', allow_redirects=True)
print(r.url)             # final URL: 'https://github.com/'
print(r.history[0].url)  # URL before the redirect: http://github.com/
print(r.history)         # list of intermediate responses: [<Response [301]>]
Disabling redirects:
import requests

r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code)  # 301
print(r.history)      # []
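5.10 Session
A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance.
import requests

s = requests.Session()

# This request makes the server set a cookie, which the session stores
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
# Cookies stored from earlier responses are sent automatically with later requests
r = s.get("http://httpbin.org/cookies")

print(r.text)  # {"cookies": {"sessioncookie": "123456789"}}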
Setting default authentication and headers on a session:
import requests

s = requests.Session()
s.auth = ('username', 'passwd')
# Add a default header that is sent with every request from this session
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
r = s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})
print(r.text)

# both 'x-test' and 'x-test3' are sent ('x-test2' was per-request and is not kept)
r = s.get('http://httpbin.org/headers', headers={'x-test3': 'true'})
print(r.text)
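The merging of session defaults with per-request headers can be verified without sending anything, using Session.prepare_request. The sketch below is one way to do it. The helper name headers_for is an assumption made for illustration.

```python
import requests

s = requests.Session()
s.headers.update({"x-test": "true"})  # session-wide default header

def headers_for(extra):
    """Return the headers the session would send when given per-request extras."""
    req = requests.Request("GET", "http://httpbin.org/headers", headers=extra)
    return s.prepare_request(req).headers  # merged, but not sent

first = headers_for({"x-test2": "true"})
second = headers_for({"x-test3": "true"})

print("x-test" in first, "x-test2" in first)
print("x-test" in second, "x-test2" in second, "x-test3" in second)
```

The second request carries x-test and x-test3 but not x-test2, which shows that per-request headers are not stored on the session.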