The Requests Library in Detail

(1) Introduction

requests is one of the most convenient HTTP libraries in Python, much easier to use than urllib. Let's start with a simple example:

import requests

response = requests.get('https://www.baidu.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)

>>> Output:
<class 'requests.models.Response'>
200
<class 'str'>
(the full HTML content)
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

Compared with urllib this is much simpler, and the response attributes are largely the same. Next let's look at how to send various kinds of requests.
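For comparison, here is a sketch of the same GET request written with the standard library's urllib; the network call itself is left commented out, since the point is only how much more ceremony it takes:

```python
import urllib.request

# with urllib you build a Request object yourself and decode the body by hand
req = urllib.request.Request('https://www.baidu.com/',
                             headers={'User-Agent': 'Mozilla/5.0'})

# with urllib.request.urlopen(req) as resp:
#     print(resp.status)                      # status code
#     print(resp.read().decode('utf-8'))      # body must be decoded manually
```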

 

(2) Basic GET Requests

1. Basic usage:

import requests

response = requests.get('http://httpbin.org/get')  # calling get() sends a GET request
print(response.text)

>>> Output:
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.19.1"
  }, 
  "origin": "113.54.225.108", 
  "url": "http://httpbin.org/get"
}

To pass parameters, just build a dict and pass it to the params argument of get(); no URL-encoding step is needed:

import requests

data = {
    'name': 'Boru',
    'age': '18'
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)

>>> Output:
{
  "args": {
    "age": "18", 
    "name": "Boru"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.19.1"
  }, 
  "origin": "113.54.225.108", 
  "url": "http://httpbin.org/get?name=Boru&age=18"
}
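The URL-encoding happens inside requests. As an offline illustration, PreparedRequest (the object requests builds internally before sending) shows the final URL without making any network call:

```python
from requests.models import PreparedRequest

# prepare_url merges the base URL with the params dict,
# percent-encoding values where needed
req = PreparedRequest()
req.prepare_url('http://httpbin.org/get', {'name': 'Boru', 'age': '18'})
print(req.url)  # http://httpbin.org/get?name=Boru&age=18
```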

  

2. Parsing JSON

Previously, a JSON response body had to be decoded with json.loads(); with requests we can simply call json():

import requests
import json

response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))

>>> Output:
<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.19.1'}, 'origin': '113.54.225.108', 'url': 'http://httpbin.org/get'}
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.19.1'}, 'origin': '113.54.225.108', 'url': 'http://httpbin.org/get'}
<class 'dict'>

As you can see, both approaches parse the result into the same dict.
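One caveat: json() raises ValueError (in recent versions a JSONDecodeError subclass of it) when the body is not valid JSON, so a small guard is useful. To keep this runnable offline, the Response objects below are built by hand via the internal _content attribute rather than fetched from the network:

```python
import requests
from requests.models import Response

def json_or_none(response):
    # json() raises ValueError when the body is not valid JSON
    try:
        return response.json()
    except ValueError:
        return None

# hand-built Response objects for offline illustration only
ok = Response()
ok.status_code = 200
ok.encoding = 'utf-8'
ok._content = b'{"args": {}}'            # valid JSON body

bad = Response()
bad.status_code = 200
bad.encoding = 'utf-8'
bad._content = b'<html>not json</html>'  # not JSON

print(json_or_none(ok))   # {'args': {}}
print(json_or_none(bad))  # None
```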

   

3. Fetching binary data

We often need to fetch images, videos, and other binary content. These come back as a binary stream, so they need different handling. Consider this example:

import requests

response = requests.get('https://www.baidu.com/img/baidu_jgylogo3.gif')
print(type(response.text), type(response.content))
print(response.text)
print(response.content)

>>> Output:
<class 'str'> <class 'bytes'>
(garbled characters from decoding the binary stream as text)
(raw binary data omitted)

We can then write it to a file:

import requests

response = requests.get('https://www.baidu.com/img/baidu_jgylogo3.gif')
with open('baidu.gif', 'wb') as f:  # the with block closes the file automatically
    f.write(response.content)

>>> Output:
the Baidu logo appears as a local image file
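For larger files it is better not to hold the whole body in memory at once. A sketch of a streaming variant (the commented-out call reuses the Baidu logo URL from above purely as an example):

```python
import requests

def download(url, path, chunk_size=8192):
    # stream=True defers downloading the body; iter_content then
    # yields it chunk by chunk so memory use stays flat
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)

# download('https://www.baidu.com/img/baidu_jgylogo3.gif', 'baidu.gif')
```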

  

4. Adding headers

When we need to simulate a browser (for example to log in), header information is essential. Adding headers with requests is also easy: just pass a dict to the headers argument.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.get('https://www.zhihu.com/topics', headers=headers)
print(response.status_code)

>>> Output:
200

Without the headers, Zhihu refuses the request; once they are added, the page opens.

 

(3) Basic POST Requests

A POST request needs a payload. With requests there is no need to encode it to bytes; just pass a dict to the data argument:

import requests

data = {'name': 'Boru', 'age': '18'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.post("http://httpbin.org/post", data=data, headers=headers)
print(response.json())

>>> Output:
{'args': {}, 'data': '', 'files': {}, 'form': {'age': '18', 'name': 'Boru'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '16', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}, 'json': None, 'origin': '210.41.98.60', 'url': 'http://httpbin.org/post'}
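Besides form data, requests can also send a JSON body directly through the json argument, which serializes the dict and sets the Content-Type header for you. Preparing the request offline shows what would actually be sent:

```python
import requests

# json= serializes the dict and sets Content-Type: application/json;
# .prepare() builds the request without sending it over the network
req = requests.Request('POST', 'http://httpbin.org/post',
                       json={'name': 'Boru', 'age': '18'}).prepare()
print(req.headers['Content-Type'])  # application/json
print(req.body)
```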

 

(4) Response Attributes

Let's look at the useful attributes a Response object offers:

import requests

response = requests.get('http://www.baidu.com')
print(type(response.status_code), response.status_code)
print(type(response.headers), response.headers)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)


>>> Output:
<class 'int'> 200
<class 'requests.structures.CaseInsensitiveDict'> {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'Keep-Alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Tue, 25 Sep 2018 14:38:34 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:36 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
<class 'str'> http://www.baidu.com/
<class 'list'> []

A note on checking status codes:

import requests

response = requests.get('http://www.jianshu.com/hello.html')
if response.status_code == requests.codes.not_found:
    print('404 Not Found')

>>> Output:
404 Not Found
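requests.codes maps readable names to numeric codes, and raise_for_status() turns any 4xx/5xx response into an exception. The Response below is constructed by hand so the demonstration runs without a network:

```python
import requests
from requests.models import Response

# requests.codes maps names to numeric status codes
assert requests.codes.ok == 200
assert requests.codes.not_found == 404

# raise_for_status() raises HTTPError for 4xx/5xx status codes
r = Response()
r.status_code = 404
r.url = 'http://www.jianshu.com/hello.html'

caught = None
try:
    r.raise_for_status()
except requests.HTTPError as e:
    caught = e
print('caught:', caught)
```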

For reference, here is the full mapping of status codes to their names in requests.codes:

100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),

# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect',
      'resume_incomplete', 'resume',),  # These 2 to be removed in 3.0

# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),

# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth',
      'network_authentication'),

 

(5) Advanced Usage

1. File upload:

To upload a file (for example an image we downloaded earlier) back to a server, open it in binary mode and pass it to the files argument:

import requests

files = {'file': open('favicon.ico', 'rb')}
response = requests.post("http://httpbin.org/post", files=files)
print(response.text)

>>> Output:
the response echoes the uploaded file contents
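The value in the files dict can also be a tuple to set the filename and content type explicitly, and any file-like object works; an in-memory buffer stands in for a real file here so the demonstration runs offline:

```python
import io
import requests

# (filename, file object, content type) — the file object here is
# an in-memory buffer instead of a file opened from disk
files = {'file': ('favicon.ico', io.BytesIO(b'\x00\x01\x02'), 'image/x-icon')}
req = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()
print(req.headers['Content-Type'])  # multipart/form-data; boundary=...
```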

    

2. Getting cookies:

Cookies are available through the response's cookies attribute and can be read as key-value pairs:

import requests

response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)

>>> Output:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
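To carry cookies across requests (for example to stay logged in), use a Session, which stores cookies and attaches them to every request it prepares. Preparing a request through the session shows the Cookie header that would be sent, with no network needed:

```python
import requests

s = requests.Session()
s.cookies.set('token', 'abc123')  # a cookie set on the session persists

# prepare_request merges session state (including cookies) into the request
prepped = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepped.headers.get('Cookie'))  # token=abc123
```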

    

3. Certificate verification:

Some sites fail SSL certificate verification. For example, running the following code:

import requests

response = requests.get('https://www.12306.cn')
print(response.status_code)

>>> Output:
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.12306.cn', port=443): Max retries exceeded with url: / ...

raises an SSLError. To get around it, set verify=False (and silence the resulting warning):

import requests
from requests.packages import urllib3

urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

>>> Output:
200
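Disabling verification is a last resort. A safer option is to point verify at a trusted CA bundle path; requests ships one via its certifi dependency:

```python
import certifi

# certifi.where() returns the path of the CA bundle that requests
# uses for verification by default
ca_bundle = certifi.where()
print(ca_bundle)

# the bundle path (or any other CA bundle) can be passed explicitly:
# requests.get('https://www.12306.cn', verify=ca_bundle)
```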

    

4. Proxies:

Setting a proxy is also easy. For an HTTP proxy, just pass a proxies dict:

import requests

proxies = {
    "http": "http://127.0.0.1:9743",
    "https": "https://127.0.0.1:9743",
}

response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

>>> Output:
200

If the proxy requires authentication, add the credentials in user:password@ form:

import requests

proxies = {
    "http": "http://user:password@127.0.0.1:9743/",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

>>> Output:
200

For a SOCKS proxy rather than an HTTP/HTTPS one, install the SOCKS extra first, then proceed the same way:

# install SOCKS support first:
# pip3 install 'requests[socks]'

import requests

proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

>>> Output:
200

    

5. Timeouts

Just set a timeout. With timeout=0.5, the server must respond within 0.5 s, otherwise a ReadTimeout exception is raised, which we can catch with an exception handler:

import requests
from requests.exceptions import ReadTimeout

try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')

>>> Output:
Timeout
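The timeout can also be a (connect, read) tuple to bound the two phases separately. The sketch below tolerates every outcome so it runs regardless of network conditions:

```python
import requests

outcome = None
try:
    # up to 3.05 s to establish the connection, 0.5 s to receive a response
    response = requests.get('http://httpbin.org/get', timeout=(3.05, 0.5))
    outcome = response.status_code
except requests.exceptions.Timeout:
    outcome = 'timeout'
except requests.exceptions.RequestException:
    outcome = 'error'
print(outcome)
```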

    

 

6. Authentication

Some sites require a login before granting access. For these, pass the credentials as a tuple to the auth argument:

import requests

r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)

>>> Output:
200
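The tuple form is shorthand for HTTPBasicAuth. Preparing the request offline shows the Authorization header it produces (the base64 encoding of user:123):

```python
import requests
from requests.auth import HTTPBasicAuth

# auth=('user', '123') is equivalent to auth=HTTPBasicAuth('user', '123');
# both add a Basic Authorization header to the prepared request
req = requests.Request('GET', 'http://120.27.34.24:9001',
                       auth=HTTPBasicAuth('user', '123')).prepare()
print(req.headers['Authorization'])  # Basic dXNlcjoxMjM=
```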

    

7. Exception handling

The API documentation lists all the exceptions. Here we catch a few subclasses first and the parent RequestException last, which helps pinpoint where a request went wrong:

import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    response = requests.get('http://httpbin.org/get', timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')

>>> Output:
Connection error

 

posted @ 2018-09-26 17:19  A-handsome-cxy