Learning Python: requests module basics

Installed version: 2.18

Importing the module: import requests

• Sending requests

Send a GET request:

Fetch GitHub's public timeline:

r = requests.get(url='https://api.github.com/events')

r is now a Response object, from which you can read whatever information you need.

Send a POST request:

r = requests.post(url='http://httpbin.org/post', data={'key':'value'})

Send a PUT request:

r = requests.put(url='http://httpbin.org/put', data={'key':'value'})

Send a DELETE request:

r = requests.delete(url='http://httpbin.org/delete')

Send a HEAD request:

r = requests.head(url='http://httpbin.org/get')

Send an OPTIONS request:

r = requests.options(url='http://httpbin.org/get')

Those are the basic ways to send a request with Requests.
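The helper functions above all go through the same generic entry point, requests.request(method, url, ...). As a rough sketch that needs no network access, the snippet below only builds one request per verb and inspects it. The VERBS and URL names are illustrative and not part of the original post.

```python
import requests

VERBS = ('GET', 'POST', 'PUT', 'DELETE', 'HEAD', 'OPTIONS')
URL = 'http://httpbin.org/get'  # reused from the examples above; nothing is sent

# Build (but do not send) one request per HTTP verb and inspect the result.
for verb in VERBS:
    prepared = requests.Request(verb, URL).prepare()
    print(prepared.method, prepared.url)

# Actually sending looks the same for every verb, for example:
#     r = requests.request('OPTIONS', URL, timeout=5)
```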

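• Passing URL parameters

Requests lets you supply query-string arguments as a dictionary of strings through the params keyword argument. For example, to pass key1=value1 and key2=value2 to httpbin.org/get:

payload = {'key1': 'value1', 'key2': 'value2'}

r = requests.get('http://httpbin.org/get', params=payload)

Print the resulting URL: print(r.url)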

http://httpbin.org/get?key1=value1&key2=value2

Note: any dictionary key whose value is None is not added to the URL's query string.

You can also pass a list as a value:

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}

r = requests.get('http://httpbin.org/get', params=payload)

Print the resulting URL: print(r.url)

http://httpbin.org/get?key1=value1&key2=value2&key2=value3
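To check the three cases above (plain values, a None value, and a list value) without contacting httpbin.org, one option is to prepare the request and read its url attribute. This is a minimal sketch; build_url is a hypothetical helper name introduced only for this illustration.

```python
import requests

URL = 'http://httpbin.org/get'

def build_url(params):
    """Return the final URL requests would use, without sending anything."""
    return requests.Request('GET', URL, params=params).prepare().url

print(build_url({'key1': 'value1', 'key2': 'value2'}))
print(build_url({'key1': 'value1', 'key2': None}))  # the None key is dropped
print(build_url({'key1': 'value1', 'key2': ['value2', 'value3']}))
```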

• Response content

Read the content of the server's response, again using the GitHub timeline as the example:

import requests

r = requests.get(url='https://api.github.com/events')

print(r.text)

[{"id":"7610277004","type":"IssuesEvent","actor":{"id":1049678,"login":"tkurki","display_login":"tkurki","gravatar_id":"","url":"https://api.github.com/users/tkurki","avatar_url":"https://avatars.githubusercontent.com/u/1049678?"},"repo":{"id":58462216,"name":"vazco/uniforms","url":"https://api.github.com/repos/vazco/uniforms"},"payload":{"action":"opened",…………

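Requests automatically decodes content from the server; most Unicode character sets are decoded seamlessly.

After a request is sent, Requests makes an educated guess about the response encoding based on the HTTP headers. When you access r.text, Requests decodes the body using that guessed encoding. You can find out which encoding Requests is using, and change it, through the r.encoding attribute: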

r = requests.get(url='https://api.github.com/events')

print(r.encoding)

Output (the default encoding): utf-8

r.encoding='ISO-8859-1'

print(r.encoding)

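Output (the encoding after the change): ISO-8859-1

Once the encoding is changed, Requests uses the new value of r.encoding every time you access r.text.

For example, HTML and XML documents can declare their own encoding. In that case, inspect r.content to find the declared encoding and set r.encoding to match, so that r.text is decoded correctly: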

r = requests.get(url='http://www.etongbao.com.cn')

r.content

b'<!DOCTYPE html>\n<html lang="zh-CN">\n <head>\n <meta charset="utf-8">\n …..

This prints the entire page content as bytes.
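The effect of r.encoding on r.text can be sketched offline. The snippet below assumes a hand-built Response whose private _content attribute is filled in manually; that is a stand-in for a real download and not something normal code should do. The sample HTML is made up.

```python
import requests

# Offline stand-in for a real response: _content is a private attribute,
# set by hand here only so the example needs no network access.
r = requests.Response()
r._content = '<meta charset="utf-8"><p>中文</p>'.encode('utf-8')

r.encoding = 'ISO-8859-1'   # wrong guess: r.text comes out garbled
garbled = r.text

r.encoding = 'utf-8'        # the encoding declared inside the document
fixed = r.text

print(garbled != fixed, '中文' in fixed)
```

Because r.text is decoded again on each access, changing r.encoding takes effect immediately.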

• Binary response content

r.content

b'<!DOCTYPE html>\n<html lang="zh-CN">\n <head>\n <meta charset="utf-8">\n …..

Requests automatically decodes gzip- and deflate-encoded response data for you.

For example, to create an image from the binary data returned by a request:

from PIL import Image

from io import BytesIO

i = Image.open(BytesIO(r.content))

• JSON response content

Requests has a built-in JSON decoder to help you work with JSON data.

import requests

r = requests.get(url='https://api.github.com/events')

print(type(r.json()))

print(r.json())

Output:

{'message': "API rate limit exceede

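If JSON decoding fails, r.json() raises an exception, for example ValueError: No JSON object could be decoded. However, some servers include a JSON object in a failed response, and that JSON is decoded and returned. A successful call to r.json() therefore does not mean the request succeeded. To check whether it did, use:

r.raise_for_status(), or check that r.status_code matches the value you expect.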
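A defensive way to call r.json() might look like the sketch below. It relies on the decoding error being a ValueError (or a subclass of it), and it uses a hand-built Response with a manually set private _content attribute so that it runs offline. parse_json and the sample bodies are hypothetical.

```python
import requests

def parse_json(body):
    """Decode a response body; return None if it is not valid JSON."""
    r = requests.Response()   # offline stand-in for a real response
    r._content = body         # private attribute, set by hand for the demo
    try:
        return r.json()
    except ValueError:        # raised when the body cannot be decoded
        return None

print(parse_json(b'{"message": "Not Found"}'))
print(parse_json(b'<html>not json</html>'))
```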

• Raw response content

To get the raw socket response from the server, set stream=True in the initial request:

import requests

r = requests.get(url='https://api.github.com/events', stream = True)

print(r.raw)

Returns: <urllib3.response.HTTPResponse object at 0x023C12F0>

print(r.raw.read(10))

Returns: b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03' (the first 10 bytes of the raw content)

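In general, use a pattern like this to save a streamed response to a file:

import requests

r = requests.get(url='https://api.github.com/events', stream=True)

with open('test', 'wb') as fb:

    # chunk_size is in bytes; 1024 is an arbitrary choice

    for chunk in r.iter_content(chunk_size=1024):

        fb.write(chunk)

Using r.iter_content handles a lot of what you would otherwise have to deal with yourself when using r.raw directly.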
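The same pattern can be wrapped in a small function and tried without a network connection. In this sketch save_stream is a hypothetical helper, and the "response" is a hand-built Response whose raw stream is an in-memory buffer standing in for a real socket.

```python
import io
import os
import tempfile

import requests

def save_stream(response, path, chunk_size=1024):
    """Write a streamed response body to path; return the bytes written."""
    written = 0
    with open(path, 'wb') as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += fh.write(chunk)
    return written

# Offline stand-in: a Response whose raw stream is an in-memory buffer.
fake = requests.Response()
fake.raw = io.BytesIO(b'x' * 2500)

target = os.path.join(tempfile.mkdtemp(), 'test.bin')
print(save_stream(fake, target))
```

With a real download, the response would come from requests.get(url, stream=True) instead.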

• Custom headers

To add HTTP headers to a request, simply pass a dict to the headers parameter.

url = 'https://api.github.com/events'

headers = {'user-agent':'my-app/1.0.0'}

r = requests.get(url, headers=headers)

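Note: custom headers are given lower precedence than certain more specific sources of information. For example:

• If credentials are specified in .netrc, an Authorization header set with headers= does not take effect; the .netrc credentials are in turn overridden by the auth= parameter.

• Authorization headers are removed if you are redirected to a different host.

• Proxy-Authorization headers are overridden by proxy credentials provided in the URL.

• Content-Length headers are overridden when Requests can determine the length of the content.

Furthermore, Requests does not change its behavior based on which custom headers are specified. The headers are simply passed on into the final request.

Note: all header values must be a string, bytestring, or unicode. Passing unicode header values is permitted but not recommended.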
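To see which headers would go out without sending anything, you can prepare the request and inspect it. This is only a sketch: a request prepared this way does not include the defaults that a session adds when it actually sends the request.

```python
import requests

url = 'https://api.github.com/events'
headers = {'user-agent': 'my-app/1.0.0'}

# Prepare the request without sending it, then inspect the outgoing headers.
prepared = requests.Request('GET', url, headers=headers).prepare()
print(prepared.headers['User-Agent'])
```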

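• More complicated POST requests

To send form-encoded data, simply pass a dictionary to the data parameter. The dictionary is form-encoded automatically when the request is made: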

import requests

payload = {'key1':'value1','key2':'value2'}

r = requests.post(url='http://httpbin.org/post', data=payload)

print(r.text)

Output:

"form": {

"key1": "value1",

"key2": "value2"

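You can also pass a tuple of (key, value) tuples to the data parameter, which is useful when the form has multiple elements that share the same key: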

import requests

payload = (('key1', 'value1'),('key1','value2'))

r = requests.post(url='http://httpbin.org/post', data=payload)

print(r.text)

Output:

"form": {

"key1": [

"value1",

"value2"

]
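The form encoding itself can be inspected offline by preparing the request and reading its body. form_body is a hypothetical helper used only for this sketch.

```python
import requests

URL = 'http://httpbin.org/post'

def form_body(data):
    """Return the encoded body requests would send for data (nothing is sent)."""
    return requests.Request('POST', URL, data=data).prepare().body

print(form_body({'key1': 'value1', 'key2': 'value2'}))
print(form_body((('key1', 'value1'), ('key1', 'value2'))))  # repeated key
```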

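If you pass a string instead of a dict, the data is posted as-is.

For example, the GitHub API v3 accepts JSON-encoded POST/PATCH data:

import json

import requests

url = 'https://api.github.com/some/endpoint'

payload = {'some': 'data'}

r = requests.post(url, data=json.dumps(payload))

Instead of encoding the dict yourself, you can pass it directly with the json parameter and it is encoded automatically. This was added in version 2.4.2:

import requests

url = 'https://api.github.com/some/endpoint'

payload = {'some': 'data'}

r = requests.post(url, json=payload)

Here payload is converted to JSON automatically.

data=json.dumps(payload) and json=payload send the same request body. The one difference is that json= also sets the Content-Type header to application/json, while passing a string through data= leaves Content-Type unset.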
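The two ways of posting JSON can be compared side by side without contacting the (placeholder) endpoint by preparing both requests. This is a sketch; as_data and as_json are illustrative names.

```python
import json

import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

as_data = requests.Request('POST', url, data=json.dumps(payload)).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()

# Same JSON document in both bodies ...
print(json.loads(as_data.body) == json.loads(as_json.body))
# ... but only json= sets the Content-Type header.
print(as_data.headers.get('Content-Type'))
print(as_json.headers.get('Content-Type'))
```

If a server insists on a JSON content type, the data= form needs an explicit headers={'Content-Type': 'application/json'}.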

• POSTing a multipart-encoded file

Requests makes it simple to upload multipart-encoded files:

import requests

url = 'http://httpbin.org/post'

files = {'file':open('t1','rb')}

r = requests.post(url, files=files)

print(r.text)

Output:

{ "files": {

"file": "zhaoyong\r\nzhaoyong\r\nzhaoyong"

}

You can set the filename, content type, and headers explicitly:

import requests

url = 'http://httpbin.org/post'

files = {'file':('t1', open('t1','rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

r = requests.post(url, files=files)

print(r.text)

You can also send a string to be received as a file:

import requests

url = 'http://httpbin.org/post'

files = {'file':('t2','zhaoyong,zhoayong,zhaoyong')}

r = requests.post(url, files=files)

print(r.text)

Output:

"files": {

"file": "zhaoyong,zhoayong,zhaoyong"

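If you post a very large file as a multipart/form-data request, you may want to stream it. Requests does not support this by default, but a separate third-party package, requests-toolbelt, does. See the toolbelt documentation: http://toolbelt.readthedocs.io/en/latest/

To send multiple files in one request, see: http://docs.python-requests.org/zh_CN/latest/user/advanced.html#advanced

Warning: always open files in binary mode. Requests may attempt to provide the Content-Length header for you, and if it does, the value is set to the number of bytes in the file. Errors may occur if you open the file in text mode.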
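To see what a multipart upload looks like on the wire, the request can be prepared and printed without sending it. In this sketch the file name, CSV content, and content type are made up for illustration.

```python
import requests

url = 'http://httpbin.org/post'

# (filename, content, content_type): illustrative values, not a real file.
files = {'file': ('report.csv', b'a,b\n1,2\n', 'text/csv')}

prepared = requests.Request('POST', url, files=files).prepare()
print(prepared.headers['Content-Type'])    # multipart/form-data with a boundary
print(prepared.headers['Content-Length'])  # byte length of the encoded body
print(prepared.body.decode('utf-8'))
```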

• Response status codes

Check the response status code:

import requests

r = requests.get(url='http://httpbin.org/get')

print(r.status_code)

Output: 200

For a failed request, call raise_for_status() to raise an exception. If the request succeeded, raise_for_status() returns None.

import requests

bad_r = requests.get('http://httpbin.org/status/404')

bad_r.status_code

Output: 404

bad_r.raise_for_status()

Output:

Traceback (most recent call last):

File "D:/AutoCobbler/dellIdrac/idrac_api.py", line 35, in <module>

bad_r.raise_for_status()

File "C:\Python36-32\lib\site-packages\requests\models.py", line 935, in raise_for_status

raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: http://httpbin.org/status/404
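In application code the traceback above is usually avoided by catching HTTPError. The sketch below uses hand-built Response objects so that it runs offline; check is a hypothetical helper, and because the fake responses carry no reason phrase the error message shows None where a real one would show NOT FOUND.

```python
import requests

def check(status_code):
    """Return 'ok' or the HTTPError message for a response with this status."""
    r = requests.Response()          # offline stand-in for a real response
    r.status_code = status_code
    r.url = 'http://httpbin.org/status/%d' % status_code
    try:
        r.raise_for_status()         # returns None for non-error statuses
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return 'ok'

print(check(200))
print(check(404))
```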

• Response headers

r.headers exposes the server's response headers as a Python dictionary:

{

    'content-encoding': 'gzip',

    'transfer-encoding': 'chunked',

    'connection': 'close',

    'server': 'nginx/1.0.4',

    'x-runtime': '148ms',

    'etag': '"e1ca502697e5c9317743dc078f67693f"',

    'content-type': 'application/json'

}

Note: HTTP header names are case-insensitive.

You can therefore access these header fields using any capitalization:

url = 'http://httpbin.org/post'

files = {'file':('t2','zhaoyong,zhoayong,zhaoyong')}

r = requests.post(url, files=files)

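print(r.headers['content-type'])  # dictionary-style access

print(r.headers.get('content-type'))  # access through get()

One special case: a server may send the same header several times, each time with a different value. Requests combines them so that they can be represented in a single mapping.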
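The case-insensitive behavior comes from the mapping type that r.headers uses, requests.structures.CaseInsensitiveDict. It can be tried on its own; the header value here is made up.

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

print(headers['content-type'])       # lookup ignores case
print(headers.get('CONTENT-TYPE'))
print(list(headers))                 # original capitalization is preserved
```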

• Cookies

Reading a cookie from a response:

import requests

url = 'http://example.com/some/cookie/setting/url'

r = requests.get(url)

r.cookies['example_cookie_name']

Sending cookies to the server:

import requests

url = 'http://httpbin.org/cookies'

cookies = dict(cookies_are='working')

r = requests.get(url, cookies=cookies)

print(r.text)

Output:

  "cookies": {

    "cookies_are": "working"

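Cookies are returned in a RequestsCookieJar, which behaves like a dict but is also suited to use across multiple domains or paths. A cookie jar can also be passed in to a request: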

import requests

jar = requests.cookies.RequestsCookieJar()

jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')

jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

url = 'http://httpbin.org/cookies'

r = requests.get(url, cookies=jar)

print(r.text)

Output:

  "cookies": {

    "tasty_cookie": "yum"

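• Redirection and history

By default, Requests follows redirects automatically for every verb except HEAD. You can use the history attribute of the response to trace the redirects.

Response.history is a list of the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.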

import requests

r = requests.get(url='http://github.com')

print(r.url)

print(r.status_code)

print(r.history)

Output:

https://github.com/

[<Response [301]>]

If you are using GET, POST, OPTIONS, PUT, PATCH, or DELETE, you can disable redirect handling with the allow_redirects parameter:

import requests

r = requests.get(url='http://github.com', allow_redirects=False)

print(r.url)

print(r.history)

Output:

http://github.com/

[]

If you are using HEAD, you can enable redirection as well:

import requests

r = requests.head(url='http://github.com', allow_redirects=True)

print(r.url)

print(r.history)

Output:

https://github.com/

[<Response [301]>]
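One way to turn r.history into a readable trace is a small helper that lists every hop followed by the final response. In this sketch redirect_chain and fake_response are hypothetical names, and the responses are built by hand so the example runs offline; with a real request you would pass the result of requests.get(...) directly.

```python
import requests

def redirect_chain(response):
    """Return (status_code, url) pairs, oldest hop first, final response last."""
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]

def fake_response(status_code, url):
    """Build an offline stand-in response (demo only)."""
    r = requests.Response()
    r.status_code = status_code
    r.url = url
    return r

final = fake_response(200, 'https://github.com/')
final.history = [fake_response(301, 'http://github.com/')]
print(redirect_chain(final))
```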

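• Timeouts

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Without a timeout, your program may hang indefinitely.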

import requests

r = requests.head(url='http://github.com', timeout=0.001)

Output:

requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='github.com', port=80): Max retries exceeded with url:......................

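Note: timeout is not a time limit on the entire response download. It bounds how long Requests waits to establish the connection and how long it waits for the server to send data: an exception is raised if the server has not responded within timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).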
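A timeout is normally paired with an exception handler. The sketch below assumes a hypothetical helper, fetch_status, and uses a (connect, read) tuple for the timeout, which Requests accepts in place of a single number; the specific values are illustrative.

```python
import requests

def fetch_status(url, timeout=(3.05, 27)):
    """Return the status code, or None if the request timed out.

    timeout is a (connect, read) tuple in seconds.
    """
    try:
        return requests.head(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return None

# Example call (needs network access):
#     fetch_status('http://github.com', timeout=0.001)
```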

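• Errors and exceptions

In the event of a network problem (DNS failure, refused connection, and so on), Requests raises a ConnectionError exception.

If the HTTP request returned an unsuccessful status code, Response.raise_for_status() raises an HTTPError exception.

If a request times out, a Timeout exception is raised.

If a request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.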
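Because of that shared base class, a single except clause can cover every failure mode listed above. The sketch below first prints the hierarchy and then defines safe_get, a hypothetical wrapper that returns None on any Requests error.

```python
import requests
from requests import exceptions as exc

# Every exception named above derives from RequestException.
for cls in (exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.TooManyRedirects):
    print(cls.__name__, issubclass(cls, exc.RequestException))

def safe_get(url, **kwargs):
    """Return the response, or None on any Requests error."""
    try:
        r = requests.get(url, **kwargs)
        r.raise_for_status()   # turn 4xx/5xx statuses into HTTPError
        return r
    except exc.RequestException:
        return None
```

Catching the base class is convenient for scripts; code that needs to react differently to timeouts and HTTP errors would catch the specific subclasses instead.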

Posted 2018-05-01 22:26 by 十年如一..bj
