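Using the Python requests module

Introduction to requests

To work with the network in Python, that is, to open a website or call an HTTP API, you can use the urllib module.
urllib is part of the standard library, so a plain import urllib is all you need. Python 3 has only urllib, whereas Python 2 has both urllib and urllib2.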

import json
from urllib import request
from urllib import parse

pay_url = 'http://szz.xxxx.cn/pay'
balance_url = 'http://szz.xxxx.cn/get_balance'

balance_data = {'user_id': 1}
new_balance_data = parse.urlencode(balance_data)  # encode the parameters as user_id=1
balance_req = request.urlopen(balance_url + '?' + new_balance_data)  # send a GET request
# read() returns bytes, so decode() it to get a string
print(balance_req.read().decode())

pay_data = {"user_id": 1, "price": "999"}
new_pay_data = parse.urlencode(pay_data)  # encode the parameters as user_id=1&price=999
# Passing data makes urlopen send a POST request; the data must be bytes, so encode() it first
pay_req = request.urlopen(pay_url, new_pay_data.encode())
# The response body can only be read once, so keep the decoded text in a variable
pay_text = pay_req.read().decode()
print(pay_text)
# The response is a JSON string; json.loads() turns it into a dict
res = json.loads(pay_text)
print(res)

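The code above uses Python's built-in urllib module to request a website or an API, but urllib is cumbersome: request parameters must be passed as bytes, the response comes back as bytes and has to be decoded, turning the result into something usable takes an extra json step, and GET and POST requests are sent in different ways. The requests module is far more convenient. It is built on top of urllib3 (a third-party library, not the standard-library urllib) and sends HTTP requests and reads the responses with very little code. It is not part of the standard library, so it has to be installed first with pip install requests.
With the more powerful requests library, cookies, login authentication, proxy settings and similar tasks are all straightforward.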

Installation

pip install requests

Official documentation: http://docs.python-requests.org/zh_CN

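Using requests

Let's start with a simple example:

# First, import the requests module
import requests
# Now let's fetch a web page; in this example, GitHub's public timeline:
r = requests.get('https://api.github.com/events')
# We now have a Response object called r, from which we can get all the information we need.
# Print the response body parsed as JSON
print(r.json())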

Request methods

requests provides a function for each HTTP request method:

r = requests.get('https://api.github.com/events')  # HTTP GET request
r = requests.post('https://httpbin.org/post', data={'key': 'value'})  # HTTP POST request
r = requests.put('https://httpbin.org/put', data={'key': 'value'})  # HTTP PUT request
r = requests.delete('https://httpbin.org/delete')  # HTTP DELETE request
r = requests.head('https://httpbin.org/get')  # HTTP HEAD request
r = requests.options('https://httpbin.org/get')  # HTTP OPTIONS request

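Passing parameters in the URL

You often want to send some data in the URL's query string. If you build the URL by hand, this data goes after a question mark as key/value pairs, for example httpbin.org/get?key=val. Requests lets you supply these arguments as a dictionary of strings through the params keyword argument. For example, to pass key1=value1 and key2=value2 to httpbin.org/get, you would use the following code:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)

Printing the URL shows that it has been encoded correctly:

print(r.url)
# https://httpbin.org/get?key1=value1&key2=value2

Note that any dictionary key whose value is None is not added to the URL's query string.

A value can also be a list, in which case the key is repeated once per item. For example: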

import requests

payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': None}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)
# https://httpbin.org/get?key1=value1&key2=value2&key2=value3
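
To check how a params dictionary will be encoded without sending anything over the network, one option is to build the request and prepare it locally. The sketch below assumes only that requests is installed; build_url is a hypothetical helper name used for illustration, not part of the library.

```python
import requests


def build_url(base_url, params):
    """Return the final URL requests would use, without sending the request."""
    prepared = requests.Request('GET', base_url, params=params).prepare()
    return prepared.url


payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': None}
url = build_url('https://httpbin.org/get', payload)
print(url)
# https://httpbin.org/get?key1=value1&key2=value2&key2=value3
```

The list value repeats key2 once per item, and key3 is dropped because its value is None.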

Response content

We can read the content of the server's response. Take the GitHub timeline again:

import requests

r = requests.get('https://api.github.com/events')
print(r.status_code)  # the HTTP status code
# 200
print(r.content)  # the body as bytes; typically used when downloading images or videos
# b'[{"id":"15091789643","type":"WatchEvent","actor":{"id":77262497,"login":"dcryptic","display_login"'……
print(r.text)  # the body decoded to a string
# [{"id":"15091789643","type":"WatchEvent","actor":{"id":77262497,"login":"dcryptic","display_login"……
print(r.json())  # the body parsed as JSON; raises an exception if the body is not valid JSON
print(r.headers)  # the response headers
# {'Date': 'Sat, 06 Feb 2021 17:04:41 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Server': 'GitHub.com'
print(r.cookies)  # the cookies set by the response
# <RequestsCookieJar[]>
print(r.encoding)  # the encoding used to decode r.text
# utf-8

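Reading the raw response

To get the raw socket response from the server, access r.raw; this requires stream=True in the original request: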

import requests

r = requests.get('https://api.github.com/events', stream=True)

print(r.raw)
# <urllib3.response.HTTPResponse object at 0x101194810>
print(r.raw.read(10))
# b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

# In general, save a streamed response to a file like this
filename = 'tmp.txt'
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

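Request headers

To add HTTP headers to a request, simply pass a dict to the headers parameter.
For example, the previous examples did not specify a user agent; here we set one: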

import requests


url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
print(r.request.headers)
# {'user-agent': 'my-app/0.0.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

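Submitting forms

Often you want to send some form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data parameter. The dictionary is automatically form-encoded when the request is made: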

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("https://httpbin.org/post", data=payload)
print(r.text)

{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

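The data parameter can also carry multiple values for a key. This is done by making data either a list of tuples or a dictionary with lists as values, which is particularly useful when a form has several elements that share the same key: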

import requests
payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
r1 = requests.post('https://httpbin.org/post', data=payload_tuples)
payload_dict = {'key1': ['value1', 'value2']}
r2 = requests.post('https://httpbin.org/post', data=payload_dict)
print(r1.text)
print(r1.text == r2.text)

{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}

True

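Submitting JSON

If you pass a string instead of a dict to data, it is posted as-is, so you can serialize the payload to JSON yourself: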

import json
import requests

url = 'https://httpbin.org/post'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
print(r.text)

{
  "args": {}, 
  "data": "{\"some\": \"data\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "16", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.0", 
    "X-Amzn-Trace-Id": "Root=1-601fe95c-2c0d0fa64cbe4b864ffacf27"
  }, 
  "json": {
    "some": "data"
  }, 
  "origin": "115.205.12.169", 
  "url": "https://httpbin.org/post"
}

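Instead of encoding the dict yourself, you can pass it directly using the json parameter (added in version 2.4.2), and it will be encoded automatically: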

import requests

url = 'https://httpbin.org/post'
payload = {'some': 'data'}
r = requests.post(url, json=payload)
print(r.text)

Using the json parameter sets the Content-Type header of the request to application/json.
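
The difference between data= and json= can be inspected locally by preparing the requests without sending them. This is a minimal sketch, assuming only that requests is installed; the variable names are illustrative.

```python
import json

from requests import Request

url = 'https://httpbin.org/post'
payload = {'some': 'data'}

# Prepare (but do not send) the same payload three different ways.
as_form = Request('POST', url, data=payload).prepare()
as_raw_string = Request('POST', url, data=json.dumps(payload)).prepare()
as_json = Request('POST', url, json=payload).prepare()

print(as_form.headers.get('Content-Type'))        # application/x-www-form-urlencoded
print(as_raw_string.headers.get('Content-Type'))  # None
print(as_json.headers.get('Content-Type'))        # application/json

# The json= body is the serialized payload.
print(json.loads(as_json.body) == payload)  # True
```

Note that passing a pre-serialized string through data= leaves Content-Type unset, so a server that checks the header may not treat the body as JSON unless you add the header yourself.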

Uploading files

import requests

url = 'https://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
print(r.text)

{
  "args": {}, 
  "data": "", 
  "files": {
    "file": "data:application/octet-......
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "8194", 
    "Content-Type": "multipart/form-data; boundary=e961c3051731ba84d4ecac7697a015f6", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.0", 
    "X-Amzn-Trace-Id": "Root=1-601feb60-73cbbd406614abd75f343d59"
  }, 
  "json": null, 
  "origin": "115.205.12.169", 
  "url": "https://httpbin.org/post"
}

You can set the filename, content_type and headers explicitly:

import requests

url = 'https://httpbin.org/post'
files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
r = requests.post(url, files=files)
print(r.text)

{
  "args": {}, 
  "data": "", 
  "files": {
    "file": "data:application/vnd.ms-......
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "8246", 
    "Content-Type": "multipart/form-data; boundary=a9e77a61afc30daf48a9009ab7f36919", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.0", 
    "X-Amzn-Trace-Id": "Root=1-601fec11-33c5b2ca0ae27f0457f9b81b"
  }, 
  "json": null, 
  "origin": "115.205.12.169", 
  "url": "https://httpbin.org/post"
}

If you want, you can send strings to be received as files:

import requests

url = 'https://httpbin.org/post'
files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
r = requests.post(url, files=files)
print(r.text)

{
  "args": {}, 
  "data": "", 
  "files": {
    "file": "some,data,to,send\nanother,row,to,send\n"
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "184", 
    "Content-Type": "multipart/form-data; boundary=9d25ccc26787bbb3947354fd9fde5b24", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.0", 
    "X-Amzn-Trace-Id": "Root=1-601fec95-170554e81ee8cf923e3b02df"
  }, 
  "json": null, 
  "origin": "115.205.12.169", 
  "url": "https://httpbin.org/post"
}
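
To see what the files parameter does to a request, the multipart body can be built locally without contacting a server. The following is a sketch under the assumption that requests is installed; it reuses the string-as-file payload from the example above.

```python
from requests import Request

files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
prepared = Request('POST', 'https://httpbin.org/post', files=files).prepare()

# requests picks the multipart content type and generates a boundary itself.
content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])  # multipart/form-data

# The body is bytes; each part carries the field name, file name and content.
print(b'filename="report.csv"' in prepared.body)  # True
print(b'some,data,to,send' in prepared.body)      # True
```

Because the boundary is generated per request, it is best not to set the Content-Type header by hand when using files.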

Disabling and enabling redirects

GitHub redirects all HTTP requests to HTTPS:

import requests

r = requests.get('http://github.com/')
print(r.url)
# 'https://github.com/'
print(r.status_code)
# 200
print(r.history)
# [<Response [301]>]

Redirect handling can be disabled with the allow_redirects parameter:

import requests

r = requests.get('http://github.com/', allow_redirects=False)
print(r.url)
# http://github.com/
print(r.status_code)
# 301
print(r.history)
# []

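Timeouts

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failing to do so can cause your program to hang indefinitely: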

import requests

requests.get('https://github.com/', timeout=0.001)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
'''

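A single timeout value is applied to both the connect and the read timeouts. To set the two values separately, pass a tuple:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as the timeout value:
r = requests.get('https://github.com', timeout=None)

Most requests to external servers should have a timeout attached in case the server does not respond in time. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or more. Note that timeout is not a limit on the whole download: the exception is raised when the server has sent nothing for that many seconds.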

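Errors and exceptions

In the event of a network problem (for example DNS failure or a refused connection), Requests raises a ConnectionError exception.

ConnectionError: a network problem (DNS failure, refused connection, and so on).
HTTPError: raised in the rare event of an invalid HTTP response; Response.raise_for_status() also raises it when the response has an unsuccessful (4xx or 5xx) status code.
Timeout: the request timed out.
TooManyRedirects: the request exceeded the configured maximum number of redirects.
requests.exceptions.RequestException is the base class of all these exceptions.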
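
One way to put this hierarchy to use is to catch the specific exceptions first and fall back to the base class last. The sketch below is only an illustration: fetch_status and the labels it returns are hypothetical names, not part of requests.

```python
import requests


def fetch_status(url, timeout=5):
    """Return the HTTP status code, or a short error label if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        return response.status_code
    except requests.exceptions.Timeout:
        return 'timeout'
    except requests.exceptions.ConnectionError:
        return 'connection error'
    except requests.exceptions.HTTPError as exc:
        return 'http error: {}'.format(exc.response.status_code)
    except requests.exceptions.RequestException as exc:
        # Base class: catches everything else requests can raise,
        # e.g. MissingSchema for a URL without http:// or https://
        return 'request failed: {}'.format(type(exc).__name__)


# A URL without a scheme fails before any network traffic happens.
print(fetch_status('not-a-url'))  # request failed: MissingSchema
```

Timeout is caught before ConnectionError on purpose: a connect timeout inherits from both, so the order of the except clauses decides which label it gets.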

Session

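A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the Session instance and uses urllib3's connection pooling. So if you make several requests to the same host, the underlying TCP connection is reused, which can result in a significant performance increase.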

import requests

s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

Sessions can also be used to provide default data to the request methods. This is done by setting attributes on the Session object:

import requests

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' are sent
r = s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})
print(r.request.headers)
'''
{'User-Agent': 'python-requests/2.25.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'x-test': 'true', 'x-test2': 'true', 'Authorization': 'Basic dXNlcjpwYXNz'}
'''

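Note, however, that method-level parameters are not persisted across requests, even when using a session. This example sends the cookies only with the first request, not with the second: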

import requests

s = requests.Session()

r = s.get('https://httpbin.org/cookies', cookies={'from-my': 'browser'})
print(r.text)
# '{"cookies": {"from-my": "browser"}}'

r = s.get('https://httpbin.org/cookies')
print(r.text)
# '{"cookies": {}}'
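
The same behaviour can be observed offline by preparing requests through the session instead of sending them. This is a rough sketch, assuming requests is installed; the cookie names and values are illustrative.

```python
from requests import Request, Session

s = Session()
s.cookies.set('sessioncookie', '123456789')  # stored on the session itself

url = 'https://httpbin.org/cookies'

# Method-level cookies are merged into this one request only.
first = s.prepare_request(Request('GET', url, cookies={'from-my': 'browser'}))
second = s.prepare_request(Request('GET', url))

print(sorted(first.headers['Cookie'].split('; ')))
# ['from-my=browser', 'sessioncookie=123456789']
print(second.headers['Cookie'])
# sessioncookie=123456789
print(s.cookies.get_dict())
# {'sessioncookie': '123456789'}
```

To make a cookie stick for later requests, set it on s.cookies rather than passing it through the cookies argument of a single call.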

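Sessions can also be used as context managers. This makes sure the session is closed as soon as the with block is exited, even if an unhandled exception occurred.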

with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')

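Sometimes you will want to omit a session-level key from a dict parameter. To do this, simply set that key's value to None in the method-level parameter, and it will be omitted automatically.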
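
As a small illustration of omitting a session-level key, the sketch below prepares a request through a session and checks which headers survive. It assumes requests is installed; the header names x-test and x-other are made up for the example.

```python
from requests import Request, Session

s = Session()
s.headers.update({'x-test': 'true', 'x-other': 'kept'})

# Setting a session-level key to None at method level drops it for this request.
req = Request('GET', 'https://httpbin.org/headers', headers={'x-test': None})
prepared = s.prepare_request(req)

print('x-test' in prepared.headers)  # False
print(prepared.headers['x-other'])   # kept
print(s.headers['x-test'])           # true (the session default is untouched)
```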

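Request and Response objects

Whenever you call requests.get() and friends, you are doing two major things. First, you are constructing a Request object that will be sent to a server to request or query some resource. Second, a Response object is generated once Requests gets a response back from the server. The Response object contains all of the information returned by the server and also contains the Request object you created originally. Here is a simple request to get some very important information from Wikipedia's servers:
>>> r = requests.get('https://en.wikipedia.org/wiki/Monty_Python')
To access the headers the server sent back to us, we do this: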

>>> r.headers
{'content-length': '56170', 'x-content-type-options': 'nosniff', 'x-cache':
'HIT from cp1006.eqiad.wmnet, MISS from cp1010.eqiad.wmnet', 'content-encoding':
'gzip', 'age': '3080', 'content-language': 'en', 'vary': 'Accept-Encoding,Cookie',
'server': 'Apache', 'last-modified': 'Wed, 13 Jun 2012 01:33:50 GMT',
'connection': 'close', 'cache-control': 'private, s-maxage=0, max-age=0,
must-revalidate', 'date': 'Thu, 14 Jun 2012 12:59:39 GMT', 'content-type':
'text/html; charset=UTF-8', 'x-cache-lookup': 'HIT from cp1006.eqiad.wmnet:3128,
MISS from cp1010.eqiad.wmnet:80'}

However, if we want the headers we sent to the server, we simply access the request and then the request's headers:

>>> r.request.headers
{'Accept-Encoding': 'identity, deflate, compress, gzip',
'Accept': '*/*', 'User-Agent': 'python-requests/1.2.0'}

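Prepared requests

Whenever you receive a Response object from an API call or a Session call, its request attribute is actually the PreparedRequest that was used. If you need to modify the body or the headers before sending a request, you can do it like this: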

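from requests import Request, Session

# Placeholder values for illustration only; replace them with your own.
url = 'https://httpbin.org/post'
data = {'key': 'value'}
headers = {'x-test': 'true'}

s = Session()

req = Request('POST', url, data=data, headers=headers)
prepped = req.prepare()

# do something with prepped.body
prepped.body = 'No, I want exactly this as the body.'
prepped.headers['Content-Length'] = str(len(prepped.body))  # keep the length in sync

# do something with prepped.headers
del prepped.headers['Content-Type']

resp = s.send(prepped,
    stream=False,   # placeholder values for the usual send() options
    verify=True,
    proxies={},
    cert=None,
    timeout=10,
)

print(resp.status_code)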

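Since nothing special is done with the Request object here, it is prepared right away and the PreparedRequest object is modified instead. It is then sent with the same parameters you would otherwise have passed to requests.* or Session.*.

However, the code above loses some of the advantages of having a Session object. In particular, Session-level state such as cookies is not applied to the request. To get a PreparedRequest with that state applied, replace the call to Request.prepare() with a call to Session.prepare_request():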

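from requests import Request, Session

# Placeholder values for illustration only; replace them with your own.
url = 'https://httpbin.org/get'
data = {'key': 'value'}
headers = {'x-test': 'true'}

s = Session()
req = Request('GET', url, data=data, headers=headers)

prepped = s.prepare_request(req)

# do something with prepped.body
prepped.body = 'Seriously, send exactly these bytes.'
prepped.headers['Content-Length'] = str(len(prepped.body))  # keep the length in sync

# do something with prepped.headers
prepped.headers['Keep-Dead'] = 'parrot'

resp = s.send(prepped,
    stream=False,   # placeholder values for the usual send() options
    verify=True,
    proxies={},
    cert=None,
    timeout=10,
)

print(resp.status_code)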

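SSL certificate verification

Requests verifies SSL certificates for HTTPS requests, just like a web browser. SSL verification is enabled by default, and Requests raises an SSLError if it cannot verify the certificate: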

>>> requests.get('https://requestb.in')
requests.exceptions.SSLError: hostname 'requestb.in' doesn't match either of '*.herokuapp.com', 'herokuapp.com'

You can pass verify the path to a CA_BUNDLE file or a directory containing certificates of trusted CAs:

>>> requests.get('https://github.com', verify='/path/to/certfile')

or make the setting persistent on a session:

s = requests.Session()
s.verify = '/path/to/certfile'

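> If verify is set to the path of a directory, the directory must have been processed with the c_rehash utility supplied with OpenSSL.

This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable. If REQUESTS_CA_BUNDLE is not set, CURL_CA_BUNDLE is used as a fallback.
Requests can also skip verification of the SSL certificate if you set verify to False: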

requests.get('https://kennethreitz.org', verify=False)
<Response [200]>

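> Note that when verify is set to False, Requests accepts any TLS certificate presented by the server and ignores hostname mismatches and/or expired certificates, which makes your application vulnerable to man-in-the-middle (MitM) attacks. Setting verify to False may be useful during local development or testing. By default, verify is set to True.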

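Client-side certificates

You can also specify a local certificate to use as a client-side certificate, either as a single file (containing the private key and the certificate) or as a tuple of both files' paths: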

requests.get('https://kennethreitz.org', cert=('/path/client.cert', '/path/client.key'))
<Response [200]>

or make the setting persistent on a session:

s = requests.Session()
s.cert = '/path/client.cert'

If you specify a wrong path or an invalid certificate, you get an SSLError:

requests.get('https://kennethreitz.org', cert='/wrong_path/client.pem')
SSLError: [Errno 336265225] _ssl.c:347: error:140B0009:SSL routines:SSL_CTX_use_PrivateKey_file:PEM lib

> The private key of your local certificate must be unencrypted. Currently, Requests does not support using encrypted keys.

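Body content workflow

By default, when you make a request, the body of the response is downloaded immediately. You can override this behaviour with the stream parameter and defer downloading the response body until you access the Response.content attribute: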

tarball_url = 'https://github.com/psf/requests/tarball/master'
r = requests.get(tarball_url, stream=True)

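At this point only the response headers have been downloaded and the connection remains open, which allows us to make content retrieval conditional:

# TOO_LONG is a size limit in bytes that you define yourself
if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...

You can further control the workflow with the Response.iter_content() and Response.iter_lines() methods. Alternatively, you can read the undecoded body from the underlying urllib3.HTTPResponse at Response.raw.
If you set stream to True when making a request, Requests cannot release the connection back to the pool unless you consume all the data or call Response.close. This can lead to inefficient use of connections. If you find yourself partially reading response bodies (or not reading them at all) while using stream=True, you should make the request within a with statement to ensure the connection is always closed: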

with requests.get('https://httpbin.org/get', stream=True) as r:
    # Do things with the response here.
    pass

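Streaming uploads

Requests supports streaming uploads, which let you send large streams or files without reading them into memory. To stream and upload, simply provide a file-like object for your body:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

> It is strongly recommended that you open files in binary mode. This is because Requests may attempt to provide the Content-Length header for you, and if it does, this value is set to the number of bytes in the file. Errors may occur if you open the file in text mode.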


Below are some more examples of using the requests module:
import requests

pay_url = 'https://www.cnblogs.com/feng0815/pay'
balance_url = 'https://www.cnblogs.com/feng0815/get_balance'
balance_data = {'user_id': 1}
pay_data = {"user_id": 1, "price": "999"}
balance_res = requests.get(balance_url, params=balance_data).text  # send a GET request; .text returns the body as a string
print(balance_res)
balance_res = requests.get(balance_url, params=balance_data).json()  # send a GET request; .json() parses the body straight into a dict
print(balance_res)
pay_res = requests.post(pay_url, data=pay_data).json()  # send a POST request with form data
print(pay_res)
# ==== JSON request body ====
url = 'http://api.xxxxx.cn/getmoney'
data = {"userid": 1}
res = requests.post(url, json=data).json()  # json= sends the dict as a JSON body
print(res)
# ==== adding cookies ====
url = 'http://api.xxxxxx.cn/setmoney2'
data = {'userid': 1, "money": 9999}
cookie = {'token': "token12345"}
res = requests.post(url, data=data, cookies=cookie).json()  # cookies= sets the cookies to send
print(res)
# ==== HTTP authentication ====
url = 'http://api.xxxxxx.cn/setmoney'
data = {'userid': 1, "money": 91999}
res = requests.post(url, data=data, auth=('admin', '123456')).json()  # auth= takes a (username, password) tuple
print(res)
# ==== sending a file ====
url = 'http://api.xxx.cn/uploadfile'
with open('api11.py', 'rb') as f:  # open in binary mode and close the file afterwards
    res = requests.post(url, files={'file': f}).json()  # files= takes a dict of file objects
print(res)
# ==== sending headers ====
url = 'http://api.xxx.cn/getuser2'
data = {'userid': 1}
header = {'Content-Type': "application/json"}
res = requests.post(url, json=data, headers=header).json()  # headers= adds custom request headers
print(res)

import requests

req = requests.get('http://www.xxx.cn', params={'username': 'xxx'}, cookies={'k': 'v'},
                   headers={'User-Agent': 'Chrome'}, verify=False, timeout=3)
# Sends a GET request. params holds the query-string parameters, cookies the cookies
# to send, and headers the request headers. verify=False skips HTTPS certificate
# verification; it only helps with self-signed or otherwise untrusted certificates and
# is insecure, so avoid it in production. timeout is the number of seconds to wait for
# the server before a Timeout exception is raised.
# All of these arguments are optional.
with open('a.txt', 'rb') as f:
    req2 = requests.post('http://www.xxx.cn', data={'username': 'xxx'}, cookies={'k': 'v'},
                         headers={'User-Agent': 'Chrome'}, files={'file': f}, timeout=3)
# Sends a POST request. data is the form data and files the files to upload; the other
# arguments work as above and are likewise optional.

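HTTP authentication

Some sites, download sites for example, are protected by HTTP authentication: without credentials the server returns 401 Unauthorized. The following shows how to supply HTTP authentication credentials.

This authentication belongs to HTTP itself and is different from an application's login request. Calling a login API and passing a username and password in the request is not the same thing, so keep the two apart.

As an analogy, HTTP authentication is the lock on the shopping mall's front door, which belongs to the mall, while the lock on your own shop is the login API that takes your username and password. If a site pops up a dialog asking for a username and password before you can see any page at all, that is HTTP authentication. If you can see the page first and then type a username and password and click a login button, that is an ordinary login request. HTTP authentication is comparatively rare.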

import requests
from requests.auth import HTTPBasicAuth  # import HTTPBasicAuth

req = requests.post('http://www.cnblogs.com', data={'username': 'xxx'}, auth=HTTPBasicAuth('username', 'password'))
# Pass auth=HTTPBasicAuth(username, password); everything else works as before
print(req.status_code)
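
To see what HTTPBasicAuth actually adds to a request, the Authorization header can be inspected on a prepared request without sending anything. This is a minimal sketch, assuming requests is installed; the URL and the user/pass credentials are placeholders.

```python
import base64

from requests import Request
from requests.auth import HTTPBasicAuth

url = 'https://httpbin.org/basic-auth/user/pass'  # placeholder; nothing is sent

explicit = Request('GET', url, auth=HTTPBasicAuth('user', 'pass')).prepare()
shorthand = Request('GET', url, auth=('user', 'pass')).prepare()

print(explicit.headers['Authorization'])
# Basic dXNlcjpwYXNz
print(explicit.headers['Authorization'] == shorthand.headers['Authorization'])
# True

# The token is just base64("user:pass"), so it is encoded, not encrypted.
token = explicit.headers['Authorization'].split(' ', 1)[1]
print(base64.b64decode(token).decode())
# user:pass
```

Since the credentials are only base64-encoded, Basic authentication should be used over HTTPS.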

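HTTP session persistence

Session persistence matters when some operations require you to be logged in first. The manual approach is to send the login request, take the cookies from its response (after login, the authentication information is stored in cookies), and pass those cookies to the next URL you request.

Done by hand, that looks like this: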

import requests
r1 = requests.post('https://www.cnblogs.com/feng0815/login', data={'username': 'chenshifeng', 'password': '123456'})  # login request
login_cookies = r1.cookies  # the cookies returned by the login request
r2 = requests.post('https://www.cnblogs.com/feng0815/create_user',
                   data={'title': 'Test title', 'content': 'Test article content'},
                   cookies=login_cookies)
# Passing the login cookies to the request that posts the article is all it takes
print(r2.text)

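requests offers a simpler way to do this: a Session object (requests.Session) manages cookies for us automatically, so there is no need to fetch the login cookies ourselves and pass them to the next request. The code looks like this: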

import requests
s = requests.Session()
login_req = s.post('https://www.cnblogs.com/feng0815/login', data={'username': 'chenshifeng', 'password': '123456'})  # send the login request
r1 = s.post('https://www.cnblogs.com/feng0815/create_user',
            data={'title': 'Test title', 'content': 'Test article content'})  # post the article, reusing the login cookies
print(r1.text)  # print the result

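HTTP proxy settings

When writing a crawler, hitting the same site many times from a single IP address often gets that IP blocked, and then the site can no longer be reached. The fix is to use a proxy: we send the request to the proxy, and the proxy forwards it for us, so the IP address the site finally sees is not our own. There are many proxies available online, and most paid proxy sites list a few free ones every day. I took a few free proxies from https://www.kuaidaili.com/free/inha/ ; the code for setting a proxy is as follows: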

import requests
# Proxy without a username and password
proxies = {
    'http': 'http://119.187.75.46:9000',   # proxy used for http:// URLs
    'https': 'http://112.95.18.133:9000',  # proxy used for https:// URLs
}
res = requests.get('http://www.xxx.cn', proxies=proxies).text
print(res)

# Proxy with a username and password
proxies = {
    'http': 'http://user:password@127.0.0.1:9000',   # proxy used for http:// URLs
    'https': 'http://user:password@127.0.0.1:9000',  # proxy used for https:// URLs
}
res = requests.get('http://www.cnblogs.com', proxies=proxies).text
print(res)
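
The keys of the proxy dictionary are matched against the scheme of the URL being requested. As a rough illustration, the helper select_proxy from requests.utils (a utility the library uses internally, so treat relying on it as an assumption rather than a documented workflow) shows which entry would be picked for a given URL, without making any request.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://119.187.75.46:9000',
    'https': 'http://112.95.18.133:9000',
}

print(select_proxy('http://www.xxx.cn', proxies))
# http://119.187.75.46:9000
print(select_proxy('https://www.xxx.cn', proxies))
# http://112.95.18.133:9000
print(select_proxy('ftp://www.xxx.cn', proxies))
# None
```

A URL whose scheme has no matching key is requested directly, without a proxy.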

Posted 2017-11-18 15:21 by 尘世风
