A few requests tips
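Problem:
I was scraping a site where each account has to keep using the same IP, otherwise the account gets banned. So I wrote a simple class that ties each account to its own proxy IP.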
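A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and it uses urllib3's connection pooling while doing so. So if you send several requests to the same host, the underlying TCP connection is reused, which can give a significant performance gain (see HTTP persistent connection).
This matters, for example, on sites where a request needs the login authentication state left behind by the previous request.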
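To make the cookie persistence concrete, here is a minimal offline sketch. The cookie name, its value, and the example.com URL are made up for illustration; no request is actually sent, the session only prepares one so the merged headers can be inspected.

```python
import requests

s = requests.Session()

# Session-level headers are merged into every request made from this session.
s.headers.update({"User-Agent": "demo-client"})

# Pretend an earlier login response stored this cookie (hypothetical name/value).
s.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# Prepare (but do not send) a later request and look at what would go out.
req = requests.Request("GET", "https://example.com/profile")
prepared = s.prepare_request(req)

print(prepared.headers["Cookie"])
print(prepared.headers["User-Agent"])
```

In real use the cookie would come from the login response itself, and any later s.get() or s.post() on the same session would carry it automatically.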
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util import Retry

    # Suppress the console warning that requests prints when verify=False is used.
    requests.packages.urllib3.disable_warnings()

    # Placeholder: the login call below needs a headers dict.
    # Fill in whatever request headers the target site expects.
    headers = {}

    # One entry per account: credentials plus the proxy that account must always use.
    data_list = [
        {'username': '18866478714', 'password': '18866478714', "ip": 'http://admin:admin.@10.22.22.1:2845'},
        {'username': '17109324203', 'password': '17109324203', "ip": 'http://admin:admin.@10.22.21.13:2345'},
        {'username': '17109324197', 'password': '17109324197', "ip": 'http://admin:admin.@10.33.3.74:2345'}
    ]


    class LoginSession:
        def __init__(self, number):
            self.username = data_list[number]["username"]
            self.password = data_list[number]["password"]
            # Route both HTTP and HTTPS traffic for this account through its own proxy.
            self.proxies = {"https": data_list[number]["ip"], "http": data_list[number]["ip"]}
            self.session = self.loging()

        def loging(self):
            post_url = 'https://Example/Form'
            s = requests.session()
            # Retry failed connection attempts up to 5 times, with back-off between tries.
            retry = Retry(connect=5, backoff_factor=1)
            adapter = HTTPAdapter(max_retries=retry)
            s.mount('https://', adapter)
            s.keep_alive = False  # intended to turn off keep-alive
            # Store the proxy on the session so later requests reuse the same IP.
            s.proxies.update(self.proxies)
            ss = s.post(url=post_url,
                        headers=headers,
                        data={'username': self.username, 'password': self.password},
                        proxies=self.proxies,
                        verify=False)
            print(ss.json())
            return s


    if __name__ == '__main__':
        # One logged-in object per account; pick one at random when making requests.
        request_session_list = [LoginSession(_i) for _i in range(len(data_list))]
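As a rough sketch of the "pick one at random" step, the idea can be reduced to a pool that maps each account to one session with its proxy fixed on it. The account names, the 127.0.0.1 proxy addresses, and the helper names build_session and pick_session are all made up for illustration, and login is left out. Setting trust_env to False is my own addition: per the requests documentation, proxy settings from environment variables can otherwise override session.proxies.

```python
import random
import requests

# Hypothetical accounts, each pinned to its own (made-up) proxy address.
ACCOUNTS = [
    {"username": "user_a", "proxy": "http://127.0.0.1:8001"},
    {"username": "user_b", "proxy": "http://127.0.0.1:8002"},
]


def build_session(account):
    """Create a session whose requests all go through the account's proxy."""
    s = requests.Session()
    # Ignore HTTP_PROXY/HTTPS_PROXY from the environment so they cannot
    # replace the per-account proxy configured below.
    s.trust_env = False
    s.proxies.update({"http": account["proxy"], "https": account["proxy"]})
    return s


# Build each session once and keep it, so an account never changes IP.
pool = {a["username"]: build_session(a) for a in ACCOUNTS}


def pick_session():
    """Return a random (username, session) pair from the pool."""
    username = random.choice(list(pool))
    return username, pool[username]


name, session = pick_session()
print(name, session.proxies["https"])
```

Keeping the sessions in a dict keyed by username means a given account always comes back with the same session object, and therefore the same proxy and cookies.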
Problems encountered
Downloading logs with requests failed with: HTTPSConnectionPool(host='***', port=443): Max retries exceeded with url: ******(Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)'),))
The cause is an SSL certificate verification error. The fix:
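    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util import Retry

    # Suppress the console warning that requests prints when verify=False is used.
    requests.packages.urllib3.disable_warnings()

    s = requests.session()
    retry = Retry(connect=5, backoff_factor=1)  # raise the number of connection retries
    adapter = HTTPAdapter(max_retries=retry)
    s.mount('https://', adapter)
    # Intended to close surplus connections: requests is built on urllib3, whose
    # HTTP connections are keep-alive by default. Note that requests does not read
    # a keep_alive attribute on the session, so this line has no effect on its own;
    # sending the header Connection: close is what asks for the connection to be closed.
    s.keep_alive = False
    s.verify = False  # skip SSL certificate verification for requests on this session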
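The retry and verification settings above can be wrapped in one small helper. This is only a sketch: make_session and its parameters are names I made up, and the CA bundle path is a placeholder. Besides False, the verify option in requests also accepts a path to a CA bundle, which keeps certificate checking on when the server uses an internal or self-signed CA.

```python
import requests
import urllib3
from requests.adapters import HTTPAdapter
from urllib3.util import Retry


def make_session(verify=True, connect_retries=5, backoff=1):
    """Build a session with connection retries and a chosen verify setting.

    verify may be True, False, or a path to a CA bundle file.
    """
    s = requests.Session()
    retry = Retry(connect=connect_retries, backoff_factor=backoff)
    adapter = HTTPAdapter(max_retries=retry)
    # Mount on both schemes so plain HTTP requests get the same retry policy.
    s.mount("http://", adapter)
    s.mount("https://", adapter)
    s.verify = verify
    if verify is False:
        # Only silence the warning that verify=False triggers.
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return s


# Verification switched off, as in the fix above.
insecure = make_session(verify=False)

# Verification kept on against a specific CA bundle (placeholder path;
# the file is only read when a request is actually sent).
pinned = make_session(verify="/path/to/internal-ca.pem")

print(insecure.get_adapter("https://example.com").max_retries.connect)
```

Turning verification off is the quickest way past CERTIFICATE_VERIFY_FAILED, but it also removes protection against a tampered connection, so pointing verify at the right CA bundle is usually the safer choice when that file is available.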