骑骡子赶猪  

requests tips

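Problem:

When scraping a certain website, each account must always go out through the same IP, otherwise the account gets banned. So I wrote a simple class that binds each account to its own IP.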

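A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and it uses urllib3's connection pooling. So if you send several requests to the same host, the underlying TCP connection is reused, which can bring a significant performance gain (see HTTP persistent connection).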
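To make the persistence concrete, here is a minimal sketch that runs without any network access: two requests are only prepared (not sent) from one Session, and both carry the same header and cookie. The header name, cookie name, and example.com URLs are placeholders chosen for illustration, not part of the original setup.

```python
import requests

# One Session shared by several requests. Nothing is sent over the network:
# the requests are only prepared so the merged headers can be inspected.
session = requests.Session()
session.headers.update({"X-Demo-Token": "abc123"})             # hypothetical header
session.cookies.set("sessionid", "xyz", domain="example.com")  # hypothetical cookie

first = session.prepare_request(requests.Request("GET", "https://example.com/a"))
second = session.prepare_request(requests.Request("GET", "https://example.com/b"))

for prepared in (first, second):
    # Both prepared requests inherit the session-level header and cookie.
    print(prepared.url, prepared.headers.get("X-Demo-Token"), prepared.headers.get("Cookie"))
```

In real use the cookie would normally come from the server's login response rather than being set by hand.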

For example, some websites require the login/authentication information from a previous request.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

requests.packages.urllib3.disable_warnings()  # suppress the console warnings that requests emits when verify=False is used

data_list = [
    {'username': '18866478714', 'password': '18866478714',
     "ip": 'http://admin:admin.@10.22.22.1:2845'},
    {'username': '17109324203', 'password': '17109324203',
     "ip": 'http://admin:admin.@10.22.21.13:2345'},
    {'username': '17109324197', 'password': '17109324197',
     "ip": 'http://admin:admin.@10.33.3.74:2345'}
]


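# NOTE: `headers` was used but never defined in the original; this value is a placeholder assumption.
headers = {'User-Agent': 'Mozilla/5.0'}


class LoginSession:
    """Logs one account in through its own proxy and keeps the resulting session."""

    def __init__(self, number):
        account = data_list[number]
        self.username = account["username"]
        self.password = account["password"]
        self.proxies = {"https": account["ip"], "http": account["ip"]}
        self.session = self.login()

    def login(self):
        post_url = 'https://Example/Form'
        s = requests.session()
        retry = Retry(connect=5, backoff_factor=1)  # retry failed connection attempts up to 5 times
        adapter = HTTPAdapter(max_retries=retry)
        s.mount('https://', adapter)
        # Store the proxy and verify settings on the session so later requests reuse them.
        s.proxies.update(self.proxies)
        s.verify = False
        # The original set s.keep_alive = False, which requests ignores;
        # a Connection: close header is what actually asks for non-persistent connections.
        s.headers['Connection'] = 'close'
        resp = s.post(url=post_url, headers=headers,
                      data={'username': self.username, 'password': self.password},
                      proxies=self.proxies,
                      verify=False
                      )
        print(resp.json())
        return s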


if __name__ == '__main__':
    # One logged-in object per account; pick one at random when making requests.
    request_session_list = [LoginSession(_i) for _i in range(len(data_list))]
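The "pick one at random" step could look roughly like the sketch below. It is a simplified, offline variant: the login request is left out, and the account names and proxy addresses are hypothetical placeholders. The point is only that each account keeps its own Session with its own proxy, so whichever account is chosen always leaves through the same IP.

```python
import random
import requests

# Hypothetical account-to-proxy table; usernames and addresses are placeholders.
ACCOUNTS = [
    {"username": "user_a", "proxy": "http://10.0.0.1:8080"},
    {"username": "user_b", "proxy": "http://10.0.0.2:8080"},
]


def build_session(account):
    """Create a Session whose requests all go out through the account's proxy."""
    s = requests.Session()
    s.proxies.update({"http": account["proxy"], "https": account["proxy"]})
    # Ignore proxy settings from environment variables (and .netrc),
    # so the account-to-proxy binding cannot be overridden from outside.
    s.trust_env = False
    return s


# One session per account, keyed by username.
pool = {acc["username"]: build_session(acc) for acc in ACCOUNTS}

# Pick an account at random; its session already carries the matching proxy.
username, session = random.choice(list(pool.items()))
print(username, session.proxies["https"])
```

Keeping the sessions in a dict keyed by username is a design choice of this sketch; the original keeps them in a list and indexes by position.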

Problems encountered

Downloading logs with requests failed with HTTPSConnectionPool(host='***', port=443): Max retries exceeded with url: ******(Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)'),))

The cause is an SSL certificate verification error. Workaround:

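import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

requests.packages.urllib3.disable_warnings()  # suppress the console warnings that requests emits when verify=False is used

s = requests.session()
retry = Retry(connect=5, backoff_factor=1)  # increase the number of connection retries
adapter = HTTPAdapter(max_retries=retry)
s.mount('https://', adapter)
# The original set s.keep_alive = False, which requests ignores;
# a Connection: close header is what actually asks for non-persistent connections.
s.headers['Connection'] = 'close'
s.verify = False  # skip SSL certificate verification for every request made through this session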
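As a quick sanity check on the retry setup, the following sketch mounts the adapter and then asks the session which adapter it would use for a given URL, without sending anything. The example.com URL is a placeholder, and mounting the adapter for `http://` as well is an addition of this sketch.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry = Retry(connect=5, backoff_factor=1)  # up to 5 connection retries, with backoff between attempts
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)  # also cover plain HTTP (not in the original)

# Look up the adapter the session would use for this URL; no request is sent.
chosen = session.get_adapter("https://example.com/logs")
print(chosen.max_retries.connect, chosen.max_retries.backoff_factor)
```

Note that retries only help with transient connection failures; a certificate verification error will fail the same way on every attempt until verification is disabled or the certificate is trusted.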
posted on 2019-11-11 18:21 by 骑骡子赶猪