Making Good Use of Cookies
Reusing cookies:
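Simulating the login flow with Selenium is certainly possible, but some login flows are complicated, and many sites now add verification steps that need a human to complete (image recognition CAPTCHAs, for example). Scripting the login is therefore usually a lot of effort for little reward. However complex the login flow is, its only purpose is to obtain the corresponding cookies, and Selenium provides an API for working with cookies. So we can simply log in by hand, copy the required cookie string from the browser's developer tools, and set it in Selenium. The code looks like this: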
from selenium import webdriver


# Split the raw string and yield the name/value pair of each cookie
def str2Cookie(cookieStr):
    def getCookieInfo(cookie):
        # Split on the first '=' only, since values may contain '='
        return cookie.split('=', maxsplit=1)

    for cookie in cookieStr.split(';'):
        name, value = getCookieInfo(cookie.strip())
        yield {'name': name, 'value': value}


def imitateLogin(driver):
    # Cookie string captured by hand from the browser
    cookieStr = r'__yadk_uid=vJVlSDVQ4aq4hdF3A0DbFmiTdt76cbOB; _ga=GA1.2.1004530019.1590339033; _gid=GA1.2.1890133014.1595150121; remember_user_token=W1syMjIyOTk3XSwiJDJhJDEwJE9XSC5RdnhmNDRKWkRVZS9rRWtrOC4iLCIxNTk1NzM4MDY5LjMxODAxOTIiXQ%3D%3D--7807e1b7a5480d4883e8884a74c2d18dbffb20d9; read_mode=day; default_font=font2; locale=zh-CN; _m7e_session_core=e0ee38cfe3ad01ef5263a1fc7d8e4a26; Hm_lvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1595234295,1595234452,1595754538,1595785636; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%222222997%22%2C%22%24device_id%22%3A%221724796c82722-05b58554ba041f-d373666-1049088-1724796c828e0%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_utm_source%22%3A%22recommendation%22%2C%22%24latest_utm_medium%22%3A%22seo_notes%22%2C%22%24latest_utm_campaign%22%3A%22maleskine%22%2C%22%24latest_utm_content%22%3A%22user%22%2C%22%24latest_referrer_host%22%3A%22%22%7D%2C%22first_id%22%3A%221724796c82722-05b58554ba041f-d373666-1049088-1724796c828e0%22%7D; Hm_lpvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1595789596'
    # A page must be visited first; only then can cookies be manipulated
    driver.get('https://www.jianshu.com/')
    # Clear the existing cookies
    driver.delete_all_cookies()
    # Parse cookieStr and add each cookie to the current Selenium session
    for cookie in str2Cookie(cookieStr):
        driver.add_cookie(cookie)
    # Refresh the current page so the cookies take effect
    driver.refresh()


if __name__ == '__main__':
    driver = webdriver.Chrome()
    imitateLogin(driver)
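A cookie string copied by hand does not always come out as clean `name=value` pairs: it may end with a stray `;` or contain a fragment without `=`, and in that case a two-value unpacking of the split result would raise an error. As a minimal sketch of a more tolerant parser (the name `parse_cookie_header` is hypothetical and not part of Selenium), something like the following could be used instead:

```python
def parse_cookie_header(cookie_str):
    """Turn a string like 'a=1; b=2' into a list of {'name', 'value'} dicts."""
    cookies = []
    for fragment in cookie_str.split(';'):
        fragment = fragment.strip()
        if not fragment:
            continue  # skip empty pieces, e.g. after a trailing ';'
        # partition splits on the first '=' only, so values may contain '='
        name, sep, value = fragment.partition('=')
        if not sep or not name:
            continue  # skip fragments that are not name=value pairs
        cookies.append({'name': name.strip(), 'value': value.strip()})
    return cookies
```

Each returned dict has the same `name`/`value` shape that is passed to `driver.add_cookie` in this section, so the list can be looped over in the same way.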
The result of running the login code is shown in the figure below:
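Persisting cookies: above, we captured the cookies by hand. That approach can miss some cookies, which leaves certain pages inaccessible. A better approach is to persist the cookies: use Selenium to perform the login once, save the cookies of that session, and on the next run load them directly instead of going through a real login again. The persistence code looks like this: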
import json

from selenium import webdriver


# Persist the cookies to a JSON file
def saveCookies(cookies, filename='cookies.json'):
    with open(filename, mode='w', encoding='utf-8') as file:
        json.dump(cookies, file)


# Load the cookies; return None if the file is missing or unreadable
def loadCookies(filename='cookies.json'):
    try:
        with open(filename, mode='r', encoding='utf-8') as file:
            return json.load(file)
    except (FileNotFoundError, PermissionError):
        return None


if __name__ == '__main__':
    driver = webdriver.Chrome()
    # The site must be visited first
    driver.get('https://www.jianshu.com/')
    # Try the persisted cookies first
    cookies = loadCookies()
    if cookies is not None:
        # Cookies exist: clear the current ones and reset them
        driver.delete_all_cookies()
        for cookie in cookies:
            driver.add_cookie(cookie)
        # Refresh so the cookies take effect
        driver.refresh()
    else:
        # Cookies do not exist: perform the real login here
        # After a successful login, persist the current cookies
        saveCookies(driver.get_cookies())
    # At this point the session is logged in
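Persisted cookies do not stay valid forever, so a saved file can exist and still fail to log us in. The sketch below shows one possible way to check the saved cookies before reusing them. It assumes the cookie dicts have the shape returned by `driver.get_cookies()`, where non-session cookies carry an `expiry` field in seconds since the epoch; the helper names `drop_expired` and `usable_cookies` are hypothetical.

```python
import time


def drop_expired(cookies, now=None):
    """Return only the cookies that have not expired yet.

    Cookies without an 'expiry' key are session cookies and are kept.
    """
    if now is None:
        now = time.time()
    return [c for c in cookies if 'expiry' not in c or c['expiry'] > now]


def usable_cookies(cookies, required_names=(), now=None):
    """Return the unexpired cookies, or None if they cannot be reused.

    None is returned when nothing was saved, or when any cookie listed in
    required_names is missing or expired, so the caller can fall back to
    a real login.
    """
    if cookies is None:
        return None
    fresh = drop_expired(cookies, now)
    present = {c['name'] for c in fresh}
    if not set(required_names) <= present:
        return None
    return fresh
```

With this in place, the main block could call `usable_cookies(loadCookies(), required_names=('remember_user_token',))` and treat a `None` result the same as a missing file. Treating `remember_user_token` as the cookie that carries the login state is an assumption based on the cookie string shown earlier, not something the site documents.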