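A collection of anti-anti-crawler settings for Puppeteer in headless mode
The approach comes from this article, which gives a very thorough overview but is not clear enough about some of the concrete bypass techniques. This post focuses on how to actually configure each one.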
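Selenium's WebDriver API has no direct counterpart to Puppeteer's page.evaluateOnNewDocument, so the same approach cannot be applied to Selenium as-is (Chromium-based drivers can, however, reach the underlying DevTools command Page.addScriptToEvaluateOnNewDocument through Selenium's CDP interface).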
const puppeteer = require("puppeteer");

(async () => {
  // Drop --enable-automation to remove the "controlled by automated test software" notice
  const browser = await puppeteer.launch({
    ignoreDefaultArgs: ["--enable-automation"],
    headless: true,
  });
  const page = await browser.newPage();

  // Run the following script before every new document loads
  await page.evaluateOnNewDocument(() => {
    // Delete the navigator.webdriver field
    const newProto = navigator.__proto__;
    delete newProto.webdriver;
    navigator.__proto__ = newProto;

    // Add window.chrome; fill in some inner values to look more realistic
    window.chrome = {};
    window.chrome.app = {"InstallState": "hehe", "RunningState": "haha", "getDetails": "xixi", "getIsInstalled": "ohno"};
    window.chrome.csi = function () {};
    window.chrome.loadTimes = function () {};
    window.chrome.runtime = function () {};

    // The userAgent contains "Headless" in headless mode, so override it
    Object.defineProperty(navigator, "userAgent", {
      get: () => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36",
    });

    // Fake realistic plugin information
    Object.defineProperty(navigator, "plugins", {
      get: () => [{"description": "Portable Document Format", "filename": "internal-pdf-viewer", "length": 1, "name": "Chrome PDF Plugin"}],
    });

    // Add languages
    Object.defineProperty(navigator, "languages", {
      get: () => ["zh-CN", "zh", "en"],
    });

    // Spoof the notification permission query.
    // bind() keeps the original method attached to navigator.permissions;
    // calling it unbound throws "Illegal invocation".
    const originalQuery = window.navigator.permissions.query.bind(window.navigator.permissions);
    window.navigator.permissions.query = (parameters) =>
      parameters.name === "notifications"
        ? Promise.resolve({ state: Notification.permission })
        : originalQuery(parameters);
  });

  await page.goto("https://www.aabbccc.com");
  // ...
  await browser.close();
})();
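The hard-coded userAgent string above pins a specific Chrome version and platform, which can drift away from the browser Puppeteer actually launches. As a rough alternative, the sketch below derives the string from the real one by removing the headless marker. The helper name stripHeadlessMarker is hypothetical (it is not part of Puppeteer), and it assumes the headless build reports itself as "HeadlessChrome/" in the user agent.

```javascript
// Hypothetical helper, not a Puppeteer API.
// Assumption: a headless user agent differs from the regular one only by the
// "HeadlessChrome/" product token, so swapping that token is sufficient.
function stripHeadlessMarker(userAgent) {
  return userAgent.replace("HeadlessChrome/", "Chrome/");
}
```

One way to use it is await page.setUserAgent(stripHeadlessMarker(await browser.userAgent())), which changes both the request header and navigator.userAgent, so the Object.defineProperty override for userAgent would no longer be needed.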
The configuration above is enough to get past the vast majority of checks that target headless browsers.
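To check whether the overrides actually took effect, it may help to look at the same fields a detection script would. The sketch below is only an illustration: findHeadlessSignals and snapshotEnvironment are hypothetical names, and the checks cover just the fields touched by the configuration in this post, so a real detector may well look at more.

```javascript
// Hypothetical checker, not part of Puppeteer.
// Takes a plain snapshot object and lists the headless telltales found in it.
function findHeadlessSignals(env) {
  const signals = [];
  if (env.webdriver) signals.push("navigator.webdriver is set");
  if (/headless/i.test(env.userAgent || "")) signals.push("userAgent contains 'headless'");
  if (!env.hasChrome) signals.push("window.chrome is missing");
  if (!env.pluginCount) signals.push("navigator.plugins is empty");
  if (!env.languages || env.languages.length === 0) signals.push("navigator.languages is empty");
  return signals;
}

// Meant to run in the page context; it returns only serializable values
// so the result can be passed back to Node.
function snapshotEnvironment() {
  return {
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
    hasChrome: typeof window.chrome !== "undefined",
    pluginCount: navigator.plugins.length,
    languages: Array.from(navigator.languages || []),
  };
}
```

After page.goto, something like findHeadlessSignals(await page.evaluate(snapshotEnvironment)) should return an empty array if every override is in place; any entries in the result point at the field that still gives the browser away.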